# Llama 3.2 3B Reasoning Model - GGUF Format
This directory contains GGUF format versions of the fine-tuned Llama 3.2 3B reasoning model.
## Model Files

- `model-f16.gguf` (6.0 GB) - Full-precision F16 version
- `model-q4_0.gguf` (1.8 GB) - Quantized Q4_0 version (recommended for most users)
- `tokenizer.json` - Tokenizer configuration
- `tokenizer_config.json` - Tokenizer settings
- `special_tokens_map.json` - Special token mappings
## Model Details

- **Base Model:** Meta Llama 3.2 3B
- **Fine-tuning:** Full-weight training on 8k DeepSeek R1 reasoning examples
- **Training Infrastructure:** H100 GPU with bf16 precision
- **Context Length:** 131,072 tokens
- **Reasoning Format:** Structured thinking with `<think></think>` and `<answer></answer>` tags
## Usage with llama.cpp

### Basic Inference

```sh
./llama-cli -m model-q4_0.gguf -p "Solve this step by step: What is 15% of 240?" -n 512
```
### Interactive Chat

```sh
./llama-cli -m model-q4_0.gguf -cnv
```

Conversation mode (`-cnv`) applies the model's built-in chat template automatically.
### With System Prompt

```sh
./llama-cli -m model-q4_0.gguf -p "System: You are a helpful reasoning assistant. Always show your step-by-step thinking process.
User: A train travels 300 km in 4 hours. What is its average speed?" -n 512
```
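If you are building prompts programmatically rather than through `llama-cli`, the same system/user exchange can be assembled by hand. This is a minimal sketch assuming the standard Llama 3 instruct template; the actual template shipped in this fine-tune's `tokenizer_config.json` may differ, so check it before relying on these header tokens.

```python
# Sketch: building a Llama-3-style chat prompt string by hand.
# Assumption: the fine-tune kept the standard Llama 3 instruct header
# tokens (<|start_header_id|> / <|eot_id|>); verify against the
# chat_template in tokenizer_config.json.
def build_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt(
    "You are a helpful reasoning assistant. "
    "Always show your step-by-step thinking process.",
    "A train travels 300 km in 4 hours. What is its average speed?",
)
print(prompt)
```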
### Sampling Parameters

```sh
./llama-cli -m model-q4_0.gguf \
  --temp 0.3 \
  --top-p 0.9 \
  --top-k 40 \
  --repeat-penalty 1.15 \
  -p "Your prompt here" \
  -n 1024
```
## Expected Output Format

The model structures its responses with reasoning tags:

```
<think>
Let me solve this step by step...
Speed = Distance / Time
Speed = 300 km / 4 hours = 75 km/h
</think>
<answer>
The average speed of the train is 75 km/h (kilometers per hour).
</answer>
```
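When consuming this output in an application, you typically want the reasoning and the final answer as separate fields. Below is a minimal parsing sketch, assuming the `<think>`/`<answer>` tags each appear at most once, as in the example above:

```python
import re

# Sketch: split a model response into (reasoning, final answer).
# Assumption: at most one <think>...</think> and one <answer>...</answer>
# block per response; falls back to the raw text if tags are missing.
def parse_response(text: str) -> tuple[str, str]:
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else "",
        answer.group(1).strip() if answer else text.strip(),
    )

sample = """<think>
Speed = Distance / Time
Speed = 300 km / 4 hours = 75 km/h
</think>
<answer>
The average speed of the train is 75 km/h.
</answer>"""

thinking, answer = parse_response(sample)
print(answer)  # -> The average speed of the train is 75 km/h.
```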
## Performance Recommendations

- **Q4_0 version:** Recommended for most users - a good balance of quality and size
- **F16 version:** For maximum quality when you have sufficient VRAM/RAM
- **Memory requirements:**
  - Q4_0: ~2.5 GB RAM minimum
  - F16: ~7 GB RAM minimum
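Note that these minimums cover the weights plus a modest context; the KV cache grows linearly with context length, so running near the full 131,072-token window needs considerably more memory. A rough back-of-the-envelope estimate (the architecture numbers below are assumptions based on the Llama 3.2 3B config: 28 layers, 8 KV heads, head dimension 128, f16 cache):

```python
# Rough sketch of runtime memory: model weights plus the KV cache.
# Assumed architecture numbers (from the Llama 3.2 3B config): 28 layers,
# 8 KV heads, head_dim 128; adjust if the actual config differs.
def kv_cache_bytes(n_ctx: int, n_layers: int = 28, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    # 2x for the separate K and V tensors; f16 elements by default.
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

GiB = 1024 ** 3
model_q4_0 = 1.8 * GiB  # approximate file size of model-q4_0.gguf

for ctx in (4096, 32768, 131072):
    total = model_q4_0 + kv_cache_bytes(ctx)
    print(f"ctx={ctx:>6}: ~{total / GiB:.1f} GiB")
```

At a 4k context this lands near the ~2.5 GB figure above, but the f16 KV cache alone reaches roughly 14 GiB at the full 131k context.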
## Model Capabilities

✅ Strengths:
- Mathematical reasoning and calculations
- Step-by-step problem solving
- Logical analysis and deduction
- Code reasoning and debugging
- Scientific problem solving
⚠️ Limitations:
- May generate verbose reasoning for simple questions
- Occasional repetition in thinking process
- Not trained for specific domain knowledge beyond general reasoning
## License

This model is based on Llama 3.2 and follows Meta's licensing terms.
## Model Tree

- Repository: x1nx3r/Llama-3.2-3B-thinking-8k-v1-GGUF
- Base model: meta-llama/Llama-3.2-3B