# Qwen3-1.7B Magistral Math (BF16)
**Disclaimer:** "Magistral" here refers to lecture-style (*magistral*) math reasoning supervision. This is an independent community model, not affiliated with or endorsed by Mistral AI.
## TL;DR
Qwen3-1.7B Magistral Math is a math-specialized fine-tune of `unsloth/Qwen3-1.7B-Base`, trained in full BF16 on a compact, high-quality chain-of-thought dataset.

- Data: `HAD653/GSM8K-OpenMath-MathReason-13k` – 13.9k grade-school & early high-school word problems with structured CoT.
- Goal: a 1.7B model that reliably solves GSM8K / OpenMath-style problems with clear step-by-step reasoning.
- Answer format: `Problem: ... Reasoning: ... Answer: <final numeric answer>`
- Best use: GSM8K-like word problems, OpenMath-style exercises, local math tutoring.
## Model Details
- Base model: `unsloth/Qwen3-1.7B-Base`
- Architecture: Qwen3 dense causal LM, ~1.7B params, 28 layers, GQA attention, long-context support.
- Training stage: supervised fine-tuning (SFT) for math reasoning.
- Precision: BF16 (`torch_dtype=torch.bfloat16` recommended).
- Intended backends: Hugging Face `transformers`, vLLM, TGI.
This repo stores full fine-tuned weights (no LoRA, no adapters).
## Training Data
The model is fine-tuned on:
- Dataset: `HAD653/GSM8K-OpenMath-MathReason-13k`
- Size: 13,857 samples.
- Fields:
  - `question`: natural-language math word problem.
  - `cot`: chain-of-thought solution with three blocks: `Problem:`, `Reasoning:`, `Answer:`.
  - `final_answer`: canonical numeric answer (string).
The dataset focuses on easy–medium difficulty:
- arithmetic, percentages, fractions,
- basic algebra,
- simple combinatorics / number problems.
It is deliberately aimed at what a 1–3B model can realistically master.
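For a quick sanity check, the dataset can be loaded and its fields inspected with the `datasets` library (a minimal sketch; the field names follow the description above):

```python
# Minimal sketch: load the training dataset and inspect one example.
# Assumes the `datasets` library is installed and the dataset is public.
from datasets import load_dataset

ds = load_dataset("HAD653/GSM8K-OpenMath-MathReason-13k", split="train")
print(len(ds))                 # expected: 13,857 samples

sample = ds[0]
print(sample["question"])      # natural-language word problem
print(sample["cot"])           # Problem: / Reasoning: / Answer: blocks
print(sample["final_answer"])  # canonical numeric answer (string)
```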
## Training Setup (Summary)
Training was done with Unsloth + TRL on a single RTX 4090, using full-parameter BF16 fine-tuning (no LoRA).
### Hyperparameters
- Base: `unsloth/Qwen3-1.7B-Base`
- Sequence length: 2048
- Epochs: 2
- Batching:
  - `per_device_train_batch_size = 2`
  - `gradient_accumulation_steps = 8`
  - Effective batch size ≈ 16 sequences
- Optimizer / schedule:
  - `learning_rate = 7e-5`
  - Linear LR scheduler, `warmup_ratio = 0.05`
  - `weight_decay = 0.01`
- Precision / memory:
  - `dtype = bfloat16`
  - `gradient_checkpointing = True`
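For reference, these settings map roughly onto a TRL `SFTConfig` as sketched below. This is illustrative, not the original training script, and some field names (e.g., the sequence-length option) differ across TRL versions.

```python
# Illustrative sketch only: the hyperparameters above expressed as a TRL SFTConfig.
# Not the original training script; some field names differ across TRL versions.
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="qwen3-1.7b-magistral-math",
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size ≈ 16 sequences
    learning_rate=7e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    weight_decay=0.01,
    bf16=True,
    gradient_checkpointing=True,
    max_seq_length=2048,             # renamed in newer TRL releases
    dataset_text_field="text",       # single formatted text field (see below)
)
```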
### Supervision format
Each example is converted to a single text field:
```
### Instruction:
{question}

### Response:
{cot}</s>
```

where `</s>` is the tokenizer's EOS token.
Adding the EOS token at the end of the target helps the model learn when to stop, and largely removes pathological loops like:
```
Answer:
36
Answer:
36
Answer:
36
...
```
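A minimal sketch of this conversion with `datasets.map`, assuming the `question` / `cot` fields described above (the actual preprocessing script may differ):

```python
# Sketch of the supervision formatting: one text field per example, EOS-terminated.
# Assumes the `question` / `cot` fields described above; not the exact training code.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B-Base")
ds = load_dataset("HAD653/GSM8K-OpenMath-MathReason-13k", split="train")

def to_text(example):
    return {
        "text": (
            f"### Instruction:\n{example['question']}\n\n"
            f"### Response:\n{example['cot']}{tokenizer.eos_token}"  # EOS teaches the model to stop
        )
    }

ds = ds.map(to_text)
print(ds[0]["text"][:300])
```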
## Prompting & Templates
### Recommended system prompt (optional but helpful)
```
You are a math reasoning assistant.
For every question, answer in exactly this format:

Problem:
<restate the problem in your own words>

Reasoning:
<step-by-step reasoning showing all intermediate steps>

Answer:
<final numeric answer only, on its own line>

Do not add any extra commentary before or after the answer.
Do not repeat the answer multiple times.
Stop after writing the final answer.
```
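Because the model was trained on a plain `### Instruction:` / `### Response:` format rather than a chat template, one simple option is to prepend the system prompt to the instruction block. This is a sketch of that approach, not a format the model was explicitly trained on:

```python
# Sketch: fold the system prompt into the instruction block.
# The model was trained without a system turn, so treat this as a soft hint.
SYSTEM_PROMPT = (
    "You are a math reasoning assistant.\n"
    "Answer in exactly this format: Problem: ... Reasoning: ... Answer: <final numeric answer>.\n"
    "Stop after writing the final answer."
)

def build_prompt(question: str) -> str:
    return f"### Instruction:\n{SYSTEM_PROMPT}\n\n{question}\n\n### Response:\n"
```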
### Inference template (matches training)
For single-turn usage:
```
### Instruction:
{question}

### Response:
```
The model will then generate:
```
Problem:
...

Reasoning:
...

Answer:
<number>
```
### Suggested decoding
For math, use low-temperature decoding:
- `temperature`: 0.0 – 0.2
- `top_p`: 0.9
- `top_k`: 20–40 (optional)
- `repetition_penalty`: 1.05 – 1.10
- `max_new_tokens`: 256–512 for long CoT
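As a sketch, these settings translate into `model.generate` keyword arguments roughly as follows (the full example below uses the greedy variant, `do_sample=False`):

```python
# Sketch: low-temperature sampling settings for math CoT,
# using the ranges suggested above; tune per task.
gen_kwargs = dict(
    do_sample=True,
    temperature=0.2,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.05,
    max_new_tokens=512,
)
# outputs = model.generate(**inputs, **gen_kwargs)
```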
## Usage (Transformers)
### Basic example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "HAD653/qwen3-1.7b-magistral-math"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype="auto",  # will be bf16 if supported
    device_map="auto",
)

def format_prompt(question: str) -> str:
    return f"### Instruction:\n{question}\n\n### Response:\n"

question = "Albert buys 2 large pizzas and 2 small pizzas. A large pizza has 16 slices and a small pizza has 8 slices. If he eats it all, how many pieces does he eat that day?"
prompt = format_prompt(question)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy is usually best for math
    repetition_penalty=1.05,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
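Since the output ends with an `Answer:` block, the final numeric answer can be pulled out of the decoded text with a small helper. This is an illustrative regex that continues the snippet above; adjust it if your answers contain units or fractions:

```python
import re

def extract_final_answer(generated_text: str) -> str | None:
    """Return the last number following an 'Answer:' marker, if any."""
    matches = re.findall(r"Answer:\s*(-?[\d.,]+)", generated_text)
    return matches[-1].rstrip(".,") if matches else None

text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(extract_final_answer(text))  # e.g. "48" for the pizza question above
```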
## vLLM / TGI
The model is a standard Qwen3 architecture checkpoint, so it should work out of the box with any backend that supports Qwen3 (available in `transformers>=4.51`).
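For example, a minimal offline-inference sketch with vLLM (assuming a vLLM build with Qwen3 support; the example question is made up for illustration):

```python
# Minimal vLLM sketch (assumes a vLLM build with Qwen3 support).
from vllm import LLM, SamplingParams

llm = LLM(model="HAD653/qwen3-1.7b-magistral-math", dtype="bfloat16")
params = SamplingParams(
    temperature=0.0,          # greedy-like decoding for math
    repetition_penalty=1.05,
    max_tokens=512,
)

prompt = (
    "### Instruction:\n"
    "A train travels 180 km in 3 hours. What is its average speed in km/h?\n\n"
    "### Response:\n"
)
print(llm.generate([prompt], params)[0].outputs[0].text)
```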
## Intended Uses & Limitations

### Intended uses
- Step-by-step solutions to GSM8K-like and OpenMath-style word problems.
- Experiments on small-model math reasoning (1–3B scale).
- As a local math tutor for grade-school / early high-school algebra & arithmetic.
### Limitations
- Not a general instruction model; it is biased toward math.
- Chain-of-thought traces are synthetic (teacher model), not human-authored.
- Not suitable for high-stakes educational or decision-making use without human review.
- Limited performance on very hard competition math (Olympiad / proof-heavy).
If you evaluate on GSM8K- or OpenMath-derived benchmarks, please first check for overlap with the training data to avoid data leakage.
## Related Models
- GGUF export (CPU / llama.cpp / LM Studio): `HAD653/qwen3-1.7b-magistral-math-gguf`
## Citation
If you use this model in your work, please cite:
```bibtex
@misc{had653_qwen3_magistral_math_2025,
  author       = {HAD653},
  title        = {Qwen3-1.7B Magistral Math: A 1.7B Math Reasoning Model with Magistral Chain-of-Thought},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/HAD653/qwen3-1.7b-magistral-math}},
  note         = {Fine-tuned on GSM8K + OpenMath MathReason 13k with full BF16 supervision.}
}
```