# Qwen3-1.7B Magistral Math (BF16)
**Disclaimer:** "Magistral" here refers to lecture-style (*magistral*) math reasoning supervision. This is an independent community model, not affiliated with or endorsed by Mistral AI.
## TL;DR
Qwen3-1.7B Magistral Math is a math-specialized fine-tune of `unsloth/Qwen3-1.7B-Base`, trained in full BF16 on a compact, high-quality chain-of-thought dataset.

- Data: `HAD653/GSM8K-OpenMath-MathReason-13k` – 13.9k grade-school & early high-school word problems with structured CoT.
- Goal: a 1.7B model that reliably solves GSM8K / OpenMath-style problems with clear step-by-step reasoning.
- Answer format: `Problem: ... Reasoning: ... Answer: <final numeric answer>`
- Best use: GSM8K-like word problems, OpenMath-style exercises, local math tutoring.
## Model Details
- Base model: `unsloth/Qwen3-1.7B-Base`
- Architecture: Qwen3 dense causal LM, ~1.7B params, 28 layers, GQA attention, long-context support.
- Training stage: supervised fine-tuning (SFT) for math reasoning.
- Precision: BF16 (`torch_dtype=torch.bfloat16` recommended).
- Intended backends: Hugging Face `transformers`, vLLM, TGI.
This repo stores full fine-tuned weights (no LoRA, no adapters).
## Training Data
The model is fine-tuned on:
- Dataset: `HAD653/GSM8K-OpenMath-MathReason-13k`
- Size: 13,857 samples.
- Fields:
  - `question`: natural-language math word problem.
  - `cot`: chain-of-thought solution with three blocks: `Problem:`, `Reasoning:`, `Answer:`.
  - `final_answer`: canonical numeric answer (string).
The dataset focuses on easy–medium difficulty:
- arithmetic, percentages, fractions,
- basic algebra,
- simple combinatorics / number problems.
It is deliberately aimed at what a 1–3B model can realistically master.
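For a quick sanity check, the dataset can be loaded and its fields inspected with the `datasets` library (a minimal sketch; the field names follow the description above):

```python
# Minimal sketch: load the training dataset and inspect one example.
# Assumes the `datasets` library is installed and the dataset is public.
from datasets import load_dataset

ds = load_dataset("HAD653/GSM8K-OpenMath-MathReason-13k", split="train")
print(len(ds))                 # expected: 13,857 samples

sample = ds[0]
print(sample["question"])      # natural-language word problem
print(sample["cot"])           # Problem: / Reasoning: / Answer: blocks
print(sample["final_answer"])  # canonical numeric answer (string)
```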
## Training Setup (Summary)
Training was done with Unsloth + TRL on a single RTX 4090, using full-parameter BF16 fine-tuning (no LoRA).
### Hyperparameters
- Base: `unsloth/Qwen3-1.7B-Base`
- Sequence length: 2048
- Epochs: 2
- Batching:
  - `per_device_train_batch_size = 2`
  - `gradient_accumulation_steps = 8`
  - Effective batch size ≈ 16 sequences
- Optimizer / schedule:
  - `learning_rate = 7e-5`
  - Linear LR scheduler, `warmup_ratio = 0.05`
  - `weight_decay = 0.01`
- Precision / memory:
  - `dtype = bfloat16`
  - `gradient_checkpointing = True`
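For reference, these settings map roughly onto a TRL `SFTConfig` as sketched below. This is illustrative, not the original training script, and some field names (e.g., the sequence-length option) differ across TRL versions.

```python
# Illustrative sketch only: the hyperparameters above expressed as a TRL SFTConfig.
# Not the original training script; some field names differ across TRL versions.
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="qwen3-1.7b-magistral-math",
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size ≈ 16 sequences
    learning_rate=7e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    weight_decay=0.01,
    bf16=True,
    gradient_checkpointing=True,
    max_seq_length=2048,             # renamed in newer TRL releases
    dataset_text_field="text",       # single formatted text field (see below)
)
```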
### Supervision format
Each example is converted to a single text field:
```
### Instruction:
{question}

### Response:
{cot}</s>
```

where `</s>` is the tokenizer's EOS token.
Adding the EOS token at the end of the target helps the model learn when to stop, and largely removes pathological loops like:
```
Answer:
36
Answer:
36
Answer:
36
...
```
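A minimal sketch of this conversion with `datasets.map`, assuming the `question` / `cot` fields described above (the actual preprocessing script may differ):

```python
# Sketch of the supervision formatting: one text field per example, EOS-terminated.
# Assumes the `question` / `cot` fields described above; not the exact training code.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B-Base")
ds = load_dataset("HAD653/GSM8K-OpenMath-MathReason-13k", split="train")

def to_text(example):
    return {
        "text": (
            f"### Instruction:\n{example['question']}\n\n"
            f"### Response:\n{example['cot']}{tokenizer.eos_token}"  # EOS teaches the model to stop
        )
    }

ds = ds.map(to_text)
print(ds[0]["text"][:300])
```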
## Prompting & Templates
### Recommended system prompt (optional but helpful)
```
You are a math reasoning assistant.
For every question, answer in exactly this format:

Problem:
<restate the problem in your own words>

Reasoning:
<step-by-step reasoning showing all intermediate steps>

Answer:
<final numeric answer only, on its own line>

Do not add any extra commentary before or after the answer.
Do not repeat the answer multiple times.
Stop after writing the final answer.
```
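Because the model was trained on a plain `### Instruction:` / `### Response:` format rather than a chat template, one simple option is to prepend the system prompt to the instruction block. This is a sketch of that approach, not a format the model was explicitly trained on:

```python
# Sketch: fold the system prompt into the instruction block.
# The model was trained without a system turn, so treat this as a soft hint.
SYSTEM_PROMPT = (
    "You are a math reasoning assistant.\n"
    "Answer in exactly this format: Problem: ... Reasoning: ... Answer: <final numeric answer>.\n"
    "Stop after writing the final answer."
)

def build_prompt(question: str) -> str:
    return f"### Instruction:\n{SYSTEM_PROMPT}\n\n{question}\n\n### Response:\n"
```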
### Inference template (matches training)
For single-turn usage:
```
### Instruction:
{question}

### Response:
```
The model will then generate:
```
Problem:
...

Reasoning:
...

Answer:
<number>
```
### Suggested decoding
For math, use low-temperature decoding:
- `temperature`: 0.0 – 0.2
- `top_p`: 0.9
- `top_k`: 20–40 (optional)
- `repetition_penalty`: 1.05 – 1.10
- `max_new_tokens`: 256–512 for long CoT
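As a sketch, these settings translate into `model.generate` keyword arguments roughly as follows (the full example below uses the greedy variant, `do_sample=False`):

```python
# Sketch: low-temperature sampling settings for math CoT,
# using the ranges suggested above; tune per task.
gen_kwargs = dict(
    do_sample=True,
    temperature=0.2,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.05,
    max_new_tokens=512,
)
# outputs = model.generate(**inputs, **gen_kwargs)
```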
## Usage (Transformers)
### Basic example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "HAD653/qwen3-1.7b-magistral-math"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype="auto",  # will be bf16 if supported
    device_map="auto",
)

def format_prompt(question: str) -> str:
    return f"### Instruction:\n{question}\n\n### Response:\n"

question = "Albert buys 2 large pizzas and 2 small pizzas. A large pizza has 16 slices and a small pizza has 8 slices. If he eats it all, how many pieces does he eat that day?"
prompt = format_prompt(question)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy is usually best for math
    repetition_penalty=1.05,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
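Since the output ends with an `Answer:` block, the final numeric answer can be pulled out of the decoded text with a small helper. This is an illustrative regex that continues the snippet above; adjust it if your answers contain units or fractions:

```python
import re

def extract_final_answer(generated_text: str) -> str | None:
    """Return the last number following an 'Answer:' marker, if any."""
    matches = re.findall(r"Answer:\s*(-?[\d.,]+)", generated_text)
    return matches[-1].rstrip(".,") if matches else None

text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(extract_final_answer(text))  # e.g. "48" for the pizza question above
```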
## vLLM / TGI
The model is a standard Qwen3 architecture checkpoint, so it should work out of the box with any backend that supports Qwen3 (available in `transformers>=4.51`).
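For example, a minimal offline-inference sketch with vLLM (assuming a vLLM build with Qwen3 support; the example question is made up for illustration):

```python
# Minimal vLLM sketch (assumes a vLLM build with Qwen3 support).
from vllm import LLM, SamplingParams

llm = LLM(model="HAD653/qwen3-1.7b-magistral-math", dtype="bfloat16")
params = SamplingParams(
    temperature=0.0,          # greedy-like decoding for math
    repetition_penalty=1.05,
    max_tokens=512,
)

prompt = (
    "### Instruction:\n"
    "A train travels 180 km in 3 hours. What is its average speed in km/h?\n\n"
    "### Response:\n"
)
print(llm.generate([prompt], params)[0].outputs[0].text)
```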
## Intended Uses & Limitations

### Intended uses
- Step-by-step solutions to GSM8K-like and OpenMath-style word problems.
- Experiments on small-model math reasoning (1–3B scale).
- As a local math tutor for grade-school / early high-school algebra & arithmetic.
### Limitations
- Not a general instruction model; it is biased toward math.
- Chain-of-thought traces are synthetic (teacher model), not human-authored.
- Not suitable for high-stakes educational or decision-making use without human review.
- Limited performance on very hard competition math (Olympiad / proof-heavy).
If you evaluate on GSM8K- or OpenMath-derived benchmarks, please first check for overlap with the training data to avoid data leakage.
## Related Models
- GGUF export (CPU / llama.cpp / LM Studio): `HAD653/qwen3-1.7b-magistral-math-gguf`
## Citation
If you use this model in your work, please cite:
```bibtex
@misc{had653_qwen3_magistral_math_2025,
  author       = {HAD653},
  title        = {Qwen3-1.7B Magistral Math: A 1.7B Math Reasoning Model with Magistral Chain-of-Thought},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/HAD653/qwen3-1.7b-magistral-math}},
  note         = {Fine-tuned on GSM8K + OpenMath MathReason 13k with full BF16 supervision.}
}
```