---
language:
- en
license: apache-2.0
tags:
- gguf
- qwen3
- unsloth
- math
- mathematical-reasoning
- chain-of-thought
- cot
- gsm8k
- openmath
- instruction-tuning
- quantization
base_model: HAD653/qwen3-1.7b-magistral-math
datasets:
- HAD653/GSM8K-OpenMath-MathReason-13k
pipeline_tag: text-generation
model-index:
- name: Qwen3-1.7B-Magistral-Math-GGUF
  results: []
quantized_from:
- HAD653/qwen3-1.7b-magistral-math
---

# Qwen3-1.7B Magistral Math (GGUF)

[![License: Apache-2.0](https://img.shields.io/badge/License-Apache--2.0-brightgreen.svg)](https://www.apache.org/licenses/LICENSE-2.0)
![Model: Qwen3-1.7B](https://img.shields.io/badge/Model-Qwen3--1.7B-blue.svg)
![Format: GGUF](https://img.shields.io/badge/Format-GGUF-orange.svg)
![Domain: Math Reasoning](https://img.shields.io/badge/Domain-Math%20Reasoning-purple.svg)
![Quantizations: F16, Q8_0, Q4_K_M](https://img.shields.io/badge/Quants-F16%20%7C%20Q8_0%20%7C%20Q4_K_M-lightgrey.svg)

---

## TL;DR

This is a **math-focused fine-tune** of [`unsloth/Qwen3-1.7B-Base`](https://huggingface.co/unsloth/Qwen3-1.7B-Base), exported to **GGUF** (F16 / Q8_0 / Q4_K_M) with **Unsloth**.

- **Goal:** small 1.7B model specialized for **grade-school & early high-school math reasoning**.
- **Data:** [`HAD653/GSM8K-OpenMath-MathReason-13k`](https://huggingface.co/datasets/HAD653/GSM8K-OpenMath-MathReason-13k) – 13.9k math word problems with structured chain-of-thought.
- **Format:** answers always follow the same pattern:

```text
Problem: ...
Reasoning: ...
Answer: ...
```

- **Best use:** GSM8K-style problems, OpenMath-style word problems, step-by-step reasoning with a **single numeric final answer**.

---

## Model Description

* **Base model:** [`unsloth/Qwen3-1.7B-Base`](https://huggingface.co/unsloth/Qwen3-1.7B-Base) (Apache-2.0)
* **Architecture:** Qwen3 dense causal LM, ~1.7B params, 28 layers, GQA attention, **32k context**.
* **Type:** decoder-only LLM, text generation.
* **This repo:** inference-only **GGUF weights** for llama.cpp / LM Studio / Ollama / text-generation-webui.

### Available files

From the **Files** tab:

* `Qwen3-1.7B-Magistral-Math-F16.gguf` – highest quality, requires the most VRAM.
* `Qwen3-1.7B-Magistral-Math-Q8_0.gguf` – 8-bit quantization.
* `Qwen3-1.7B-Magistral-Math-Q4_K_M.gguf` – 4-bit K-quant, best for smaller GPUs.

> These files contain **fine-tuned math weights**, exported via `model.save_pretrained_gguf` after full BF16 training.

---

## Training Data

This model is fine-tuned on:

* **Dataset:** [`HAD653/GSM8K-OpenMath-MathReason-13k`](https://huggingface.co/datasets/HAD653/GSM8K-OpenMath-MathReason-13k)
* **Size:** 13,857 examples.
* **Fields:**
  * `question`: natural language math word problem.
  * `cot`: structured solution with three blocks:
    * `Problem:`
    * `Reasoning:`
    * `Answer:`
  * `final_answer`: canonical numeric answer (string).

The dataset focuses on **easy–medium difficulty**: basic arithmetic, fractions, percentages, rate problems, simple algebra, and simple combinatorics – the kind of tasks a **1–3B model can genuinely master**.
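For a quick look at the raw examples, a minimal sketch using the 🤗 `datasets` library might look like this (the `train` split name is an assumption; the field names are the ones listed above):

```python
# Minimal sketch: inspect a few training examples.
# Assumes the dataset is public and exposes a "train" split with the fields described above.
from datasets import load_dataset

ds = load_dataset("HAD653/GSM8K-OpenMath-MathReason-13k", split="train")
print(len(ds))  # expected to be on the order of 13,857 examples

row = ds[0]
print(row["question"])      # natural-language word problem
print(row["cot"])           # Problem: / Reasoning: / Answer: blocks
print(row["final_answer"])  # canonical numeric answer as a string
```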
---

## Training Setup (Summary)

Fine-tuning was done with **Unsloth + TRL** on a single **RTX 4090**, using **full BF16 fine-tuning** (no LoRA).

Main hyperparameters:

* **Base:** `unsloth/Qwen3-1.7B-Base`
* **Sequence length:** 2048
* **Batching:** `per_device_train_batch_size = 2`, `gradient_accumulation_steps = 8`
* **Effective batch size:** ≈ 16 sequences
* **Epochs:** 2
* **Optimizer / schedule:**
  * `learning_rate = 7e-5`
  * linear scheduler, `warmup_ratio = 0.05`
  * `weight_decay = 0.01`
* **Precision & memory:**
  * `dtype = bfloat16`
  * `gradient_checkpointing = True`

### Supervision format

The training text for each sample is:

```text
### Instruction:
{question}

### Response:
{cot}
```

Each sample is terminated with the tokenizer EOS token. Appending `eos_token` to the end of every sample teaches the model **when to stop**, which greatly reduces “Answer: 36 / Answer: 36 / …” loops during inference.

---

## Prompting & Templates

### Recommended system prompt (optional but useful)

```text
You are a math reasoning assistant.
For every question, answer in exactly this format:

Problem: ...
Reasoning: ...
Answer: ...

Do not add any extra commentary before or after the answer.
Do not repeat the answer multiple times.
Stop after writing the final answer.
```

### Inference template (matches training)

Single-turn format:

```text
### Instruction:
{question}

### Response:
```

The model will then generate:

```text
Problem: ...
Reasoning: ...
Answer: ...
```

### Stop strings

On top of the EOS token, you can add **stop strings** in your UI:

* `### Instruction:`
* `### Response:`

Many frontends (LM Studio, text-generation-webui, KoboldCpp, etc.) let you configure these so the model stops cleanly when it tries to start the next turn.

---

## Quantization & Hardware Tips

The three variants in this repo roughly behave as follows (ballpark):

* **`Q4_K_M` (~1.1 GB)** – best for:
  * 4–6 GB GPUs or pure CPU inference.
  * Fast experimentation / local tools / “math assistant on a laptop”.
* **`Q8_0` (~1.8 GB)** – good compromise:
  * 8–12 GB GPUs.
  * Often slightly more stable than Q4 on harder problems.
* **`F16` (~3.5 GB)** – highest fidelity:
  * 12+ GB GPUs (4090, 4080, 4070 12GB, A4000, etc.).
  * Recommended if VRAM allows and you care about maximum accuracy.

As a rule of thumb, choose a file that is **1–2 GB smaller than your available VRAM**.

---

## Usage Examples

### llama.cpp

Once you have built `llama.cpp`, you can run the model like this (replace the model path with your own):

```bash
./llama-cli \
  -m Qwen3-1.7B-Magistral-Math-Q4_K_M.gguf \
  -p "### Instruction:
Albert buys 2 large pizzas and 2 small pizzas. A large pizza has 16 slices and a small pizza has 8 slices. If he eats it all, how many pieces does he eat that day?

### Response:
" \
  -n 256 \
  --temp 0.1 \
  --top-p 0.9 \
  --repeat-penalty 1.05
```

Suggested decoding settings for math:

* `temperature`: 0.0–0.2
* `top_p`: 0.9
* `repeat_penalty`: 1.05–1.1
* `top_k`: 20–40 (optional tweak)

### LM Studio / other UIs

Set the **prompt template** to:

```text
### Instruction:
{{prompt}}

### Response:
```

Add stop strings:

* `### Instruction:`
* `### Response:`

and keep the temperature low for math benchmarks.
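### Python (llama-cpp-python)

If you prefer calling the GGUF file from Python, here is a minimal sketch using the `llama-cpp-python` package; the model path, question, and sampling settings simply mirror the llama.cpp example above and are assumptions you can adjust:

```python
# Minimal sketch with llama-cpp-python (pip install llama-cpp-python).
# The model path is an assumption – point it at whichever quant you downloaded.
from llama_cpp import Llama

llm = Llama(model_path="Qwen3-1.7B-Magistral-Math-Q4_K_M.gguf", n_ctx=2048)

question = (
    "Albert buys 2 large pizzas and 2 small pizzas. A large pizza has 16 slices "
    "and a small pizza has 8 slices. If he eats it all, how many pieces does he eat that day?"
)
# Same single-turn template the model was trained on.
prompt = f"### Instruction:\n{question}\n\n### Response:\n"

out = llm(
    prompt,
    max_tokens=256,
    temperature=0.1,
    top_p=0.9,
    repeat_penalty=1.05,
    stop=["### Instruction:", "### Response:"],  # same stop strings as above
)
print(out["choices"][0]["text"])
```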
---

## Intended Uses & Limitations

### Intended uses

* Solving **GSM8K-style** and **OpenMath-style** word problems.
* Training / evaluating **small-scale math reasoning pipelines**.
* Serving as a **local math tutor** for grade-school / early high-school algebra & arithmetic.

### Limitations

* Not a general chat/instruction model; it is **biased toward math**.
* CoT is learned from **synthetic teacher traces**, not human-written solutions.
* Not suitable for **high-stakes educational or decision-making use** without human oversight.
* Performance on very hard competition math (Olympiad-level, deep proofs) will be limited – the training data explicitly focuses on **easy–medium difficulty**.

Users are responsible for ensuring there is no data leakage if they evaluate on GSM8K/OpenMath-derived benchmarks.

---

## Acknowledgements

* Base model: [`unsloth/Qwen3-1.7B-Base`](https://huggingface.co/unsloth/Qwen3-1.7B-Base) and the Qwen / Unsloth teams.
* Unsloth for fast fine-tuning and GGUF export.
* Training data: [`HAD653/GSM8K-OpenMath-MathReason-13k`](https://huggingface.co/datasets/HAD653/GSM8K-OpenMath-MathReason-13k).

---

## Citation

If you use this model in your work, please cite:

```bibtex
@misc{had653_qwen3_magistral_math_gguf_2025,
  author       = {HAD653},
  title        = {Qwen3-1.7B Magistral Math (GGUF): A 1.7B Math Reasoning Model with Magistral Chain-of-Thought},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/HAD653/qwen3-1.7b-magistral-math-gguf}},
  note         = {Fine-tuned on GSM8K + OpenMath MathReason 13k, exported to GGUF (F16 / Q8\_0 / Q4\_K\_M).}
}
```