---
language:
- en
license: apache-2.0
tags:
- gguf
- qwen3
- unsloth
- math
- mathematical-reasoning
- chain-of-thought
- cot
- gsm8k
- openmath
- instruction-tuning
- quantization
base_model: HAD653/qwen3-1.7b-magistral-math
datasets:
- HAD653/GSM8K-OpenMath-MathReason-13k
pipeline_tag: text-generation
model-index:
- name: Qwen3-1.7B-Magistral-Math-GGUF
  results: []
quantized_from:
- HAD653/qwen3-1.7b-magistral-math
---

# Qwen3-1.7B Magistral Math (GGUF)

[![License: Apache-2.0](https://img.shields.io/badge/License-Apache--2.0-brightgreen.svg)](https://www.apache.org/licenses/LICENSE-2.0)
![Model: Qwen3-1.7B](https://img.shields.io/badge/Model-Qwen3--1.7B-blue.svg)
![Format: GGUF](https://img.shields.io/badge/Format-GGUF-orange.svg)
![Domain: Math Reasoning](https://img.shields.io/badge/Domain-Math%20Reasoning-purple.svg)
![Quantizations: F16, Q8_0, Q4_K_M](https://img.shields.io/badge/Quants-F16%20%7C%20Q8_0%20%7C%20Q4_K_M-lightgrey.svg)

---

## TL;DR

This is a **math-focused fine-tune** of [`unsloth/Qwen3-1.7B-Base`](https://huggingface.co/unsloth/Qwen3-1.7B-Base), exported to **GGUF** (F16 / Q8_0 / Q4_K_M) with **Unsloth**.

- **Goal:** small 1.7B model specialized for **grade-school & early high-school math reasoning**.
- **Data:** [`HAD653/GSM8K-OpenMath-MathReason-13k`](https://huggingface.co/datasets/HAD653/GSM8K-OpenMath-MathReason-13k) – 13.9k math word problems with structured chain-of-thought.
- **Format:** answers always follow the same pattern:

```text
Problem: ...
Reasoning: ...
Answer: ...
```

- **Best use:** GSM8K-style problems, OpenMath-style word problems, step-by-step reasoning with a **single numeric final answer**.

---

## Model Description

* **Base model:** [`unsloth/Qwen3-1.7B-Base`](https://huggingface.co/unsloth/Qwen3-1.7B-Base) (Apache-2.0)
* **Architecture:** Qwen3 dense causal LM, ~1.7B params, 28 layers, GQA attention, **32k context**.
* **Type:** decoder-only LLM, text generation.
* **This repo:** inference-only **GGUF weights** for llama.cpp / LM Studio / Ollama / text-generation-webui.

### Available files

From the **Files** tab:

* `Qwen3-1.7B-Magistral-Math-F16.gguf` – highest quality, requires the most VRAM.
* `Qwen3-1.7B-Magistral-Math-Q8_0.gguf` – 8-bit quantization.
* `Qwen3-1.7B-Magistral-Math-Q4_K_M.gguf` – 4-bit K-quant, best for smaller GPUs.

> These files contain **fine-tuned math weights**, exported via `model.save_pretrained_gguf` after full BF16 training.

---

## Training Data

This model is fine-tuned on:

* **Dataset:** [`HAD653/GSM8K-OpenMath-MathReason-13k`](https://huggingface.co/datasets/HAD653/GSM8K-OpenMath-MathReason-13k)
* **Size:** 13,857 examples.
* **Fields:**
  * `question`: natural language math word problem.
  * `cot`: structured solution with three blocks:
    * `Problem:`
    * `Reasoning:`
    * `Answer:`
  * `final_answer`: canonical numeric answer (string).

The dataset focuses on **easy–medium difficulty**: basic arithmetic, fractions, percentages, rate problems, simple algebra, and simple combinatorics – the kind of tasks a **1–3B model can genuinely master**.
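For a quick look at the raw examples, a minimal sketch using the 🤗 `datasets` library might look like this (the `train` split name is an assumption; the field names are the ones listed above):

```python
# Minimal sketch: inspect a few training examples.
# Assumes the dataset is public and exposes a "train" split with the fields described above.
from datasets import load_dataset

ds = load_dataset("HAD653/GSM8K-OpenMath-MathReason-13k", split="train")
print(len(ds))  # expected to be on the order of 13,857 examples

row = ds[0]
print(row["question"])      # natural-language word problem
print(row["cot"])           # Problem: / Reasoning: / Answer: blocks
print(row["final_answer"])  # canonical numeric answer as a string
```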
---

## Training Setup (Summary)

Fine-tuning was done with **Unsloth + TRL** on a single **RTX 4090**, using **full BF16 fine-tuning** (no LoRA).

Main hyperparameters:

* **Base:** `unsloth/Qwen3-1.7B-Base`
* **Sequence length:** 2048
* **Batching:** `per_device_train_batch_size = 2`, `gradient_accumulation_steps = 8`
* **Effective batch size:** ≈ 16 sequences
* **Epochs:** 2
* **Optimizer / schedule:**
  * `learning_rate = 7e-5`
  * linear scheduler, `warmup_ratio = 0.05`
  * `weight_decay = 0.01`
* **Precision & memory:**
  * `dtype = bfloat16`
  * `gradient_checkpointing = True`

### Supervision format

The training text for each sample is:

```text
### Instruction:
{question}

### Response:
{cot}
```

Each sample is terminated with the tokenizer EOS token. Appending `eos_token` to the end of every sample teaches the model **when to stop**, which greatly reduces “Answer: 36 / Answer: 36 / …” loops during inference.

---

## Prompting & Templates

### Recommended system prompt (optional but useful)

```text
You are a math reasoning assistant.
For every question, answer in exactly this format:

Problem: ...
Reasoning: ...
Answer: ...

Do not add any extra commentary before or after the answer.
Do not repeat the answer multiple times.
Stop after writing the final answer.
```

### Inference template (matches training)

Single-turn format:

```text
### Instruction:
{question}

### Response:
```

The model will then generate:

```text
Problem: ...
Reasoning: ...
Answer: ...
```

### Stop strings

On top of the EOS token, you can add **stop strings** in your UI:

* `### Instruction:`
* `### Response:`

Many frontends (LM Studio, text-generation-webui, KoboldCpp, etc.) let you configure these so the model stops cleanly when it tries to start the next turn.

---

## Quantization & Hardware Tips

The three variants in this repo roughly behave as follows (ballpark):

* **`Q4_K_M` (~1.1 GB)** – best for:
  * 4–6 GB GPUs or pure CPU inference.
  * Fast experimentation / local tools / “math assistant on a laptop”.
* **`Q8_0` (~1.8 GB)** – good compromise:
  * 8–12 GB GPUs.
  * Often slightly more stable than Q4 on harder problems.
* **`F16` (~3.5 GB)** – highest fidelity:
  * 12+ GB GPUs (4090, 4080, 4070 12GB, A4000, etc.).
  * Recommended if VRAM allows and you care about maximum accuracy.

As a rule of thumb, choose a file that is **1–2 GB smaller than your available VRAM**.

---

## Usage Examples

### llama.cpp

Once you have built `llama.cpp`, you can run the model like this (replace the model path with your own):

```bash
./llama-cli \
  -m Qwen3-1.7B-Magistral-Math-Q4_K_M.gguf \
  -p "### Instruction:
Albert buys 2 large pizzas and 2 small pizzas. A large pizza has 16 slices and a small pizza has 8 slices. If he eats it all, how many pieces does he eat that day?

### Response:
" \
  -n 256 \
  --temp 0.1 \
  --top-p 0.9 \
  --repeat-penalty 1.05
```

Suggested decoding settings for math:

* `temperature`: 0.0–0.2
* `top_p`: 0.9
* `repeat_penalty`: 1.05–1.1
* `top_k`: 20–40 (optional tweak)

### LM Studio / other UIs

Set the **prompt template** to:

```text
### Instruction:
{{prompt}}

### Response:
```

Add stop strings:

* `### Instruction:`
* `### Response:`

and keep the temperature low for math benchmarks.
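### Python (llama-cpp-python)

If you prefer calling the GGUF file from Python, here is a minimal sketch using the `llama-cpp-python` package; the model path, question, and sampling settings simply mirror the llama.cpp example above and are assumptions you can adjust:

```python
# Minimal sketch with llama-cpp-python (pip install llama-cpp-python).
# The model path is an assumption – point it at whichever quant you downloaded.
from llama_cpp import Llama

llm = Llama(model_path="Qwen3-1.7B-Magistral-Math-Q4_K_M.gguf", n_ctx=2048)

question = (
    "Albert buys 2 large pizzas and 2 small pizzas. A large pizza has 16 slices "
    "and a small pizza has 8 slices. If he eats it all, how many pieces does he eat that day?"
)
# Same single-turn template the model was trained on.
prompt = f"### Instruction:\n{question}\n\n### Response:\n"

out = llm(
    prompt,
    max_tokens=256,
    temperature=0.1,
    top_p=0.9,
    repeat_penalty=1.05,
    stop=["### Instruction:", "### Response:"],  # same stop strings as above
)
print(out["choices"][0]["text"])
```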
---

## Intended Uses & Limitations

### Intended uses

* Solving **GSM8K-style** and **OpenMath-style** word problems.
* Training / evaluating **small-scale math reasoning pipelines**.
* Serving as a **local math tutor** for grade-school / early high-school algebra & arithmetic.

### Limitations

* Not a general chat/instruction model; it is **biased toward math**.
* CoT is learned from **synthetic teacher traces**, not human-written solutions.
* Not suitable for **high-stakes educational or decision-making use** without human oversight.
* Performance on very hard competition math (Olympiad-level, deep proofs) will be limited – the training data explicitly focuses on **easy–medium difficulty**.

Users are responsible for ensuring there is no data leakage if they evaluate on GSM8K/OpenMath-derived benchmarks.

---

## Acknowledgements

* Base model: [`unsloth/Qwen3-1.7B-Base`](https://huggingface.co/unsloth/Qwen3-1.7B-Base) and the Qwen / Unsloth teams.
* Unsloth for fast fine-tuning and GGUF export.
* Training data: [`HAD653/GSM8K-OpenMath-MathReason-13k`](https://huggingface.co/datasets/HAD653/GSM8K-OpenMath-MathReason-13k).

---

## Citation

If you use this model in your work, please cite:

```bibtex
@misc{had653_qwen3_magistral_math_gguf_2025,
  author       = {HAD653},
  title        = {Qwen3-1.7B Magistral Math (GGUF): A 1.7B Math Reasoning Model with Magistral Chain-of-Thought},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/HAD653/qwen3-1.7b-magistral-math-gguf}},
  note         = {Fine-tuned on GSM8K + OpenMath MathReason 13k, exported to GGUF (F16 / Q8\_0 / Q4\_K\_M).}
}
```