Model Card for SmolLM3-Custom-SFT (PEFT Adapter)

This repository contains adapter weights fine-tuned from the base model HuggingFaceTB/SmolLM3-3B-Base. The adapter was trained with TRL and PEFT (Parameter-Efficient Fine-Tuning) on the GSM8K dataset. The chat template and tokenizer are taken from the instruction-tuned version of SmolLM3. Only the adapter layers have been uploaded, not the full model.
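Since only the adapter is stored in this repository, you can inspect its PEFT configuration without downloading any model weights. This is a minimal sketch; the printed values depend on the LoRA configuration actually saved with the adapter.

from peft import PeftConfig

# Read only the adapter's configuration file (adapter_config.json)
peft_config = PeftConfig.from_pretrained("psHf/SmolLM3-Custom-SFT")
print(peft_config.base_model_name_or_path)  # HuggingFaceTB/SmolLM3-3B-Base
print(peft_config.peft_type)                # PeftType.LORA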

Ref: Learn more about fine-tuning in Hugging Face's smol course.

Quick start

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

base_model = "HuggingFaceTB/SmolLM3-3B-Base"
instruct_model = "HuggingFaceTB/SmolLM3-3B"
adapter_model = "psHf/SmolLM3-Custom-SFT"

# Load the adapter; AutoPeftModelForCausalLM pulls in the base model listed in the adapter config
model = AutoPeftModelForCausalLM.from_pretrained(adapter_model)
tokenizer = AutoTokenizer.from_pretrained(instruct_model)

# If needed, merge permanently:
# model = model.merge_and_unload()

text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    instruct_outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Extract only the assistant's response (the tokens generated after the prompt)
new_tokens = instruct_outputs[0][inputs["input_ids"].shape[1]:]
assistant_response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(assistant_response)
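
If you prefer a standalone checkpoint instead of the adapter, you can merge the LoRA weights into the base model and save the result as a regular Transformers model. This is a minimal sketch building on the code above; the output directory name is only an example.

# Merge the LoRA weights into the base model and drop the PEFT wrappers
merged_model = model.merge_and_unload()

# Save as a regular Transformers checkpoint (example directory name)
merged_model.save_pretrained("SmolLM3-Custom-SFT-merged")
tokenizer.save_pretrained("SmolLM3-Custom-SFT-merged")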


<!-- from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="psHf/SmolLM3-Custom-SFT", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"]) -->

Training procedure


This model was trained with SFT.

Training Details

  • Base model: HuggingFaceTB/SmolLM3-3B-Base
  • Fine-tuning method: SFT (Supervised Fine-Tuning) using TRL
  • PEFT method: LoRA adapters
  • Tokenizer: SmolLM3 (instruction-tuned tokenizer)
  • Framework: Hugging Face Transformers + TRL
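
For reference, a typical TRL setup matching the details above looks roughly like the following. This is a sketch, not the exact training script: the LoRA hyperparameters, the GSM8K-to-chat conversion, and the output directory are assumptions.

from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import SFTConfig, SFTTrainer

# GSM8K provides question/answer pairs; convert them to chat messages
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda ex: {"messages": [
    {"role": "user", "content": ex["question"]},
    {"role": "assistant", "content": ex["answer"]},
]})

# Tokenizer and chat template come from the instruction-tuned model
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

# Illustrative LoRA settings, not necessarily the values used for this adapter
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B-Base",
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(output_dir="SmolLM3-Custom-SFT"),
    peft_config=peft_config,
)
trainer.train()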

Framework versions

  • TRL: 0.24.0
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu126
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1
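
A quick way to check that your environment matches the versions listed above (optional sanity check):

import datasets, tokenizers, torch, transformers, trl

# Versions used when this adapter was trained (see the list above)
print("TRL:", trl.__version__)
print("Transformers:", transformers.__version__)
print("PyTorch:", torch.__version__)
print("Datasets:", datasets.__version__)
print("Tokenizers:", tokenizers.__version__)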

Citations

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}