Model Card for SmolLM3-Custom-SFT (PEFT Adapter)
This repository contains adapter weights fine-tuned from the base model HuggingFaceTB/SmolLM3-3B-Base. The model was trained with TRL and PEFT (Parameter-Efficient Fine-Tuning) on the GSM8K dataset. The chat template and tokenizer are taken from the instruction-tuned version of SmolLM3. Only the adapter layers have been uploaded, not the full model.
Ref: Learn more about fine-tuning in Hugging Face's smol course.
Quick start
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

base_model = "HuggingFaceTB/SmolLM3-3B-Base"
instruct_model = "HuggingFaceTB/SmolLM3-3B"
adapter_model = "psHf/SmolLM3-Custom-SFT"

# Load the adapter; PEFT resolves and loads the base model automatically
model = AutoPeftModelForCausalLM.from_pretrained(adapter_model)
tokenizer = AutoTokenizer.from_pretrained(instruct_model)

# If needed, merge the adapter into the base weights permanently:
# model = model.merge_and_unload()

# Build the prompt with the instruct chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    instruct_outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Keep special tokens so the role markers can be located below
instruct_response = tokenizer.decode(instruct_outputs[0], skip_special_tokens=False)

# Extract only the assistant's response
assistant_start = instruct_response.find("<|im_start|>assistant\n") + len("<|im_start|>assistant\n")
assistant_response = instruct_response[assistant_start:].split("<|im_end|>")[0]
print(assistant_response)
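Since only the adapter layers are uploaded, you may want to export a standalone checkpoint. A minimal sketch, assuming you want to merge the LoRA weights into the base model and save the result locally (the output directory name is illustrative):
merged_model = model.merge_and_unload()              # fold the adapter into the base weights
merged_model.save_pretrained("SmolLM3-Custom-SFT-merged")  # illustrative local path
tokenizer.save_pretrained("SmolLM3-Custom-SFT-merged")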
Training procedure
This model was trained with SFT.
Training Details
- Base model: HuggingFaceTB/SmolLM3-3B-Base
- Fine-tuning method: SFT (Supervised Fine-Tuning) using TRL
- PEFT method: LoRA adapters
- Tokenizer: SmolLM3 (instruction-tuned tokenizer)
- Framework: Hugging Face Transformers + TRL
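A minimal sketch of what this setup looks like in code, assuming TRL's SFTTrainer with a LoRA config; the hyperparameters and the GSM8K-to-chat conversion below are illustrative assumptions, not the exact values used for this adapter:
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Base model weights, instruction-tuned tokenizer / chat template
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B-Base")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

# Convert GSM8K question/answer pairs into chat-style messages
def to_messages(example):
    return {"messages": [
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["answer"]},
    ]}

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

# Illustrative LoRA settings
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="SmolLM3-Custom-SFT"),
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()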
Framework versions
- TRL: 0.24.0
- Transformers: 4.57.1
- PyTorch: 2.8.0+cu126
- Datasets: 4.0.0
- Tokenizers: 0.22.1
Citations
Cite TRL as:
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}