Llama-3.2-3B-Instruction-LoRA-Adapter
This repo contains a LoRA adapter for Meta Llama 3.2 3B, fine-tuned with QLoRA (4-bit) and TRL's SFTTrainer on an instruction dataset derived from the Databricks Dolly corpus (databricks-dolly-15k).
Model Details
- Base Model: meta-llama/Llama-3.2-3B
- Model Type: LoRA adapter for causal language modeling
- Finetuning Method: LoRA + QLoRA (4-bit quantization)
- Trainable Parameters: ~0.33% via LoRA (rank=8)
- Quantization: 4-bit NF4
- Language: English
Training Data
- Dataset: databricks-dolly-1k, a ~1k-example subset of databricks-dolly-15k
- Samples: 982 training, 110 validation
- Split: 90% train / 10% validation
- Text Field: "text" (formatted instruction + response pairs); see the sketch below
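The split can be reproduced roughly as follows. This is a minimal sketch: the exact subset selection, random seed, and the prompt template used to build the "text" field are assumptions rather than details taken from the training notebook.

```python
from datasets import load_dataset

# Load the full Dolly corpus and take a ~1k-example subset (selection strategy assumed).
dolly = load_dataset("databricks/databricks-dolly-15k", split="train")
subset = dolly.shuffle(seed=42).select(range(1092))

# Build the "text" field from instruction/context/response (prompt template assumed).
def to_text(example):
    context = f"\n{example['context']}" if example["context"] else ""
    example["text"] = (
        f"### Instruction:\n{example['instruction']}{context}\n\n"
        f"### Response:\n{example['response']}"
    )
    return example

subset = subset.map(to_text)

# Roughly 90% train / 10% validation split.
splits = subset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```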
Training Configuration
LoRA Hyperparameters
```json
{
"r": 8,
"lora_alpha": 32,
"target_modules": ["q_proj", "k_proj", "v_proj", "out_proj", "fc_in", "fc_out", "wte"],
"lora_dropout": 0.05,
"bias": "none",
"task_type": "CAUSAL_LM"
}
```
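In code, these values correspond to a peft LoraConfig along the following lines (a minimal sketch mirroring the table above):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    # target_modules copied from the configuration above; PEFT only injects
    # LoRA layers into the modules that actually exist in the base model.
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "fc_in", "fc_out", "wte"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```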
Training Hyperparameters
```json
{
"num_train_epochs": 3,
"per_device_train_batch_size": 1,
"gradient_accumulation_steps": 1,
"learning_rate": 2e-4,
"weight_decay": 0.001,
"warmup_ratio": 0.03,
"lr_scheduler_type": "constant",
"max_seq_length": 256,
"optim": "paged_adamw_32bit",
"gradient_checkpointing": true,
"eval_strategy": "steps",
"eval_steps": 100,
"save_steps": 100,
"logging_steps": 100,
"packing": true
}
```
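These settings map onto TRL's SFTConfig and SFTTrainer roughly as shown below. This is a sketch, not the original notebook: output_dir and the model/dataset variables are illustrative, and some argument names (e.g. max_seq_length, eval_strategy) vary slightly between trl and transformers versions.

```python
from trl import SFTConfig, SFTTrainer

training_args = SFTConfig(
    output_dir="./llama-3b-lora-dolly",  # illustrative output path
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    weight_decay=0.001,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    optim="paged_adamw_32bit",
    gradient_checkpointing=True,
    eval_strategy="steps",
    eval_steps=100,
    save_steps=100,
    logging_steps=100,
    max_seq_length=256,
    packing=True,
    dataset_text_field="text",
)

trainer = SFTTrainer(
    model=base_model,         # 4-bit quantized base model (see QLoRA config below)
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    peft_config=lora_config,  # LoRA settings from the section above
)
trainer.train()
```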
QLoRA Hyperparameters
```json
{
"load_in_4bit": true,
"bnb_4bit_quant_type": "nf4",
"bnb_4bit_compute_dtype": "float16",
"bnb_4bit_use_double_quant": false
}
```
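These map to a bitsandbytes quantization config that is passed when loading the base model (a minimal sketch):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)
```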
Training Setup
```json
{
"framework": "PyTorch with Hugging Face Transformers",
"fine_tuning_method": "LoRA (Low-Rank Adaptation)",
"quantization": "4-bit NF4 with QLoRA",
"compute": "Google Colab T4",
"gpu_memory": "~16GB VRAM"
}
```
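Instead of passing peft_config to SFTTrainer, the adapter can also be attached to the 4-bit base model explicitly before training; whether the original notebook used this explicit step is not documented here. A sketch:

```python
from peft import get_peft_model, prepare_model_for_kbit_training

# Prepare the quantized base model for training (casts norms/embeddings to a
# stable dtype and enables input gradients for gradient checkpointing).
base_model = prepare_model_for_kbit_training(base_model)

# Attach the LoRA adapter; only ~0.33% of parameters become trainable.
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```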
Usage
Installation
```bash
pip install torch transformers accelerate peft bitsandbytes
```
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
# Load base model
model_name = "meta-llama/Llama-3.2-3B"
base_model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.float16,
low_cpu_mem_usage=True,
return_dict=True,
device_map="auto",
)
# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(
"MagicaNeko/llama-3b-lora-dolly",
subfolder="model-ft-tokenizer"
)
# Load LoRA Adapter
model = PeftModel.from_pretrained(
base_model,
"MagicaNeko/llama-3b-lora-dolly",
subfolder="model-ft-lora-adapter"
)
# Merge adapters
model = model.merge_and_unload()
# Inference
prompt = "What is machine learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
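To avoid re-merging on every load, the merged model can also be saved and reloaded as a standalone checkpoint (continuing from the snippet above; the local path is illustrative):

```python
# Persist the merged weights and tokenizer, then reload them without peft.
model.save_pretrained("./llama-3b-dolly-merged")
tokenizer.save_pretrained("./llama-3b-dolly-merged")

merged = AutoModelForCausalLM.from_pretrained(
    "./llama-3b-dolly-merged",
    torch_dtype=torch.float16,
    device_map="auto",
)
```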
Training Code
The training notebook is available at: Colab link
License
This LoRA adapter is released as open source under the Apache 2.0 License. It contains only the adapter weights and does not include any Meta Llama 3.2 3B base model weights.
You must still comply with the Meta Llama 3.2 license when using the base model together with this adapter.