Llama-3.2-3B-Instruction-LoRA-Adapter

This repository contains a LoRA adapter for Meta Llama 3.2 (3B), trained with QLoRA and SFTTrainer on an instruction dataset derived from the Databricks Dolly corpus.

Model Details

  • Base Model: meta-llama/Llama-3.2-3B
  • Model Type: LoRA adapter for causal language modeling
  • Finetuning Method: LoRA + QLoRA (4-bit quantization)
  • Trainable Parameters: ~0.33% via LoRA (rank=8)
  • Quantization: 4-bit NF4
  • Language: English
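
The adapter metadata above can be inspected directly from the Hub without downloading the base model. A minimal sketch with peft, assuming the subfolder layout shown in the Usage section below:

from peft import PeftConfig

# Load only the adapter configuration (a few KB) and inspect it.
adapter_config = PeftConfig.from_pretrained(
    "MagicaNeko/llama-3b-lora-dolly",
    subfolder="model-ft-lora-adapter",
)
print(adapter_config.base_model_name_or_path)  # meta-llama/Llama-3.2-3B
print(adapter_config.r, adapter_config.lora_alpha)  # 8, 32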

Training Data

The adapter was fine-tuned on an instruction dataset derived from the Databricks Dolly corpus, formatted for supervised fine-tuning with SFTTrainer.

Training Configuration

LoRA Hyperparameters

{
  "r": 8,
  "lora_alpha": 32,
  "target_modules": ["q_proj", "k_proj", "v_proj", "out_proj", "fc_in", "fc_out", "wte"],
  "lora_dropout": 0.05,
  "bias": "none",
  "task_type": "CAUSAL_LM"
}
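
These values map directly onto a peft LoraConfig. A minimal sketch, with the module names copied verbatim from the JSON above:

from peft import LoraConfig

# LoRA configuration mirroring the hyperparameters listed above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "fc_in", "fc_out", "wte"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

Wrapping the quantized base model with get_peft_model(base_model, lora_config) and calling print_trainable_parameters() shows the fraction of parameters that remain trainable under this configuration.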

Training Hyperparameters

{
  "num_train_epochs": 3,
  "per_device_train_batch_size": 1,
  "gradient_accumulation_steps": 1,
  "learning_rate": 2e-4,
  "weight_decay": 0.001,
  "warmup_ratio": 0.03,
  "lr_scheduler_type": "constant",
  "max_seq_length": 256,
  "optim": "paged_adamw_32bit",
  "gradient_checkpointing": true,
  "eval_strategy": "steps",
  "eval_steps": 100,
  "save_steps": 100,
  "logging_steps": 100,
  "packing": true
}
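
In recent versions of trl these fields live on SFTConfig, which extends transformers.TrainingArguments and additionally carries max_seq_length and packing. A hedged sketch (the output directory is illustrative, and argument names such as eval_strategy and max_seq_length vary slightly across transformers/trl releases):

from trl import SFTConfig

# Training arguments mirroring the values above.
training_args = SFTConfig(
    output_dir="./results",  # illustrative path, not taken from the model card
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    weight_decay=0.001,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    optim="paged_adamw_32bit",
    gradient_checkpointing=True,
    eval_strategy="steps",
    eval_steps=100,
    save_steps=100,
    logging_steps=100,
    max_seq_length=256,
    packing=True,
)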

QLoRA Hyperparameters

{
  "load_in_4bit": true,
  "bnb_4bit_quant_type": "nf4",
  "bnb_4bit_compute_dtype": "float16",
  "bnb_4bit_use_double_quant": false
}
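
These settings correspond to a bitsandbytes quantization config passed when loading the base model. A minimal sketch:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization mirroring the values above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)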

Training Setup

{
  "framework": "PyTorch with Hugging Face Transformers",
  "fine_tuning_method": "LoRA (Low-Rank Adaptation)",
  "quantization": "4-bit NF4 with QLoRA",
  "compute": "Google Colab T4",
  "gpu_memory": "~16GB VRAM",
}
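
Putting the pieces together, the SFTTrainer mentioned in the introduction is assembled roughly as sketched below. This is not the exact notebook code: train_ds and eval_ds stand in for the Dolly-derived splits, and the save path is illustrative.

from trl import SFTTrainer

# Combine the quantized base model, LoRA config, and training arguments
# from the sketches above; trl applies the LoRA adapter via peft_config.
trainer = SFTTrainer(
    model=base_model,
    args=training_args,
    train_dataset=train_ds,   # placeholder: Dolly-derived training split
    eval_dataset=eval_ds,     # placeholder: held-out evaluation split
    peft_config=lora_config,
)
trainer.train()
trainer.save_model("model-ft-lora-adapter")  # writes only the adapter weights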

Usage

Installation

pip install torch transformers accelerate peft bitsandbytes

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
model_name = "meta-llama/Llama-3.2-3B"
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    return_dict=True,
    device_map="auto",
)

# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "MagicaNeko/llama-3b-lora-dolly",
    subfolder="model-ft-tokenizer"
)

# Load LoRA Adapter
model = PeftModel.from_pretrained(
    base_model,
    "MagicaNeko/llama-3b-lora-dolly",
    subfolder="model-ft-lora-adapter"
)

# Merge adapters
model = model.merge_and_unload()

# Inference
prompt = "What is machine learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
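
To reuse the merged model without peft, it can be written out together with the tokenizer (a sketch; the output directory is an assumption):

# Persist the merged weights and tokenizer for standalone loading later.
merged_dir = "./llama-3b-dolly-merged"  # illustrative path
model.save_pretrained(merged_dir)
tokenizer.save_pretrained(merged_dir)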

Training Code

The training notebook is available at: Colab link

License

This LoRA adapter is released as open source under the Apache 2.0 License. It contains only the adapter weights and does not include any Meta Llama 3.2 (3B) base model weights.

You must still comply with the Meta Llama 3.2 Community License when using the base model together with this adapter.
