---
base_model: Kwaipilot/KAT-Dev-72B-Exp
tags:
- rust
- Hyperswitch
- LoRA
- CPT
- Fine-Tuned
- Causal-LM
pipeline_tag: text-generation
language:
- en
datasets:
- AdityaNarayan/HyperSwitch-Repo-CPT-Dataset
---

# KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch

A LoRA fine-tuned model based on **Kwaipilot/KAT-Dev-72B-Exp**, specialized for the [Hyperswitch](https://github.com/juspay/hyperswitch) Rust codebase. This model excels at understanding payment processing patterns, Hyperswitch architecture, and Rust development practices.

## Model Description

This LoRA adapter was trained on **16,731 samples** extracted from the Hyperswitch codebase to enhance code understanding, explanation, and generation within the payment processing domain.

- **Base Model**: Kwaipilot/KAT-Dev-72B-Exp
- **Training Type**: Causal Language Modeling (CLM) with LoRA
- **Domain**: Payment Processing, Rust Development
- **Specialization**: Hyperswitch codebase patterns and architecture

## Training Details

### Dataset Composition

- **Total Samples**: 16,731
- **File-level samples**: 2,120 complete files
- **Granular samples**: 14,611 extracted components
  - Functions: 4,121
  - Structs: 5,710
  - Traits: 223
  - Implementations: 4,296
  - Modules: 261

### LoRA Configuration

```yaml
r: 64           # LoRA rank
alpha: 128      # LoRA alpha (2 * r)
dropout: 0.05   # LoRA dropout
target_modules: # applied to all linear projection layers
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
```
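
For reference, a minimal `peft.LoraConfig` matching the settings above (the `bias` and `task_type` values are assumptions, not taken from the original training script):

```python
from peft import LoraConfig

# Sketch of a LoraConfig mirroring the YAML above; r/alpha/dropout and
# the target module list come from this card, the rest are assumptions.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",            # assumption
    task_type="CAUSAL_LM",  # assumption
)
```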

### Training Hyperparameters

- **Epochs**: 3
- **Learning Rate**: 5e-5 (cosine schedule)
- **Max Context**: 8,192 tokens
- **Hardware**: 4 x NVIDIA H200
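
A hedged sketch of the corresponding `transformers.TrainingArguments`: only the epochs, learning rate, scheduler, and bf16 precision come from this card; batch size, warmup, and the output path are illustrative assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="kat-dev-72b-hyperswitch-lora",  # example path
    num_train_epochs=3,              # from this card
    learning_rate=5e-5,              # from this card
    lr_scheduler_type="cosine",      # from this card
    bf16=True,                       # matches the bfloat16 precision below
    per_device_train_batch_size=1,   # assumption
    gradient_accumulation_steps=8,   # assumption
    warmup_ratio=0.03,               # assumption
    logging_steps=10,                # assumption
)
```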

### Training Results

```json
{
  "final_train_loss": 0.2641,
  "final_eval_loss": 0.37574875354766846,
  "final_train_perplexity": 1.3022584156313823,
  "final_eval_perplexity": 1.4560812525608204,
  "final_token_accuracy": 0.9259863365441561,
  "initial_loss": 1.6648,
  "initial_perplexity": 5.284616220817229,
  "initial_accuracy": 0.6015806214883923
}
```
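
Perplexity here is simply the exponential of the cross-entropy loss, so the reported pairs can be sanity-checked directly:

```python
import math

# perplexity = exp(cross-entropy loss); matches the figures above
assert abs(math.exp(0.2641) - 1.3023) < 1e-3   # final train loss -> perplexity
assert abs(math.exp(1.6648) - 5.2846) < 1e-3   # initial loss -> perplexity
```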

## Usage

### Quick Start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in bfloat16, sharded across available GPUs
base_model = AutoModelForCausalLM.from_pretrained(
    "Kwaipilot/KAT-Dev-72B-Exp",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Kwaipilot/KAT-Dev-72B-Exp")

# Attach the LoRA adapter
model = PeftModel.from_pretrained(base_model, "AdityaNarayan/KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch")

# Generate code
prompt = """// Hyperswitch payment processing
pub fn validate_payment_method("""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.2,  # lower temperature for code generation
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
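
To deploy without a runtime `peft` dependency, the adapter can be merged into the base weights with the standard `merge_and_unload()` API (the save path is just an example):

```python
# Merge the LoRA weights into the base model and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("kat-dev-72b-hyperswitch-merged")     # example path
tokenizer.save_pretrained("kat-dev-72b-hyperswitch-merged")  # example path
```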

### Recommended Settings

- **Temperature**: 0.2-0.3 for code generation
- **Temperature**: 0.5-0.7 for explanations and documentation (see the sketch below)
- **Max tokens**: 1024 for most tasks
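
For example, an explanation-style query might use the higher-temperature range (the prompt and values are illustrative, reusing `model` and `tokenizer` from the Quick Start):

```python
# Explanation/documentation mode: higher temperature, longer output budget
prompt = "// Explain how Hyperswitch validates a payment method\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.6,  # 0.5-0.7 range for explanations
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```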

## Technical Specifications

- **Context Window**: 8,192 tokens
- **Precision**: bfloat16
- **Memory Usage**: ~144GB VRAM for the 72B base model weights in bfloat16 (plus KV cache and activations)
- **Inference Speed**: optimized with Flash Attention 2 (see the snippet below)
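
Flash Attention 2 can be requested at load time via the standard `transformers` argument (requires the `flash-attn` package; drop the argument to fall back to the default attention):

```python
# Load the base model with Flash Attention 2 for faster long-context inference
base_model = AutoModelForCausalLM.from_pretrained(
    "Kwaipilot/KAT-Dev-72B-Exp",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",  # requires flash-attn installed
)
```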

## Acknowledgments

- **Kwaipilot Team** for the excellent KAT-Dev base model
- **Hyperswitch Team** for the open-source payment processing platform
- **Hugging Face** for the transformers and PEFT libraries

## Citation

```bibtex
@misc{hyperswitch-kat-dev-lora-2024,
  title={KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch},
  author={Aditya Narayan},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/AdityaNarayan/KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch}
}
```