---
base_model: Kwaipilot/KAT-Dev-72B-Exp
tags:
- rust
- Hyperswitch
- LoRA
- CPT
- Fine-Tuned
- Causal-LM
pipeline_tag: text-generation
language:
- en
datasets:
- AdityaNarayan/HyperSwitch-Repo-CPT-Dataset
---
# Kwaipilot-KAT-Dev-CPT-LoRA-Adapter-HyperSwitch

A LoRA fine-tuned model based on **Kwaipilot/KAT-Dev-72B-Exp** specialized for the [Hyperswitch](https://github.com/juspay/hyperswitch) Rust codebase. This model excels at understanding payment processing patterns, Hyperswitch architecture, and Rust development practices.

## 🎯 Model Description

This LoRA adapter was trained on **16,731 samples** extracted from the Hyperswitch codebase to enhance code understanding, explanation, and generation within the payment processing domain.

- **Base Model**: Kwaipilot/KAT-Dev-72B-Exp
- **Training Type**: Causal Language Modeling (CLM) with LoRA
- **Domain**: Payment Processing, Rust Development
- **Specialization**: Hyperswitch codebase patterns and architecture

## πŸ“Š Training Details

### Dataset Composition
- **Total Samples**: 16,731
  - **File-level samples**: 2,120 complete files
  - **Granular samples**: 14,611 extracted components
    - Functions: 4,121
    - Structs: 5,710  
    - Traits: 223
    - Implementations: 4,296
    - Modules: 261
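
The dataset itself is published on the Hugging Face Hub (see the card metadata) and can be inspected with the `datasets` library. The snippet below is a minimal sketch; check the printed features before assuming any particular column names.

```python
from datasets import load_dataset

# Pull the CPT dataset referenced in this card's metadata.
ds = load_dataset("AdityaNarayan/HyperSwitch-Repo-CPT-Dataset")

# Inspect splits and column names before assuming a schema.
print(ds)
```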

### LoRA Configuration
```yaml
r: 64                   # LoRA rank
alpha: 128              # LoRA alpha (2*r)
dropout: 0.05           # LoRA dropout
target_modules:         # Applied to all linear layers
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
```
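
Expressed with `peft`, the same configuration would look roughly like the sketch below. This is a reconstruction from the values above, not the exact training script.

```python
from peft import LoraConfig

# Reconstruction of the adapter configuration listed above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,          # 2 * r
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```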

### Training Hyperparameters
- **Epochs**: 2.3
- **Steps**: 550
- **Batch Size**: 2 per device (16 effective with gradient accumulation)
- **Learning Rate**: 5e-5 (cosine schedule)
- **Max Context**: 8,192 tokens
- **Hardware**: 2x NVIDIA H200 (80GB each)
- **Training Time**: ~4 hours (2,355 steps)
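
As a rough guide, these settings map to a `transformers.TrainingArguments` along the lines of the sketch below. The gradient accumulation value is inferred from the 2-per-device batch size on 2 GPUs, and values not listed on the card (output path, warmup, logging cadence) are illustrative assumptions.

```python
from transformers import TrainingArguments

# Approximate mapping of the hyperparameters above; not the original training script.
training_args = TrainingArguments(
    output_dir="kat-dev-hyperswitch-cpt-lora",  # hypothetical path
    num_train_epochs=2.3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # 2 GPUs x 2 per device x 4 = 16 effective
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    bf16=True,
)
```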

### Training Results
```json
{
  "final_train_loss": 0.2793,
  "final_eval_loss": 0.3765236437320709,
  "final_train_perplexity": 1.322203945559979,
  "final_eval_perplexity": 1.457209992899547,
  "final_token_accuracy": 0.9227368004620076,
  "initial_loss": 1.6654,
  "initial_perplexity": 5.2877879419709135,
  "initial_accuracy": 0.6416946474462748
}
```
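
The reported perplexities are simply the exponential of the corresponding cross-entropy losses, which is easy to verify:

```python
import math

# Perplexity = exp(cross-entropy loss); the card's numbers line up.
print(math.exp(0.2793))              # ~1.3222 (final train)
print(math.exp(0.3765236437320709))  # ~1.4572 (final eval)
```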

## πŸš€ Usage

### Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Kwaipilot/KAT-Dev-72B-Exp",
    dtype=torch.bfloat16,
    device_map="auto"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Kwaipilot/KAT-Dev-72B-Exp")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "AdityaNarayan/KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch")

# Generate code
prompt = """// Hyperswitch payment processing
pub fn validate_payment_method("""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.2,  # Lower temperature for code generation
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
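
If you prefer a standalone checkpoint for serving, the adapter can be folded into the base weights with peft's `merge_and_unload`. The sketch below assumes the `model` and `tokenizer` objects from the Quick Start; the output path is hypothetical.

```python
# Optional: merge the LoRA weights into the base model for adapter-free inference.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("kat-dev-hyperswitch-merged")  # hypothetical output path
tokenizer.save_pretrained("kat-dev-hyperswitch-merged")
```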

### Recommended Settings
- **Temperature**: 0.2-0.3 for code generation
- **Temperature**: 0.5-0.7 for explanations and documentation
- **Max tokens**: 1024 for most tasks
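
One way to encode these recommendations is as `transformers.GenerationConfig` presets and pick one per task; this is a sketch, not part of the released model.

```python
from transformers import GenerationConfig

# Hypothetical presets reflecting the recommendations above.
code_gen = GenerationConfig(do_sample=True, temperature=0.2, max_new_tokens=1024)
explain_gen = GenerationConfig(do_sample=True, temperature=0.6, max_new_tokens=1024)

# Example usage with the Quick Start objects:
# outputs = model.generate(**inputs, generation_config=code_gen)
```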

## πŸ› οΈ Technical Specifications

- **Context Window**: 8,192 tokens
- **Precision**: bfloat16
- **Memory Usage**: ~78GB VRAM
- **Inference Speed**: Optimized with Flash Attention 2
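
Flash Attention 2 is opted into at model load time in `transformers`. Assuming the `flash-attn` package is installed and the GPU supports it, the base model can be loaded roughly as follows:

```python
import torch
from transformers import AutoModelForCausalLM

# Requires the flash-attn package and a compatible GPU.
base_model = AutoModelForCausalLM.from_pretrained(
    "Kwaipilot/KAT-Dev-72B-Exp",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```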

## πŸ™ Acknowledgments

- **Kwaipilot Team** for the excellent KAT-Dev base model
- **Hyperswitch Team** for the open-source payment processing platform
- **Hugging Face** for the transformers and PEFT libraries

## πŸ“ž Citation

```bibtex
@misc{hyperswitch-kat-dev-lora-2024,
  title={KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch},
  author={Aditya Narayan},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/AdityaNarayan/KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch}
}
```