	---
	base_model: Kwaipilot/KAT-Dev-72B-Exp
	tags:
	- rust
	- Hyperswitch
	- LoRA
	- CPT
	- Fine-Tuned
	- Causal-LM
	pipeline_tag: text-generation
	language:
	- en
	datasets:
	- AdityaNarayan/HyperSwitch-Repo-CPT-Dataset
	---
	# KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch

	A LoRA adapter for Kwaipilot/KAT-Dev-72B-Exp, specialized for the [Hyperswitch](https://github.com/juspay/hyperswitch) Rust codebase. It targets code understanding, explanation, and generation involving payment processing patterns, Hyperswitch architecture, and Rust development practices.

	## 🎯 Model Description

	This LoRA adapter was trained on 16,731 samples extracted from the Hyperswitch codebase to enhance code understanding, explanation, and generation within the payment processing domain.

	- Base Model: Kwaipilot/KAT-Dev-72B-Exp
	- Training Type: Continued pre-training (CPT) with a causal language modeling (CLM) objective, using LoRA
	- Domain: Payment Processing, Rust Development
	- Specialization: Hyperswitch codebase patterns and architecture

	## 📊 Training Details

	### Dataset Composition
	- Total Samples: 16,731
	- File-level samples: 2,120 complete files
	- Granular samples: 14,611 extracted components
	  - Functions: 4,121
	  - Structs: 5,710
	  - Traits: 223
	  - Implementations: 4,296
	  - Modules: 261
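The counts above are internally consistent: the five component types add up to the granular total, and granular plus file-level samples give the overall total. A quick check in Python, using only the numbers listed in this section (the variable names are illustrative):

```python
# Sample counts copied from the Dataset Composition list.
file_level = 2_120
granular_components = {
    "functions": 4_121,
    "structs": 5_710,
    "traits": 223,
    "implementations": 4_296,
    "modules": 261,
}

granular_total = sum(granular_components.values())
total_samples = file_level + granular_total

# Share of each component type within the granular samples.
shares = {name: count / granular_total for name, count in granular_components.items()}

print(granular_total, total_samples)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```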

	### LoRA Configuration
	```yaml
	r: 64 # LoRA rank
	alpha: 128 # LoRA alpha (2*r)
	dropout: 0.05 # LoRA dropout
	target_modules: # Applied to all linear layers
	- q_proj
	- k_proj
	- v_proj
	- o_proj
	- gate_proj
	- up_proj
	- down_proj
	```
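To make the configuration concrete: standard LoRA scales its low-rank update by alpha / r, and each adapted linear layer with input size d_in and output size d_out gains r * (d_in + d_out) trainable parameters. The sketch below works through that arithmetic. The 8192 x 8192 layer shape is a hypothetical example chosen for illustration, not a figure taken from the base model's config.

```python
R = 64        # LoRA rank, from the configuration above
ALPHA = 128   # LoRA alpha, from the configuration above

def lora_scaling(alpha: int, r: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r

def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A is (r, d_in), B is (d_out, r)."""
    return r * d_in + d_out * r

scaling = lora_scaling(ALPHA, R)
# Hypothetical square projection, for illustration only.
extra_params = lora_param_count(d_in=8192, d_out=8192, r=R)
print(scaling, extra_params)
```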

	### Training Hyperparameters
	- Epochs: 3
	- Learning Rate: 5e-5 (cosine schedule)
	- Max Context: 8,192 tokens
	- Hardware: 4 x NVIDIA H200
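
A cosine schedule decays the learning rate from its peak along half a cosine wave. The sketch below assumes no warmup and a decay to zero; neither detail is stated above, and the step counts are made up for illustration.

```python
import math

PEAK_LR = 5e-5  # learning rate from the list above

def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr at step 0 to 0.0 at total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# Hypothetical 1,000-step run.
for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_lr(step, 1000):.2e}")
```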


	### Training Results
	```json
	{
	  "final_train_loss": 0.2641,
	  "final_eval_loss": 0.37574875354766846,
	  "final_train_perplexity": 1.3022584156313823,
	  "final_eval_perplexity": 1.4560812525608204,
	  "final_token_accuracy": 0.9259863365441561,
	  "initial_loss": 1.6648,
	  "initial_perplexity": 5.284616220817229,
	  "initial_accuracy": 0.6015806214883923
	}
	```
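
The perplexity values reported above are the exponential of the corresponding cross-entropy losses, and the relationship can be checked directly. A minimal sketch (the helper name is illustrative):

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy loss (in nats)."""
    return math.exp(loss)

# (loss, reported perplexity) pairs copied from the training results.
reported = {
    "train": (0.2641, 1.3022584156313823),
    "eval": (0.37574875354766846, 1.4560812525608204),
    "initial": (1.6648, 5.284616220817229),
}

for name, (loss, ppl) in reported.items():
    print(f"{name}: exp({loss:.4f}) = {perplexity(loss):.4f} (reported {ppl:.4f})")
```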

	## 🚀 Usage

	### Quick Start
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel
	import torch

	# Load the base model in bfloat16, sharded across the available devices
	base_model = AutoModelForCausalLM.from_pretrained(
	    "Kwaipilot/KAT-Dev-72B-Exp",
	    dtype=torch.bfloat16,  # older transformers releases use torch_dtype=
	    device_map="auto",
	)

	# Load the tokenizer
	tokenizer = AutoTokenizer.from_pretrained("Kwaipilot/KAT-Dev-72B-Exp")

	# Attach the LoRA adapter to the base model
	model = PeftModel.from_pretrained(
	    base_model,
	    "AdityaNarayan/KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch",
	)

	# Generate code from a Rust prompt
	prompt = """// Hyperswitch payment processing
	pub fn validate_payment_method("""
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	outputs = model.generate(
	    **inputs,
	    max_new_tokens=200,
	    temperature=0.2,  # Lower temperature for code generation
	    do_sample=True,
	    pad_token_id=tokenizer.eos_token_id,
	)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	### Recommended Settings
	- Temperature: 0.2-0.3 for code generation
	- Temperature: 0.5-0.7 for explanations and documentation
	- Max tokens: 1024 for most tasks
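
These recommendations can be kept in one place and passed to `model.generate`. The helper below is a hypothetical convenience, not part of any library; it picks the midpoint of each recommended temperature range and only uses keyword names that appear in the Quick Start example.

```python
# Midpoints of the recommended temperature ranges.
TEMPERATURES = {
    "code": 0.25,        # 0.2-0.3 for code generation
    "explanation": 0.6,  # 0.5-0.7 for explanations and documentation
}

def generation_kwargs(task: str, max_new_tokens: int = 1024) -> dict:
    """Build keyword arguments for model.generate for a given task type."""
    if task not in TEMPERATURES:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TEMPERATURES)}")
    return {
        "max_new_tokens": max_new_tokens,
        "temperature": TEMPERATURES[task],
        "do_sample": True,  # sampling must be on for temperature to take effect
    }

print(generation_kwargs("code"))
```

With the model and inputs from the Quick Start loaded, this would be used as `model.generate(**inputs, **generation_kwargs("explanation"))`.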

	## 🛠️ Technical Specifications

	- Context Window: 8,192 tokens
	- Precision: bfloat16
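	- Memory Usage: roughly 144GB VRAM for the bfloat16 weights of the 72B base model alone (72B parameters × 2 bytes), before activations and KV cache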
	- Inference Speed: Optimized with Flash Attention 2
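
As a rough guide to the memory figure, weight memory is approximately the parameter count times the bytes per parameter. The sketch below applies that rule of thumb to nominal 32B and 72B parameter counts in bfloat16; it ignores activations, the KV cache, and the adapter weights, so real usage will be higher.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

def weight_memory_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Nominal parameter counts; exact counts differ slightly from the rounded names.
for label, params in (("32B", 32e9), ("72B", 72e9)):
    print(f"{label}: ~{weight_memory_gb(params):.0f} GB in bfloat16")
```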

	## 🙏 Acknowledgments

	- Kwaipilot Team for the excellent KAT-Dev base model
	- Hyperswitch Team for the open-source payment processing platform
	- Hugging Face for the transformers and PEFT libraries

	## 📞 Citation

	```bibtex
	@misc{hyperswitch-kat-dev-lora-2024,
	  title={KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch},
	  author={Aditya Narayan},
	  year={2024},
	  publisher={Hugging Face},
	  url={https://huggingface.co/AdityaNarayan/KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch}
	}
	```