# moe_lora_transformer_hash_mha_r8_alpha32

This is a Mixture of Experts (MoE) transformer with LoRA (Low-Rank Adaptation) experts, trained for text summarization.
## Model Details

- Model Type: Mixture of Experts transformer with LoRA experts
- Router Type: Hash router
- Attention Type: Multi-head attention (MHA)
- LoRA Configuration (illustrated below):
  - Rank (r): 8
  - Alpha: 32
- Training Epochs: 3
- Task: Text summarization
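In standard LoRA, the ratio alpha / r sets the scale applied to the low-rank update, so r = 8 and alpha = 32 give a scaling factor of 4.0. The sketch below shows a minimal LoRA-adapted linear layer with these hyperparameters; the class name, initialization, and dimensions are illustrative assumptions, not this model's actual implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA adapter: y = W x + (alpha / r) * B A x, with W frozen."""
    def __init__(self, in_features, out_features, r=8, alpha=32):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)          # frozen pretrained weight
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no update at start
        self.scaling = alpha / r                         # 32 / 8 = 4.0

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

Only the low-rank matrices A and B are trainable, which is where the parameter efficiency comes from.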
## Architecture

This model uses:

- LoRA-based expert networks for parameter efficiency
- Hash routing for deterministic expert selection (see the routing sketch below)
- A standard multi-head attention mechanism

It was trained on text summarization tasks.
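Hash routing assigns each token to an expert through a fixed function of its token id rather than a learned gate, so the assignment is deterministic and adds no router parameters. A minimal sketch, assuming a modulo hash and four experts (both are illustrative choices, not confirmed by the model card):

```python
import torch

def hash_route(token_ids: torch.Tensor, num_experts: int) -> torch.Tensor:
    # Deterministic assignment: the same token id always maps to the same expert.
    # The modulo hash and the expert count used below are assumptions for illustration.
    return token_ids % num_experts

token_ids = torch.tensor([[101, 2023, 2003, 1037, 7099, 102]])
expert_ids = hash_route(token_ids, num_experts=4)  # tensor([[1, 3, 3, 1, 3, 2]])
```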
## Usage

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("ManasMittal2005/moe_lora_transformer_hash_mha_r8_alpha32")
model = AutoModel.from_pretrained("ManasMittal2005/moe_lora_transformer_hash_mha_r8_alpha32")
```
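Once loaded, the model can be run on tokenized input. A minimal forward pass, assuming the checkpoint loads through the standard AutoModel API shown above (the input text is illustrative):

```python
text = "Summarize: The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)  # model-specific outputs, e.g. hidden states
```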
## Training

The model was trained using the ANLP-3 assignment framework with:

- LoRA experts for parameter efficiency
- A hash router for deterministic expert selection (sketched below)
- Standard multi-head attention
- Optimization for text summarization tasks
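Putting the pieces together, the sketch below shows one way a hash router can dispatch tokens to per-expert LoRA updates around a frozen shared projection. The dimensions, expert count, and single-projection structure are assumptions made for illustration; they do not reflect the actual ANLP-3 implementation.

```python
import torch
import torch.nn as nn

class HashMoELoRAFFN(nn.Module):
    """Illustrative MoE block: a frozen shared linear layer plus per-expert
    LoRA updates, with experts chosen by a deterministic hash of token ids.
    All sizes below are assumptions, not the model's actual configuration."""
    def __init__(self, d_model=512, num_experts=4, r=8, alpha=32):
        super().__init__()
        self.base = nn.Linear(d_model, d_model, bias=False)
        self.base.weight.requires_grad_(False)                    # frozen shared weight
        self.lora_A = nn.Parameter(torch.randn(num_experts, r, d_model) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, d_model, r))
        self.scaling = alpha / r
        self.num_experts = num_experts

    def forward(self, hidden, token_ids):
        # hidden: (batch, seq, d_model); token_ids: (batch, seq)
        expert_ids = token_ids % self.num_experts     # hash routing, no learned gate
        A = self.lora_A[expert_ids]                   # (batch, seq, r, d_model)
        B = self.lora_B[expert_ids]                   # (batch, seq, d_model, r)
        low_rank = torch.einsum("bsrd,bsd->bsr", A, hidden)
        update = torch.einsum("bsdr,bsr->bsd", B, low_rank)
        return self.base(hidden) + self.scaling * update

# Example: route six tokens through the block
ffn = HashMoELoRAFFN()
hidden = torch.randn(1, 6, 512)
token_ids = torch.tensor([[101, 2023, 2003, 1037, 7099, 102]])
out = ffn(hidden, token_ids)  # shape (1, 6, 512)
```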