moe_lora_transformer_hash_mha_r8_alpha32

This is a Mixture of Experts (MoE) transformer model with LoRA (Low-Rank Adaptation) experts trained for text summarization.

Model Details

  • Model Type: Mixture of Experts Transformer with LoRA
  • Router Type: Hash Router
  • Attention Type: Multi-Head Attention (MHA)
  • LoRA Configuration (a minimal LoRA sketch follows this list):
    • Rank (r): 8
    • Alpha: 32
  • Training Epochs: 3
  • Task: Text Summarization
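
With rank r = 8 and alpha = 32, the LoRA update is scaled by alpha / r = 4 before being added to each frozen base projection. The snippet below is a minimal sketch of such a LoRA linear layer; the class name, initialization, and shapes are illustrative assumptions, not the repository's actual implementation.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen base linear layer plus a trainable low-rank update (illustrative sketch).
    def __init__(self, in_features, out_features, r=8, alpha=32):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # base weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # down-projection
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))        # up-projection, zero-initialized
        self.scaling = alpha / r  # 32 / 8 = 4

    def forward(self, x):
        # y = W x + (alpha / r) * B A x
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)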

Architecture

This model uses the following components (a sketch of the expert layer follows the list):

  • LoRA-based expert networks for parameter efficiency
  • Hash routing for deterministic expert selection
  • A standard multi-head attention mechanism

It was trained on text summarization tasks.
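
The sketch below shows, under the same illustrative assumptions, how a hash router can dispatch each token to one LoRA expert. It reuses the LoRALinear class sketched under Model Details; the class name, expert count, and routing function are placeholders, not the repository's actual code.

import torch
import torch.nn as nn

class HashRoutedMoE(nn.Module):
    # Each token is assigned to exactly one LoRA expert by hashing its token id.
    def __init__(self, d_model, num_experts=4, r=8, alpha=32):
        super().__init__()
        self.num_experts = num_experts
        self.experts = nn.ModuleList(
            [LoRALinear(d_model, d_model, r=r, alpha=alpha) for _ in range(num_experts)]
        )

    def forward(self, hidden, token_ids):
        # hidden: (batch, seq, d_model); token_ids: (batch, seq)
        expert_ids = token_ids % self.num_experts  # deterministic modulo hash
        out = torch.zeros_like(hidden)
        for e in range(self.num_experts):
            mask = expert_ids == e                 # tokens routed to expert e
            if mask.any():
                out[mask] = self.experts[e](hidden[mask])
        return out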

Usage

from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("ManasMittal2005/moe_lora_transformer_hash_mha_r8_alpha32")
model = AutoModel.from_pretrained("ManasMittal2005/moe_lora_transformer_hash_mha_r8_alpha32")
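
Assuming the checkpoint loads with the standard Auto classes as above, a basic forward pass could look like the following; the exact output fields depend on how the custom architecture is exported, so treat this as a sketch rather than guaranteed behaviour.

import torch

text = "The quick brown fox jumps over the lazy dog. " * 20  # any document to summarize
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # e.g. (batch, seq_len, hidden_size) for an encoder-style output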

Training

The model was trained using the ANLP-3 assignment framework with:

  • LoRA experts for parameter efficiency
  • A hash router for deterministic expert selection (see the sketch after this list)
  • Standard multi-head attention
  • A training setup optimized for text summarization tasks
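
To make the deterministic expert selection concrete: with a modulo-style hash (one common choice, assumed here), every occurrence of a given token id maps to the same expert, so the router has no trainable parameters.

num_experts = 4  # illustrative; the actual expert count is not stated in this card

def hash_route(token_id, num_experts):
    # Deterministic: the same token id always selects the same expert.
    return token_id % num_experts

print(hash_route(1012, num_experts))  # 0: the same expert every time for token id 1012
print(hash_route(1013, num_experts))  # 1: neighbouring ids can land on different experts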

Model Size

  • Parameters: ~0.2B (Safetensors)
  • Tensor type: F32