moe_lora_transformer_hash_mha_r8_alpha32

This is a Mixture of Experts (MoE) transformer model with LoRA (Low-Rank Adaptation) experts trained for text summarization.

Model Details

  • Model Type: Mixture of Experts Transformer with LoRA
  • Router Type: Hash Router
  • Attention Type: Multi-Head Attention (MHA)
  • LoRA Configuration (a minimal LoRA sketch follows this list):
    • Rank (r): 8
    • Alpha: 32
  • Training Epochs: 3
  • Task: Text Summarization
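
With rank r = 8 and alpha = 32, the LoRA update is scaled by alpha / r = 4 before being added to each frozen base projection. The snippet below is a minimal sketch of such a LoRA linear layer; the class name, initialization, and shapes are illustrative assumptions, not the repository's actual implementation.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen base linear layer plus a trainable low-rank update (illustrative sketch).
    def __init__(self, in_features, out_features, r=8, alpha=32):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # base weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # down-projection
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))        # up-projection, zero-initialized
        self.scaling = alpha / r  # 32 / 8 = 4

    def forward(self, x):
        # y = W x + (alpha / r) * B A x
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)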

Architecture

This model uses the following components (a sketch of the expert layer follows the list):

  • LoRA-based expert networks for parameter efficiency
  • Hash routing for deterministic expert selection
  • A standard multi-head attention mechanism

It was trained on text summarization tasks.
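
The sketch below shows, under the same illustrative assumptions, how a hash router can dispatch each token to one LoRA expert. It reuses the LoRALinear class sketched under Model Details; the class name, expert count, and routing function are placeholders, not the repository's actual code.

import torch
import torch.nn as nn

class HashRoutedMoE(nn.Module):
    # Each token is assigned to exactly one LoRA expert by hashing its token id.
    def __init__(self, d_model, num_experts=4, r=8, alpha=32):
        super().__init__()
        self.num_experts = num_experts
        self.experts = nn.ModuleList(
            [LoRALinear(d_model, d_model, r=r, alpha=alpha) for _ in range(num_experts)]
        )

    def forward(self, hidden, token_ids):
        # hidden: (batch, seq, d_model); token_ids: (batch, seq)
        expert_ids = token_ids % self.num_experts  # deterministic modulo hash
        out = torch.zeros_like(hidden)
        for e in range(self.num_experts):
            mask = expert_ids == e                 # tokens routed to expert e
            if mask.any():
                out[mask] = self.experts[e](hidden[mask])
        return out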

Usage

from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("ManasMittal2005/moe_lora_transformer_hash_mha_r8_alpha32")
model = AutoModel.from_pretrained("ManasMittal2005/moe_lora_transformer_hash_mha_r8_alpha32")
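
Assuming the checkpoint loads with the standard Auto classes as above, a basic forward pass could look like the following; the exact output fields depend on how the custom architecture is exported, so treat this as a sketch rather than guaranteed behaviour.

import torch

text = "The quick brown fox jumps over the lazy dog. " * 20  # any document to summarize
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # e.g. (batch, seq_len, hidden_size) for an encoder-style output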

Training

The model was trained using the ANLP-3 assignment framework with:

  • LoRA experts for parameter efficiency
  • A hash router for deterministic expert selection (see the sketch after this list)
  • Standard multi-head attention
  • A training setup optimized for text summarization tasks
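
To make the deterministic expert selection concrete: with a modulo-style hash (one common choice, assumed here), every occurrence of a given token id maps to the same expert, so the router has no trainable parameters.

num_experts = 4  # illustrative; the actual expert count is not stated in this card

def hash_route(token_id, num_experts):
    # Deterministic: the same token id always selects the same expert.
    return token_id % num_experts

print(hash_route(1012, num_experts))  # 0: the same expert every time for token id 1012
print(hash_route(1013, num_experts))  # 1: neighbouring ids can land on different experts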

Model Size

  • Parameters: ~0.2B (Safetensors)
  • Tensor type: F32