MoE Text Summarization Model (Trial Run)

Model Description

This is a Mixture-of-Experts (MoE) model for text summarization, trained on a small subset of the XSum dataset as a trial run. The model demonstrates the MoE architecture with 4 experts and top-2 routing.

Model Details

  • Model Type: Mixture-of-Experts text summarization
  • Architecture: Encoder-decoder with MoE layers in the encoder
  • Training Data: XSum dataset (trial run: 10 samples)
  • Routing Type: top-k (see the sketch after this list)
  • Number of Experts: 4
  • Top-K: 2
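
The MoE implementation itself is not bundled with this card, so the snippet below is only a minimal sketch of a top-2-of-4 expert feed-forward layer in PyTorch. The class name MoELayer, the hidden sizes, the linear router with softmax gating, and the dense dispatch loop are illustrative assumptions rather than the exact code behind this checkpoint.

# Minimal sketch of a top-k MoE feed-forward layer (illustrative, not the exact implementation)
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=768, d_ff=3072, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against the experts.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                   # x: (batch, seq_len, d_model)
        gates = F.softmax(self.router(x), dim=-1)           # (batch, seq_len, num_experts)
        weights, indices = gates.topk(self.top_k, dim=-1)   # keep the top-2 experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (indices[..., k] == e).unsqueeze(-1)  # tokens routed to expert e at slot k
                out = out + mask * weights[..., k:k+1] * expert(x)
        return out

A production implementation would normally dispatch only the routed tokens to each expert and add a load-balancing auxiliary loss; the dense loop above trades efficiency for readability.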

Training Details

  • Training Samples: 10 (trial run)
  • Epochs: 1
  • Final Loss: 10.604
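
For reference, a 10-sample trial subset like the one described above can be assembled with the datasets library roughly as follows. The dataset id, the document/summary column names, and the max lengths are assumptions based on the standard XSum schema; the exact preprocessing used for this checkpoint is not published.

# Sketch: load a 10-sample XSum trial subset (assumed preprocessing, not the actual training script)
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-xsum")
# Depending on your datasets version, this loader may require trust_remote_code=True.
trial = load_dataset("EdinburghNLP/xsum", split="train[:10]")   # 10 articles + reference summaries

def preprocess(batch):
    inputs = tokenizer(batch["document"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

trial = trial.map(preprocess, batched=True, remove_columns=trial.column_names)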

Usage

import torch
from transformers import AutoTokenizer

# Load the tokenizer of the base checkpoint
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-xsum')

# Load the model (requires the custom MoE implementation; the class below is not
# part of the transformers library)
# model = MoESummarizationModel.from_pretrained('vivekdhayaal/moe-xsum-trial')

# Example usage
text = "Your input text here..."
# Generate a summary with the model (see the sketch below)
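
Assuming the custom MoE model wraps a BART-style encoder-decoder and exposes the standard Hugging Face generate() API (an assumption, since the implementation is not bundled with this card), generation would look roughly like this:

# Sketch of generation, assuming the model exposes a BART-style generate() API
inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=1024)
# summary_ids = model.generate(**inputs, num_beams=4, max_length=64)
# print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))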

Note

This is a trial-run model trained on only 10 samples for demonstration purposes; it is not expected to produce useful summaries. For production use, train on the full XSum dataset.

Citation

@misc{moe-xsum-trial,
  title={MoE Text Summarization Trial Model},
  author={vivekdhayaal},
  year={2024},
  url={https://huggingface.co/vivekdhayaal/moe-xsum-trial}
}