MoE Text Summarization Model (Trial Run)

Model Description

This is a Mixture-of-Experts (MoE) model for text summarization, trained on a small subset of the XSum dataset as a trial run. The model demonstrates the MoE architecture with 4 experts and top-2 routing.

Model Details

  • Model Type: Mixture-of-Experts text summarization
  • Architecture: Encoder-decoder with MoE layers in the encoder
  • Training Data: XSum dataset (trial run: 10 samples)
  • Routing Type: top-k (see the sketch after this list)
  • Number of Experts: 4
  • Top-K: 2
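
The MoE implementation itself is not bundled with this card, so the snippet below is only a minimal sketch of a top-2-of-4 expert feed-forward layer in PyTorch. The class name MoELayer, the hidden sizes, the linear router with softmax gating, and the dense dispatch loop are illustrative assumptions rather than the exact code behind this checkpoint.

# Minimal sketch of a top-k MoE feed-forward layer (illustrative, not the exact implementation)
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=768, d_ff=3072, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against the experts.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                   # x: (batch, seq_len, d_model)
        gates = F.softmax(self.router(x), dim=-1)           # (batch, seq_len, num_experts)
        weights, indices = gates.topk(self.top_k, dim=-1)   # keep the top-2 experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (indices[..., k] == e).unsqueeze(-1)  # tokens routed to expert e at slot k
                out = out + mask * weights[..., k:k+1] * expert(x)
        return out

A production implementation would normally dispatch only the routed tokens to each expert and add a load-balancing auxiliary loss; the dense loop above trades efficiency for readability.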

Training Details

  • Training Samples: 10 (trial run)
  • Epochs: 1
  • Final Loss: 10.604
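
For reference, a 10-sample trial subset like the one described above can be assembled with the datasets library roughly as follows. The dataset id, the document/summary column names, and the max lengths are assumptions based on the standard XSum schema; the exact preprocessing used for this checkpoint is not published.

# Sketch: load a 10-sample XSum trial subset (assumed preprocessing, not the actual training script)
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-xsum")
# Depending on your datasets version, this loader may require trust_remote_code=True.
trial = load_dataset("EdinburghNLP/xsum", split="train[:10]")   # 10 articles + reference summaries

def preprocess(batch):
    inputs = tokenizer(batch["document"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

trial = trial.map(preprocess, batched=True, remove_columns=trial.column_names)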

Usage

import torch
from transformers import AutoTokenizer

# Load the tokenizer of the base checkpoint
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-xsum')

# Load the model (requires the custom MoE implementation; the class below is not
# part of the transformers library)
# model = MoESummarizationModel.from_pretrained('vivekdhayaal/moe-xsum-trial')

# Example usage
text = "Your input text here..."
# Generate a summary with the model (see the sketch below)
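
Assuming the custom MoE model wraps a BART-style encoder-decoder and exposes the standard Hugging Face generate() API (an assumption, since the implementation is not bundled with this card), generation would look roughly like this:

# Sketch of generation, assuming the model exposes a BART-style generate() API
inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=1024)
# summary_ids = model.generate(**inputs, num_beams=4, max_length=64)
# print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))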

Note

This is a trial-run model trained on only 10 samples for demonstration purposes; it is not expected to produce useful summaries. For production use, train on the full XSum dataset.

Citation

@misc{moe-xsum-trial,
  title={MoE Text Summarization Trial Model},
  author={vivekdhayaal},
  year={2024},
  url={https://huggingface.co/vivekdhayaal/moe-xsum-trial}
}