---
language: en
license: mit
tags:
- mixture-of-experts
- text-summarization
- xsum
- trial-run
- pytorch
datasets:
- xsum
metrics:
- rouge
widget:
- text: >-
    The tower is 324 metres (1,063 ft) tall, about the same height as an
    81-storey building, and the tallest structure in Paris.
  example_title: Sample Text
---

# MoE Text Summarization Model (Trial Run)

## Model Description
This is a Mixture-of-Experts (MoE) model for text summarization, trained on a small subset of the XSum dataset as a trial run. The model demonstrates the MoE architecture with 4 experts and top-2 routing.
## Model Details
- Model Type: Mixture-of-Experts Text Summarization
- Architecture: Encoder-Decoder with MoE in encoder
- Training Data: XSum dataset (trial: 10 samples)
- Routing Type: topk
- Number of Experts: 4
- Top-K: 2
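
As a rough illustration of the routing described above, the sketch below shows one way a top-2 MoE feed-forward block with 4 experts could be written in PyTorch. The class and dimension names (`MoEFeedForward`, `d_model`, `d_ff`) are placeholders chosen for this example and do not correspond to this checkpoint's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Minimal top-k mixture-of-experts feed-forward block (illustrative only)."""

    def __init__(self, d_model=768, d_ff=3072, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)   # per-token expert scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                               # x: (batch, seq, d_model)
        logits = self.router(x)                         # (batch, seq, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)            # normalise the two selected experts' weights
        out = torch.zeros_like(x)
        # Dense loop for clarity; a production MoE layer dispatches tokens sparsely.
        for e, expert in enumerate(self.experts):
            gate = (weights * (indices == e)).sum(dim=-1, keepdim=True)  # 0 where expert e is unused
            out = out + gate * expert(x)
        return out

# Route a dummy batch of encoder hidden states through the block.
moe = MoEFeedForward()
hidden = torch.randn(2, 16, 768)
print(moe(hidden).shape)   # torch.Size([2, 16, 768])
```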
## Training Details
- Training Samples: 10 (trial run)
- Epochs: 1
- Final Loss: 10.6043
## Usage
```python
import torch
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-xsum')

# Load model (you'll need the MoE implementation)
# model = MoESummarizationModel.from_pretrained('vivekdhayaal/moe-xsum-trial')

# Example usage
text = "Your input text here..."
inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=1024)

# Generate summary with the model (see the sketch below)
```
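
Continuing from the snippet above, generation would plausibly follow the standard Hugging Face seq2seq pattern. The sketch below uses the base `facebook/bart-large-xsum` checkpoint as a runnable stand-in; swap in the MoE model class from this repository once it is available (that class is not part of `transformers`).

```python
from transformers import AutoModelForSeq2SeqLM

# Stand-in model for illustration; replace with the repository's
# MoESummarizationModel.from_pretrained('vivekdhayaal/moe-xsum-trial').
model = AutoModelForSeq2SeqLM.from_pretrained('facebook/bart-large-xsum')

with torch.no_grad():
    summary_ids = model.generate(
        **inputs,
        max_length=64,        # XSum reference summaries are a single sentence
        num_beams=4,
        early_stopping=True,
    )

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```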
## Note
This is a trial-run model trained on only 10 samples for demonstration purposes. For production use, train on the full XSum dataset.
## Citation
```bibtex
@misc{moe-xsum-trial,
  title={MoE Text Summarization Trial Model},
  author={vivekdhayaal},
  year={2024},
  url={https://huggingface.co/vivekdhayaal/moe-xsum-trial}
}
```