MoE Text Summarization Model (Trial Run)
Model Description
This is a Mixture-of-Experts (MoE) model for text summarization, trained on a small subset of the XSum dataset as a trial run. The model demonstrates the MoE architecture with 4 experts and top-2 routing.
Model Details
- Model Type: Mixture-of-Experts Text Summarization
- Architecture: Encoder-Decoder with MoE in encoder
- Training Data: XSum dataset (trial: 10 samples)
- Routing Type: top-k (see the routing sketch after this list)
- Number of Experts: 4
- Top-K: 2
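To illustrate what 4-expert, top-2 routing means in practice, here is a minimal sketch of an MoE feed-forward layer with top-2 token routing. It is not the implementation used for this checkpoint; the class name, hidden sizes, and expert structure are illustrative assumptions chosen to match the configuration listed above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Mixture-of-Experts feed-forward layer with top-2 token routing (illustrative)."""

    def __init__(self, d_model=768, d_ff=3072, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The router scores every token against every expert
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an independent feed-forward block
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        logits = self.router(x)                # (batch, seq_len, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalise over the 2 chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e        # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Quick shape check
layer = Top2MoELayer()
print(layer(torch.randn(2, 16, 768)).shape)   # torch.Size([2, 16, 768])

Each token is processed by only its two highest-scoring experts, which is what keeps the per-token compute roughly constant as the number of experts grows.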
Training Details
- Training Samples: 10 (trial run; see the data-loading sketch after this list)
- Epochs: 1
- Final Loss: 10.604265594482422
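A 10-example trial subset like the one used here can be obtained roughly as follows. This is a sketch only: it assumes the data comes from the EdinburghNLP/xsum dataset on the Hugging Face Hub, and the exact sampling used for this checkpoint may differ.

from datasets import load_dataset

# Load the first 10 XSum training examples (assumed to match the trial subset;
# depending on your `datasets` version you may need trust_remote_code=True)
trial_data = load_dataset('EdinburghNLP/xsum', split='train[:10]')

for example in trial_data:
    print(example['document'][:80], '->', example['summary'])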
Usage
import torch
from transformers import AutoTokenizer

# Load the tokenizer (shared with facebook/bart-large-xsum)
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-xsum')

# Load the model (requires the custom MoE implementation to be importable)
# model = MoESummarizationModel.from_pretrained('vivekdhayaal/moe-xsum-trial')

# Example usage
text = "Your input text here..."
inputs = tokenizer(text, return_tensors='pt', truncation=True)

# Generate a summary with the model (see the stand-in sketch below)
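Because the MoE model class is not bundled with this card, the following is a hedged sketch of the full tokenize, generate, decode flow using the base facebook/bart-large-xsum checkpoint as a stand-in; swap in MoESummarizationModel once the implementation is on your path.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Stand-in pipeline: the base BART-XSum checkpoint is used only to illustrate the
# tokenize -> generate -> decode flow; replace it with the MoE model when available.
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-xsum')
model = AutoModelForSeq2SeqLM.from_pretrained('facebook/bart-large-xsum')

text = "Your input text here..."
inputs = tokenizer(text, return_tensors='pt', truncation=True)

with torch.no_grad():
    summary_ids = model.generate(**inputs, max_length=64, num_beams=4)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))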
Note
This is a trial run model trained on only 10 samples for demonstration purposes. For production use, train on the full XSum dataset.
Citation
@misc{moe-xsum-trial,
  title={MoE Text Summarization Trial Model},
  author={vivekdhayaal},
  year={2024},
  url={https://huggingface.co/vivekdhayaal/moe-xsum-trial}
}