---
language: en
license: mit
tags:
  - mixture-of-experts
  - text-summarization
  - xsum
  - trial-run
  - pytorch
datasets:
  - xsum
metrics:
  - rouge
widget:
  - text: >-
      The tower is 324 metres (1,063 ft) tall, about the same height as an
      81-storey building, and the tallest structure in Paris.
    example_title: Sample Text
---

# MoE Text Summarization Model (Trial Run)

## Model Description

This is a Mixture-of-Experts (MoE) model for text summarization, trained on a small subset of the XSum dataset as a trial run. The model demonstrates the MoE architecture with 4 experts and top-2 routing.
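The MoE layer code itself is not bundled with `transformers` (see Usage below). As a rough illustration of what 4-expert, top-2 token routing looks like, here is a minimal PyTorch sketch; the class name `TopKMoEFeedForward`, the hidden sizes, and the expert structure are illustrative assumptions rather than this checkpoint's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoEFeedForward(nn.Module):
    """Illustrative feed-forward block with 4 experts and top-2 token routing."""

    def __init__(self, d_model=768, d_ff=3072, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # A linear router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an independent position-wise feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, hidden_states):                           # (batch, seq_len, d_model)
        logits = self.router(hidden_states)                     # (B, T, num_experts)
        gates, expert_idx = logits.topk(self.top_k, dim=-1)     # keep the 2 best experts per token
        gates = F.softmax(gates, dim=-1)                        # normalise the two gate scores

        output = torch.zeros_like(hidden_states)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[..., slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    output[mask] += gates[..., slot][mask].unsqueeze(-1) * expert(hidden_states[mask])
        return output

# Shape check: a batch of 2 sequences of 16 tokens passes through with shape unchanged.
layer = TopKMoEFeedForward()
print(layer(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 16, 768])
```

Each token is scored by the router, only its two highest-scoring experts are run, and their outputs are mixed with the softmax-normalised gate weights.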

## Model Details

- Model Type: Mixture-of-Experts Text Summarization
- Architecture: Encoder-decoder with MoE layers in the encoder
- Training Data: XSum dataset (trial: 10 samples)
- Routing Type: topk
- Number of Experts: 4
- Top-K: 2

## Training Details

- Training Samples: 10 (trial run)
- Epochs: 1
- Final Loss: 10.604265594482422
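ROUGE is listed as the evaluation metric in the metadata, but no scores are reported for this trial run. If you retrain and want to score generated summaries, a minimal sketch with the Hugging Face `evaluate` library looks like this; the prediction and reference strings are placeholders, not output from this model.

```python
import evaluate

rouge = evaluate.load("rouge")

# Placeholder strings; in practice, pass your generated summaries and the XSum reference summaries.
predictions = ["The Eiffel Tower is the tallest structure in Paris."]
references = ["At 324 metres, the tower is the tallest structure in Paris."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # e.g. {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```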

## Usage

```python
import torch
from transformers import AutoTokenizer

# The model reuses the facebook/bart-large-xsum tokenizer.
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-xsum')

# The custom MoE model class is not part of transformers; the accompanying
# MoE implementation is needed to load the checkpoint weights.
# model = MoESummarizationModel.from_pretrained('vivekdhayaal/moe-xsum-trial')
# model.eval()

# Tokenize the input document.
text = "Your input text here..."
inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=1024)

# Generate a summary with the loaded model (assuming a standard generate() API), e.g.:
# with torch.no_grad():
#     summary_ids = model.generate(**inputs, max_length=64, num_beams=4)
# print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

## Note

This is a trial-run model trained on only 10 samples for demonstration purposes. For production use, train on the full XSum dataset.
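For reference, here is a minimal sketch of loading the full XSum training split with the Hugging Face `datasets` library; depending on your `datasets` version, the dataset id may be `xsum` or `EdinburghNLP/xsum`.

```python
from datasets import load_dataset

# Full XSum: roughly 204k training articles, each paired with a one-sentence summary.
xsum = load_dataset("xsum")
example = xsum["train"][0]

print(example["document"][:200])  # source article text
print(example["summary"])         # one-sentence reference summary
```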

## Citation

```bibtex
@misc{moe-xsum-trial,
  title={MoE Text Summarization Trial Model},
  author={vivekdhayaal},
  year={2024},
  url={https://huggingface.co/vivekdhayaal/moe-xsum-trial}
}
```