vivekdhayaal committed · Commit 854c26d · verified · 1 Parent(s): 566ac65

Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,66 @@
---
language: en
license: mit
tags:
- mixture-of-experts
- text-summarization
- xsum
- trial-run
- pytorch
datasets:
- xsum
metrics:
- rouge
---

# MoE Text Summarization Model (Trial Run)

## Model Description

This is a Mixture-of-Experts (MoE) model for text summarization, trained on a small subset of the XSum dataset as a trial run.

## Model Details

- **Model Type**: Mixture-of-Experts text summarization
- **Architecture**: Encoder-decoder with MoE layers in the encoder
- **Training Data**: XSum dataset (trial: 10 samples)
- **Routing Type**: Top-K routing (a minimal routing sketch follows this list)
- **Number of Experts**: 4
- **Top-K**: 2
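The routing implementation itself isn't shipped with these files, but as a rough sketch of what top-K routing over 4 experts can look like in PyTorch (the feed-forward width `d_ff=1024` and the exact layer structure are assumptions, not the code behind this checkpoint):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k routed MoE feed-forward block (not this repo's actual code)."""

    def __init__(self, d_model=256, num_experts=4, top_k=2, d_ff=1024):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # one routing score per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        logits = self.router(x)                             # (batch, seq_len, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # keep only the k best experts per token
        weights = F.softmax(weights, dim=-1)                # normalise the kept scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e              # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


# Example: route a dummy batch through the layer
layer = TopKMoE()
tokens = torch.randn(2, 8, 256)
print(layer(tokens).shape)  # torch.Size([2, 8, 256])
```

With top-2 routing, each token is processed by only 2 of the 4 experts, so adding experts grows capacity without a proportional increase in per-token compute.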
## Files Included

- `checkpoint_epoch_1.pt`: Model checkpoint after epoch 1
- `latest_checkpoint.pt`: Latest model checkpoint
- `training_history.json`: Training metrics and history
- `trial_results.json`: Complete trial run results
## Usage

```python
import torch

# Load the checkpoint on CPU (no GPU needed just to inspect it)
checkpoint = torch.load('checkpoint_epoch_1.pt', map_location='cpu')
model_state = checkpoint['model_state_dict']
model_config = checkpoint['model_config']

# You'll need the MoE implementation to load the model;
# see the original repository for the complete code.
```
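Once the MoE implementation is available, restoring the model follows the usual PyTorch pattern. The class name `MoESummarizer` below is a placeholder for whatever the original repository actually defines:

```python
# Hypothetical class name: substitute the real MoE model class from the original repo
model = MoESummarizer(**model_config)  # rebuild the architecture from the saved config
model.load_state_dict(model_state)     # restore the trained weights
model.eval()                           # switch to inference mode
```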
## Training Results

This model was trained for demonstration purposes on only 10 samples from the XSum dataset. For production use, train on the full dataset.
## Citation

```bibtex
@misc{moe-xsum-trial,
  title={MoE Text Summarization Trial Model},
  author={vivekdhayaal},
  year={2024},
  url={https://huggingface.co/vivekdhayaal/moe-xsum-trial}
}
```
checkpoint_epoch_1.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:314eabe1c53f812cd679cab32fd498f4832e8853e34fde2873b7f613c2c8a14c
size 53576431
latest_checkpoint.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2348e87476215eb00541003e9357f706c6bc8d1cad19b796f5d5d5e46b3c0a4e
size 53576422
training_history.json ADDED
@@ -0,0 +1,8 @@
[
  {
    "epoch": 1,
    "train_loss": 10.6256,
    "train_aux_loss": 0.0,
    "timestamp": "2025-11-14T15:23:56.631190"
  }
]
trial_results.json ADDED
@@ -0,0 +1,24 @@
{
  "model_info": {
    "repo_name": "moe-xsum-trial",
    "vocab_size": 50265,
    "d_model": 256,
    "num_experts": 4,
    "top_k": 2,
    "routing_type": "topk",
    "epochs": 1,
    "final_loss": 10.6256,
    "training_samples": 10
  },
  "training_history": [
    {
      "epoch": 1,
      "train_loss": 10.6256,
      "train_aux_loss": 0.0,
      "timestamp": "2025-11-14T15:23:56.631190"
    }
  ],
  "repo_url": null,
  "checkpoint_dir": "trial_checkpoints",
  "timestamp": "2025-11-14T15:23:56.631454"
}