Upload folder using huggingface_hub
- README.md +21 -16
- checkpoint_epoch_1.pt +2 -2
- config.json +13 -0
- latest_checkpoint.pt +2 -2
- training_history.json +2 -2
- trial_results.json +4 -4
README.md
CHANGED
@@ -11,48 +11,53 @@ datasets:
 - xsum
 metrics:
 - rouge
+widget:
+- text: "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris."
+  example_title: "Sample Text"
 ---

 # MoE Text Summarization Model (Trial Run)

 ## Model Description

-This is a Mixture-of-Experts (MoE) model for text summarization, trained on a small subset of the XSum dataset as a trial run.
+This is a Mixture-of-Experts (MoE) model for text summarization, trained on a small subset of the XSum dataset as a trial run. The model demonstrates the MoE architecture with 4 experts and top-2 routing.

 ## Model Details

 - **Model Type**: Mixture-of-Experts Text Summarization
 - **Architecture**: Encoder-Decoder with MoE in encoder
 - **Training Data**: XSum dataset (trial: 10 samples)
-- **Routing Type**:
+- **Routing Type**: topk
 - **Number of Experts**: 4
 - **Top-K**: 2

-##
+## Training Details

--
--
--
-- `trial_results.json`: Complete trial run results
+- **Training Samples**: 10 (trial run)
+- **Epochs**: 1
+- **Final Loss**: 10.604265594482422

 ## Usage

 ```python
 import torch
+from transformers import AutoTokenizer

-# Load
-
-model_state = checkpoint['model_state_dict']
-model_config = checkpoint['model_config']
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-xsum')

-#
-#
+# Load model (you'll need the MoE implementation)
+# model = MoESummarizationModel.from_pretrained('vivekdhayaal/moe-xsum-trial')
+
+# Example usage
+text = "Your input text here..."
+# Generate summary with the model
 ```

-##
+## Note

-This
-For production use, train on the full dataset.
+This is a trial run model trained on only 10 samples for demonstration purposes.
+For production use, train on the full XSum dataset.

 ## Citation

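The usage snippet in the updated README stops short of actually loading the weights. Below is a minimal sketch of how the `.pt` checkpoints in this commit might be loaded; it assumes the checkpoint keys (`model_state_dict`, `model_config`) hinted at in the previous README revision and a `MoESummarizationModel` class from the training code, which is not bundled with this repository.

```python
import torch

# Hypothetical import: the MoE implementation ships with the training code,
# not with this repository.
# from moe_summarization import MoESummarizationModel

# Checkpoint layout (keys 'model_state_dict' and 'model_config') follows the
# snippet in the previous README revision; this is an assumption, not a
# documented contract. On PyTorch >= 2.6 you may also need weights_only=False
# if the checkpoint stores plain Python objects alongside the tensors.
checkpoint = torch.load("checkpoint_epoch_1.pt", map_location="cpu")
model_config = checkpoint["model_config"]
model_state = checkpoint["model_state_dict"]

print(model_config)  # should echo num_experts, top_k, d_model, etc. from config.json

# model = MoESummarizationModel(**model_config)  # assumed constructor signature
# model.load_state_dict(model_state)
# model.eval()
```

The same pattern presumably applies to `latest_checkpoint.pt`, assuming it uses the same key layout.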
checkpoint_epoch_1.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:12c0245d7bc1ce82267cabbff6a034bf249c9854c9483bbc981a67ebdfcea8ab
+size 547645498
config.json
ADDED
@@ -0,0 +1,13 @@
+{
+  "model_type": "moe_summarization",
+  "architectures": [
+    "MoESummarizationModel"
+  ],
+  "vocab_size": 50265,
+  "d_model": 256,
+  "num_experts": 4,
+  "top_k": 2,
+  "routing_type": "topk",
+  "trial_run": true,
+  "training_samples": 10
+}
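config.json pins down the MoE hyperparameters (`d_model: 256`, `num_experts: 4`, `top_k: 2`, `routing_type: "topk"`). For readers unfamiliar with top-k routing, here is a small, self-contained sketch of a top-2-of-4 expert layer using those dimensions; the class and its internals are illustrative only, not the repository's actual `MoESummarizationModel`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k mixture-of-experts layer (not the repo's implementation)."""

    def __init__(self, d_model=256, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # one routing logit per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (batch, seq, d_model)
        logits = self.router(x)                  # (batch, seq, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalise over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1)        # tokens routed to expert e in slot k
                out = out + mask * weights[..., k:k+1] * expert(x)
        return out

# d_model=256, num_experts=4, top_k=2 mirror the values in config.json
layer = TopKMoE()
print(layer(torch.randn(2, 8, 256)).shape)       # torch.Size([2, 8, 256])
```

For clarity the sketch runs every expert on every token and masks afterwards; real MoE implementations dispatch only the routed tokens to each expert.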
latest_checkpoint.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:716c086f434371987cdb6b0f45bdfd005151816727c9b6228441d6729fe94b04
+size 547645114
training_history.json
CHANGED
@@ -1,8 +1,8 @@
 [
   {
     "epoch": 1,
-    "train_loss": 10.
+    "train_loss": 10.604265594482422,
     "train_aux_loss": 0.0,
-    "timestamp": "2025-11-14T15:
+    "timestamp": "2025-11-14T15:34:39.752216"
   }
 ]
trial_results.json
CHANGED
@@ -7,18 +7,18 @@
     "top_k": 2,
     "routing_type": "topk",
     "epochs": 1,
-    "final_loss": 10.
+    "final_loss": 10.625629997253418,
     "training_samples": 10
   },
   "training_history": [
     {
       "epoch": 1,
-      "train_loss": 10.
+      "train_loss": 10.625629997253418,
       "train_aux_loss": 0.0,
-      "timestamp": "2025-11-14T15:
+      "timestamp": "2025-11-14T15:11:49.179446"
     }
   ],
   "repo_url": null,
   "checkpoint_dir": "trial_checkpoints",
-  "timestamp": "2025-11-14T15:
+  "timestamp": "2025-11-14T15:11:52.756849"
 }
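`trial_results.json` and `training_history.json` share the per-epoch record format visible in the diffs above. A short sketch of reading them back, with key names taken from the diffs and the exact nesting of the config block treated as an assumption:

```python
import json

# Paths assume the files sit at the repository root.
with open("training_history.json") as f:
    history = json.load(f)
for record in history:
    print(record["epoch"], record["train_loss"], record["train_aux_loss"], record["timestamp"])

with open("trial_results.json") as f:
    results = json.load(f)
# 'training_history', 'repo_url', 'checkpoint_dir' and 'timestamp' appear to be
# top-level keys in the diff above; treat that as an assumption.
print(results["checkpoint_dir"], results["timestamp"])
```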