vivekdhayaal committed · Commit 854c26d · verified · 1 Parent(s): 566ac65

Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,66 @@
---
language: en
license: mit
tags:
- mixture-of-experts
- text-summarization
- xsum
- trial-run
- pytorch
datasets:
- xsum
metrics:
- rouge
---

# MoE Text Summarization Model (Trial Run)

## Model Description

This is a Mixture-of-Experts (MoE) model for text summarization, trained on a small subset of the XSum dataset as a trial run.

## Model Details

- **Model Type**: Mixture-of-Experts text summarization
- **Architecture**: Encoder-decoder with MoE layers in the encoder
- **Training Data**: XSum dataset (trial: 10 samples)
- **Routing Type**: Top-K routing (a minimal routing sketch follows this list)
- **Number of Experts**: 4
- **Top-K**: 2
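The routing implementation itself isn't shipped with these files, but as a rough sketch of what top-K routing over 4 experts can look like in PyTorch (the feed-forward width `d_ff=1024` and the exact layer structure are assumptions, not the code behind this checkpoint):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k routed MoE feed-forward block (not this repo's actual code)."""

    def __init__(self, d_model=256, num_experts=4, top_k=2, d_ff=1024):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # one routing score per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        logits = self.router(x)                             # (batch, seq_len, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # keep only the k best experts per token
        weights = F.softmax(weights, dim=-1)                # normalise the kept scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e              # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


# Example: route a dummy batch through the layer
layer = TopKMoE()
tokens = torch.randn(2, 8, 256)
print(layer(tokens).shape)  # torch.Size([2, 8, 256])
```

With top-2 routing, each token is processed by only 2 of the 4 experts, so adding experts grows capacity without a proportional increase in per-token compute.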
## Files Included

- `checkpoint_epoch_1.pt`: Model checkpoint after epoch 1
- `latest_checkpoint.pt`: Latest model checkpoint
- `training_history.json`: Training metrics and history
- `trial_results.json`: Complete trial run results
## Usage

```python
import torch

# Load the checkpoint on CPU (no GPU needed just to inspect it)
checkpoint = torch.load('checkpoint_epoch_1.pt', map_location='cpu')
model_state = checkpoint['model_state_dict']
model_config = checkpoint['model_config']

# You'll need the MoE implementation to load the model;
# see the original repository for the complete code.
```
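Once the MoE implementation is available, restoring the model follows the usual PyTorch pattern. The class name `MoESummarizer` below is a placeholder for whatever the original repository actually defines:

```python
# Hypothetical class name: substitute the real MoE model class from the original repo
model = MoESummarizer(**model_config)  # rebuild the architecture from the saved config
model.load_state_dict(model_state)     # restore the trained weights
model.eval()                           # switch to inference mode
```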
## Training Results

This model was trained for demonstration purposes on only 10 samples from the XSum dataset. For production use, train on the full dataset.
## Citation

```bibtex
@misc{moe-xsum-trial,
  title={MoE Text Summarization Trial Model},
  author={vivekdhayaal},
  year={2024},
  url={https://huggingface.co/vivekdhayaal/moe-xsum-trial}
}
```
checkpoint_epoch_1.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:314eabe1c53f812cd679cab32fd498f4832e8853e34fde2873b7f613c2c8a14c
size 53576431
latest_checkpoint.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2348e87476215eb00541003e9357f706c6bc8d1cad19b796f5d5d5e46b3c0a4e
size 53576422
training_history.json ADDED
@@ -0,0 +1,8 @@
[
  {
    "epoch": 1,
    "train_loss": 10.6256,
    "train_aux_loss": 0.0,
    "timestamp": "2025-11-14T15:23:56.631190"
  }
]
trial_results.json ADDED
@@ -0,0 +1,24 @@
{
  "model_info": {
    "repo_name": "moe-xsum-trial",
    "vocab_size": 50265,
    "d_model": 256,
    "num_experts": 4,
    "top_k": 2,
    "routing_type": "topk",
    "epochs": 1,
    "final_loss": 10.6256,
    "training_samples": 10
  },
  "training_history": [
    {
      "epoch": 1,
      "train_loss": 10.6256,
      "train_aux_loss": 0.0,
      "timestamp": "2025-11-14T15:23:56.631190"
    }
  ],
  "repo_url": null,
  "checkpoint_dir": "trial_checkpoints",
  "timestamp": "2025-11-14T15:23:56.631454"
}