nielsr HF Staff committed
Commit e35a935 · verified · 1 Parent(s): e11ff4e

Improve model card with pipeline tag, library name, GitHub link, and additional sections


This PR enhances the model card for dParallel-LLaDA-8B-instruct by:
- Adding `pipeline_tag: text-generation` to ensure the model is discoverable in the text generation category.
- Adding `library_name: transformers` to enable the automated "how to use" widget, as the model is compatible with the 🤗 Transformers library.
- Including a direct link to the GitHub repository in the introductory badges for easier access to the codebase.
- Integrating several useful sections (Updates, Installation, Evaluation, Training, and Acknowledgement) from the GitHub README to provide more complete information for users.

These changes will significantly improve the model's visibility and usability on the Hugging Face Hub.
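With `library_name: transformers` set, the Hub can surface a loading snippet for this checkpoint. Below is a minimal loading sketch only; the repo id, dtype, and device choices are assumptions, and the model card's own Quick Start section remains the authoritative usage example for the actual dParallel sampling call.

```python
# Loading sketch only -- repo id, dtype, and device are assumptions; follow the
# card's Quick Start for the dParallel sampling call itself.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Zigeng/dParallel-LLaDA-8B-instruct"  # assumed Hub repo id for this card
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,          # LLaDA-style checkpoints ship custom modeling code
    torch_dtype=torch.bfloat16,
).to("cuda").eval()
```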

Files changed (1)
  1. README.md +44 -11
README.md CHANGED
 
@@ -1,9 +1,9 @@
---
license: mit
+ pipeline_tag: text-generation
+ library_name: transformers
---

-
-
<div align="center">
<h1>🚀 dParallel: Learnable Parallel Decoding for dLLMs</h1>
<div align="center">
 
@@ -19,16 +19,21 @@ license: mit
  <a href="https://huggingface.co/datasets/Zigeng/dParallel_LLaDA_Distill_Data">
    <img src="https://img.shields.io/badge/HuggingFace-Data-FFB000.svg" alt="Project">
  </a>
+ <a href="https://github.com/czg1225/dParallel">
+   <img src="https://img.shields.io/badge/GitHub-Code-blue.svg?logo=github&" alt="GitHub">
+ </a>
</div>
</div>

+ https://github.com/user-attachments/assets/89d81255-9cd8-46d1-886e-0733938e5328
+
> **dParallel: Learnable Parallel Decoding for dLLMs**
> [Zigeng Chen](https://github.com/czg1225), [Gongfan Fang](https://fangggf.github.io/), [Xinyin Ma](https://horseee.github.io/), [Ruonan Yu](https://scholar.google.com/citations?user=UHP95egAAAAJ&hl=en), [Xinchao Wang](https://sites.google.com/site/sitexinchaowang/)
> [xML Lab](https://sites.google.com/view/xml-nus), National University of Singapore


## 💡 Introduction
We introduce dParallel, a simple and effective method that unlocks the inherent parallelism of dLLMs for fast sampling. We identify that the key bottleneck to parallel decoding arises from the sequential certainty convergence for masked tokens. Building on this insight, we introduce the core of our approach: certainty-forcing distillation, a novel training strategy that distills the model to follow its original sampling trajectories while enforcing it to achieve high certainty on masked tokens more rapidly and in parallel. Extensive experiments across various benchmarks demonstrate that our method can dramatically reduce the number of decoding steps while maintaining performance. When applied to the LLaDA-8B-Instruct model, dParallel reduces decoding steps from 256 to 30 on GSM8K, achieving an 8.5× speedup without performance degradation. On the MBPP benchmark, it cuts decoding steps from 256 to 24, resulting in a 10.5× speedup while maintaining accuracy.

<!-- ![figure](assets/intro.png) -->
<div align="center">
 
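The card describes certainty-forcing distillation only in prose: the student is distilled on tokens from its own original sampling trajectory while being pushed toward high certainty at masked positions. The sketch below is an illustrative rendering of that idea, not the paper's objective; the function name, the entropy penalty, and the weight `lam` are all assumptions.

```python
# Illustrative certainty-forcing-style loss (NOT the paper's exact objective).
# `logits` are student predictions, `trajectory_targets` are tokens recorded from
# the model's own original sampling trajectory, `mask` marks masked positions.
import torch
import torch.nn.functional as F


def certainty_forcing_loss(logits: torch.Tensor,
                           trajectory_targets: torch.Tensor,
                           mask: torch.Tensor,
                           lam: float = 0.1) -> torch.Tensor:
    """logits: [B, L, V]; trajectory_targets: [B, L]; mask: [B, L] bool."""
    flat_logits = logits[mask]            # predictions at masked positions, [N, V]
    flat_targets = trajectory_targets[mask]

    # 1) Distillation term: follow the tokens decoded on the original trajectory.
    ce = F.cross_entropy(flat_logits, flat_targets)

    # 2) Certainty-forcing term: push masked-token distributions toward high
    #    confidence so many tokens can be committed in parallel per step.
    probs = flat_logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1).mean()

    return ce + lam * entropy


if __name__ == "__main__":
    # Tiny smoke test with random tensors.
    B, L, V = 2, 8, 32
    logits = torch.randn(B, L, V, requires_grad=True)
    targets = torch.randint(0, V, (B, L))
    mask = torch.rand(B, L) < 0.5
    loss = certainty_forcing_loss(logits, targets, mask)
    loss.backward()
    print("loss:", float(loss))
```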
@@ -64,6 +69,17 @@ dParallel-LLaDA-Distill Dataset</a></td>
</tbody>
</table>

+ ## 🔥 Updates
+ * 🔥 **[Oct 1, 2025]**: Our arXiv paper is available.
+ * 🔥 **[Oct 1, 2025]**: Code, model, and dataset are released.
+
+ ## 🔧 Installation:
+
+ ```bash
+ conda create -n dparallel python==3.10
+ conda activate dparallel
+ pip3 install -r requirements.txt
+ ```

## 🚀 Quick Start:
```python
 
@@ -89,6 +105,27 @@ print("Response:",tokenizer.batch_decode(out[0][:, input_ids.shape[1]:], skip_sp
print("NFE:",out[1])
```

+ ## ⚡ Evaluation:
+ We provide evaluation scripts covering the GSM8K, Minerva_MATH, HumanEval, and MBPP benchmarks. Importantly, both our reported results and the accompanying code are obtained without caching or sparse attention techniques. Nevertheless, our method is fully compatible with these optimizations, and integrating them can yield even greater speedups.
+ ```bash
+ sh eval.sh
+ ```
+
+ ## 🔥 Training
+ ### 1. Certainty-Forcing Distillation with LoRA:
+ We provide training scripts for our proposed certainty-forcing distillation process. The implementation uses LoRA during training, with the configuration details specified in [config_lora_llada.yaml](https://github.com/czg1225/dParallel/blob/master/configs/config_lora_llada.yaml). Training can be completed on GPUs with 24 GB of memory.
+ ```bash
+ deepspeed --master_port 29501 --include localhost:0,1,2,3,4,5,6,7 llada_train.py
+ ```
+
+ ### 2. LoRA Merge:
+ After training, merge the LoRA weights to obtain the dParallel-dLLM.
+ ```bash
+ python merge_lora.py
+ ```
+
+
+
## 📖 Experimental Results
### Results on LLaDA-8B-Instruct:
![llada-exp](assets/llada_exp.png)
 
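The LoRA-merge step above points at the repo's `merge_lora.py`. A minimal PEFT-based sketch of what such a merge typically looks like is given here; the base repo id, paths, and the use of PEFT are assumptions, and the script in the GitHub repository is authoritative.

```python
# Hypothetical stand-in for merge_lora.py: fold LoRA adapters into the base
# weights and save a standalone checkpoint. Ids and paths below are placeholders.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

base_id = "GSAI-ML/LLaDA-8B-Instruct"           # base dLLM (assumed)
adapter_dir = "outputs/certainty_forcing_lora"  # LoRA weights from llada_train.py (assumed path)
save_dir = "dParallel-LLaDA-8B-instruct-merged"

base = AutoModel.from_pretrained(base_id, trust_remote_code=True, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_dir)  # attach the trained LoRA adapter
model = model.merge_and_unload()                      # bake LoRA deltas into the base weights

model.save_pretrained(save_dir)
AutoTokenizer.from_pretrained(base_id, trust_remote_code=True).save_pretrained(save_dir)
```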
@@ -99,6 +136,9 @@ print("NFE:",out[1])
### Better Speed-Accuracy Trade-off:
![trade-off](assets/trade-off.png)

+ ## ☀️ Acknowledgement
+ Our code builds on [LLaDA](https://github.com/ML-GSAI/LLaDA), [Dream](https://github.com/DreamLM/Dream), [Fast-dLLM](https://github.com/NVlabs/Fast-dLLM/tree/main), and [dKV-Cache](https://github.com/horseee/dkv-cache), and we thank these great works for laying the groundwork that made our approach possible.
+
## Citation
If our research assists your work, please give us a star ⭐ or cite us using:
```
 
@@ -111,11 +151,4 @@ If our research assists your work, please give us a star ⭐ or cite us using:
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.26488},
}
```
-
-
-
-
-
-
-