nimafathi committed (verified)
Commit 19a2ef5 · Parent(s): 5414c4a

Update README.md

Files changed (1):
  1. README.md +117 -112
README.md CHANGED
@@ -2,139 +2,144 @@
  language:
  - en
  tags:
- - diffusion-language-model
  - dllm
  - text-generation
  - diffusion
  - language-model
- license: mit
  ---

- # hdlm-group/hdlm-base-gamma-0.05
-
- This is a gamma_hybrid diffusion language model trained on text data.
-
- ## Model Details
-
- - **Model Type**: gamma_hybrid
- - **Architecture**: Diffusion-based language model
- - **Training Method**: Gamma-hybrid diffusion training
-
- ## Configuration
-
- ```yaml
- ngpus: 4
- gradient_accumulation_steps: 8
- pretrain_autoregressive_path: /home/toolkit/research-diffcodegen/exp_local/openwebtext/mdlm-autoregressive/org-DiTAR-absorb-v2/checkpoints-meta/checkpoint.pth
- tokenizer:
- tokens: 50257
- model: gpt2
- training:
- batch_size: 512
- accum: ${gradient_accumulation_steps}
- n_iters: 1000000
- snapshot_freq: 500
- log_freq: 100
- eval_freq: 500
- snapshot_freq_for_preemption: 3000
- weight: standard
- snapshot_sampling: true
- ema: 0.9999
- warmup_iter: -1
- data:
- train: openwebtext-train
- valid: wikitext103
- cache_dir: /home/toolkit/research-diffcodegen/data
- debug: false
- graph:
- type: QGamma
- gamma: 0.05
- file: /home/toolkit/research-diffcodegen/data
- report_all: false
- expanded_sigma: true
- noise:
- type: loglinear
- sigma_min: 0.0001
- sigma_max: 2.0
- ar_diffusion: false
- expanded_sigma: ${graph.expanded_sigma}
- sampling:
- predictor: analytic
- steps_per_level: 1
- noise_removal: true
- strategy: direct
- strategy_param: 0.9
- annealing:
- type: block
- efficient: false
- width: 1024
- tau: 2048
- eval_tau: 256
- steps_per_level: ${sampling.steps_per_level}
- sampling_method: SAR
- diffusion_loss_weight: 1.0
- ce_loss_weight: 4.0
- sampling_eps: 0.0001
- attention:
- context_type: block_causal
- block_type: full
- match_inference: true
- eval:
- batch_size: 32
- perplexity: true
- perplexity_batch_size: 16
- optim:
- weight_decay: 0.0
- optimizer: AdamW
- lr: 0.0003
- beta1: 0.9
- beta2: 0.999
- eps: 1.0e-08
- warmup: 10000
- grad_clip: 1.0
- scheduler: lambda
- experiment:
- name: QGamma0.05-v2
- wandb_project: debug-QGamma
- model:
- name: gamma_hdlm
- type: ddit
- hidden_size: 768
- cond_dim: 128
- length: 1024
- n_blocks: 12
- n_heads: 12
- scale_by_sigma: false
- dropout: 0.1
- transformer_sigma_conditioning: true
- hybrid_sigma_embedding: true
- post_process_logits: true
- use_timestep_embedding: true
- model_type: gamma_hybrid

- ```

  ## Usage

  ```python
- from our.hf_utils import smart_model_loader

- # Load the model
- model, config, device, accelerator, metaschedule = smart_model_loader(
-     "hdlm-group/hdlm-base-gamma-0.05",
-     model_type="gamma_hybrid"
  )

  ```

  ## Training Details

- please refer to the official GitHub Repository: https://github.com/ServiceNow/hdlm

  ## Citation

- If you use this model in your research, please cite the original paper and this implementation.

  ## License

- This model is released under the Apache License Version 2.0.
 
  language:
  - en
  tags:
  - dllm
+ - diffusion-language-model
  - text-generation
  - diffusion
  - language-model
+ license: apache-2.0
  ---

+ # HDLM-Gamma: Hybrid Diffusion Language Model

+ [![Paper](https://img.shields.io/badge/Paper-arXiv-red)](https://arxiv.org/abs/2504.06416)
+ [![Code](https://img.shields.io/badge/Code-GitHub-blue)](https://github.com/ServiceNow/hdlm)
+
+ This is the model card for **hdlm-group/hdlm-base-gamma-0.05**.
+
+ ## Model Description
+
+ HDLM-Gamma is a hybrid diffusion language model that unifies autoregressive and diffusion-based sequence generation through gamma-hybrid noising. The model interpolates its transition operators between absorbing and uniform processes, making it conceptually closer to SEDD (Lou et al., 2024) while maintaining the benefits of both paradigms.
+
+ The gamma parameter (γ) controls the blend between the absorbing and uniform transition matrices, Q_gamma = (1 - γ) * Q_absorb + γ * Q_uniform: smaller values of γ emphasize the absorbing process, while larger values incorporate more uniform transitions.
+
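+ To make the blend concrete, here is a minimal toy sketch of the formula above. It treats Q_absorb and Q_uniform as one-step transition probability matrices over a tiny vocabulary, with the absorbing (mask) token at the last index; these conventions are illustrative assumptions, not the repository's actual implementation, which works with continuous-time transition operators.
+
+ ```python
+ import torch
+
+ # Toy vocabulary: 4 regular tokens plus one absorbing (mask) token at the last index (assumed convention).
+ V = 5
+ mask = V - 1
+ gamma = 0.05
+
+ # Absorbing kernel: every token moves all of its probability mass onto the mask token.
+ Q_absorb = torch.zeros(V, V)
+ Q_absorb[:, mask] = 1.0
+
+ # Uniform kernel: probability mass spread evenly over the whole vocabulary.
+ Q_uniform = torch.full((V, V), 1.0 / V)
+
+ # Hybrid blend from the model card: Q_gamma = (1 - γ) * Q_absorb + γ * Q_uniform.
+ Q_gamma = (1 - gamma) * Q_absorb + gamma * Q_uniform
+
+ print(Q_gamma[0])            # most mass on the mask column, a little spread uniformly
+ print(Q_gamma.sum(dim=-1))   # rows still sum to 1, so the blend remains a valid transition kernel
+ ```
+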
+ ## Model Architecture
+
+ - **Base Model**: Transformer architecture with staggered score conditioning
+ - **Vocabulary Size**: 50,258 tokens (GPT-2 vocabulary + absorbing token)
+ - **Context Length**: Variable (supports up to 2048 tokens)
+ - **Training**: Continuous-time diffusion with gamma-hybrid graph structure
+ - **Inference**: Analytic predictor with staggered score computation

  ## Usage

+ ### Quick Start
+
  ```python
+ from hdlm.hf_utils import smart_model_loader
+ from hdlm.gamma_hybrid.sampling import get_sa_sampling_fn
+ from transformers import GPT2TokenizerFast
+ import torch
+
+ # Load model using smart loader (automatically detects model type)
+ model, cfg, device, accelerator, metaschedule = smart_model_loader(
+     model_path="hdlm-group/hdlm-base-gamma-0.05",
+     model_type="auto",  # automatically detects gamma_hybrid
+     device="cuda"
+ )
+
+ # Load tokenizer
+ tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
+
+ # Generate text
+ prompt = "The future of artificial intelligence"
+ prompt_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)
+
+ # Configure sampling function (automatically set up from config)
+ sampling_fn = get_sa_sampling_fn(
+     config=cfg,
+     graph=None,   # Will be created from config
+     noise=None,   # Will be created from config
+     meta_schedule=metaschedule,
+     batch_dims=(1,),
+     eps=1e-4,
+     device=device
+ )

+ # Generate samples
+ generated = sampling_fn(
+     model=model,
+     prompt=prompt_ids,
+     context_length=1024
  )

+ # Decode generated text
+ generated_text = tokenizer.decode(generated[0], skip_special_tokens=True)
+ print(generated_text)
+ ```
+
+ ### Evaluation
+
+ ```bash
+ # Text generation evaluation
+ python hdlm/eval_generation.py \
+     --checkpoint_path hdlm-group/hdlm-base-gamma-0.05 \
+     --sampling_method SAR \
+     --save_samples
+
+ # Perplexity evaluation
+ python hdlm/eval_modeling.py \
+     --checkpoint_path hdlm-group/hdlm-base-gamma-0.05 \
+     --work_dir "./logs/eval_modeling_gamma" \
+     --dataset ptb
  ```

  ## Training Details

+ - **Dataset**: OpenWebText
+ - **Batch Size**: 512
+ - **Learning Rate**: 3e-4 with lambda scheduling
+ - **Gamma (γ)**: 0.05 (controls the hybrid transition blend)
+ - **Graph Type**: QGamma with expanded sigma conditioning
+ - **Noise Schedule**: Log-linear (σ_min=1e-4, σ_max=2.0)
+ - **Training Steps**: 1M iterations
+ - **Warmup**: 10K steps
+
+ ## Key Components
+
+ ### Graph Structure
+ The QGamma graph combines absorbing and uniform transition matrices (a toy simulation follows the list):
+ - **Absorbing component**: Transitions to the absorbing state (mask token)
+ - **Uniform component**: Uniform transitions between all tokens
+ - **Hybrid blend**: Controlled by the gamma parameter
+
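+ As a quick illustration of how these two components show up in corrupted data, the sketch below applies the toy one-step kernel from the earlier snippet to a clean sequence: most positions jump to the mask token (absorbing component) while a few are resampled uniformly (uniform component). The toy kernel and its conventions are assumptions for illustration, not the repository's continuous-time implementation.
+
+ ```python
+ import torch
+
+ # Rebuild the toy hybrid kernel (same assumed conventions as the earlier sketch).
+ V, mask, gamma = 5, 4, 0.05
+ Q_absorb = torch.zeros(V, V)
+ Q_absorb[:, mask] = 1.0
+ Q_uniform = torch.full((V, V), 1.0 / V)
+ Q_gamma = (1 - gamma) * Q_absorb + gamma * Q_uniform
+
+ # Corrupt a clean toy sequence with one application of the kernel.
+ x = torch.tensor([0, 1, 2, 3, 0, 1, 2, 3])
+ x_noisy = torch.multinomial(Q_gamma[x], num_samples=1).squeeze(-1)
+ print(x_noisy)  # mostly the mask index (4), occasionally a uniformly resampled token
+ ```
+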
+ ### Staggered Score
+ The model uses a staggered score computation that applies different transformations to the absorbing and uniform branches before combining them, enabling more flexible generation patterns.
+
+ ### Sampling Strategy
+ - **Predictor**: Analytic predictor with exact transition computation
+ - **Strategy**: Direct sampling with a configurable strategy parameter
+ - **Noise Removal**: Optional final denoising step
+
+ ## Model Variants
+
+ Available gamma values and their characteristics (the short sketch after this list makes the split concrete):
+
+ - **γ = 0.01**: Minimal uniform transitions, closest to a pure absorbing process
+ - **γ = 0.05**: This checkpoint; predominantly absorbing with a small uniform component
+ - **γ = 0.1**: Moderate hybrid behavior with increased uniform mixing
+ - **γ = 0.5**: Balanced absorbing-uniform transition blend
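+
+ Under the same toy one-step reading of Q_gamma used above (an illustrative assumption, not the continuous-time schedule used in training), the variants differ in how a corrupted position splits between being masked and being uniformly resampled:
+
+ ```python
+ V = 50258  # GPT-2 vocabulary + absorbing token, per the model card
+
+ for gamma in (0.01, 0.05, 0.1, 0.5):
+     p_mask = (1 - gamma) + gamma / V   # corrupted position lands on the absorbing (mask) token
+     p_other = gamma * (V - 1) / V      # corrupted position is uniformly resampled to a non-mask token
+     print(f"gamma={gamma:>4}: mask {p_mask:.3f}, uniform {p_other:.3f}")
+ ```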
 
  ## Citation

+ ```bibtex
+ @article{fathi2025unifying,
+   title={Unifying autoregressive and diffusion-based sequence generation},
+   author={Fathi, Nima and Scholak, Torsten and No{\"e}l, Pierre-Andr{\'e}},
+   journal={arXiv preprint arXiv:2504.06416},
+   year={2025}
+ }
+ ```

  ## License

+ This model is released under the Apache License 2.0, the same license as the original HDLM codebase. Please refer to the [GitHub repository](https://github.com/ServiceNow/hdlm) for license details.