---
license: mit
---
# 🌟 Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation
# NeurIPS 2025 (Rating: 4445)
> [Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation](https://arxiv.org/pdf/2505.14705).<br>
> [Xin Zhang](https://zhangxin-xd.github.io/), Ziruo Zhang, [Jiawei Du](https://scholar.google.com/citations?user=WrJKEzEAAAAJ&hl=zh-CN), [Zuozhu Liu](https://person.zju.edu.cn/en/lzz), [Joey Tianyi Zhou](https://joeyzhouty.github.io/) <br>
> Agency for Science, Technology and Research (A*STAR), Singapore <br>
> National University of Singapore, Singapore <br>
> Zhejiang University, China
## 📖 Introduction
<p align="center">
  <img src="imgs/problem.png" alt="problem" title="problem" width="700">
</p>

<p align="justify">
<strong>Multimodal embedding distributions across distillation methods</strong>:
We extract image and text embeddings from a fine-tuned CLIP model and project them into a shared representation space using DOSNES.
Red triangles and blue circles denote image and text embeddings, respectively.
Left: Embeddings of randomly sampled data from the original dataset exhibit a well-spread, modality-aligned distribution.
Middle: The distilled dataset generated by a state-of-the-art multimodal dataset distillation (MDD) method, LoRS, suffers from modality collapse: image and text embeddings are poorly aligned and concentrated in distinct regions.
Right: Our method effectively mitigates modality collapse, yielding a distribution that better preserves cross-modal alignment and exhibits greater representational diversity.
</p>

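For readers who want to reproduce this kind of visualization, the sketch below extracts paired CLIP embeddings and projects them into a shared 2-D space. The checkpoint name, the image files, and the use of t-SNE in place of DOSNES are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch: extract CLIP image/text embeddings and project them to 2-D.
# The checkpoint name, file names, and t-SNE (standing in for DOSNES) are
# assumptions for illustration only.
import torch
from PIL import Image
from sklearn.manifold import TSNE
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open(p) for p in ["cat.jpg", "dog.jpg"]]  # hypothetical files
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])

# L2-normalize so both modalities live on the same unit sphere, then embed
# images and texts jointly so cross-modal alignment is visible in the plot.
emb = torch.cat([img, txt])
emb = (emb / emb.norm(dim=-1, keepdim=True)).numpy()
coords = TSNE(n_components=2, perplexity=2.0).fit_transform(emb)  # rows 0-1: images, 2-3: texts
```
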
## ⚙️ Installation

To get started, follow these instructions to set up the environment and install dependencies.

1. **Clone this repository**:
   ```bash
   git clone https://github.com/zhangxin-xd/RepBlend.git
   cd RepBlend
   ```

2. **Install required packages**:
   ```bash
   conda create -n RepBlend python=3.10
   conda activate RepBlend
   pip install -r requirements.txt
   ```
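
After installing, a quick check that the core dependency is importable and sees your GPU; this assumes PyTorch is among the packages in `requirements.txt`, which we have not verified here.

```python
# Quick environment sanity check; assumes PyTorch is among the requirements.
import torch

print(torch.__version__, "CUDA available:", torch.cuda.is_available())
```
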
---

## 🚀 Usage

Here’s how to use RepBlend for multimodal dataset distillation:

### Pretrained Weights
The checkpoints for all networks used in our experiments are available from their respective official repositories. For convenience, we have also collected them in one place [here](https://huggingface.co/xinxin66/RepBlend).
Once downloaded, place them in `distill_utils/checkpoints/`.

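If you prefer scripting the download, something like the following works with `huggingface_hub`; the assumption that the Hub repo's file layout maps directly onto `distill_utils/checkpoints/` is ours, so adjust paths if needed.

```python
# Fetch the released files from the Hugging Face Hub into the expected folder.
# Assumes the repo layout matches distill_utils/checkpoints/; adjust if not.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="xinxin66/RepBlend", local_dir="distill_utils/checkpoints")
```
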
### Experimental Datasets
Our method has been validated on several benchmark datasets; you can download them from the links below. Once downloaded, place them in `distill_utils/data/`.

| Dataset | Links |
|-----|-----|
| Flickr30K | [images](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset), [annotations](https://huggingface.co/xinxin66/RepBlend/) |
| COCO | [images](https://cocodataset.org/#download), [annotations](https://huggingface.co/xinxin66/RepBlend) |
| LLaVA-cc3m | [images](https://github.com/haotian-liu/LLaVA/blob/main/docs/Data.md), [annotations](https://huggingface.co/xinxin66/RepBlend) |

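As a convenience, here is a small check that the data landed where the scripts expect it; the subdirectory names below are our guesses, so rename them to match your local layout.

```python
# Sanity-check the expected data root; the folder names are assumptions.
from pathlib import Path

data_root = Path("distill_utils/data")
for name in ["Flickr30K", "COCO", "LLaVA-cc3m"]:  # hypothetical folder names
    print(f"{name}: {'ok' if (data_root / name).exists() else 'missing'}")
```
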
### Generate Expert Trajectories
You can generate expert trajectories by running `scripts/buffer.sh`, or download our [pre-generated trajectories](https://huggingface.co/xinxin66/RepBlend) for faster reproduction.
```bash
bash scripts/buffer.sh
```
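
For intuition: in trajectory-matching pipelines such as TESLA (referenced below), the buffer stores per-epoch parameter snapshots of teacher networks trained on the real data. The sketch below is a generic schematic of that idea with placeholder names (`model`, `loader`, a classification loss), not the actual logic of `scripts/buffer.sh`.

```python
# Generic schematic of expert-trajectory buffering (cf. TESLA/MTT-style
# methods); model, loader, loss, and hyperparameters are placeholders and
# may differ from what scripts/buffer.sh actually does.
import copy
import torch
import torch.nn.functional as F

def record_trajectory(model, loader, epochs=10, lr=0.01):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    trajectory = [copy.deepcopy(model.state_dict())]  # snapshot at initialization
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
        trajectory.append(copy.deepcopy(model.state_dict()))  # per-epoch snapshot
    return trajectory

# torch.save(record_trajectory(model, loader), "buffers/expert_0.pt")
```
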
### Distill Multimodal Dataset
You can distill multimodal datasets with RepBlend by running `scripts/distill_coco_repblend.sh` and `scripts/distill_flickr_repblend.sh`.
```bash
bash scripts/distill_coco_repblend.sh
bash scripts/distill_flickr_repblend.sh
```
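
For intuition about what these scripts optimize: in this family of methods, a student is trained for a few steps on the synthetic data, and its endpoint parameters are pulled toward a later point on an expert trajectory. Below is a hedged sketch of the standard normalized matching loss (as in MTT/TESLA); RepBlend's representation-blending objective builds on top of this and is not shown.

```python
import torch

def trajectory_matching_loss(student, expert_start, expert_target):
    """Normalized parameter-matching loss from MTT-style distillation.

    All arguments are lists of parameter tensors. This is the generic
    objective, not RepBlend's full loss.
    """
    num = sum(((s - t) ** 2).sum() for s, t in zip(student, expert_target))
    den = sum(((a - t) ** 2).sum() for a, t in zip(expert_start, expert_target))
    return num / (den + 1e-12)
```
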

## 📊 Results

Our experiments demonstrate the effectiveness of the proposed approach across various benchmarks.
<div style="display: flex; justify-content: center; align-items: center;">
  <img src="imgs/results 1.png" alt="Results 1" width="800"/>
</div>
<br>
<div style="display: flex; justify-content: center; align-items: center;">
  <img src="imgs/table 1.png" alt="table 1" width="400"/>
  <img src="imgs/table 2.png" alt="table 2" width="400"/>
</div>

For detailed experimental results and further analysis, please refer to the full paper.

---

## 📑 Citation

If you find this code useful in your research, please consider citing our work:

```bibtex
@inproceedings{RepBlend2025neurips,
  title={Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation},
  author={Zhang, Xin and Zhang, Ziruo and Du, Jiawei and Liu, Zuozhu and Zhou, Joey Tianyi},
  booktitle={Adv. Neural Inf. Process. Syst. (NeurIPS)},
  year={2025}
}
```
---
## 🎉 Reference
Our code builds on the following prior works:
- [LoRS: Low-Rank Similarity Mining](https://github.com/silicx/LoRS_Distill)
- [Vision-Language Dataset Distillation](https://github.com/princetonvisualai/multimodal_dataset_distillation)
- [Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory (TESLA)](https://github.com/justincui03/tesla)