|
|
--- |
|
|
license: mit |
|
|
tags: |
|
|
- diffusion |
|
|
- gan |
|
|
- hybrid |
|
|
- fashion |
|
|
- multimodal |
|
|
- controlnet |
|
|
- pose-guided |
|
|
- pytorch |
|
|
library_name: pytorch_lightning |
|
|
pipeline_tag: text-to-image |
|
|
language: |
|
|
- en |
|
|
spaces: |
|
|
- NguyenDinhHieu/EquiFashion |
|
|
datasets: |
|
|
- NguyenDinhHieu/EquiFashion-DB |
|
|
--- |
|
|
|
|
|
# 👗 EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation
|
|
|
|
|
**Authors:** |
|
|
Nguyen Dinh Hieu [0009-0002-6683-8036], et al.
|
|
**Institution:** FPT University, Hanoi, Vietnam |
|
|
📧 [email protected] |
|
|
|
|
|
## 🧩 Overview |
|
|
|
|
|
**EquiFashion** is a hybrid *GAN–Diffusion* framework that reconciles the long-standing trade-off between **stylistic diversity** and **photorealistic fidelity** in generative fashion design. |
|
|
It integrates a GAN-based ideation branch for creative exploration and a diffusion-based refinement branch for faithful reconstruction, enabling high-quality, diverse, and robust fashion image generation. |
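
As a rough mental model, the two branches can be composed as in the sketch below. This is our own illustration under stated assumptions, not the released code: the `denoise_step` API, latent shapes, and blending rule are placeholders, while `T` and `gamma` correspond to the timestep and fusion-decay settings listed under Training Configuration.

```python
import torch

def hybrid_generate(gan_G, denoise_step, text_emb, pose_map, T=8, gamma=0.7):
    """Conceptual sketch (assumed, not the released code): the GAN branch
    proposes a stylistic latent, and its influence on the evolving diffusion
    latent decays geometrically by gamma at each refinement step."""
    z_style = gan_G(torch.randn(1, 4, 64, 64), text_emb)  # ideation proposal
    x = torch.randn_like(z_style)                         # refinement starts from noise
    for k, t in enumerate(reversed(range(T))):
        x = denoise_step(x, t, text_emb, pose_map)        # one latent denoising step
        w = gamma ** (k + 1)                              # decaying GAN influence
        x = w * z_style + (1.0 - w) * x
    return x  # decode to pixels with the latent-diffusion VAE afterwards
```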
|
|
|
|
|
> 🎨 Try the live demo here: |
|
|
> 👉 [EquiFashion Demo on Hugging Face Spaces](https://huggingface.co/spaces/NguyenDinhHieu/EquiFashion) |
|
|
|
|
|
## 🎯 Motivation |
|
|
|
|
|
Fashion design requires models that are simultaneously **creative**, **robust**, and **trustworthy**. |
|
|
GANs generate diverse styles but lack stability, while diffusion models produce realistic images but constrain creativity. **EquiFashion** bridges both worlds, achieving controlled diversity, semantic alignment, and realistic garment rendering.
|
|
|
|
|
## 🧱 Architecture Overview |
|
|
|
|
|
| Component | Description | |
|
|
|------------|-------------| |
|
|
| **Latent Diffusion Backbone** | Operates in latent space for efficient denoising with high-resolution reconstruction. | |
|
|
| **GAN Ideation Module** | Explores stylistic variations through stochastic latent sampling. | |
|
|
| **Structural Semantic Consensus** | Ensures linguistic–visual correspondence between attributes and garment parts. | |
|
|
| **Semantic-Bundled Attention** | Couples adjective–noun pairs (e.g., “red collar”) for coherent attribute localization; see the sketch below. |
|
|
| **Pose-Guided Conditioning** | Aligns garments naturally to human body structure using OpenPose keypoints. | |
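
To make the Semantic-Bundled Attention row concrete, here is a minimal self-contained sketch of the idea: the tokens of an adjective–noun pair share one pooled key/value, so the attribute and its garment part are attended to as a single unit. The function name, shapes, and mean-pooling choice are our illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def bundled_cross_attention(img_q, txt_kv, bundles):
    """Illustrative sketch: text tokens in the same adjective-noun bundle,
    e.g. ("red", "collar"), are mean-pooled into one key/value so image
    queries attend to the pair as a single unit.

    img_q:   (N, d) image-patch queries
    txt_kv:  (M, d) text-token keys/values
    bundles: list of index lists grouping text tokens into bundles
    """
    d = img_q.shape[-1]
    kv = torch.stack([txt_kv[idx].mean(dim=0) for idx in bundles])  # (B, d)
    attn = F.softmax(img_q @ kv.T / d ** 0.5, dim=-1)               # (N, B)
    return attn @ kv                                                # (N, d) bundled context
```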
|
|
|
|
|
## 📂 Dataset Access: **EquiFashion-DB**
|
|
|
|
|
The dataset used for training and evaluation is available on Hugging Face: |
|
|
|
|
|
**➡️ [NguyenDinhHieu/EquiFashion-DB](https://huggingface.co/datasets/NguyenDinhHieu/EquiFashion-DB)** |
|
|
|
|
|
| Property | Description | |
|
|
|-----------|--------------| |
|
|
| Scale | 350K images |
|
|
| Resolution | 512×512 | |
|
|
| Modalities | Image, Text, Sketch, Pose, Fabric | |
|
|
| Coverage | 40+ apparel categories | |
|
|
| Key Feature | Noise-aware text, balanced demographics | |
|
|
| Purpose | Training + robust benchmarking for generative fashion | |
|
|
|
|
|
You can load it directly using the `datasets` library: |
|
|
|
|
|
```python |
|
|
from datasets import load_dataset |
|
|
|
|
|
dataset = load_dataset("NguyenDinhHieu/EquiFashion-DB") |
|
|
print(dataset) |
|
|
``` |
|
|
|
|
|
## 🧮 Training Configuration |
|
|
|
|
|
| Setting | Value | |
|
|
|----------|-------| |
|
|
| Framework | PyTorch Lightning 2.2 | |
|
|
| GPU | NVIDIA A100 (40 GB, CUDA 12.2) | |
|
|
| Optimizer | AdamW | |
|
|
| Learning Rate | 2e-4 (G), 1e-4 (D) | |
|
|
| Scheduler | Cosine Decay | |
|
|
| Epochs | 400 (200 pretrain + 200 joint) | |
|
|
| Precision | FP16 | |
|
|
| Batch Size | 32 | |
|
|
| Timesteps (T) | 8 | |
|
|
| Fusion Decay (γ) | 0.7 | |
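
The table maps onto a fairly standard PyTorch Lightning setup. The sketch below shows one way to wire the listed optimizer settings; the module and attribute names (`EquiFashionModule`, `self.G`, `self.D`) and `max_steps` are placeholders rather than the project's actual code.

```python
import torch
import pytorch_lightning as pl

class EquiFashionModule(pl.LightningModule):
    """Placeholder module illustrating the optimizer settings from the table."""

    def __init__(self, generator, discriminator, max_steps=100_000):
        super().__init__()
        self.G, self.D = generator, discriminator
        self.max_steps = max_steps
        self.automatic_optimization = False  # GAN training drives two optimizers manually

    def configure_optimizers(self):
        opt_g = torch.optim.AdamW(self.G.parameters(), lr=2e-4)  # generator LR
        opt_d = torch.optim.AdamW(self.D.parameters(), lr=1e-4)  # discriminator LR
        sch_g = torch.optim.lr_scheduler.CosineAnnealingLR(opt_g, T_max=self.max_steps)
        sch_d = torch.optim.lr_scheduler.CosineAnnealingLR(opt_d, T_max=self.max_steps)
        return [opt_g, opt_d], [sch_g, sch_d]

# FP16 mixed precision and 400 total epochs, matching the table
trainer = pl.Trainer(precision="16-mixed", max_epochs=400)
```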
|
|
|
|
|
## 🧠 Core Equation |
|
|
|
|
|
The total loss combines autoencoding, semantic, adversarial, denoising, robustness, and perceptual components:
|
|
|
|
|
$$
L_{total} = \lambda_{AE}L_{AE} + \lambda_{cons}L_{cons} + \lambda_{bundle}L_{bundle} + \lambda_{comp}L_{comp} + \lambda_{G}\left(L_{G} + \lambda_{MS}L_{MS}\right) + \lambda_{den}L_{den} + \lambda_{rob}L_{rob} + \lambda_{perc}L_{perc}
$$
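
In code, the combination is a plain weighted sum. The helper below mirrors the equation term for term; the dictionary keys are ours, and the λ weights are not published here, so treat them as placeholders.

```python
def total_loss(L, lam):
    """Weighted sum mirroring the equation above; `L` maps each term name to
    its scalar loss and `lam` to its (unpublished) weight."""
    return (lam["AE"] * L["AE"] + lam["cons"] * L["cons"]
            + lam["bundle"] * L["bundle"] + lam["comp"] * L["comp"]
            + lam["G"] * (L["G"] + lam["MS"] * L["MS"])
            + lam["den"] * L["den"] + lam["rob"] * L["rob"]
            + lam["perc"] * L["perc"])
```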
|
|
|
|
|
## 📊 Quantitative Results |
|
|
|
|
|
| Metric | Value | Benchmark | |
|
|
|---------|--------|------------| |
|
|
| FID ↓ | **10.3** | FashionAI subset | |
|
|
| IS ↑ | **7.8** | – | |
|
|
| CLIP-S ↑ | **0.315** | – | |
|
|
| Coverage ↑ | **87%** | – | |
|
|
| Inference Time | **3.8 s / sample (512×512, A100, FP16)** | – | |
|
|
|
|
|
## 🖼️ Visual Results |
|
|
|
|
|
| Input Pose | Generated Outfit | |
|
|
|-------------|------------------| |
|
|
|  |  | |
|
|
|
|
|
## 🚀 Usage Example |
|
|
|
|
|
```python |
|
|
from huggingface_hub import hf_hub_download |
|
|
from cldm.model import create_model, load_state_dict |
|
|
import torch |
|
|
|
|
|
# Download checkpoint |
|
|
ckpt = hf_hub_download("NguyenDinhHieu/EquiFashionModel", filename="eqf_final.ckpt") |
|
|
|
|
|
# Load model |
|
|
model = create_model("utils/configs/cldm_v2.yaml").to("cuda") |
|
|
model.load_state_dict(load_state_dict(ckpt, location="cuda")) |
|
|
model.eval() |
|
|
|
|
|
prompt = "long-sleeve floral dress with tied waist, elegant, 8k detail" |
|
|
``` |
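
The snippet stops after defining the prompt. Because the card imports from `cldm`, the repository presumably follows the standard ControlNet layout; under that assumption, sampling would continue roughly as below. The `pose_map` placeholder and the step/guidance values are illustrative; `app.py` contains the exact pipeline used by the demo.

```python
# Continuation under the assumption of a ControlNet-style codebase;
# verify against app.py before relying on it.
import numpy as np
from cldm.ddim_hacked import DDIMSampler

sampler = DDIMSampler(model)

# pose_map: a pre-rendered OpenPose skeleton image in [0, 1] (placeholder here)
pose_map = np.zeros((1, 3, 512, 512), dtype=np.float32)
pose = torch.from_numpy(pose_map).cuda()

cond = {"c_concat": [pose],
        "c_crossattn": [model.get_learned_conditioning([prompt])]}
uncond = {"c_concat": [pose],
          "c_crossattn": [model.get_learned_conditioning([""])]}

samples, _ = sampler.sample(
    S=50, batch_size=1, shape=(4, 64, 64), conditioning=cond,
    unconditional_guidance_scale=9.0, unconditional_conditioning=uncond,
    verbose=False)

image = model.decode_first_stage(samples)  # (1, 3, 512, 512), values in [-1, 1]
```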
|
|
|
|
|
## 💡 Citation |
|
|
|
|
|
If you use this model or dataset, please cite: |
|
|
|
|
|
```bibtex |
|
|
@inproceedings{nguyen2025equifashion, |
|
|
title={EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation}, |
|
|
author={Tran Minh Khuong and Nguyen Dinh Hieu and Ngo Dinh Hoang Minh and Nguyen Dinh Bach and Phan Duy Hung}, |
|
|
booktitle={Proceedings of the ..... Conference}, |
|
|
year={2025}, |
|
|
organization={FPT University, Hanoi} |
|
|
} |
|
|
``` |
|
|
|
|
|
## 🧩 File Descriptions |
|
|
|
|
|
| File | Description | |
|
|
|------|--------------| |
|
|
| `eqf_final.ckpt` | Main Hybrid GAN–Diffusion model checkpoint | |
|
|
| `body_pose_model.pth`, `hand_pose_model.pth` | OpenPose keypoint weights | |
|
|
| `open_clip_pytorch_model.bin` | Pretrained OpenCLIP text encoder | |
|
|
| `app.py` | Gradio demo UI | |
|
|
| `utils/configs/cldm_v2.yaml` | Architecture configuration | |
|
|
|
|
|
|
## 🪪 License |
|
|
Released under the **MIT License**. |
|
|
You may use, modify, and distribute the model and dataset with attribution. |
|
|
|
|
|
## 🧩 Acknowledgment |
|
|
Developed by the **FPT University AI Research Group**, Hanoi, Vietnam,
|
|
as part of the **EquiAI Research Suite** on fairness, robustness, and trustworthy generative AI. |
|
|
|