---
license: mit
tags:
- diffusion
- gan
- hybrid
- fashion
- multimodal
- controlnet
- pose-guided
- pytorch
library_name: pytorch_lightning
pipeline_tag: text-to-image
language:
- en
spaces:
  - NguyenDinhHieu/EquiFashion
datasets:
  - NguyenDinhHieu/EquiFashion-DB
---

# 👗 EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation

**Authors:**  
Nguyen Dinh Hieu (ORCID: 0009-0002-6683-8036) et al.  
**Institution:** FPT University, Hanoi, Vietnam  
📧 hieundhe180318@fpt.edu.vn

## 🧩 Overview

**EquiFashion** is a hybrid *GAN–Diffusion* framework that reconciles the long-standing trade-off between **stylistic diversity** and **photorealistic fidelity** in generative fashion design.  
It integrates a GAN-based ideation branch for creative exploration and a diffusion-based refinement branch for faithful reconstruction, enabling high-quality, diverse, and robust fashion image generation.

> 🎨 Try the live demo here:  
> 👉 [EquiFashion Demo on Hugging Face Spaces](https://huggingface.co/spaces/NguyenDinhHieu/EquiFashion)

## 🎯 Motivation

Fashion design requires models that are simultaneously **creative**, **robust**, and **trustworthy**.  
GANs generate diverse styles but lack stability, whereas diffusion models produce realistic images but constrain creativity. **EquiFashion** bridges the two, achieving controlled diversity, semantic alignment, and realistic garment rendering.

## 🧱 Architecture Overview

| Component | Description |
|------------|-------------|
| **Latent Diffusion Backbone** | Operates in latent space for efficient denoising with high-resolution reconstruction. |
| **GAN Ideation Module** | Explores stylistic variations through stochastic latent sampling. |
| **Structural Semantic Consensus** | Ensures linguistic–visual correspondence between attributes and garment parts. |
| **Semantic-Bundled Attention** | Couples adjective–noun pairs (e.g., “red collar”) for coherent attribute localization. |
| **Pose-Guided Conditioning** | Aligns garments naturally to human body structure using OpenPose keypoints. |
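
To make the idea of an adjective–noun "bundle" from the table above concrete, here is a toy sketch that pulls such pairs out of a prompt. The lexicons and the `extract_bundles` helper are hypothetical and exist only for illustration; this card does not specify how the model identifies or attends to these pairs, and a real system would more likely rely on a syntactic parser.

```python
# Toy lexicons: hypothetical, for illustration only.
ATTRIBUTES = {"red", "blue", "floral", "long", "short", "striped"}
PARTS = {"collar", "sleeve", "sleeves", "dress", "waist", "skirt"}


def extract_bundles(prompt: str) -> list[tuple[str, str]]:
    """Return (attribute, part) pairs where an attribute word
    directly precedes a garment-part word in the prompt."""
    tokens = [t.strip(",.").lower() for t in prompt.split()]
    return [
        (first, second)
        for first, second in zip(tokens, tokens[1:])
        if first in ATTRIBUTES and second in PARTS
    ]


print(extract_bundles("A dress with a red collar and striped sleeves"))
```

Each returned pair is a unit that an attention mechanism could treat jointly, so that "red" is localized on the collar rather than spread over the whole garment.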

## 📂 Dataset Access: EquiFashion-DB

The dataset used for training and evaluation is available on Hugging Face:

**➡️ [NguyenDinhHieu/EquiFashion-DB](https://huggingface.co/datasets/NguyenDinhHieu/EquiFashion-DB)**

| Property | Description |
|-----------|--------------|
| Scale | 350K images |
| Resolution | 512×512 |
| Modalities | Image, Text, Sketch, Pose, Fabric |
| Coverage | 40+ apparel categories |
| Key Feature | Noise-aware text, balanced demographics |
| Purpose | Training + robust benchmarking for generative fashion |

You can load it directly using the `datasets` library:

```python
from datasets import load_dataset

dataset = load_dataset("NguyenDinhHieu/EquiFashion-DB")
print(dataset)
```

## 🧮 Training Configuration

| Setting | Value |
|----------|-------|
| Framework | PyTorch Lightning 2.2 |
| GPU | NVIDIA A100 (40 GB, CUDA 12.2) |
| Optimizer | AdamW |
| Learning Rate | 2e-4 (G), 1e-4 (D) |
| Scheduler | Cosine Decay |
| Epochs | 400 (200 pretrain + 200 joint) |
| Precision | FP16 |
| Batch Size | 32 |
| Timesteps (T) | 8 |
| Fusion Decay (γ) | 0.7 |

## 🧠 Core Equation

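The total loss is a weighted sum of autoencoding, semantic, adversarial, denoising, robustness, and perceptual terms, each scaled by its own coefficient λ:

$$
\begin{aligned}
L_{total} ={} & \lambda_{AE} L_{AE} + \lambda_{cons} L_{cons} + \lambda_{bundle} L_{bundle} + \lambda_{comp} L_{comp} \\
& + \lambda_{G} \left( L_{G} + \lambda_{MS} L_{MS} \right) + \lambda_{den} L_{denoise} + \lambda_{rob} L_{rob} + \lambda_{perc} L_{perc}
\end{aligned}
$$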
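
The structure of the equation above can be sketched as a small helper. Note the nested adversarial group, where λ_MS scales L_MS before λ_G scales the whole group. The function name, the dictionary keys, and every numeric value below are placeholders chosen for illustration; the card does not publish the coefficients used in training.

```python
def total_loss(terms: dict[str, float], weights: dict[str, float]) -> float:
    """Combine individual loss terms following the equation's structure.

    Keys mirror the subscripts in the equation; "den" stands for both
    lambda_den and L_denoise.
    """
    # Adversarial group: lambda_G * (L_G + lambda_MS * L_MS)
    adversarial = weights["G"] * (terms["G"] + weights["MS"] * terms["MS"])
    # The remaining terms are plain weighted sums.
    flat_keys = ("AE", "cons", "bundle", "comp", "den", "rob", "perc")
    return adversarial + sum(weights[k] * terms[k] for k in flat_keys)


# Placeholder values: NOT the coefficients used to train EquiFashion.
keys = ("AE", "cons", "bundle", "comp", "G", "MS", "den", "rob", "perc")
example_terms = {k: 1.0 for k in keys}
example_weights = {k: 1.0 for k in keys}
example_weights["MS"] = 0.5

print(total_loss(example_terms, example_weights))
```

Because the helper only uses `+` and `*`, the same code works if the values are scalar tensors instead of floats.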

## 📊 Quantitative Results

| Metric | Value | Benchmark |
|---------|--------|------------|
| FID ↓ | **10.3** | FashionAI subset |
| IS ↑ | **7.8** | – |
| CLIP-S ↑ | **0.315** | – |
| Coverage ↑ | **87%** | – |
| Inference Time | **3.8 s / sample (512×512, A100, FP16)** | – |

## 🖼️ Visual Results

| Input Pose | Generated Outfit |
|-------------|------------------|
| ![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusion_pose.png) | ![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/fashion_diffusion.png) |

## 🚀 Usage Example

```python
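import torch
from huggingface_hub import hf_hub_download

# Assumes the `cldm` package from the project source is importable
# (for example, when running from the repository root).
from cldm.model import create_model, load_state_dict

# Fall back to CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the checkpoint from the Hub (cached locally after the first call)
ckpt = hf_hub_download("NguyenDinhHieu/EquiFashionModel", filename="eqf_final.ckpt")

# Build the model from its config, then load the trained weights
model = create_model("utils/configs/cldm_v2.yaml").to(device)
model.load_state_dict(load_state_dict(ckpt, location=device))
model.eval()  # switch to inference mode

# Example text prompt describing the target garment
prompt = "long-sleeve floral dress with tied waist, elegant, 8k detail"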
```

## 💡 Citation

If you use this model or dataset, please cite:

```bibtex
@inproceedings{nguyen2025equifashion,
  title={EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation},
  author={Tran Minh Khuong and Nguyen Dinh Hieu and Ngo Dinh Hoang Minh and Nguyen Dinh Bach and Phan Duy Hung},
  booktitle={Proceedings of the ..... Conference},
  year={2025},
  organization={FPT University, Hanoi}
}
```

## 🧩 File Descriptions

| File | Description |
|------|--------------|
| `eqf_final.ckpt` | Main Hybrid GAN–Diffusion model checkpoint |
| `body_pose_model.pth`, `hand_pose_model.pth` | OpenPose keypoint weights |
| `open_clip_pytorch_model.bin` | Pretrained OpenCLIP text encoder |
| `app.py` | Gradio demo UI |
| `utils/configs/cldm_v2.yaml` | Architecture configuration |

## 🪪 License

Released under the **MIT License**.  
You may use, modify, and distribute the model and dataset with attribution.

## 🧩 Acknowledgment

Developed by **FPT University AI Research Group**, Hanoi, Vietnam  
as part of the **EquiAI Research Suite** on fairness, robustness, and trustworthy generative AI.