---
license: mit
tags:
- diffusion
- gan
- hybrid
- fashion
- multimodal
- controlnet
- pose-guided
- pytorch
library_name: pytorch_lightning
pipeline_tag: text-to-image
language:
- en
spaces:
- NguyenDinhHieu/EquiFashion
datasets:
- NguyenDinhHieu/EquiFashion-DB
---
# 👗 EquiFashion

**Authors:** Nguyen Dinh Hieu (ORCID: 0009-0002-6683-8036) et al.

**Institution:** FPT University, Hanoi, Vietnam
## 🧩 Overview
**EquiFashion** is a hybrid *GAN–Diffusion* framework that reconciles the long-standing trade-off between **stylistic diversity** and **photorealistic fidelity** in generative fashion design.
It integrates a GAN-based ideation branch for creative exploration and a diffusion-based refinement branch for faithful reconstruction, enabling high-quality, diverse, and robust fashion image generation.
> 🎨 Try the live demo here:
> 👉 [EquiFashion Demo on Hugging Face Spaces](https://huggingface.co/spaces/NguyenDinhHieu/EquiFashion)
## 🎯 Motivation
Fashion design requires models that are simultaneously **creative**, **robust**, and **trustworthy**.
GANs generate diverse styles but are unstable, whereas diffusion models produce realistic images but limit creativity. **EquiFashion** bridges the two, achieving controlled diversity, semantic alignment, and realistic garment rendering.
## 🧱 Architecture Overview
| Component | Description |
|------------|-------------|
| **Latent Diffusion Backbone** | Operates in latent space for efficient denoising with high-resolution reconstruction. |
| **GAN Ideation Module** | Explores stylistic variations through stochastic latent sampling. |
| **Structural Semantic Consensus** | Ensures linguistic–visual correspondence between attributes and garment parts. |
| **Semantic-Bundled Attention** | Couples adjective–noun pairs (e.g., “red collar”) for coherent attribute localization. |
| **Pose-Guided Conditioning** | Aligns garments naturally to human body structure using OpenPose keypoints. |
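The card does not describe how Semantic-Bundled Attention forms its adjective–noun pairs. The sketch below is purely illustrative. It shows only the text-side grouping that the "red collar" example implies: each run of adjectives is attached to the noun that follows it. The function name `bundle_attributes` is hypothetical, and the part-of-speech tags are assumed to come from some external tagger. How the model then uses these bundles inside attention is not specified here.

```python
from typing import List, Tuple


def bundle_attributes(tagged: List[Tuple[str, str]]) -> List[Tuple[List[str], str]]:
    """Hypothetical helper: pair each run of adjectives with the noun that follows it.

    `tagged` is a list of (token, pos) pairs, e.g. ("red", "ADJ"), ("collar", "NOUN").
    The POS tags are assumed to come from an external tagger; this is not EquiFashion code.
    """
    bundles: List[Tuple[List[str], str]] = []
    pending: List[str] = []
    for token, pos in tagged:
        if pos == "ADJ":
            pending.append(token)  # collect attributes until a noun appears
        elif pos == "NOUN":
            if pending:  # only keep nouns that carry at least one attribute
                bundles.append((pending, token))
            pending = []  # start a fresh list for the next bundle
        else:
            pending = []  # any other word breaks the adjective run
    return bundles


# Hand-tagged example prompt fragment
example = [("red", "ADJ"), ("collar", "NOUN"), ("and", "CCONJ"),
           ("puffy", "ADJ"), ("short", "ADJ"), ("sleeves", "NOUN")]
print(bundle_attributes(example))  # [(['red'], 'collar'), (['puffy', 'short'], 'sleeves')]
```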
## 📂 Dataset: **EquiFashion-DB**
The dataset used for training and evaluation is available on Hugging Face:
**➡️ [NguyenDinhHieu/EquiFashion-DB](https://huggingface.co/datasets/NguyenDinhHieu/EquiFashion-DB)**
| Property | Description |
|-----------|--------------|
| Scale | 350K images |
| Resolution | 512×512 |
| Modalities | Image, Text, Sketch, Pose, Fabric |
| Coverage | 40+ apparel categories |
| Key Feature | Noise-aware text, balanced demographics |
| Purpose | Training + robust benchmarking for generative fashion |
You can load it directly using the `datasets` library:
```python
from datasets import load_dataset
dataset = load_dataset("NguyenDinhHieu/EquiFashion-DB")
print(dataset)
```
## 🧮 Training Configuration
| Setting | Value |
|----------|-------|
| Framework | PyTorch Lightning 2.2 |
| GPU | NVIDIA A100 (40 GB, CUDA 12.2) |
| Optimizer | AdamW |
| Learning Rate | 2e-4 (generator, G), 1e-4 (discriminator, D) |
| Scheduler | Cosine Decay |
| Epochs | 400 (200 pretrain + 200 joint) |
| Precision | FP16 |
| Batch Size | 32 |
| Timesteps (T) | 8 |
| Fusion Decay (γ) | 0.7 |
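The table gives the base learning rates and a cosine decay schedule. It does not give the decay horizon, the floor, or whether the schedule restarts at the joint phase. The pure-Python sketch below assumes one cosine cycle over all 400 epochs, a floor of 0, and the same schedule for G and D. The helper names `cosine_decay` and `phase` are hypothetical.

```python
import math

# Values from the training table; the schedule shape below is an assumption.
TOTAL_EPOCHS = 400
PRETRAIN_EPOCHS = 200  # first 200 epochs: pretraining, next 200: joint training
BASE_LR = {"G": 2e-4, "D": 1e-4}


def cosine_decay(base_lr: float, epoch: int, total: int = TOTAL_EPOCHS, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr at epoch 0 down to min_lr at epoch `total` (assumed single cycle)."""
    progress = min(max(epoch, 0), total) / total
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def phase(epoch: int) -> str:
    """Hypothetical phase label matching the 200 pretrain + 200 joint split."""
    return "pretrain" if epoch < PRETRAIN_EPOCHS else "joint"


for epoch in (0, 100, 200, 300, 399):
    lrs = {name: cosine_decay(lr, epoch) for name, lr in BASE_LR.items()}
    print(epoch, phase(epoch), f"G={lrs['G']:.2e}", f"D={lrs['D']:.2e}")
```

In PyTorch, the same single-cycle shape can be obtained with `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=400)`, using one `AdamW` optimizer per network.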
## 🧠 Core Equation
The total loss is a weighted sum of autoencoding, semantic, adversarial, denoising, robustness, and perceptual terms:
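$$
L_{\text{total}} = \lambda_{\text{AE}} L_{\text{AE}} + \lambda_{\text{cons}} L_{\text{cons}} + \lambda_{\text{bundle}} L_{\text{bundle}} + \lambda_{\text{comp}} L_{\text{comp}} + \lambda_{G}\left(L_{G} + \lambda_{\text{MS}} L_{\text{MS}}\right) + \lambda_{\text{den}} L_{\text{denoise}} + \lambda_{\text{rob}} L_{\text{rob}} + \lambda_{\text{perc}} L_{\text{perc}}
$$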
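The weighting above can be written directly as a function. The main detail to get right is that $\lambda_{\text{MS}}$ is nested inside the $\lambda_G$ term. The card does not publish the $\lambda$ values, so the weights used here are placeholders. The function name `total_loss` and the dictionary keys are hypothetical. Because it only uses `+` and `*`, the function works equally on Python floats and on PyTorch scalar tensors.

```python
from typing import Dict


def total_loss(losses: Dict[str, float], w: Dict[str, float]) -> float:
    """Weighted sum mirroring the L_total equation term by term.

    `losses` holds the individual loss values; `w` holds the lambda coefficients
    (placeholder values; the card does not state them).
    """
    return (
        w["AE"] * losses["AE"]
        + w["cons"] * losses["cons"]
        + w["bundle"] * losses["bundle"]
        + w["comp"] * losses["comp"]
        + w["G"] * (losses["G"] + w["MS"] * losses["MS"])  # lambda_MS is nested inside lambda_G
        + w["den"] * losses["denoise"]
        + w["rob"] * losses["rob"]
        + w["perc"] * losses["perc"]
    )


# Illustration only: every loss term = 1.0, every weight = 1.0 except lambda_MS = 0.5
losses = {k: 1.0 for k in ["AE", "cons", "bundle", "comp", "G", "MS", "denoise", "rob", "perc"]}
weights = {k: 1.0 for k in ["AE", "cons", "bundle", "comp", "G", "MS", "den", "rob", "perc"]}
weights["MS"] = 0.5
print(total_loss(losses, weights))  # 8.5
```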
## 📊 Quantitative Results
| Metric | Value | Benchmark |
|---------|--------|------------|
| FID ↓ | **10.3** | FashionAI subset |
| IS ↑ | **7.8** | – |
| CLIP-S ↑ | **0.315** | – |
| Coverage ↑ | **87%** | – |
| Inference Time | **3.8 s / sample (512×512, A100, FP16)** | – |
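The card does not define CLIP-S. Assuming it follows CLIPScore (Hessel et al., 2021), CLIP-S is the clipped cosine similarity between a CLIP image embedding and a CLIP text embedding, averaged over samples. The sketch below exposes the weight `w` because the card does not say whether the usual 2.5 rescaling is applied; `w = 1.0` gives the plain clipped cosine. Toy 2-D vectors stand in for real embeddings, which would come from a CLIP encoder such as the bundled OpenCLIP model. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_s(image_embs: List[Sequence[float]], text_embs: List[Sequence[float]], w: float = 1.0) -> float:
    """Mean of w * max(cos(image, text), 0) over paired samples (assumed CLIPScore-style definition)."""
    scores = [w * max(cosine(i, t), 0.0) for i, t in zip(image_embs, text_embs)]
    return sum(scores) / len(scores)


# Toy embeddings: one perfectly aligned pair, one orthogonal pair
print(clip_s([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]))  # 0.5
```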
## 🖼️ Visual Results
| Input Pose | Generated Outfit |
|-------------|------------------|
| ![Input pose](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusion_pose.png) | ![Generated outfit](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/fashion_diffusion.png) |
## 🚀 Usage Example
```python
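import torch
from huggingface_hub import hf_hub_download
# `cldm` is the ControlNet-style model package; it must be importable (e.g. on PYTHONPATH)
from cldm.model import create_model, load_state_dict

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the main checkpoint from the Hub
ckpt = hf_hub_download(repo_id="NguyenDinhHieu/EquiFashionModel", filename="eqf_final.ckpt")

# Build the model from its config, load the weights on CPU, then move to the target device
model = create_model("utils/configs/cldm_v2.yaml").cpu()
model.load_state_dict(load_state_dict(ckpt, location="cpu"))
model = model.to(device).eval()

# Text prompt; generation is also conditioned on a pose map (see app.py for the demo pipeline)
prompt = "long-sleeve floral dress with tied waist, elegant, 8k detail"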
```
## 💡 Citation
If you use this model or dataset, please cite:
```bibtex
@inproceedings{nguyen2025equifashion,
title={EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation},
author={Tran Minh Khuong and Nguyen Dinh Hieu and Ngo Dinh Hoang Minh and Nguyen Dinh Bach and Phan Duy Hung},
booktitle={Proceedings of the ..... Conference},
year={2025},
organization={FPT University, Hanoi}
}
```
## 🧩 File Descriptions
| File | Description |
|------|--------------|
| `eqf_final.ckpt` | Main Hybrid GAN–Diffusion model checkpoint |
| `body_pose_model.pth`, `hand_pose_model.pth` | OpenPose keypoint weights |
| `open_clip_pytorch_model.bin` | Pretrained OpenCLIP text encoder |
| `app.py` | Gradio demo UI |
| `utils/configs/cldm_v2.yaml` | Architecture configuration |
## 🪪 License
Released under the **MIT License**.
You may use, modify, and distribute the model and dataset with attribution.
## 🧩 Acknowledgment
Developed by **FPT University AI Research Group**, Hanoi, Vietnam
as part of the **EquiAI Research Suite** on fairness, robustness, and trustworthy generative AI.