|
|
--- |
|
|
license: mit |
|
|
tags: |
|
|
- diffusion |
|
|
- gan |
|
|
- hybrid |
|
|
- fashion |
|
|
- multimodal |
|
|
- controlnet |
|
|
- pose-guided |
|
|
- pytorch |
|
|
library_name: pytorch_lightning |
|
|
pipeline_tag: text-to-image |
|
|
language: |
|
|
- en |
|
|
spaces: |
|
|
- NguyenDinhHieu/EquiFashion |
|
|
datasets: |
|
|
- NguyenDinhHieu/EquiFashion-DB |
|
|
--- |
|
|
|
|
|
# 👗 EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation
|
|
|
|
|
**Authors:** |
|
|
Nguyen Dinh Hieu [0009-0002-6683-8036], et al.
|
|
**Institution:** FPT University, Hanoi, Vietnam |
|
|
📧 [email protected] |
|
|
|
|
|
## 🧩 Overview |
|
|
|
|
|
**EquiFashion** is a hybrid *GAN–Diffusion* framework that reconciles the long-standing trade-off between **stylistic diversity** and **photorealistic fidelity** in generative fashion design. |
|
|
It integrates a GAN-based ideation branch for creative exploration and a diffusion-based refinement branch for faithful reconstruction, enabling high-quality, diverse, and robust fashion image generation. |
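
As a rough mental model, the two branches can be composed as in the sketch below. This is our own illustration under stated assumptions, not the released code: the `denoise_step` API, latent shapes, and blending rule are placeholders, while `T` and `gamma` correspond to the timestep and fusion-decay settings listed under Training Configuration.

```python
import torch

def hybrid_generate(gan_G, denoise_step, text_emb, pose_map, T=8, gamma=0.7):
    """Conceptual sketch (assumed, not the released code): the GAN branch
    proposes a stylistic latent, and its influence on the evolving diffusion
    latent decays geometrically by gamma at each refinement step."""
    z_style = gan_G(torch.randn(1, 4, 64, 64), text_emb)  # ideation proposal
    x = torch.randn_like(z_style)                         # refinement starts from noise
    for k, t in enumerate(reversed(range(T))):
        x = denoise_step(x, t, text_emb, pose_map)        # one latent denoising step
        w = gamma ** (k + 1)                              # decaying GAN influence
        x = w * z_style + (1.0 - w) * x
    return x  # decode to pixels with the latent-diffusion VAE afterwards
```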
|
|
|
|
|
> 🎨 Try the live demo here: |
|
|
> 👉 [EquiFashion Demo on Hugging Face Spaces](https://huggingface.co/spaces/NguyenDinhHieu/EquiFashion) |
|
|
|
|
|
## 🎯 Motivation |
|
|
|
|
|
Fashion design requires models that are simultaneously **creative**, **robust**, and **trustworthy**. |
|
|
GANs generate diverse styles but lack stability, while diffusion models produce realistic images but constrain creativity. **EquiFashion** bridges both worlds, achieving controlled diversity, semantic alignment, and realistic garment rendering.
|
|
|
|
|
## 🧱 Architecture Overview |
|
|
|
|
|
| Component | Description | |
|
|
|------------|-------------| |
|
|
| **Latent Diffusion Backbone** | Operates in latent space for efficient denoising with high-resolution reconstruction. | |
|
|
| **GAN Ideation Module** | Explores stylistic variations through stochastic latent sampling. | |
|
|
| **Structural Semantic Consensus** | Ensures linguistic–visual correspondence between attributes and garment parts. | |
|
|
| **Semantic-Bundled Attention** | Couples adjective–noun pairs (e.g., “red collar”) for coherent attribute localization; see the sketch below. |
|
|
| **Pose-Guided Conditioning** | Aligns garments naturally to human body structure using OpenPose keypoints. | |
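
To make the Semantic-Bundled Attention row concrete, here is a minimal self-contained sketch of the idea: the tokens of an adjective–noun pair share one pooled key/value, so the attribute and its garment part are attended to as a single unit. The function name, shapes, and mean-pooling choice are our illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def bundled_cross_attention(img_q, txt_kv, bundles):
    """Illustrative sketch: text tokens in the same adjective-noun bundle,
    e.g. ("red", "collar"), are mean-pooled into one key/value so image
    queries attend to the pair as a single unit.

    img_q:   (N, d) image-patch queries
    txt_kv:  (M, d) text-token keys/values
    bundles: list of index lists grouping text tokens into bundles
    """
    d = img_q.shape[-1]
    kv = torch.stack([txt_kv[idx].mean(dim=0) for idx in bundles])  # (B, d)
    attn = F.softmax(img_q @ kv.T / d ** 0.5, dim=-1)               # (N, B)
    return attn @ kv                                                # (N, d) bundled context
```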
|
|
|
|
|
## 📂 Dataset Access: **EquiFashion-DB**
|
|
|
|
|
The dataset used for training and evaluation is available on Hugging Face: |
|
|
|
|
|
**➡️ [NguyenDinhHieu/EquiFashion-DB](https://huggingface.co/datasets/NguyenDinhHieu/EquiFashion-DB)** |
|
|
|
|
|
| Property | Description | |
|
|
|-----------|--------------| |
|
|
| Scale | 350K images |
|
|
| Resolution | 512×512 | |
|
|
| Modalities | Image, Text, Sketch, Pose, Fabric | |
|
|
| Coverage | 40+ apparel categories | |
|
|
| Key Feature | Noise-aware text, balanced demographics | |
|
|
| Purpose | Training + robust benchmarking for generative fashion | |
|
|
|
|
|
You can load it directly using the `datasets` library: |
|
|
|
|
|
```python |
|
|
from datasets import load_dataset |
|
|
|
|
|
dataset = load_dataset("NguyenDinhHieu/EquiFashion-DB") |
|
|
print(dataset) |
|
|
``` |
|
|
|
|
|
## 🧮 Training Configuration |
|
|
|
|
|
| Setting | Value | |
|
|
|----------|-------| |
|
|
| Framework | PyTorch Lightning 2.2 | |
|
|
| GPU | NVIDIA A100 (40 GB, CUDA 12.2) | |
|
|
| Optimizer | AdamW | |
|
|
| Learning Rate | 2e-4 (G), 1e-4 (D) | |
|
|
| Scheduler | Cosine Decay | |
|
|
| Epochs | 400 (200 pretrain + 200 joint) | |
|
|
| Precision | FP16 | |
|
|
| Batch Size | 32 | |
|
|
| Timesteps (T) | 8 | |
|
|
| Fusion Decay (γ) | 0.7 | |
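
The table maps onto a fairly standard PyTorch Lightning setup. The sketch below shows one way to wire the listed optimizer settings; the module and attribute names (`EquiFashionModule`, `self.G`, `self.D`) and `max_steps` are placeholders rather than the project's actual code.

```python
import torch
import pytorch_lightning as pl

class EquiFashionModule(pl.LightningModule):
    """Placeholder module illustrating the optimizer settings from the table."""

    def __init__(self, generator, discriminator, max_steps=100_000):
        super().__init__()
        self.G, self.D = generator, discriminator
        self.max_steps = max_steps
        self.automatic_optimization = False  # GAN training drives two optimizers manually

    def configure_optimizers(self):
        opt_g = torch.optim.AdamW(self.G.parameters(), lr=2e-4)  # generator LR
        opt_d = torch.optim.AdamW(self.D.parameters(), lr=1e-4)  # discriminator LR
        sch_g = torch.optim.lr_scheduler.CosineAnnealingLR(opt_g, T_max=self.max_steps)
        sch_d = torch.optim.lr_scheduler.CosineAnnealingLR(opt_d, T_max=self.max_steps)
        return [opt_g, opt_d], [sch_g, sch_d]

# FP16 mixed precision and 400 total epochs, matching the table
trainer = pl.Trainer(precision="16-mixed", max_epochs=400)
```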
|
|
|
|
|
## 🧠 Core Equation |
|
|
|
|
|
The total loss combines autoencoding, semantic, adversarial, denoising, robustness, and perceptual components:
|
|
|
|
|
$$
L_{total} = \lambda_{AE}L_{AE} + \lambda_{cons}L_{cons} + \lambda_{bundle}L_{bundle} + \lambda_{comp}L_{comp} + \lambda_{G}\left(L_{G} + \lambda_{MS}L_{MS}\right) + \lambda_{den}L_{den} + \lambda_{rob}L_{rob} + \lambda_{perc}L_{perc}
$$
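
In code, the combination is a plain weighted sum. The helper below mirrors the equation term for term; the dictionary keys are ours, and the λ weights are not published here, so treat them as placeholders.

```python
def total_loss(L, lam):
    """Weighted sum mirroring the equation above; `L` maps each term name to
    its scalar loss and `lam` to its (unpublished) weight."""
    return (lam["AE"] * L["AE"] + lam["cons"] * L["cons"]
            + lam["bundle"] * L["bundle"] + lam["comp"] * L["comp"]
            + lam["G"] * (L["G"] + lam["MS"] * L["MS"])
            + lam["den"] * L["den"] + lam["rob"] * L["rob"]
            + lam["perc"] * L["perc"])
```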
|
|
|
|
|
## 📊 Quantitative Results |
|
|
|
|
|
| Metric | Value | Benchmark | |
|
|
|---------|--------|------------| |
|
|
| FID ↓ | **10.3** | FashionAI subset | |
|
|
| IS ↑ | **7.8** | – | |
|
|
| CLIP-S ↑ | **0.315** | – | |
|
|
| Coverage ↑ | **87%** | – | |
|
|
| Inference Time | **3.8 s / sample (512×512, A100, FP16)** | – | |
|
|
|
|
|
## 🖼️ Visual Results |
|
|
|
|
|
| Input Pose | Generated Outfit | |
|
|
|-------------|------------------| |
|
|
|  |  | |
|
|
|
|
|
## 🚀 Usage Example |
|
|
|
|
|
```python |
|
|
from huggingface_hub import hf_hub_download |
|
|
from cldm.model import create_model, load_state_dict |
|
|
import torch |
|
|
|
|
|
# Download checkpoint |
|
|
ckpt = hf_hub_download("NguyenDinhHieu/EquiFashionModel", filename="eqf_final.ckpt") |
|
|
|
|
|
# Load model |
|
|
model = create_model("utils/configs/cldm_v2.yaml").to("cuda") |
|
|
model.load_state_dict(load_state_dict(ckpt, location="cuda")) |
|
|
model.eval() |
|
|
|
|
|
prompt = "long-sleeve floral dress with tied waist, elegant, 8k detail" |
|
|
``` |
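
The snippet stops after defining the prompt. Because the card imports from `cldm`, the repository presumably follows the standard ControlNet layout; under that assumption, sampling would continue roughly as below. The `pose_map` placeholder and the step/guidance values are illustrative; `app.py` contains the exact pipeline used by the demo.

```python
# Continuation under the assumption of a ControlNet-style codebase;
# verify against app.py before relying on it.
import numpy as np
from cldm.ddim_hacked import DDIMSampler

sampler = DDIMSampler(model)

# pose_map: a pre-rendered OpenPose skeleton image in [0, 1] (placeholder here)
pose_map = np.zeros((1, 3, 512, 512), dtype=np.float32)
pose = torch.from_numpy(pose_map).cuda()

cond = {"c_concat": [pose],
        "c_crossattn": [model.get_learned_conditioning([prompt])]}
uncond = {"c_concat": [pose],
          "c_crossattn": [model.get_learned_conditioning([""])]}

samples, _ = sampler.sample(
    S=50, batch_size=1, shape=(4, 64, 64), conditioning=cond,
    unconditional_guidance_scale=9.0, unconditional_conditioning=uncond,
    verbose=False)

image = model.decode_first_stage(samples)  # (1, 3, 512, 512), values in [-1, 1]
```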
|
|
|
|
|
## 💡 Citation |
|
|
|
|
|
If you use this model or dataset, please cite: |
|
|
|
|
|
```bibtex |
|
|
@inproceedings{nguyen2025equifashion, |
|
|
title={EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation}, |
|
|
author={Tran Minh Khuong and Nguyen Dinh Hieu and Ngo Dinh Hoang Minh and Nguyen Dinh Bach and Phan Duy Hung}, |
|
|
booktitle={Proceedings of the ..... Conference}, |
|
|
year={2025}, |
|
|
organization={FPT University, Hanoi} |
|
|
} |
|
|
``` |
|
|
|
|
|
## 🧩 File Descriptions |
|
|
|
|
|
| File | Description | |
|
|
|------|--------------| |
|
|
| `eqf_final.ckpt` | Main Hybrid GAN–Diffusion model checkpoint | |
|
|
| `body_pose_model.pth`, `hand_pose_model.pth` | OpenPose keypoint weights | |
|
|
| `open_clip_pytorch_model.bin` | Pretrained OpenCLIP text encoder | |
|
|
| `app.py` | Gradio demo UI | |
|
|
| `utils/configs/cldm_v2.yaml` | Architecture configuration | |
|
|
|
|
|
|
## 🪪 License |
|
|
Released under the **MIT License**. |
|
|
You may use, modify, and distribute the model and dataset with attribution. |
|
|
|
|
|
## 🧩 Acknowledgment |
|
|
Developed by the **FPT University AI Research Group**, Hanoi, Vietnam,
|
|
as part of the **EquiAI Research Suite** on fairness, robustness, and trustworthy generative AI. |
|
|
|