---
license: mit
tags:
- diffusion
- gan
- hybrid
- fashion
- multimodal
- controlnet
- pose-guided
- pytorch
library_name: pytorch_lightning
pipeline_tag: text-to-image
language:
- en
spaces:
- NguyenDinhHieu/EquiFashion
datasets:
- NguyenDinhHieu/EquiFashion-DB
---

# 👗 EquiFashion

**Authors:** Nguyen Dinh Hieu (ORCID: 0009-0002-6683-8036) et al.

**Institution:** FPT University, Hanoi, Vietnam

📧 hieundhe180318@fpt.edu.vn

## 🧩 Overview

**EquiFashion** is a hybrid *GAN–Diffusion* framework that addresses the long-standing trade-off between **stylistic diversity** and **photorealistic fidelity** in generative fashion design. It combines a GAN-based ideation branch for creative exploration with a diffusion-based refinement branch for faithful reconstruction, producing fashion images that are high-quality, diverse, and robust.

> 🎨 Try the live demo here:
> 👉 [EquiFashion Demo on Hugging Face Spaces](https://huggingface.co/spaces/NguyenDinhHieu/EquiFashion)

## 🎯 Motivation

Fashion design requires models that are simultaneously **creative**, **robust**, and **trustworthy**. GANs generate diverse styles but lack stability, whereas diffusion models produce realistic images but constrain creativity. **EquiFashion** bridges the two, achieving controlled diversity, semantic alignment, and realistic garment rendering.

## 🧱 Architecture Overview

| Component | Description |
|------------|-------------|
| **Latent Diffusion Backbone** | Operates in latent space for efficient denoising with high-resolution reconstruction. |
| **GAN Ideation Module** | Explores stylistic variations through stochastic latent sampling. |
| **Structural Semantic Consensus** | Ensures linguistic–visual correspondence between attributes and garment parts. |
| **Semantic-Bundled Attention** | Couples adjective–noun pairs (e.g., “red collar”) for coherent attribute localization. |
| **Pose-Guided Conditioning** | Aligns garments naturally to human body structure using OpenPose keypoints. |
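Semantic-Bundled Attention presupposes that each attribute word in the prompt is tied to the garment part it describes. The sketch below illustrates only that text-side grouping step: it attaches each noun to the run of adjectives directly before it, so "red collar" becomes one bundle. It is a toy under explicit assumptions. The `ADJECTIVES`/`NOUNS` vocabularies and the `bundle_attributes`/`bundle_index` helpers are hypothetical and not part of the EquiFashion code base, and how the model couples bundles inside attention is not shown.

```python
"""Toy illustration of adjective-noun bundling (hypothetical, not EquiFashion code)."""
from typing import Dict, List, Tuple

# Hypothetical mini-vocabularies; a real system would use a POS tagger or parser.
ADJECTIVES = {"red", "floral", "long-sleeve", "elegant", "tied", "denim"}
NOUNS = {"collar", "dress", "waist", "sleeve", "skirt", "jacket"}


def bundle_attributes(prompt: str) -> List[Tuple[List[str], str]]:
    """Group each noun with the run of adjectives immediately before it."""
    tokens = prompt.lower().replace(",", " ").split()
    bundles: List[Tuple[List[str], str]] = []
    pending: List[str] = []
    for tok in tokens:
        if tok in ADJECTIVES:
            pending.append(tok)        # collect modifiers until a noun appears
        elif tok in NOUNS:
            bundles.append((pending, tok))
            pending = []               # modifiers are consumed by this noun
        else:
            pending = []               # any other word breaks the adjective run
    return bundles


def bundle_index(prompt: str) -> Dict[str, List[str]]:
    """Map each noun to its adjectives (a repeated noun keeps its last bundle)."""
    return {noun: adjs for adjs, noun in bundle_attributes(prompt)}


if __name__ == "__main__":
    print(bundle_attributes("long-sleeve floral dress with red collar"))
```

Trailing adjectives with no following noun (such as "elegant" at the end of a prompt) are simply dropped in this sketch. A parser-based grouping would handle such cases more gracefully.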
## 📂 Dataset Access

**EquiFashion-DB**, the dataset used for training and evaluation, is available on Hugging Face:

**➡️ [NguyenDinhHieu/EquiFashion-DB](https://huggingface.co/datasets/NguyenDinhHieu/EquiFashion-DB)**

| Property | Description |
|-----------|--------------|
| Scale | 350K images |
| Resolution | 512×512 |
| Modalities | Image, Text, Sketch, Pose, Fabric |
| Coverage | 40+ apparel categories |
| Key Feature | Noise-aware text, balanced demographics |
| Purpose | Training and robustness benchmarking for generative fashion |

You can load it directly with the `datasets` library:

```python
from datasets import load_dataset

# Download (or load from cache) every split of EquiFashion-DB
dataset = load_dataset("NguyenDinhHieu/EquiFashion-DB")
print(dataset)  # shows the available splits, features, and row counts
```

## 🧮 Training Configuration

| Setting | Value |
|----------|-------|
| Framework | PyTorch Lightning 2.2 |
| GPU | NVIDIA A100 (40 GB, CUDA 12.2) |
| Optimizer | AdamW |
| Learning Rate | 2e-4 (generator), 1e-4 (discriminator) |
| Scheduler | Cosine decay |
| Epochs | 400 (200 pretraining + 200 joint training) |
| Precision | FP16 |
| Batch Size | 32 |
| Timesteps (T) | 8 |
| Fusion Decay (γ) | 0.7 |

## 🧠 Core Equation

The total loss is a weighted sum of autoencoding, semantic, adversarial, denoising, robustness, and perceptual terms:

$$
L_{total} = \lambda_{AE} L_{AE} + \lambda_{cons} L_{cons} + \lambda_{bundle} L_{bundle} + \lambda_{comp} L_{comp} + \lambda_{G}\left(L_{G} + \lambda_{MS} L_{MS}\right) + \lambda_{den} L_{denoise} + \lambda_{rob} L_{rob} + \lambda_{perc} L_{perc}
$$

## 📊 Quantitative Results

| Metric | Value | Benchmark |
|---------|--------|------------|
| FID ↓ | **10.3** | FashionAI subset |
| IS ↑ | **7.8** | – |
| CLIP-S ↑ | **0.315** | – |
| Coverage ↑ | **87%** | – |
| Inference Time | **3.8 s / sample (512×512, A100, FP16)** | – |

## 🖼️ Visual Results

| Input Pose | Generated Outfit |
|-------------|------------------|
| ![Input pose](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusion_pose.png) | ![Generated outfit](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/fashion_diffusion.png) |

## 🚀 Usage Example

```python
import torch
from huggingface_hub import hf_hub_download
from cldm.model import create_model, load_state_dict  # requires the project's `cldm` package on the Python path

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the main checkpoint from the Hugging Face Hub
ckpt = hf_hub_download(repo_id="NguyenDinhHieu/EquiFashionModel", filename="eqf_final.ckpt")

# Build the architecture from its config, then load the trained weights
model = create_model("utils/configs/cldm_v2.yaml").to(device)
model.load_state_dict(load_state_dict(ckpt, location=device))
model.eval()

# Example text prompt; sampling is not shown here (see app.py, the Gradio demo)
prompt = "long-sleeve floral dress with tied waist, elegant, 8k detail"
```

## 💡 Citation

If you use this model or dataset, please cite:

```bibtex
@inproceedings{nguyen2025equifashion,
  title={EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation},
  author={Tran Minh Khuong and Nguyen Dinh Hieu and Ngo Dinh Hoang Minh and Nguyen Dinh Bach and Phan Duy Hung},
  booktitle={Proceedings of the ..... Conference},
  year={2025},
  organization={FPT University, Hanoi}
}
```

## 🧩 File Descriptions

| File | Description |
|------|--------------|
| `eqf_final.ckpt` | Main hybrid GAN–Diffusion model checkpoint |
| `body_pose_model.pth`, `hand_pose_model.pth` | OpenPose keypoint weights |
| `open_clip_pytorch_model.bin` | Pretrained OpenCLIP text encoder |
| `app.py` | Gradio demo UI |
| `utils/configs/cldm_v2.yaml` | Architecture configuration |

## 📚 References

1. Zhu et al. *Be Your Own Prada* (ICCV 2017)
2. Chen et al. *TailorGAN* (WACV 2020)
3. Li et al. *BC-GAN* (CVPR 2019)
4. Xu et al. *AttnGAN* (CVPR 2018)
5. Karras et al. *StyleGAN* (CVPR 2019)
6. Zhang et al. *DiffCloth* (ICCV 2023)
7. Xie et al. *HieraFashDiff* (AAAI 2025)
8. Kim et al. *FashionSD-X* (arXiv 2024)
9. Baldrati et al. *Multimodal Garment Designer* (ICCV 2023)
10. Rombach et al. *Latent Diffusion Models* (CVPR 2022)

## 🪪 License

Released under the **MIT License**. You may use, modify, and distribute the model and dataset with attribution.

## 🧩 Acknowledgment

Developed by the **FPT University AI Research Group**, Hanoi, Vietnam, as part of the **EquiAI Research Suite** on fairness, robustness, and trustworthy generative AI.