Model Card for colorize-unet-pytorch
Model Details
Model Description
colorize-unet-pytorch is a PyTorch U‑Net model for automatic image colorization in the LAB color space.
It takes the L* (grayscale/lightness) channel as input and predicts the a* and b* chrominance channels, which are then combined with the original L* channel and converted back to RGB to produce a final colorized image.
- Developed by: Ammar Ahmed, Ahmad Naeem, Khurram Imran
- Model type: U‑Net (encoder–decoder CNN with skip connections)
- Task: Image colorization (grayscale → color)
- Input: Grayscale image / L* channel (1×H×W)
- Output: Predicted a*, b* channels (2×H×W)
- Training dataset: MS COCO 2017 (the val2017 split is used as the training source in this project)
- License: MIT
- Framework: PyTorch
Model Sources
- Repository: https://github.com/AmmarAhm3d/colorize-unet-pytorch
- Demo (Hugging Face Space): https://huggingface.co/spaces/AmmarAhm3d/colorize-unet-pytorch
Uses
Direct Use
This model is intended for:
- Colorizing grayscale images for demos, coursework, and experimentation.
- Running inference through:
  - the included Gradio UI (app.py), or
  - a simple Python inference script.
Example use cases:
- restoring approximate color for old black-and-white photos (best-effort)
- educational demonstrations of LAB-space colorization with U‑Net
Downstream Use
Possible downstream uses include:
- Fine-tuning on domain-specific grayscale→color datasets (e.g., historical photos, medical/industrial imagery).
- Using the model as a baseline for:
- GAN-based colorization
- reference-based / user-guided colorization pipelines
- perceptual-loss training
Out-of-Scope Use
- Color accuracy guarantees: The model does not guarantee historically accurate or “ground-truth” colors. Colorization is inherently ambiguous.
- Safety-critical interpretation: Do not use colorized outputs for medical diagnosis, forensics, or any task where inferred color could cause harm.
- High-resolution production pipelines without adaptation: The default training/inference setup is centered around ~256×256 processing; higher resolutions may need tiling and careful postprocessing.
Bias, Risks, and Limitations
- Dataset bias: Trained using MS COCO imagery; outputs may reflect dataset distributions (typical objects, scenes, colors).
- Multi-modal uncertainty: Many objects can have multiple plausible colors (e.g., clothing, cars). The model may choose a plausible but incorrect option.
- Desaturation risk: With L1 loss, outputs can trend toward “safe” average colors (less vibrant).
- Artifact risk: Some regions (thin structures, rare textures) may show color bleeding or inconsistent tones.
Recommendations
- Use the model for visual enhancement / demo purposes, not for factual color restoration.
- For better quality:
- fine-tune with domain-specific data
- consider perceptual losses or adversarial training
- add postprocessing or user-guided constraints
How to Get Started with the Model
Installation
pip install torch torchvision numpy pillow scikit-image gradio
Minimal inference outline (conceptual)
- Read image (RGB or grayscale)
- Convert to LAB
- Normalize L* to [-1, 1]
- Predict ab with the U‑Net
- Denormalize ab and reconstruct LAB
- Convert LAB → RGB
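A minimal Python sketch of this outline is shown below, assuming the repository's checkpoint loads into a standard torch.nn.Module; the import path (model.UNet), the checkpoint filename, and the ab denormalization factor of 128 are illustrative assumptions, not the repository's confirmed code.

```python
import numpy as np
import torch
from PIL import Image
from skimage.color import rgb2lab, lab2rgb

# Hypothetical import: the actual module/class names live in the repository.
from model import UNet

device = "cuda" if torch.cuda.is_available() else "cpu"
model = UNet().to(device).eval()
model.load_state_dict(torch.load("unet_colorizer.pth", map_location=device))  # filename is illustrative

# 1) Read image and resize to the training resolution.
img = Image.open("input.jpg").convert("RGB").resize((256, 256))

# 2) RGB -> LAB, keep only the L* channel.
lab = rgb2lab(np.asarray(img) / 255.0)          # L in [0, 100], a/b roughly [-128, 127]
L = lab[..., 0]

# 3) Normalize L* from [0, 100] to [-1, 1] and add batch/channel dims.
L_in = torch.from_numpy(L / 50.0 - 1.0).float()[None, None]   # shape (1, 1, H, W)

# 4) Predict the ab channels (Tanh output in [-1, 1]).
with torch.no_grad():
    ab_pred = model(L_in.to(device)).cpu()[0]   # shape (2, H, W)

# 5) Denormalize ab (factor of 128 assumed) and rebuild the LAB image.
ab = ab_pred.permute(1, 2, 0).numpy() * 128.0
lab_out = np.concatenate([L[..., None], ab], axis=-1)

# 6) LAB -> RGB.
rgb = np.clip(lab2rgb(lab_out), 0.0, 1.0)
Image.fromarray((rgb * 255).astype(np.uint8)).save("colorized.jpg")
```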
For a ready-to-run demo, use the Gradio app:
python app.py
Hosted demo: https://huggingface.co/spaces/AmmarAhm3d/colorize-unet-pytorch
Training Details
Training Data
- Dataset: MS COCO 2017 (val2017 subset; ~5k images referenced in the repository README)
- Preprocessing:
- RGB → LAB
- L normalized from [0, 100] → [-1, 1]
- a,b normalized from approximately [-128, 127] → [-1, 1]
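A hedged sketch of how a training sample could be prepared under these conventions; the dataset class, resize behavior, and the exact ab divisor of 128 are assumptions and may differ from the repository's data loading code.

```python
import numpy as np
import torch
from torch.utils.data import Dataset
from PIL import Image
from skimage.color import rgb2lab

class ColorizationDataset(Dataset):
    """Yields (L, ab) tensor pairs normalized to [-1, 1]; names and sizes are illustrative."""

    def __init__(self, image_paths, size=256):
        self.image_paths = image_paths
        self.size = size

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img = Image.open(self.image_paths[idx]).convert("RGB").resize((self.size, self.size))
        lab = rgb2lab(np.asarray(img) / 255.0)

        L = lab[..., 0:1] / 50.0 - 1.0      # [0, 100]      -> [-1, 1]
        ab = lab[..., 1:] / 128.0           # ~[-128, 127]  -> ~[-1, 1] (assumed divisor)

        # HWC -> CHW float tensors
        L = torch.from_numpy(L).permute(2, 0, 1).float()
        ab = torch.from_numpy(ab).permute(2, 0, 1).float()
        return L, ab
```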
Training Procedure
- Objective: Predict chrominance (a*, b*) given luminance (L*) using supervised learning.
- Loss: L1 (Mean Absolute Error) between predicted and target ab channels.
- Optimizer: Adam
Training Hyperparameters
- Image size: 256×256
- Batch size: 16
- Learning rate: 2e-4
- Epochs: 50 (the loss report notes convergence around epoch 24)
- Training regime: fp32 (mixed precision is not reported in the repository)
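A condensed training-loop sketch using the hyperparameters above; UNet and ColorizationDataset refer to the illustrative classes sketched elsewhere in this card, not necessarily the repository's exact code.

```python
import torch
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
model = UNet().to(device)                        # illustrative class from the architecture sketch
criterion = torch.nn.L1Loss()                    # L1 between predicted and target ab channels
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

train_paths = ["path/to/coco/img1.jpg", "path/to/coco/img2.jpg"]  # replace with your image paths
loader = DataLoader(ColorizationDataset(train_paths), batch_size=16, shuffle=True)

for epoch in range(50):
    running = 0.0
    for L, ab in loader:
        L, ab = L.to(device), ab.to(device)
        optimizer.zero_grad()
        loss = criterion(model(L), ab)
        loss.backward()
        optimizer.step()
        running += loss.item()
    print(f"epoch {epoch + 1}: mean L1 = {running / len(loader):.4f}")
```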
Speeds, Sizes, Times
- Reported training duration: ~2 hours on a Kaggle P100 GPU
Evaluation
This repository includes an evaluation script producing objective fidelity metrics:
- PSNR
- SSIM
- RMSE
- SNR
Testing Data
- Sampled images from the same COCO subset / dataset path as configured in the repository (see evaluation.py).
Metrics
- PSNR: image-level fidelity measure based on MSE; higher is better.
- SSIM: structural similarity measure; closer to 1 is better.
- RMSE: root mean squared error; lower is better.
- SNR: signal-to-noise ratio; higher is better.
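A sketch of how these four metrics could be computed per image with scikit-image and NumPy; the repository's evaluation.py may handle color conversion, data ranges, or averaging differently, so treat this as an assumption-laden outline rather than its exact logic.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def fidelity_metrics(reference, colorized):
    """Both inputs: float RGB arrays in [0, 1] with identical shapes."""
    psnr = peak_signal_noise_ratio(reference, colorized, data_range=1.0)
    ssim = structural_similarity(reference, colorized, channel_axis=-1, data_range=1.0)
    rmse = float(np.sqrt(np.mean((reference - colorized) ** 2)))
    # SNR: ratio of reference signal power to error power, in dB.
    noise_power = np.mean((reference - colorized) ** 2)
    snr = float(10 * np.log10(np.mean(reference ** 2) / (noise_power + 1e-12)))
    return {"PSNR": psnr, "SSIM": ssim, "RMSE": rmse, "SNR": snr}
```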
Results
The repository states that evaluation output is generated by evaluation.py and written to evaluation_results/metrics_summary.txt, along with per-image comparisons and metric distribution plots.
Final numeric results (mean/std and sample count) are not reproduced in this card; run evaluation.py and consult metrics_summary.txt for the current figures.
Model Architecture and Objective
Architecture
U‑Net encoder–decoder CNN with skip connections:
- Encoder: 6 downsampling blocks (Conv2d + BatchNorm + ReLU)
- Bottleneck: 512 channels
- Decoder: 6 upsampling blocks (ConvTranspose2d + BatchNorm + ReLU), skip concatenations
- Output: 2-channel (ab) with Tanh activation
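A compact PyTorch sketch consistent with this description; the exact channel widths, kernel sizes, and normalization placement are assumptions and may differ from the repository's implementation.

```python
import torch
import torch.nn as nn

def down_block(in_ch, out_ch):
    # Halves the spatial resolution with a stride-2 conv.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def up_block(in_ch, out_ch):
    # Doubles the spatial resolution with a transposed conv.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    """L* (1xHxW) -> ab (2xHxW); channel widths are illustrative."""

    def __init__(self):
        super().__init__()
        enc_ch = [1, 64, 128, 256, 512, 512, 512]            # 6 downsampling blocks
        self.encoder = nn.ModuleList(
            [down_block(enc_ch[i], enc_ch[i + 1]) for i in range(6)]
        )
        # Each decoder block (after the first) sees the previous decoder output
        # concatenated with the matching encoder feature map.
        dec_in = [512, 1024, 1024, 512, 256, 128]            # widths after skip concatenation
        dec_out = [512, 512, 256, 128, 64, 64]
        self.decoder = nn.ModuleList(
            [up_block(dec_in[i], dec_out[i]) for i in range(6)]
        )
        self.head = nn.Sequential(nn.Conv2d(64, 2, kernel_size=3, padding=1), nn.Tanh())

    def forward(self, x):
        skips = []
        for down in self.encoder:
            x = down(x)
            skips.append(x)                                   # deepest feature last
        x = skips.pop()                                       # 512-channel bottleneck (4x4 at 256 input)
        for i, up in enumerate(self.decoder):
            if i > 0:
                x = torch.cat([x, skips.pop()], dim=1)        # skip connection
            x = up(x)
        return self.head(x)                                   # 2-channel ab in [-1, 1]
```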
Objective
Learn mapping: L* → (a*, b*) in LAB space.
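In illustrative notation, with f_θ denoting the U‑Net and (a, b) the normalized ground-truth chrominance channels, the L1 objective described above can be written as:

$$\min_\theta \; \mathbb{E}_{(L,\,a,\,b)}\big[\, \lVert f_\theta(L) - (a, b) \rVert_1 \,\big]$$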
Environmental Impact
Not reported.
(For completeness, you can estimate emissions using: https://mlco2.github.io/impact)
Citation
If you use this model in academic work, consider citing the repository:
Repository: https://github.com/AmmarAhm3d/colorize-unet-pytorch
Model Card Contact
- GitHub Issues: https://github.com/AmmarAhm3d/colorize-unet-pytorch/issues