Model Card for colorize-unet-pytorch

Model Details

Model Description

colorization-model-unet is a PyTorch U‑Net model for automatic image colorization using the LAB color space.
It takes the L* (grayscale/lightness) channel as input and predicts the a* and b* chrominance channels, which are then combined with the original L* channel and converted back to RGB to produce a final colorized image.

  • Developed by: Ammar Ahmed, Ahmad Naeem, Khurram Imran
  • Model type: U‑Net (encoder–decoder CNN with skip connections)
  • Task: Image colorization (grayscale → color)
  • Input: Grayscale image / L* channel (1×H×W)
  • Output: Predicted a*, b* channels (2×H×W)
  • Training dataset: MS COCO 2017 (the val2017 split is used as the training source in this project)
  • License: MIT
  • Framework: PyTorch

Model Sources

  • Repository: https://github.com/AmmarAhm3d/colorize-unet-pytorch

Uses

Direct Use

This model is intended for:

  • Colorizing grayscale images for demos, coursework, and experimentation.
  • Running inference through:
    • the included Gradio UI (app.py), or
    • a simple Python inference script.

Example use cases:

  • restoring approximate color for old black-and-white photos (best-effort)
  • educational demonstrations of LAB-space colorization with U‑Net

Downstream Use

Possible downstream uses include:

  • Fine-tuning on domain-specific grayscale→color datasets (e.g., historical photos, medical/industrial imagery).
  • Using the model as a baseline for:
    • GAN-based colorization
    • reference-based / user-guided colorization pipelines
    • perceptual-loss training

Out-of-Scope Use

  • Color accuracy guarantees: The model does not guarantee historically accurate or “ground-truth” colors. Colorization is inherently ambiguous.
  • Safety-critical interpretation: Do not use colorized outputs for medical diagnosis, forensics, or any task where inferred color could cause harm.
  • High-resolution production pipelines without adaptation: The default training/inference setup is centered around ~256×256 processing; higher resolutions may need tiling and careful postprocessing.

Bias, Risks, and Limitations

  • Dataset bias: Trained using MS COCO imagery; outputs may reflect dataset distributions (typical objects, scenes, colors).
  • Multi-modal uncertainty: Many objects can have multiple plausible colors (e.g., clothing, cars). The model may choose a plausible but incorrect option.
  • Desaturation risk: With L1 loss, outputs can trend toward “safe” average colors (less vibrant).
  • Artifact risk: Some regions (thin structures, rare textures) may show color bleeding or inconsistent tones.

Recommendations

  • Use the model for visual enhancement / demo purposes, not for factual color restoration.
  • For better quality:
    • fine-tune with domain-specific data
    • consider perceptual losses or adversarial training
    • add postprocessing or user-guided constraints

How to Get Started with the Model

Installation

```bash
pip install torch torchvision numpy pillow scikit-image gradio
```

Minimal inference outline (conceptual)

  1. Read image (RGB or grayscale)
  2. Convert to LAB
  3. Normalize L* to [-1, 1]
  4. Predict ab with the U‑Net
  5. Denormalize ab and reconstruct LAB
  6. Convert LAB → RGB
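
The sketch below fills in that outline, assuming scikit-image for the LAB conversion and the same normalization used at training time; the UNet import, constructor, and checkpoint filename are hypothetical placeholders, so adapt them to the repository's actual code:

```python
import numpy as np
import torch
from PIL import Image
from skimage.color import rgb2lab, lab2rgb

from model import UNet  # hypothetical module/class name; use the repo's actual one

device = "cuda" if torch.cuda.is_available() else "cpu"
model = UNet().to(device)
model.load_state_dict(torch.load("colorization_unet.pth", map_location=device))  # hypothetical checkpoint name
model.eval()

# 1. Read image (convert to RGB so grayscale files also work) and match the training size
img = Image.open("input.jpg").convert("RGB").resize((256, 256))

# 2. Convert to LAB and keep the L* channel (L in [0, 100])
lab = rgb2lab(np.asarray(img) / 255.0)
L = lab[:, :, 0]

# 3. Normalize L* from [0, 100] to [-1, 1] and shape it as a 1x1xHxW tensor
L_in = torch.from_numpy(L / 50.0 - 1.0).float().unsqueeze(0).unsqueeze(0).to(device)

# 4. Predict ab with the U-Net (Tanh output, so values lie in [-1, 1])
with torch.no_grad():
    ab_pred = model(L_in)[0].cpu().numpy().transpose(1, 2, 0)

# 5. Denormalize ab and reconstruct the LAB image with the original L*
ab = ab_pred * 128.0
lab_pred = np.concatenate([L[:, :, None], ab], axis=2)

# 6. Convert LAB -> RGB and save
rgb = (np.clip(lab2rgb(lab_pred), 0, 1) * 255).astype(np.uint8)
Image.fromarray(rgb).save("colorized.jpg")
```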

For a ready-to-run demo, use the Gradio app:

```bash
python app.py
```
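
For reference, a minimal Gradio wrapper of this kind usually looks like the sketch below; it is illustrative only and does not reproduce the actual contents of app.py (the colorize placeholder stands in for the full inference pipeline outlined above):

```python
import gradio as gr
import numpy as np

def colorize(image: np.ndarray) -> np.ndarray:
    # Placeholder: replace with the preprocessing, U-Net forward pass, and
    # LAB -> RGB reconstruction from the inference outline above.
    return image

demo = gr.Interface(
    fn=colorize,  # RGB numpy array in, RGB numpy array out
    inputs=gr.Image(type="numpy", label="Grayscale input"),
    outputs=gr.Image(type="numpy", label="Colorized output"),
    title="U-Net Image Colorization (LAB)",
)

if __name__ == "__main__":
    demo.launch()
```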

Hosted demo:


Training Details

Training Data

  • Dataset: MS COCO 2017 (val2017 subset; ~5k images referenced in the repository README)
  • Preprocessing:
    • RGB → LAB
    • L normalized from [0, 100] → [-1, 1]
    • a,b normalized from approximately [-128, 127] → [-1, 1]
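
A sketch of this preprocessing as a PyTorch Dataset is shown below; the directory layout, file pattern, and resize-to-256 step are assumptions rather than the repository's exact data loader:

```python
import glob

import numpy as np
import torch
from PIL import Image
from skimage.color import rgb2lab
from torch.utils.data import Dataset

class ColorizationDataset(Dataset):
    """Yields (L, ab) tensor pairs normalized to [-1, 1] as described above."""

    def __init__(self, image_dir: str, size: int = 256):
        self.paths = sorted(glob.glob(f"{image_dir}/*.jpg"))
        self.size = size

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int):
        img = Image.open(self.paths[idx]).convert("RGB").resize((self.size, self.size))
        lab = rgb2lab(np.asarray(img) / 255.0).astype("float32")

        L = lab[:, :, :1] / 50.0 - 1.0   # [0, 100]      -> [-1, 1]
        ab = lab[:, :, 1:] / 128.0       # ~[-128, 127]  -> ~[-1, 1]

        # HWC -> CHW tensors expected by the model
        return (torch.from_numpy(L).permute(2, 0, 1),
                torch.from_numpy(ab).permute(2, 0, 1))
```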

Training Procedure

  • Objective: Predict chrominance (a*, b*) given luminance (L*) using supervised learning.
  • Loss: L1 (Mean Absolute Error) between predicted and target ab channels.
  • Optimizer: Adam

Training Hyperparameters

  • Image size: 256×256
  • Batch size: 16
  • Learning rate: 2e-4
  • Epochs: 50 (loss report mentions convergence around epoch ~24)
  • Training regime: fp32 (assumed; update this entry if fp16/bf16 mixed precision was used)
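
Put together, a training loop consistent with these settings might look like the following sketch; UNet and ColorizationDataset are the hypothetical names used in the other sketches on this card, and the dataset path and checkpoint name are placeholders:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
model = UNet().to(device)                        # hypothetical class name
dataset = ColorizationDataset("data/val2017")    # hypothetical path
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=4)

criterion = nn.L1Loss()                          # MAE between predicted and target ab
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

for epoch in range(50):
    model.train()
    running_loss = 0.0
    for L, ab in loader:
        L, ab = L.to(device), ab.to(device)
        optimizer.zero_grad()
        loss = criterion(model(L), ab)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * L.size(0)
    print(f"epoch {epoch + 1}: mean L1 loss = {running_loss / len(dataset):.4f}")

torch.save(model.state_dict(), "colorization_unet.pth")
```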

Speeds, Sizes, Times

  • Reported training duration: ~2 hours on a Kaggle P100 GPU

For a more precise record, add the exact training wall-clock time and any additional hardware details here.


Evaluation

This repository includes an evaluation script producing objective fidelity metrics:

  • PSNR
  • SSIM
  • RMSE
  • SNR

Testing Data

  • Sampled images from the same COCO subset / dataset path as configured in the repo (see evaluation.py).

Metrics

  • PSNR: image-level fidelity measure based on MSE; higher is better.
  • SSIM: structural similarity measure; closer to 1 is better.
  • RMSE: root mean squared error; lower is better.
  • SNR: signal-to-noise ratio; higher is better.
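
These metrics can be computed per image with scikit-image and NumPy; the sketch below is a minimal version and may differ from evaluation.py in details such as resizing and aggregation:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def fidelity_metrics(reference: np.ndarray, colorized: np.ndarray) -> dict:
    """PSNR, SSIM, RMSE, and SNR for a pair of uint8 RGB images of equal shape."""
    ref = reference.astype(np.float64)
    out = colorized.astype(np.float64)

    rmse = np.sqrt(np.mean((ref - out) ** 2))
    # SNR in dB: signal power relative to error power
    snr = 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - out) ** 2))

    return {
        "psnr": peak_signal_noise_ratio(reference, colorized, data_range=255),
        "ssim": structural_similarity(reference, colorized, channel_axis=2, data_range=255),
        "rmse": rmse,
        "snr": snr,
    }
```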

Results (Evaluation Results)

The repo states evaluation output is generated by evaluation.py and written to:

  • evaluation_results/metrics_summary.txt
  • plus per-image comparisons and distribution plots.

To populate the Evaluation Results table on the Hugging Face Hub UI, paste the final numeric summary (mean/std and sample count) from evaluation_results/metrics_summary.txt here.


Model Architecture and Objective

Architecture

U‑Net encoder–decoder CNN with skip connections:

  • Encoder: 6 downsampling blocks (Conv2d + BatchNorm + ReLU)
  • Bottleneck: 512 channels
  • Decoder: 6 upsampling blocks (ConvTranspose2d + BatchNorm + ReLU), skip concatenations
  • Output: 2-channel (ab) with Tanh activation
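
A structural sketch consistent with this description is given below; the exact channel widths, kernel sizes, and output head are assumptions, so treat it as illustrative rather than the repository's implementation:

```python
import torch
from torch import nn

def down(in_ch: int, out_ch: int) -> nn.Sequential:
    # Downsampling block: strided Conv2d + BatchNorm + ReLU (halves H and W)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def up(in_ch: int, out_ch: int) -> nn.Sequential:
    # Upsampling block: ConvTranspose2d + BatchNorm + ReLU (doubles H and W)
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    """Maps a 1-channel L* input to a 2-channel ab prediction in [-1, 1]."""

    def __init__(self):
        super().__init__()
        # Encoder: 6 downsampling blocks (channel widths are illustrative)
        self.d1, self.d2, self.d3 = down(1, 64), down(64, 128), down(128, 256)
        self.d4, self.d5, self.d6 = down(256, 512), down(512, 512), down(512, 512)
        # Bottleneck at 512 channels
        self.bottleneck = nn.Sequential(
            nn.Conv2d(512, 512, 3, padding=1), nn.BatchNorm2d(512), nn.ReLU(inplace=True)
        )
        # Decoder: 6 upsampling blocks; inputs include concatenated encoder skips
        self.u1, self.u2, self.u3 = up(512, 512), up(1024, 512), up(1024, 256)
        self.u4, self.u5, self.u6 = up(512, 128), up(256, 64), up(128, 64)
        # Output head: 2-channel ab with Tanh
        self.head = nn.Sequential(nn.Conv2d(64, 2, 3, padding=1), nn.Tanh())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.d1(x)
        e2 = self.d2(e1)
        e3 = self.d3(e2)
        e4 = self.d4(e3)
        e5 = self.d5(e4)
        e6 = self.d6(e5)
        y = self.u1(self.bottleneck(e6))
        y = self.u2(torch.cat([y, e5], dim=1))
        y = self.u3(torch.cat([y, e4], dim=1))
        y = self.u4(torch.cat([y, e3], dim=1))
        y = self.u5(torch.cat([y, e2], dim=1))
        y = self.u6(torch.cat([y, e1], dim=1))
        return self.head(y)
```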

Objective

Learn mapping: L* → (a*, b*) in LAB space.


Environmental Impact

Not reported.
(For completeness, you can estimate emissions using: https://mlco2.github.io/impact)


Citation

If you use this model in academic work, consider citing the repository:

Repository: https://github.com/AmmarAhm3d/colorize-unet-pytorch


Model Card Contact
