A 4-bit (UINT4 with SVD rank 32) quantization of black-forest-labs/FLUX.1-dev using SDNQ.
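
In SVD-assisted quantization schemes, a small low-rank correction kept in higher precision is stored alongside the low-bit weights to absorb quantization error; the sketch below assumes that is what the rank-32 option refers to here. It is illustrative only, not SDNQ's actual implementation, and the function name is hypothetical:

import torch

def quantize_uint4_with_svd(weight: torch.Tensor, rank: int = 32):
    # Per-output-channel affine quantization to 4-bit unsigned integers (0..15).
    w_min = weight.amin(dim=-1, keepdim=True)
    w_max = weight.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0
    q = torch.round((weight - w_min) / scale).clamp(0, 15).to(torch.uint8)
    dequant = q.to(weight.dtype) * scale + w_min

    # Rank-32 SVD of the quantization error, stored as two small
    # higher-precision factors next to the 4-bit weights.
    error = (weight - dequant).float()
    u, s, vh = torch.linalg.svd(error, full_matrices=False)
    lora_a = u[:, :rank] * s[:rank]
    lora_b = vh[:rank, :]
    return q, scale, w_min, lora_a, lora_b

# Effective weight at inference time: dequant + lora_a @ lora_b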

Usage:

Install SDNQ first, then run the Python snippet below:

pip install sdnq

import torch
import diffusers
from sdnq import SDNQConfig # import sdnq to register it into diffusers and transformers
from sdnq.common import use_torch_compile as triton_is_available
from sdnq.loader import apply_sdnq_options_to_model

pipe = diffusers.FluxPipeline.from_pretrained("Disty0/FLUX.1-dev-SDNQ-uint4-svd-r32", torch_dtype=torch.bfloat16)

# Enable INT8 MatMul for AMD, Intel ARC and Nvidia GPUs:
if triton_is_available and (torch.cuda.is_available() or torch.xpu.is_available()):
    pipe.transformer = apply_sdnq_options_to_model(pipe.transformer, use_quantized_matmul=True)
    pipe.text_encoder_2 = apply_sdnq_options_to_model(pipe.text_encoder_2, use_quantized_matmul=True)
    pipe.transformer = torch.compile(pipe.transformer) # optional for faster speeds

pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.manual_seed(0)
).images[0]
image.save("flux-dev-sdnq-uint4-svd-r32.png")

Original BF16 vs SDNQ quantization comparison:

Quantization     Model Size
Original BF16    23.8 GB
SDNQ UINT4       6.8 GB
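
As a rough back-of-the-envelope check (an illustrative estimate, not SDNQ's exact storage layout): a linear layer stored at 4 bits per weight plus a rank-32 correction kept in 16-bit comes out at roughly a quarter to a third of its BF16 size, which is consistent with the 23.8 GB → 6.8 GB (≈3.5×) reduction above.

def estimated_fraction_of_bf16(m: int, n: int, rank: int = 32) -> float:
    # 4-bit weights plus two rank-32 factors kept in 16-bit (hypothetical layout).
    quantized_bytes = m * n * 4 / 8
    correction_bytes = rank * (m + n) * 2
    bf16_bytes = m * n * 2
    return (quantized_bytes + correction_bytes) / bf16_bytes

print(estimated_fraction_of_bf16(3072, 3072))  # ~0.27, assuming a 3072-wide FLUX linear layer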