A 4-bit (UINT4 with SVD rank 32) quantization of black-forest-labs/FLUX.1-dev using SDNQ.
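
In SVD-assisted quantization schemes, a small low-rank correction kept in higher precision is stored alongside the low-bit weights to absorb quantization error; the sketch below assumes that is what the rank-32 option refers to here. It is illustrative only, not SDNQ's actual implementation, and the function name is hypothetical:

import torch

def quantize_uint4_with_svd(weight: torch.Tensor, rank: int = 32):
    # Per-output-channel affine quantization to 4-bit unsigned integers (0..15).
    w_min = weight.amin(dim=-1, keepdim=True)
    w_max = weight.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0
    q = torch.round((weight - w_min) / scale).clamp(0, 15).to(torch.uint8)
    dequant = q.to(weight.dtype) * scale + w_min

    # Rank-32 SVD of the quantization error, stored as two small
    # higher-precision factors next to the 4-bit weights.
    error = (weight - dequant).float()
    u, s, vh = torch.linalg.svd(error, full_matrices=False)
    lora_a = u[:, :rank] * s[:rank]
    lora_b = vh[:rank, :]
    return q, scale, w_min, lora_a, lora_b

# Effective weight at inference time: dequant + lora_a @ lora_b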

Usage:

Install SDNQ first, then run the Python snippet below:

pip install sdnq

import torch
import diffusers
from sdnq import SDNQConfig # import sdnq to register it into diffusers and transformers
from sdnq.common import use_torch_compile as triton_is_available
from sdnq.loader import apply_sdnq_options_to_model

pipe = diffusers.FluxPipeline.from_pretrained("Disty0/FLUX.1-dev-SDNQ-uint4-svd-r32", torch_dtype=torch.bfloat16)

# Enable INT8 MatMul for AMD, Intel ARC and Nvidia GPUs:
if triton_is_available and (torch.cuda.is_available() or torch.xpu.is_available()):
    pipe.transformer = apply_sdnq_options_to_model(pipe.transformer, use_quantized_matmul=True)
    pipe.text_encoder_2 = apply_sdnq_options_to_model(pipe.text_encoder_2, use_quantized_matmul=True)
    pipe.transformer = torch.compile(pipe.transformer) # optional for faster speeds

pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.manual_seed(0)
).images[0]
image.save("flux-dev-sdnq-uint4-svd-r32.png")

Original BF16 vs SDNQ quantization comparison:

Quantization     Model Size
Original BF16    23.8 GB
SDNQ UINT4       6.8 GB
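
As a rough back-of-the-envelope check (an illustrative estimate, not SDNQ's exact storage layout): a linear layer stored at 4 bits per weight plus a rank-32 correction kept in 16-bit comes out at roughly a quarter to a third of its BF16 size, which is consistent with the 23.8 GB → 6.8 GB (≈3.5×) reduction above.

def estimated_fraction_of_bf16(m: int, n: int, rank: int = 32) -> float:
    # 4-bit weights plus two rank-32 factors kept in 16-bit (hypothetical layout).
    quantized_bytes = m * n * 4 / 8
    correction_bytes = rank * (m + n) * 2
    bf16_bytes = m * n * 2
    return (quantized_bytes + correction_bytes) / bf16_bytes

print(estimated_fraction_of_bf16(3072, 3072))  # ~0.27, assuming a 3072-wide FLUX linear layer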