
Co-GRPO: Co-Optimized Group Relative Policy Optimization for Masked Diffusion Model

Paper | Model | Code | Project Page

Introduction

We bridge the gap between single-step training and multi-step inference in Masked Diffusion Models by introducing Co-GRPO, a reinforcement-learning framework that cooperatively optimizes both the generative model and the inference schedule. This co-optimization achieves superior visual quality and prompt alignment without costly backpropagation through every sampling step.
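The model card does not include training code, but the "group relative" part of GRPO-style objectives refers to normalizing each sample's reward against the other samples drawn for the same prompt. A minimal sketch of that advantage computation is below; the function name and interface are illustrative assumptions, not the authors' implementation, and Co-GRPO's schedule co-optimization is not shown.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: z-score each reward within its group.

    rewards: list of scalar rewards, one per sample generated for the
             same prompt (the "group"). Illustrative sketch only.
    Returns a list of advantages with zero mean inside the group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # eps guards against a zero-variance group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In such schemes the advantage weights the policy-gradient update for each sample, so samples scoring above their group's mean are reinforced and below-mean samples are suppressed, without needing a learned value baseline.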

Usage

Please refer to the GitHub repository (see the Code link above).
