
OneThinker: All-in-one Reasoning Model for Image and Video

This repository contains the SFT model presented in: OneThinker: All-in-one Reasoning Model for Image and Video

This is an intermediate model prepared for subsequent RL training.

For more detailed instructions on environment setup, training scripts, and comprehensive evaluation, please refer to the OneThinker GitHub repository.

👀 About OneThinker

[Figure: OneThinker teaser image]

We introduce OneThinker, an all-in-one multimodal reasoning generalist that is capable of thinking across a wide range of fundamental visual tasks within a single model.

OneThinker unifies image and video understanding across diverse fundamental visual tasks, including question answering, captioning, spatial and temporal grounding, tracking, and segmentation. To achieve this, we construct the large-scale OneThinker-600k multi-task training corpus and build OneThinker-SFT-340k with high-quality CoT annotations for SFT cold start. Furthermore, we propose EMA-GRPO, a new RL method that balances heterogeneous reward signals across diverse visual tasks by tracking task-wise moving averages of reward standard deviations for balanced optimization.
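
The normalization idea behind EMA-GRPO can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TaskEmaNormalizer`, the `decay` and `eps` values, the initialization of the average from the first batch, and the use of the group mean as the baseline are all assumptions made for illustration. The only detail taken from the description above is that a moving average of the reward standard deviation is tracked per task and used to scale rewards.

```python
import statistics


class TaskEmaNormalizer:
    """Hypothetical helper: keeps one EMA of reward std per task."""

    def __init__(self, decay=0.99, eps=1e-6):
        self.decay = decay  # assumed smoothing factor, not from the paper
        self.eps = eps      # guards against division by zero
        self.ema_std = {}   # task name -> running average of reward std

    def advantages(self, task, rewards):
        """Return group-relative advantages scaled by the task's EMA std."""
        mean = statistics.fmean(rewards)
        std = statistics.pstdev(rewards)
        if task not in self.ema_std:
            # Assumption: seed the average with the first observed std.
            self.ema_std[task] = std
        else:
            self.ema_std[task] = (
                self.decay * self.ema_std[task] + (1.0 - self.decay) * std
            )
        scale = self.ema_std[task] + self.eps
        return [(r - mean) / scale for r in rewards]


if __name__ == "__main__":
    normalizer = TaskEmaNormalizer(decay=0.9)
    # Two tasks whose rewards live on very different scales
    # (illustrative numbers only).
    print(normalizer.advantages("question_answering", [0.0, 1.0, 1.0, 0.0]))
    print(normalizer.advantages("grounding_iou", [0.42, 0.45, 0.40, 0.47]))
```

Because each task is divided by its own smoothed spread, a task with dense, low-variance rewards and a task with sparse binary rewards would yield advantages of comparable magnitude, which is the balancing effect described above.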

OneThinker demonstrates strong performance on 31 benchmarks across 10 fundamental vision tasks, while showing effective knowledge transfer between certain tasks and promising zero-shot generalization ability, marking a step toward a unified multimodal reasoning generalist.

📄 Citations

If you find our work helpful for your research, please consider citing it.

@article{feng2025onethinker,
  title={OneThinker: All-in-one Reasoning Model for Image and Video},
  author={Feng, Kaituo and Zhang, Manyuan and Li, Hongyu and Fan, Kaixuan and Chen, Shuang and Jiang, Yilei and Zheng, Dian and Sun, Peiwen and Zhang, Yiyuan and Sun, Haoze and others},
  journal={arXiv preprint arXiv:2512.03043},
  year={2025}
}