Native Multimodal Models are World Learners 🌍
Papers
RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics
General Agentic Memory Via Deep Research
Efficient MLLM for Long Video Understanding.
A Bilingual Pretraining Dataset for Enhancing Reasoning in Large Language Models
open-source community driven next generation of AI models
Emu3: Next-Token Prediction is All You Need
Chinese Corpora Internet(中文互联网语料)
Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
- BAAI/Infinity-MM
  Updated • 6.6k • 113
- BAAI/Aquila-VL-2B-llava-qwen
  Visual Question Answering • 2B • Updated • 238 • 61
- Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
  Paper • 2410.18558 • Published • 19
- BAAI/Aquila-VL-2B-Intermediate
  Image-Text-to-Text • Updated • 2
Alt
- BAAI/AltCLIP
  Zero-Shot Image Classification • Updated • 4.31k • 31
- BAAI/AltCLIP-m18
  Zero-Shot Image Classification • Updated • 399 • 5
- AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities
  Paper • 2211.06679 • Published • 2
- AltDiffusion: A Multilingual Text-to-Image Diffusion Model
  Paper • 2308.09991 • Published • 3
Multilingual, multi-industry pretraining dataset
URSA: Uniform Discrete Diffusion with Metric Path for Video Generation
RoboBrain 2.0: See Better. Think Harder. Do Smarter.
Scaling Instruction Selection and Synthesis to Enhance Language Models
- BAAI/Infinity-Instruct
  Viewer • Updated • 21.9M • 6.77k • 686
- BAAI/Gemma2-9B-IT-Simpo-Infinity-Preference
  9B • Updated • 96 • 17
- BAAI/Infinity-Instruct-7M-Gen-Llama3_1-70B
  Text Generation • 71B • Updated • 1.2k • 19
- BAAI/Infinity-Instruct-3M-0625-Yi-1.5-9B
  Text Generation • 9B • Updated • 7.93k • 3
NOVA: Autoregressive Video Generation without Vector Quantization
Multilingual, multi-industry instruction dataset