Henry Hengyuan Zhao

hhenryz

https://zhaohengyuan1.github.io/

AI & ML interests

Multimodal Reasoning, Human-AI Interaction, GUI Automation

Recent Activity

liked a dataset 29 days ago

open-thoughts/OpenThoughts3-1.2M

upvoted a paper 15 days ago

Computer-Use Agents as Judges for Generative User Interface

Paper • 2511.15567 • Published 20 days ago • 51

upvoted a paper 28 days ago

Grounding Computer Use Agents on Human Demonstrations

Paper • 2511.07332 • Published 29 days ago • 104

upvoted 2 papers about 1 month ago

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

Paper • 2506.14245 • Published Jun 17 • 44

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Paper • 2511.02778 • Published Nov 4 • 101

upvoted a paper about 2 months ago

From Charts to Code: A Hierarchical Benchmark for Multimodal Models

Paper • 2510.17932 • Published Oct 20 • 7

upvoted a collection about 2 months ago

Qwen3-VL

Collection • 37 items • Updated Nov 1 • 498

upvoted 3 papers 2 months ago

upvoted a collection 5 months ago

NVILA

Collection • 11 items • Updated Sep 13 • 16

upvoted a paper 5 months ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 303

upvoted 2 papers 7 months ago

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Paper • 2505.21497 • Published May 27 • 109

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8 • 186

upvoted 3 papers 9 months ago

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Paper • 2503.15661 • Published Mar 19 • 2

Long-Context Autoregressive Video Modeling with Next-Frame Prediction

Paper • 2503.19325 • Published Mar 25 • 73

Feather-SQL: A Lightweight NL2SQL Framework with Dual-Model Collaboration Paradigm for Small Language Models

Paper • 2503.17811 • Published Mar 22 • 13

upvoted an article 9 months ago

Open-source DeepResearch – Freeing our search agents

Article • Feb 4 • 1.31k

upvoted 3 papers 9 months ago

Impossible Videos

Paper • 2503.14378 • Published Mar 18 • 61

Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments

Paper • 2501.10893 • Published Jan 18 • 26

TPDiff: Temporal Pyramid Video Diffusion Model

Paper • 2503.09566 • Published Mar 12 • 45
