Depth Anything 3 Space • Running on Zero • Featured • 303 • Generate depth maps from images using GPU acceleration
DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion Paper • 2510.20766 • Published Oct 23 • 34
Revisiting Multimodal Positional Encoding in Vision-Language Models Paper • 2510.23095 • Published Oct 27 • 20
Seedream 4.0: Toward Next-generation Multimodal Image Generation Paper • 2509.20427 • Published Sep 24 • 80
Does FLUX Already Know How to Perform Physically Plausible Image Composition? Paper • 2509.21278 • Published Sep 25 • 16
"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries Paper • 2508.15752 • Published Aug 21 • 7
Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds Paper • 2508.14892 • Published Aug 20 • 9
FoNE: Precise Single-Token Number Embeddings via Fourier Features Paper • 2502.09741 • Published Feb 13 • 15
PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer Paper • 2505.04622 • Published May 7 • 27
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning Paper • 2505.04601 • Published May 7 • 29