Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 5 days ago • 75
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26 • 184
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space Paper • 2509.25180 • Published Sep 29 • 6
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder Paper • 2509.25182 • Published Sep 29 • 37
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Paper • 2502.14866 • Published Feb 20 • 13
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20, 2024 • 95
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 108
NVILA: Efficient Frontier Visual Language Models Paper • 2412.04468 • Published Dec 5, 2024 • 59
MiniPLM: Knowledge Distillation for Pre-Training Language Models Paper • 2410.17215 • Published Oct 22, 2024 • 16
Data Selection via Optimal Control for Language Models Paper • 2410.07064 • Published Oct 9, 2024 • 9
Compact Language Models via Pruning and Knowledge Distillation Paper • 2407.14679 • Published Jul 19, 2024 • 39
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8, 2024 • 66
An Emulator for Fine-Tuning Large Language Models using Small Language Models Paper • 2310.12962 • Published Oct 19, 2023 • 13
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 172
Knowledge Distillation of Large Language Models Paper • 2306.08543 • Published Jun 14, 2023 • 21