Collections including paper arxiv:2501.06425

- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 90
- Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
  Paper • 2501.11873 • Published • 66
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 166
- MoBA: Mixture of Block Attention for Long-Context LLMs
  Paper • 2502.13189 • Published • 17

- FAST: Efficient Action Tokenization for Vision-Language-Action Models
  Paper • 2501.09747 • Published • 27
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 90
- SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
  Paper • 2501.06842 • Published • 16
- LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
  Paper • 2501.03895 • Published • 52

- Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback
  Paper • 2501.03916 • Published • 16
- Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
  Paper • 2501.04682 • Published • 99
- Agent Laboratory: Using LLM Agents as Research Assistants
  Paper • 2501.04227 • Published • 95
- Search-o1: Agentic Search-Enhanced Large Reasoning Models
  Paper • 2501.05366 • Published • 102

- Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
  Paper • 2311.11077 • Published • 29
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 90
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 54
- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  Paper • 2403.03853 • Published • 66

- Evolving Deeper LLM Thinking
  Paper • 2501.09891 • Published • 115
- ProcessBench: Identifying Process Errors in Mathematical Reasoning
  Paper • 2412.06559 • Published • 84
- AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
  Paper • 2412.15084 • Published • 13
- The Lessons of Developing Process Reward Models in Mathematical Reasoning
  Paper • 2501.07301 • Published • 99

- Nuclear Norm Regularization for Deep Learning
  Paper • 2405.14544 • Published • 1
- Token embeddings violate the manifold hypothesis
  Paper • 2504.01002 • Published • 1
- Approximate Nullspace Augmented Finetuning for Robust Vision Transformers
  Paper • 2403.10476 • Published • 1
- ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning
  Paper • 2504.00254 • Published • 1