---
- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 39
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 46
- Efficiently Serving LLM Reasoning Programs with Certaindex
  Paper • 2412.20993 • Published • 37
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 47
Collections including paper arxiv:2503.18878
---
- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
  Paper • 2503.24290 • Published • 62
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 119
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 142
---
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 119
- Truth Neurons
  Paper • 2505.12182 • Published • 8
- Resa: Transparent Reasoning Models via SAEs
  Paper • 2506.09967 • Published • 21
- Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls
  Paper • 2510.00184 • Published • 16
---
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 119
- Large Language Models are Locally Linear Mappings
  Paper • 2505.24293 • Published • 14
- Thought Anchors: Which LLM Reasoning Steps Matter?
  Paper • 2506.19143 • Published • 13
---
- Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
  Paper • 2502.07445 • Published • 11
- ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
  Paper • 2502.04689 • Published • 8
- Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
  Paper • 2502.03032 • Published • 60
- Preference Leakage: A Contamination Problem in LLM-as-a-judge
  Paper • 2502.01534 • Published • 40
---
- Resa: Transparent Reasoning Models via SAEs
  Paper • 2506.09967 • Published • 21
- Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
  Paper • 2408.05147 • Published • 40
- Train Sparse Autoencoders Efficiently by Utilizing Features Correlation
  Paper • 2505.22255 • Published • 24
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 119
---
- andreuka18/DeepSeek-R1-Distill-Llama-8B-lmsys-openthoughts-tokenized
  Viewer • Updated • 781k • 164
- andreuka18/deepseek-r1-distill-llama-8b-lmsys-openthoughts
  Text Generation • Updated
- andreuka18/OpenThoughts-10k-DeepSeek-R1
  Viewer • Updated • 10k • 26
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 119
---
- MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization
  Paper • 2503.16874 • Published • 44
- System Prompt Optimization with Meta-Learning
  Paper • 2505.09666 • Published • 71
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
  Paper • 2505.23754 • Published • 15
---
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
  Paper • 2501.18585 • Published • 61
- Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization
  Paper • 2412.18279 • Published
- Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
  Paper • 2501.10799 • Published • 15
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning
  Paper • 2501.19324 • Published • 39