QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13, 2025 • 176
QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models Paper • 2509.17428 • Published Sep 22, 2025 • 9
Interleaved Reasoning for Large Language Models via Reinforcement Learning Paper • 2505.19640 • Published May 26, 2025 • 14
EpiCache: Episodic KV Cache Management for Long Conversational Question Answering Paper • 2509.17396 • Published Sep 22, 2025 • 19
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction Paper • 2505.23416 • Published May 29, 2025 • 11
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding Paper • 2506.15745 • Published Jun 18, 2025 • 13
A Controlled Study on Long Context Extension and Generalization in LLMs Paper • 2409.12181 • Published Sep 18, 2024 • 45
Characterizing Prompt Compression Methods for Long Context Inference Paper • 2407.08892 • Published Jul 11, 2024 • 11
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference Paper • 2407.14057 • Published Jul 19, 2024 • 46
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Paper • 2407.02490 • Published Jul 2, 2024 • 27
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published Jun 4, 2024 • 41
TransformerFAM: Feedback attention is working memory Paper • 2404.09173 • Published Apr 14, 2024 • 43
Meta Llama 3 Collection • This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 872
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21, 2024 • 116
Speculative Streaming: Fast LLM Inference without Auxiliary Models Paper • 2402.11131 • Published Feb 16, 2024 • 43
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models Paper • 2309.14717 • Published Sep 26, 2023 • 45