-
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Paper • 2410.09024 • Published • 1 -
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Paper • 2410.02644 • Published -
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Paper • 2402.04249 • Published • 6
Collections
Discover the best community collections!
Collections including paper arxiv:2402.04249
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 23 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 151 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
Paper • 2401.03065 • Published • 11 -
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Paper • 2305.01210 • Published • 3 -
AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models
Paper • 2309.06495 • Published • 1 -
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Paper • 2311.16502 • Published • 37
-
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Paper • 2402.04249 • Published • 6 -
cais/HarmBench-Llama-2-13b-cls
Text Generation • 13B • Updated • 29.5k • • 25 -
cais/HarmBench-Llama-2-13b-cls-multimodal-behaviors
Text Generation • 13B • Updated • 25 • -
cais/HarmBench-Mistral-7b-val-cls
Text Generation • 7B • Updated • 15.3k • 6
-
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Paper • 2401.05566 • Published • 30 -
Weak-to-Strong Jailbreaking on Large Language Models
Paper • 2401.17256 • Published • 16 -
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Paper • 2401.17263 • Published • 1 -
Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming
Paper • 2311.06237 • Published • 1
-
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Paper • 2410.09024 • Published • 1 -
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Paper • 2410.02644 • Published -
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Paper • 2402.04249 • Published • 6
-
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Paper • 2402.04249 • Published • 6 -
cais/HarmBench-Llama-2-13b-cls
Text Generation • 13B • Updated • 29.5k • • 25 -
cais/HarmBench-Llama-2-13b-cls-multimodal-behaviors
Text Generation • 13B • Updated • 25 • -
cais/HarmBench-Mistral-7b-val-cls
Text Generation • 7B • Updated • 15.3k • 6
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 23 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 151 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Paper • 2401.05566 • Published • 30 -
Weak-to-Strong Jailbreaking on Large Language Models
Paper • 2401.17256 • Published • 16 -
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Paper • 2401.17263 • Published • 1 -
Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming
Paper • 2311.06237 • Published • 1
-
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
Paper • 2401.03065 • Published • 11 -
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Paper • 2305.01210 • Published • 3 -
AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models
Paper • 2309.06495 • Published • 1 -
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Paper • 2311.16502 • Published • 37