

DeepSeek Jailbreak Reveals Its Entire System Prompt
Researchers have tricked DeepSeek, the Chinese generative AI (GenAI) model that debuted earlier this month to a whirlwind of publicity and user adoption, into revealing the instructions that define how it operates. DeepSeek, the new “it girl” of GenAI, was trained at a fraction of the cost of existing offerings, and as such has sparked competitive alarm across…

How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark
When we began studying jailbreak evaluations, we found a fascinating paper claiming that you could jailbreak frontier LLMs simply by translating forbidden prompts into low-resource languages. Excited by this result, we attempted to reproduce it and found something unexpected.