Building a Hybrid Rule-Based and Machine Learning Framework to Detect and Defend Against Jailbreak Prompts in LLM Systems

Building a Hybrid Rule-Based and Machine Learning Framework to Detect and Defend Against Jailbreak Prompts in LLM Systems

In this tutorial, we introduce a Jailbreak Defense that we built step-by-step to detect and safely handle policy-evasion prompts. We generate realistic attack and benign examples, craft rule-based signals, and combine those with TF-IDF features into a compact, interpretable classifier so we can catch evasive prompts without blocking legitimate requests. We demonstrate evaluation metrics, explain…

Read More
Clément Domingo: “We are not using AI correctly to defend ourselves”

Clément Domingo: “We are not using AI correctly to defend ourselves”

Following Kaspersky Horizon on 1 July in Madrid, Clément Domingo, ethical hacker and cybersecurity evangelist, explains the cybercrime landscape now looks like the legitimate startup world: structured organizations with affiliates and even team-building culture. How a criminal startup works “A cybercrime startup is similar to a classic startup, but dedicated to cybercrime in a very…

Read More