
Mitigate

LlamaFirewall: Open-source framework to detect and mitigate AI centric security risks – Help Net Security
LlamaFirewall is a system-level security framework for LLM-powered applications, built with a modular design to support layered, adaptive defense. It is designed to mitigate a wide spectrum of AI agent security risks including jailbreaking and indirect prompt injection, goal hijacking, and insecure code outputs. Why Meta created LlamaFirewall LLMs are moving far beyond simple chatbot…

Google DeepMind Introduces MONA: A Novel Machine Learning Framework to Mitigate Multi-Step Reward Hacking in Reinforcement Learning
Reinforcement learning (RL) focuses on enabling agents to learn optimal behaviors through reward-based training mechanisms. These methods have empowered systems to tackle increasingly complex tasks, from mastering games to addressing real-world problems. However, as the complexity of these tasks increases, so does the potential for agents to exploit reward systems in unintended ways, creating new…