
Evaluating social and ethical risks from generative AI
Introducing a context-based framework for comprehensively evaluating the social and ethical risks of AI systems.
Generative AI systems are already being used to write books, create graphic designs, and assist medical practitioners, and they are becoming increasingly capable. Ensuring these systems are developed and deployed responsibly requires carefully evaluating the potential ethical and social risks they may…
FACTS Grounding: A new benchmark for evaluating the factuality of large language models
Responsibility & Safety. Published 17 December 2024. Authors: FACTS team.
Our comprehensive benchmark and online leaderboard offer a much-needed measure of how accurately LLMs ground their responses in provided source material and avoid hallucinations.
Large language models (LLMs) are transforming how we access information, yet their grip on factual accuracy remains imperfect. They can “hallucinate”…