
Advanced LLMs

AI Acts Differently When It Knows It’s Being Tested, Research Finds
Echoing the 2015 ‘Dieselgate’ scandal, new research suggests that AI language models such as GPT-4, Claude, and Gemini may change their behavior when they detect they are being tested, sometimes acting ‘safer’ under evaluation than they would in real-world use. If LLMs habitually adjust their behavior under scrutiny, safety audits could end up certifying systems that behave very differently…

AI’s Struggle to Read Analogue Clocks May Have Deeper Significance
A new paper from researchers in China and Spain finds that even advanced multimodal AI models such as GPT-4.1 struggle to tell the time from images of analogue clocks. Small visual changes to the clocks can cause major interpretation errors, and fine-tuning only helps with familiar examples. The results raise concerns about the reliability of…