
Study

Can AI really code? Study maps the roadblocks to autonomous software engineering
Imagine a future where artificial intelligence quietly shoulders the drudgery of software development: refactoring tangled code, migrating legacy systems, and hunting down race conditions, so that human engineers can devote themselves to architecture, design, and the genuinely novel problems still beyond a machine’s reach. Recent advances appear to have nudged that future tantalizingly close, but…

Google study shows LLMs abandon correct answers under pressure, threatening multi-turn AI systems
Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now A new study by researchers at Google DeepMind and University College London reveals how large language models (LLMs) form, maintain and lose confidence in their answers. The findings reveal…

Power Tips #142: A comparison study on a floating voltage tracking power supply for ATE
In order to test multiple ICs simultaneously with different test voltages and currents, semiconductor automatic test equipment (ATE) uses multiple source measurement units (SMUs). Each SMU requires its own independent floating voltage tracking power supply to ensure clean measurements. Figure 1 shows the basic structure of the SMU power supply. The voltage tracking power supplies…

Just add humans: Oxford medical study underscores the missing link in chatbot testing
Join the event trusted by enterprise leaders for nearly two decades. VB Transform brings together the people building real enterprise AI strategy. Learn more Headlines have been blaring it for years: Large language models (LLMs) can not only pass medical licensing exams but also outperform humans. GPT-4 could correctly answer U.S. medical exam licensing questions 90%…

Study shows vision-language models can’t handle queries with negation words
Imagine a radiologist examining a chest X-ray from a new patient. She notices the patient has swelling in the tissue but does not have an enlarged heart. Looking to speed up diagnosis, she might use a vision-language machine-learning model to search for reports from similar patients. But if the model mistakenly identifies reports with both…

Case Study: Combining Cutting-Edge CSS Features Into a “Course Navigation” Component | CSS-Tricks
I came across this awesome article navigator by Jhey Tompkins: CodePen Embed Fallback It solved a UX problem I was facing on a project, so I’ve adapted it to the needs of an online course — a “course navigator” if you will — and built upon it. And today I’m going to pick it apart…

How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark
When we began studying jailbreak evaluations, we found a fascinating paper claiming that you could jailbreak frontier LLMs simply by translating forbidden prompts into obscure languages. Excited by this result, we attempted to reproduce it and found something unexpected.