Study

Can AI really code? Study maps the roadblocks to autonomous software engineering

Imagine a future where artificial intelligence quietly shoulders the drudgery of software development: refactoring tangled code, migrating legacy systems, and hunting down race conditions, so that human engineers can devote themselves to architecture, design, and the genuinely novel problems still beyond a machine’s reach. Recent advances appear to have nudged that future tantalizingly close, but…

Google study shows LLMs abandon correct answers under pressure, threatening multi-turn AI systems

ellonjohns | 2 weeks ago | 0 | 11 mins

A new study by researchers at Google DeepMind and University College London reveals how large language models (LLMs) form, maintain and lose confidence in their answers. The findings reveal…

Power Tips #142: A comparison study on a floating voltage tracking power supply for ATE

ellonjohns | 4 weeks ago | 0 | 15 mins

In order to test multiple ICs simultaneously with different test voltages and currents, semiconductor automatic test equipment (ATE) uses multiple source measurement units (SMUs). Each SMU requires its own independent floating voltage tracking power supply to ensure clean measurements. Figure 1 shows the basic structure of the SMU power supply. The voltage tracking power supplies…

Just add humans: Oxford medical study underscores the missing link in chatbot testing

ellonjohns | 1 month ago | 0 | 18 mins

Headlines have been blaring it for years: Large language models (LLMs) can not only pass medical licensing exams but also outperform humans. GPT-4 could correctly answer U.S. medical exam licensing questions 90%…

Study shows vision-language models can’t handle queries with negation words

ellonjohns | 2 months ago | 0 | 11 mins

Imagine a radiologist examining a chest X-ray from a new patient. She notices the patient has swelling in the tissue but does not have an enlarged heart. Looking to speed up diagnosis, she might use a vision-language machine-learning model to search for reports from similar patients. But if the model mistakenly identifies reports with both…

Case Study: Combining Cutting-Edge CSS Features Into a “Course Navigation” Component | CSS-Tricks

ellonjohns | 4 months ago | 0 | 25 mins

I came across this awesome article navigator by Jhey Tompkins. It solved a UX problem I was facing on a project, so I’ve adapted it to the needs of an online course (a “course navigator” if you will) and built upon it. And today I’m going to pick it apart…

How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

ellonjohns | 7 months ago | 0 | 1 min

When we began studying jailbreak evaluations, we found a fascinating paper claiming that you could jailbreak frontier LLMs simply by translating forbidden prompts into obscure languages. Excited by this result, we attempted to reproduce it and found something unexpected.
