LLM-as-a-Judge: Where Do Its Signals Break, When Do They Hold, and What Should “Evaluation” Mean?

What exactly is being measured when a judge LLM assigns a 1–5 (or pairwise) score? Most “correctness/faithfulness/completeness” rubrics are project-specific. Without task-grounded definitions, a scalar score can drift from business outcomes (e.g., “useful marketing post” vs. “high completeness”). Surveys of LLM-as-a-judge (LAJ) note that rubric ambiguity and prompt template choices materially shift scores and human…

Introducing SGS-1

SGS-1 in Fusion 360 CAD software creating brackets for a roller assembly. Today we are announcing SGS-1, a foundation model that can generate fully manufacturable and parametric 3D geometry. You can try a research preview of SGS-1 here. Given an image or a 3D mesh, SGS-1 can generate CAD…

How Phoebe Gates and Sophia Kianni used Gen Z methods to raise $8M for Phia | TechCrunch

There’s a buzzy new fashion startup in town. Meet Phia, the shopping app founded by Bill Gates’ daughter Phoebe Gates and her Stanford roommate-slash-co-founder, Sophia Kianni. Phia searches the web to help users compare the price of fashion items. It’s a mobile app and browser extension that’s essentially “Google Flights for fashion,” as the…
