LLM

AI Acts Differently When It Knows It’s Being Tested, Research Finds

Echoing the 2015 ‘Dieselgate’ scandal, new research suggests that AI language models such as GPT-4, Claude, and Gemini may change their behavior during tests, sometimes acting ‘safer’ for the test than they would in real-world use. If LLMs habitually adjust their behavior under scrutiny, safety audits could end up certifying systems that behave very differently…

Fine-tuning vs. in-context learning: New research guides better LLM customization for real-world tasks

ellonjohns5 months ago010 mins

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Two popular approaches for customizing large language models (LLMs) for downstream tasks are fine-tuning and in-context learning (ICL). In a recent study, researchers at Google DeepMind and Stanford University explored the generalization capabilities of these two…

A Coding Guide to Asynchronous Web Data Extraction Using Crawl4AI: An Open-Source Web Crawling and Scraping Toolkit Designed for LLM Workflows

ellonjohns5 months ago09 mins

In this tutorial, we demonstrate how to harness Crawl4AI, a modern, Python‑based web crawling toolkit, to extract structured data from web pages directly within Google Colab. Leveraging the power of asyncio for asynchronous I/O, httpx for HTTP requests, and Crawl4AI’s built‑in AsyncHTTPCrawlerStrategy, we bypass the overhead of headless browsers while still parsing complex HTML via…

Salesforce AI Released APIGen-MT and xLAM-2-fc-r Model Series: Advancing Multi-Turn Agent Training with Verified Data Pipelines and Scalable LLM Architectures

ellonjohns6 months ago011 mins

AI agents quickly become core components in handling complex human interactions, particularly in business environments where conversations span multiple turns and involve task execution, information extraction, and adherence to specific procedural rules. Unlike traditional chatbots that handle single-turn questions, these agents must hold context over several dialogue exchanges while integrating external data and tool usage….

A Comprehensive Guide to LLM Routing: Tools and Frameworks

ellonjohns6 months ago014 mins

Deploying LLMs presents challenges, particularly in optimizing efficiency, managing computational costs, and ensuring high-quality performance. LLM routing has emerged as a strategic solution to these challenges, enabling intelligent task allocation to the most suitable models or tools. Let’s delve into the intricacies of LLM routing, explore various tools and frameworks designed for its implementation, and…

The TAO of data: How Databricks is optimizing AI LLM fine-tuning without data labels

ellonjohns6 months ago011 mins

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More AI models perform only as well as the data used to train or fine-tune them. Labeled data has been a foundational element of machine learning (ML) and generative AI for much of their history. Labeled data…

This AI Paper Explores Reinforced Learning and Process Reward Models: Advancing LLM Reasoning with Scalable Data and Test-Time Scaling

ellonjohns8 months ago08 mins

Scaling the size of large language models (LLMs) and their training data have now opened up emergent capabilities that allow these models to perform highly structured reasoning, logical deductions, and abstract thought. These are not incremental improvements over previous tools but mark the journey toward reaching Artificial general intelligence (AGI). Training LLMs to reason well…

Google DeepMind researchers introduce new benchmark to improve LLM factuality, reduce hallucinations

ellonjohns9 months ago09 mins

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Hallucinations, or factually inaccurate responses, continue to plague large language models (LLMs). Models falter particularly when they are given more complex tasks and when users are looking for specific and highly detailed responses. It’s a challenge…

Fine-Tuning an Open-Source LLM with Axolotl Using Direct Preference Optimization (DPO) — SitePoint

ellonjohns9 months ago015 mins

LLMs have unlocked countless new opportunities for AI applications. If you’ve ever wanted to fine-tune your own model, this guide will show you how to do it easily and without writing any code. Using tools like Axolotl and DPO, we’ll walk through the process step by step. What Is an LLM? A Large Language Model…

How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

ellonjohns9 months ago01 mins

When we began studying jailbreak evaluations, we found a fascinating paper claiming that you could jailbreak frontier LLMs simply by translating forbidden prompts into obscure languages. Excited by this result, we attempted to reproduce it and found something unexpected.

Highlights

7 easy ways I fixed iOS 26’s bad battery life on my iPhone

The Rise of Micro-Influencers: Small Audiences, Big Impact – Tecuy Media

SpyCloud Report: 2/3 Orgs Extremely Concerned About Identity Attacks Yet Major Blind Spots Persist

Sandisk WD Blue SN5100 2TB SSD Review: A Rhapsody in Blue

Category Collection