LLM

Building a Hybrid Rule-Based and Machine Learning Framework to Detect and Defend Against Jailbreak Prompts in LLM Systems

In this tutorial, we introduce a Jailbreak Defense that we built step-by-step to detect and safely handle policy-evasion prompts. We generate realistic attack and benign examples, craft rule-based signals, and combine those with TF-IDF features into a compact, interpretable classifier so we can catch evasive prompts without blocking legitimate requests. We demonstrate evaluation metrics, explain…

How to build AI scaling laws for efficient LLM training and budget maximization

ellonjohns6 days ago015 mins

When researchers are building large language models (LLMs), they aim to maximize performance under a particular computational and financial budget. Since training a model can amount to millions of dollars, developers need to be judicious with cost-impacting decisions about, for instance, the model architecture, optimizers, and training datasets before committing to a model. To anticipate…

Teaching the model: Designing LLM feedback loops that get smarter over time

ellonjohns1 month ago013 mins

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Large language models (LLMs) have dazzled with their ability to reason, generate and automate, but what separates a compelling demo from a lasting product isn’t just the model’s initial…

Creating a Knowledge Graph Using an LLM

ellonjohns2 months ago010 mins

In this tutorial, we’ll show how to create a Knowledge Graph from an unstructured document using an LLM. While traditional NLP methods have been used for extracting entities and relationships, Large Language Models (LLMs) like GPT-4o-mini make this process more accurate and context-aware. LLMs are especially useful when working with messy, unstructured data. Using Python,…

AI Guardrails and Trustworthy LLM Evaluation: Building Responsible AI Systems

ellonjohns2 months ago09 mins

Introduction: The Rising Need for AI Guardrails As large language models (LLMs) grow in capability and deployment scale, the risk of unintended behavior, hallucinations, and harmful outputs increases. The recent surge in real-world AI integrations across healthcare, finance, education, and defense sectors amplifies the demand for robust safety mechanisms. AI guardrails—technical and procedural controls ensuring…

Getting Started with Mirascope: Removing Semantic Duplicates using an LLM

ellonjohns2 months ago09 mins

Mirascope is a powerful and user-friendly library that provides a unified interface for working with a wide range of Large Language Model (LLM) providers, including OpenAI, Anthropic, Mistral, Google (Gemini and Vertex AI), Groq, Cohere, LiteLLM, Azure AI, and Amazon Bedrock. It simplifies everything from text generation and structured data extraction to building complex AI-powered…

Context Rot: How Increasing Input Tokens Impacts LLM Performance

ellonjohns2 months ago070 mins

Recent developments in LLMs show a trend toward longer context windows, with the input token count of the latest models reaching the millions. Because these models achieve near-perfect scores on widely adopted benchmarks like Needle in a Haystack (NIAH) [1], it’s often assumed that their performance is uniform across long-context tasks. However, NIAH is fundamentally…

Getting Started with MLFlow for LLM Evaluation

ellonjohns3 months ago09 mins

MLflow is a powerful open-source platform for managing the machine learning lifecycle. While it’s traditionally used for tracking model experiments, logging parameters, and managing deployments, MLflow has recently introduced support for evaluating Large Language Models (LLMs). In this tutorial, we explore how to use MLflow to evaluate the performance of an LLM—in our case, Google’s…

IBM sees enterprise customers are using ‘everything’ when it comes to AI, the challenge is matching the LLM to the right use case

ellonjohns3 months ago010 mins

Join the event trusted by enterprise leaders for nearly two decades. VB Transform brings together the people building real enterprise AI strategy. Learn more Over the last 100 years, IBM has seen many different tech trends rise and fall. What tends to win out are technologies where there is choice. At VB Transform 2025 today, Armand Ruiz, VP…

ether0: A 24B LLM Trained with Reinforcement Learning RL for Advanced Chemical Reasoning Tasks

ellonjohns4 months ago07 mins

LLMs primarily enhance accuracy through scaling pre-training data and computing resources. However, the attention has shifted towards alternate scaling due to finite data availability. This includes test-time training and inference compute scaling. Reasoning models enhance performance by emitting thought processes before answers, initially through CoT prompting. Recently, reinforcement learning (RL) post-training has been used. Scientific…

Highlights

The Rise of Micro-Influencers: Small Audiences, Big Impact – Tecuy Media

SpyCloud Report: 2/3 Orgs Extremely Concerned About Identity Attacks Yet Major Blind Spots Persist

Sandisk WD Blue SN5100 2TB SSD Review: A Rhapsody in Blue

Automating FOWLP design: A comprehensive framework for next-generation integration

Category Collection