
LLM

Building a Hybrid Rule-Based and Machine Learning Framework to Detect and Defend Against Jailbreak Prompts in LLM Systems
In this tutorial, we introduce a Jailbreak Defense that we built step-by-step to detect and safely handle policy-evasion prompts. We generate realistic attack and benign examples, craft rule-based signals, and combine those with TF-IDF features into a compact, interpretable classifier so we can catch evasive prompts without blocking legitimate requests. We demonstrate evaluation metrics, explain…

How to build AI scaling laws for efficient LLM training and budget maximization
When researchers are building large language models (LLMs), they aim to maximize performance under a particular computational and financial budget. Since training a model can amount to millions of dollars, developers need to be judicious with cost-impacting decisions about, for instance, the model architecture, optimizers, and training datasets before committing to a model. To anticipate…

Teaching the model: Designing LLM feedback loops that get smarter over time
Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Large language models (LLMs) have dazzled with their ability to reason, generate and automate, but what separates a compelling demo from a lasting product isn’t just the model’s initial…

Creating a Knowledge Graph Using an LLM
In this tutorial, we’ll show how to create a Knowledge Graph from an unstructured document using an LLM. While traditional NLP methods have been used for extracting entities and relationships, Large Language Models (LLMs) like GPT-4o-mini make this process more accurate and context-aware. LLMs are especially useful when working with messy, unstructured data. Using Python,…

AI Guardrails and Trustworthy LLM Evaluation: Building Responsible AI Systems
Introduction: The Rising Need for AI Guardrails As large language models (LLMs) grow in capability and deployment scale, the risk of unintended behavior, hallucinations, and harmful outputs increases. The recent surge in real-world AI integrations across healthcare, finance, education, and defense sectors amplifies the demand for robust safety mechanisms. AI guardrails—technical and procedural controls ensuring…

Getting Started with Mirascope: Removing Semantic Duplicates using an LLM
Mirascope is a powerful and user-friendly library that provides a unified interface for working with a wide range of Large Language Model (LLM) providers, including OpenAI, Anthropic, Mistral, Google (Gemini and Vertex AI), Groq, Cohere, LiteLLM, Azure AI, and Amazon Bedrock. It simplifies everything from text generation and structured data extraction to building complex AI-powered…

Context Rot: How Increasing Input Tokens Impacts LLM Performance
Recent developments in LLMs show a trend toward longer context windows, with the input token count of the latest models reaching the millions. Because these models achieve near-perfect scores on widely adopted benchmarks like Needle in a Haystack (NIAH) [1], it’s often assumed that their performance is uniform across long-context tasks. However, NIAH is fundamentally…

Getting Started with MLFlow for LLM Evaluation
MLflow is a powerful open-source platform for managing the machine learning lifecycle. While it’s traditionally used for tracking model experiments, logging parameters, and managing deployments, MLflow has recently introduced support for evaluating Large Language Models (LLMs). In this tutorial, we explore how to use MLflow to evaluate the performance of an LLM—in our case, Google’s…

IBM sees enterprise customers are using ‘everything’ when it comes to AI, the challenge is matching the LLM to the right use case
Join the event trusted by enterprise leaders for nearly two decades. VB Transform brings together the people building real enterprise AI strategy. Learn more Over the last 100 years, IBM has seen many different tech trends rise and fall. What tends to win out are technologies where there is choice. At VB Transform 2025 today, Armand Ruiz, VP…

ether0: A 24B LLM Trained with Reinforcement Learning RL for Advanced Chemical Reasoning Tasks
LLMs primarily enhance accuracy through scaling pre-training data and computing resources. However, the attention has shifted towards alternate scaling due to finite data availability. This includes test-time training and inference compute scaling. Reasoning models enhance performance by emitting thought processes before answers, initially through CoT prompting. Recently, reinforcement learning (RL) post-training has been used. Scientific…
- 1
- 2