

Hybrid AI model crafts smooth, high-quality videos in seconds
What would a behind-the-scenes look at a video generated by an artificial intelligence model be like? You might think the process is similar to stop-motion animation, where many images are created and stitched together, but that’s not quite the case for “diffusion models” like OpenAI’s Sora and Google’s Veo 2. Instead of producing a video…

New model predicts a chemical reaction’s point of no return
When chemists design new chemical reactions, one useful piece of information involves the reaction’s transition state — the point of no return from which a reaction must proceed. This information allows chemists to try to produce the right conditions that will allow the desired reaction to occur. However, current methods for predicting the transition state…

A Step-by-Step Coding Guide to Defining Custom Model Context Protocol (MCP) Server and Client Tools with FastMCP and Integrating Them into Google Gemini 2.0’s Function‑Calling Workflow
In this Colab‑ready tutorial, we demonstrate how to integrate Google’s Gemini 2.0 generative AI with an in‑process Model Context Protocol (MCP) server, using FastMCP. Starting with an interactive getpass prompt to capture your GEMINI_API_KEY securely, we install and configure all necessary dependencies: the google‑genai Python client for calling the Gemini API, fastmcp for defining and…

Swapping LLMs isn’t plug-and-play: Inside the hidden cost of model migration
Swapping large language models (LLMs) is supposed to be easy, isn’t it? After all, if they all speak “natural language,” switching from GPT-4o to Claude or Gemini should be as simple as changing an API key…

SQL-R1: A Reinforcement Learning-based NL2SQL Model that Outperforms Larger Systems in Complex Queries with Transparent and Accurate SQL Generation
Natural language interfaces to databases are a growing focus within artificial intelligence, particularly because they allow users to interact with structured databases using plain human language. This area, often known as NL2SQL (Natural Language to SQL), centers on transforming user-friendly queries into SQL commands that can be executed directly on databases. The objective is…

RT-2: New model translates vision and language into action
Research published 28 July 2023. Authors: Yevgen Chebotar, Tianhe Yu. Robotic Transformer 2 (RT-2) is a novel vision-language-action (VLA) model that learns from both web and robotics data, and translates this knowledge into generalised instructions for robotic control. High-capacity vision-language models (VLMs) are trained on web-scale datasets, making these systems remarkably good at recognising visual…

Salesforce AI Released APIGen-MT and xLAM-2-fc-r Model Series: Advancing Multi-Turn Agent Training with Verified Data Pipelines and Scalable LLM Architectures
AI agents are quickly becoming core components for handling complex human interactions, particularly in business environments where conversations span multiple turns and involve task execution, information extraction, and adherence to specific procedural rules. Unlike traditional chatbots that handle single-turn questions, these agents must maintain context across several dialogue exchanges while integrating external data and tool usage…

GraphCast: AI model for faster and more accurate global weather forecasting
Research published 14 November 2023. Authors: Remi Lam on behalf of the GraphCast team. Our state-of-the-art model delivers 10-day weather predictions at unprecedented accuracy in under one minute. The weather affects us all, in ways big and small. It can dictate how we dress in the morning, provide us with green energy and, in the…

OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds
OpenAI’s voice AI models have gotten it into trouble before with actor Scarlett Johansson, but that isn’t stopping the company from continuing to advance its offerings in this category. Today, the ChatGPT maker has unveiled three,…

This AI Paper Introduces R1-Onevision: A Cross-Modal Formalization Model for Advancing Multimodal Reasoning and Structured Visual Interpretation
Multimodal reasoning is an evolving field that integrates visual and textual data to enhance machine intelligence. Traditional artificial intelligence models excel at processing either text or images but often struggle when they must reason across both. Analyzing charts, graphs, mathematical symbols, and complex visual patterns alongside textual descriptions is crucial for applications in education,…