Models
Bridging Reasoning and Action: The Synergy of Large Concept Models (LCMs) and Large Action Models (LAMs) in Agentic Systems
The advent of advanced AI models has led to innovations in how machines process information, interact with humans, and execute tasks in real-world settings. Two emerging approaches are large concept models (LCMs) and large action models (LAMs). While both extend the foundational capabilities of large language models (LLMs), their objectives and applications diverge. LCMs…
This AI Paper Explores Reinforced Learning and Process Reward Models: Advancing LLM Reasoning with Scalable Data and Test-Time Scaling
Scaling the size of large language models (LLMs) and their training data has opened up emergent capabilities that allow these models to perform highly structured reasoning, logical deduction, and abstract thought. These are not incremental improvements over previous tools; they mark progress toward artificial general intelligence (AGI). Training LLMs to reason well…
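To make the process-reward idea behind such test-time scaling concrete, here is a minimal sketch in which a process reward model (PRM) scores each intermediate step of several candidate reasoning chains and the best-scoring chain is kept. The `prm_score` callable and the candidate chains are hypothetical stand-ins, not the paper's actual setup.

```python
# Minimal sketch of process-reward-model (PRM) guided chain selection.
# Assumption: `prm_score(question, steps_so_far)` is a hypothetical stand-in
# for a trained PRM that returns a correctness score in [0, 1] for the
# latest step given the steps so far.

from typing import Callable, List


def chain_score(question: str,
                steps: List[str],
                prm_score: Callable[[str, List[str]], float]) -> float:
    """Aggregate per-step PRM scores; taking the minimum means one bad
    step sinks the whole chain."""
    return min(prm_score(question, steps[: i + 1]) for i in range(len(steps)))


def select_best_chain(question: str,
                      candidate_chains: List[List[str]],
                      prm_score: Callable[[str, List[str]], float]) -> List[str]:
    """Test-time scaling: sample many chains from the LLM, keep the one
    whose reasoning steps the PRM rates highest."""
    return max(candidate_chains, key=lambda c: chain_score(question, c, prm_score))
```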
Ghostbuster: Detecting Text Ghostwritten by Large Language Models
The structure of Ghostbuster, our new state-of-the-art method for detecting AI-generated text. Large language models like ChatGPT write impressively well—so well, in fact, that they’ve become a problem. Students have begun using these models to ghostwrite assignments, leading some schools to ban ChatGPT. These models are also prone to producing text with factual…
The Shift from Models to Compound AI Systems
AI caught everyone’s attention in 2023 with Large Language Models (LLMs) that can be instructed to perform general tasks, such as translation or coding, just by prompting. This naturally led to an intense focus on models as the primary ingredient in AI application development, with everyone wondering what capabilities new LLMs will bring. As more…
Why AI Needs Large Numerical Models (LNMs) for Mathematical Mastery • AI Blog
The availability and structure of mathematical training data, combined with the unique characteristics of mathematics itself, suggest that training a Large Numerical Model (LNM) is feasible and may require less data than training a general-purpose LLM. Here’s a detailed look, covering the availability of mathematical training data and the structure of mathematics and data efficiency. Mathematics’ highly structured nature…
Hunyuan-Large and the MoE Revolution: How AI Models Are Growing Smarter and Faster
Artificial Intelligence (AI) is advancing at an extraordinary pace. What seemed like a futuristic concept just a decade ago is now part of our daily lives. However, the AI we encounter today is only the beginning. The fundamental transformation has yet to arrive; it is taking shape behind the scenes, with massive models capable…
DeepSpeed: a tuning tool for large language models
Large Language Models (LLMs) have the potential to automate tasks and reduce workloads for many roles, including cybersecurity analysts and incident responders. But generic LLMs lack the domain-specific knowledge to handle these tasks well. While they may have been built with training data that included some cybersecurity-related resources, that is often insufficient for…
OpenAI confirms new frontier models o3 and o3-mini
OpenAI is slowly inviting selected users to test a whole new set of reasoning models named o3 and o3-mini, successors to the o1 and o1-mini models that entered full release earlier this month. OpenAI…
Can AI Models Scale Knowledge Storage Efficiently? Meta Researchers Advance Memory Layer Capabilities at Scale
The field of neural network architectures has witnessed rapid advancements as researchers explore innovative ways to enhance computational efficiency while maintaining or improving model performance. Traditional dense networks rely heavily on computationally expensive matrix operations to encode and store information. This reliance poses challenges when scaling these models for real-world applications that demand extensive knowledge…
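As a rough illustration of the memory-layer idea, the sketch below replaces part of a dense feed-forward computation with a sparse lookup over a large table of learned key-value slots, so knowledge capacity grows with the table size rather than with dense layer width. This is a generic sketch of the concept, not Meta's exact design; the `MemoryLayer` class, slot count, and top-k value are illustrative assumptions.

```python
# Generic sketch of a trainable key-value memory layer (illustrative only,
# not the paper's architecture). Each token's hidden state queries a large
# table of learned keys; only the top-k matching values are gathered and
# mixed back in, keeping the per-token compute sparse.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MemoryLayer(nn.Module):
    def __init__(self, d_model: int, num_slots: int = 4096, top_k: int = 8):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        self.values = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        scores = x @ self.keys.t()                    # (batch, seq, num_slots)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)       # sparse attention over slots
        gathered = self.values[top_idx]               # (batch, seq, top_k, d_model)
        return x + (weights.unsqueeze(-1) * gathered).sum(dim=-2)
```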
Hugging Face shows how test-time scaling helps small language models punch above their weight
In a new case study, Hugging Face researchers have demonstrated how small language models (SLMs) can be configured to outperform much larger models. Their findings show that a Llama 3 model with 3B parameters can outperform…
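One common form of the test-time scaling the case study explores is best-of-N sampling: instead of taking a single greedy answer, the small model generates many candidates and a reward model picks the best one. The sketch below shows the idea with hypothetical `generate` and `reward` callables standing in for the small LM's sampling call and a learned verifier.

```python
# Best-of-N sampling: a simple form of test-time scaling. Sample N candidate
# answers from a small model and keep the one a reward model scores highest.
# `generate` and `reward` are hypothetical stand-ins, not a specific API.

from typing import Callable, List


def best_of_n(prompt: str,
              generate: Callable[[str], str],
              reward: Callable[[str, str], float],
              n: int = 16) -> str:
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: reward(prompt, answer))
```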