large

NVIDIA AI Researchers Introduce FFN Fusion: A Novel Optimization Technique that Demonstrates How Sequential Computation in Large Language Models LLMs can be Effectively Parallelized

Large language models (LLMs) have become vital across domains, enabling high-performance applications such as natural language generation, scientific research, and conversational agents. Underneath these advancements lies the transformer architecture, where alternating layers of attention mechanisms and feed-forward networks (FFNs) sequentially process tokenized input. However, with an increase in size and complexity, the computational burden required…

FunSearch: Making new discoveries in mathematical sciences using Large Language Models

ellonjohns7 months ago020 mins

Research Published 14 December 2023 Authors Alhussein Fawzi and Bernardino Romera Paredes By searching for “functions” written in computer code, FunSearch made the first discoveries in open problems in mathematical sciences using LLMs Update: In December 2024, we published a report on arXiv showing how our method can be used to amplify human performance in…

Tufa Labs Introduced LADDER: A Recursive Learning Framework Enabling Large Language Models to Self-Improve without Human Intervention

ellonjohns7 months ago09 mins

Large Language Models (LLMs) benefit significantly from reinforcement learning techniques, which enable iterative improvements by learning from rewards. However, training these models efficiently remains challenging, as they often require extensive datasets and human supervision to enhance their capabilities. Developing methods that allow LLMs to self-improve autonomously without additional human input or large-scale architectural modifications has…

Like human brains, large language models reason about diverse data in a general way

ellonjohns7 months ago011 mins

While early language models could only process text, contemporary large language models now perform highly diverse tasks on different types of data. For instance, LLMs can understand many languages, generate computer code, solve math problems, or answer questions about images and audio. MIT researchers probed the inner workings of LLMs to better understand how they…

This AI Paper from Menlo Research Introduces AlphaMaze: A Two-Stage Training Framework for Enhancing Spatial Reasoning in Large Language Models

ellonjohns7 months ago08 mins

Artificial intelligence continues to advance in natural language processing but still faces challenges in spatial reasoning tasks. Visual-spatial reasoning is fundamental for robotics, autonomous navigation, and interactive problem-solving applications. AI systems must effectively interpret structured environments and execute sequential decisions to function in these domains. While traditional maze-solving algorithms, such as depth-first search and A*,…

This AI Paper from UC Berkeley Introduces a Data-Efficient Approach to Long Chain-of-Thought Reasoning for Large Language Models

ellonjohns7 months ago08 mins

Large language models (LLMs) process extensive datasets to generate coherent outputs, focusing on refining chain-of-thought (CoT) reasoning. This methodology enables models to break down intricate problems into sequential steps, closely emulating human-like logical reasoning. Generating structured reasoning responses has been a major challenge, often requiring extensive computational resources and large-scale datasets to achieve optimal performance….

This AI Paper Explores Long Chain-of-Thought Reasoning: Enhancing Large Language Models with Reinforcement Learning and Supervised Fine-Tuning

ellonjohns8 months ago08 mins

Large language models (LLMs) have demonstrated proficiency in solving complex problems across mathematics, scientific research, and software engineering. Chain-of-thought (CoT) prompting is pivotal in guiding models through intermediate reasoning steps before reaching conclusions. Reinforcement learning (RL) is another essential component that enables structured reasoning, allowing models to recognize and correct errors efficiently. Despite these advancements,…

Bridging Reasoning and Action: The Synergy of Large Concept Models (LCMs) and Large Action Models (LAMs) in Agentic Systems

ellonjohns8 months ago015 mins

The advent of advanced AI models has led to innovations in how machines process information, interact with humans, and execute tasks in real-world settings. Two emerging pioneering approaches are large concept models (LCMs) and large action models (LAMs). While both extend the foundational capabilities of large language models (LLMs), their objectives and applications diverge. LCMs…

Ghostbuster: Detecting Text Ghostwritten by Large Language Models

ellonjohns8 months ago03 mins

The structure of Ghostbuster, our new state-of-the-art method for detecting AI-generated text. Large language models like ChatGPT write impressively well—so well, in fact, that they’ve become a problem. Students have begun using these models to ghostwrite assignments, leading some schools to ban ChatGPT. In addition, these models are also prone to producing text with factual…

What are Large Language Model (LLMs)?

ellonjohns9 months ago09 mins

Understanding and processing human language has always been a difficult challenge in artificial intelligence. Early AI systems often struggled to handle tasks like translating languages, generating meaningful text, or answering questions accurately. These systems relied on rigid rules or basic statistical methods that couldn’t capture the nuances of context, grammar, or cultural meaning. As a…

Highlights

Crash Tests for Security: Why BAS Is Proof of Defense, Not Assumptions

SCSI Controller Explained: What It Is and Why Your Storage Needs One

DC series motor caution

As people look for ways to make new friends, here are the apps promising to help | TechCrunch

Category Collection