
Reinforcement

Zhipu AI Unveils ComputerRL: An AI Framework Scaling End-to-End Reinforcement Learning for Computer Use Agents
In the rapidly evolving landscape of AI-driven automation, Zhipu AI has introduced ComputerRL, a groundbreaking framework designed to empower agents with the ability to navigate and manipulate complex digital workspaces. This innovation addresses a core challenge in AI agent development: the disconnect between computer agents and human-designed graphical user interfaces (GUIs). By integrating programmatic API…

Nebius AI Advances Open-Weight LLMs Through Reinforcement Learning for Capable SWE Agents
The landscape of software engineering automation is evolving rapidly, driven by advances in Large Language Models (LLMs). However, most approaches to training capable agents rely on proprietary models or costly teacher-based methods, leaving open-weight LLMs with limited capabilities in real-world scenarios. A team of researchers from Nebius AI and Humanoid introduced a reinforcement learning framework…

Zhipu AI Releases GLM-4.5V: Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Zhipu AI has officially released and open-sourced GLM-4.5V, a next-generation vision-language model (VLM) that significantly advances the state of open multimodal AI. Based on Zhipu’s 106-billion parameter GLM-4.5-Air architecture—with 12 billion active parameters via a Mixture-of-Experts (MoE) design—GLM-4.5V delivers strong real-world performance and unmatched versatility across visual and textual content. Key Features…

Alibaba Qwen Introduces Qwen3-MT: Next-Gen Multilingual Machine Translation Powered by Reinforcement Learning
Alibaba has introduced Qwen3-MT (qwen-mt-turbo) via Qwen API, its latest and most advanced machine translation model, designed to break language barriers with unprecedented accuracy, speed, and flexibility. Trained on trillions of multilingual tokens, Qwen3-MT supports over 92 languages—covering more than 95% of the global population. Leveraging cutting-edge architecture, reinforcement learning, and rich customization options, it delivers…

ether0: A 24B LLM Trained with Reinforcement Learning RL for Advanced Chemical Reasoning Tasks
LLMs primarily enhance accuracy through scaling pre-training data and computing resources. However, the attention has shifted towards alternate scaling due to finite data availability. This includes test-time training and inference compute scaling. Reasoning models enhance performance by emitting thought processes before answers, initially through CoT prompting. Recently, reinforcement learning (RL) post-training has been used. Scientific…

New tool evaluates progress in reinforcement learning
If there’s one thing that characterizes driving in any major city, it’s the constant stop-and-go as traffic lights change and as cars and trucks merge and separate and turn and park. This constant stopping and starting is extremely inefficient, driving up the amount of pollution, including greenhouse gases, that gets emitted per mile of driving. …

SQL-R1: A Reinforcement Learning-based NL2SQL Model that Outperforms Larger Systems in Complex Queries with Transparent and Accurate SQL Generation
Natural language interface to databases is a growing focus within artificial intelligence, particularly because it allows users to interact with structured databases using plain human language. This area, often known as NL2SQL (Natural Language to SQL), is centered on transforming user-friendly queries into SQL commands that can be directly executed on databases. The objective is…

Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment
Training Diffusion Models with Reinforcement Learning We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone. Our goal is to tackle “stop-and-go” waves, those frustrating slowdowns and speedups that usually have no clear cause but lead to congestion and significant energy waste. To train efficient…

Reinforcement Learning Meets Chain-of-Thought: Transforming LLMs into Autonomous Reasoning Agents
Large Language Models (LLMs) have significantly advanced natural language processing (NLP), excelling at text generation, translation, and summarization tasks. However, their ability to engage in logical reasoning remains a challenge. Traditional LLMs, designed to predict the next word, rely on statistical pattern recognition rather than structured reasoning. This limits their ability to solve complex problems…

This AI Paper Explores Long Chain-of-Thought Reasoning: Enhancing Large Language Models with Reinforcement Learning and Supervised Fine-Tuning
Large language models (LLMs) have demonstrated proficiency in solving complex problems across mathematics, scientific research, and software engineering. Chain-of-thought (CoT) prompting is pivotal in guiding models through intermediate reasoning steps before reaching conclusions. Reinforcement learning (RL) is another essential component that enables structured reasoning, allowing models to recognize and correct errors efficiently. Despite these advancements,…
- 1
- 2