Complex

Microsoft AI Introduces Magentic-UI: An Open-Source Agent Prototype that Works with People to Complete Complex Tasks that Require Multi-Step Planning and Browser Use
Modern web usage spans many digital interactions, from filling out forms and managing accounts to executing data queries and navigating complex dashboards. Despite the web being deeply intertwined with productivity and work processes, many of these actions still demand repetitive human input. This scenario is especially true for environments that require detailed instructions or decisions…

How To Launch Big Complex Projects — Smashing Magazine
When was the last time your project wrapped up smoothly: no delays, no surprises, no last-minute compromises? In reality, most UX projects drift as timelines slip, budgets stretch, and features morph. How do we get better at navigating the chaos? An upcoming part of How To Measure UX and Design Impact by yours truly…

Designing a new way to optimize complex coordinated systems
Coordinating complicated interactive systems, whether it’s the different modes of transportation in a city or the various components that must work together to make an effective and efficient robot, is an increasingly important subject for software designers to tackle. Now, researchers at MIT have developed an entirely new way of approaching these complex problems, using simple…

A faster way to solve complex planning problems
When some commuter trains arrive at the end of the line, they must travel to a switching platform to be turned around so they can depart the station later, often from a different platform than the one at which they arrived. Engineers use software programs called algorithmic solvers to plan these movements, but at a…

SQL-R1: A Reinforcement Learning-based NL2SQL Model that Outperforms Larger Systems in Complex Queries with Transparent and Accurate SQL Generation
Natural language interfaces to databases are a growing focus within artificial intelligence, particularly because they allow users to interact with structured databases using plain human language. This area, often known as NL2SQL (Natural Language to SQL), is centered on transforming user-friendly queries into SQL commands that can be directly executed on databases. The objective is…

Researchers teach LLMs to solve complex planning challenges
Imagine a coffee company trying to optimize its supply chain. The company sources beans from three suppliers, roasts them at two facilities into either dark or light coffee, and then ships the roasted coffee to three retail locations. The suppliers have different fixed capacities, and roasting costs and shipping costs vary from place to place…

Meet PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities across various domains, propelling their evolution into multi-modal agents for human assistance. GUI automation agents for PCs face particularly daunting challenges compared to smartphone counterparts. PC environments present significantly more complex interactive elements with dense, diverse icons and widgets often lacking textual labels, leading to perception…

Microsoft AI Research Introduces MVoT: A Multimodal Framework for Integrating Visual and Verbal Reasoning in Complex Tasks
The study of artificial intelligence has witnessed transformative developments in reasoning and understanding complex tasks. The most innovative developments are large language models (LLMs) and multimodal large language models (MLLMs). These systems can process textual and visual data, allowing them to analyze intricate tasks. Unlike traditional approaches that base their reasoning skills on verbal means,…