What if automating a desktop wasn’t about scripting click patterns, but about giving your operating system an intelligent team of agents? That’s the core idea behind UFO2, Microsoft’s latest open-source system, which pushes past current Computer-Using Agents (CUAs) and reinvents automation as a first-class OS abstraction. It turns your desktop into an intelligent control panel where language-driven tasks are executed natively, reliably, and with minimal disruption to your workflow.
Traditional desktop automation tools like RPA systems have always struggled with robustness. A minor change in a UI can break an entire script. CUAs tried to address this with large language models and screenshot analysis, but they remained limited by shallow system integration and clunky user experiences. UFO2 flips this model by building from the OS upward. It introduces a multiagent architecture in which a central HostAgent coordinates specialized AppAgents for different applications. Each agent speaks the native language of its app via APIs and UI metadata, not just pixels.
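To make the HostAgent/AppAgent split concrete, here is a minimal sketch of the coordination pattern. Only the two role names come from UFO2; the class bodies, method names, and the `(app_name, subtask)` plan format are illustrative assumptions, not the project's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class AppAgent:
    """Hypothetical per-application agent (not UFO2's real class)."""
    app_name: str
    log: list = field(default_factory=list)

    def run(self, subtask: str) -> str:
        # A real agent would call app APIs or inspect UI metadata here.
        self.log.append(subtask)
        return f"[{self.app_name}] done: {subtask}"


class HostAgent:
    """Hypothetical coordinator that routes subtasks to the right AppAgent."""

    def __init__(self) -> None:
        self.agents: dict[str, AppAgent] = {}

    def register(self, agent: AppAgent) -> None:
        self.agents[agent.app_name] = agent

    def execute(self, plan: list[tuple[str, str]]) -> list[str]:
        # `plan` is a list of (app_name, subtask) pairs. In practice an LLM
        # would produce this decomposition from the user's request.
        results = []
        for app_name, subtask in plan:
            agent = self.agents.get(app_name)
            if agent is None:
                results.append(f"[host] no agent for {app_name}")
                continue
            results.append(agent.run(subtask))
        return results


host = HostAgent()
host.register(AppAgent("Excel"))
host.register(AppAgent("Outlook"))
results = host.execute([
    ("Excel", "export sheet as CSV"),
    ("Outlook", "attach CSV to a new email"),
])
```

The point of the design is that the host only decides who does what, while each AppAgent keeps the app-specific knowledge.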

One of UFO2’s key technical innovations is its hybrid action model. Instead of just clicking buttons like a human, each AppAgent can call real APIs when they are available. This means tasks like exporting a spreadsheet or formatting text are reduced from multi-step GUI dances to a single, atomic function call. The system also speculates ahead, using a single LLM call to plan multiple steps and validating each one live against Windows UI data. This speculative multi-action execution dramatically cuts latency without risking correctness.
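Both ideas can be sketched in a few lines. The sketch below is an assumption-laden illustration rather than UFO2's code: `API_ACTIONS`, `perform`, and `run_speculative` are hypothetical names, and the "live UI state" is modeled as a plain set of control names.

```python
from typing import Callable

# Hypothetical registry of native API actions an application exposes.
API_ACTIONS: dict[str, Callable[[], str]] = {
    "export_csv": lambda: "api:export_csv",
}


def perform(action: str, gui_steps: list[str]) -> list[str]:
    """Prefer one atomic API call; otherwise fall back to GUI steps."""
    if action in API_ACTIONS:
        return [API_ACTIONS[action]()]
    return [f"gui:{step}" for step in gui_steps]


def run_speculative(
    planned: list[str], live_controls: set[str]
) -> tuple[list[str], list[str]]:
    """Run a batch of planned steps, validating each against live UI state.

    Returns (executed, remaining). Execution stops at the first step whose
    target control is not present, so the planner can be queried again.
    """
    executed = []
    for i, control in enumerate(planned):
        if control not in live_controls:
            return executed, planned[i:]
        executed.append(control)
    return executed, []


api_path = perform("export_csv", ["File", "Save As", "CSV", "OK"])
gui_path = perform("bold_text", ["Home", "Bold"])
done, left = run_speculative(["File", "Save As", "Confirm"], {"File", "Save As"})
```

Here the export collapses to one API call, while the speculative batch runs two validated steps and hands the unverified third step back for replanning.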
Isolation without interruption
CUAs typically hijack your desktop, locking the mouse and keyboard during execution. UFO2’s Picture-in-Picture (PiP) mode solves this with a virtual desktop window that runs automation tasks in parallel. The agent does its work in a sandboxed environment while you continue working in the main session. It’s seamless, secure, and uses native Windows RDP loopback to maintain session integrity.

UFO2 integrates help documentation and execution logs into a retrieval-augmented memory, enriching its prompts with procedural knowledge. Over time, this creates a self-improving agent that gets better at new tasks without retraining. Each AppAgent pulls from documentation, patch notes, and prior runs to make smarter decisions. It’s an automation system with memory, not just response generation.
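The retrieve-then-prompt loop can be illustrated with a toy example. This sketch swaps in simple word-overlap scoring in place of whatever retrieval UFO2 actually uses, and every name in it (`retrieve`, `build_prompt`, the sample memory entries) is a hypothetical stand-in.

```python
def tokenize(text: str) -> set[str]:
    return set(text.lower().split())


def retrieve(query: str, memory: list[str], k: int = 2) -> list[str]:
    """Rank memory entries by word overlap with the query (toy scoring)."""
    q = tokenize(query)
    scored = [(len(q & tokenize(doc)), doc) for doc in memory]
    scored = [item for item in scored if item[0] > 0]  # drop irrelevant entries
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(task: str, memory: list[str]) -> str:
    """Prepend the most relevant stored knowledge to the task description."""
    hints = retrieve(task, memory)
    context = "\n".join(f"- {h}" for h in hints)
    return f"Relevant knowledge:\n{context}\n\nTask: {task}"


# Memory mixes help-document snippets with notes from earlier runs.
memory = [
    "To export a sheet use File > Save As and choose CSV",
    "Previous run: bold text via Home > Bold succeeded",
    "Pivot tables are created from the Insert tab",
]
prompt = build_prompt("export the sheet as CSV", memory)
```

Appending a note to `memory` after each run is what lets such a system improve without retraining: the model stays fixed, but the context it sees gets richer.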
In head-to-head benchmarks against OpenAI’s Operator and other top CUAs, UFO2 consistently comes out ahead. On the OSWorld-W benchmark, UFO2 reaches a 32.7% success rate using the o1 model, more than doubling Operator’s 14.3%. Its speculative planning reduces action steps by up to 50%. Hybrid control detection (combining UIA APIs and vision parsing) recovers over 25% of previously failed interactions. Simply put, UFO2 isn’t just smarter; it’s systemically better.
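One plausible way to combine the two control sources is to trust the accessibility tree first and add vision detections only where they cover something new. The sketch below assumes exactly that, using an intersection-over-union (IoU) threshold to spot duplicates; the `Control` type, `merge_controls`, and the 0.5 threshold are illustrative assumptions, not details from the paper.

```python
from dataclasses import dataclass

Box = tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class Control:
    name: str
    box: Box
    source: str  # "uia" or "vision"


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned rectangles."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, right - left) * max(0, bottom - top)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def merge_controls(
    uia: list[Control], vision: list[Control], threshold: float = 0.5
) -> list[Control]:
    """Keep all UIA controls; add vision controls that overlap none of them."""
    merged = list(uia)
    for v in vision:
        if all(iou(v.box, u.box) < threshold for u in uia):
            merged.append(v)
    return merged


uia = [Control("Save", (0, 0, 100, 30), "uia")]
vision = [
    Control("Save (detected)", (2, 1, 101, 31), "vision"),  # duplicate of Save
    Control("CustomIcon", (200, 0, 230, 30), "vision"),     # invisible to UIA
]
merged = merge_controls(uia, vision)
```

Custom-drawn controls that expose no accessibility metadata are the cases a fallback like this is meant to recover.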
Everything is an agent now
Extensibility is built in. UFO2 allows third-party tools, including other CUAs like Operator, to be wrapped as AppAgents. This means you can integrate specialized copilots or proprietary automation backends into the UFO2 ecosystem without retraining or rewriting code. It also supports a client-server architecture for enterprise deployment, keeping orchestration centralized and client devices light.
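Wrapping an external tool as an agent is essentially the adapter pattern. The following sketch shows the shape of it under assumed names: `AgentLike`, `WrappedToolAgent`, `dispatch`, and the `legacy_rpa_bot` backend are all hypothetical and do not reflect UFO2's real interfaces.

```python
from typing import Callable, Protocol


class AgentLike(Protocol):
    """Minimal interface a coordinator might expect from any agent."""
    name: str

    def run(self, subtask: str) -> str: ...


class WrappedToolAgent:
    """Adapter exposing an arbitrary callable through the agent interface."""

    def __init__(self, name: str, backend: Callable[[str], str]) -> None:
        self.name = name
        self._backend = backend

    def run(self, subtask: str) -> str:
        # Delegate unchanged: the backend needs no retraining or rewriting.
        return self._backend(subtask)


def legacy_rpa_bot(instruction: str) -> str:
    # Stand-in for a proprietary automation backend or another CUA.
    return f"rpa-bot handled '{instruction}'"


def dispatch(agent: AgentLike, subtask: str) -> str:
    return f"{agent.name}: {agent.run(subtask)}"


wrapped = WrappedToolAgent("LegacyRPA", legacy_rpa_bot)
output = dispatch(wrapped, "reconcile invoices")
```

Because the coordinator depends only on the small interface, a new backend can be plugged in by writing one thin wrapper.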
The paper outlines future goals, including cross-platform compatibility with macOS and Linux via analogous accessibility APIs, faster responses via smaller LLMs, and improved reasoning from dedicated GUI-interaction datasets. But even in its current state, UFO2 represents a new baseline for desktop automation. It’s open-source, already outperforming commercial systems, and brings a new level of modularity, reliability, and intelligence to human-computer interaction.
For anyone building the next generation of intelligent agents, or just tired of brittle scripts, UFO2 is available on GitHub along with its documentation.