OpenAI Codex Desktop Agent: From Sandbox to Mac Control in 6 Weeks

Six weeks ago, OpenAI Codex ran code inside a sandbox. It could write functions, fix bugs, and generate tests, all within an isolated container that never touched the operating system. On May 21, Codex gained the ability to control a Mac even while its screen is locked. It does this through an Apple authorization plug-in that temporarily lifts the lock screen, runs tasks with its own cursor, and relocks the machine the moment it detects local keyboard or pointer input.

That progression, from sandbox to full desktop agent, happened across four updates spanning 35 days. It is arguably the fastest product transformation in the AI agent race so far, and it redraws the competitive map for every company selling coding tools, desktop automation, or enterprise AI.

Four Updates, 35 Days

On April 16, OpenAI shipped what it called “Codex for (almost) everything.” The update added Background Computer Use: the ability to operate macOS applications using mouse clicks, keyboard input, and screen reading, with no API required. Codex could now open Figma, navigate Jira, adjust system settings, and reproduce GUI bugs that no terminal command could reach. The same release shipped 90+ plugin integrations through the Model Context Protocol, covering CircleCI, GitLab, Microsoft Office, and video editing tools.

Four days later, on April 20, OpenAI released Chronicle. The feature captures periodic screenshots, extracts text via OCR, and stores memories as local Markdown files. It gives Codex what OpenAI calls “ambient memory,” a running awareness of what appears on screen over time. The documentation is frank about the tradeoffs: Chronicle burns through rate limits quickly, increases prompt injection risk, and stores memories unencrypted on disk. Only Pro subscribers on Apple Silicon Macs can enable it.
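OpenAI has not published Chronicle's internals. To make the described pipeline concrete, here is a minimal sketch of one capture tick. It assumes screenshots arrive as bytes and that an OCR step returns text. The function record_memory, the one-Markdown-file-per-day layout, and the injected capture and ocr functions are all hypothetical. The comment on the write call marks where the unencrypted-storage tradeoff comes from.

```python
import datetime as dt
import tempfile
from pathlib import Path
from typing import Callable


def record_memory(
    capture: Callable[[], bytes],
    ocr: Callable[[bytes], str],
    memory_dir: Path,
    now: Callable[[], dt.datetime] = dt.datetime.now,
) -> Path:
    """Run one capture tick: screenshot -> OCR -> append to a daily Markdown file.

    capture and ocr are hypothetical stand-ins for a screen grabber and an OCR
    engine; a scheduler would call this function on a fixed interval.
    """
    stamp = now()
    text = ocr(capture()).strip()
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / f"{stamp:%Y-%m-%d}.md"
    # Written as plain UTF-8: anything visible on screen lands on disk unencrypted.
    with path.open("a", encoding="utf-8") as f:
        f.write(f"## {stamp:%H:%M:%S}\n\n{text}\n\n")
    return path


# Usage with fake capture/OCR functions and a fixed clock.
fixed_time = dt.datetime(2026, 4, 20, 9, 30, 0)
with tempfile.TemporaryDirectory() as tmp:
    note_path = record_memory(
        capture=lambda: b"<png bytes>",
        ocr=lambda image: "Jira: PROJ-123 is open\n",
        memory_dir=Path(tmp),
        now=lambda: fixed_time,
    )
    note_name = note_path.name
    note_text = note_path.read_text(encoding="utf-8")

print(note_name)
print(note_text)
```

Every tick feeds on-screen text into the agent's context. This also suggests why the documentation ties the feature to faster rate-limit consumption and a larger prompt injection surface.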

On May 14, Codex arrived on mobile. The ChatGPT app on iOS and Android now works as a remote control for a Codex session running on a paired Mac. Users can approve or reject commands, start new tasks, and monitor live output from their phones, while all code execution stays on the Mac. The feature shipped in preview across all ChatGPT plans, including free.
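The remote-control split, in which the phone approves and the Mac executes, can be pictured as an approval queue. This is a hypothetical sketch, not OpenAI's protocol. ApprovalQueue and its methods are invented for illustration, and the network link between phone and Mac is left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ApprovalQueue:
    """Commands proposed by the agent on the Mac, awaiting a decision from the phone."""

    commands: dict[int, str] = field(default_factory=dict)
    decisions: dict[int, Decision] = field(default_factory=dict)
    next_id: int = 1

    def propose(self, command: str) -> int:
        # Runs on the Mac: the agent queues a command instead of executing it.
        command_id = self.next_id
        self.next_id += 1
        self.commands[command_id] = command
        self.decisions[command_id] = Decision.PENDING
        return command_id

    def decide(self, command_id: int, approve: bool) -> None:
        # Triggered by the phone; it only sends a verdict and never runs the command.
        if self.decisions.get(command_id) is not Decision.PENDING:
            raise ValueError(f"command {command_id} is not awaiting a decision")
        self.decisions[command_id] = Decision.APPROVED if approve else Decision.REJECTED

    def approved_commands(self) -> list[str]:
        # What the Mac-side executor is allowed to run.
        return [
            cmd for cid, cmd in self.commands.items()
            if self.decisions[cid] is Decision.APPROVED
        ]


queue = ApprovalQueue()
test_cmd = queue.propose("npm test")
push_cmd = queue.propose("git push --force")
queue.decide(test_cmd, approve=True)
queue.decide(push_cmd, approve=False)
print(queue.approved_commands())  # ['npm test']
```

Keeping the executor on the Mac means the phone needs no sandbox of its own. It only has to transmit a verdict per command.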

The May 21 update added the locked-screen capability. An Apple authorization plug-in lets Codex temporarily unlock the Mac, run its tasks, and relock the machine as soon as it detects local keyboard or pointer input. All connected displays stay covered during the temporary unlock. The feature explicitly cannot automate Terminal apps, system admin prompts, or Codex itself. It launched with a geographic restriction that excludes the EEA, the UK, and Switzerland.
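The behavior described here amounts to a small state machine. The sketch below is a hypothetical model, not the plug-in's code. LockedScreenSession, its methods, and the target labels are invented, and real input detection would come from the operating system rather than from a method call.

```python
from enum import Enum, auto


class LockState(Enum):
    LOCKED = auto()
    TEMPORARILY_UNLOCKED = auto()


# The three targets the feature is documented as unable to automate.
# The string labels are hypothetical identifiers chosen for this sketch.
BLOCKED_TARGETS = frozenset({"terminal", "system_admin_prompt", "codex"})


class LockedScreenSession:
    def __init__(self) -> None:
        self.state = LockState.LOCKED
        self.displays_covered = False

    def begin_task(self) -> None:
        # Lift the lock for the agent while keeping every display covered.
        self.state = LockState.TEMPORARILY_UNLOCKED
        self.displays_covered = True

    def may_automate(self, target: str) -> bool:
        """Allow an action only while unlocked for the agent and never on a blocked target."""
        return self.state is LockState.TEMPORARILY_UNLOCKED and target not in BLOCKED_TARGETS

    def on_local_input(self) -> None:
        # Any local keyboard or pointer event means a human is present: relock at once.
        self.relock()

    def relock(self) -> None:
        # Also the path taken when the agent finishes its task.
        self.state = LockState.LOCKED
        self.displays_covered = False  # the regular lock screen takes over


session = LockedScreenSession()
session.begin_task()
print(session.may_automate("figma"))     # True
print(session.may_automate("terminal"))  # False
session.on_local_input()
print(session.may_automate("figma"))     # False
```

Checking the denylist on every action, rather than once at unlock time, keeps the blocked targets off-limits for the whole session.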

The Adoption Numbers

OpenAI reports more than 3 million weekly active Codex developers as of April 2026. Codex CLI downloads on npm grew 177x in 12 months, from 82,000 in April 2025 to 14.53 million in March 2026. A broader industry survey puts daily AI coding agent usage at 73% among professional developers. If that sample is representative, nearly three in four professional developers now use an AI agent daily as part of their standard workflow.
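A quick check of the download figures confirms the stated multiple. It uses only the two numbers given in this section.

```python
downloads_apr_2025 = 82_000
downloads_mar_2026 = 14_530_000

growth = downloads_mar_2026 / downloads_apr_2025
print(f"{growth:.1f}x")  # 177.2x
```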

GPT-5.5, released April 23, became the default model inside Codex. It scores 82.7% on Terminal-Bench 2.0 for agentic coding tasks and supports a 1-million-token context window through the API, with a 400,000-token window inside Codex.

Headline pricing has stayed stable throughout the transformation. The base plan runs $20/month and the premium tier runs $100/month, although OpenAI did introduce a dedicated Pro tier for heavier Codex users. As one analyst put it: “The price war has not started. The feature race has.”

Desktop Control Collapses Two Product Categories

Before April 16, coding agents and desktop agents were separate product categories. Cursor, GitHub Copilot, and Codex competed for code-writing tasks. Claude Cowork and Anthropic’s Computer Use competed for desktop automation. Those boundaries no longer hold.

Codex now writes code, controls desktop applications, captures screen context, runs background tasks on a locked machine, and takes commands from a phone. That makes it a direct competitor to Claude Cowork, which launched as Anthropic’s desktop agent for general productivity workflows. It also keeps competing with Claude Code on coding, where Claude Code leads on SWE-bench at approximately 72.5%, compared with roughly 49% for Codex.

The competitive picture is more nuanced than any single benchmark captures. OpenAI claims Codex is 3x more token-efficient on comparable tasks. If per-token prices are similar, that would translate into a meaningful cost advantage at enterprise scale. Claude Code holds the lead on large-codebase reasoning, partly because of its context window advantage for multi-file refactors. On Terminal-Bench 2.0, which measures agentic coding rather than isolated bug fixes, OpenAI leads: an earlier Codex model scored 77.3% against Claude Code’s 65.4%, and GPT-5.5 pushes OpenAI’s figure to 82.7%.
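To see why token efficiency matters at scale, and where the claim stops holding, here is some back-of-envelope arithmetic. Every input (task volume, tokens per task, price per million tokens) is a hypothetical placeholder. Only the 3x ratio comes from the claim above.

```python
def monthly_cost(tasks: int, tokens_per_task: float, usd_per_million_tokens: float) -> float:
    """Token spend for a month of agent tasks at a flat per-token price."""
    return tasks * tokens_per_task * usd_per_million_tokens / 1_000_000


# Hypothetical placeholders, not published usage or pricing figures.
TASKS_PER_MONTH = 50_000
BASELINE_TOKENS_PER_TASK = 300_000
EFFICIENT_TOKENS_PER_TASK = BASELINE_TOKENS_PER_TASK / 3  # the claimed 3x efficiency
PRICE_PER_MILLION = 5.0  # same assumed price for both tools

baseline = monthly_cost(TASKS_PER_MONTH, BASELINE_TOKENS_PER_TASK, PRICE_PER_MILLION)
efficient = monthly_cost(TASKS_PER_MONTH, EFFICIENT_TOKENS_PER_TASK, PRICE_PER_MILLION)
print(baseline, efficient)  # 75000.0 25000.0

# The advantage disappears if the efficient tool's per-token price is 3x higher.
print(monthly_cost(TASKS_PER_MONTH, EFFICIENT_TOKENS_PER_TASK, 3 * PRICE_PER_MILLION))  # 75000.0
```

Cost scales linearly with tokens and with price, so a token-count advantage becomes a cost advantage only when per-token prices are comparable.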

For enterprise buyers evaluating AI coding tools, the question has shifted. In 2025, they compared code-generation accuracy. In 2026, they’re evaluating autonomous desktop agents that can unlock machines and operate GUI applications. The procurement process for these tools hasn’t caught up to what the tools can actually do.

The Security Surface Nobody Has Audited

Codex’s locked-screen feature solves a real developer annoyance. Before it existed, engineers relied on dummy display dongles and caffeinate commands in Terminal to keep Macs from sleeping during long-running agent tasks. The authorization plug-in replaces those workarounds with a cleaner, sanctioned path.

But the feature introduces a new attack surface: an AI agent that can temporarily unlock a computer, operate applications with its own cursor, and relock when finished. OpenAI’s safeguards are substantive. Short-lived authorization windows limit access duration. Automatic relocking triggers on any local input. Terminal access is blocked. Admin prompt automation is blocked. All connected displays stay covered.
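OpenAI has not published how long its authorization windows last. As a hypothetical sketch of the first safeguard only, a time-boxed grant can be modeled as an expiry check against a monotonic clock. AuthorizationWindow and the 30-second TTL are invented for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AuthorizationWindow:
    """A time-boxed grant: once it lapses, the agent must obtain a new one."""

    ttl_seconds: float
    clock: Callable[[], float] = time.monotonic
    issued_at: Optional[float] = None

    def grant(self) -> None:
        self.issued_at = self.clock()

    def is_valid(self) -> bool:
        return self.issued_at is not None and self.clock() - self.issued_at < self.ttl_seconds

    def revoke(self) -> None:
        # e.g. on local keyboard or pointer input, before the window would expire
        self.issued_at = None


# A fake clock makes expiry deterministic; the 30-second TTL is an arbitrary choice.
fake_now = [0.0]
window = AuthorizationWindow(ttl_seconds=30, clock=lambda: fake_now[0])
window.grant()
fake_now[0] = 10.0
print(window.is_valid())  # True
fake_now[0] = 31.0
print(window.is_valid())  # False
```

A monotonic clock is used because wall-clock time can jump when the system clock is adjusted. Such a jump could otherwise stretch or cut short a grant.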

What’s missing is independent verification. No third-party audit of the authorization plug-in has been published. No enterprise security team has publicly reviewed the architecture. The geographic exclusion of the EEA, UK, and Switzerland at launch may reflect regulatory caution as much as staged rollout logistics.

For anyone running AI agents in production environments, the precedent matters more than the specific implementation. Every major AI company is building agents that operate computers autonomously. Anthropic ships Computer Use with similar screen-control capabilities. Google’s Gemini Intelligence is turning Android into an agent layer. The locked-screen capability is the logical next step in a trajectory every frontier lab shares. The question is whether security infrastructure for autonomous desktop agents can keep pace with their capabilities.

Three Things to Watch

First, Windows support. Codex’s desktop control is macOS-only, and OpenAI lists Windows as “coming soon.” When Windows support arrives, Codex’s reach extends to the roughly 72% of enterprise desktops that run Windows, and the competitive dynamics shift again.

Second, enterprise security reviews. IT departments that approved AI coding assistants in 2025 evaluated autocomplete accuracy and data handling. The 2026 procurement question is whether to grant an AI agent the ability to operate a locked workstation autonomously. Most enterprise security frameworks don’t have a category for that yet.

Third, the convergence endgame. GPT-5.5 was marketed as OpenAI’s first agent model. Codex is the product where that model meets the operating system. If the current trajectory holds, coding agents, desktop agents, and enterprise workflow agents will collapse into a single product category before the end of 2026. The companies that own the operating system integration layer (Apple, Google, Microsoft) will have the final say on how much access these agents actually get.

OpenAI moved faster than anyone expected. The rest of the stack, from security tooling to enterprise policies to regulatory frameworks, now has to decide how quickly it wants to follow.
