How to Use GPT-5.5: Driving OpenAI's Agent Model

GPT-5.5 is not a chatbot that learned some agent tricks. OpenAI built it the other way around: an agentic system first, a chat model second. When it launched on April 23, 2026, the framing was unusually direct for the company, and it changes how you should use the thing. If you prompt GPT-5.5 the way you prompted a 2024 chatbot, you will get a fraction of what it can do.

The model is designed to take a messy, multi-part task and run with it: plan the work, use tools, browse the web, write and debug code, navigate ambiguity, check its own results, and keep going until the job is finished. Your role shifts from typing each instruction to writing a good brief and reviewing a finished result. Here is how to drive it well.

Run it where it was meant to run: Codex

For real agentic work, Codex is the primary interface to GPT-5.5, not the chat box. Codex runs the model in a sandboxed environment with access to files, a terminal, and test runners, which is what lets it actually do work instead of just describing it. OpenAI turned Codex into a desktop agent earlier this spring, and it is now the front door for the model’s strongest capabilities.

The everyday loop looks like this. You select GPT-5.5 in the model picker, write a clear task prompt, and let it work through multiple steps on its own. When it finishes, you review the diff and approve the changes. The model holds the whole job in a 1M-token context window, so it can reason across a large codebase or document set rather than one file at a time.

If you only ever use GPT-5.5 through a plain chat window, you are using the least capable version of it. The sandbox is the point.

Use Plan Mode before any non-trivial job

The single habit that most improves results is turning on Plan Mode for anything beyond a one-liner.

Plan Mode inserts a planning step before execution. GPT-5.5 reads your task, surveys the relevant code or materials, and produces a structured plan: what it intends to change, in what order, and what it will validate at each step. You review that plan and approve it before the agent touches anything.

This matters for two reasons. First, it surfaces misunderstandings while they are cheap to fix, before the model has made fifty edits based on a wrong assumption. Second, it gives you a checkpoint to redirect. In practice I treat the plan the way I would treat a junior engineer’s proposed approach in a stand-up: read it, catch the one thing they got wrong about the system, and then let them go. Catching that one thing up front saves the entire rework cycle.

Set reasoning effort to match the task

GPT-5.5 exposes reasoning-effort levels, and using them well is how you balance speed, cost, and quality. The standard model offers low, medium, high, and xhigh, plus a non-reasoning mode; there is also GPT-5.5 Pro, a higher-compute variant that pushes harder on the longest-horizon tasks.

The usage guidance from OpenAI maps cleanly to real work. Use low for efficient, lightweight reasoning where speed matters. Use medium as your balanced default. Use high for complex agentic tasks that need hard reasoning and where you can tolerate more latency. Reserve xhigh for the hardest asynchronous jobs, the ones you kick off and walk away from. The mistake is leaving it pinned high “to be safe,” which just burns time and tokens on work that medium would have nailed.
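One way to stop the setting drifting to high by default is to write the guidance down as a small lookup. The sketch below is illustrative only: `pick_effort` and its three flags are hypothetical names, not part of any OpenAI SDK, and the level strings are simply the ones listed above.

```python
from enum import Enum


class Effort(str, Enum):
    """The four reasoning-effort levels named in the text."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    XHIGH = "xhigh"


def pick_effort(*, complex_agentic: bool = False,
                fire_and_forget: bool = False,
                speed_matters: bool = False) -> Effort:
    """Hypothetical helper that mirrors the guidance above."""
    if complex_agentic and fire_and_forget:
        # Hardest asynchronous jobs: kick off and walk away.
        return Effort.XHIGH
    if complex_agentic:
        # Hard reasoning where extra latency is acceptable.
        return Effort.HIGH
    if speed_matters:
        # Lightweight work where a fast answer matters most.
        return Effort.LOW
    # Balanced default for everything else.
    return Effort.MEDIUM


print(pick_effort().value)
print(pick_effort(complex_agentic=True, fire_and_forget=True).value)
```

The useful property is that medium is the fall-through case: a task has to earn high or xhigh by being both complex and agentic, rather than getting it "to be safe."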

Prompt for outcomes, not procedures

This is the biggest adjustment, and the one people resist most.

Older models worked best when you spelled out the steps. GPT-5.5 works best when you describe the destination and let it find the path. As the agentic-coding guides put it, you should describe the expected outcome, the success criteria, the constraints it must preserve, the rules for what counts as evidence, and the shape of the output you want. Then stop. Avoid step-by-step process guidance unless the exact path genuinely matters.

A weak prompt: “Open the auth file, find the login function, add a rate limiter, then update the tests, then check the config.” You have just boxed the model into your mental model of the code, which may be wrong.

A strong prompt: “Add rate limiting to login. It should block more than five attempts per minute per IP, must not change the existing API response shape, and all current tests plus new ones for the limit should pass. Show me the plan first.” That tells it what done looks like and what it must not break, and lets it figure out the how.
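If you write briefs like this often, it can help to give them a fixed shape so no element gets forgotten. The following is a minimal sketch in plain Python, not an official template: the `Brief` class and its field names are assumptions made for illustration, and the rendered text is just a string you would paste in as the task prompt.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    """Hypothetical container for an outcome-based task brief."""
    outcome: str
    success_criteria: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)
    output_shape: str = ""

    def render(self) -> str:
        """Render the brief as plain text, skipping empty sections."""
        sections = [
            ("Outcome", [self.outcome]),
            ("Success criteria", self.success_criteria),
            ("Constraints", self.constraints),
            ("Evidence", self.evidence),
            ("Output", [self.output_shape] if self.output_shape else []),
        ]
        lines: list[str] = []
        for title, items in sections:
            if not items:
                continue
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


# The strong prompt from the text, restated in this shape.
brief = Brief(
    outcome="Add rate limiting to login.",
    success_criteria=[
        "Block more than five attempts per minute per IP.",
        "All current tests pass, plus new tests for the limit.",
    ],
    constraints=["Do not change the existing API response shape."],
    output_shape="Show me the plan first.",
)
print(brief.render())
```

Note what the structure leaves out: there is no field for steps. The evidence section is empty here because the example prompt does not state evidence rules; when the task needs them, that is where they go.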

The best use cases for this style are exactly the ones OpenAI built it for: multi-file refactors, implementing a feature across a full stack, generating test suites, and running structured migrations. Those are jobs with a clear destination and a messy middle, which is precisely what an agent-first model is for.

Review the work like a manager, not a proofreader

Because GPT-5.5 does long stretches of work autonomously, your review is where quality is won or lost. Do not skim the summary and approve. Read the diff, run the tests it claims pass, and check that it respected the constraints you set.

The failure mode is trust drift: the first ten agent runs look great, you start rubber-stamping, and the eleventh quietly changes something it should not have. Treat every run’s output as a pull request from a fast, capable contributor whose work you still review. The whole reason to give it outcome-based briefs with explicit success criteria is so that your review has something concrete to check against.
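Part of that check can be mechanical. As a rough sketch, assuming you have the list of files a run touched (for example from `git diff --name-only`) and you told the agent which directories were in scope, a few lines of Python can flag anything outside them before you start reading the diff. The function name and the file paths below are hypothetical.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_files: list[str], allowed_dirs: list[str]) -> list[str]:
    """Return the changed files that are not inside any allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    flagged = []
    for name in changed_files:
        path = PurePosixPath(name)
        # A file is in scope if an allowed directory is one of its ancestors.
        if not any(a in path.parents for a in allowed):
            flagged.append(name)
    return flagged


# Hypothetical result of an agent run on the login rate-limiting task.
changed = [
    "src/auth/login.py",
    "tests/auth/test_login.py",
    "config/deploy.yaml",
]
print(out_of_scope(changed, ["src/auth", "tests/auth"]))
# prints ['config/deploy.yaml']
```

A flagged file is not automatically wrong, but it is exactly the kind of change that slips through once you start rubber-stamping, so it is the first thing to read.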

What still does not belong on it

Agent-first does not mean use-it-for-everything. Quick factual questions, short drafts, and anything you will read once and verify in seconds do not need the agent loop; the overhead of planning and tool use is wasted on them. And work where you need tight control over every step, where the exact procedure is the requirement rather than the outcome, fights the model’s strengths. For that, you are better off scripting it deterministically.
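Those two exclusions amount to a quick triage before you open Codex at all. A minimal sketch, with hypothetical names and labels chosen purely for illustration:

```python
def route(*, verifiable_in_seconds: bool = False,
          exact_procedure_required: bool = False) -> str:
    """Hypothetical triage: where should this task go?"""
    if exact_procedure_required:
        # The procedure itself is the requirement: script it deterministically.
        return "script"
    if verifiable_in_seconds:
        # Quick facts and short drafts do not need planning or tool use.
        return "chat"
    # Clear destination, messy middle: hand it to the agent.
    return "agent"


print(route(verifiable_in_seconds=True))
print(route())
```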

GPT-5.5 earns its keep on the jobs that used to require a person to babysit a sequence of tools for an hour. Hand it those, brief it on the outcome, check the plan, and review the result. Driven that way, it does in one pass what used to be an afternoon of context-switching, and it is the clearest sign yet that the line between AI assistant and autonomous agent has effectively dissolved in the flagship products.

If you want the cross-provider view of when to use GPT versus Claude or Gemini for a given task, the routing guide for 2026 lays out the full decis
