Anthropic Dreaming: What Happens When AI Agents Learn From Their Own Mistakes




Anthropic dreaming is the kind of feature that sounds like marketing until you see what it actually does. On May 6, 2026, at the Code with Claude developer event in San Francisco, Anthropic announced a set of upgrades to Claude Managed Agents that could shift how enterprises think about AI agent deployment. The headline feature, called “dreaming,” lets agents review their own past sessions, extract patterns from failures, and carry institutional knowledge forward without any changes to model weights. Harvey, the legal AI company, reported a 6x increase in task completion rates. Wisedocs cut document review time by 50%. Those numbers deserve scrutiny, and they also deserve attention.

This is not a model release. It is an infrastructure play. And for anyone running AI agents in production, it changes the calculus on what “good enough” agent performance looks like.

What Anthropic Dreaming Actually Does

The concept is deceptively simple. When a Claude Managed Agent finishes a session, it generates memories: preferences it learned, tools it used, workarounds it discovered. That memory system launched earlier this year, and it works within and across individual sessions. Dreaming operates one level above that.

Dreaming is a scheduled background process that runs between agent sessions. It reads the agent’s accumulated memory stores alongside transcripts from past sessions, then produces a reorganized memory: duplicates merged, stale entries replaced, contradictions resolved, and new insights surfaced. The key word is “scheduled.” This is not something that happens in real time during a conversation. It runs on a cadence you control, the way a database runs compaction or a search engine rebuilds its index.

The output is plain text. The agent writes its learnings as notes and structured “playbooks” that future sessions can reference. No weight updates. No fine tuning. No black box. You can read every insight the agent extracted, edit them, or delete them before the next session uses them.
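Anthropic has not published the playbook format, so the entry below is purely illustrative; the tool behavior, client preference, dates, and counts are all invented for the example.

```
PLAYBOOK: Handling scanned PDF exhibits (illustrative, invented details)
Learned from: 14 sessions, 2026-04-02 to 2026-04-28
- Plain text extraction fails on scanned exhibits; run OCR first, then extract.
- Client X prefers exhibit summaries as a numbered list, five items maximum.
- The internal document API rejects filenames containing spaces; use underscores.
Curation: 3 duplicate notes merged, 1 stale note removed (it referenced a retired tool).
```

The point is that every line is human readable, so you can audit or edit it before the next session loads it.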

That distinction matters enormously for enterprise adoption. Fine tuning a model is expensive, slow, and hard to audit. Dreaming is cheap, fast, and fully observable. You can review every change before it lands, or let it run automatically if you trust the agent’s judgment. That is a knob most enterprises will appreciate having.

How Dreaming Works Under the Hood

The mechanics break into three phases.

Phase 1: Session Review. The dreaming process reads transcripts from completed agent sessions. It identifies recurring patterns: which tools the agent reached for, which approaches failed, which workarounds it invented on the fly. If the agent tried three different methods to parse a document format and the third one worked, dreaming notices that pattern.

Phase 2: Cross-Session Synthesis. This is where single-session memory falls short. A single session knows what happened during that session. Dreaming sees across sessions. It can detect that five different agent runs all struggled with the same API authentication flow, or that agents working for different team members independently converged on the same workflow for a common task. That cross-session visibility is the core differentiator.

Phase 3: Memory Curation. The dreaming process does not just add new memories. It actively prunes. Outdated entries get replaced. Redundant notes get merged. The memory store stays high signal rather than growing into an unstructured pile of context. Anthropic compares this to how human memory consolidation works during sleep, and the analogy is more apt than most AI metaphors: the value is not in recording everything, but in deciding what to keep.
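Dreaming is a managed feature and Anthropic has not published its internals, but the three phases map onto a loop you can picture in code. A minimal sketch, assuming session transcripts sit in a local directory, memory is a single plain-text notes file, and the standard Messages API performs the synthesis; the prompts, paths, and model name are assumptions, not Anthropic's implementation.

```python
from pathlib import Path
import anthropic

client = anthropic.Anthropic()         # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-20250514"     # placeholder model name

TRANSCRIPT_DIR = Path("sessions")      # assumed location of completed session transcripts
MEMORY_FILE = Path("memory_notes.md")  # assumed plain-text memory store

def consolidate() -> None:
    """One 'dreaming' pass: read transcripts plus current notes, write back curated notes."""
    transcripts = "\n\n---\n\n".join(p.read_text() for p in sorted(TRANSCRIPT_DIR.glob("*.txt")))
    current_notes = MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""

    prompt = (
        "You maintain an agent's long-term notes.\n\n"
        f"Current notes:\n{current_notes}\n\n"
        f"Recent session transcripts:\n{transcripts}\n\n"
        "Rewrite the notes: merge duplicates, drop stale or contradicted entries, and add any "
        "recurring patterns (tools that worked, approaches that failed). Return only the notes."
    )
    msg = client.messages.create(model=MODEL, max_tokens=2048,
                                 messages=[{"role": "user", "content": prompt}])
    MEMORY_FILE.write_text(msg.content[0].text)  # the next session loads this file into context

if __name__ == "__main__":
    consolidate()  # in the managed product this runs on a schedule rather than as your own script
```

The shape of the work is the same either way: read, synthesize, prune, write back.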

For developers, the implementation runs through the Claude Platform API. You configure a dreaming schedule, point it at the relevant memory stores and session logs, and let it run. The API documentation is already live.
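Anthropic's documentation is the authoritative source for the actual schema. Purely to show the shape of that configuration, a request might look like the sketch below, where the endpoint path, payload fields, and identifiers are invented for illustration rather than taken from the docs.

```python
import os
import requests

# Real Anthropic API headers; everything in the payload and the path after /v1
# is a hypothetical guess at what a dreaming schedule configuration could look like.
headers = {
    "x-api-key": os.environ["ANTHROPIC_API_KEY"],
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}
payload = {
    "agent_id": "agent_123",                    # placeholder managed-agent ID
    "schedule": "0 3 * * *",                    # assumed cron-style cadence: nightly at 03:00
    "memory_stores": ["store_legal_drafting"],  # hypothetical memory store identifier
    "session_window_days": 7,                   # assumed knob: how far back to read transcripts
    "mode": "review",                           # assumed: hold memory changes for human approval
}
resp = requests.post("https://api.anthropic.com/v1/agents/dreaming/schedules",  # guessed path
                     headers=headers, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())
```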

The Three Other Features That Ship Alongside Dreaming

Dreaming grabbed the headlines, but Anthropic shipped three other capabilities at the same event that matter just as much for production deployments.

Outcomes (Public Beta)

Outcomes lets you define what success looks like before the agent starts working. You write a rubric describing the expected output. A separate Claude instance, running in its own context window, evaluates the agent’s work against that rubric. If the output fails, the grader identifies exactly what needs to change, and the agent takes another pass. The loop continues until the rubric is met.

This is functionally an automated QA layer. In Anthropic’s internal benchmarks, outcomes improved task success rates by up to 10 percentage points, with the largest gains on the hardest tasks. For enterprise teams that currently have humans reviewing every agent output, this could cut review overhead significantly.
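Outcomes is a managed capability, but the generate, grade, retry loop it implements can be approximated with two plain Messages API calls. In the sketch below the rubric, prompts, retry cap, and model name are illustrative assumptions rather than Anthropic's implementation.

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"  # placeholder model name

RUBRIC = (
    "The summary must cite every document by name, stay under 300 words, "
    "and flag any missing signatures."  # illustrative rubric
)

def generate(task: str, feedback: str = "") -> str:
    """Worker call: produce or revise the output."""
    prompt = task if not feedback else f"{task}\n\nRevise to address this feedback:\n{feedback}"
    msg = client.messages.create(model=MODEL, max_tokens=1024,
                                 messages=[{"role": "user", "content": prompt}])
    return msg.content[0].text

def grade(output: str) -> str:
    """Grader call in a fresh context: returns 'PASS' or a list of required changes."""
    msg = client.messages.create(
        model=MODEL, max_tokens=512,
        messages=[{"role": "user", "content":
                   f"Rubric:\n{RUBRIC}\n\nCandidate output:\n{output}\n\n"
                   "Reply with exactly PASS if every criterion is met; "
                   "otherwise list what must change."}])
    return msg.content[0].text.strip()

task = "Summarize the attached contract set."  # placeholder task
output = generate(task)
for _ in range(3):                             # cap retries so the loop always terminates
    verdict = grade(output)
    if verdict == "PASS":
        break
    output = generate(task, feedback=verdict)
```

The design choice that matters is the grader running in its own context, so it judges the output against the rubric rather than against the conversation that produced it.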

Multiagent Orchestration (Public Beta)

A lead agent can now break a project into pieces and delegate to up to 20 parallel specialist agents. Each specialist works independently on its assigned task while sharing a common file system. The lead agent coordinates, merges results, and handles conflicts.

This is the pattern that multi-agent systems researchers have been describing for years, now available as a managed service. The 20-agent ceiling is conservative but practical. Most real workflows do not need more than a handful of specialists; the bottleneck has been orchestration reliability, not parallelism.
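The managed service handles the delegation, but the underlying fan-out pattern is straightforward to picture. A minimal sketch, assuming each specialist is a single Messages API call and the shared file system is a local directory; the subtasks and model name are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"  # placeholder model name
WORKSPACE = Path("workspace")       # stand-in for the shared file system
WORKSPACE.mkdir(exist_ok=True)

subtasks = [                        # in practice, produced by the lead agent's planning step
    "Draft the indemnification clause.",
    "Draft the termination clause.",
    "Draft the confidentiality clause.",
]

def run_specialist(i: int, subtask: str) -> Path:
    """One specialist: a single call that writes its result into the shared workspace."""
    msg = client.messages.create(model=MODEL, max_tokens=1024,
                                 messages=[{"role": "user", "content": subtask}])
    out = WORKSPACE / f"subtask_{i}.md"
    out.write_text(msg.content[0].text)
    return out

# Fan out to parallel specialists (the managed feature caps this at 20).
with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
    paths = list(pool.map(run_specialist, range(len(subtasks)), subtasks))

# The lead agent would then read these files, merge them, and resolve any conflicts.
merged = "\n\n".join(p.read_text() for p in paths)
print(merged)
```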

Webhooks

Managed Agents can now run for hours and notify your system when they finish, rather than requiring an open connection. This eliminates the need for custom queue infrastructure for long-running jobs. It sounds mundane, but it removes one of the biggest friction points in integrating agents into existing enterprise workflows.
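On the receiving side, a webhook is just an HTTP endpoint your system exposes. A minimal Flask sketch follows, with the payload fields assumed rather than taken from Anthropic's webhook documentation; a production handler would also verify the request signature before trusting the body.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.post("/claude/agent-complete")
def agent_complete():
    # Assumed payload shape: check the webhook docs for the real field names
    # and for how to verify that the request actually came from Anthropic.
    event = request.get_json(force=True)
    session_id = event.get("session_id")
    status = event.get("status")  # e.g. "completed" or "failed" (assumed values)
    print(f"Agent session {session_id} finished with status {status}")
    # Hand off to your own pipeline here: enqueue post-processing, notify a reviewer, etc.
    return jsonify({"ok": True}), 200

if __name__ == "__main__":
    app.run(port=8080)
```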

Together, these four features represent the most significant platform update Claude Managed Agents has received since launch.

Early Results: Harvey, Wisedocs, and the 6x Number

Harvey, the legal AI startup backed by Google Ventures and Sequoia, uses Managed Agents to coordinate complex legal work: long-form drafting, document creation, and multi-step research workflows. With dreaming enabled, their agents remember file-type workarounds and tool-specific patterns between sessions. The reported 6x improvement in completion rates did not come from a model upgrade. It came entirely from agents carrying institutional knowledge forward.

That number needs context. A 6x improvement in “completion rate” could mean different things depending on the baseline, and it is only arithmetically possible if that baseline was below roughly 17%, since anything higher would push a literal 6x past 100%. If agents were completing 10% of tasks before dreaming, 60% afterward is impressive but still means 40% of tasks fail. Harvey has not published the baseline, so the 6x figure is directional rather than definitive.

Wisedocs, which builds document quality check agents for medical records review, used the outcomes feature to grade each review against internal guidelines. They reported reviews running 50% faster while maintaining alignment with their standards. That result is more straightforward to interpret: same quality, half the time. For a company processing thousands of medical documents, that is a material cost reduction.

Both case studies share an important characteristic: these are narrow, well-defined domains where the agent performs the same type of task repeatedly. Legal drafting and medical document review are exactly the kinds of workflows where institutional memory compounds. The results may not generalize to one-off, highly variable agent tasks.

Why This Matters More Than Another Model Upgrade

The AI industry has spent three years in a cycle: new model drops, benchmarks improve, teams scramble to integrate, performance plateaus in production, new model drops again. Dreaming breaks that cycle by improving agent performance without changing the model at all.

This has three implications worth tracking.

First, it decouples performance from model releases. If your agents improve between sessions through accumulated knowledge, you are less dependent on waiting for the next model version. That changes the enterprise AI procurement calculus. You are not just buying a model; you are buying a system that gets better the more you use it.

Second, it makes agent deployment stickier. Every session your agents run on Claude builds institutional memory that is specific to your workflows, your tools, your team’s preferences. Switching to a competitor means starting that learning curve over. This is the kind of lock-in that enterprise buyers should understand clearly before committing. It is also the kind of competitive advantage that explains why Anthropic is investing heavily in the managed agents platform.

Third, it validates the “agents as employees” framing. Human employees improve through experience. They remember what worked last quarter, what the client prefers, which internal tools have quirks. Dreaming gives agents a version of that same capability. It is not consciousness. It is not general intelligence. It is pattern matching across sessions, written as auditable plain text. But for practical purposes, it means your agents on day 90 will be meaningfully better than your agents on day one.

What Enterprise Teams Should Do Right Now

If you are running AI agents in production, or evaluating platforms for agent deployment, here is what this update means practically.

Evaluate dreaming on a narrow, repetitive workflow first. The early results from Harvey and Wisedocs both involve agents performing the same category of task hundreds of times. That is where cross-session learning compounds fastest. Pick your most repetitive agent workflow and run dreaming in review mode, where you approve memory changes before they land. Measure completion rates before and after.

Test outcomes on your hardest tasks. The 10 percentage point improvement Anthropic reports is largest on difficult tasks. If you have agent workflows with high failure rates, outcomes could be the faster win while you wait for dreaming access.

Understand the lock-in tradeoff. Dreaming memories are stored as plain text, which means they are theoretically portable. But “theoretically portable” and “practically portable” are different things. The playbooks and notes are structured for Claude’s context format. Moving them to another platform would require translation, and the cross-session patterns would not transfer cleanly. Factor that into your platform decision.

Watch the pricing. Dreaming runs additional inference during the consolidation process. Anthropic has not published detailed pricing for dreaming compute. For high volume deployments, the cost of running background dreaming cycles could be significant. Ask for pricing details before committing to a rollout.

FAQ

What is Anthropic dreaming?
Anthropic dreaming is a scheduled background process for Claude Managed Agents that reviews past sessions, extracts patterns, and reorganizes the agent’s memory stores so future sessions benefit from accumulated experience. It does not modify model weights; it produces plain text notes and playbooks that agents reference in subsequent sessions.

Is dreaming available to all Claude users?
Dreaming is currently in research preview and available by request through the Claude Platform. The other features announced alongside it (outcomes, multiagent orchestration, and webhooks) are in public beta and available to anyone using Managed Agents.

How is dreaming different from fine tuning?
Fine tuning modifies a model’s internal weights, which is expensive, slow, and difficult to audit. Dreaming writes learnings as readable plain text that you can review, edit, or delete. It operates on top of the model rather than inside it, making it fully observable and significantly cheaper to run.

Does dreaming work for any type of agent task?
Early results are strongest in narrow, repetitive domains like legal drafting and document review, where the same patterns recur across many sessions. Highly variable, one-off tasks are less likely to benefit because there are fewer recurring patterns for the system to extract.

What did Harvey achieve with dreaming?
Harvey reported a 6x improvement in task completion rates for its legal AI agents after enabling dreaming. The improvement came entirely from agents retaining institutional knowledge between sessions, not from any model change.

What Comes Next

Anthropic is making a bet that the next frontier in AI agent performance is not bigger models but better memory. Dreaming, outcomes, multiagent orchestration, and webhooks are infrastructure bets, not research demos. They shipped with API documentation, enterprise case studies, and public betas.

The question is whether the agent ecosystem converges on this pattern. OpenAI’s Operator and GPT-5.5 agent capabilities take a different approach. Google’s Gemini agents are building their own orchestration layer. If Anthropic’s dreaming pattern proves out at scale, expect every major platform to ship something similar within six months. If it does not, it becomes an expensive feature that adds complexity without proportional value.

For now, the early numbers are compelling enough to test. Run a pilot. Measure the delta. That is the only way to know whether dreaming works for your specific workflows, and whether the agent that remembers is worth more than the agent that forgets.

Ty Sutherland

Ty Sutherland is the Chief Editor of AI Rising Trends. Living in what he believes to be the most transformative era in history, Ty is deeply captivated by the boundless potential of emerging technologies like the metaverse and artificial intelligence. He envisions a future where these innovations seamlessly enhance every facet of human existence. With a fervent desire to champion the adoption of AI for humanity's collective betterment, Ty emphasizes the urgency of integrating AI into our professional and personal spheres, cautioning against the risk of obsolescence for those who lag behind. AI Rising Trends stands as a testament to his mission, dedicated to spotlighting the latest in AI advancements and offering guidance on harnessing these tools to elevate one's life.
