Last updated: June 2025. AI agents are moving fast — specific tool capabilities and availability may change. We review and refresh this post quarterly.
OpenAI’s Operator books a restaurant reservation. Anthropic’s Claude navigates a website, fills out a form, and sends a follow-up email — without you touching the keyboard. Devin, from Cognition AI, opens a GitHub repo, writes code, runs tests, hits errors, debugs them, and ships a pull request. These aren’t demos from a research lab’s wishlist. They’re products you can use today, right now, with a paid subscription.
Something genuinely different is happening. For the past two years, most people’s experience of AI has been a conversation — you type, it responds, you copy the output somewhere else. Useful, but fundamentally passive. AI as a very fast, very well-read answering machine.
Agents change that equation entirely. When AI gets “hands” — the ability to take actions, use tools, browse the web, write and execute code, call APIs, manage files, and chain those actions together toward a goal — the relationship between humans and software shifts in ways that are still unfolding. This post is about what that actually looks like, what’s working, what’s still broken, and what it means for anyone building, managing, or just trying to keep up.
Table of Contents
- What Is an AI Agent, Actually?
- How Agents Work Under the Hood
- Real AI Agents You Can Use Today
- What Agents Are Actually Good At (And Where They Fall Apart)
- The Frameworks Powering Agentic AI
- The Trust Problem: Giving AI Autonomy Without Losing Control
- What This Means for Work, Teams, and Organizations
- What to Watch Next
- FAQ
What Is an AI Agent, Actually?
The word “agent” gets slapped on a lot of things right now — chatbots with memory, pipelines with if/then logic, marketing dashboards with an AI button in the corner. Most of that is not what people in the field mean when they talk about agents.
A genuine AI agent has a few properties that set it apart from a standard language model interaction:
The Core Definition
An agent perceives its environment, makes decisions, takes actions, and uses the results of those actions to inform what it does next — in a loop, often without a human in the middle of every step. It has a goal or task, some set of tools it can use to pursue that goal, and enough autonomy to figure out the sequence of steps needed to get there.
Andrej Karpathy, whose framing on AI architecture tends to be precise and worth listening to, has described LLMs as a kind of “kernel” — a core reasoning engine that can be extended with tools, memory, and action capabilities to become something much more capable. The model itself isn’t the agent; it’s the brain inside the agent.
The crucial distinction from a standard chatbot: a chatbot waits for you. An agent works.
The Spectrum of Autonomy
Agents aren’t binary — there’s a wide range of how much independence they operate with. It helps to think about a simple spectrum:
| Level | What It Looks Like | Example |
|---|---|---|
| Level 1: Assisted | AI suggests, human acts | GitHub Copilot autocomplete |
| Level 2: Supervised | AI acts, human approves each step | Cursor Agent with manual confirmation |
| Level 3: Monitored | AI works autonomously, human reviews results | Devin on a bounded coding task |
| Level 4: Delegated | AI handles full workflows, flags exceptions | n8n agentic workflows, early enterprise deployments |
| Level 5: Autonomous | AI manages ongoing processes with minimal oversight | Not reliably available yet — early research territory |
Most current commercial agents operate at Level 2 or 3. The jump to Level 4 and beyond is where things get interesting — and where the safety questions get harder.
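To make the levels concrete, here is a minimal sketch of what a Level 2 gate can look like in code. The `ToolCall` shape and the function names are illustrative, not any particular framework's API; the point is simply that the agent proposes an action and a human has to say yes before anything executes. (The planner that produces the proposals is omitted.)

```python
# Minimal sketch of a Level 2 ("supervised") gate: the agent proposes a tool
# call, but nothing runs until a human explicitly approves it.

from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str        # e.g. "browser.click" or "shell.run"
    arguments: dict  # parameters the agent wants to pass


def human_approves(call: ToolCall) -> bool:
    """Block and ask the operator before any action is executed."""
    answer = input(f"Agent wants to run {call.tool} with {call.arguments}. Approve? [y/N] ")
    return answer.strip().lower() == "y"


def run_supervised_step(call: ToolCall, tools: dict) -> str:
    """Execute a single proposed action only if the operator says yes."""
    if not human_approves(call):
        return "Action rejected by operator."
    handler = tools[call.tool]       # only explicitly registered tools can run
    return handler(**call.arguments)
```

Moving up the spectrum is largely a matter of relaxing that gate: approving batches of actions instead of single ones, or only pausing for actions that match certain risk rules.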
Why “Hands” Is the Right Metaphor
The “hands” framing captures something important. A model that can only generate text is like a brilliant advisor locked in a room who can only pass notes under the door. Useful, but limited. An agent with tool access can open the door — browse the web, read and write files, execute code, call external services, send emails, fill forms, control a browser, and interact with the digital world the same way a human with a keyboard does.
That last part matters. Computer use — AI that can see a screen and click, type, and navigate like a person — is one of the more significant near-term unlocks. Anthropic’s Claude with computer use (available via API), OpenAI’s Operator, and Google’s Project Mariner are all exploring this. The interfaces aren’t polished yet, and reliability varies significantly, but the direction is clear.
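Here is a rough sketch of what that observe-decide-act cycle looks like under the hood: take a screenshot, hand it to a model, get back a click or type instruction, execute it, repeat. The `decide_next_ui_action` callable and the action format it returns are assumptions for illustration, standing in for whichever vendor's vision-capable API you use; pyautogui is just one way to execute the resulting actions locally.

```python
# Rough sketch of a computer-use loop: screenshot -> model decides -> execute.
# decide_next_ui_action() is a stand-in for a real vision-capable model call;
# it's assumed to return dicts like {"type": "click", "x": 300, "y": 120},
# {"type": "type", "text": "hello"}, or {"type": "done"}.

import pyautogui  # pip install pyautogui


def run_computer_use_agent(goal: str, decide_next_ui_action, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        screenshot = pyautogui.screenshot()               # perceive: what's on screen
        action = decide_next_ui_action(goal, screenshot)  # reason: ask the model

        if action["type"] == "done":                      # model says the goal is reached
            break
        elif action["type"] == "click":
            pyautogui.click(action["x"], action["y"])     # act: click where it asked
        elif action["type"] == "type":
            pyautogui.write(action["text"], interval=0.02)
```

Real products wrap this loop in sandboxed environments and permission prompts; running it raw against your own desktop is how an agent ends up clicking things you would rather it didn't.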
How Agents Work Under the Hood
You don’t need to be an engineer to understand the basic architecture. It’s actually not that complicated once you see the pattern.
The Perceive-Reason-Act Loop
Every agent, at its core, runs some version of this cycle: it perceives its current context (what’s in its memory, what tools returned, what the user said), reasons about what to do next (using the underlying language model), takes an action (calling a tool, writing output, asking a clarifying question), then loops back with the new information.
This is sometimes called the ReAct pattern (Reasoning + Acting), formalized in a 2022 paper from researchers at Princeton and Google Brain, and it has become a foundational template for most agent implementations. In practice, you see it in how something like AutoGPT or a LangGraph agent thinks out loud — “I need to find recent news on X, I’ll search the web, okay here are results, now I’ll summarize and cross-reference with…” — and chains those steps until it reaches a stopping condition.
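Stripped of framework machinery, that loop is short enough to sketch directly. The `call_llm` callable and the decision format it returns are assumptions for illustration, not any specific library's API; LangGraph, AutoGPT, and the rest wrap this same skeleton in more structure.

```python
# Skeleton of the perceive-reason-act (ReAct-style) loop. call_llm() is a
# placeholder for your model call and is assumed to return either
# {"action": tool_name, "input": {...}} or {"final_answer": "..."}.

def run_agent(goal: str, tools: dict, call_llm, max_steps: int = 10) -> str:
    history = [f"Goal: {goal}"]  # working memory: everything seen so far

    for _ in range(max_steps):
        # Reason: ask the model what to do next, given the full history.
        decision = call_llm("\n".join(history))

        if "final_answer" in decision:        # stopping condition reached
            return decision["final_answer"]

        # Act: run the chosen tool with the arguments the model proposed.
        tool_name = decision["action"]
        observation = tools[tool_name](**decision["input"])

        # Perceive: feed the result back in so the next step can use it.
        history.append(f"Called {tool_name}, got: {observation}")

    return "Stopped: step limit reached without a final answer."
```

The `max_steps` cap is doing real work here: without some stopping rule, a confused agent will happily loop forever.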
Tools, Memory, and Planning
Three components make a raw language model into an agent:
