Beyond Tools: Why AI Agents Demand a Different Interface

A common assumption in AI system design is that agents are simply more capable tools. This post argues against that view: tools and agents represent fundamentally different interaction models, and conflating them produces fragile systems that are hard to reason about and hard to scale.

To understand why, we need to shift our focus from what an agent does to where decisions are made, and to what happens when the actions those decisions trigger cannot complete cleanly.

The Well-Defined Nature of a Tool

Consider an agent that can invoke multiple tools. Each tool serves a distinct purpose: gathering information, applying a transformation, or triggering a side effect. In every case there is a clear temporal contract: input goes in, output comes out. Figure 1 illustrates this, including the streaming variant.

Figure 1 — Standard and streaming tool interaction over time

Long-running operations (LROs) are conceptually the same — just with an extended time axis. In practice, the tool returns a reference to the LRO, which the caller can poll or subscribe to separately.
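A minimal Python sketch of this handle-and-poll pattern follows. `LroHandle`, `FakeReportService`, `start_report`, `get_operation`, and `wait_for` are hypothetical names invented for illustration, and the fake service simply finishes after a fixed number of polls. The point is that the contract does not change: the caller still ends up with either a result or an error.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LroHandle:
    """Hypothetical reference returned by a long-running tool."""
    operation_id: str
    done: bool = False
    result: Optional[str] = None


class FakeReportService:
    """Hypothetical backend: each operation finishes after `polls_needed` polls."""

    def __init__(self, polls_needed: int = 3) -> None:
        self._polls_needed = polls_needed
        self._ids = itertools.count(1)
        self._queries: Dict[str, str] = {}
        self._polls: Dict[str, int] = {}

    def start_report(self, query: str) -> LroHandle:
        # The tool call returns immediately with a handle, not with the result.
        op_id = f"op-{next(self._ids)}"
        self._queries[op_id] = query
        self._polls[op_id] = 0
        return LroHandle(op_id)

    def get_operation(self, op_id: str) -> LroHandle:
        # Each poll advances the fake operation by one step.
        self._polls[op_id] += 1
        if self._polls[op_id] >= self._polls_needed:
            return LroHandle(op_id, done=True, result=f"report for {self._queries[op_id]!r}")
        return LroHandle(op_id)


def wait_for(service: FakeReportService, handle: LroHandle, max_polls: int = 10) -> str:
    """Poll until the operation completes; fail with an error after max_polls."""
    for _ in range(max_polls):
        handle = service.get_operation(handle.operation_id)
        if handle.done and handle.result is not None:
            return handle.result
    raise TimeoutError(f"{handle.operation_id} did not finish in {max_polls} polls")


service = FakeReportService()
handle = service.start_report("Q3 venues")
print(wait_for(service, handle))  # report for 'Q3 venues'
```

Polling is only one option. As the text notes, a caller could subscribe to completion events instead without changing the request → execute → complete (or error) shape.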

The key structural property of a tool is its temporal contract: request → execute → complete (or error). There are no intermediate states, no negotiations, no redirections. A tool either completes or it fails.

We can make this precise. Define a tool as a time-bounded computation described by the mapping:

T(i ∈ 𝕀) → o ∈ 𝕆 ∪ {ε}

where 𝕀 is the structured input domain, 𝕆 is the structured output range, and ε is the error state. The tool returns ε in exactly two situations: when it is handed an input outside 𝕀 (invalid input), or when the computation cannot produce a valid o ∈ 𝕆 (internal failure).

This definition is powerful because it supports structural reasoning: the caller knows exactly which inputs are valid, what outputs to expect, and what an error means. Either the tool completes with o ∈ 𝕆, or it fails with ε. When called again, the tool starts fresh with a new i ∈ 𝕀; it does not resume from a previous, interrupted state. As the next section shows, this is exactly where tools and agents part ways.
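To make the contract concrete, here is a minimal Python sketch of a single tool. The `check_room` tool, its `RoomQuery` and `RoomStatus` types, and the in-memory calendar are hypothetical. `ToolError` stands in for ε. A booked room is a valid output (`free=False`), not an error, because ε is reserved for the two failure cases in the definition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Union


@dataclass(frozen=True)
class RoomQuery:
    """An input i ∈ 𝕀 (hypothetical schema)."""
    room: str
    hour: int  # 0-23


@dataclass(frozen=True)
class RoomStatus:
    """An output o ∈ 𝕆."""
    room: str
    hour: int
    free: bool


@dataclass(frozen=True)
class ToolError:
    """The error state ε."""
    reason: str


def check_room(query: RoomQuery, calendar: Callable[[str, int], bool]) -> Union[RoomStatus, ToolError]:
    """Hypothetical tool: one request in, exactly one RoomStatus or ToolError out."""
    # Case 1: the input lies outside 𝕀.
    if not query.room or not (0 <= query.hour <= 23):
        return ToolError(f"invalid input: {query}")
    # Case 2: the computation cannot produce a valid o ∈ 𝕆.
    try:
        free = calendar(query.room, query.hour)
    except Exception as exc:  # any backend failure collapses to ε
        return ToolError(f"internal failure: {exc!r}")
    return RoomStatus(query.room, query.hour, free)


# Hypothetical in-memory calendar: room name -> hours already booked.
bookings: Dict[str, Set[int]] = {"Atlas": {14}}


def calendar(room: str, hour: int) -> bool:
    return hour not in bookings[room]  # raises KeyError for unknown rooms


print(check_room(RoomQuery("Atlas", 14), calendar))  # RoomStatus(room='Atlas', hour=14, free=False)
print(check_room(RoomQuery("Atlas", 25), calendar))  # a ToolError for invalid input
print(check_room(RoomQuery("Nova", 9), calendar))    # a ToolError for internal failure
```

Each call is independent. Calling `check_room` again starts from a fresh `RoomQuery` with no memory of earlier calls.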

Agents Are Problem-Solving Collaborators

Agents are expected to have autonomy — the ability to make decisions based on context, handle evolving requirements, and navigate incomplete information. This is fundamentally different from executing a bounded function.

Consider an agent asked to "schedule a meeting with the product team tomorrow at 2pm." On the surface, this resembles a tool invocation. But depending on the agent's environment, it could encounter:

· The shared calendar system requires room reservations, and every room is already booked at that time.

· One key team member has a conflicting event, and the agent must explore alternative slots.

· The meeting needs a video conferencing link from an external platform, and the agent must be authorized before it can provision one.

In each case, the agent cannot simply return success or failure. It must engage with the caller to work through the problem. Figure 2 shows this multi-turn interaction model, with tools used internally by the agent.

Figure 2 — Multi-turn agent interaction with internal tool usage

With this interface, the agent can communicate intermediate states back to the caller:

· "I couldn't book 2pm — all rooms are taken. Would 3pm or Thursday morning work instead?"

· "I found a time slot, but I'll need you to authorize access to the video conferencing system first."

· "Two product team members are unavailable tomorrow. Should I invite the available members only, or would you prefer to reschedule?"

The caller then decides how to proceed: providing the needed information, escalating to a human, or abandoning the task entirely. Critically, when the agent returns, the original action has not necessarily completed; it may be left in an incomplete or interrupted state. Some sub-actions may have executed while others have not. The task might never finish. For instance, if the room booking conflict cannot be resolved, the meeting is never scheduled.
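One way to represent such a reply is sketched in Python here as a toy scheduling agent. It assumes a hypothetical `AgentReply` type that carries a status, a message for the caller, and the sub-actions completed and still pending. Treating the room conflict as a question for the caller and the missing authorization as an interruption is an illustrative choice, not a fixed rule.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class Status(Enum):
    DONE = "done"
    NEEDS_INPUT = "needs_input"    # non-terminal: the caller should answer and call again
    INTERRUPTED = "interrupted"    # stalled until something outside the conversation changes


@dataclass
class AgentReply:
    status: Status
    message: str
    completed: List[str] = field(default_factory=list)  # sub-actions already executed
    pending: List[str] = field(default_factory=list)    # sub-actions not yet executed


def schedule_meeting(hour: int, rooms: Dict[str, Set[int]], video_authorized: bool) -> AgentReply:
    """Hypothetical scheduling agent that may stop partway and report back."""
    completed: List[str] = []
    free = [name for name, booked in rooms.items() if hour not in booked]
    if not free:
        # Suggest the first two working hours (9-17) that have at least one free room.
        options = [h for h in range(9, 18) if any(h not in booked for booked in rooms.values())]
        suggestion = " or ".join(f"{h}:00" for h in options[:2])
        return AgentReply(Status.NEEDS_INPUT,
                          f"No rooms are free at {hour}:00. Would {suggestion} work instead?",
                          completed, ["book room", "create video link", "send invites"])
    completed.append(f"booked {free[0]}")
    if not video_authorized:
        # The room is already booked, so the action is left partially executed.
        return AgentReply(Status.INTERRUPTED,
                          "Please authorize the video conferencing system first.",
                          completed, ["create video link", "send invites"])
    completed += ["created video link", "sent invites"]
    return AgentReply(Status.DONE, "Meeting scheduled.", completed)


rooms = {"Atlas": {14}, "Nova": {14}}
print(schedule_meeting(14, rooms, video_authorized=True).status)      # Status.NEEDS_INPUT
print(schedule_meeting(15, rooms, video_authorized=False).completed)  # ['booked Atlas']
```

The `completed` and `pending` lists make the partial state explicit. When a reply comes back `INTERRUPTED`, the caller can see that a room was booked even though the meeting does not exist yet.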

Structured vs. Unbounded Interactions

The distinction runs deeper than just multi-turn versus single-turn. It is about the shape of the interaction space.

For a tool, both 𝕀 and 𝕆 are tightly constrained by schema. This constraint is what makes tools easy to compose, test, and reason about. The caller generates a valid i ∈ 𝕀 and expects either o ∈ 𝕆 or ε. Nothing else is possible.

For an agent, both the input and output spaces are effectively unbounded. A response o may not be a final answer; it may be a partial result, a clarifying question, or a redirect. The caller must interpret o, possibly transform it, and pass it back to continue the exchange. This iterative convergence toward a solution raises design challenges that tools never face: the caller has to carry state across turns, interpret responses that fit no fixed schema, and decide when to stop.

Formally, we can characterize an agent interaction as:

A: (𝕀* × 𝕆*) → 𝕆* ∪ {done, interrupted}

where 𝕀* and 𝕆* denote sequences (zero or more elements) of inputs and outputs accumulated over multiple turns. Here 𝕆* in the codomain represents a non-terminal response — the agent has returned partial output and the caller must supply further input to continue. done and interrupted are terminal signals: done means the goal was reached; interrupted means the action has stalled and cannot proceed without external input.
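The signature of A can be turned into a small driver loop. The following Python sketch makes several assumptions. Histories are tuples of strings. `Partial`, `Done`, and `Interrupted` are hypothetical types mirroring 𝕆*, done, and interrupted. The caller is modeled as a hypothetical function that reads a partial response and produces the next input. Treating an exhausted turn budget as interrupted is also our own choice, since the definition does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass(frozen=True)
class Partial:
    outputs: Tuple[str, ...]  # non-terminal response: an element of 𝕆*


@dataclass(frozen=True)
class Done:
    pass  # terminal: the goal was reached


@dataclass(frozen=True)
class Interrupted:
    pass  # terminal: stalled without external input


Step = Union[Partial, Done, Interrupted]
# A: (𝕀* × 𝕆*) -> 𝕆* ∪ {done, interrupted}
Agent = Callable[[Tuple[str, ...], Tuple[str, ...]], Step]


def run(agent: Agent, caller: Callable[[Tuple[str, ...]], str],
        first_input: str, max_turns: int = 10) -> Tuple[str, List[str], List[str]]:
    """Drive the exchange until the agent signals done or interrupted."""
    inputs: List[str] = [first_input]
    outputs: List[str] = []
    for _ in range(max_turns):
        step = agent(tuple(inputs), tuple(outputs))
        if isinstance(step, Done):
            return "done", inputs, outputs
        if isinstance(step, Interrupted):
            return "interrupted", inputs, outputs
        outputs.extend(step.outputs)          # accumulate the partial response
        inputs.append(caller(step.outputs))   # the caller interprets it and replies
    return "interrupted", inputs, outputs     # assumption: turn budget exhausted


def toy_agent(inputs: Tuple[str, ...], outputs: Tuple[str, ...]) -> Step:
    """Hypothetical agent: asks for a time, gives up after two unanswered questions."""
    if any(i.startswith("time=") for i in inputs):
        return Done()
    if len(outputs) >= 2:
        return Interrupted()
    return Partial(("Which time works for you?",))


print(run(toy_agent, lambda out: "time=15:00", "schedule a meeting")[0])  # done
print(run(toy_agent, lambda out: "not sure", "schedule a meeting")[0])    # interrupted
```

Unlike a tool call, the agent sees the entire accumulated history on every turn, and only the two terminal signals end the loop.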

Trying to squeeze this into a tool interface — for example, by encoding intermediate states as error codes or extending the response payload — is a leaky abstraction. It pushes agent-like complexity into what should be a simple, predictable unit of action.

The tool interface is a degenerate case of the agent interface — one where the agent happens to complete its task in a single turn with no interruptions. Treating an agent as a tool is only valid when you are certain this degenerate case is all you need. In practice, that certainty is hard to maintain as systems grow.
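One way to respect that caveat in code is to make the degenerate-case assumption explicit. This Python sketch wraps an agent in a tool interface that accepts only single-turn completions and fails loudly otherwise, instead of quietly encoding a question as an error code. `Reply`, `Done`, `NeedsInput`, `Interrupted`, `NotDegenerate`, `agent_as_tool`, and `room_agent` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Done:
    output: str          # final answer produced in this single turn


@dataclass(frozen=True)
class NeedsInput:
    question: str        # non-terminal: the agent wants another turn


@dataclass(frozen=True)
class Interrupted:
    reason: str          # stalled: needs external action


Reply = Union[Done, NeedsInput, Interrupted]


class NotDegenerate(Exception):
    """Raised when an agent needs more than the single turn a tool allows."""


def agent_as_tool(agent: Callable[[str], Reply]) -> Callable[[str], str]:
    """Expose an agent through a tool interface, valid only for the degenerate case."""
    def tool(i: str) -> str:
        reply = agent(i)
        if isinstance(reply, Done):
            return reply.output
        # A tool contract has nowhere to put a question or partial progress,
        # so surface the mismatch instead of hiding it in an error code.
        raise NotDegenerate(f"agent did not finish in one turn: {reply}")
    return tool


def room_agent(request: str) -> Reply:
    """Hypothetical agent: 2pm is fully booked, anything else succeeds at once."""
    if "2pm" in request:
        return NeedsInput("All rooms are taken at 2pm. Would 3pm work instead?")
    return Done(f"Booked: {request}")


book = agent_as_tool(room_agent)
print(book("product sync at 3pm"))  # Booked: product sync at 3pm
try:
    book("product sync at 2pm")
except NotDegenerate as exc:
    print(exc)
```

The wrapper works until the first request that needs a second turn. At that point the question the agent wanted to ask is lost to the caller, which is the leak the previous paragraph warns about.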

To see what this looks like at scale, consider a concrete multi-agent system navigating exactly these constraints.

Illustrative Example: The Product Launch Planner

Consider a multi-agent system tasked with helping a user organize a product launch event.

The user's request: "Help me plan a product launch for 100 attendees next quarter, with a budget of $10,000."

The system includes an Orchestrator, a Venue Booking Agent, a Catering Agent, an AV & Tech Setup Agent, and a Guest Management Agent.

The Orchestrator must break the overall budget and timeline into sub-goals and coordinate across agents. Along the way, it will encounter constraints that are not known upfront:

· The Venue Agent finds that venues suited for 100 people on weekdays cost $3,500–$5,000 — but the user has not specified whether weekends are acceptable.

· The Catering Agent needs dietary preference data that has not yet been collected from attendees.

· The AV Agent flags that the planned live demo requires a dedicated network setup that not all venues can support.

· The Guest Management Agent discovers that the proposed date conflicts with a major industry conference, and wants to know whether to proceed or shift the date by a week.

Each of these is an interruption — the agent cannot proceed without surfacing the constraint to the caller. The Orchestrator must navigate these in real time: some constraints it can resolve within its delegated authority (for example, selecting between two venues that are both available and within budget), while others it must escalate to the user (for example, asking whether a hybrid format is acceptable if full in-person capacity is unavailable).
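The split between deciding and escalating can be sketched as a policy function. In this minimal Python sketch, the venue names, their costs, and the $5,000 venue share of the $10,000 budget are hypothetical. Only the $3,500–$5,000 weekday range comes from the example. The Orchestrator picks the cheapest option that satisfies every known constraint, and it escalates when the answer depends on something only the user can decide, such as whether weekends are acceptable.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Venue:
    name: str
    cost_usd: int
    weekend: bool
    supports_demo_network: bool  # the AV Agent's requirement for the live demo


@dataclass(frozen=True)
class Resolved:
    venue: Venue  # decided within the Orchestrator's delegated authority


@dataclass(frozen=True)
class Escalate:
    question: str  # only the user can answer this


def choose_venue(options: List[Venue], venue_budget_usd: int,
                 weekend_ok: Optional[bool]) -> Union[Resolved, Escalate]:
    """Hypothetical policy: resolve when constraints are known, otherwise escalate."""
    viable = [v for v in options
              if v.cost_usd <= venue_budget_usd and v.supports_demo_network]
    # weekend_ok=None means the user has not said, so weekends are not yet allowed.
    allowed = [v for v in viable if weekend_ok or not v.weekend]
    if allowed:
        return Resolved(min(allowed, key=lambda v: v.cost_usd))
    if viable and weekend_ok is None:
        return Escalate("Only weekend venues fit the budget and demo setup. Are weekends acceptable?")
    return Escalate("No acceptable venue fits the budget and demo setup. Would a hybrid format work?")


options = [
    Venue("Harbor Hall", 4800, weekend=False, supports_demo_network=True),
    Venue("Loft 22", 3500, weekend=False, supports_demo_network=False),
    Venue("Riverside Pavilion", 3900, weekend=True, supports_demo_network=True),
]
print(choose_venue(options, 5000, weekend_ok=None))  # resolves to Harbor Hall
print(choose_venue(options, 4000, weekend_ok=None))  # escalates the weekend question
```

The Orchestrator never guesses the user's weekend preference. It either finds an answer that does not depend on that preference or turns the gap into a question.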

This is problem solving, not function execution. The constraints are not fully known in advance — the path to completion emerges through multiple interactions. Tools are used throughout — to check venue availability, generate catering quotes, send invitations — but the navigation of the problem space is inherently agent-driven and cannot be captured by any fixed input-output schema.

Exception Handling and the Path Forward

The agent interaction model can be understood through the lens of unchecked exception handling in traditional programming. When an unchecked exception is thrown, control leaves the current execution context and propagates up the call stack — possibly reaching a handler several layers removed, or terminating the program entirely if unhandled. The original execution path is abandoned, with no guarantee it ever resumes.

Agent interfaces introduce an analogous form of non-local control flow. An interaction may leave the expected execution context, redirect through multiple agents, and potentially never resolve. Like unchecked exceptions scattered throughout a codebase, agent-style interactions at every layer make systems difficult to trace, test, and debug.

Just as software engineering best practices recommend isolating exception-based control flow — catching exceptions at well-defined boundaries rather than propagating them everywhere — we should isolate agent-style interactions at the agent-to-agent boundary. In practice this means the Orchestrator absorbs interruptions from sub-agents and translates them into structured responses or tool-compatible calls for the layers below it. Tools should remain what they are: structured, predictable, bounded computations. The complexity of open-ended problem solving belongs at the agent layer, not the tool layer.
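The boundary idea maps directly onto ordinary exception handling. In this minimal Python sketch, sub-agents signal an interruption by raising a hypothetical `AgentInterruption`. The orchestrator catches it at the agent-to-agent boundary and turns the non-local jump into a structured `PlanResult` that lists finished work and open questions. Here the exception analogy is taken literally for illustration; a real system might pass interruptions as messages instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class AgentInterruption(Exception):
    """Raised by a sub-agent when it cannot proceed without a decision."""

    def __init__(self, agent: str, question: str) -> None:
        super().__init__(f"{agent}: {question}")
        self.agent = agent
        self.question = question


@dataclass(frozen=True)
class PlanResult:
    completed: Dict[str, str]  # sub-agent name -> result
    open_questions: List[str]  # interruptions surfaced to the user as data


def orchestrate(sub_agents: Dict[str, Callable[[], str]]) -> PlanResult:
    """Catch interruptions at the agent-to-agent boundary so they go no further."""
    completed: Dict[str, str] = {}
    questions: List[str] = []
    for name, run_agent in sub_agents.items():
        try:
            completed[name] = run_agent()
        except AgentInterruption as exc:
            questions.append(str(exc))  # non-local control flow stops here
    return PlanResult(completed, questions)


def venue_agent() -> str:
    return "Harbor Hall (weekday)"  # hypothetical result


def catering_agent() -> str:
    raise AgentInterruption("catering", "Dietary preferences have not been collected yet.")


result = orchestrate({"venue": venue_agent, "catering": catering_agent})
print(result.completed)       # {'venue': 'Harbor Hall (weekday)'}
print(result.open_questions)  # ['catering: Dietary preferences have not been collected yet.']
```

Everything the orchestrator returns is plain, structured data. Whatever sits above it never has to handle an interruption raised deep inside a sub-agent.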

When each agent operates within a well-defined action space backed by clean tool contracts, the overall system becomes easier to reason about, scale, and maintain. Complexity is contained where it belongs, and the parts of the system that can be simple remain simple.
