jngiam

MCPs vs APIs: Why designing tools for LLMs is different

TL;DR: We share three engineering patterns that are important for MCPs / AI tool integrations: dynamic error handling with recovery hints, schema observation tools, and well-typed execution environments.


We've been building AI agents at Lutra that connect to real-world applications, and have learned that the hardest apps to work with are the ones where users can customize everything.

When you connect an LLM to Twitter, every user has the same fields: tweet text, author, timestamp. But when you connect to Airtable, every base is a unique snowflake. One user's "Deal Status" is another's "sales_pipeline_stage" is another's "Opportunity Phase". On top of that, we also need to figure out the field types, validation rules, and dropdown options.

These complexities show up in plain English requests like:

Go through my Airtable of contacts, find the ones missing data, and research to fill them in.

Check my HubSpot companies list named "Target June 2025", research their SOC2 compliance status, and add notes to their records.

While the task sounds simple, it's much harder to get working reliably in practice. The agent has to figure out the nuances of how the user has set things up.

Read on to learn how we made these kinds of use cases work reliably with Lutra.

The complexities of customization

Most MCP tools built for LLMs follow a simple pattern: define an input schema, add documentation about when and how to use the tool, and you're done. This works great for services with fixed schemas. But user-defined platforms are a different beast.

In Airtable, users create their own schemas for every base. They define custom field names, set specific data types, create dropdown options, and add validation rules. HubSpot lets you define custom properties for every object type. Notion has pages and databases with their custom schemas.

Consider an LLM performing a function call to update a record, but getting the field name wrong:

update_airtable_records(base, record, {
   "deal_fit_classification": "low"
})

The API responds with a short 422 Unprocessable Entity: UNKNOWN_FIELD_NAME. The LLM has little information about what went wrong or how to fix it. In the best case, it figures out how to get the correct field name; in the worst case, it retries with different variations of the field name and gives up.

We've deployed hundreds of these integrations in production, and are sharing three patterns that work.

Pattern 1: Errors that teach

The first realization was that error messages designed for developers are terrible for LLMs. They are often uninformative and expect the developer to check documentation, inspect schemas, or reach for a debugger. An LLM just sees the message and has to guess what to do next.

So we started intercepting errors and enriching them. When the Airtable API says unknown field name, our tool layer catches it, makes an additional API call to fetch the actual schema, and returns something like this:

{
  error: "Field 'deal_fit_classification' not found",
  available_fields: ["Name", "Deal Fit", "Email", "Deal Stage"],
  hint: "Note that field names are case-sensitive and must match exactly."
}

Instead of failing and giving up, the LLM would read the error, understand what went wrong, and correct itself. It's like the difference between a compiler that says "Syntax error" versus one that says "Syntax error: unexpected token ';' on line 42. Did you forget to close the parenthesis on line 41?"

The implementation requires extra work in our tool layer: every error path benefits from a constructed recovery hint, and may need additional API calls to gather the information it needs.
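To make the pattern concrete, here is a minimal sketch of such an enrichment layer. The function names and the schema-fetching call are hypothetical stand-ins, not Lutra's actual implementation or the real Airtable client:

```python
def fetch_table_schema(base_id: str, table_name: str) -> list[str]:
    # Placeholder: in practice this would call the Airtable metadata API
    # to list the table's actual field names.
    return ["Name", "Deal Fit", "Email", "Deal Stage"]

def enrich_unknown_field_error(base_id: str, table_name: str, bad_field: str) -> dict:
    """Turn a bare UNKNOWN_FIELD_NAME error into a recovery hint the LLM can act on."""
    available = fetch_table_schema(base_id, table_name)
    return {
        "error": f"Field '{bad_field}' not found",
        "available_fields": available,
        "hint": "Note that field names are case-sensitive and must match exactly.",
    }
```

The key design choice is that the tool layer, not the LLM, pays the cost of the extra schema lookup, so the model's next attempt starts from the ground truth.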

Pattern 2: Observe then act

Even with rich error messages, we still saw LLMs make too many mistakes on their first attempt. So we added observation tools that let LLMs inspect schemas before taking action.

Now, before updating any records, an LLM can first call get_table_schema(base_id, table_name).

This returns the complete schema: field names, types, validation rules, dropdown options, and more. This pattern follows the OODA loop: Observe, Orient, Decide, Act. By forcing observation first, we dramatically reduced error rates and made the whole system more predictable.

However, we noticed that just providing an observation tool was insufficient: the LLM would sometimes skip the observation step, jump straight to updates, then fail and have to backtrack. So we added enforcement by requiring the LLM to respect types when it generates and executes code.

Pattern 3: Types to prevent hallucination

The third pattern addressed a major failure mode: hallucinated data. When updating a record, you need its ID. This ID should come from a previous search or list operation. But sometimes, LLMs would just make up plausible-looking IDs like "rec123" or "contact_456". These updates would fail and confuse the user.

Our solution was to create a typed execution environment. Instead of passing IDs as strings, we wrap them in typed objects that can only be created by actual API calls:

from dataclasses import dataclass, field

@dataclass
class AirtableRecordId:
    base_id: str
    table_id: str
    record_id: str
    _source: str = field(default="api_call", metadata={"hidden_from_llm": True})

The LLM is prevented from constructing these objects directly; it can only get them from functions like search_records() or list_records(). When it tries to update a record, the function signature requires this typed object, not a string. If it tries to construct one manually, the runtime blocks it and returns an error explaining how to get valid IDs.

This pattern extends beyond IDs. We use it for any value that must come from the API: enum values, user IDs, pagination tokens. It's like having a type system that enforces data provenance, not just data shape.
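A simplified, runnable sketch of how such provenance enforcement can work. Here the runtime tracks which ID objects were minted by real API calls in a registry (a stand-in for Lutra's actual mechanism, which hides construction from the LLM entirely); search_records and its returned ID are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AirtableRecordId:
    base_id: str
    table_id: str
    record_id: str

# Object identities of IDs minted by real API calls. Using id() is fine for a
# sketch but fragile in general; a production runtime would use a sturdier scheme.
_issued_ids: set[int] = set()

def search_records(base_id: str, table_id: str) -> list[AirtableRecordId]:
    # Placeholder for a real search; every returned ID is registered as trusted.
    rec = AirtableRecordId(base_id, table_id, "recA1B2C3")
    _issued_ids.add(id(rec))
    return [rec]

def update_record(record: AirtableRecordId, fields: dict) -> dict:
    # Reject IDs that did not come from an API call, with a recovery hint.
    if id(record) not in _issued_ids:
        return {"error": "Record IDs must come from search_records() or list_records()."}
    return {"updated": record.record_id, "fields": fields}
```

A hand-constructed AirtableRecordId passes the type check but fails the provenance check, which is exactly the hallucinated-ID failure mode this pattern closes off.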

A new design philosophy for AI tools like MCPs

These patterns point to a fundamental shift in how we should design tool interfaces for AI. Traditional APIs are built for developers who read documentation, understand the domain, and write defensive code.

AI tool interfaces need to assume the opposite: the LLM has no context, makes frequent mistakes, and learns by trial and error.

MCPs vs APIs

This approach of AI-first design of tools highlights how MCP (Model Context Protocol) is fundamentally different from traditional API design. Where APIs optimize for efficiency and assume competence, MCPs optimize for discoverability and assume fallibility.

Results in production

These patterns transformed our success rates. We've seen LLMs successfully navigate incredibly complex Airtable bases with dozens of custom fields, update HubSpot records with custom properties, and even handle multi-step workflows across different apps.


Lutra agent observes the Airtable schema, performs research, and updates the corresponding fields.

What's Next

We're still in the early days of understanding how to build reliable AI agents. If you're building in this space, we'd love to compare notes. The examples in this post are simplified, and our implementations handle many more edge cases and failure modes. But the core patterns hold true: design with the LLM as the user in mind.

The future where AI agents reliably automate complex tasks is coming soon.

Want to see these patterns in practice? Check out Lutra.ai.

