The cost of bad tool descriptions
Your agent keeps picking the wrong tool. The problem isn’t the code; it’s the description. Real examples of tool descriptions that fail, and how to fix them.
You built a perfectly good search tool. It works great when you test it directly. But your agent keeps using the web search tool instead, even when the answer is in local files. What’s going on?
The code is fine. The problem is in the description you wrote for the tool. That short block of text is the only thing the agent reads when deciding which tool to pick. If it’s vague, misleading, or incomplete, the agent will make bad decisions every single time, no matter how good the underlying code is.
I’ve watched this happen across dozens of agent setups, whether the description lives in a JSON tool definition or at the top of a skill file. The debugging session always starts with “my tool is broken” and ends with “oh, the description was confusing.” Let’s look at the most common mistakes and how to fix them.
The agent reads descriptions like a menu
Imagine you’re at a restaurant scanning the menu. You’re hungry, you’re in a hurry, and you need to pick something fast. If a dish is listed as “Chef’s Special” with no other details, you skip it. If two dishes sound identical, you pick one at random. If something sounds like it might be what you want but the description is ambiguous, you take a gamble.
That’s how an agent reads tool descriptions. It has a user request in hand, a list of available tools, and needs to make a quick decision. The description is the entire basis for that decision. The agent can’t read your source code, doesn’t know your intentions, and won’t ask clarifying questions. It just picks the tool whose description best matches the request.
This means a bad description doesn’t just cause occasional mistakes. It causes systematic, repeatable failures. Every time the user asks a certain kind of question, the agent picks the wrong tool, because the description pointed it in the wrong direction.
Mistake 1: too terse
Here’s one I see constantly.
Bad:
Searches files
What files? On disk? On the web? In a database? When should the agent pick this over grep, or a web search, or a database query? The agent has no idea, so it guesses. Sometimes it guesses right. Often it doesn’t.
Fixed:
Searches for files by name or content within the current project directory.
Returns matching file paths and a preview of each match. Use this when the
user wants to find something in their local codebase, not on the web.
The fix works because it answers three questions the agent needs answered: what does this do, where does it operate, and when is it the right choice? Two extra sentences eliminated an entire category of mispicks.
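In a JSON-schema-style tool definition, the kind many agent frameworks use (exact field names vary by framework; this shape is an assumption, not any one vendor's spec), the fixed description might sit like this:

```python
# Hypothetical tool definition in the JSON-schema style common to agent
# frameworks. The tool name, parameter names, and schema layout are
# illustrative assumptions.
search_files_tool = {
    "name": "search_files",
    "description": (
        "Searches for files by name or content within the current project "
        "directory. Returns matching file paths and a preview of each match. "
        "Use this when the user wants to find something in their local "
        "codebase, not on the web."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Filename pattern or text to search for",
            },
        },
        "required": ["query"],
    },
}
```

Note that the description is the only part of this structure the agent weighs when choosing between tools; the parameter schema matters only after the tool is picked.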
Mistake 2: too abstract
Some descriptions read like they were written for a product brochure.
Bad:
Interfaces with the filesystem to locate relevant resources
based on user-specified criteria.
This is technically accurate. It’s also useless. The agent doesn’t benefit from words like “interfaces with” or “relevant resources.” Those phrases add length without adding information. The agent needs to know: what kind of resources? What criteria? What comes back?
Fixed:
Searches the local filesystem for files matching a filename pattern
or containing specific text. Accepts a search query and an optional
directory path. Returns a list of matching file paths with line
numbers and surrounding context.
Plain language. Specific inputs and outputs. The agent can now match this description against a user request like “find all the files that mention the payments API” and know immediately that this tool is the right call.
Mistake 3: missing negative guidance
This one is subtle and causes some of the worst debugging sessions I’ve seen. Your description says what the tool does but never says when it shouldn’t be used. So the agent uses it everywhere.
Bad:
Retrieves information about the user's account, including profile
details, settings, and usage history.
Sounds reasonable. But what happens when the user asks “what’s my billing status?” The agent sees “account information” and reaches for this tool, even though billing data comes from a completely different API. Or the user asks “who am I logged in as?” and the agent calls this heavyweight account tool when a simple auth check would do.
Fixed:
Retrieves the user's full account profile: display name, email,
notification preferences, and usage history. Use this for questions
about profile settings or past activity. Do NOT use this for billing
questions (use the billing_status tool) or for checking login state
(use the auth_check tool).
The negative guidance is what makes this work. Telling the agent what not to do is at least as important as telling it what to do. Without those two sentences at the end, the agent treats this tool as a catch-all for anything account-related.
Mistake 4: overlapping descriptions
You have two tools that do related things. Their descriptions sound almost identical. The agent picks between them based on, essentially, a coin flip.
Bad (tool A):
Sends a message to the user's team.
Bad (tool B):
Sends a notification to team members.
Which one handles Slack messages? Which one sends email digests? The agent can’t tell. You might know the difference because you wrote the code, but the agent only has these two lines of text to go on.
Fixed (tool A):
Sends a real-time message to a Slack channel. Use this for
conversational messages, questions, and updates that need
immediate visibility. Requires a channel name and message body.
Fixed (tool B):
Sends an email digest summarizing recent activity. Use this for
end-of-day summaries, weekly reports, and non-urgent notifications.
Takes a recipient list and a time range. Do NOT use this for
time-sensitive messages; use slack_message instead.
Now the descriptions are doing two important things. They describe different mechanisms (Slack vs. email) and they describe different use cases (real-time vs. digest). The agent can reliably pick the right one because the descriptions carve out distinct territory.
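One rough way to catch this mistake before shipping is to measure how much vocabulary two descriptions share. This is a heuristic sketch of my own, not a standard technique, and the tokenization and any threshold you pick are arbitrary assumptions:

```python
import re


def words(s: str) -> set[str]:
    # Lowercase and strip punctuation so "team." and "team" match.
    return set(re.findall(r"[a-z']+", s.lower()))


def description_overlap(a: str, b: str) -> float:
    """Jaccard similarity between two descriptions' word sets.
    High overlap is a hint the agent may not be able to tell the tools apart."""
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb)


bad_a = "Sends a message to the user's team."
bad_b = "Sends a notification to team members."
fixed_a = ("Sends a real-time message to a Slack channel. Use this for "
           "conversational messages, questions, and updates that need "
           "immediate visibility.")
fixed_b = ("Sends an email digest summarizing recent activity. Use this for "
           "end-of-day summaries, weekly reports, and non-urgent notifications.")

print(description_overlap(bad_a, bad_b))      # noticeably higher
print(description_overlap(fixed_a, fixed_b))  # noticeably lower
```

Running this on the pair above shows the bad descriptions sharing far more vocabulary than the fixed ones, which is exactly the "coin flip" condition you want to catch in review.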
Mistake 5: missing output description
The agent doesn’t just need to know what a tool does. It needs to know what it gets back. If the description doesn’t mention the output format, the agent can’t plan multi-step workflows or decide whether this tool gives it what it needs.
Bad:
Looks up customer information by email address.
Fixed:
Looks up a customer by email address. Returns the customer's name,
account ID, subscription tier, and signup date as a JSON object.
Returns null if no customer matches the email.
This matters more than you might expect. If the agent is trying to find a customer’s subscription tier, it needs to know that this tool returns that field. Without the output description, the agent might call a different tool, or call this one and then call another tool redundantly because it wasn’t sure the first one would have what it needed.
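To make the two return shapes from the fixed description concrete, here is what the agent would see in each case. The field values are invented examples; only the field names come from the description above:

```python
import json

# Successful lookup: the JSON object promised by the description.
# Values here are made up for illustration.
match = json.loads(
    '{"name": "Ada Lovelace", "account_id": "acct_42",'
    ' "subscription_tier": "pro", "signup_date": "2021-06-01"}'
)

# No customer found: JSON null, which maps to Python None.
no_match = json.loads("null")

print(match["subscription_tier"])  # the field the agent was after
print(no_match)
```

Because the description names `subscription_tier` explicitly, the agent can plan a single call instead of hedging with a second tool.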
The real cost
Bad descriptions waste money and time in ways that aren’t always obvious.
Token waste is the most direct cost. Every time the agent picks the wrong tool, it burns tokens on a useless call, reads the unhelpful result, realizes the mistake, and tries a different tool. One mispick can easily double or triple the token cost of a simple task. Across hundreds of requests per day, that adds up fast.
But the more painful cost is user trust. When an agent consistently picks the wrong tool, users stop trusting it. They start manually specifying which tool to use, which defeats the purpose of having an agent. Or they give up on the agent entirely and go back to doing things by hand.
Debugging time is another hidden cost. I’ve spent hours tracing through agent logs trying to figure out why a tool wasn’t being selected, only to realize the description was the problem all along. If I’d spent ten minutes writing a better description upfront, I’d have saved an entire afternoon.
A quick checklist
Before you ship a tool, read the description out loud and ask yourself these questions:
- Can someone who has never seen the code tell what this tool does?
- Can they tell when to use it and when not to?
- Can they tell what they’ll get back?
If any answer is no, rewrite it.
If you’re writing a skill file, the description at the top gets the same treatment. See the PR review skill and test writer skill for examples that get this right.
Your tool descriptions are the real interface. If you’re new to how agents pick and use tools, start with what agent skills are and how tool use works. Getting the description right is one of the highest-value things you can do when building agent tools. It takes five minutes and saves hours.
Related articles
How to design AI agent skills
A deep dive into the four pillars of skill design: clear descriptions, well-typed parameters, error handling, and predictable output.
Prompt engineering vs skill design
They sound similar but they solve different problems. When to write a better prompt and when to build a skill instead.
Skill design principles
Foundational principles for designing composable, predictable, and maintainable skill files including single responsibility and idempotency.