How to design AI agent skills
A deep dive into the four pillars of skill design: clear descriptions, well-defined inputs, thorough error handling, and predictable output.
On this page
- The four pillars
- A running example: the code review skill
- Pillar 1: clear descriptions
- What a good description contains
- Bad description vs. good description
- Common description mistakes
- Pillar 2: well-defined inputs
- What makes a good input section
- Contrast: bad input section
- Input design principles
- Pillar 3: thorough error handling
- The error handling section
- Why this matters
- Never silently ignore problems
- Pillar 4: predictable output
- Define the format precisely
- Why structure matters
- Include metadata
- Consistent shape, always
- Putting it all together
- Design checklist
- Next steps
A skill is only as useful as an agent’s ability to understand it, follow its instructions, and produce the right result. You can write the most thorough skill file in the world, but if the agent can’t figure out when to apply it or what steps to follow, it may as well not exist.
This guide breaks down the four pillars that separate well-designed skill files from frustrating ones: clear descriptions, well-defined inputs, thorough error handling, and predictable output. We’ll walk through a complete skill file section by section, explain why each part matters, and give you a framework for evaluating your own.
The four pillars
Before getting into examples, here’s the mental model. Every skill file has four concerns that determine whether an agent can use it well:
| Pillar | What it answers | What goes wrong without it |
|---|---|---|
| Description | "Should I use this skill right now?" | Agent applies the skill to the wrong tasks |
| Inputs | "What information do I need to get started?" | Agent guesses at requirements and gets it wrong |
| Error handling | "What do I do when something goes wrong?" | Agent gets stuck or produces broken output |
| Output contract | "What should the result look like?" | Agent produces inconsistent, unpredictable results |
If any one of these is weak, the whole skill degrades. A thorough set of steps with a vague description will rarely get used at the right time. Clear inputs with no error guidance will leave the agent stuck the moment something unexpected happens.
A running example: the code review skill
Throughout this article, we’ll build a “code review” skill file piece by piece. By the end, you’ll have a complete, well-structured skill and understand why every section is there.
Here’s the skeleton:
# Code review
## Description
...
## When to use
...
## When NOT to use
...
## Input
...
## Steps
...
## Error handling
...
## Output format
...
Each of the four pillars maps to one or more of these sections. Let’s walk through them.
Pillar 1: clear descriptions
The description is the single most important section of a skill file. It’s the agent’s only way to decide whether this skill applies to the current task. Think of it as a job posting: it needs to attract the right invocations and repel the wrong ones.
What a good description contains
- What the skill does, in one sentence of plain language
- When to use it, with specific scenarios
- When NOT to use it (this is often more important than the positive case)
- Key limitations or scope boundaries
Bad description vs. good description
Here’s a description that tells the agent almost nothing:
# Code review
## Description
Reviews code changes.
The agent has no idea what kind of review this performs, how thorough it should be, or when to prefer this over just reading the diff itself. Now compare:
# Code review
## Description
Review a pull request or set of code changes for bugs, security issues,
and maintainability problems. Produce a structured review with specific,
actionable feedback on each issue found.
## When to use
- The user asks you to review a PR, diff, or set of changes
- The user asks for feedback on code they've written
- You're about to merge or approve changes and want a quality check
## When NOT to use
- The user wants a full security audit (use the security audit skill instead)
- The user wants automated test generation (use the test writer skill instead)
- The changes are only documentation or formatting with no logic changes
Every sentence in that description is doing work. The agent now knows:
- This is for bugs, security issues, and maintainability (not style nitpicks)
- The output should be structured and actionable
- There are sibling skills for security audits and test generation
- Pure documentation changes don’t need this skill
Common description mistakes
Too terse: “Reviews code.” The agent has no idea what kind of review or when to prefer this over reading the code directly.
Too generic: “A flexible skill for improving code quality.” Could mean anything from formatting to architecture review. The agent will either overuse it or ignore it.
Missing negative guidance: Without “When NOT to use” clauses, the agent may try to use your code review skill for a full security audit, then produce a shallow result that misses the point.
Implementation details instead of intent: “Uses AST parsing with tree-sitter to analyze code structure.” The agent doesn’t care about your parsing strategy. It cares about what problems this skill solves.
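The fix for that last one is usually mechanical: keep the capability, drop the mechanism. Here's one possible rewrite of the tree-sitter description in terms of intent (the specifics are illustrative, not taken from a real skill):
## Description
Finds structural problems in code (deeply nested logic, oversized
functions, duplicated blocks) and recommends specific refactorings.
Same underlying skill, but now the agent can match it to a task instead of a parsing strategy.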
For a deeper look at how bad descriptions cause cascading failures in real agent setups, see The Cost of Bad Tool Descriptions.
Pillar 2: well-defined inputs
The input section tells the agent what information it needs to gather before starting the skill. Without it, the agent guesses, and its guesses are often wrong or incomplete.
What makes a good input section
Be specific about what’s required vs. optional. If the skill needs a diff and a description of the intended changes, say so. If context about the project’s conventions is helpful but not required, mark it as optional.
Name the format you expect. Don’t just say “the code changes.” Say whether you want a git diff, a list of file paths, the full file contents, or something else.
Include examples when the format isn’t obvious.
Here’s the input section for our code review skill:
## Input
**Required:**
- The code changes to review. This can be:
- A git diff (preferred, because it shows only what changed)
- A list of modified files with their full contents
- A PR number if you have access to the repository
- A brief description of what the changes are supposed to do
**Optional:**
- The project's language and framework (helps calibrate review depth)
- Any project-specific conventions or style guides
- Whether this is a draft (lighter review) or ready for merge (thorough review)
Contrast: bad input section
## Input
Give the skill the code to review.
This leaves too many questions unanswered. What form should the code be in? Does the skill need the whole file or just the diff? Does it need context about what the changes are supposed to do? The agent will fill in these blanks with its own assumptions, and different invocations will get different results.
Input design principles
Prefer explicit over implicit. If the skill works differently for Python vs. JavaScript, make the language an explicit input rather than hoping the agent infers it.
Keep inputs flat. A skill file isn’t a JSON schema, but the same principle applies: don’t bury important information in nested structures. Each piece of information the agent needs should be a clear, separate item.
State constraints. If the skill can’t handle more than 5,000 lines of diff, say so. If it expects a specific date format, specify it. Constraints prevent the agent from sending inputs that will fail.
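Applied to our code review skill, these principles might add one more block to the input section. This is a sketch: the 5,000-line limit matches the error handling section later in this article, and the date format line is purely illustrative:
**Constraints:**
- Diffs larger than 5,000 lines cannot be reviewed in one pass (see
  Error handling)
- State the language explicitly if it isn't obvious from file extensions
- Any dates in supporting context should use ISO 8601 (YYYY-MM-DD)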
For more on writing clear instructions that agents actually follow, see Writing Skill Instructions.
Pillar 3: thorough error handling
Things will go wrong. The diff might be empty. The file might not exist. The code might be in a language the skill doesn’t cover well. The question isn’t whether errors will happen, but whether your skill file tells the agent what to do when they do.
The error handling section
Every error scenario in a skill file should cover three things:
- What happened — a clear description of the problem
- Why it happened — enough context for the agent to understand the cause
- What to do next — a concrete recovery action
Here’s the error handling section for our code review skill:
## Error handling
**Empty diff or no changes found:**
Tell the user no changes were detected. Ask them to confirm they've
provided the right diff or file paths.
**Binary files or non-text content:**
Skip binary files (images, compiled artifacts, etc.) and note which
files were skipped in your review output. Review only the text-based
changes.
**Unsupported language:**
If the code is in a language you don't recognize well, still review
for general issues (logic errors, obvious bugs, naming) but note in
your output that language-specific feedback may be limited. Do not
pretend to have deep knowledge of a language you don't.
**Diff too large (more than 5,000 lines):**
Do not attempt to review the entire diff at once. Instead:
1. Tell the user the diff is too large for a single review pass
2. Suggest splitting the review by file or by logical grouping
3. Offer to start with the files most likely to contain bugs
(business logic over config files, new code over moved code)
**Conflicting instructions:**
If the user's description of the changes contradicts what the diff
actually shows, flag this explicitly. Do not silently ignore the
mismatch.
Why this matters
Without error handling guidance, the agent will do one of two things when it hits a problem: silently produce a bad result, or stop and ask the user a vague question. Neither is good.
A skill file that says “if the diff is too large, split by file and start with business logic” gives the agent a concrete recovery path. It can keep working instead of getting stuck.
Never silently ignore problems
One of the worst things a skill can do is produce a result that looks correct but silently skipped over a problem. If the diff included binary files and the skill reviewed only the text files, that’s fine as long as it tells the user which files were skipped. If it just pretends those files don’t exist, the user thinks they got a complete review when they didn’t.
## Bad: silent skip
Review the text files and ignore anything else.
## Good: explicit skip with notification
Skip binary files and include a "Skipped files" section at the end
of the review listing each skipped file and why.
Pillar 4: predictable output
The output section tells the agent exactly what the result should look like. Without it, every invocation produces a different format, and the user (or a downstream agent) can’t rely on the structure.
Define the format precisely
Don’t just say “provide a review.” Specify the sections, the format for each finding, and what metadata to include.
## Output format
Structure the review as follows:
### Summary
One paragraph: overall assessment, biggest risk, and whether the
changes are ready to merge.
### Findings
For each issue found, include:
- **File and line**: where the issue is
- **Severity**: critical, warning, or suggestion
- **Description**: what the problem is, in one or two sentences
- **Recommendation**: how to fix it, with a code example if helpful
Order findings by severity (critical first).
### Skipped files
List any files that were skipped (binary files, generated code, etc.)
and why.
### Verdict
One of: "Approve", "Request changes", or "Needs discussion".
Include a one-sentence justification.
Why structure matters
When output format is defined, the agent produces consistent results every time. The user learns where to look for the verdict, how to scan for critical issues, and what “suggestion” vs. “warning” means. If the format changes with every invocation, the user has to re-orient each time.
Include metadata
Tell the agent what metadata to include alongside the main output:
- How many findings at each severity level
- How many files were reviewed vs. skipped
- Whether the review was complete or truncated due to size limits
This metadata helps the user (and any downstream automation) quickly assess the quality and completeness of the review.
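In the skill file, this can be one more named section in the output format. A sketch along the lines of the format above:
### Review metadata
- Findings: counts per severity (critical / warning / suggestion)
- Files reviewed vs. skipped
- Completeness: full review, or truncated at the size limit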
Consistent shape, always
The output should follow the same structure whether there are zero findings or fifty. Don’t instruct the agent to “just say LGTM” when there are no issues. Instead:
If no issues are found, still use the full output format. The Findings
section should state "No issues found." The Verdict should be "Approve"
with a brief note on what you checked.
This way, anyone reading the output always knows where to look for each piece of information.
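Concretely, a clean review under this contract might look like the following (illustrative values):
### Summary
The changes are small, well-scoped, and ready to merge. No significant
risks identified.
### Findings
No issues found.
### Skipped files
None.
### Verdict
Approve. Reviewed for bugs, security issues, and maintainability problems.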
Putting it all together
Here’s the complete code review skill file with all four pillars in place:
# Code review
## Description
Review a pull request or set of code changes for bugs, security issues,
and maintainability problems. Produce a structured review with specific,
actionable feedback on each issue found.
## When to use
- The user asks you to review a PR, diff, or set of changes
- The user asks for feedback on code they've written
- You're about to merge or approve changes and want a quality check
## When NOT to use
- The user wants a full security audit (use the security audit skill)
- The user wants automated test generation (use the test writer skill)
- The changes are only documentation or formatting with no logic changes
## Input
**Required:**
- The code changes to review (git diff preferred, or full file contents)
- A brief description of what the changes are supposed to do
**Optional:**
- The project's language and framework
- Project-specific conventions or style guides
- Whether this is a draft review or a final review
## Steps
1. Read the diff and the description of intended changes
2. Verify the diff matches the description. If they contradict each
other, flag this before proceeding
3. Review each changed file for:
- Bugs and logic errors
- Security issues (injection, auth bypass, data exposure)
- Maintainability (naming, complexity, duplication)
- Edge cases and missing error handling
4. For each issue, identify the file, line, severity, and a specific
recommendation for fixing it
5. Write a one-paragraph summary of the overall change quality
6. Assign a verdict: Approve, Request changes, or Needs discussion
## Error handling
**Empty diff or no changes:**
Tell the user no changes were detected. Ask them to confirm the right
diff or file paths were provided.
**Binary files or non-text content:**
Skip binary files. Note which files were skipped in the Skipped files
section of the output.
**Unsupported or unfamiliar language:**
Review for general issues (logic, naming, obvious bugs) but note that
language-specific feedback may be limited.
**Diff too large (more than 5,000 lines):**
Do not attempt a full review. Tell the user the diff is too large,
suggest splitting by file or logical grouping, and offer to start
with the files most likely to contain bugs.
**Conflicting description and diff:**
Flag the mismatch explicitly before proceeding with the review.
## Output format
### Summary
One paragraph: overall assessment, biggest risk, and whether the
changes are ready to merge.
### Findings
For each issue:
- **File and line**: location
- **Severity**: critical, warning, or suggestion
- **Description**: what the problem is
- **Recommendation**: how to fix it (with code example if helpful)
Order by severity (critical first).
If no issues found, state "No issues found."
### Skipped files
List any skipped files and why. If none, state "None."
### Verdict
One of: Approve, Request changes, or Needs discussion.
One-sentence justification.
Every section serves a purpose. The description tells the agent when to reach for this skill. The inputs tell it what to gather. The steps give it a repeatable process. The error handling keeps it from getting stuck. The output format makes the result consistent and scannable.
For real skill files you can install and study, see the PR review skill, test writer skill, and documentation generator skill.
Design checklist
Before shipping a skill file, run through this checklist:
Description:
- States what the skill does in one sentence
- Includes at least one “when to use” scenario
- Includes at least one “when NOT to use” scenario with an alternative
- Calls out key limitations or scope boundaries
Input:
- Lists every piece of information the skill needs
- Distinguishes required from optional inputs
- Specifies expected formats (diff, file path, URL, etc.)
- States constraints (size limits, supported formats)
Steps:
- Provides a numbered, ordered process
- Includes decision points for branching logic (“if X, do Y”)
- Each step is concrete and actionable (not vague like “analyze the code”)
Error handling:
- Covers the most likely failure modes
- Each error scenario includes what happened, why, and what to do next
- Never silently ignores or skips over problems
Output:
- Defines a specific structure with named sections
- Specifies the format for repeated elements (findings, items, etc.)
- Output shape is the same whether there are results or not
- Includes relevant metadata (counts, skipped items, completeness)
If your skill file passes every item, you have a well-designed skill that agents will be able to follow reliably.
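The structural items on that checklist are easy to automate. Below is a minimal sketch in Python that checks a skill file for the sections used throughout this article; the heading names and the assumption that skill files are plain Markdown come from this guide, not from any standard:
import re
import sys

# Section headings this guide expects every skill file to have.
REQUIRED_SECTIONS = [
    "Description",
    "When to use",
    "When NOT to use",
    "Input",
    "Steps",
    "Error handling",
    "Output format",
]

def check_skill_file(path: str) -> list[str]:
    """Return a list of problems found in the skill file at `path`."""
    text = open(path, encoding="utf-8").read()
    # Collect every Markdown heading, e.g. "## Error handling".
    headings = {
        match.group(1).strip()
        for match in re.finditer(r"^#+\s+(.+)$", text, re.MULTILINE)
    }
    problems = [f"Missing section: {name}" for name in REQUIRED_SECTIONS if name not in headings]
    # The checklist also asks that required inputs be marked explicitly.
    if "Input" in headings and "**Required:**" not in text:
        problems.append("Input section does not mark required inputs")
    return problems

if __name__ == "__main__":
    for line in check_skill_file(sys.argv[1]) or ["All required sections present."]:
        print(line)
A check like this can't judge whether the prose is clear, but it catches the "forgot the error handling section" class of mistake before an agent ever sees the file.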
Next steps
Now that you understand the anatomy of a skill file, explore Skill Design Principles to learn how skills compose together, when to split a skill into multiple pieces, and how to manage dependencies between skills. For practical advice on writing the prose inside your skill files, see Writing Skill Instructions. And for a simple skill you can use as a starting template, the File Search Skill example puts all four pillars into practice.
Related articles
Skill design principles
Foundational principles for designing composable, predictable, and maintainable skill files including single responsibility and idempotency.
The cost of bad tool descriptions
Your agent keeps picking the wrong tool. The problem isn't the code, it's the description. Real examples of tool descriptions that fail and how to fix them.
Prompt engineering vs skill design
They sound similar but they solve different problems. When to write a better prompt and when to build a skill instead.