AI agents for customer support

Ticket routing, response drafting, knowledge base maintenance, and escalation. How to add agent skills to support workflows without frustrating customers.

Most AI chatbots in customer support are terrible. Customers hate them. They loop endlessly, they don’t understand the actual problem, and they refuse to connect you to a human until you’ve typed “speak to an agent” four times in increasingly creative ways. Agent skills can be better than that, but only if you design them to help your support team rather than replace them.

Here’s the key insight that separates good implementations from the ones customers complain about on Twitter: don’t put the agent in front of the customer. Put it behind the support rep. The agent assists the human who’s talking to the customer. The human stays in control of the conversation, the tone, and the decision to send each response. The customer never knows an agent is involved. They just notice their issue gets resolved faster.

Ticket routing and classification

The simplest and highest-value skill for support teams is automatic ticket classification. When a new ticket arrives, the agent reads it and determines: what type of issue is this, how urgent is it, and who should handle it?

Here’s what the classification prompt looks like in practice:

You are a ticket classifier for a SaaS product.

Given a support ticket, classify it into exactly one category:
- billing: payment issues, subscription changes, invoices, refunds
- technical: bugs, errors, performance problems, integration issues
- feature_request: suggestions for new functionality
- account: password resets, access issues, team management
- onboarding: setup help, getting started, configuration

Assign a priority:
- urgent: customer is blocked, revenue impact, security issue
- high: significant inconvenience, workaround exists but is painful
- normal: standard request, no immediate pressure
- low: nice-to-have, informational

Return JSON: { "category": "...", "priority": "...", "reasoning": "..." }

The reasoning field matters. When a ticket gets misrouted (and some will), the support lead can look at the reasoning, understand why the agent made that call, and adjust the prompt. Without it, you’re debugging a black box.

This skill alone saves a surprising amount of time. Manual triage is one of those tasks that takes 30 seconds per ticket but adds up to hours per day across a team. It also introduces inconsistency because what one rep calls “urgent” another might call “normal.” The agent applies the same criteria every time. It won’t get it perfect, but it’ll be consistent, and consistency makes it easier to calibrate.
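
Because the model returns free-form text, it's worth validating the JSON before anything routes on it. Here's a minimal sketch of that guard; the type names and the parseClassification helper are illustrative, not part of any particular framework:

// A minimal sketch of validating the classifier's output before routing
// on it. The category and priority lists mirror the prompt above;
// parseClassification is an illustrative name, not a real API.
type Category = "billing" | "technical" | "feature_request" | "account" | "onboarding";
type Priority = "urgent" | "high" | "normal" | "low";

interface Classification {
  category: Category;
  priority: Priority;
  reasoning: string;
}

const CATEGORIES = ["billing", "technical", "feature_request", "account", "onboarding"];
const PRIORITIES = ["urgent", "high", "normal", "low"];

function parseClassification(raw: string): Classification | null {
  try {
    const parsed = JSON.parse(raw);
    if (
      CATEGORIES.includes(parsed.category) &&
      PRIORITIES.includes(parsed.priority) &&
      typeof parsed.reasoning === "string"
    ) {
      return parsed as Classification;
    }
  } catch {
    // malformed JSON falls through to the null return below
  }
  return null; // null means: skip auto-routing, leave the ticket for manual triage
}

A null result should behave exactly like the pre-agent world: the ticket lands in the manual triage queue.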

Response drafting

This is where agents shine as copilots. A customer writes in about a specific error they’re hitting. The agent searches your documentation, finds relevant KB articles, pulls up similar resolved tickets from the last 90 days, and drafts a response.

The rep reads the draft, adjusts the tone and details, and sends it. Maybe the agent’s draft is 80% right and the rep tweaks a few sentences. Maybe it’s completely wrong and the rep rewrites it. Either way, the rep started with something instead of a blank text field.

// The tool definition is the contract: name, input shape, output shape.
// Internal steps belong in the implementation, not the interface.
interface ResponseDraftTool {
  name: "draft_support_response";
  input: {
    ticketId: string;
    customerMessage: string;
    category: string;
    customerTier: string; // "free", "pro", "enterprise"
  };
  output: {
    draft: string;
    sources: string[]; // links to docs and past tickets used
    confidence: number; // 0.0 - 1.0; below ~0.6 the rep should treat as low-confidence
  };
}

// Internally, the tool's implementation does the work in stages:
//   1. search_knowledge_base     -- find relevant docs
//   2. find_similar_tickets      -- past tickets with similar issues
//   3. check_known_issues        -- current outages or bugs
//   4. compose the reply from those sources, scoring confidence
// Each stage can be its own tool internally, or just a function.

The confidence score is important. When the agent can’t find good matches in your docs or ticket history, it should say so rather than hallucinate an answer. A confidence score below a threshold can trigger a visual indicator in the rep’s UI: “Low confidence draft, review carefully.” For more on designing these kinds of uncertainty signals, see writing skill instructions.
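
Here's roughly what that gating looks like, as a sketch assuming the ResponseDraftTool output above. The 0.6 threshold echoes the comment in the interface and should be tuned against your own ticket data; DraftPanel is an illustrative name:

// Sketch of the confidence gate between the tool's output and the rep's UI.
const LOW_CONFIDENCE_THRESHOLD = 0.6; // assumption: tune per team

interface DraftPanel {
  draft: string;
  sources: string[];
  warning?: string; // shown in the rep's UI when present
}

function toDraftPanel(out: { draft: string; sources: string[]; confidence: number }): DraftPanel {
  return {
    draft: out.draft,
    sources: out.sources,
    warning:
      out.confidence < LOW_CONFIDENCE_THRESHOLD
        ? "Low confidence draft, review carefully."
        : undefined,
  };
}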

One thing to avoid: don’t auto-populate the response field with the draft. Put it in a sidebar or a “suggested response” panel. If it auto-fills the reply box, a rushed rep might hit send without reading it carefully. You want the agent’s output to be one click away, not zero clicks away.

Knowledge base maintenance

Support teams have a constant problem: the knowledge base drifts out of date. Features change, workarounds stop working, and new common issues emerge that nobody writes articles for. An agent can help with both sides of this.

For identifying gaps, the agent monitors incoming tickets and flags patterns. If 15 customers in the last two weeks asked about configuring SSO with Okta and there’s no KB article for it, the agent surfaces that gap. It can even draft the article based on how reps have been answering the question.

For identifying stale content, the agent periodically checks KB articles against recent ticket resolutions. If customers are asking about a topic that has a KB article, but reps are giving different answers than what the article says, something is out of date. The agent flags the article for review and shows the discrepancy. A weekly skill covering both jobs might be defined like this:

name: kb_maintenance
schedule: weekly
steps:
  - analyze_recent_tickets:
      window: 14d
      group_by: topic
  - identify_gaps:
      threshold: 5 # topics with 5+ tickets and no KB article
      output: gap_report
  - check_article_freshness:
      compare: article_content vs recent_resolutions
      flag_if: mismatch_rate > 0.3
      output: staleness_report
  - draft_new_articles:
      for_each: gap_report.topics
      based_on: recent_ticket_resolutions

This doesn’t replace a human content manager. It gives them a prioritized list of what to write and what to update, along with first drafts they can edit.

Escalation detection

Some tickets start as normal priority and escalate because the customer gets frustrated with slow responses or repeated back-and-forth. By the time a senior rep notices, the customer is already angry. An escalation detection skill monitors open tickets for signals that things are going sideways.

Signals to watch for:

  • Customer sentiment turning negative over consecutive messages
  • The same customer replying three or more times without a resolution
  • Keywords indicating frustration: “cancel,” “manager,” “unacceptable,” “switching to competitor”
  • Time since first response exceeding the SLA for that customer tier
  • The ticket being reassigned more than twice

When the skill detects these signals, it doesn’t just flag the ticket. It provides context: “This enterprise customer has been waiting 4 hours (SLA is 2 hours), has sent 3 follow-up messages, and mentioned cancellation in the last message. Sentiment has dropped from neutral to negative. Suggesting escalation to senior support.”
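
Here's a sketch of how those signals might be combined into one check. The TicketState shape, field names, and the two-signal rule are all assumptions for illustration, not a prescribed schema:

// Sketch: evaluate the signals above and return the ones that fired.
interface TicketState {
  tier: "free" | "pro" | "enterprise";
  customerReplies: number;  // customer messages since the last substantive rep response
  reassignments: number;
  hoursPastSla: number;     // hours past the tier's first-response SLA; <= 0 means within SLA
  sentimentTrend: number;   // negative means sentiment is dropping across messages
  lastMessage: string;
}

const FRUSTRATION_KEYWORDS = ["cancel", "manager", "unacceptable", "switching"];

function escalationSignals(t: TicketState): string[] {
  const signals: string[] = [];
  if (t.sentimentTrend < 0) signals.push("sentiment turning negative");
  if (t.customerReplies >= 3) signals.push(`${t.customerReplies} follow-ups without resolution`);
  const hit = FRUSTRATION_KEYWORDS.find((k) => t.lastMessage.toLowerCase().includes(k));
  if (hit) signals.push(`frustration keyword: "${hit}"`);
  if (t.hoursPastSla > 0) signals.push(`${t.hoursPastSla}h past SLA for ${t.tier} tier`);
  if (t.reassignments > 2) signals.push(`reassigned ${t.reassignments} times`);
  return signals; // e.g. two or more signals -> escalate, with this list as the context message
}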

This is much more useful than a simple SLA timer. SLA timers tell you a ticket is overdue. Escalation detection tells you a customer is about to churn. The difference matters.

Post-resolution summaries

After a ticket is closed, an agent can write up what happened. Not just “resolved,” but a structured summary: what the customer’s actual issue was, what was tried, what worked, and whether there’s a product improvement that would prevent this issue in the future.

These summaries feed into multiple things. They improve the response drafting skill because future similar tickets now have a clear resolution to reference. They give product managers signal about what’s causing friction. They make quarterly support reviews much easier because you have structured data instead of hundreds of ticket threads to read.
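
As a sketch, the summary might be a structured type like this; the field names are illustrative:

interface ResolutionSummary {
  ticketId: string;
  issue: string;                // the customer's actual problem, not the first symptom reported
  attempted: string[];          // what was tried, in order
  resolution: string;           // what actually worked
  productImprovement?: string;  // optional: a change that would prevent recurrence
  relatedDocs: string[];        // sources the drafting skill can reference on future tickets
}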

What makes this work vs. what makes it fail

I’ve seen both outcomes. The pattern is pretty clear.

It works when the agent is a copilot for the support rep. The rep sees the agent’s suggestions, uses their judgment, and makes the final call. The customer interacts with a human who has better tools. Response times drop because the rep spends less time searching for information. Quality stays high because a person is reviewing every outgoing message.

It fails when the agent becomes a gatekeeper between the customer and the support team. “I’m sorry, I don’t understand your question. Can you rephrase that?” repeated three times will make anyone want to throw their phone. If there’s no clear, immediate path to a human, customers leave. See when not to use agents for more on recognizing situations where automation does more harm than good.

It also fails when teams measure the wrong things. If you’re tracking “tickets resolved without human involvement” as your success metric, you’re optimizing for the wrong outcome. You’ll end up with an agent that closes tickets prematurely and frustrated customers who open new tickets about the same issue. Deflection rate is a vanity metric. Customer satisfaction and actual resolution rate are what matter.

Metrics worth tracking

When you add agent skills to your support workflow, measure these:

First response time should decrease. The drafting skill means reps aren’t starting from scratch, so they can respond faster.

Resolution time should decrease too, but less dramatically. Finding the right information is only part of solving the problem.

Customer satisfaction (CSAT) should go up or stay the same. If it drops, something is wrong. Either the agent is generating bad drafts that reps aren’t catching, or the routing is sending tickets to the wrong team, or the agent has been put in front of customers when it shouldn’t be.

Rep satisfaction also matters and often gets ignored. Ask your support team if the tools are helping. If reps are spending time correcting bad drafts or overriding wrong classifications, the agent is creating work rather than reducing it.

Escalation rate is a tricky one. It might go up initially because the escalation detection skill is catching issues that previously slipped through. That’s a good thing. Over time, it should stabilize as the team addresses the root causes.
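
If you want these numbers on a dashboard, a weekly snapshot over closed tickets is enough to start. A minimal sketch, assuming a TicketRecord shape your helpdesk export can produce:

interface TicketRecord {
  firstResponseMinutes: number;
  resolutionHours: number;
  csat?: number;     // e.g. 1-5, present only when the customer answered the survey
  escalated: boolean;
}

function median(values: number[]): number {
  if (values.length === 0) return 0;
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

function weeklySnapshot(tickets: TicketRecord[]) {
  const rated = tickets.filter((t) => t.csat !== undefined);
  return {
    medianFirstResponseMinutes: median(tickets.map((t) => t.firstResponseMinutes)),
    medianResolutionHours: median(tickets.map((t) => t.resolutionHours)),
    avgCsat: rated.length
      ? rated.reduce((sum, t) => sum + (t.csat ?? 0), 0) / rated.length
      : undefined,
    escalationRate: tickets.length
      ? tickets.filter((t) => t.escalated).length / tickets.length
      : 0,
  };
}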

Getting started

If you’re adding agent skills to a support team for the first time, start with ticket classification. It’s the lowest risk, the easiest to measure, and it gives you experience with how the agent performs on your specific ticket data. Run it in shadow mode for two weeks: let the agent classify tickets but don’t route based on its output. Compare its classifications to what your reps would have done. Tune the prompt until accuracy is above 90%.

Then add response drafting, again in a sidebar that reps can choose to use or ignore. Track adoption. If reps are using the drafts, they’re helpful. If reps are ignoring them, figure out why before pushing harder.

Build from there. Each skill you add should make the team’s day measurably better. If it doesn’t, pull it back and fix it before moving on. For patterns on keeping humans in the decision loop throughout, see human-in-the-loop.