If you’ve built more than a handful of MCP servers (or a growing library of Claude Skills), you’ve probably seen it:
- The agent calls the wrong tool even though the right one is available.
- Or it calls five tools “just to be safe.”
- Or it gets “stuck” in a latency spiral: search → fetch → summarize → search again.
That’s not (only) a prompt problem. It’s a tool catalog design problem.
This deep dive is about MCP tool routing at scale: how to design a tool catalog and routing layer that an agent can reliably use—without random tool spam, without brittle prompt hacks, and without turning your automation stack into an un-auditable security risk.
Along the way, we’ll connect ideas from Claude Skills’ “progressive disclosure” (metadata-first, load-on-demand) to a practical, testable tool routing architecture—and show how nNode’s white-box, artifact-based workflows make routing improvements straightforward to debug and iterate.
Why “too many tools” breaks MCP tool routing
In the early days, tool routing feels easy:
- You expose 5–10 tools.
- You write decent descriptions.
- The model chooses correctly often enough.
Then you scale:
- New internal APIs
- Multiple MCP servers (one per system)
- Convenience wrappers
- “Just in case” tools
- Team-specific tools
And suddenly, routing becomes unreliable.
Common symptoms
- Random tool spam: the model calls multiple overlapping tools, then picks whichever response looks plausible.
- Overconfident misroutes: the model chooses a tool whose name “sounds right,” then improvises arguments.
- Long chains for simple tasks: a basic request triggers 4–8 calls because the model is “exploring.”
- Cascading failures: one wrong call returns an error, which becomes context, which triggers the next wrong call.
The underlying root cause
Tool selection is “metadata-driven reasoning.” If your metadata is vague, overlapping, or inconsistent, you’re asking the model to do information retrieval on an unstructured inventory.
Claude Skills actually hint at the right direction: the system loads small metadata (name + description) first, and only loads deeper instructions/resources once a match is found. That’s the right mental model—but you have to apply it to MCP tools with the same discipline.
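To make that concrete, here's a small sketch of a metadata-first catalog in TypeScript (the types and class are illustrative, not part of any MCP SDK): the router only ever sees lightweight summaries, and the full definition loads only after a tool is selected.

// Sketch of a metadata-first catalog (illustrative names, not an MCP SDK API).
interface ToolSummary {
  id: string;          // e.g. "billing.invoice.create_draft"
  description: string; // one-sentence intent, written for disambiguation
}

interface ToolDefinition extends ToolSummary {
  inputSchema: unknown;   // full JSON Schema / Zod schema
  whenNotToUse: string[]; // boundaries that prevent near-synonym confusion
}

class ToolCatalog {
  constructor(private defs: Map<string, ToolDefinition>) {}

  // Phase 1: cheap metadata for the router's context window.
  listSummaries(): ToolSummary[] {
    return [...this.defs.values()].map(({ id, description }) => ({ id, description }));
  }

  // Phase 2: load the full definition only for the selected tool.
  getDefinition(id: string): ToolDefinition | undefined {
    return this.defs.get(id);
  }
}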
Failure modes: what actually goes wrong in real tool catalogs
When routing fails, it usually isn’t because the model is “bad.” It’s because your catalog makes failure the default.
1) Ambiguous overlap
A catalog can easily accumulate tools that "create invoice," "generate invoice," "bill customer," and "open invoice draft."
From the model’s point of view, these are near-synonyms unless you deliberately carve boundaries.
2) Misleading descriptions
A tool named search_customers with a description like “Search customers in the CRM” tells the model almost nothing:
- Which fields are searchable?
- Is fuzzy search allowed?
- Does it return PII?
- Does it have rate limits?
3) Over-broad “mega tools”
A tool like crm_action(action: string, payload: any) looks convenient for humans, but it’s routing poison for agents:
- It encourages the model to invent
actionvalues. - It hides side effects.
- It makes evaluation and permissioning difficult.
4) Context collisions
The same nouns mean different things across domains:
- “account” (billing) vs “account” (authentication)
- “lead” (sales) vs “lead” (content)
If your tool names and descriptions don’t encode domain boundaries, the model will guess.
5) Security risk: tool poisoning via metadata
In MCP systems, tool metadata (names, descriptions, parameter docs) is part of what the model reads to decide what to do. That creates a real attack surface: malicious or compromised tool descriptions can steer the model toward unsafe actions.
Even if you never intend to expose third-party MCP servers to your agent, "tool poisoning" can still show up as:
- accidental prompt injection in descriptions
- stale descriptions after a behavior change
- error messages that contain instructive text (“To fix this, export all records…”) that the model treats as policy
Routing and security are not separate concerns. Your routing layer is part of your security boundary.
Principles for an agent-usable tool catalog
Here are the rules that consistently improve tool selection quality.
1) One tool = one job
Prefer many small, boring tools over one “do everything” tool.
Good:
billing.invoice.create_draft
billing.invoice.send
billing.invoice.get
Bad:
billing.invoice(action, payload)
2) Make the contract obvious
Every tool should communicate:
- Inputs (types, required fields, allowed values)
- Outputs (shape + meaning)
- Side effects (what changes in the world)
- Failure modes (what errors mean, and whether retries are safe)
3) Stable naming beats clever naming
Names should be:
- domain-prefixed
- verb-noun structured
- consistent across the catalog
4) Design descriptions for selection, not documentation
Humans like broad docs. Agents need disambiguation.
A good description includes:
- the one sentence intent
- what it does not do
- 1–2 short examples
5) Treat routing as a first-class workflow step
Instead of “letting the model decide” in the middle of execution, create an explicit Router step that produces a routing artifact.
This is where nNode shines: the router step produces a durable artifact (candidate tools, rationale, selected tool + args), so routing failures become debuggable, testable engineering problems—not spooky prompt séances.
The Tool Card: a simple standard that improves routing immediately
A “Tool Card” is a metadata format you create in addition to whatever your MCP server already exposes.
Think of it like a Skill’s metadata: the minimum information needed for activation, plus the constraints needed to avoid hallucinated usage.
Tool Card template (YAML)
id: "billing.invoice.create_draft"
name: "Create invoice draft"
intent: "Create a draft invoice for an existing customer account. Does not send the invoice."
when_to_use:
- "User asks to create/prepare an invoice but does not explicitly request sending."
- "You need an invoice_id to add line items or preview totals."
when_not_to_use:
- "User asks to send an invoice (use billing.invoice.send)."
- "User asks to refund or void (use billing.refund.* / billing.invoice.void)."
inputs:
required:
- customer_id
- currency
optional:
- due_date
- memo
outputs:
returns: "invoice_draft"
fields:
- invoice_id
- status
- total
side_effects:
- "Creates a new invoice record in the billing system in DRAFT status."
errors:
- code: "CUSTOMER_NOT_FOUND"
retryable: false
- code: "RATE_LIMIT"
retryable: true
safety:
classification: "write"
requires_human_approval: false
data_sensitivity: "financial"
ops:
typical_latency_ms: 800
cost_hint: "low"
version:
tool_semver: "1.3.0"
deprecated: false
Why Tool Cards work
They give the router what it needs to make a deterministic choice:
- disambiguation (when to use / when not to use)
- side effects (what changes)
- safety metadata (approval gates)
- error semantics (retry or not)
Also, Tool Cards are easy to version and diff in code review—critical for avoiding accidental “rug pull” changes to tool behavior.
Schema design: stop the model from improvising arguments
Good routing still fails if the model can’t reliably supply correct arguments.
Prefer enumerations over free-form strings
Bad:
{ "priority": "super urgent" }
Better:
{ "priority": "P1" }
Put validation close to tool execution
Even with perfect schemas, you should assume the agent will occasionally pass malformed arguments.
If you’re building in TypeScript, a Zod schema is a great way to make tool contracts explicit:
import { z } from "zod";

export const CreateInvoiceDraftArgs = z.object({
  customer_id: z.string().min(1),
  currency: z.enum(["USD", "EUR", "GBP"]),
  due_date: z.string().optional(), // ISO date
  memo: z.string().max(500).optional(),
});

export type CreateInvoiceDraftArgs = z.infer<typeof CreateInvoiceDraftArgs>;
Key idea: routing chooses the tool; validation ensures the call is real.
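For example, an executor-side guard might look like this sketch (callCreateInvoiceDraft and executeCreateInvoiceDraft are illustrative names; the schema is the one defined above):

// Sketch of validation at the execution boundary, using CreateInvoiceDraftArgs from above.
declare function executeCreateInvoiceDraft(args: CreateInvoiceDraftArgs): Promise<unknown>; // hypothetical real tool call

export async function callCreateInvoiceDraft(rawArgs: unknown) {
  const parsed = CreateInvoiceDraftArgs.safeParse(rawArgs);
  if (!parsed.success) {
    // Return a structured, machine-readable error instead of letting
    // a malformed call reach the billing system.
    return { ok: false, error: { code: "INVALID_ARGUMENTS", issues: parsed.error.issues } };
  }
  return executeCreateInvoiceDraft(parsed.data);
}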
Scopes and allowlists: routing is also a security boundary
Tool catalogs tend to fail because they’re treated like a global registry:
“Just expose everything to the agent and let it figure it out.”
That’s the fastest way to get unpredictable behavior—and it’s a security nightmare.
Classify tools by side effects
A simple scheme:
- read: no external writes
- write: creates/updates records
- destructive: deletes/voids/transfers funds
- exfiltration-risk: can send data outside your boundary (email, webhooks)
Add workflow-level allowlists
Instead of exposing 60 tools to every step, you expose a small allowlist per step.
Example:
- Router step sees Tool Cards only (metadata)
- Executor step sees only the selected tool
- Validator step sees read-only verification tools
This dramatically reduces random tool spam.
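A minimal sketch of step-level exposure, assuming a simple ToolCard/StepPolicy shape (the names are illustrative):

// Each workflow step declares which tools it may see; everything else stays invisible.
type Classification = "read" | "write" | "destructive" | "exfiltration-risk";

interface ToolCard {
  id: string;
  classification: Classification;
}

interface StepPolicy {
  stepId: string;
  allowlist: string[]; // explicit tool ids this step may use
}

function toolsForStep(catalog: ToolCard[], policy: StepPolicy): ToolCard[] {
  const allowed = new Set(policy.allowlist);
  return catalog.filter((card) => allowed.has(card.id));
}

// Example: the executor step only ever sees the tool the router selected.
const executorPolicy: StepPolicy = {
  stepId: "execute_tool",
  allowlist: ["billing.invoice.create_draft"],
};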
Add approval gates where it matters
If a tool sends email, posts externally, charges a card, or deletes records, put it behind one of:
- a human approval step
- an “are you sure?” confirmation step
- a policy gate (budget limit, domain allowlist, environment restrictions)
In nNode terms, these are just explicit workflow steps with artifacts and conditions—easy to audit and rerun.
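Outside of nNode, a policy gate can be as simple as a pure function over the tool's classification and environment; here's a sketch (names and rules are illustrative, not a prescribed policy):

// Same Classification union as the allowlist sketch above.
type Classification = "read" | "write" | "destructive" | "exfiltration-risk";

interface GateDecision {
  allowed: boolean;
  requiresHumanApproval: boolean;
  reason: string;
}

function policyGate(classification: Classification, environment: string): GateDecision {
  // High-impact tools always go through a human.
  if (classification === "destructive" || classification === "exfiltration-risk") {
    return { allowed: true, requiresHumanApproval: true, reason: "high-impact side effects" };
  }
  // Example environment restriction: production writes need an explicit approval step.
  if (classification === "write" && environment === "production") {
    return { allowed: true, requiresHumanApproval: true, reason: "write in production" };
  }
  return { allowed: true, requiresHumanApproval: false, reason: "read-only or non-production write" };
}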
The Router Agent Pattern (make selection a first-class step)
Here’s the architecture that works in practice:
- Normalize the request (turn user input into structured intent)
- Route (choose tool + args + fallback plan)
- Execute (call the tool)
- Validate (check output schema + business constraints)
- Fallback / escalate (try alternative or ask human)
Router input and output contract
Router inputs:
- user goal (normalized)
- relevant context (customer/account IDs, environment)
- available Tool Cards (or a filtered subset)
Router output (a routing artifact):
{
  "selected_tool": "billing.invoice.create_draft",
  "arguments": {
    "customer_id": "cus_123",
    "currency": "USD",
    "memo": "January retainer"
  },
  "confidence": 0.82,
  "alternatives": [
    {
      "tool": "billing.invoice.create_and_send",
      "why_not": "User did not ask to send; avoid side effects."
    }
  ],
  "safety": {
    "classification": "write",
    "requires_human_approval": false
  },
  "fallback_plan": "If CUSTOMER_NOT_FOUND, ask user for correct customer or search CRM by email."
}
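If you validate the routing artifact in TypeScript, a Zod schema along these lines keeps the router honest (a sketch mirroring the example above; it could serve as the ROUTING_OUTPUT_SCHEMA referenced in the prompt below):

import { z } from "zod";

export const RoutingDecision = z.object({
  selected_tool: z.string(),
  arguments: z.record(z.string(), z.unknown()),
  confidence: z.number().min(0).max(1),
  alternatives: z.array(z.object({ tool: z.string(), why_not: z.string() })),
  safety: z.object({
    classification: z.enum(["read", "write", "destructive", "exfiltration-risk"]),
    requires_human_approval: z.boolean(),
  }),
  fallback_plan: z.string(),
});

export type RoutingDecision = z.infer<typeof RoutingDecision>;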
A practical router prompt skeleton
You don’t need a magical prompt—just a clear contract:
SYSTEM: You are a routing agent. Your job is to select exactly one tool.
You will receive:
- USER_INTENT (structured)
- CONTEXT (structured)
- TOOL_CARDS (list)
Rules:
1) Choose the single best tool.
2) If multiple tools overlap, prefer the one with fewer side effects.
3) Only use arguments that match the selected tool's schema.
4) If required arguments are missing, output "needs_clarification" with questions.
5) Return JSON strictly matching ROUTING_OUTPUT_SCHEMA.
This works best when the Router is not also responsible for executing the tool. Separation reduces accidental tool spam.
Evaluation: how to know MCP tool routing actually got better
If you don’t measure routing, you’ll regress every time someone adds a tool.
Build a routing test set
Start with 30–50 real requests:
- support tickets
- operator requests
- Slack messages
- “common asks” docs
For each test case, record:
- the correct tool (and acceptable alternatives)
- required arguments
- whether clarification is required
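A single test case can be as simple as a record like this (an illustrative sketch; field names mirror the harness below, not a fixed format):

// One routing test case: a real request plus the answer key.
const exampleCase = {
  user_intent: "Create an invoice for Acme's January retainer, but don't send it yet.",
  context: { customer_id: "cus_123", currency: "USD" },
  allowed_tools: ["billing.invoice.create_draft", "billing.invoice.create_and_send", "billing.invoice.send"],
  acceptable_tools: ["billing.invoice.create_draft"],
  required_arguments: ["customer_id", "currency"],
  expected: "tool_call", // vs. "needs_clarification"
};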
Metrics that matter
- correct tool@1: did we pick the right tool first?
- correct tool@k: was the correct tool in the top-k alternatives?
- unnecessary calls: how many tools were called before success?
- time-to-success: wall clock latency for a complete run
- unsafe attempts: did the router ever suggest a destructive tool when not asked?
Minimal harness (pseudo-code)
def eval_case(case):
    routing = router(case.user_intent, case.context, tool_cards=case.allowed_tools)
    if case.expected == "needs_clarification":
        return routing.type == "needs_clarification"
    return routing.selected_tool in case.acceptable_tools

passed = sum(eval_case(c) for c in cases)
print("tool@1", passed / len(cases))
The key is not sophistication; it’s consistency.
How nNode implements tool routing cleanly (and why it stays debuggable)
nNode’s core idea is simple: one agent, one task, with explicit artifacts at every step.
Tool routing fits that philosophy perfectly.
A workflow sketch
name: invoice_assistant
steps:
  - id: normalize_request
    agent: normalize_intent
    outputs: [USER_INTENT]
  - id: route_tool
    agent: tool_router
    inputs: [USER_INTENT, TOOL_CARDS, CONTEXT]
    outputs: [ROUTING_DECISION]
  - id: maybe_approve
    condition: "ROUTING_DECISION.safety.requires_human_approval == true"
    agent: human_approval_gate
    outputs: [APPROVAL]
  - id: execute_tool
    agent: tool_executor
    inputs: [ROUTING_DECISION]
    tool_allowlist: ["{{ROUTING_DECISION.selected_tool}}"]
    outputs: [TOOL_RESULT]
  - id: validate_result
    agent: result_validator
    inputs: [TOOL_RESULT]
    outputs: [VALIDATION_REPORT]
  - id: fallback_or_finish
    agent: fallback_manager
    inputs: [VALIDATION_REPORT, ROUTING_DECISION]
    outputs: [FINAL_OUTPUT]
Why this architecture scales
- Routing is inspectable: you can open the ROUTING_DECISION artifact and see why a tool was chosen.
- Tool exposure is constrained: the executor sees only one tool, not your entire inventory.
- Failures are replayable: rerun from the Router step with the exact same inputs and compare outputs.
- Improvements are operationalized: updating a Tool Card is often enough to fix a misroute—no model retraining required.
If you’ve ever tried to debug a “black-box agent” that intermittently chooses the wrong tool, this difference matters.
Common pitfalls (and quick fixes)
Pitfall: exposing every tool to every step
Fix: step-level allowlists. Router sees metadata; executor sees only the selected tool.
Pitfall: changing tool behavior without changing metadata
Fix: version Tool Cards (tool_semver) and treat metadata changes like API changes.
Pitfall: letting tools return unstructured blobs
Fix: standardize tool outputs with an envelope:
{
  "ok": true,
  "data": { "...": "..." },
  "error": null,
  "meta": { "tool": "billing.invoice.create_draft", "latency_ms": 742 }
}
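In TypeScript, the same envelope can be captured as a small generic type (a sketch; the error fields are an assumption, adapt them to your own conventions):

// Envelope shape mirroring the JSON above; T is the tool-specific payload.
interface ToolEnvelope<T> {
  ok: boolean;
  data: T | null;                                   // structured payload on success
  error: { code: string; message: string } | null;  // machine-readable failure, never free-form prose
  meta: { tool: string; latency_ms: number };       // observability fields for eval and debugging
}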
Pitfall: “router prompt tweaking” as your only lever
Fix: build leverage into the system:
- Tool Cards
- constrained tool exposure
- validation
- evaluation
Prompt tweaks should be the last thing you touch.
Starter checklist: MCP tool routing that doesn’t degrade over time
Use this as a quick implementation checklist:
- Domain-prefix tool names (crm.contact.search, not search).
- One tool = one job (split mega-tools).
- Add Tool Cards with intent + when-not-to-use.
- Document side effects explicitly.
- Define error semantics (retryable vs not).
- Use strict schemas with enums and required fields.
- Create a Router step that outputs a routing artifact.
- Constrain tool access: executor gets a single tool.
- Add approval gates for write/destructive/exfiltration tools.
- Build a routing eval set and run it on every tool change.
Closing: make routing boring again
When MCP tool routing is healthy, it feels boring:
- the agent picks the right tool
- it passes the right args
- it calls fewer tools
- failures are localized and explainable
That “boring” is the point. It’s what lets you ship real automations—especially if you’re building a serious library of Claude Skills + MCP connectors for your business.
If you want a workflow engine that makes routing explicit, testable, and debuggable (instead of magical), that’s exactly what nNode is built for: a high-level programming language for business automation where every step produces inspectable artifacts.
If that sounds like the direction you’re heading, take a look at nnode.ai and see how nNode workflows can help you scale agentic systems without scaling chaos.