If you’ve built more than a handful of MCP servers (or a growing library of Claude Skills), you’ve probably seen it:
- The agent calls the wrong tool even though the right one is available.
- Or it calls five tools “just to be safe.”
- Or it gets “stuck” in a latency spiral: search → fetch → summarize → search again.
That’s not (only) a prompt problem. It’s a tool catalog design problem.
This deep dive is about MCP tool routing at scale: how to design a tool catalog and routing layer that an agent can reliably use—without random tool spam, without brittle prompt hacks, and without turning your automation stack into an un-auditable security risk.
Along the way, we’ll connect ideas from Claude Skills’ “progressive disclosure” (metadata-first, load-on-demand) to a practical, testable tool routing architecture—and show how nNode’s white-box, artifact-based workflows make routing improvements straightforward to debug and iterate.
Why “too many tools” breaks MCP tool routing
In the early days, tool routing feels easy:
- You expose 5–10 tools.
- You write decent descriptions.
- The model chooses correctly often enough.
Then you scale:
- New internal APIs
- Multiple MCP servers (one per system)
- Convenience wrappers
- “Just in case” tools
- Team-specific tools
And suddenly, routing becomes unreliable.
Common symptoms
- Random tool spam: the model calls multiple overlapping tools, then picks whichever response looks plausible.
- Overconfident misroutes: the model chooses a tool whose name “sounds right,” then improvises arguments.
- Long chains for simple tasks: a basic request triggers 4–8 calls because the model is “exploring.”
- Cascading failures: one wrong call returns an error, which becomes context, which triggers the next wrong call.
The underlying root cause
Tool selection is “metadata-driven reasoning.” If your metadata is vague, overlapping, or inconsistent, you’re asking the model to do information retrieval on an unstructured inventory.
Claude Skills actually hint at the right direction: the system loads small metadata (name + description) first, and only loads deeper instructions/resources once a match is found. That’s the right mental model—but you have to apply it to MCP tools with the same discipline.
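To make that concrete, here's a small sketch of a metadata-first catalog in TypeScript (the types and class are illustrative, not part of any MCP SDK): the router only ever sees lightweight summaries, and the full definition loads only after a tool is selected.

// Sketch of a metadata-first catalog (illustrative names, not an MCP SDK API).
interface ToolSummary {
  id: string;          // e.g. "billing.invoice.create_draft"
  description: string; // one-sentence intent, written for disambiguation
}

interface ToolDefinition extends ToolSummary {
  inputSchema: unknown;   // full JSON Schema / Zod schema
  whenNotToUse: string[]; // boundaries that prevent near-synonym confusion
}

class ToolCatalog {
  constructor(private defs: Map<string, ToolDefinition>) {}

  // Phase 1: cheap metadata for the router's context window.
  listSummaries(): ToolSummary[] {
    return [...this.defs.values()].map(({ id, description }) => ({ id, description }));
  }

  // Phase 2: load the full definition only for the selected tool.
  getDefinition(id: string): ToolDefinition | undefined {
    return this.defs.get(id);
  }
}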
Failure modes: what actually goes wrong in real tool catalogs
When routing fails, it usually isn’t because the model is “bad.” It’s because your catalog makes failure the default.
1) Ambiguous overlap
A catalog can easily accumulate tools that "create invoice," "generate invoice," "bill customer," and "open invoice draft."
From the model’s point of view, these are near-synonyms unless you deliberately carve boundaries.
2) Misleading descriptions
A tool named search_customers with a description like “Search customers in the CRM” tells the model almost nothing:
- Which fields are searchable?
- Is fuzzy search allowed?
- Does it return PII?
- Does it have rate limits?
3) Over-broad “mega tools”
A tool like crm_action(action: string, payload: any) looks convenient for humans, but it’s routing poison for agents:
- It encourages the model to invent
actionvalues. - It hides side effects.
- It makes evaluation and permissioning difficult.
4) Context collisions
The same nouns mean different things across domains:
- “account” (billing) vs “account” (authentication)
- “lead” (sales) vs “lead” (content)
If your tool names and descriptions don’t encode domain boundaries, the model will guess.
5) Security risk: tool poisoning via metadata
In MCP systems, tool metadata (names, descriptions, parameter docs) is part of what the model reads to decide what to do. That creates a real attack surface: malicious or compromised tool descriptions can steer the model toward unsafe actions.
Even if you never intend to expose third-party MCP servers to your agent, "tool poisoning" can still show up as:
- accidental prompt injection in descriptions
- stale descriptions after a behavior change
- error messages that contain instructive text (“To fix this, export all records…”) that the model treats as policy
Routing and security are not separate concerns. Your routing layer is part of your security boundary.
Principles for an agent-usable tool catalog
Here are the rules that consistently improve tool selection quality.
1) One tool = one job
Prefer many small, boring tools over one “do everything” tool.
Good:
billing.invoice.create_draft
billing.invoice.send
billing.invoice.get
Bad:
billing.invoice(action, payload)
2) Make the contract obvious
Every tool should communicate:
- Inputs (types, required fields, allowed values)
- Outputs (shape + meaning)
- Side effects (what changes in the world)
- Failure modes (what errors mean, and whether retries are safe)
3) Stable naming beats clever naming
Names should be:
- domain-prefixed
- verb-noun structured
- consistent across the catalog
4) Design descriptions for selection, not documentation
Humans like broad docs. Agents need disambiguation.
A good description includes:
- the one sentence intent
- what it does not do
- 1–2 short examples
5) Treat routing as a first-class workflow step
Instead of “letting the model decide” in the middle of execution, create an explicit Router step that produces a routing artifact.
This is where nNode shines: the router step produces a durable artifact (candidate tools, rationale, selected tool + args), so routing failures become debuggable, testable engineering problems—not spooky prompt séances.
The Tool Card: a simple standard that improves routing immediately
A “Tool Card” is a metadata format you create in addition to whatever your MCP server already exposes.
Think of it like a Skill’s metadata: the minimum information needed for activation, plus the constraints needed to avoid hallucinated usage.
Tool Card template (YAML)
id: "billing.invoice.create_draft"
name: "Create invoice draft"
intent: "Create a draft invoice for an existing customer account. Does not send the invoice."
when_to_use:
- "User asks to create/prepare an invoice but does not explicitly request sending."
- "You need an invoice_id to add line items or preview totals."
when_not_to_use:
- "User asks to send an invoice (use billing.invoice.send)."
- "User asks to refund or void (use billing.refund.* / billing.invoice.void)."
inputs:
required:
- customer_id
- currency
optional:
- due_date
- memo
outputs:
returns: "invoice_draft"
fields:
- invoice_id
- status
- total
side_effects:
- "Creates a new invoice record in the billing system in DRAFT status."
errors:
- code: "CUSTOMER_NOT_FOUND"
retryable: false
- code: "RATE_LIMIT"
retryable: true
safety:
classification: "write"
requires_human_approval: false
data_sensitivity: "financial"
ops:
typical_latency_ms: 800
cost_hint: "low"
version:
tool_semver: "1.3.0"
deprecated: false
Why Tool Cards work
They give the router what it needs to make a deterministic choice:
- disambiguation (when to use / when not to use)
- side effects (what changes)
- safety metadata (approval gates)
- error semantics (retry or not)
Also, Tool Cards are easy to version and diff in code review—critical for avoiding accidental “rug pull” changes to tool behavior.
Schema design: stop the model from improvising arguments
Good routing still fails if the model can’t reliably supply correct arguments.
Prefer enumerations over free-form strings
Bad:
{ "priority": "super urgent" }
Better:
{ "priority": "P1" }
Put validation close to tool execution
Even with perfect schemas, you should assume the agent will occasionally pass malformed arguments.
If you’re building in TypeScript, a Zod schema is a great way to make tool contracts explicit:
import { z } from "zod";

export const CreateInvoiceDraftArgs = z.object({
  customer_id: z.string().min(1),
  currency: z.enum(["USD", "EUR", "GBP"]),
  due_date: z.string().optional(), // ISO date
  memo: z.string().max(500).optional(),
});

export type CreateInvoiceDraftArgs = z.infer<typeof CreateInvoiceDraftArgs>;
Key idea: routing chooses the tool; validation ensures the call is real.
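For example, an executor-side guard might look like this sketch (callCreateInvoiceDraft and executeCreateInvoiceDraft are illustrative names; the schema is the one defined above):

// Sketch of validation at the execution boundary, using CreateInvoiceDraftArgs from above.
declare function executeCreateInvoiceDraft(args: CreateInvoiceDraftArgs): Promise<unknown>; // hypothetical real tool call

export async function callCreateInvoiceDraft(rawArgs: unknown) {
  const parsed = CreateInvoiceDraftArgs.safeParse(rawArgs);
  if (!parsed.success) {
    // Return a structured, machine-readable error instead of letting
    // a malformed call reach the billing system.
    return { ok: false, error: { code: "INVALID_ARGUMENTS", issues: parsed.error.issues } };
  }
  return executeCreateInvoiceDraft(parsed.data);
}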
Scopes and allowlists: routing is also a security boundary
Tool catalogs tend to fail because they’re treated like a global registry:
“Just expose everything to the agent and let it figure it out.”
That’s the fastest way to get unpredictable behavior—and it’s a security nightmare.
Classify tools by side effects
A simple scheme:
- read: no external writes
- write: creates/updates records
- destructive: deletes/voids/transfers funds
- exfiltration-risk: can send data outside your boundary (email, webhooks)
Add workflow-level allowlists
Instead of exposing 60 tools to every step, you expose a small allowlist per step.
Example:
- Router step sees Tool Cards only (metadata)
- Executor step sees only the selected tool
- Validator step sees read-only verification tools
This dramatically reduces random tool spam.
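A minimal sketch of step-level exposure, assuming a simple ToolCard/StepPolicy shape (the names are illustrative):

// Each workflow step declares which tools it may see; everything else stays invisible.
type Classification = "read" | "write" | "destructive" | "exfiltration-risk";

interface ToolCard {
  id: string;
  classification: Classification;
}

interface StepPolicy {
  stepId: string;
  allowlist: string[]; // explicit tool ids this step may use
}

function toolsForStep(catalog: ToolCard[], policy: StepPolicy): ToolCard[] {
  const allowed = new Set(policy.allowlist);
  return catalog.filter((card) => allowed.has(card.id));
}

// Example: the executor step only ever sees the tool the router selected.
const executorPolicy: StepPolicy = {
  stepId: "execute_tool",
  allowlist: ["billing.invoice.create_draft"],
};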
Add approval gates where it matters
If a tool sends email, posts externally, charges a card, or deletes records, put it behind one of:
- a human approval step
- an “are you sure?” confirmation step
- a policy gate (budget limit, domain allowlist, environment restrictions)
In nNode terms, these are just explicit workflow steps with artifacts and conditions—easy to audit and rerun.
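Outside of nNode, a policy gate can be as simple as a pure function over the tool's classification and environment; here's a sketch (names and rules are illustrative, not a prescribed policy):

// Same Classification union as the allowlist sketch above.
type Classification = "read" | "write" | "destructive" | "exfiltration-risk";

interface GateDecision {
  allowed: boolean;
  requiresHumanApproval: boolean;
  reason: string;
}

function policyGate(classification: Classification, environment: string): GateDecision {
  // High-impact tools always go through a human.
  if (classification === "destructive" || classification === "exfiltration-risk") {
    return { allowed: true, requiresHumanApproval: true, reason: "high-impact side effects" };
  }
  // Example environment restriction: production writes need an explicit approval step.
  if (classification === "write" && environment === "production") {
    return { allowed: true, requiresHumanApproval: true, reason: "write in production" };
  }
  return { allowed: true, requiresHumanApproval: false, reason: "read-only or non-production write" };
}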
The Router Agent Pattern (make selection a first-class step)
Here’s the architecture that works in practice:
- Normalize the request (turn user input into structured intent)
- Route (choose tool + args + fallback plan)
- Execute (call the tool)
- Validate (check output schema + business constraints)
- Fallback / escalate (try alternative or ask human)
Router input and output contract
Router inputs:
- user goal (normalized)
- relevant context (customer/account IDs, environment)
- available Tool Cards (or a filtered subset)
Router output (a routing artifact):
{
  "selected_tool": "billing.invoice.create_draft",
  "arguments": {
    "customer_id": "cus_123",
    "currency": "USD",
    "memo": "January retainer"
  },
  "confidence": 0.82,
  "alternatives": [
    {
      "tool": "billing.invoice.create_and_send",
      "why_not": "User did not ask to send; avoid side effects."
    }
  ],
  "safety": {
    "classification": "write",
    "requires_human_approval": false
  },
  "fallback_plan": "If CUSTOMER_NOT_FOUND, ask user for correct customer or search CRM by email."
}
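If you validate the routing artifact in TypeScript, a Zod schema along these lines keeps the router honest (a sketch mirroring the example above; it could serve as the ROUTING_OUTPUT_SCHEMA referenced in the prompt below):

import { z } from "zod";

export const RoutingDecision = z.object({
  selected_tool: z.string(),
  arguments: z.record(z.string(), z.unknown()),
  confidence: z.number().min(0).max(1),
  alternatives: z.array(z.object({ tool: z.string(), why_not: z.string() })),
  safety: z.object({
    classification: z.enum(["read", "write", "destructive", "exfiltration-risk"]),
    requires_human_approval: z.boolean(),
  }),
  fallback_plan: z.string(),
});

export type RoutingDecision = z.infer<typeof RoutingDecision>;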
A practical router prompt skeleton
You don’t need a magical prompt—just a clear contract:
SYSTEM: You are a routing agent. Your job is to select exactly one tool.
You will receive:
- USER_INTENT (structured)
- CONTEXT (structured)
- TOOL_CARDS (list)
Rules:
1) Choose the single best tool.
2) If multiple tools overlap, prefer the one with fewer side effects.
3) Only use arguments that match the selected tool's schema.
4) If required arguments are missing, output "needs_clarification" with questions.
5) Return JSON strictly matching ROUTING_OUTPUT_SCHEMA.
This works best when the Router is not also responsible for executing the tool. Separation reduces accidental tool spam.
Evaluation: how to know MCP tool routing actually got better
If you don’t measure routing, you’ll regress every time someone adds a tool.
Build a routing test set
Start with 30–50 real requests:
- support tickets
- operator requests
- Slack messages
- “common asks” docs
For each test case, record:
- the correct tool (and acceptable alternatives)
- required arguments
- whether clarification is required
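A single test case can be as simple as a record like this (an illustrative sketch; field names mirror the harness below, not a fixed format):

// One routing test case: a real request plus the answer key.
const exampleCase = {
  user_intent: "Create an invoice for Acme's January retainer, but don't send it yet.",
  context: { customer_id: "cus_123", currency: "USD" },
  allowed_tools: ["billing.invoice.create_draft", "billing.invoice.create_and_send", "billing.invoice.send"],
  acceptable_tools: ["billing.invoice.create_draft"],
  required_arguments: ["customer_id", "currency"],
  expected: "tool_call", // vs. "needs_clarification"
};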
Metrics that matter
- correct tool@1: did we pick the right tool first?
- correct tool@k: was the correct tool in the top-k alternatives?
- unnecessary calls: how many tools were called before success?
- time-to-success: wall clock latency for a complete run
- unsafe attempts: did the router ever suggest a destructive tool when not asked?
Minimal harness (pseudo-code)
def eval_case(case):
    routing = router(case.user_intent, case.context, tool_cards=case.allowed_tools)
    if case.expected == "needs_clarification":
        return routing.type == "needs_clarification"
    return routing.selected_tool in case.acceptable_tools

passed = sum(eval_case(c) for c in cases)
print("tool@1", passed / len(cases))
The key is not sophistication; it’s consistency.
How nNode implements tool routing cleanly (and why it stays debuggable)
nNode’s core idea is simple: one agent, one task, with explicit artifacts at every step.
Tool routing fits that philosophy perfectly.
A workflow sketch
name: invoice_assistant
steps:
  - id: normalize_request
    agent: normalize_intent
    outputs: [USER_INTENT]
  - id: route_tool
    agent: tool_router
    inputs: [USER_INTENT, TOOL_CARDS, CONTEXT]
    outputs: [ROUTING_DECISION]
  - id: maybe_approve
    condition: "ROUTING_DECISION.safety.requires_human_approval == true"
    agent: human_approval_gate
    outputs: [APPROVAL]
  - id: execute_tool
    agent: tool_executor
    inputs: [ROUTING_DECISION]
    tool_allowlist: ["{{ROUTING_DECISION.selected_tool}}"]
    outputs: [TOOL_RESULT]
  - id: validate_result
    agent: result_validator
    inputs: [TOOL_RESULT]
    outputs: [VALIDATION_REPORT]
  - id: fallback_or_finish
    agent: fallback_manager
    inputs: [VALIDATION_REPORT, ROUTING_DECISION]
    outputs: [FINAL_OUTPUT]
Why this architecture scales
- Routing is inspectable: you can open the ROUTING_DECISION artifact and see why a tool was chosen.
- Tool exposure is constrained: the executor sees only one tool, not your entire inventory.
- Failures are replayable: rerun from the Router step with the exact same inputs and compare outputs.
- Improvements are operationalized: updating a Tool Card is often enough to fix a misroute—no model retraining required.
If you’ve ever tried to debug a “black-box agent” that intermittently chooses the wrong tool, this difference matters.
Common pitfalls (and quick fixes)
Pitfall: exposing every tool to every step
Fix: step-level allowlists. Router sees metadata; executor sees only the selected tool.
Pitfall: changing tool behavior without changing metadata
Fix: version Tool Cards (tool_semver) and treat metadata changes like API changes.
Pitfall: letting tools return unstructured blobs
Fix: standardize tool outputs with an envelope:
{
  "ok": true,
  "data": { "...": "..." },
  "error": null,
  "meta": { "tool": "billing.invoice.create_draft", "latency_ms": 742 }
}
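In TypeScript, the same envelope can be captured as a small generic type (a sketch; the error fields are an assumption, adapt them to your own conventions):

// Envelope shape mirroring the JSON above; T is the tool-specific payload.
interface ToolEnvelope<T> {
  ok: boolean;
  data: T | null;                                   // structured payload on success
  error: { code: string; message: string } | null;  // machine-readable failure, never free-form prose
  meta: { tool: string; latency_ms: number };       // observability fields for eval and debugging
}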
Pitfall: “router prompt tweaking” as your only lever
Fix: build leverage into the system:
- Tool Cards
- constrained tool exposure
- validation
- evaluation
Prompt tweaks should be the last thing you touch.
Starter checklist: MCP tool routing that doesn’t degrade over time
Use this as a quick implementation checklist:
- Domain-prefix tool names (crm.contact.search, not search).
- One tool = one job (split mega-tools).
- Add Tool Cards with intent + when-not-to-use.
- Document side effects explicitly.
- Define error semantics (retryable vs not).
- Use strict schemas with enums and required fields.
- Create a Router step that outputs a routing artifact.
- Constrain tool access: executor gets a single tool.
- Add approval gates for write/destructive/exfiltration tools.
- Build a routing eval set and run it on every tool change.
Closing: make routing boring again
When MCP tool routing is healthy, it feels boring:
- the agent picks the right tool
- it passes the right args
- it calls fewer tools
- failures are localized and explainable
That “boring” is the point. It’s what lets you ship real automations—especially if you’re building a serious library of Claude Skills + MCP connectors for your business.
If you want a workflow engine that makes routing explicit, testable, and debuggable (instead of magical), that’s exactly what nNode is built for: a high-level programming language for business automation where every step produces inspectable artifacts.
If that sounds like the direction you’re heading, take a look at nnode.ai and see how nNode workflows can help you scale agentic systems without scaling chaos.