Claude Code skills are powerful—until they don’t trigger. If you’ve ever written a beautiful SKILL.md, tried a real request, and watched Claude ignore it, you’ve hit the gap between “prompting” and “production.”
This deep dive is a practical playbook for shipping Claude Code skills that activate reliably, plus a packaging pattern to graduate a skill into a reusable, inspectable, debuggable (“white‑box”) workflow you can run across a team (or for clients).
The real reason Claude Code skills don’t activate (a mental model)
Most “my skill isn’t activating” debugging is really three separate problems:
- Discovery: Is Claude even aware the skill exists?
- Selection: Given the user’s request, does the model believe this skill is the best match?
- Eligibility: Even if it’s a match, is Claude allowed to invoke it (permissions, invocation settings, tool access)?
If you don’t separate these, you’ll keep rewriting instructions that were never the issue.
1) Discovery: can Claude “see” the skill?
In Claude Code, skill descriptions are loaded into context so Claude knows what’s available. But there’s a character budget—if you have lots of skills, some descriptions can be excluded. That failure mode looks like “it never triggers,” because it never had a chance.
Quick checks:
- Confirm the skill is listed when you ask "What skills are available?"
- Run `/context` and look for warnings about excluded skills
- If you have a big skill library, consider raising the `SLASH_COMMAND_TOOL_CHAR_BUDGET` environment variable (or prune / consolidate descriptions)
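A sketch of the last lever (the value is illustrative; set it in the shell session that launches Claude Code):

```shell
# Raise the character budget for skill descriptions (example value)
export SLASH_COMMAND_TOOL_CHAR_BUDGET=30000
```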
2) Selection: the description is the router
Claude uses the skill’s description to decide when to apply it. The highest-leverage authoring move is treating the description like a routing contract, not marketing copy.
Bad description (informative but not routable):
- “Summarizes documents and posts updates.”
Better description (decision-oriented):
- “Use when the user asks for a daily digest from Google Drive docs/notes or meeting transcripts; output Slack-ready bullet points + action items.”
Selection also fails when:
- Two skills overlap (“digest” vs “summary” vs “weekly update”) and collide
- The user’s wording doesn’t overlap your description keywords
- Your skill is too broad (“does everything”) so Claude can’t confidently pick it
3) Eligibility: permissions and invocation controls
Even a perfectly-routed skill won’t activate if it’s not eligible.
Key levers in Claude Code skill frontmatter:
- `disable-model-invocation: true` — blocks automatic model invocation (manual `/skill-name` still works)
- `user-invocable: false` — hides manual invocation, but Claude can still use it automatically
- Permission rules can allow/deny the Skill tool or specific skills (e.g., `Skill(deploy *)`)
- `allowed-tools` in skill frontmatter can grant tool access when the skill is active (CLI behavior)
If your skill has side effects (deploying, posting, writing to CRM), you often want `disable-model-invocation: true` and a human approval gate.
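As a sketch, a side-effecting skill's frontmatter might combine these levers (the skill name and tool list here are hypothetical):

```yaml
---
name: deploy-staging
description: Deploy the current branch to staging. Use only when the user explicitly asks to deploy.
disable-model-invocation: true   # Claude never auto-invokes; a human runs /deploy-staging
allowed-tools: ["Bash", "Read"]  # tools granted while the skill is active
---
```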
Treat “skill activation rate” like a production metric
If you can’t measure activation, you can’t scale skills.
Here’s a simple metric set that works in practice:
- Activation rate (AR): % of prompts where the correct skill activates
- False positive rate (FPR): % of prompts where the skill activates but shouldn’t
- Near-miss rate: % of prompts where it activates only after rephrasing (signals brittle routing)
- Collision rate: % of prompts where a different skill activates (overlap / naming / description conflict)
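These four rates can be computed from labeled eval results. A minimal sketch (the result-record fields are ours, not an SDK format):

```python
def score_suite(results):
    """Compute routing metrics from labeled eval results.

    Each result is a dict like:
      {"expected": "activate" | "reject",
       "outcome":  "activated" | "skipped" | "other-skill",
       "needed_rephrase": bool}
    """
    should = [r for r in results if r["expected"] == "activate"]
    should_not = [r for r in results if r["expected"] == "reject"]

    # Activation rate: correct activations on "should trigger" prompts
    ar = sum(r["outcome"] == "activated" for r in should) / max(len(should), 1)
    # False positive rate: activations on "should NOT trigger" prompts
    fpr = sum(r["outcome"] == "activated" for r in should_not) / max(len(should_not), 1)
    # Near-miss rate: triggered only after rephrasing (brittle routing)
    near_miss = sum(r.get("needed_rephrase", False) for r in should) / max(len(should), 1)
    # Collision rate: a different skill won the routing decision
    collision = sum(r["outcome"] == "other-skill" for r in results) / max(len(results), 1)

    return {"AR": ar, "FPR": fpr, "near_miss": near_miss, "collision": collision}
```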
What to log when a skill should have triggered but didn’t
When you hit a miss, capture:
- The exact user prompt (no “cleaned up” version)
- The top 2–3 skills you expected could match
- Whether the skill was discoverable (listed + not excluded by context budget)
- Whether the skill was eligible (invocation flags / permissions)
- The minimal wording change that makes it trigger (your best clue for description keywords)
This is how you move from “I swear it should work” to a concrete patch.
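A lightweight way to capture misses is one record per incident. A sketch in YAML (the field names are ours, not a Claude Code format):

```yaml
# skill-misses/2026-03-06-digest-miss.yml
prompt: "whats new today for the acme account"      # exact wording, not cleaned up
expected_skills: [daily-digest, weekly-report]      # top candidates you expected
discoverable: true        # listed, not excluded by the context budget
eligible: true            # no invocation flags or permission rules blocking it
fix_wording: "daily digest for the acme account"    # minimal change that triggers
```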
8 authoring patterns that make Claude Code skills trigger reliably
You can write great instructions and still get low activation. These patterns focus on routing and collision avoidance, not prose.
1) Write descriptions as a router (keywords + when/when-not)
A high-performing description answers:
- When to use it (user intents)
- What inputs it expects (files, URLs, repo paths, IDs)
- What it outputs (format contract)
- When NOT to use it (to avoid collisions)
If you only change one thing, change this.
2) Use “trigger phrase clusters” (synonyms, not just one term)
Users don’t say one canonical phrase. If your org says “daily digest” but users say:
- “summary for Slack”
- “what changed today?”
- “give me an update”
- “action items from these notes”
…then your description should include those phrases.
Tip: embed them naturally as part of “Use when …” bullets.
3) Add preconditions to prevent misfires
Preconditions reduce false positives and help activation, because they give Claude crisp boundaries.
Example preconditions:
- “Only run if the request mentions Slack or asks for a message ‘to send’.”
- “Only run if a folder path or file list is provided.”
- “If inputs are missing, ask for them—don’t guess.”
4) Put the tool contract first (schemas beat vibes)
The fastest way to get “activated but wrong behavior” is ambiguous output.
Write:
- A strict output shape
- Examples
- A “do not write to external systems unless…” rule
When a skill is meant to be used in an automation pipeline, treat it like a function:
- Inputs
- Outputs
- Failure modes
- Safe defaults
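Writing that contract as types makes it concrete. A minimal sketch for a digest-style skill (all names are illustrative, not part of any SDK):

```python
from typing import Literal, TypedDict


class DigestInput(TypedDict):
    notes: list[str]                          # pasted notes or file contents
    channel: str                              # target channel, e.g. "#client-updates"
    tone: Literal["client-safe", "internal"]  # tone rule the skill must follow


class DigestOutput(TypedDict):
    bullets: list[str]        # the digest, 5-10 bullets max
    action_items: list[str]   # "owner: task (due)" or "TBD"
    slack_draft: str          # ready to paste; never auto-posted


def safe_default() -> DigestOutput:
    # Failure mode: inputs missing -> empty draft, never a guess
    return {"bullets": [], "action_items": [], "slack_draft": ""}
```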
5) Use negative examples to reduce collisions
Negative examples are underrated.
Add a section like:
- "Do not use this skill for code review summaries (use `/review-pr`)."
- "Do not use this skill for long-form reports (use `/weekly-report`)."
This helps Claude pick between similar skills.
6) Keep skills single-responsibility (and chain them in workflows)
Broad skills are harder to route and harder to debug.
A good rule:
- If your skill needs “Step 0: figure out what the user wants,” it’s probably too broad.
Instead:
- A skill routes / formats / standardizes output
- The workflow orchestrates tool calls, approvals, retries, and side effects
7) Use supporting files to keep SKILL.md lean
Claude Code skills can include additional files in the skill directory—examples, templates, reference docs, scripts.
This is a production move:
- Keep
SKILL.mdas “overview + navigation + core contract” - Move bulky material into
reference.md,examples/,scripts/
It improves maintainability and reduces accidental prompt bloat.
8) Use context: fork + an agent type when isolation improves reliability
If your skill is a task (not just conventions), running it in a forked context can reduce contamination from whatever messy conversation came before.
Pattern:
- `context: fork`
- `agent: Explore` for read-only research tasks
- `agent: Plan` for planning tasks
It’s especially useful when you want consistent outputs for evals.
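As a sketch, the frontmatter for an isolated research skill might look like this (the skill name is hypothetical; the fields are the ones described above):

```yaml
---
name: repo-research
description: Use when the user asks for a read-only survey of the codebase before planning changes.
context: fork     # run in a forked context, isolated from the prior conversation
agent: Explore    # read-only research agent type
---
```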
Skills vs. subagents vs. “slash commands”: when to use which
In modern Claude Code, custom commands and skills have effectively converged: a `.claude/commands/review.md` and a `.claude/skills/review/SKILL.md` can both create `/review`.
So the practical choice is:
| You need… | Use… | Why |
|---|---|---|
| Portable, reusable expertise that Claude can load when relevant | Skill | Persistent procedure/knowledge + model can choose when to apply |
| A self-contained worker with its own context + tool boundaries | Subagent | Isolation, specialization, cost/model routing |
| A human-controlled action with side effects | Manually-invoked skill (`/skill-name`) | You choose timing; safer for deploy/post/write |
A good production architecture is often:
- Skill = routing + formatting + “how we do X here”
- Subagent = bounded executor (researcher, planner, implementer)
- Workflow = orchestrator (state, approvals, retries, integrations)
A minimal eval harness for Claude Code skills (run weekly or in CI)
If you ship skills to a team (or clients), you need regression protection. Otherwise one description tweak can quietly tank activation.
Step 1: build a prompt suite
Start with 20 prompts per skill. Include:
- 10 “should trigger” prompts (realistic)
- 5 “should NOT trigger” prompts (close collisions)
- 5 ambiguous prompts (where the skill should ask clarifying questions)
Store it as YAML so it’s easy to extend:
```yaml
# evals/daily-digest.skill-eval.yml
skill: daily-digest
cases:
  - id: should-trigger-01
    prompt: "Create a Slack-ready daily digest from the docs in ./notes/today/"
    expect: { activate: true }
  - id: should-trigger-02
    prompt: "Summarize these meeting notes into action items and a message I can send to #client-updates"
    expect: { activate: true }
  - id: should-not-trigger-01
    prompt: "Review this PR and write a summary of the diff"
    expect: { activate: false }
  - id: ambiguous-01
    prompt: "Give me an update"
    expect: { activate: "clarify" }
```
Step 2: run the suite and score activation
If you’re using the Claude Agent SDK, you can run a lightweight test runner that:
- Sends each prompt
- Captures whether the Skill tool was used (or whether output matches the skill’s signature format)
- Produces a diffable report
Here’s a practical starting point (treat as a template—SDK event shapes can vary by version):
```python
# tools/run_skill_evals.py
import asyncio

import yaml
from claude_agent_sdk import ClaudeAgentOptions, query


async def run_case(prompt: str, options: ClaudeAgentOptions) -> dict:
    transcript = []
    async for event in query(prompt=prompt, options=options):
        transcript.append(str(event))
    text = "\n".join(transcript)

    # Heuristic 1: did the Skill tool get invoked?
    activated = "tool_name=Skill" in text or "Skill(" in text

    # Heuristic 2 (optional): did output match the skill's expected signature?
    slackish = "## Daily Digest" in text or "Action items" in text

    return {"activated": activated, "signature_match": slackish}


async def main(eval_path: str):
    with open(eval_path) as f:
        suite = yaml.safe_load(f)

    options = ClaudeAgentOptions(
        cwd=".",
        setting_sources=["user", "project"],
        allowed_tools=["Skill", "Read", "Grep", "Glob"],
    )

    results = []
    for case in suite["cases"]:
        r = await run_case(case["prompt"], options)
        results.append({"id": case["id"], **r})

    # Simple scoring
    ar = sum(1 for r in results if r["activated"]) / len(results)
    print(f"Activation rate (rough): {ar:.2%}")
    for r in results:
        print(r)


if __name__ == "__main__":
    asyncio.run(main("evals/daily-digest.skill-eval.yml"))
```
Step 3: set thresholds (and block bad changes)
Minimum viable policy:
- AR must not drop more than 5% week-over-week
- FPR must not exceed 2–3% on “should not trigger” cases
- Any new skill must include at least 10 eval cases before it ships
This is how you avoid “we added a better description and everything broke.”
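This policy is a few lines of code in a CI gate. A sketch (thresholds from the policy above; the metric names are illustrative):

```python
def gate(current: dict, previous: dict,
         max_ar_drop: float = 0.05, max_fpr: float = 0.03) -> list[str]:
    """Return a list of policy violations; an empty list means the change may ship."""
    violations = []
    # AR must not drop more than 5% week-over-week
    if previous["AR"] - current["AR"] > max_ar_drop:
        violations.append(
            f"AR dropped {previous['AR'] - current['AR']:.1%} week-over-week")
    # FPR must stay under the 2-3% ceiling on "should not trigger" cases
    if current["FPR"] > max_fpr:
        violations.append(f"FPR {current['FPR']:.1%} exceeds {max_fpr:.0%}")
    # Every skill ships with at least 10 eval cases
    if current.get("eval_cases", 0) < 10:
        violations.append("fewer than 10 eval cases")
    return violations
```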
A production-ready SKILL.md pattern (activation-oriented)
Below is a strong baseline for a skill intended to format and route a daily digest (low side effects).
```markdown
---
name: daily-digest
argument-hint: "[folder-or-file-list] [optional: slack-channel]"
description: |
  Create a Slack-ready daily digest (bullets + action items) from meeting notes, call notes, or docs.
  Use when the user asks for: "daily digest", "Slack update", "client update", "action items", "what changed today".
  Only use if the request is about summarizing recent notes/docs into a message to send.
  Do NOT use for PR/code summaries or long-form reports.
---

# Daily Digest (Slack-ready)

## Inputs
- A folder path, file list, or pasted notes
- Optional: target Slack channel and tone (client-safe vs internal)

## Output format (strict)
Return **exactly**:
1) **Digest**: 5–10 bullets max
2) **Action items**: owner + due date if present, otherwise "TBD"
3) **Risks/blocks**: if any
4) **Slack message draft**: ready to paste

## Missing inputs
If you do not have a folder path, file list, or notes, ask ONE question to get them.

## Safety
Never post to Slack or write to external systems. Produce drafts only.

## Example triggers
- "Turn these meeting notes into action items + a Slack update"
- "Make a client-safe daily digest from ./notes/2026-03-06"
```
Notice what this does:
- The description contains natural language triggers
- The “do NOT use” clause reduces collisions
- Output has a strict signature (helps evals)
- Side effects are explicitly disallowed
Turning a skill into a reusable “white‑box workflow template” (the nNode bridge)
Skills are great at “how to do a thing.” But the moment you need:
- Integrations (Drive/Slack/HubSpot/monday)
- Idempotency (“don’t post twice”)
- Approval gates
- Retries and partial failures
- Run logs + artifacts
- Versioning
…you’re no longer in “skill land.” You’re in workflow land.
This is where nNode’s approach—white‑box workflows—is useful: you keep the skill as a portable brain module, but you move the operational responsibilities into a workflow you can inspect, reuse, and debug.
A packaging pattern that scales
Skill → Tool contract → Workflow steps → Run artifacts
- Skill defines routing + strict output schema (drafts, tags, structured results)
- Workflow steps do the side effects (fetch docs, write Slack message, create CRM tasks)
- Artifacts/logs store what happened (inputs, intermediate summaries, approvals)
- Versioning prevents “today’s tweak broke last month’s client workflow”
Versioning strategy (simple and effective)
Use semantic versioning for workflow templates:
- `v1.2.0` = backwards-compatible improvement (better summary, new optional field)
- `v2.0.0` = changed output schema or behavior (requires re-approval)
Pin client workflows to a version. Upgrade intentionally.
Mini template: “Client Ops Daily Digest” (agency-friendly)
This is a common agency/internal-ops automation:
Inputs
- Google Drive folder (daily notes)
- Slack channel (client updates)
- CRM pipeline (e.g., HubSpot deals) or project board (e.g., monday.com)
Goal
- Generate a daily digest, get approval, post to Slack, and create follow-up tasks.
Where the Claude Code skill fits
Use the skill for the part that benefits from consistent language and formatting:
- Summarization
- Action item extraction
- Client-safe tone rules
- Standard Slack message shape
Where the workflow fits
Use the workflow for the part that needs operational guarantees:
- Pulling the right docs (and only once)
- Handling missing permissions/scopes
- Preventing duplicates
- Approval gating
- Creating tasks in CRM/PM tool
Example “white-box” workflow outline (conceptual)
```yaml
# nNode workflow template (conceptual)
name: client-ops-daily-digest
version: 1.0.0

inputs:
  drive_folder_id: { type: string }
  slack_channel: { type: string }
  crm_pipeline_id: { type: string, optional: true }

steps:
  - id: fetch_notes
    tool: google_drive.list_files
    with:
      folder_id: "{{inputs.drive_folder_id}}"
      modified_since: "{{today}}"

  - id: summarize
    tool: claude.skill
    with:
      skill: daily-digest
      arguments:
        - "{{steps.fetch_notes.files}}"
        - "{{inputs.slack_channel}}"

  - id: approval
    tool: human.approve
    with:
      summary: "{{steps.summarize.slack_message_draft}}"

  - id: post_slack
    when: "{{steps.approval.approved}}"
    tool: slack.post_message
    with:
      channel: "{{inputs.slack_channel}}"
      text: "{{steps.summarize.slack_message_draft}}"

  - id: create_tasks
    when: "{{steps.approval.approved}}"
    tool: hubspot.create_tasks
    with:
      tasks: "{{steps.summarize.action_items}}"

artifacts:
  - run_log
  - inputs_snapshot
  - fetched_file_list
  - digest_output
  - approval_decision
```
The key is separation of concerns:
- The skill produces a deterministic-ish draft
- The workflow handles side effects, logging, and safety
That’s the difference between a cool demo and something you can run every day for 10 clients.
Common failure modes + debugging checklist
“It didn’t activate”
- Confirm discovery: is it listed, and not excluded by context budget?
- Check description: does it include the user’s words?
- Check collisions: do you have multiple skills with overlapping descriptions?
- Try a “trigger phrase cluster” patch: add 5–10 synonyms users actually say
“It activated, but did the wrong thing”
- Tighten the output schema
- Add explicit preconditions
- Add negative examples
- Reduce scope (split into two skills)
“It activated twice / caused duplicates”
That’s not a skill problem—that’s orchestration.
Fix it in the workflow layer:
- Add idempotency keys (e.g., digest date + folder ID)
- Store “already posted” artifacts
- Make writes conditional on approval + no prior success
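A minimal idempotency sketch for the digest case (storage is a plain set here; in a real workflow it would be a database or artifact store):

```python
import hashlib


def idempotency_key(digest_date: str, folder_id: str) -> str:
    # One digest per folder per day: identical inputs -> identical key
    raw = f"daily-digest:{digest_date}:{folder_id}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


def post_once(key: str, posted: set, post_fn) -> bool:
    """Run the side effect only if this key hasn't succeeded before."""
    if key in posted:
        return False      # duplicate run: skip the write
    post_fn()
    posted.add(key)       # record success only after the write succeeds
    return True
```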
“Claude doesn’t see all my skills”
- Run `/context` and look for excluded-skills warnings
- Prune or consolidate descriptions
- Move bulk reference material into supporting files
- Adjust the skill-description character budget if needed
Conclusion: ship Claude Code skills like software, not like prompts
If you want Claude Code skills to “just work,” you need two upgrades:
- Authoring discipline (routing-first descriptions, collision control, strict output signatures)
- Operational packaging (evals + workflows for side effects, approvals, retries, and observability)
If you’re already building skills and you’re ready to turn them into reusable, client-safe automations, nNode is designed for that “last mile”: packaging skills into white‑box workflows you can inspect, version, and run across your tools.
Explore nNode at nnode.ai when you’re ready to graduate from “it worked in my chat” to “it runs every day.”