Agent Skills: Concepts

A deep guide to understanding how Agent Skills work, how agents load them, and how to build great ones.

Contents

  1. What Are Agent Skills?
  2. Anatomy of a Skill
  3. Progressive Disclosure
  4. The Skill Lifecycle
  5. Frontmatter Specification
  6. Writing the Body
  7. Scripts & Executable Code
  8. Descriptions & Activation
  9. Cross-Agent Compatibility
  10. Evaluating Skills
  11. Security Considerations
  12. The Ecosystem

1. What Are Agent Skills?

An Agent Skill is a folder containing a SKILL.md file. That file includes metadata (name and description) and instructions that tell an AI agent how to perform a specific task. Skills can also bundle scripts, templates, and reference materials.

The Agent Skills specification is an open format adopted by 33+ agent products including Claude Code, GitHub Copilot, Cursor, OpenAI Codex, Gemini CLI, VS Code, JetBrains Junie, Roo Code, OpenHands, and more.

Think of skills as portable expertise packages. They give agents domain-specific knowledge they wouldn't have otherwise — your company's API patterns, your team's deployment workflow, your project's edge cases.

Key insight: Skills are not plugins, extensions, or tools. They are instructions — natural language guidance that an agent reads and follows. They extend what the agent knows, not what it can do.

2. Anatomy of a Skill

```
my-skill/
├── SKILL.md          # Required: metadata + instructions
├── scripts/          # Optional: executable code
│   └── validate.sh
├── references/       # Optional: documentation
│   └── REFERENCE.md
└── assets/           # Optional: templates, resources
    └── template.json
```

Only SKILL.md is required. Everything else is optional and loaded on demand.

| Directory | Purpose | Loaded When |
| --- | --- | --- |
| SKILL.md | Metadata + instructions | On activation |
| scripts/ | Executable code the agent runs | When instructions reference them |
| references/ | Additional docs the agent reads | When instructions reference them |
| assets/ | Static resources (templates, images) | When instructions reference them |

Rule of thumb: Keep SKILL.md under 500 lines / 5,000 tokens. Move detailed reference material to separate files in references/ and tell the agent when to load each file.

3. Progressive Disclosure

This is the core architectural principle behind Agent Skills. Every skills-compatible agent follows the same three-tier loading strategy:

Tier 1: Catalog → Tier 2: Instructions → Tier 3: Resources

| Tier | What's Loaded | When | Token Cost |
| --- | --- | --- | --- |
| 1. Catalog | Name + description only | Session startup | ~50-100 per skill |
| 2. Instructions | Full SKILL.md body | When skill is activated | <5,000 recommended |
| 3. Resources | Scripts, references, assets | When instructions reference them | Varies |

Why This Matters

An agent with 20 installed skills doesn't pay the token cost of 20 full instruction sets upfront — only the ones actually used. With Tier 1, the agent knows what skills exist. With Tier 2, it knows how to use the relevant one. With Tier 3, it gets the detailed references only when needed.

Context budget reality: Claude Code caches descriptions up to a 15,000-character budget. OpenAI Codex has 132 system skills that would cost ~100,000 tokens if fully loaded; metadata-only loading costs ~10,000. This is why progressive disclosure isn't optional.
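
The arithmetic behind those numbers can be sketched directly; the per-skill figures below are illustrative estimates taken from the ranges above, not measurements:

```python
# Back-of-the-envelope context cost for 132 skills, using the rough
# per-skill figures quoted above (both values are illustrative).
TOKENS_PER_CATALOG_ENTRY = 75   # midpoint of the ~50-100 range for Tier 1
TOKENS_PER_FULL_SKILL = 750     # assumed average full SKILL.md body
NUM_SKILLS = 132                # Codex's system-skill count from the text

metadata_only = NUM_SKILLS * TOKENS_PER_CATALOG_ENTRY
fully_loaded = NUM_SKILLS * TOKENS_PER_FULL_SKILL

print(f"metadata only: ~{metadata_only:,} tokens")  # roughly the ~10,000 above
print(f"fully loaded:  ~{fully_loaded:,} tokens")   # roughly the ~100,000 above
```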

What This Means for Skill Authors

Write the description for Tier 1: it is everything the agent sees before activation. Keep the SKILL.md body lean for Tier 2, and push detailed reference material into separate files so its cost is only paid at Tier 3.

4. The Skill Lifecycle

Every skill goes through five phases:

Phase 1: Discovery

At session startup, the agent scans skill directories and loads metadata (name + description) for each discovered skill. Agents scan multiple locations in priority order:

| Scope | Example Paths | Priority |
| --- | --- | --- |
| Project | .agents/skills/, .claude/skills/, .cursor/skills/ | Highest |
| User | ~/.agents/skills/, ~/.claude/skills/ | Medium |
| System/Plugin | Agent-bundled skills | Lowest |

Project-level skills override user-level skills with the same name.
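
The scan-and-override behavior might be sketched like this (a simplification: paths and merging details vary by agent, and the system path here is hypothetical):

```python
from pathlib import Path

# Highest-priority scope first; the first SKILL.md found under a given
# skill name wins, so project-level skills shadow user-level ones.
DEFAULT_SCAN_ORDER = [
    Path(".agents/skills"),              # project scope (highest priority)
    Path.home() / ".agents" / "skills",  # user scope
    Path("/opt/agent/skills"),           # system scope (hypothetical path)
]

def discover_skills(scan_order=DEFAULT_SCAN_ORDER):
    catalog = {}
    for root in scan_order:
        if not root.is_dir():
            continue
        for skill_dir in sorted(root.iterdir()):
            manifest = skill_dir / "SKILL.md"
            if manifest.is_file() and skill_dir.name not in catalog:
                catalog[skill_dir.name] = manifest  # earlier scope wins
    return catalog
```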

Phase 2: Catalog Injection

The agent builds a catalog of all discovered skills and injects it into the model's context (system prompt or tool description). The model now knows what skills are available.

Phase 3: Activation

When the model decides a skill is relevant (based on description matching), or the user explicitly invokes it (e.g., /skill-name), the full SKILL.md body is loaded into context.

Two activation mechanisms:

- Model-triggered: the model judges the skill relevant by matching the request against its description.
- Explicit: the user invokes the skill directly (e.g., /skill-name).

Phase 4: Execution

The agent follows the skill's instructions, loading referenced files (scripts, references) on demand. Scripts are executed via the agent's shell tools; references are read into context.

Phase 5: Context Management

Skill content stays in context for the rest of the session. Good implementations protect skill content from context compaction (truncation) to prevent degraded behavior mid-conversation.

5. Frontmatter Specification

Every SKILL.md starts with YAML frontmatter between --- delimiters:

```yaml
---
name: pdf-processing
description: Extract PDF text, fill forms, merge files. Use when handling PDFs.
license: Apache-2.0
compatibility: Requires Python 3.14+ and uv
metadata:
  author: example-org
  version: "1.0"
allowed-tools: Bash(git:*) Read
---
```

| Field | Required | Constraints |
| --- | --- | --- |
| name | Yes | 1-64 chars, lowercase alphanumeric + hyphens. No leading/trailing/consecutive hyphens. Must match parent directory name. |
| description | Yes | 1-1024 chars. Should describe WHAT the skill does and WHEN to use it. |
| license | No | License name or reference to bundled file. |
| compatibility | No | Max 500 chars. Environment requirements (runtime, packages, network). |
| metadata | No | Key-value map (string → string). Custom metadata for extensions. |
| allowed-tools | No | Space-delimited tool list. Experimental; support varies by agent. |

Name Rules
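
The name constraints (1-64 characters, lowercase alphanumeric plus hyphens, no leading, trailing, or consecutive hyphens) reduce to a short regex; here is a sketch of a validator:

```python
import re

# One or more lowercase alphanumeric runs, joined by single hyphens.
NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")

def is_valid_skill_name(name: str) -> bool:
    return 1 <= len(name) <= 64 and bool(NAME_RE.match(name))
```

The spec also requires the name to match the parent directory (my-skill/SKILL.md must declare name: my-skill), which a full validator would check against the filesystem.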

Agent-Specific Extensions

Some agents add their own frontmatter fields beyond the spec:

| Field | Agent | Purpose |
| --- | --- | --- |
| user-invocable | Copilot, VS Code | Show/hide in slash command menu |
| disable-model-invocation | Copilot, Cursor | Manual-only activation |
| argument-hint | Copilot | Hint text for /command [hint] |
| agents/openai.yaml | Codex | UI metadata (display name, icons, color) |

Portability warning: Agent-specific fields are silently ignored by other agents. Copilot ignores underscore fields (use hyphens). Codex's openai.yaml is Codex-only.

6. Writing the Body

The markdown body after the frontmatter contains the skill's instructions. There are no format restrictions — write whatever helps the agent perform the task. But certain patterns consistently produce better results.

Effective Patterns

Gotchas Sections

The highest-value content in many skills. These are concrete, environment-specific facts that defy reasonable assumptions:

```markdown
## Gotchas

- The `users` table uses soft deletes. Always include `WHERE deleted_at IS NULL`.
- User ID is `user_id` in the DB, `uid` in auth, and `accountId` in billing.
- `/health` returns 200 even if DB is down. Use `/ready` for full health.
```

Output Templates

When the agent needs to produce output in a specific format, provide a template. Agents pattern-match well against concrete structures:

```markdown
## Report structure

Use this template:

# [Analysis Title]

## Executive summary
[One-paragraph overview]

## Key findings
- Finding 1 with data
- Finding 2 with data
```

Validation Loops

Instruct the agent to validate its own work before proceeding:

1. Make your edits
2. Run: python scripts/validate.py output/
3. If validation fails, fix and re-run
4. Only proceed when validation passes
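
A matching validate.py might look like the sketch below (hypothetical: the JSON-parse check stands in for whatever your skill actually verifies; the exit code is what the loop above reacts to):

```python
#!/usr/bin/env python3
"""scripts/validate.py (hypothetical): check that every .json file under
a directory parses cleanly. Exit code 0 means pass; 1 means the agent
should fix the output and re-run, matching the loop above."""
import json
import sys
from pathlib import Path

def validate(target: Path) -> list[str]:
    """Return one error line per unparseable JSON file."""
    errors = []
    for path in sorted(target.rglob("*.json")):
        try:
            json.loads(path.read_text())
        except json.JSONDecodeError as exc:
            errors.append(f"{path.name}: {exc}")
    return errors

if __name__ == "__main__" and len(sys.argv) > 1:
    problems = validate(Path(sys.argv[1]))
    print("\n".join(problems) or "OK")
    sys.exit(1 if problems else 0)
```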

Plan-Validate-Execute

For batch or destructive operations, create an intermediate plan, validate it, then execute:

1. Extract form fields: python scripts/analyze.py input.pdf
2. Create field_values.json mapping each field
3. Validate: python scripts/validate.py form_fields.json field_values.json
4. If validation fails, revise and re-validate
5. Fill: python scripts/fill.py input.pdf field_values.json output.pdf

Anti-Patterns

7. Scripts & Executable Code

Scripts in scripts/ let skills perform deterministic operations. The agent executes them via shell tools — they cost zero context tokens (they run, not load).

Self-Contained Scripts

Bundle dependencies inline so scripts run with a single command:

| Language | Mechanism | Runner |
| --- | --- | --- |
| Python | PEP 723 inline metadata | uv run scripts/extract.py |
| TypeScript | Deno npm: imports | deno run scripts/extract.ts |
| JavaScript | Bun auto-install | bun run scripts/extract.js |
| Ruby | bundler/inline | ruby scripts/extract.rb |
| Go | go run with URL | go run package@version |
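
For Python, the PEP 723 mechanism is a specially formatted comment block at the top of the script; uv reads it and builds an isolated environment before running. A minimal skeleton (the extraction logic is a placeholder):

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
"""Self-contained script skeleton. `uv run scripts/extract.py <file>`
reads the PEP 723 block above, builds an isolated environment, and runs
the script; list real packages (e.g. "pypdf") in `dependencies`."""
import sys
from pathlib import Path

def describe(path: Path) -> str:
    # Stand-in for real extraction logic.
    return f"{path.name}: {path.stat().st_size} bytes"

if __name__ == "__main__" and len(sys.argv) > 1:
    print(describe(Path(sys.argv[1])))
```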

Design Rules for Agentic Scripts

8. Descriptions & Activation

The description field carries the entire burden of triggering. If the description doesn't convey when the skill is useful, the agent won't know to reach for it.

Writing Effective Descriptions

```yaml
# Bad
description: Helps with PDFs.

# Good
description: >
  Extract PDF text, fill forms, and merge files.
  Use when handling PDFs or when the user mentions
  documents, forms, or text extraction.
```

Principles

Testing Trigger Accuracy

Create eval queries labeled with should/shouldn't trigger:

```json
[
  {"query": "analyze my sales spreadsheet", "should_trigger": true},
  {"query": "convert JSON to YAML", "should_trigger": false}
]
```

Run each query 3+ times (model behavior is non-deterministic) and compute a trigger rate. Use a 60/40 train/validation split to avoid overfitting the description to your test set.
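
Scoring such an eval set can be sketched as below; the observed values would come from real agent runs, but here they are hard-coded booleans for illustration:

```python
def trigger_accuracy(cases):
    """Fraction of runs where the skill's firing matched the label.
    Each case carries 'should_trigger' and one observation per run."""
    correct = total = 0
    for case in cases:
        for fired in case["observed"]:
            total += 1
            correct += fired == case["should_trigger"]
    return correct / total

cases = [
    {"query": "analyze my sales spreadsheet", "should_trigger": True,
     "observed": [True, True, False]},    # fired on 2 of 3 runs
    {"query": "convert JSON to YAML", "should_trigger": False,
     "observed": [False, False, False]},  # correctly stayed quiet
]
print(f"trigger accuracy: {trigger_accuracy(cases):.2f}")  # 5 of 6 correct
```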

Known issue: Claude Code's auto-triggering becomes unreliable at 40+ loaded skills. This is a fundamental context budget problem: descriptions compete for attention. Keep descriptions focused and use explicit invocation (/skill-name) as a fallback.

9. Cross-Agent Compatibility

The Agent Skills format is adopted by 33+ agents, but each has quirks. Here's what you need to know.

Discovery Paths

| Agent | Primary Path | Also Scans |
| --- | --- | --- |
| Claude Code | .claude/skills/ | .agents/skills/ |
| GitHub Copilot | .github/skills/ | .agents/skills/, .claude/skills/ |
| Cursor | .cursor/skills/ | .agents/skills/, .claude/skills/, .codex/skills/ |
| OpenAI Codex | .codex/skills/ | .agents/skills/ |
| Gemini CLI | .gemini/skills/ | .agents/skills/ |
| Kiro | .kiro/skills/ | |
| Roo Code | .roo/skills/ | .agents/skills/ |

Maximum portability: Place skills in .agents/skills/, the cross-client convention supported by the most agents.

Common Compatibility Issues

| Issue | Affected Agents | Mitigation |
| --- | --- | --- |
| Underscore fields silently ignored | Copilot, VS Code | Always use hyphens: user-invocable, not user_invocable |
| allowed-tools validator bug | VS Code | Known bug (issue #14131). Ignore the warning. |
| Tool names differ | All | Bash (Copilot) vs run_shell_command (Gemini) vs bash_tool (Anthropic) |
| Name collision with built-ins | Claude Code | Check for built-in skills before naming yours |
| No YAML comment support | Some parsers | Avoid complex YAML; test with multiple parsers |

10. Evaluating Skills

A skill that "seems to work" on one prompt may fail on varied inputs. Structured evaluation gives you a feedback loop.

The Eval Loop

  1. Write test cases: prompt + expected output + optional input files
  2. Run with/without skill: each test case runs twice for baseline comparison
  3. Write assertions: verifiable checks after seeing first outputs
  4. Grade: PASS/FAIL with specific evidence for each assertion
  5. Aggregate: compute pass rates, token costs, time deltas
  6. Iterate: fix the skill based on failed assertions + execution transcripts

What Good Assertions Look Like

| Good | Bad |
| --- | --- |
| "Output file is valid JSON" | "Output is good" |
| "Chart has labeled axes" | "Chart looks nice" |
| "Report includes at least 3 recommendations" | "Report is comprehensive" |

Key Metrics

```json
{
  "with_skill":    { "pass_rate": 0.83, "tokens": 3800 },
  "without_skill": { "pass_rate": 0.33, "tokens": 2100 },
  "delta":         { "pass_rate": 0.50, "tokens": 1700 }
}
```

The delta tells you what the skill costs (tokens) and what it buys (pass rate). A +50% pass rate for +1,700 tokens is usually worth it.
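
Aggregating the two runs into that summary is one subtraction per metric; a sketch using the figures above:

```python
def summarize(with_skill, without_skill):
    """Build the with/without/delta report shown above."""
    delta = {
        "pass_rate": round(with_skill["pass_rate"] - without_skill["pass_rate"], 2),
        "tokens": with_skill["tokens"] - without_skill["tokens"],
    }
    return {"with_skill": with_skill, "without_skill": without_skill, "delta": delta}

report = summarize(
    {"pass_rate": 0.83, "tokens": 3800},
    {"pass_rate": 0.33, "tokens": 2100},
)
print(report["delta"])  # {'pass_rate': 0.5, 'tokens': 1700}
```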

11. Security Considerations

Skills are instructions injected into an agent's context. Untrusted skills pose real risks.

Threat Model

| Risk | Description |
| --- | --- |
| Prompt injection | Malicious instructions that hijack agent behavior or extract system prompts |
| Data exfiltration | Scripts that send sensitive data to external servers |
| Credential theft | Accessing environment variables, API keys, or tokens |
| Destructive actions | Scripts that delete files, drop databases, or modify production systems |

Mitigations

Enterprise warning: The current spec has no security model. Skills cannot declare capabilities, sign their contents, or be verified. The @skillscraft extended spec is working to address this with capability declarations, credential requirements, and signed skills.

12. The Ecosystem

Compatible Agents (33+)

Claude Code, GitHub Copilot, Cursor, OpenAI Codex, VS Code, Gemini CLI, JetBrains Junie, Roo Code, OpenHands, Amp, Letta, Goose, Kiro, TRAE, Factory, Firebender, Spring AI, Databricks Genie Code, Snowflake Cortex Code, and more.

Further Reading