A deep guide to understanding how Agent Skills work, how agents load them, and how to build great ones.
An Agent Skill is a folder containing a SKILL.md file. That file includes metadata (name and description) and instructions that tell an AI agent how to perform a specific task. Skills can also bundle scripts, templates, and reference materials.
The Agent Skills specification is an open format adopted by 33+ agent products including Claude Code, GitHub Copilot, Cursor, OpenAI Codex, Gemini CLI, VS Code, JetBrains Junie, Roo Code, OpenHands, and more.
Think of skills as portable expertise packages. They give agents domain-specific knowledge they wouldn't have otherwise — your company's API patterns, your team's deployment workflow, your project's edge cases.
```text
my-skill/
├── SKILL.md          # Required: metadata + instructions
├── scripts/          # Optional: executable code
│   └── validate.sh
├── references/       # Optional: documentation
│   └── REFERENCE.md
└── assets/           # Optional: templates, resources
    └── template.json
```
Only SKILL.md is required. Everything else is optional and loaded on demand.
| Directory | Purpose | Loaded When |
|---|---|---|
| SKILL.md | Metadata + instructions | On activation |
| scripts/ | Executable code the agent runs | When instructions reference them |
| references/ | Additional docs the agent reads | When instructions reference them |
| assets/ | Static resources (templates, images) | When instructions reference them |
Keep SKILL.md under 500 lines (roughly 5,000 tokens). Move detailed reference material to separate files in references/ and tell the agent when to load each file.
This is the core architectural principle behind Agent Skills. Every skills-compatible agent follows the same three-tier loading strategy:
| Tier | What's Loaded | When | Token Cost |
|---|---|---|---|
| 1. Catalog | Name + description only | Session startup | ~50-100 per skill |
| 2. Instructions | Full SKILL.md body | When skill is activated | <5,000 recommended |
| 3. Resources | Scripts, references, assets | When instructions reference them | Varies |
An agent with 20 installed skills doesn't pay the token cost of 20 full instruction sets upfront — only the ones actually used. With Tier 1, the agent knows what skills exist. With Tier 2, it knows how to use the relevant one. With Tier 3, it gets the detailed references only when needed.
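The Tier 1 catalog step can be sketched in a few lines. This is a hypothetical helper, not any agent's actual implementation; it parses only flat `key: value` frontmatter pairs, where a real agent would use a YAML library:

```python
from pathlib import Path

def read_frontmatter(skill_md: str) -> dict:
    """Extract flat key: value pairs from the frontmatter block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

def build_catalog(skills_root: Path) -> list[dict]:
    """Tier 1: collect only name + description for each installed skill.
    The full SKILL.md body is NOT loaded here."""
    catalog = []
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md.read_text())
        if "name" in meta and "description" in meta:
            catalog.append({"name": meta["name"],
                            "description": meta["description"]})
    return catalog
```

At ~50-100 tokens per catalog entry, this is what keeps 20 installed skills cheap until one is actually activated.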
Point to specific files for specific situations: "Read references/api.md if the API returns a non-200 status" is better than "see references/ for details."

Every skill goes through five phases:
At session startup, the agent scans skill directories and loads metadata (name + description) for each discovered skill. Agents scan multiple locations in priority order:
| Scope | Example Paths | Priority |
|---|---|---|
| Project | .agents/skills/, .claude/skills/, .cursor/skills/ | Highest |
| User | ~/.agents/skills/, ~/.claude/skills/ | Medium |
| System/Plugin | Agent-bundled skills | Lowest |
Project-level skills override user-level skills with the same name.
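A minimal sketch of that precedence rule (the helper name and scope labels are illustrative): scopes are visited highest-priority first, and the first scope to define a name wins.

```python
def resolve_skills(scopes):
    """scopes: list of (scope_label, skill_names) pairs, ordered
    highest priority first (project, then user, then system).
    Returns {skill_name: winning_scope_label}."""
    resolved = {}
    for label, names in scopes:
        for name in names:
            # setdefault keeps the first (highest-priority) definition
            resolved.setdefault(name, label)
    return resolved
```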
The agent builds a catalog of all discovered skills and injects it into the model's context (system prompt or tool description). The model now knows what skills are available.
When the model decides a skill is relevant (based on description matching), or the user explicitly invokes it (e.g., /skill-name), the full SKILL.md body is loaded into context.
Two activation mechanisms:

- Model-invoked: the model activates the skill when its description matches the task at hand.
- User-invoked: /skill-name or $skill-name to force activation.

The agent follows the skill's instructions, loading referenced files (scripts, references) on demand. Scripts are executed via the agent's shell tools; references are read into context.
Skill content stays in context for the rest of the session. Good implementations protect skill content from context compaction (truncation) to prevent degraded behavior mid-conversation.
Every SKILL.md starts with YAML frontmatter between --- delimiters:
```yaml
---
name: pdf-processing
description: Extract PDF text, fill forms, merge files. Use when handling PDFs.
license: Apache-2.0
compatibility: Requires Python 3.14+ and uv
metadata:
  author: example-org
  version: "1.0"
allowed-tools: Bash(git:*) Read
---
```
| Field | Required | Constraints |
|---|---|---|
| name | Yes | 1-64 chars, lowercase alphanumeric + hyphens. No leading/trailing/consecutive hyphens. Must match parent directory name. |
| description | Yes | 1-1024 chars. Should describe WHAT the skill does and WHEN to use it. |
| license | No | License name or reference to bundled file. |
| compatibility | No | Max 500 chars. Environment requirements (runtime, packages, network). |
| metadata | No | Key-value map (string → string). Custom metadata for extensions. |
| allowed-tools | No | Space-delimited tool list. Experimental — support varies by agent. |
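The `name` constraints can be checked with a single regular expression. The helper below is an illustrative sketch, not part of the spec:

```python
import re

# 1-64 chars, lowercase a-z, 0-9, hyphens; no leading, trailing,
# or consecutive hyphens.
NAME_RE = re.compile(r"^(?!-)(?!.*--)[a-z0-9-]{1,64}(?<!-)$")

def valid_skill_name(name: str) -> bool:
    """True if `name` satisfies the spec's naming constraints."""
    return bool(NAME_RE.fullmatch(name))
```

Remember that the name must also match the skill's parent directory name.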
Valid name values use only lowercase letters (a-z), numbers (0-9), and hyphens (-), with no leading, trailing, or consecutive hyphens (--).

Some agents add their own frontmatter fields beyond the spec:
| Field | Agent | Purpose |
|---|---|---|
| user-invocable | Copilot, VS Code | Show/hide in slash command menu |
| disable-model-invocation | Copilot, Cursor | Manual-only activation |
| argument-hint | Copilot | Hint text for /command [hint] |
| agents/openai.yaml | Codex | UI metadata (display name, icons, color) |
The agents/openai.yaml file is Codex-only.
The markdown body after the frontmatter contains the skill's instructions. There are no format restrictions — write whatever helps the agent perform the task. But certain patterns consistently produce better results.
The highest-value content in many skills. These are concrete, environment-specific facts that defy reasonable assumptions:
```markdown
## Gotchas

- The `users` table uses soft deletes. Always include `WHERE deleted_at IS NULL`.
- User ID is `user_id` in the DB, `uid` in auth, and `accountId` in billing.
- `/health` returns 200 even if DB is down. Use `/ready` for full health.
```
When the agent needs to produce output in a specific format, provide a template. Agents pattern-match well against concrete structures:
```markdown
## Report structure

Use this template:

# [Analysis Title]

## Executive summary
[One-paragraph overview]

## Key findings
- Finding 1 with data
- Finding 2 with data
```
Instruct the agent to validate its own work before proceeding:
```markdown
1. Make your edits
2. Run: python scripts/validate.py output/
3. If validation fails, fix and re-run
4. Only proceed when validation passes
```
For batch or destructive operations, create an intermediate plan, validate it, then execute:
```markdown
1. Extract form fields: python scripts/analyze.py input.pdf
2. Create field_values.json mapping each field
3. Validate: python scripts/validate.py form_fields.json field_values.json
4. If validation fails, revise and re-validate
5. Fill: python scripts/fill.py input.pdf field_values.json output.pdf
```
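The validate-then-execute loop above can be sketched generically. The command lists and the `revise` callback are placeholders, not part of any skill's actual API:

```python
import subprocess

def run_with_validation(validate_cmd, execute_cmd, revise, max_attempts=3):
    """Run `validate_cmd` until it exits 0, then run `execute_cmd`.
    On each failure, pass the validator's stderr to `revise` so the
    intermediate plan can be fixed before retrying."""
    for _ in range(max_attempts):
        check = subprocess.run(validate_cmd, capture_output=True, text=True)
        if check.returncode == 0:
            return subprocess.run(execute_cmd).returncode
        revise(check.stderr)  # feed errors back into the revision step
    raise RuntimeError("validation still failing after revisions")
```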
Scripts in scripts/ let skills perform deterministic operations. The agent executes them via shell tools — they cost zero context tokens (they run, not load).
Bundle dependencies inline so scripts run with a single command:
| Language | Mechanism | Runner |
|---|---|---|
| Python | PEP 723 inline metadata | uv run scripts/extract.py |
| TypeScript | Deno npm: imports | deno run scripts/extract.ts |
| JavaScript | Bun auto-install | bun run scripts/extract.ts |
| Ruby | bundler/inline | ruby scripts/extract.rb |
| Go | go run with URL | go run package@version |
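For Python, a PEP 723 inline metadata block looks like the sketch below. The file name, the `pypdf` dependency, and the CLI flags are illustrative; the point is that `uv run` reads the `# /// script` block and installs the declared dependencies automatically:

```python
# /// script
# requires-python = ">=3.10"
# dependencies = ["pypdf"]
# ///
"""Hypothetical scripts/extract.py. Run as: uv run scripts/extract.py doc.pdf
uv reads the inline metadata block above and installs pypdf on first run."""
import argparse

def parse_args(argv=None):
    """Self-documenting CLI: argparse generates --help automatically."""
    parser = argparse.ArgumentParser(description="Extract text from a PDF.")
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument("--dry-run", action="store_true",
                        help="report what would be extracted without writing")
    return parser.parse_args(argv)

# Real extraction (importing pypdf) is omitted from this sketch.
```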
Make scripts self-documenting: support --help — this is how the agent learns the interface. Provide --dry-run for destructive operations.

The description field carries the entire burden of triggering. If the description doesn't convey when the skill is useful, the agent won't know to reach for it.
```yaml
# Bad
description: Helps with PDFs.

# Good
description: >
  Extract PDF text, fill forms, and merge files.
  Use when handling PDFs or when the user mentions
  documents, forms, or text extraction.
```
Create eval queries labeled with should/shouldn't trigger:
```json
[
  {"query": "analyze my sales spreadsheet", "should_trigger": true},
  {"query": "convert JSON to YAML", "should_trigger": false}
]
```
Run each query 3+ times (model behavior is non-deterministic) and compute a trigger rate. Use a 60/40 train/validation split to avoid overfitting the description to your test set.
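Computing the trigger rate from those repeated runs is simple bookkeeping. This sketch assumes you record one `(query, should_trigger, did_trigger)` tuple per run:

```python
from collections import defaultdict

def trigger_rates(runs):
    """runs: (query, should_trigger, did_trigger) tuples collected over
    repeated eval runs. Returns per-query observed trigger rates and
    overall accuracy (fraction of runs where did == should)."""
    per_query = defaultdict(list)
    for query, should, did in runs:
        per_query[query].append(did)
    rates = {q: sum(ds) / len(ds) for q, ds in per_query.items()}
    accuracy = sum(s == d for _, s, d in runs) / len(runs)
    return rates, accuracy
```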
If automatic triggering remains unreliable, document explicit invocation (/skill-name) as a fallback.
The Agent Skills format is adopted by 33+ agents, but each has quirks. Here's what you need to know.
| Agent | Primary Path | Also Scans |
|---|---|---|
| Claude Code | .claude/skills/ | .agents/skills/ |
| GitHub Copilot | .github/skills/ | .agents/skills/, .claude/skills/ |
| Cursor | .cursor/skills/ | .agents/skills/, .claude/skills/, .codex/skills/ |
| OpenAI Codex | .codex/skills/ | .agents/skills/ |
| Gemini CLI | .gemini/skills/ | .agents/skills/ |
| Kiro | .kiro/skills/ | — |
| Roo Code | .roo/skills/ | .agents/skills/ |
When in doubt, ship skills in .agents/skills/ — the cross-client convention supported by the most agents.
| Issue | Affected Agents | Mitigation |
|---|---|---|
| Underscore fields silently ignored | Copilot, VS Code | Always use hyphens: user-invocable not user_invocable |
| allowed-tools validator bug | VS Code | Known bug (issue #14131). Ignore the warning. |
| Tool names differ | All | Bash (Copilot) vs run_shell_command (Gemini) vs bash_tool (Anthropic) |
| Name collision with built-ins | Claude Code | Check for built-in skills before naming yours |
| No YAML comment support | Some parsers | Avoid complex YAML; test with multiple parsers |
A skill that "seems to work" on one prompt may fail on varied inputs. Structured evaluation gives you a feedback loop.
| Good | Bad |
|---|---|
| "Output file is valid JSON" | "Output is good" |
| "Chart has labeled axes" | "Chart looks nice" |
| "Report includes at least 3 recommendations" | "Report is comprehensive" |
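Objective criteria like these can be checked mechanically. A sketch, assuming a report format with a top-level `recommendations` array (that field name is an invention for illustration):

```python
import json

def check_report(text):
    """Mechanical pass/fail checks in the spirit of the table above:
    is the output valid JSON, and does it contain >= 3 recommendations?"""
    checks = {"valid_json": False, "min_3_recommendations": False}
    try:
        report = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return checks
    checks["valid_json"] = True
    recs = report.get("recommendations", []) if isinstance(report, dict) else []
    checks["min_3_recommendations"] = isinstance(recs, list) and len(recs) >= 3
    return checks
```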
Compare runs with and without the skill:

```json
{
  "with_skill": { "pass_rate": 0.83, "tokens": 3800 },
  "without_skill": { "pass_rate": 0.33, "tokens": 2100 },
  "delta": { "pass_rate": "+0.50", "tokens": "+1700" }
}
```
The delta tells you what the skill costs (tokens) and what it buys (pass rate). A +50% pass rate for +1,700 tokens is usually worth it.
Skills are instructions injected into an agent's context. Untrusted skills pose real risks.
| Risk | Description |
|---|---|
| Prompt injection | Malicious instructions that hijack agent behavior or extract system prompts |
| Data exfiltration | Scripts that send sensitive data to external servers |
| Credential theft | Accessing environment variables, API keys, or tokens |
| Destructive actions | Scripts that delete files, drop databases, or modify production systems |
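A rough pre-install triage pass can flag lines matching these risk categories. The patterns below are illustrative heuristics, not a security boundary, and will produce false positives:

```python
import re

# Heuristic patterns loosely matching the risk table; flagged lines
# need manual review — a match is not proof of malice.
SUSPICIOUS = {
    "network call": re.compile(r"curl |wget |requests\.|urllib|fetch\("),
    "credential access": re.compile(r"os\.environ|getenv|KEY|TOKEN|SECRET"),
    "destructive command": re.compile(r"rm\s+-rf|DROP\s+TABLE|shutil\.rmtree",
                                      re.I),
}

def audit_skill_text(text):
    """Return (line_number, category, line) for each suspicious line
    in a skill's SKILL.md or bundled scripts."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for label, pattern in SUSPICIOUS.items():
            if pattern.search(line):
                findings.append((lineno, label, line.strip()))
    return findings
```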
Adopters include Claude Code, GitHub Copilot, Cursor, OpenAI Codex, VS Code, Gemini CLI, JetBrains Junie, Roo Code, OpenHands, Amp, Letta, Goose, Kiro, TRAE, Factory, Firebender, Spring AI, Databricks Genie Code, Snowflake Cortex Code, and more.