A deep guide to understanding how Agent Skills work, how agents load them, and how to build great ones.
An Agent Skill is a folder containing a SKILL.md file. That file includes metadata (name and description) and instructions that tell an AI agent how to perform a specific task. Skills can also bundle scripts, templates, and reference materials.
The Agent Skills specification is an open format adopted by 33+ agent products including Claude Code, GitHub Copilot, Cursor, OpenAI Codex, Gemini CLI, VS Code, JetBrains Junie, Roo Code, OpenHands, and more.
Think of skills as portable expertise packages. They give agents domain-specific knowledge they wouldn't have otherwise — your company's API patterns, your team's deployment workflow, your project's edge cases.
```text
my-skill/
├── SKILL.md          # Required: metadata + instructions
├── scripts/          # Optional: executable code
│   └── validate.sh
├── references/       # Optional: documentation
│   └── REFERENCE.md
└── assets/           # Optional: templates, resources
    └── template.json
```
Only SKILL.md is required. Everything else is optional and loaded on demand.
| Directory | Purpose | Loaded When |
|---|---|---|
| SKILL.md | Metadata + instructions | On activation |
| scripts/ | Executable code the agent runs | When instructions reference them |
| references/ | Additional docs the agent reads | When instructions reference them |
| assets/ | Static resources (templates, images) | When instructions reference them |
Keep SKILL.md under 500 lines (roughly 5,000 tokens). Move detailed reference material to separate files in references/ and tell the agent when to load each file.
This is the core architectural principle behind Agent Skills. Every skills-compatible agent follows the same three-tier loading strategy:
| Tier | What's Loaded | When | Token Cost |
|---|---|---|---|
| 1. Catalog | Name + description only | Session startup | ~50-100 per skill |
| 2. Instructions | Full SKILL.md body | When skill is activated | <5,000 recommended |
| 3. Resources | Scripts, references, assets | When instructions reference them | Varies |
An agent with 20 installed skills doesn't pay the token cost of 20 full instruction sets upfront — only the ones actually used. With Tier 1, the agent knows what skills exist. With Tier 2, it knows how to use the relevant one. With Tier 3, it gets the detailed references only when needed.
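The Tier 1 catalog step can be sketched in a few lines. This is a hypothetical helper, not any agent's actual implementation; it parses only flat `key: value` frontmatter pairs, where a real agent would use a YAML library:

```python
from pathlib import Path

def read_frontmatter(skill_md: str) -> dict:
    """Extract flat key: value pairs from the frontmatter block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

def build_catalog(skills_root: Path) -> list[dict]:
    """Tier 1: collect only name + description for each installed skill.
    The full SKILL.md body is NOT loaded here."""
    catalog = []
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md.read_text())
        if "name" in meta and "description" in meta:
            catalog.append({"name": meta["name"],
                            "description": meta["description"]})
    return catalog
```

At ~50-100 tokens per catalog entry, this is what keeps 20 installed skills cheap until one is actually activated.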
Point to specific files for specific situations: "Read references/api.md if the API returns a non-200 status" is better than "see references/ for details."

Every skill goes through five phases:
At session startup, the agent scans skill directories and loads metadata (name + description) for each discovered skill. Agents scan multiple locations in priority order:
| Scope | Example Paths | Priority |
|---|---|---|
| Project | .agents/skills/, .claude/skills/, .cursor/skills/ | Highest |
| User | ~/.agents/skills/, ~/.claude/skills/ | Medium |
| System/Plugin | Agent-bundled skills | Lowest |
Project-level skills override user-level skills with the same name.
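A minimal sketch of that precedence rule (the helper name and scope labels are illustrative): scopes are visited highest-priority first, and the first scope to define a name wins.

```python
def resolve_skills(scopes):
    """scopes: list of (scope_label, skill_names) pairs, ordered
    highest priority first (project, then user, then system).
    Returns {skill_name: winning_scope_label}."""
    resolved = {}
    for label, names in scopes:
        for name in names:
            # setdefault keeps the first (highest-priority) definition
            resolved.setdefault(name, label)
    return resolved
```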
The agent builds a catalog of all discovered skills and injects it into the model's context (system prompt or tool description). The model now knows what skills are available.
When the model decides a skill is relevant (based on description matching), or the user explicitly invokes it (e.g., /skill-name), the full SKILL.md body is loaded into context.
Two activation mechanisms:

- Model-invoked: the model activates the skill when its description matches the task at hand.
- User-invoked: /skill-name or $skill-name to force activation.

The agent follows the skill's instructions, loading referenced files (scripts, references) on demand. Scripts are executed via the agent's shell tools; references are read into context.
Skill content stays in context for the rest of the session. Good implementations protect skill content from context compaction (truncation) to prevent degraded behavior mid-conversation.
Every SKILL.md starts with YAML frontmatter between --- delimiters:
```yaml
---
name: pdf-processing
description: Extract PDF text, fill forms, merge files. Use when handling PDFs.
license: Apache-2.0
compatibility: Requires Python 3.14+ and uv
metadata:
  author: example-org
  version: "1.0"
allowed-tools: Bash(git:*) Read
---
```
| Field | Required | Constraints |
|---|---|---|
| name | Yes | 1-64 chars, lowercase alphanumeric + hyphens. No leading/trailing/consecutive hyphens. Must match parent directory name. |
| description | Yes | 1-1024 chars. Should describe WHAT the skill does and WHEN to use it. |
| license | No | License name or reference to bundled file. |
| compatibility | No | Max 500 chars. Environment requirements (runtime, packages, network). |
| metadata | No | Key-value map (string → string). Custom metadata for extensions. |
| allowed-tools | No | Space-delimited tool list. Experimental — support varies by agent. |
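The `name` constraints can be checked with a single regular expression. The helper below is an illustrative sketch, not part of the spec:

```python
import re

# 1-64 chars, lowercase a-z, 0-9, hyphens; no leading, trailing,
# or consecutive hyphens.
NAME_RE = re.compile(r"^(?!-)(?!.*--)[a-z0-9-]{1,64}(?<!-)$")

def valid_skill_name(name: str) -> bool:
    """True if `name` satisfies the spec's naming constraints."""
    return bool(NAME_RE.fullmatch(name))
```

Remember that the name must also match the skill's parent directory name.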
Valid name values use only lowercase letters (a-z), numbers (0-9), and hyphens (-), with no leading, trailing, or consecutive hyphens (--).

Some agents add their own frontmatter fields beyond the spec:
| Field | Agent | Purpose |
|---|---|---|
| user-invocable | Copilot, VS Code | Show/hide in slash command menu |
| disable-model-invocation | Copilot, Cursor | Manual-only activation |
| argument-hint | Copilot | Hint text for /command [hint] |
| agents/openai.yaml | Codex | UI metadata (display name, icons, color) |
The agents/openai.yaml file is Codex-only.
The markdown body after the frontmatter contains the skill's instructions. There are no format restrictions — write whatever helps the agent perform the task. But certain patterns consistently produce better results.
The highest-value content in many skills. These are concrete, environment-specific facts that defy reasonable assumptions:
```markdown
## Gotchas

- The `users` table uses soft deletes. Always include `WHERE deleted_at IS NULL`.
- User ID is `user_id` in the DB, `uid` in auth, and `accountId` in billing.
- `/health` returns 200 even if DB is down. Use `/ready` for full health.
```
When the agent needs to produce output in a specific format, provide a template. Agents pattern-match well against concrete structures:
```markdown
## Report structure

Use this template:

# [Analysis Title]

## Executive summary
[One-paragraph overview]

## Key findings
- Finding 1 with data
- Finding 2 with data
```
Instruct the agent to validate its own work before proceeding:
```markdown
1. Make your edits
2. Run: python scripts/validate.py output/
3. If validation fails, fix and re-run
4. Only proceed when validation passes
```
For batch or destructive operations, create an intermediate plan, validate it, then execute:
```markdown
1. Extract form fields: python scripts/analyze.py input.pdf
2. Create field_values.json mapping each field
3. Validate: python scripts/validate.py form_fields.json field_values.json
4. If validation fails, revise and re-validate
5. Fill: python scripts/fill.py input.pdf field_values.json output.pdf
```
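The validate-then-execute loop above can be sketched generically. The command lists and the `revise` callback are placeholders, not part of any skill's actual API:

```python
import subprocess

def run_with_validation(validate_cmd, execute_cmd, revise, max_attempts=3):
    """Run `validate_cmd` until it exits 0, then run `execute_cmd`.
    On each failure, pass the validator's stderr to `revise` so the
    intermediate plan can be fixed before retrying."""
    for _ in range(max_attempts):
        check = subprocess.run(validate_cmd, capture_output=True, text=True)
        if check.returncode == 0:
            return subprocess.run(execute_cmd).returncode
        revise(check.stderr)  # feed errors back into the revision step
    raise RuntimeError("validation still failing after revisions")
```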
Scripts in scripts/ let skills perform deterministic operations. The agent executes them via shell tools — they cost zero context tokens (they run, not load).
Bundle dependencies inline so scripts run with a single command:
| Language | Mechanism | Runner |
|---|---|---|
| Python | PEP 723 inline metadata | uv run scripts/extract.py |
| TypeScript | Deno npm: imports | deno run scripts/extract.ts |
| JavaScript | Bun auto-install | bun run scripts/extract.ts |
| Ruby | bundler/inline | ruby scripts/extract.rb |
| Go | go run with URL | go run package@version |
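For Python, a PEP 723 inline metadata block looks like the sketch below. The file name, the `pypdf` dependency, and the CLI flags are illustrative; the point is that `uv run` reads the `# /// script` block and installs the declared dependencies automatically:

```python
# /// script
# requires-python = ">=3.10"
# dependencies = ["pypdf"]
# ///
"""Hypothetical scripts/extract.py. Run as: uv run scripts/extract.py doc.pdf
uv reads the inline metadata block above and installs pypdf on first run."""
import argparse

def parse_args(argv=None):
    """Self-documenting CLI: argparse generates --help automatically."""
    parser = argparse.ArgumentParser(description="Extract text from a PDF.")
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument("--dry-run", action="store_true",
                        help="report what would be extracted without writing")
    return parser.parse_args(argv)

# Real extraction (importing pypdf) is omitted from this sketch.
```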
Make scripts self-documenting: support --help — this is how the agent learns the interface. Provide --dry-run for destructive operations.

The description field carries the entire burden of triggering. If the description doesn't convey when the skill is useful, the agent won't know to reach for it.
```yaml
# Bad
description: Helps with PDFs.

# Good
description: >
  Extract PDF text, fill forms, and merge files.
  Use when handling PDFs or when the user mentions
  documents, forms, or text extraction.
```
Create eval queries labeled with should/shouldn't trigger:
```json
[
  {"query": "analyze my sales spreadsheet", "should_trigger": true},
  {"query": "convert JSON to YAML", "should_trigger": false}
]
```
Run each query 3+ times (model behavior is non-deterministic) and compute a trigger rate. Use a 60/40 train/validation split to avoid overfitting the description to your test set.
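Computing the trigger rate from those repeated runs is simple bookkeeping. This sketch assumes you record one `(query, should_trigger, did_trigger)` tuple per run:

```python
from collections import defaultdict

def trigger_rates(runs):
    """runs: (query, should_trigger, did_trigger) tuples collected over
    repeated eval runs. Returns per-query observed trigger rates and
    overall accuracy (fraction of runs where did == should)."""
    per_query = defaultdict(list)
    for query, should, did in runs:
        per_query[query].append(did)
    rates = {q: sum(ds) / len(ds) for q, ds in per_query.items()}
    accuracy = sum(s == d for _, s, d in runs) / len(runs)
    return rates, accuracy
```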
If automatic triggering remains unreliable, document explicit invocation (/skill-name) as a fallback.
The Agent Skills format is adopted by 33+ agents, but each has quirks. Here's what you need to know.
| Agent | Primary Path | Also Scans |
|---|---|---|
| Claude Code | .claude/skills/ | .agents/skills/ |
| GitHub Copilot | .github/skills/ | .agents/skills/, .claude/skills/ |
| Cursor | .cursor/skills/ | .agents/skills/, .claude/skills/, .codex/skills/ |
| OpenAI Codex | .codex/skills/ | .agents/skills/ |
| Gemini CLI | .gemini/skills/ | .agents/skills/ |
| Kiro | .kiro/skills/ | — |
| Roo Code | .roo/skills/ | .agents/skills/ |
When in doubt, ship skills in .agents/skills/ — the cross-client convention supported by the most agents.
| Issue | Affected Agents | Mitigation |
|---|---|---|
| Underscore fields silently ignored | Copilot, VS Code | Always use hyphens: user-invocable not user_invocable |
| allowed-tools validator bug | VS Code | Known bug (issue #14131). Ignore the warning. |
| Tool names differ | All | Bash (Copilot) vs run_shell_command (Gemini) vs bash_tool (Anthropic) |
| Name collision with built-ins | Claude Code | Check for built-in skills before naming yours |
| No YAML comment support | Some parsers | Avoid complex YAML; test with multiple parsers |
A skill that "seems to work" on one prompt may fail on varied inputs. Structured evaluation gives you a feedback loop.
| Good | Bad |
|---|---|
| "Output file is valid JSON" | "Output is good" |
| "Chart has labeled axes" | "Chart looks nice" |
| "Report includes at least 3 recommendations" | "Report is comprehensive" |
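Objective criteria like these can be checked mechanically. A sketch, assuming a report format with a top-level `recommendations` array (that field name is an invention for illustration):

```python
import json

def check_report(text):
    """Mechanical pass/fail checks in the spirit of the table above:
    is the output valid JSON, and does it contain >= 3 recommendations?"""
    checks = {"valid_json": False, "min_3_recommendations": False}
    try:
        report = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return checks
    checks["valid_json"] = True
    recs = report.get("recommendations", []) if isinstance(report, dict) else []
    checks["min_3_recommendations"] = isinstance(recs, list) and len(recs) >= 3
    return checks
```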
Compare runs with and without the skill:

```json
{
  "with_skill": { "pass_rate": 0.83, "tokens": 3800 },
  "without_skill": { "pass_rate": 0.33, "tokens": 2100 },
  "delta": { "pass_rate": "+0.50", "tokens": "+1700" }
}
```
The delta tells you what the skill costs (tokens) and what it buys (pass rate). A +50% pass rate for +1,700 tokens is usually worth it.
Skills are instructions injected into an agent's context. Untrusted skills pose real risks.
| Risk | Description |
|---|---|
| Prompt injection | Malicious instructions that hijack agent behavior or extract system prompts |
| Data exfiltration | Scripts that send sensitive data to external servers |
| Credential theft | Accessing environment variables, API keys, or tokens |
| Destructive actions | Scripts that delete files, drop databases, or modify production systems |
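A rough pre-install triage pass can flag lines matching these risk categories. The patterns below are illustrative heuristics, not a security boundary, and will produce false positives:

```python
import re

# Heuristic patterns loosely matching the risk table; flagged lines
# need manual review — a match is not proof of malice.
SUSPICIOUS = {
    "network call": re.compile(r"curl |wget |requests\.|urllib|fetch\("),
    "credential access": re.compile(r"os\.environ|getenv|KEY|TOKEN|SECRET"),
    "destructive command": re.compile(r"rm\s+-rf|DROP\s+TABLE|shutil\.rmtree",
                                      re.I),
}

def audit_skill_text(text):
    """Return (line_number, category, line) for each suspicious line
    in a skill's SKILL.md or bundled scripts."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for label, pattern in SUSPICIOUS.items():
            if pattern.search(line):
                findings.append((lineno, label, line.strip()))
    return findings
```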
Adopters include Claude Code, GitHub Copilot, Cursor, OpenAI Codex, VS Code, Gemini CLI, JetBrains Junie, Roo Code, OpenHands, Amp, Letta, Goose, Kiro, TRAE, Factory, Firebender, Spring AI, Databricks Genie Code, Snowflake Cortex Code, and more.