1 change: 1 addition & 0 deletions AGENTS.md
@@ -17,6 +17,7 @@ Standard skill layout (recommended):
- `references/` for detailed docs loaded on demand
- `assets/` for templates or data files
- `examples/` for expected outputs
- `evals/` for trigger tests (`triggers.yaml`)
- `tests/` with `run_tests.sh` as entry point
- `action.yml` for GitHub Actions integration

64 changes: 64 additions & 0 deletions SKILL_GUIDE.md
@@ -17,6 +17,7 @@ my-skill/
├── references/ # Optional — detailed documentation
├── assets/ # Optional — templates, images, data files
├── examples/ # Optional — example outputs
├── evals/ # Optional (recommended) — trigger tests
└── tests/ # Optional (recommended) — test suite
```

@@ -277,6 +278,68 @@ Tests are not required by the Agent Skills standard but are strongly recommended

See `skills/repo-audit/tests/` for a reference implementation.

### Description Trigger Testing

The `description` field determines whether an agent activates your Skill. Trigger tests verify that your description matches the right prompts and rejects the wrong ones.

Add an `evals/triggers.yaml` file to your Skill directory:

```yaml
# evals/triggers.yaml
#
# Prompts that SHOULD activate this Skill.
should_trigger:
- "scan this repo for secrets before open-sourcing"
- "audit the codebase for hardcoded API keys"
- "check if there are any credentials committed"
# ... truncated here; a real file needs at least 5 entries

# Prompts that SHOULD NOT activate this Skill.
should_not_trigger:
- "write a unit test for the login function"
- "deploy to production"
- "create a new React component"
# ... truncated here; a real file needs at least 5 entries
```

**Format rules**:

- Both `should_trigger` and `should_not_trigger` are required.
- Minimum 5 entries each. More is better — aim for 8-10.
- Each entry is a plain string representing a realistic user prompt.
- Write prompts in natural language, as a user would actually type them.

See `skills/repo-audit/evals/triggers.yaml` for a reference implementation.
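
The format rules above can be checked mechanically. The following is a minimal sketch, not part of this repository: `validate_triggers` and `MIN_ENTRIES` are hypothetical names, and the function assumes the YAML file has already been parsed into a Python mapping by whatever YAML parser you use.

```python
from collections.abc import Mapping
from typing import Any, List

# Assumption: mirrors the "minimum 5 entries each" format rule.
MIN_ENTRIES = 5
REQUIRED_KEYS = ("should_trigger", "should_not_trigger")


def validate_triggers(data: Any) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(data, Mapping):
        # An empty YAML file parses to None, a bare list parses to a list.
        return ["top level must be a mapping"]

    problems: List[str] = []
    for key in REQUIRED_KEYS:
        if key not in data:
            problems.append(f"missing required key: {key}")
            continue
        entries = data[key]
        if not isinstance(entries, list):
            problems.append(f"{key} must be a list of strings")
            continue
        if len(entries) < MIN_ENTRIES:
            problems.append(
                f"{key} has {len(entries)} entries, minimum is {MIN_ENTRIES}"
            )
        # Each entry must be a plain, non-empty string.
        for index, entry in enumerate(entries):
            if not isinstance(entry, str) or not entry.strip():
                problems.append(f"{key}[{index}] must be a non-empty string")
    return problems


if __name__ == "__main__":
    # Hypothetical parsed content: too few should_trigger entries.
    sample = {
        "should_trigger": ["scan this repo for secrets before open-sourcing"],
        "should_not_trigger": ["deploy to production"] * 5,
    }
    for problem in validate_triggers(sample):
        print(problem)
```

If PyYAML is installed, the mapping could come from `yaml.safe_load(open("evals/triggers.yaml"))`. The validator itself is kept free of third-party imports so it can run inside a plain test script.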

#### Writing Good Trigger Samples

**should_trigger — prompts that must activate your Skill**:

1. **Use realistic user phrasing.** Write prompts the way a user would actually ask, not how a developer would describe the feature.
   - Good: "are there any API keys checked into this repo?"
   - Bad: "execute secret scanning module"

2. **Vary the wording.** Cover synonyms, different phrasings, and levels of specificity.
   - "scan for secrets", "check for leaked credentials", "find hardcoded API keys"

3. **Include indirect requests.** Users don't always name the exact task.
   - "I want to open-source this repo, what should I check first?"

4. **Cover your description's keywords.** If your description says "compliance", include a prompt that mentions compliance.

**should_not_trigger — prompts that must NOT activate your Skill**:

1. **Pick adjacent domains.** Choose prompts from related but distinct areas that a naive keyword match might confuse.
   - For `repo-audit`: "write a security unit test" (security-adjacent but not an audit)

2. **Include common agent tasks.** Generic prompts like "deploy to production" or "write a README" should not trigger specialized Skills.

3. **Test confusing overlaps.** If your Skill mentions "code quality", add a prompt like "refactor this function for readability" — similar concept, wrong Skill.

**Anti-patterns to avoid**:

- Trivially obvious non-matches: "what's the weather?" tells you nothing useful.
- Prompts that copy your description verbatim — real users don't talk that way.
- Too few samples — 2-3 entries won't catch edge cases.
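
Some of the guidance in this section lends itself to a quick automated lint. The sketch below is illustrative only: `lint_samples`, its parameters, and the example description are hypothetical, and the substring checks are crude heuristics rather than how any agent actually matches descriptions. It flags prompts listed on both sides, prompts that copy the description verbatim, and description keywords that no `should_trigger` prompt mentions.

```python
from typing import List, Sequence


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    return " ".join(text.lower().split())


def lint_samples(
    description: str,
    keywords: Sequence[str],
    should_trigger: Sequence[str],
    should_not_trigger: Sequence[str],
) -> List[str]:
    """Return warnings about sample quality; an empty list means nothing was flagged.

    `keywords` is chosen by the Skill author (assumption): the description
    terms that at least one should_trigger prompt ought to mention.
    """
    warnings: List[str] = []
    positives = [_normalize(p) for p in should_trigger]
    negatives = [_normalize(p) for p in should_not_trigger]
    desc = _normalize(description)

    # A prompt on both sides makes the test self-contradictory.
    for prompt in sorted(set(positives) & set(negatives)):
        warnings.append(f"prompt appears in both lists: {prompt!r}")

    # Real users rarely quote the description, so such prompts test little.
    for prompt in positives:
        if prompt and (prompt in desc or desc in prompt):
            warnings.append(f"prompt copies the description verbatim: {prompt!r}")

    # Every keyword the author cares about should show up in some prompt.
    for keyword in keywords:
        needle = _normalize(keyword)
        if not any(needle in prompt for prompt in positives):
            warnings.append(f"no should_trigger prompt mentions keyword: {keyword!r}")

    return warnings


if __name__ == "__main__":
    # Hypothetical description and samples, for illustration only.
    for warning in lint_samples(
        description="Audit a repository for secrets and compliance issues.",
        keywords=["secrets", "compliance"],
        should_trigger=["scan for secrets", "deploy to production"],
        should_not_trigger=["Deploy to production"],
    ):
        print(warning)
```

A lint like this cannot judge whether a negative sample is usefully adjacent or trivially obvious; that still needs a human read of the list.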

### Three Consumption Models

Nexus Skills are designed to work at multiple layers. When writing your SKILL.md, consider all three:
@@ -372,6 +435,7 @@ Avoid abbreviations unless universally understood (`eks`, `ci-cd`, `aws`).
- [ ] Skill added to `.claude-plugin/marketplace.json`
- [ ] Tests exist (if scripts are included)
- [ ] Tests pass locally
- [ ] `evals/triggers.yaml` exists with 5+ should_trigger and 5+ should_not_trigger samples

---

23 changes: 23 additions & 0 deletions SKILL_TEMPLATE.md
@@ -40,3 +40,26 @@ metadata:
## Common Pitfalls

- **Pitfall**: Description and how to avoid it.

---

## Trigger Tests

Create `evals/triggers.yaml` to verify your description triggers correctly:

```yaml
# evals/triggers.yaml
should_trigger:
- "prompt that should activate this skill"
- "another way a user might phrase the request"
- "indirect request that implies this skill"
# ... aim for 5-10 entries

should_not_trigger:
- "prompt from an adjacent domain that should NOT match"
- "common generic request that is not this skill's job"
- "confusingly similar request meant for a different skill"
# ... aim for 5-10 entries
```

See [SKILL_GUIDE.md — Description Trigger Testing](SKILL_GUIDE.md#description-trigger-testing) for writing guidance.
33 changes: 33 additions & 0 deletions skills/agent-launcher/evals/triggers.yaml
@@ -0,0 +1,33 @@
# Trigger tests for agent-launcher
#
# Verifies that the skill's description triggers on relevant prompts
# and does NOT trigger on irrelevant ones.

should_trigger:
- "execute this tasks.yaml with parallel sub-agents"
- "launch the task graph and merge results into an integration branch"
- "run the planned tasks with dependency ordering and isolation"
- "orchestrate agent execution from this task graph"
- "I have a tasks.yaml ready, run it with safety guardrails"
- "execute the implementation run and generate a run report"
- "launch isolated sub-agents for each task in the graph"
- "run this task graph with file-scope boundaries enforced"
- "start the controlled implementation run from the plan"
- "merge task results into a branch after parallel agent execution"
- "the plan is ready, start building everything"
- "kick off the agents to implement these tasks"
- "run all the planned work and give me a report when done"

should_not_trigger:
- "decompose this PRD into domain specs"
- "break down this product spec into domain folders"
- "plan implementation tasks from a spec folder"
- "convert this spec.md into an ordered task plan"
- "create a tasks.yaml from this domain spec"
- "scan the repo for leaked credentials"
- "write a GitHub Actions workflow"
- "set up a CI/CD pipeline"
- "deploy to production with zero downtime"
- "review this code for security issues"
- "run the test suite and report coverage"
- "create a Kubernetes deployment manifest"
30 changes: 30 additions & 0 deletions skills/gha-create/evals/triggers.yaml
@@ -0,0 +1,30 @@
# Trigger tests for gha-create
#
# Verifies that the skill's description triggers on relevant prompts
# and does NOT trigger on irrelevant ones.

should_trigger:
- "create a GitHub Actions workflow for CI"
- "write a deployment pipeline using GitHub Actions"
- "harden our existing GitHub Actions workflow"
- "add SHA-pinned actions to this workflow file"
- "set up a CI workflow with proper permissions and caching"
- "review this GitHub Actions YAML for security issues"
- "generate a release workflow with OIDC authentication"
- "our CI workflow needs concurrency controls and path filtering"
- "create a workflow that runs tests on pull requests"
- "add least-privilege permissions to this GitHub Actions file"

should_not_trigger:
- "set up a Jenkins pipeline"
- "configure CircleCI for this project"
- "deploy to AWS using Terraform"
- "scan this repo for hardcoded secrets"
- "audit the codebase for leaked API keys before open-sourcing"
- "check this repo for compliance issues"
- "write a bash script to automate deployments"
- "create a Dockerfile for this service"
- "set up GitLab CI for our monorepo"
- "configure Kubernetes health checks"
- "write a unit test for the API endpoint"
- "manage SSH keys for server access"
30 changes: 30 additions & 0 deletions skills/prd-decompose/evals/triggers.yaml
@@ -0,0 +1,30 @@
# Trigger tests for prd-decompose
#
# Verifies that the skill's description triggers on relevant prompts
# and does NOT trigger on irrelevant ones.

should_trigger:
- "break down this PRD into domain-specific specs"
- "decompose this product requirements document for our agents"
- "split this technical design doc into frontend, backend, and infra specs"
- "I have a PRD that needs to be broken into work units for AI agents"
- "take this product spec and create domain folders with boundary conditions"
- "convert this requirements document into agent-consumable specs"
- "here's a PRD, separate it by domain — backend, security, devops"
- "parse this design document into self-contained specs per domain"
- "I need to decompose a product spec into separate agent tasks"
- "create domain-scoped work units from this requirements doc"

should_not_trigger:
- "scan this repo for hardcoded secrets"
- "review this pull request for bugs"
- "write a unit test for the API endpoint"
- "create a GitHub Actions CI workflow"
- "convert this spec.md and boundary.yaml into a task plan"
- "I have domain specs ready, plan out the implementation tasks"
- "execute the task graph with parallel agents"
- "launch sub-agents to implement the planned tasks"
- "refactor this module into smaller functions"
- "deploy this service to production"
- "generate API documentation from the codebase"
- "estimate how long this feature will take"
30 changes: 30 additions & 0 deletions skills/repo-audit/evals/triggers.yaml
@@ -0,0 +1,30 @@
# Trigger tests for repo-audit
#
# Verifies that the skill's description triggers on relevant prompts
# and does NOT trigger on irrelevant ones.

should_trigger:
- "scan this repo for secrets before open-sourcing"
- "audit the codebase for hardcoded API keys"
- "check if there are any credentials committed"
- "are there any leaked secrets in this repository?"
- "I want to make this repo public, is it safe?"
- "run a pre-release security checklist on this codebase"
- "find any PII or private keys in the code"
- "check code quality and documentation before publishing"
- "does this repo have any compliance issues?"
- "look for accidentally committed .env files or tokens"

should_not_trigger:
- "write a unit test for the login function"
- "deploy to production"
- "create a new React component"
- "set up a CI/CD pipeline with GitHub Actions"
- "harden this GitHub Actions workflow for security"
- "add security permissions to the CI workflow"
- "refactor this function for readability"
- "write a security-focused unit test"
- "generate a Dockerfile for this project"
- "review this pull request for bugs"
- "create a README for this project"
- "rotate the AWS access keys in our secrets manager"
34 changes: 34 additions & 0 deletions skills/spec-plan/evals/triggers.yaml
@@ -0,0 +1,34 @@
# Trigger tests for spec-plan
#
# Verifies that the skill's description triggers on relevant prompts
# and does NOT trigger on irrelevant ones.

should_trigger:
- "convert this domain spec folder into a tasks.yaml"
- "plan concrete implementation tasks from the spec and boundary files"
- "create an executable task graph with dependency ordering"
- "I have a spec.md and boundary.yaml, generate a task plan"
- "build a tasks.yaml with parallelization strategy from this domain folder"
- "plan the implementation tasks with file-scope isolation"
- "generate a task graph from this decomposed domain spec"
- "turn this domain specification into ordered tasks for AI agents"
- "create a dependency-ordered task plan from the spec folder"
- "produce tasks.yaml with acceptance criteria traceability"
- "I've got the spec folder ready, now plan out the work"
- "what tasks do we need to implement this spec?"
- "break this spec into concrete steps the agents can execute"

should_not_trigger:
- "decompose this PRD into domain-specific folders"
- "break down this product requirements document by domain"
- "split this technical design doc into frontend and backend specs"
- "execute the task graph with sub-agents"
- "launch parallel agents from tasks.yaml"
- "kick off the agents to implement these tasks"
- "scan this repository for secrets"
- "create a CI/CD workflow with GitHub Actions"
- "write unit tests for this service"
- "review this architecture for scalability"
- "deploy the application to staging"
- "generate a project README"
- "estimate effort for this feature"