PSDN-AI · AndyBoWu · Mar 4, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -17,6 +17,7 @@ Standard skill layout (recommended):
 - `references/` for detailed docs loaded on demand
 - `assets/` for templates or data files
 - `examples/` for expected outputs
+- `evals/` for trigger tests (`triggers.yaml`)
 - `tests/` with `run_tests.sh` as entry point
 - `action.yml` for GitHub Actions integration
 

diff --git a/SKILL_GUIDE.md b/SKILL_GUIDE.md
@@ -17,6 +17,7 @@ my-skill/
 ├── references/           # Optional — detailed documentation
 ├── assets/               # Optional — templates, images, data files
 ├── examples/             # Optional — example outputs
+├── evals/                # Optional (recommended) — trigger tests
 └── tests/                # Optional (recommended) — test suite
 ```
 
@@ -277,6 +278,68 @@ Tests are not required by the Agent Skills standard but are strongly recommended
 
 See `skills/repo-audit/tests/` for a reference implementation.
 
+### Description Trigger Testing
+
+The `description` field determines whether an agent activates your Skill. Trigger tests verify that your description matches the right prompts and rejects the wrong ones.
+
+Add an `evals/triggers.yaml` file to your Skill directory:
+
+```yaml
+# evals/triggers.yaml
+# Abbreviated for illustration: a real file needs at least 5 entries per list.
+# Prompts that SHOULD activate this Skill.
+should_trigger:
+  - "scan this repo for secrets before open-sourcing"
+  - "audit the codebase for hardcoded API keys"
+  - "check if there are any credentials committed"
+
+# Prompts that SHOULD NOT activate this Skill.
+should_not_trigger:
+  - "write a unit test for the login function"
+  - "deploy to production"
+  - "create a new React component"
+```
+
+**Format rules**:
+
+- Both `should_trigger` and `should_not_trigger` are required.
+- Minimum 5 entries each. More is better — aim for 8-10.
+- Each entry is a plain string representing a realistic user prompt.
+- Write prompts in natural language, as a user would actually type them.
+
+See `skills/repo-audit/evals/triggers.yaml` for a reference implementation.
+
+#### Writing Good Trigger Samples
+
+**should_trigger — prompts that must activate your Skill**:
+
+1. **Use realistic user phrasing.** Write prompts the way a user would actually ask, not how a developer would describe the feature.
+   - Good: "are there any API keys checked into this repo?"
+   - Bad: "execute secret scanning module"
+
+2. **Vary the wording.** Cover synonyms, different phrasings, and levels of specificity.
+   - "scan for secrets", "check for leaked credentials", "find hardcoded API keys"
+
+3. **Include indirect requests.** Users don't always name the exact task.
+   - "I want to open-source this repo, what should I check first?"
+
+4. **Cover your description's keywords.** If your description says "compliance", include a prompt that mentions compliance.
+
+**should_not_trigger — prompts that must NOT activate your Skill**:
+
+1. **Pick adjacent domains.** Choose prompts from related but distinct areas that a naive keyword match might confuse.
+   - For `repo-audit`: "write a security unit test" (security-adjacent but not an audit)
+
+2. **Include common agent tasks.** Generic prompts like "deploy to production" or "write a README" should not trigger specialized Skills.
+
+3. **Test confusing overlaps.** If your Skill mentions "code quality", add a prompt like "refactor this function for readability" — similar concept, wrong Skill.
+
+**Anti-patterns to avoid**:
+
+- Trivially obvious non-matches: "what's the weather?" tells you nothing useful.
+- Prompts that copy your description verbatim — real users don't talk that way.
+- Too few samples — 2-3 entries won't catch edge cases.
+
 ### Three Consumption Models
 
 Nexus Skills are designed to work at multiple layers. When writing your SKILL.md, consider all three:
@@ -372,6 +435,7 @@ Avoid abbreviations unless universally understood (`eks`, `ci-cd`, `aws`).
 - [ ] Skill added to `.claude-plugin/marketplace.json`
 - [ ] Tests exist (if scripts are included)
 - [ ] Tests pass locally
+- [ ] `evals/triggers.yaml` exists with 5+ `should_trigger` and 5+ `should_not_trigger` samples
 
 ---
 

diff --git a/SKILL_TEMPLATE.md b/SKILL_TEMPLATE.md
@@ -40,3 +40,26 @@ metadata:
 ## Common Pitfalls
 
 - **Pitfall**: Description and how to avoid it.
+
+---
+
+## Trigger Tests
+
+Create `evals/triggers.yaml` to verify your description triggers correctly:
+
+```yaml
+# evals/triggers.yaml
+should_trigger:
+  - "prompt that should activate this skill"
+  - "another way a user might phrase the request"
+  - "indirect request that implies this skill"
+  # ... at least 5 entries; aim for 8-10
+
+should_not_trigger:
+  - "prompt from an adjacent domain that should NOT match"
+  - "common generic request that is not this skill's job"
+  - "confusingly similar request meant for a different skill"
+  # ... at least 5 entries; aim for 8-10
+```
+
+See [SKILL_GUIDE.md — Description Trigger Testing](SKILL_GUIDE.md#description-trigger-testing) for writing guidance.
diff --git a/skills/agent-launcher/evals/triggers.yaml b/skills/agent-launcher/evals/triggers.yaml
@@ -0,0 +1,33 @@
+# Trigger tests for agent-launcher
+#
+# Verifies that the skill's description triggers on relevant prompts
+# and does NOT trigger on irrelevant ones.
+
+should_trigger:
+  - "execute this tasks.yaml with parallel sub-agents"
+  - "launch the task graph and merge results into an integration branch"
+  - "run the planned tasks with dependency ordering and isolation"
+  - "orchestrate agent execution from this task graph"
+  - "I have a tasks.yaml ready, run it with safety guardrails"
+  - "execute the implementation run and generate a run report"
+  - "launch isolated sub-agents for each task in the graph"
+  - "run this task graph with file-scope boundaries enforced"
+  - "start the controlled implementation run from the plan"
+  - "merge task results into a branch after parallel agent execution"
+  - "the plan is ready, start building everything"
+  - "kick off the agents to implement these tasks"
+  - "run all the planned work and give me a report when done"
+
+should_not_trigger:
+  - "decompose this PRD into domain specs"
+  - "break down this product spec into domain folders"
+  - "plan implementation tasks from a spec folder"
+  - "convert this spec.md into an ordered task plan"
+  - "create a tasks.yaml from this domain spec"
+  - "scan the repo for leaked credentials"
+  - "write a GitHub Actions workflow"
+  - "set up a CI/CD pipeline"
+  - "deploy to production with zero downtime"
+  - "review this code for security issues"
+  - "run the test suite and report coverage"
+  - "create a Kubernetes deployment manifest"
diff --git a/skills/gha-create/evals/triggers.yaml b/skills/gha-create/evals/triggers.yaml
@@ -0,0 +1,30 @@
+# Trigger tests for gha-create
+#
+# Verifies that the skill's description triggers on relevant prompts
+# and does NOT trigger on irrelevant ones.
+
+should_trigger:
+  - "create a GitHub Actions workflow for CI"
+  - "write a deployment pipeline using GitHub Actions"
+  - "harden our existing GitHub Actions workflow"
+  - "add SHA-pinned actions to this workflow file"
+  - "set up a CI workflow with proper permissions and caching"
+  - "review this GitHub Actions YAML for security issues"
+  - "generate a release workflow with OIDC authentication"
+  - "our CI workflow needs concurrency controls and path filtering"
+  - "create a workflow that runs tests on pull requests"
+  - "add least-privilege permissions to this GitHub Actions file"
+
+should_not_trigger:
+  - "set up a Jenkins pipeline"
+  - "configure CircleCI for this project"
+  - "deploy to AWS using Terraform"
+  - "scan this repo for hardcoded secrets"
+  - "audit the codebase for leaked API keys before open-sourcing"
+  - "check this repo for compliance issues"
+  - "write a bash script to automate deployments"
+  - "create a Dockerfile for this service"
+  - "set up GitLab CI for our monorepo"
+  - "configure Kubernetes health checks"
+  - "write a unit test for the API endpoint"
+  - "manage SSH keys for server access"
diff --git a/skills/prd-decompose/evals/triggers.yaml b/skills/prd-decompose/evals/triggers.yaml
@@ -0,0 +1,30 @@
+# Trigger tests for prd-decompose
+#
+# Verifies that the skill's description triggers on relevant prompts
+# and does NOT trigger on irrelevant ones.
+
+should_trigger:
+  - "break down this PRD into domain-specific specs"
+  - "decompose this product requirements document for our agents"
+  - "split this technical design doc into frontend, backend, and infra specs"
+  - "I have a PRD that needs to be broken into work units for AI agents"
+  - "take this product spec and create domain folders with boundary conditions"
+  - "convert this requirements document into agent-consumable specs"
+  - "here's a PRD, separate it by domain — backend, security, devops"
+  - "parse this design document into self-contained specs per domain"
+  - "I need to decompose a product spec into separate agent tasks"
+  - "create domain-scoped work units from this requirements doc"
+
+should_not_trigger:
+  - "scan this repo for hardcoded secrets"
+  - "review this pull request for bugs"
+  - "write a unit test for the API endpoint"
+  - "create a GitHub Actions CI workflow"
+  - "convert this spec.md and boundary.yaml into a task plan"
+  - "I have domain specs ready, plan out the implementation tasks"
+  - "execute the task graph with parallel agents"
+  - "launch sub-agents to implement the planned tasks"
+  - "refactor this module into smaller functions"
+  - "deploy this service to production"
+  - "generate API documentation from the codebase"
+  - "estimate how long this feature will take"
diff --git a/skills/repo-audit/evals/triggers.yaml b/skills/repo-audit/evals/triggers.yaml
@@ -0,0 +1,30 @@
+# Trigger tests for repo-audit
+#
+# Verifies that the skill's description triggers on relevant prompts
+# and does NOT trigger on irrelevant ones.
+
+should_trigger:
+  - "scan this repo for secrets before open-sourcing"
+  - "audit the codebase for hardcoded API keys"
+  - "check if there are any credentials committed"
+  - "are there any leaked secrets in this repository?"
+  - "I want to make this repo public, is it safe?"
+  - "run a pre-release security checklist on this codebase"
+  - "find any PII or private keys in the code"
+  - "check code quality and documentation before publishing"
+  - "does this repo have any compliance issues?"
+  - "look for accidentally committed .env files or tokens"
+
+should_not_trigger:
+  - "write a unit test for the login function"
+  - "deploy to production"
+  - "create a new React component"
+  - "set up a CI/CD pipeline with GitHub Actions"
+  - "harden this GitHub Actions workflow for security"
+  - "add security permissions to the CI workflow"
+  - "refactor this function for readability"
+  - "write a security-focused unit test"
+  - "generate a Dockerfile for this project"
+  - "review this pull request for bugs"
+  - "create a README for this project"
+  - "rotate the AWS access keys in our secrets manager"
diff --git a/skills/spec-plan/evals/triggers.yaml b/skills/spec-plan/evals/triggers.yaml
@@ -0,0 +1,34 @@
+# Trigger tests for spec-plan
+#
+# Verifies that the skill's description triggers on relevant prompts
+# and does NOT trigger on irrelevant ones.
+
+should_trigger:
+  - "convert this domain spec folder into a tasks.yaml"
+  - "plan concrete implementation tasks from the spec and boundary files"
+  - "create an executable task graph with dependency ordering"
+  - "I have a spec.md and boundary.yaml, generate a task plan"
+  - "build a tasks.yaml with parallelization strategy from this domain folder"
+  - "plan the implementation tasks with file-scope isolation"
+  - "generate a task graph from this decomposed domain spec"
+  - "turn this domain specification into ordered tasks for AI agents"
+  - "create a dependency-ordered task plan from the spec folder"
+  - "produce tasks.yaml with acceptance criteria traceability"
+  - "I've got the spec folder ready, now plan out the work"
+  - "what tasks do we need to implement this spec?"
+  - "break this spec into concrete steps the agents can execute"
+
+should_not_trigger:
+  - "decompose this PRD into domain-specific folders"
+  - "break down this product requirements document by domain"
+  - "split this technical design doc into frontend and backend specs"
+  - "execute the task graph with sub-agents"
+  - "launch parallel agents from tasks.yaml"
+  - "kick off the agents to implement these tasks"
+  - "scan this repository for secrets"
+  - "create a CI/CD workflow with GitHub Actions"
+  - "write unit tests for this service"
+  - "review this architecture for scalability"
+  - "deploy the application to staging"
+  - "generate a project README"
+  - "estimate effort for this feature"
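
Review note: the format rules added to SKILL_GUIDE.md (both keys required, at least 5 entries each, every entry a plain string) are mechanical enough to check in CI. The following is a minimal sketch, not part of this PR. It assumes the YAML has already been parsed into a Python mapping by whatever loader the repo uses, so parsing is left out. The names `validate_triggers`, `MIN_ENTRIES`, and `REQUIRED_KEYS` are hypothetical, and the final check (a prompt listed in both lists) is an assumption rather than a documented rule.

```python
from typing import Any

MIN_ENTRIES = 5  # minimum per list, per the format rules in SKILL_GUIDE.md
REQUIRED_KEYS = ("should_trigger", "should_not_trigger")


def _normalized(value: Any) -> set[str]:
    """Lower-cased, stripped string entries of a list; empty set for anything else."""
    if not isinstance(value, list):
        return set()
    return {v.strip().lower() for v in value if isinstance(v, str)}


def validate_triggers(data: Any) -> list[str]:
    """Return the problems found in parsed triggers.yaml content (empty list if none)."""
    if not isinstance(data, dict):
        return ["top level must be a mapping"]

    problems: list[str] = []
    for key in REQUIRED_KEYS:
        entries = data.get(key)
        if not isinstance(entries, list):
            problems.append(f"{key}: missing or not a list")
            continue
        if len(entries) < MIN_ENTRIES:
            problems.append(
                f"{key}: {len(entries)} entries, need at least {MIN_ENTRIES}"
            )
        for i, entry in enumerate(entries):
            if not isinstance(entry, str) or not entry.strip():
                problems.append(f"{key}[{i}]: must be a non-empty string")

    # Assumption, not a documented rule: a prompt listed as both a positive
    # and a negative sample is contradictory, so flag it.
    overlap = _normalized(data.get("should_trigger")) & _normalized(
        data.get("should_not_trigger")
    )
    for prompt in sorted(overlap):
        problems.append(f"prompt appears in both lists: {prompt!r}")

    return problems


# The three-entry example from SKILL_GUIDE.md, as a parsed mapping.
sample = {
    "should_trigger": [
        "scan this repo for secrets before open-sourcing",
        "audit the codebase for hardcoded API keys",
        "check if there are any credentials committed",
    ],
    "should_not_trigger": [
        "write a unit test for the login function",
        "deploy to production",
        "create a new React component",
    ],
}

for problem in validate_triggers(sample):
    print(problem)
# prints:
# should_trigger: 3 entries, need at least 5
# should_not_trigger: 3 entries, need at least 5
```

Returning a list of problems instead of raising on the first one lets a test runner report every violation in a file at once. This only covers the file format; whether a description actually triggers on these prompts still needs an agent or model in the loop.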