test: comprehensive validation layer for skills/agents/commands by alirezarezvani · Pull Request #58 · alirezarezvani/claude-code-tresor

alirezarezvani · 2026-05-19T01:28:16Z

Replaces #57 (closed because the original branch name claude/analyze-test-coverage-B1AOQ didn't match the repo's required feat|fix|test|... prefix pattern). Same commits, same diff.

Why

The old CI quality gate only ran yaml.safe_load against skills/ (8 files) and agents/ (8 symlinks). It silently skipped subagents/ (133 agents), commands/ (24 files), and the 1,142 lines of bash in scripts/install.sh + scripts/update.sh. This PR closes those gaps and fixes the bugs the new checks immediately surfaced.

What's new

Check	Script	Coverage
JSON-schema frontmatter validation	`scripts/validate_frontmatter.py` + `validation/schemas/*.json`	8 skills + 133 agents + 24 commands
Agent/skill cross-reference resolver	`scripts/check_agent_references.py` + baseline file	All `@name` mentions in commands, skills, root docs
Documented-count drift detector	`scripts/check_doc_counts.py`	CLAUDE.md and README.md counts
Install / update smoke test	`scripts/smoke_install.sh`	Fresh install + idempotent re-install + update against local clone
ShellCheck (error severity)	inline in CI	install.sh, update.sh, smoke_install.sh
Markdown internal-link check	`.markdown-link-check.json`	README.md, CLAUDE.md, NAVIGATION.md

Bugs fixed (caught by the new checks)

14 command frontmatter files had unquoted argument-hint: values starting with [, which YAML interpreted as broken arrays. Old CI never validated commands/.
install.sh placed BACKUP_DIR inside CLAUDE_CODE_DIR, causing cp -r to recurse into itself on every re-run. The smoke test caught this on the first try.
README quality badge pointed at docs/VALIDATION-REPORT-CONTENT.md — the file had moved to docs/archive/.
install.sh / update.sh had no env-override for TRESOR_DIR or the repo URL, so they couldn't be tested in isolation. Added env-overridable defaults (no behavior change for real users).

Known baseline

validation/known-unresolved-refs.txt snapshots 26 referenced agent names in orchestration command docs (@auth-security-specialist, @anomaly-detection-specialist, etc.) that don't exist in the 133-agent catalog. These look like aspirational names from when those commands were designed. CI now fails on any new unresolved name; this list can be drained incrementally.

What I deliberately did NOT do

Did not enforce the Read/Write/Edit/Grep/Glob-only tool policy on skills, despite security-auditor declaring Bash. That's a real policy/doc inconsistency — flagged here but out of scope for a testing PR.
Did not fix the 100+ shellcheck SC2155 warnings in the install/update scripts. Severity is error only; the SC2155/SC2115 cleanup is a separate task.
Did not auto-fix the 26 unresolved agent references. Each one needs a human decision (rename docs vs. create the agent).

Test plan

python3 scripts/validate_frontmatter.py all → 165 files, 0 errors
python3 scripts/check_agent_references.py → 26 baselined, 0 new
python3 scripts/check_doc_counts.py → 5 checks pass, 0 drift
shellcheck --severity=error scripts/*.sh → passes
bash scripts/smoke_install.sh → installs 8 skills, 8 agents, 24 commands; idempotent re-install; backup created
markdown-link-check against README.md, CLAUDE.md, NAVIGATION.md → no dead internal links
Lint, Tests, Docs, Security CI check on previous PR (test: comprehensive validation layer for skills/agents/commands #57) → success

Generated by Claude Code

claude · 2026-05-19T01:30:06Z

Code Review — PR #58: Comprehensive Validation Layer

Overall: Strong, well-scoped testing PR that closes real coverage gaps. The Python scripts are clean, the baseline-file pattern for known-unresolved refs is pragmatic, and the BACKUP_DIR bug fix in install.sh is an important correctness improvement. Several bugs and inconsistencies need addressing before merge.

Critical Bugs

1. update.sh — BACKUP_DIR still inside CLAUDE_CODE_DIR (recursive-copy bug not fully fixed)

install.sh was correctly patched:

BACKUP_DIR="${CLAUDE_CODE_DIR}.backup-$(date +%Y%m%d-%H%M%S)"  # sibling ✓

But update.sh still has:

BACKUP_DIR="${CLAUDE_CODE_DIR}/backup-update-$(date +%Y%m%d-%H%M%S)"  # child ✗

The same infinite-recurse-on-cp -r risk applies. Apply the same sibling pattern:

BACKUP_DIR="${CLAUDE_CODE_DIR}.backup-update-$(date +%Y%m%d-%H%M%S)"

2. check_doc_counts.py — "Extended Agents" check compares against wrong counter

CLAUDE.md documents **133 Extended Agents** (i.e., total minus the 8 core). The check maps it to the agents counter, which counts all agent.md files in subagents/ (core included). If there are 141 total agents, the check will always fail with "doc says 133, actual is 141".

Fix: add a dedicated counter and use it:

def count_extended_agents() -> int:
    return count_agents() - count_core_agents()

COUNTERS = {
    ...
    "extended-agents": count_extended_agents,
}

# in CHECKS:
("CLAUDE.md", "extended-agents", r"\*\*(\d+)\s+Extended\s+Agents\*\*", "Extended agents count"),

Schema Issues

3. validation/schemas/skill.schema.json — wrong $id URL

"$id": "https://github.com/alirezarezvani/coderezz/claude-code-tresor/..."

Has a spurious coderezz/ segment. The other two schemas use the correct path:

"$id": "https://github.com/alirezarezvani/claude-code-tresor/..."

Fix the $id in skill.schema.json.

4. skill.schema.json — allowed-tools is string-only, but agent.schema.json accepts both string and array

Per CLAUDE.md: "Skills have limited tool access (Read, Write, Edit, Grep, Glob)". The current schema locks tools to a comma-separated string, which matches existing skill files. However, adding oneOf: [string, array] (matching the agent schema pattern) would future-proof this without breaking anything.

Code Quality Issues

5. check_agent_references.py — indented code fences not handled

The in_code toggle fires on lines that startswith("```") after lstrip(). This will miss code fences that are indented (e.g., inside a blockquote or list item):

- Example:
    ```bash
    @some-agent-name do something
    ```

Agent names in those blocks will be scanned as prose. Low risk today, but worth documenting as a known limitation in the module docstring.

6. smoke_install.sh — update.sh exit-code handling is misleading

# update.sh exits non-zero when there are no updates to pull
if [ "$status" -ne 0 ] && [ "$status" -ne 1 ]; then

Exit code 1 is the generic failure code for shell scripts. Silently accepting it means a real update.sh failure will pass the smoke test. Recommend fixing update.sh to use exit code 2 for "no updates available" (so 0 = updated, 2 = up-to-date, other = error), then tighten the smoke test to only accept 0 and 2.

Documentation Inconsistency

7. CLAUDE.md still uses old command names while GETTING-STARTED.md was updated

GETTING-STARTED.md correctly references the full paths (/development-scaffold, /workflow-review, /testing-test-gen, /documentation-docs-gen). But CLAUDE.md — the canonical reference — still uses the old short names throughout:

Line: /scaffold react-component UserProfile --hooks --tests
Line: /review --scope staged --checks security,performance,configuration
Line: /test-gen --file utils.js --framework jest --coverage 90
Line: /docs-gen api --format openapi --include-examples

Users who read CLAUDE.md first (the majority) will invoke broken commands. Either update CLAUDE.md to match, or document in NAVIGATION.md that the short names are command aliases.

Minor / Suggestions

8. validate_frontmatter.py — max() silently swallows multi-target detail

return max(validate_target(t) for t in TARGETS)

If skills returns 0 and agents returns 1, exit code is 1 ✓. But the generator is consumed lazily — if validation raises an unexpected exception mid-way, earlier errors get dropped. Consider results = [validate_target(t) for t in TARGETS]; return max(results) to force full evaluation.

9. .markdown-link-check.json — anchor links entirely skipped

{ "pattern": "^#" }

This disables all fragment-link checking. Broken cross-section anchors (e.g., [see here](#nonexistent-heading)) won't be caught. Consider splitting into a separate stricter pass against internal .md files where anchors can be verified.

10. Known policy inconsistency should be tracked

The PR description calls out that security-auditor declares Bash in its SKILL.md despite the documented policy of Read/Write/Edit/Grep/Glob only. This should be filed as a GitHub issue before merge so it doesn't get lost. The baseline file is a good pattern for agent-ref drift; a similar explicit tracking mechanism for tool-policy violations would be valuable.

What's Working Well

The baseline file approach for unresolved refs (validation/known-unresolved-refs.txt) is exactly the right tradeoff: CI catches new drift without blocking on historical debt.
validate_frontmatter.py handles YAML-native date types via _coerce() — that's the kind of subtle edge case that would have caused silent failures with the old inline Python.
additionalProperties: false in all three schemas is the right call; it will catch typo'd field names.
ShellCheck at error-only severity is a sensible starting point; the PR description's plan to incrementally tighten severity is sound.
The smoke test's bare-repo workaround for CI's detached-HEAD refs/heads/main issue is solid.

Summary

Severity	Issue	File
🔴 Bug	BACKUP_DIR still inside CLAUDE_CODE_DIR	`scripts/update.sh`
🔴 Bug	Extended-agents count compared to wrong counter	`scripts/check_doc_counts.py`
🟠 Schema	Wrong `$id` URL	`validation/schemas/skill.schema.json`
🟠 Docs	CLAUDE.md uses old command names	`CLAUDE.md`
🟡 Logic	update.sh exit-code 1 silently accepted	`scripts/smoke_install.sh`
🟡 Tracking	security-auditor Bash policy inconsistency untracked	—
🟢 Minor	Indented code fence edge case	`scripts/check_agent_references.py`

Fix the two red bugs and the schema $id before merge — the rest can be addressed as follow-ups.

Review generated by Claude Code on 2026-05-19

claude · 2026-05-19T01:32:22Z

🔒 Security Audit (Claude)

Severity Summary: Critical: 0 | High: 0 | Medium: 0 | Low: 2 | Info: 1

Overall this PR is well-constructed from a security standpoint: yaml.safe_load throughout, regex-sanitized output before GHA annotations, set -euo pipefail in shell scripts, and version-pinned dependencies. No exploitable vulnerabilities found in the diff.

Findings

1. [LOW] `.github/workflows/ci-quality-gate.yml:10-11` – Version-pinned but not hash-pinned pip/npx packages

Risk: CI installs PyYAML==6.0.2, jsonschema==4.23.0, yamllint==1.35.1, check-jsonschema==0.28.4 and runs npx --yes markdown-link-check@3.12.2 with version pins but no integrity hashes. If a PyPI or npm package was re-published at the same version number with malicious code (rare but possible; PyPI does not prevent this for the original author), CI would silently install it and execute it with repo read access.

Recommendation: Add --require-hashes with a requirements.txt lockfile for pip, and use npm ci with a package-lock.json for the Node step. At minimum, npx --yes auto-confirms install prompts — replacing it with a pre-installed package in a package.json devDependency is safer for a CI context.

2. [LOW] `validation/schemas/*.json:2` – Wrong `$id` URIs in all three schemas

Risk: All three schema files contain incorrect $id identifiers:

skill.schema.json: https://github.com/alirezarezvani/coderezz/claude-code-tresor/… (spurious /coderezz/ segment)
agent.schema.json and command.schema.json: https://github.com/coderezz/claude-code-tresor/… (wrong owner — should be alirezarezvani)

JSON Schema $id URIs are used as canonical identifiers for $ref resolution and schema-registry caching. Incorrect URIs won't cause failures with the current local validation setup, but they will break any tooling that attempts to resolve schemas by URI (e.g., IDE schema validators, JSON Schema stores, or future $ref cross-schema references).

Recommendation: Correct all three to https://github.com/alirezarezvani/claude-code-tresor/validation/schemas/<name>.schema.json.

3. [INFO] `scripts/update.sh:17` – BACKUP_DIR placement inconsistency (safe today, maintenance hazard)

Risk: install.sh was fixed in this PR to place BACKUP_DIR as a sibling of CLAUDE_CODE_DIR (${CLAUDE_CODE_DIR}.backup-…) to avoid cp recursion when backing up the full directory tree. update.sh still places BACKUP_DIR inside CLAUDE_CODE_DIR (${CLAUDE_CODE_DIR}/backup-update-…). This is currently safe because update.sh's create_backup() copies only specific subdirectories (commands/, agents/, tresor-resources/) rather than the full parent, so no infinite recursion occurs. However, if a future change to update.sh ever adds a full cp -r "$CLAUDE_CODE_DIR" to match install.sh's approach, the recursion bug will silently reappear.

Recommendation: Align update.sh with the pattern established by this PR's install.sh fix: BACKUP_DIR="${CLAUDE_CODE_DIR}.backup-update-$(date +%Y%m%d-%H%M%S)".

Scope: PR #58 diff only — .github/workflows/ci-quality-gate.yml, scripts/ (install.sh, update.sh, smoke_install.sh, validate_frontmatter.py, check_agent_references.py, check_doc_counts.py), validation/schemas/, .markdown-link-check.json, command frontmatter files
Standards: OWASP Top 10, supply chain hardening (SLSA), GitHub Actions security best practices, LLM agent hardening

PR #58 was opened before PR #59 was created. PR #59 bundled the same 3 test-infrastructure commits along with CI/CD overhaul work, and was merged first. The actual content additions (validate_frontmatter.py, schemas, smoke test, fixes) are now on dev. This empty commit aligns the PR branch with dev so the PR can close cleanly as merged.

alirezarezvani mentioned this pull request May 19, 2026

test: comprehensive validation layer for skills/agents/commands #57

Closed

alirezarezvani mentioned this pull request May 19, 2026

ci: simplify CI/CD with release-please under Git Flow #59

Merged

7 tasks

alirezarezvani marked this pull request as ready for review May 19, 2026 02:03

alirezarezvani force-pushed the test/comprehensive-validation-layer branch from 97f986a to b0db3e0 Compare May 19, 2026 02:08

alirezarezvani force-pushed the test/comprehensive-validation-layer branch from b0db3e0 to 4b64232 Compare May 19, 2026 02:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test: comprehensive validation layer for skills/agents/commands#58

test: comprehensive validation layer for skills/agents/commands#58
alirezarezvani wants to merge 1 commit into
devfrom
test/comprehensive-validation-layer

alirezarezvani commented May 19, 2026

Uh oh!

claude Bot commented May 19, 2026

Uh oh!

claude Bot commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

alirezarezvani commented May 19, 2026

Why

What's new

Bugs fixed (caught by the new checks)

Known baseline

What I deliberately did NOT do

Test plan

Uh oh!

claude Bot commented May 19, 2026

Code Review — PR #58: Comprehensive Validation Layer

Critical Bugs

Schema Issues

Code Quality Issues

Documentation Inconsistency

Minor / Suggestions

What's Working Well

Summary

Uh oh!

claude Bot commented May 19, 2026

🔒 Security Audit (Claude)

Findings

1. [LOW] .github/workflows/ci-quality-gate.yml:10-11 – Version-pinned but not hash-pinned pip/npx packages

2. [LOW] validation/schemas/*.json:2 – Wrong $id URIs in all three schemas

3. [INFO] scripts/update.sh:17 – BACKUP_DIR placement inconsistency (safe today, maintenance hazard)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. [LOW] `.github/workflows/ci-quality-gate.yml:10-11` – Version-pinned but not hash-pinned pip/npx packages

2. [LOW] `validation/schemas/*.json:2` – Wrong `$id` URIs in all three schemas

3. [INFO] `scripts/update.sh:17` – BACKUP_DIR placement inconsistency (safe today, maintenance hazard)