Skip to content

test: comprehensive validation layer for skills/agents/commands#58

Open
alirezarezvani wants to merge 1 commit into
devfrom
test/comprehensive-validation-layer
Open

test: comprehensive validation layer for skills/agents/commands#58
alirezarezvani wants to merge 1 commit into
devfrom
test/comprehensive-validation-layer

Conversation

@alirezarezvani

Copy link
Copy Markdown
Owner

Replaces #57 (closed because the original branch name claude/analyze-test-coverage-B1AOQ didn't match the repo's required feat|fix|test|... prefix pattern). Same commits, same diff.

Why

The old CI quality gate only ran yaml.safe_load against skills/ (8 files) and agents/ (8 symlinks). It silently skipped subagents/ (133 agents), commands/ (24 files), and the 1,142 lines of bash in scripts/install.sh + scripts/update.sh. This PR closes those gaps and fixes the bugs the new checks immediately surfaced.

What's new

Check Script Coverage
JSON-schema frontmatter validation scripts/validate_frontmatter.py + validation/schemas/*.json 8 skills + 133 agents + 24 commands
Agent/skill cross-reference resolver scripts/check_agent_references.py + baseline file All @name mentions in commands, skills, root docs
Documented-count drift detector scripts/check_doc_counts.py CLAUDE.md and README.md counts
Install / update smoke test scripts/smoke_install.sh Fresh install + idempotent re-install + update against local clone
ShellCheck (error severity) inline in CI install.sh, update.sh, smoke_install.sh
Markdown internal-link check .markdown-link-check.json README.md, CLAUDE.md, NAVIGATION.md

Bugs fixed (caught by the new checks)

  1. 14 command frontmatter files had unquoted argument-hint: values starting with [, which YAML interpreted as broken arrays. Old CI never validated commands/.
  2. install.sh placed BACKUP_DIR inside CLAUDE_CODE_DIR, causing cp -r to recurse into itself on every re-run. The smoke test caught this on the first try.
  3. README quality badge pointed at docs/VALIDATION-REPORT-CONTENT.md — the file had moved to docs/archive/.
  4. install.sh / update.sh had no env-override for TRESOR_DIR or the repo URL, so they couldn't be tested in isolation. Added env-overridable defaults (no behavior change for real users).

Known baseline

validation/known-unresolved-refs.txt snapshots 26 referenced agent names in orchestration command docs (@auth-security-specialist, @anomaly-detection-specialist, etc.) that don't exist in the 133-agent catalog. These look like aspirational names from when those commands were designed. CI now fails on any new unresolved name; this list can be drained incrementally.

What I deliberately did NOT do

  • Did not enforce the Read/Write/Edit/Grep/Glob-only tool policy on skills, despite security-auditor declaring Bash. That's a real policy/doc inconsistency — flagged here but out of scope for a testing PR.
  • Did not fix the 100+ shellcheck SC2155 warnings in the install/update scripts. Severity is error only; the SC2155/SC2115 cleanup is a separate task.
  • Did not auto-fix the 26 unresolved agent references. Each one needs a human decision (rename docs vs. create the agent).

Test plan

  • python3 scripts/validate_frontmatter.py all → 165 files, 0 errors
  • python3 scripts/check_agent_references.py → 26 baselined, 0 new
  • python3 scripts/check_doc_counts.py → 5 checks pass, 0 drift
  • shellcheck --severity=error scripts/*.sh → passes
  • bash scripts/smoke_install.sh → installs 8 skills, 8 agents, 24 commands; idempotent re-install; backup created
  • markdown-link-check against README.md, CLAUDE.md, NAVIGATION.md → no dead internal links
  • Lint, Tests, Docs, Security CI check on previous PR (test: comprehensive validation layer for skills/agents/commands #57) → success

Generated by Claude Code

@claude

claude Bot commented May 19, 2026

Copy link
Copy Markdown

Code Review — PR #58: Comprehensive Validation Layer

Overall: Strong, well-scoped testing PR that closes real coverage gaps. The Python scripts are clean, the baseline-file pattern for known-unresolved refs is pragmatic, and the BACKUP_DIR bug fix in install.sh is an important correctness improvement. Several bugs and inconsistencies need addressing before merge.


Critical Bugs

1. update.sh — BACKUP_DIR still inside CLAUDE_CODE_DIR (recursive-copy bug not fully fixed)

install.sh was correctly patched:

BACKUP_DIR="${CLAUDE_CODE_DIR}.backup-$(date +%Y%m%d-%H%M%S)"  # sibling ✓

But update.sh still has:

BACKUP_DIR="${CLAUDE_CODE_DIR}/backup-update-$(date +%Y%m%d-%H%M%S)"  # child ✗

The same infinite-recurse-on-cp -r risk applies. Apply the same sibling pattern:

BACKUP_DIR="${CLAUDE_CODE_DIR}.backup-update-$(date +%Y%m%d-%H%M%S)"

2. check_doc_counts.py — "Extended Agents" check compares against wrong counter

CLAUDE.md documents **133 Extended Agents** (i.e., total minus the 8 core). The check maps it to the agents counter, which counts all agent.md files in subagents/ (core included). If there are 141 total agents, the check will always fail with "doc says 133, actual is 141".

Fix: add a dedicated counter and use it:

def count_extended_agents() -> int:
    return count_agents() - count_core_agents()

COUNTERS = {
    ...
    "extended-agents": count_extended_agents,
}

# in CHECKS:
("CLAUDE.md", "extended-agents", r"\*\*(\d+)\s+Extended\s+Agents\*\*", "Extended agents count"),

Schema Issues

3. validation/schemas/skill.schema.json — wrong $id URL

"$id": "https://github.com/alirezarezvani/coderezz/claude-code-tresor/..."

Has a spurious coderezz/ segment. The other two schemas use the correct path:

"$id": "https://github.com/alirezarezvani/claude-code-tresor/..."

Fix the $id in skill.schema.json.

4. skill.schema.jsonallowed-tools is string-only, but agent.schema.json accepts both string and array

Per CLAUDE.md: "Skills have limited tool access (Read, Write, Edit, Grep, Glob)". The current schema locks tools to a comma-separated string, which matches existing skill files. However, adding oneOf: [string, array] (matching the agent schema pattern) would future-proof this without breaking anything.


Code Quality Issues

5. check_agent_references.py — indented code fences not handled

The in_code toggle fires on lines that startswith("```") after lstrip(). This will miss code fences that are indented (e.g., inside a blockquote or list item):

- Example:
    ```bash
    @some-agent-name do something
    ```

Agent names in those blocks will be scanned as prose. Low risk today, but worth documenting as a known limitation in the module docstring.

6. smoke_install.shupdate.sh exit-code handling is misleading

# update.sh exits non-zero when there are no updates to pull
if [ "$status" -ne 0 ] && [ "$status" -ne 1 ]; then

Exit code 1 is the generic failure code for shell scripts. Silently accepting it means a real update.sh failure will pass the smoke test. Recommend fixing update.sh to use exit code 2 for "no updates available" (so 0 = updated, 2 = up-to-date, other = error), then tighten the smoke test to only accept 0 and 2.


Documentation Inconsistency

7. CLAUDE.md still uses old command names while GETTING-STARTED.md was updated

GETTING-STARTED.md correctly references the full paths (/development-scaffold, /workflow-review, /testing-test-gen, /documentation-docs-gen). But CLAUDE.md — the canonical reference — still uses the old short names throughout:

  • Line: /scaffold react-component UserProfile --hooks --tests
  • Line: /review --scope staged --checks security,performance,configuration
  • Line: /test-gen --file utils.js --framework jest --coverage 90
  • Line: /docs-gen api --format openapi --include-examples

Users who read CLAUDE.md first (the majority) will invoke broken commands. Either update CLAUDE.md to match, or document in NAVIGATION.md that the short names are command aliases.


Minor / Suggestions

8. validate_frontmatter.pymax() silently swallows multi-target detail

return max(validate_target(t) for t in TARGETS)

If skills returns 0 and agents returns 1, exit code is 1 ✓. But the generator is consumed lazily — if validation raises an unexpected exception mid-way, earlier errors get dropped. Consider results = [validate_target(t) for t in TARGETS]; return max(results) to force full evaluation.

9. .markdown-link-check.json — anchor links entirely skipped

{ "pattern": "^#" }

This disables all fragment-link checking. Broken cross-section anchors (e.g., [see here](#nonexistent-heading)) won't be caught. Consider splitting into a separate stricter pass against internal .md files where anchors can be verified.

10. Known policy inconsistency should be tracked

The PR description calls out that security-auditor declares Bash in its SKILL.md despite the documented policy of Read/Write/Edit/Grep/Glob only. This should be filed as a GitHub issue before merge so it doesn't get lost. The baseline file is a good pattern for agent-ref drift; a similar explicit tracking mechanism for tool-policy violations would be valuable.


What's Working Well

  • The baseline file approach for unresolved refs (validation/known-unresolved-refs.txt) is exactly the right tradeoff: CI catches new drift without blocking on historical debt.
  • validate_frontmatter.py handles YAML-native date types via _coerce() — that's the kind of subtle edge case that would have caused silent failures with the old inline Python.
  • additionalProperties: false in all three schemas is the right call; it will catch typo'd field names.
  • ShellCheck at error-only severity is a sensible starting point; the PR description's plan to incrementally tighten severity is sound.
  • The smoke test's bare-repo workaround for CI's detached-HEAD refs/heads/main issue is solid.

Summary

Severity Issue File
🔴 Bug BACKUP_DIR still inside CLAUDE_CODE_DIR scripts/update.sh
🔴 Bug Extended-agents count compared to wrong counter scripts/check_doc_counts.py
🟠 Schema Wrong $id URL validation/schemas/skill.schema.json
🟠 Docs CLAUDE.md uses old command names CLAUDE.md
🟡 Logic update.sh exit-code 1 silently accepted scripts/smoke_install.sh
🟡 Tracking security-auditor Bash policy inconsistency untracked
🟢 Minor Indented code fence edge case scripts/check_agent_references.py

Fix the two red bugs and the schema $id before merge — the rest can be addressed as follow-ups.


Review generated by Claude Code on 2026-05-19

@claude

claude Bot commented May 19, 2026

Copy link
Copy Markdown

🔒 Security Audit (Claude)

Severity Summary: Critical: 0 | High: 0 | Medium: 0 | Low: 2 | Info: 1

Overall this PR is well-constructed from a security standpoint: yaml.safe_load throughout, regex-sanitized output before GHA annotations, set -euo pipefail in shell scripts, and version-pinned dependencies. No exploitable vulnerabilities found in the diff.


Findings

1. [LOW] .github/workflows/ci-quality-gate.yml:10-11 – Version-pinned but not hash-pinned pip/npx packages

Risk: CI installs PyYAML==6.0.2, jsonschema==4.23.0, yamllint==1.35.1, check-jsonschema==0.28.4 and runs npx --yes markdown-link-check@3.12.2 with version pins but no integrity hashes. If a PyPI or npm package was re-published at the same version number with malicious code (rare but possible; PyPI does not prevent this for the original author), CI would silently install it and execute it with repo read access.

Recommendation: Add --require-hashes with a requirements.txt lockfile for pip, and use npm ci with a package-lock.json for the Node step. At minimum, npx --yes auto-confirms install prompts — replacing it with a pre-installed package in a package.json devDependency is safer for a CI context.


2. [LOW] validation/schemas/*.json:2 – Wrong $id URIs in all three schemas

Risk: All three schema files contain incorrect $id identifiers:

  • skill.schema.json: https://github.com/alirezarezvani/coderezz/claude-code-tresor/… (spurious /coderezz/ segment)
  • agent.schema.json and command.schema.json: https://github.com/coderezz/claude-code-tresor/… (wrong owner — should be alirezarezvani)

JSON Schema $id URIs are used as canonical identifiers for $ref resolution and schema-registry caching. Incorrect URIs won't cause failures with the current local validation setup, but they will break any tooling that attempts to resolve schemas by URI (e.g., IDE schema validators, JSON Schema stores, or future $ref cross-schema references).

Recommendation: Correct all three to https://github.com/alirezarezvani/claude-code-tresor/validation/schemas/<name>.schema.json.


3. [INFO] scripts/update.sh:17 – BACKUP_DIR placement inconsistency (safe today, maintenance hazard)

Risk: install.sh was fixed in this PR to place BACKUP_DIR as a sibling of CLAUDE_CODE_DIR (${CLAUDE_CODE_DIR}.backup-…) to avoid cp recursion when backing up the full directory tree. update.sh still places BACKUP_DIR inside CLAUDE_CODE_DIR (${CLAUDE_CODE_DIR}/backup-update-…). This is currently safe because update.sh's create_backup() copies only specific subdirectories (commands/, agents/, tresor-resources/) rather than the full parent, so no infinite recursion occurs. However, if a future change to update.sh ever adds a full cp -r "$CLAUDE_CODE_DIR" to match install.sh's approach, the recursion bug will silently reappear.

Recommendation: Align update.sh with the pattern established by this PR's install.sh fix: BACKUP_DIR="${CLAUDE_CODE_DIR}.backup-update-$(date +%Y%m%d-%H%M%S)".


Scope: PR #58 diff only — .github/workflows/ci-quality-gate.yml, scripts/ (install.sh, update.sh, smoke_install.sh, validate_frontmatter.py, check_agent_references.py, check_doc_counts.py), validation/schemas/, .markdown-link-check.json, command frontmatter files
Standards: OWASP Top 10, supply chain hardening (SLSA), GitHub Actions security best practices, LLM agent hardening

@alirezarezvani alirezarezvani marked this pull request as ready for review May 19, 2026 02:03
@alirezarezvani alirezarezvani force-pushed the test/comprehensive-validation-layer branch from 97f986a to b0db3e0 Compare May 19, 2026 02:08
PR #58 was opened before PR #59 was created. PR #59 bundled the same
3 test-infrastructure commits along with CI/CD overhaul work, and was
merged first. The actual content additions (validate_frontmatter.py,
schemas, smoke test, fixes) are now on dev. This empty commit aligns
the PR branch with dev so the PR can close cleanly as merged.
@alirezarezvani alirezarezvani force-pushed the test/comprehensive-validation-layer branch from b0db3e0 to 4b64232 Compare May 19, 2026 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants