fix: support src-layout Python projects in test_coverage detector by AreboursTLS · Pull Request #489 · peteromallet/desloppify

AreboursTLS · 2026-03-19T07:05:58Z

Problem

In Python projects using src/ layout (where production code lives under src/package_name/), the test_coverage detector fails to link test files to source files.

resolve_import_spec converts import specs like mypackage.foo to mypackage/foo.py, but production files are stored as src/mypackage/foo.py. The candidates never match.

_build_prod_by_module creates module aliases with a src. prefix (e.g. src.mypackage.foo) that import specs from tests never include.

This causes all modules in src-layout projects to be flagged as untested_critical even when comprehensive test suites exist that import them.
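
The mismatch can be reproduced in a few lines. This is a minimal sketch, not the detector's actual code: `candidates_for` is a hypothetical stand-in for the path conversion described above, and the `__init__.py` candidate is an assumption added for illustration.

```python
def candidates_for(spec: str) -> list[str]:
    """Turn a dotted import spec into candidate file paths (hypothetical helper)."""
    base = spec.replace(".", "/")
    # Assumption: a module may be a plain file or a package __init__.
    return [f"{base}.py", f"{base}/__init__.py"]


# Production files as a src-layout project would list them.
production_files = {"src/mypackage/foo.py", "src/mypackage/__init__.py"}

# No candidate matches, because every production path carries the src/ prefix.
matches = [c for c in candidates_for("mypackage.foo") if c in production_files]
print(matches)
```

With no match, the test that imports mypackage.foo is never linked to src/mypackage/foo.py, which is how a tested module ends up flagged as untested.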

Fix

resolve_import_spec (languages/python/test_coverage.py): After checking direct candidates against production_files, also try src/-prefixed candidates.
_build_prod_by_module (engine/detectors/coverage/mapping_analysis.py): Strip src/ prefix from relative paths before computing module names, so the index maps argos_toolkit.foo instead of src.argos_toolkit.foo.
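
The first change can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code from the PR: `resolve_spec` is a hypothetical simplification of resolve_import_spec, and `SRC_PREFIXES` with the single value `"src/"` is assumed (the thread names a `_SRC_PREFIXES` constant but does not show its contents).

```python
from typing import Optional, Set

# Assumed contents; the PR discussion only names a _SRC_PREFIXES constant.
SRC_PREFIXES = ("src/",)


def resolve_spec(spec: str, production_files: Set[str]) -> Optional[str]:
    """Return the production file an import spec points at, or None (illustrative)."""
    base = spec.replace(".", "/")
    direct = [f"{base}.py", f"{base}/__init__.py"]

    # First pass: flat layout, where paths match the module path directly.
    for candidate in direct:
        if candidate in production_files:
            return candidate

    # Second pass: src layout, where the same paths live under a prefix.
    for prefix in SRC_PREFIXES:
        for candidate in direct:
            prefixed = prefix + candidate
            if prefixed in production_files:
                return prefixed

    return None


print(resolve_spec("mypackage.foo", {"src/mypackage/foo.py"}))
```

Trying the direct candidates first keeps behavior unchanged for flat-layout projects; the prefixed lookup only runs when the direct one finds nothing.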

Testing

All 5495 existing tests pass.
Verified against a real src-layout project (argos-toolkit): test health detection went from broken (0%) to correctly detecting 22.0% coverage.

resolve_import_spec now tries src/-prefixed candidates when direct module-path candidates don't match any production file. This handles the common src-layout pattern (e.g. src/argos_toolkit/foo.py). _build_prod_by_module now strips the 'src/' prefix from relative paths before computing module names, so the module index maps 'argos_toolkit.foo' instead of 'src.argos_toolkit.foo'.
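
The module-index side of the change might look like the sketch below. It assumes relative paths use forward slashes; `module_name` is a hypothetical helper for illustration, not the body of _build_prod_by_module.

```python
def module_name(rel_path: str) -> str:
    """Compute a dotted module name from a relative path (illustrative only)."""
    # Strip the src/ prefix so the name matches what tests actually import.
    if rel_path.startswith("src/"):
        rel_path = rel_path[len("src/"):]
    # A package __init__.py maps to the package name itself.
    if rel_path.endswith("/__init__.py"):
        rel_path = rel_path[: -len("/__init__.py")]
    elif rel_path.endswith(".py"):
        rel_path = rel_path[: -len(".py")]
    return rel_path.replace("/", ".")


paths = ["src/argos_toolkit/foo.py", "src/argos_toolkit/__init__.py"]
prod_by_module = {module_name(p): p for p in paths}
print(sorted(prod_by_module))
```

The index keys are now `argos_toolkit` and `argos_toolkit.foo`, which are the names a test's import statement would use.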

peteromallet · 2026-03-21T00:07:58Z

Thanks for the fix, @AreboursTLS! The src-layout gap is real: projects with a src/package_name/ structure were getting missed by the coverage mapping.

Cherry-picked both changes into 0.9.12 as commit 36805343:

resolve_import_spec now tries src/-prefixed candidates when direct match fails
_build_prod_by_module strips the src/ prefix before computing module names

One minor adjustment: moved _SRC_PREFIXES to module level as a constant (was function-local in the PR).

All tests pass. Thanks for the clean, targeted fix!

@AreboursTLS

resolve_import_spec now tries src/-prefixed candidates when direct match fails, and _build_prod_by_module strips the src/ prefix from relative paths before computing module names. Both changes are needed so that src-layout projects correctly map tests to production files.

Adjustments: moved _SRC_PREFIXES to module-level constant (was function-local)

Cherry-picked from PR #489 by @AreboursTLS

Co-Authored-By: AreboursTLS <AreboursTLS@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ues #494-#490 Stage 1 assessments, Stage 2 challenges/advocacy, and Stage 3 execution for the current batch of open PRs and issues. Backfilled Stage 2 files for older items that only had Stage 1 results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

peteromallet · 2026-05-13T16:24:37Z

Thanks for the contribution. This is included in v1.0: https://github.com/peteromallet/desloppify/releases/tag/v1.0

Record co-author trailers for PR authors included in the v1.0 release cycle so GitHub can associate release-cycle contribution credit with the tag.

Refs: #189, #484, #485, #486, #489, #493, #495, #529, #539, #573, #580, #581, #584, #585, #589, #602, #603

Co-authored-by: R. Desmond <134018026+0-CYBERDYNE-SYSTEMS-0@users.noreply.github.com>
Co-authored-by: AreboursTLS <77301936+AreboursTLS@users.noreply.github.com>
Co-authored-by: AugusteBalas <128148269+AugusteBalas@users.noreply.github.com>
Co-authored-by: Alex Price <2804025+awprice@users.noreply.github.com>
Co-authored-by: Klaus Agnoletti <24544601+klausagnoletti@users.noreply.github.com>
Co-authored-by: Koshi <18751916+koshimazaki@users.noreply.github.com>
Co-authored-by: Pietro <6080662+pietrondo@users.noreply.github.com>
Co-authored-by: raveinid <7130195+raveinid@users.noreply.github.com>
Co-authored-by: Ryan Gerstenkorn <4079939+RyanJarv@users.noreply.github.com>
Co-authored-by: ryexLLC <217349586+ryexLLC@users.noreply.github.com>
Co-authored-by: Maximilian Scholz <6530123+sims1253@users.noreply.github.com>
Co-authored-by: Tristan Manchester <108270628+tristanmanchester@users.noreply.github.com>

* desloppify: unify triage policy and error contracts * desloppify: streamline holistic review prep * desloppify: finish migration seam cleanup * desloppify: reset language runtime state coherently * desloppify: clarify coverage mapping and predicates * desloppify: simplify triage validation seams * desloppify: preserve review queue and observe evidence * desloppify: clarify compatibility naming seams * desloppify: reorganize framework and holistic package seams * desloppify: narrow public language framework surface * desloppify: simplify stale smell workflow seams * desloppify: streamline reporting and planning helpers * desloppify: strengthen review coverage seams * desloppify: tighten review and queue type contracts * fix: skip Go generated files and improve import-run error messages Add Go zone rules for generated file patterns (.generated., _gen.go, .pb.go, _string.go, _enumer.go) and config files (go.mod, go.sum). Improve --import-run error to explain what's missing and suggest next steps. Closes #402, addresses #401. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: restore language registry after reset test to prevent pollution test_reset_runtime_state_clears_registry_and_hooks called reset_runtime_state() without saving/restoring the registry, causing 5 downstream tests (erlang, ocaml, fsharp, javascript, bash) to fail when the language plugins couldn't re-register (already imported). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: delete 23 compat wrapper files and remove SimpleNamespace antipatterns Per project policy ("no backward compat for import paths — remove re-export facades, wrapper shims, compat layers"), delete all thin wrapper files left behind by recent package reorganizations: - 13 _framework/ wrappers (commands_base, generic, registry_state, etc.) 
- 8 context_holistic/ wrappers (budget_*, selection_contexts) - 2 helpers/ wrappers (runtime.py, persist.py) Replace SimpleNamespace fake-module pattern in override_misc.py and commit_log/dispatch.py with direct function calls. Update 27 import sites and 4 test monkeypatch targets to use canonical paths. All 5193 tests pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * desloppify: remove scan workflow lazy reconcile import * desloppify: retire triage facade hot path * desloppify: surface suggestion/evidence in show and cluster commands Add suggestion and evidence fields to show command output and cluster member display so triage stages can investigate issues without JSON exports. Add investigation command hints to compact summaries (self_record mode only), forward observe assessments to sense-check, and update enrich/sense-check instructions to reference the new data paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * desloppify: split organize validation and batch context normalization * feat: normalize C++ security tool findings * desloppify: unify issue lifecycle status and surface history to reviewers Add deferred/triaged_out to Status enum so state is always authoritative for issue disposition. Previously temporary and triaged_out skips left issue.status as "open", causing overcounting and misleading displays. 
Part A — Status unification: - Add DEFERRED, TRIAGED_OUT to Status enum and _CANONICAL_ISSUE_STATUSES - Update FAILURE_STATUSES_BY_MODE (all 3 scoring modes) - Map temporary skip → deferred, triaged_out skip → triaged_out in state - Backlog/unskip reopen deferred/triaged_out back to open - Triage dismiss sets state status to triaged_out - Reconcile migrates existing open+skipped → correct status on scan - Treat deferred/triaged_out as alive in reconcile (not superseded) - Add status icons (⏸ deferred, △ triaged_out) - Update plan header and summary_lines for new status buckets Part B — Surface history to reviewers: - Default --retrospective to True (--no-retrospective to opt out) - Rewrite render_historical_focus with status grouping and CLI hints - Add render_dimension_deferral_context for stale dimension warnings - Wire both into batch and external review prompt paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: harden C++ security normalization fallback and deduplication * fix: repair C++ coverage logic regex * fix: harden C++ tool-backed security scanning * desloppify: two-phase observe/judge review scoring with evolving characteristics Split holistic review into Phase 1 (observe: collect characteristics and defects) and Phase 2 (judge: synthesize dimension_character, then score). Positive observations now persist as context insights with positive: true and full provenance (added_at, source), replacing ephemeral strengths. judgment.strengths is backfilled from positive insights after import. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: batch cppcheck issue scanning for C++ * Fix scan export and scoring impact crash paths * fix: surface blind-review workflow in first anti-gaming penalty message The anti-gaming safeguard for subjective dimensions is working as designed, but agents weren't discovering the blind-review workflow because: 1. 
The first penalty message said only "Re-review objectively" with no pointer to the blind packet or agent overlay docs 2. The blind packet hint only appeared after repeated penalties (streak >= 2) 3. SKILL.md's anti-gaming note didn't reference the overlay docs Now the first penalty immediately surfaces: - The blind packet path (.desloppify/review_packet_blind.json) - Pointers to docs/CLAUDE.md and docs/HERMES.md for the full workflow https://claude.ai/code/session_01RqpTiawULymfeVXW8X8ySq * fix: redact numeric target from penalty messages to prevent anchoring The penalty message "matched target 95.0" leaks the exact target score to the agent. Even with a blind initial review, the agent infers the target from penalty output and anchors on every subsequent re-review, creating an unbreakable loop that burned through an entire Claude Max session in 60 minutes. Changes: - Replace "matched target {N}" with "clustered on the scoring target" - Replace "parked on target {N}" with "parked on the scoring target" - Redact target label from summary integrity warnings - Strengthen re-review instruction: "Launch a fresh, context-isolated agent" instead of "Re-review objectively" The numeric target remains available via `desloppify show subjective` for human operators who need it. https://claude.ai/code/session_01RqpTiawULymfeVXW8X8ySq * fix: rust workspace rustdoc execution * docs: add C++ full-scan requirements * fix: close Windows and C++ review blockers * chore: drop local C++ planning docs from PR * code health: broad cleanup and triage/review improvements Refactors triage validation, stage prompts, review batch scoring, and display layout. Adds evidence parsing enhancements, confirmation helpers, and execution constraints. Cleans up unused imports and dead code across tests and production modules. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address remaining C++ PR review issues * chore: gitignore .claude/ and remove tracked lock file Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: resolve lint, mypy, and stale test expectations - Add missing SubjectiveVisibility import (F821) - Add missing Any import in runner_parallel/types.py (F821) - Fix union type annotation in core_normalize.py (mypy return-value) - Remove stale generic.py from mypy files list - Update review batch test expectations after scoring simplification Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: restore postflight scan marker after triage skip commands Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version to 0.9.6 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version to 0.9.7 and add tweet-on-release workflow Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: gate PyPI publish on main branch only Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version to 0.9.8 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: restore push triggers on PyPI publish workflow The CI contract test expects push.tags and push.branches triggers. Gate still blocks releases not targeting main. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * desloppify: update triage and issue semantics * chore: update local changes * feat: queue ownership, cluster semantics, lifecycle phase improvements Adds cluster_semantics module, refines issue semantics, updates work queue snapshot and plan ordering for phase isolation. Extends lifecycle phase handling and auto-cluster sync. Updates tests across plan, state, review, and narrative modules. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: sequential reconciliation pipeline, execution status flags, plan loading consolidation Fix cluster tracker race on parallel updates by introducing a shared boundary-triggered reconciliation pipeline that runs all sync steps sequentially. Add execution_status (active/review) flags to clusters, consolidate plan load/recovery into persistence module, and rename reconcile modules for clarity. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: unblock objective resolves while triage is pending * feat: auto-resolve stale issues for deleted files, triage/queue/reconcile improvements - Fix #412: scan merge now auto-resolves open issues when the source file no longer exists on disk (verify_disappeared + MergeScanOptions.project_root) - Triage: sense-check orchestration, completion policy, validation stage upgrades - Work queue: snapshot overhaul, synthetic workflow, ranking refinements - Plan: reconcile pipeline expansion, refresh lifecycle consolidation, phase cleanup support, scan issue reconcile enhancements - Review: holistic cluster modules removed (inlined), import plan sync expanded - Rust: fixer/detector cleanup, remove compat re-export wrapper - Tests: broad coverage additions across triage, reconcile, queue, holistic Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: accept substantive work-product descriptions in triage attestations Attestation validation for organize/enrich/sense-check stages no longer requires literal cluster name references — detailed descriptions of the verified work are now accepted as an alternative. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: reorganize cluster/override into subpackages, triage validation improvements, review batch enhancements Move cluster_ops/update modules into cluster/ subpackage and override modules into override/ subpackage. Enhance triage completion policy, stage validation, and review batch execution phases. 
Add dynamic loaders, scan preflight checks, and expanded test coverage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: update scorecard image Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update renamed test reference in Makefile and CI contract Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: improve cxx detector scoping (PR #415) - Filter C/C++ security findings to scoped first-party file set - CMake-based test coverage mapping via add_executable/add_library/target_sources - Disable unsound generic unused-import phase for C++ - Fix _extract_import_name for C++ header extensions (.h, .hh, .hpp) - Remove duplicate test, add missing EOF newlines Co-Authored-By: Dragoy <Dragoy@users.noreply.github.com> * docs: add .desloppify/ gitignore reminder to setup instructions Closes #416. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version to 0.9.9, update scorecard Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: stub requests module in tweet release tests The tests load tweet_release.py which imports requests at module level. Without a stub, all 7 tests fail with ModuleNotFoundError since requests is a CI script dependency, not a project dependency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * ci: retrigger checks after branch protection update * fix: simplify PyPI publish to trigger on push to main only Removed redundant release and tag triggers — push to main with a bumped version in pyproject.toml is sufficient. The "check if version exists" step makes this idempotent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: remove test_release_image.png and AGENTS.md from repo These files were accidentally tracked — test_release_image.png is a test artifact and AGENTS.md is a local Claude Code skill file. Added AGENTS.md to .gitignore to prevent re-adding. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add Rust inline-test filtering: ignore clippy diagnostics in cfg(test) modules (#440) * feat(r): improve tree-sitter R_SPEC function and import queries (#449) - Add anonymous function detection (function definitions passed as arguments to calls like lapply, purrr::map, etc.) - Add namespace operator (pkg::fn) capture in import query so dplyr::select, data.table::fread etc. are recognized as imports - Both patterns capture @path for compatibility with the dep graph and unused imports analysis * feat(ruby): improve plugin — excludes, detect markers, default_src, spec/ support, README, tests (#462) * feat(ruby): improve plugin — excludes, detect markers, default_src, README, tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Co-Authored-By: Gemini <gemini@google.com> Co-Authored-By: OpenAI Codex <codex@openai.com> * feat(ruby): add spec/ test dir, bin/ exclusion; expose external_test_dirs in generic_lang - Add external_test_dirs and test_file_extensions parameters to generic_lang() so plugins can override the hardcoded ["tests", "test"] defaults - Configure Ruby plugin with external_test_dirs=["spec", "test"] (RSpec + Minitest) - Add bin/ to Ruby exclusions (binstubs/shims) - Update tests: add bin/ to excluded dirs list, add test_external_test_dirs_includes_spec Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Co-Authored-By: Gemini <gemini@google.com> Co-Authored-By: OpenAI Codex <codex@openai.com> * docs(ruby): add bin/ to exclusions list in README Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Co-Authored-By: Gemini <gemini@google.com> Co-Authored-By: OpenAI Codex <codex@openai.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Gemini <gemini@google.com> Co-authored-by: OpenAI Codex <codex@openai.com> * feat: add Factory Droid skill harness support (#451) - Add 'droid' to SKILL_TARGETS (.factory/skills/desloppify/SKILL.md) - 
Add .factory/skills/ to SKILL_SEARCH_PATHS for auto-discovery - Create docs/DROID.md overlay with review and triage workflow - Bump SKILL_VERSION to 6 - Add droid to README agent prompt harness list * docs(python): add user-facing section to README (#459) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(javascript): add plugin tests and documentation (#458) Co-authored-by: Gemini <gemini@google.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * 0.9.10 (#463) * fix: strip image blocks from release notes on website The release notes contain a mascot image that renders as a broken or unwanted image on the website. Strip HTML <p><img></p> blocks from release body before rendering. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: derive project root from state_file path in do_import_run follow-up scan When running `desloppify review --import-run --scan-after-import`, the follow-up scan was using _runtime_project_root() which could return a contaminated path (pointing to the results directory instead of the actual project root). This caused state to be written to the wrong location. Instead, derive the project root from the state_file parameter which is known to be correct: state_file.parent.parent gives us the project root from `<root>/.desloppify/state-<lang>.json`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: preserve plan_start_scores during force-rescan to protect manual clusters _reset_cycle_for_force_rescan() was clearing plan_start_scores, which made is_mid_cycle() return False. This caused auto_cluster_issues() to run full cluster regeneration instead of early-returning, wiping manual cluster items via issue ID reconciliation in scan_issue_reconcile.py. The fix stops clearing plan_start_scores so is_mid_cycle() remains True during force-rescan, preserving manual cluster data. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: require explicit triage decisions for auto-clusters Auto-clusters (auto/unused, auto/security, etc.) were silently left in backlog because the triage prompt said "silence means leave in backlog" and the output schema had no field for auto-cluster decisions. Now the triager must make an explicit promote/skip/break_up decision for each auto-cluster, and apply_triage_to_plan() processes those decisions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: require explicit backlog decisions for auto-clusters in staged triage The staged triage pipeline previously treated auto-clusters as optional in the reflect stage ("silence means it stays in backlog"). This change makes auto-cluster decisions mandatory, matching the treatment review issues get via the Coverage Ledger. Changes: - Reflect instructions: require a ## Backlog Decisions section listing every auto-cluster with promote/skip/supersede (replaces "silence means leave") - Organize instructions: clarify that ALL backlog decisions from reflect must be executed, not just promotions - Reflect validation: parse and persist BacklogDecision entries; warn (but don't block) when auto-clusters exist without a Backlog Decisions section - Organize validation: warn when reflect requested promotions that weren't executed during organize Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: unified triage pipeline + step detail display improvements Unified triage pipeline: - Widen is_triage_finding to all defects (mechanical + review + concern) - Sub-group auto-clusters by rule kind (auto/security-B602 instead of auto/security) - Add MEDIUM+LOW bandit filter and skip_tests config option - Auto-cluster statistical summaries in triage prompt (severity, confidence, samples) - Cluster-level observe sampling (ClusterVerdict parsing) - Blocking backlog decisions validation (every auto-cluster must have a 
decision) - Threshold-based staleness (10% mechanical growth, any new review issue) - Two-tier accounting: review issues get per-item ledger, mechanical via cluster decisions - Auto-add manual cluster members to queue_order on add_to_cluster Display improvements: - cluster show: steps now show effort tag, wrapped detail (4 lines), short refs - cluster show: members compact when steps exist (ID list, not full issue detail) - cluster list --verbose: effort summary column (3T 1S), hide empty auto-clusters, drop noise columns - next: cluster drill header shows step done markers and effort tags - next: individual task shows full untruncated step detail matched via issue_refs - next: focus mode shows cluster context + relevant step detail Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: lifecycle transition messages and agent directives Add transition_messages config and directives CLI for phase-specific agent instructions (model switching, constraints). Emit transition messages at lifecycle phase changes across resolve, skip, reopen, review import, and reconcile flows. Auto-focus cluster during mid-cluster execution so desloppify next stays in context. Hermes reset includes cluster-aware next-task instructions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add dev test-hermes command and bump skill doc version desloppify dev test-hermes: smoke-test Hermes model switching by switching to a random model and back. Skill doc version bumped to v6. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: handle missing git in review coordinator, remove unused import Wrap git status call in try/except OSError so review coordinator doesn't crash when git is unavailable. Remove unused triage_scoped_plan import from stage_validation. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update Hermes overlay for delegate_task, add directives docs, update website Rewrite HERMES.md: delegate_task subagent pattern replaces worktree-based parallel review. Add agent directives section to SKILL.md. Website: initiative #2 now active with $1k bounty challenge details. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(r): correct shell quote escaping in lintr command (PR #424) Co-Authored-By: Maximilian Scholz <dev.scholz@mailbox.org> * feat(r): add Jarl as fast R linter with autofix (PR #425) Co-Authored-By: Maximilian Scholz <dev.scholz@mailbox.org> * fix: phpstan stderr/JSON parser fixes (PR #420) Co-Authored-By: Nick Perkins <nick@nickperkins.au> * fix(engine): prevent workflow::create-plan re-injection after resolution (PR #435) Co-Authored-By: Charles Dunda <charles.dunda@perchwell.com> * feat: add SCSS language plugin (PR #428) Co-Authored-By: Klaus Agnoletti <github@agnoletti.dk> * fix: Rust dep graph hangs from string-literal fake imports (PR #429) Co-Authored-By: Riccardo Spagni <ric@spagni.net> * fix: binding-aware unused import detection for JS/TS (PR #433) Co-Authored-By: Tom <tswift1991@icloud.com> * fix: project root detection, force-rescan plan wipe, and manual cluster visibility (PR #439) * perf(scan): detector prefetch + cache for faster scans (PR #432) Co-Authored-By: Tom <tswift1991@icloud.com> * feat(frameworks): FrameworkSpec layer + Next.js spec (PR #414) Co-Authored-By: Tom Swift <tswift1991@icloud.com> * fix: allow scan when queue is fully drained regardless of lifecycle phase Fixes #441 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: quote paths for Windows cmd /c and use utf-8 encoding in log recovery Fixes #442 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: merge retry batch results with original run before coverage check Fixes #443 Co-Authored-By: Claude Opus 4.6 (1M 
context) <noreply@anthropic.com> * fix(docs): SKILL.md cleanup — remove unsupported frontmatter, fix file naming, generalize install Fixes #444, #445, #446 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * cleanup: remove dead _strip_c_style_comments_preserve_lines shim from rust/tools.py Follow-up to PR #440 (Rust inline-test filtering). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: move queue_total==0 check into score_display_mode (#441) Move the empty-queue guard from scan_queue_preflight into score_display_mode() so ALL callers (status, plan nudge, next flow) benefit from the fix, not just scan preflight. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: extract anonymous functions in tree-sitter specs (R lang) PR #449 added an R anonymous function query pattern that captures @fn but the extractor requires @name, silently skipping all anonymous function matches. Fix the extractor to synthesize an "<anonymous>" name when @name is absent but @func is present. Original R spec contributed by sims1253 in PR #449. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(docs): sync .agents SKILL.md with docs copy, add pip fallback and batch naming note - Remove `allowed-tools` frontmatter from .agents/skills/desloppify/SKILL.md (#444) - Add `pip install` fallback note alongside uvx in both copies (#446) - Add batch output naming clarification (batch-N.raw.txt vs .json imports) (#445) - Sync agent directives section and version bump to .agents copy Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: collapse cmd /c arguments into single string for proper Windows quoting The previous fix pre-quoted the executable path, but the actual breakage was in argument paths (-C repo_root, -o output_file) containing spaces. Pre-embedding quotes in a subprocess list causes double-quoting because Popen's list2cmdline() adds its own quotes. 
The real issue: cmd /c concatenates everything after /c and re-parses it with its own tokeniser. The fix introduces _wrap_cmd_c() which uses subprocess.list2cmdline() to build the inner command as a single properly-quoted string, then passes that as one token after /c: ["cmd", "/c", "codex exec -C \"path with spaces\" ..."]. - Revert incorrect executable pre-quoting in _resolve_executable - Add _wrap_cmd_c() to properly collapse cmd /c commands - Apply _wrap_cmd_c in codex_batch_command after building the full arg list - Keep correct encoding="utf-8", errors="replace" fix in io.py - Add tests for _wrap_cmd_c and Windows codex_batch_command path quoting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: skip coverage gate on partial batch retry instead of merging results Replace the 195-line merge approach (find_prior_run_merged_results + overlay_retry_results_on_prior) with a ~5-line bypass: when --only-batches selects a subset of the packet's batches, set allow_partial=True so the coverage gate does not reject the partial retry. The merge approach had multiple issues: wrong prior-run selection after failed retry chains, dimension name normalization mismatches, and stale metadata in combined output. The simpler fix recognizes that a partial retry inherently cannot cover all dimensions, and the original run already handled the rest. Fixes #443 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * bump version to 0.9.10 * bump version to 0.9.10 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * bump version to 0.9.10 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: gitignore .agents/ and untrack generated skill doc The .agents/skills/desloppify/SKILL.md is a generated file (same as .claude/skills/). Canonical copies live under docs/. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add hermes and droid to update-skill help text Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: draft 0.9.10 release notes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(ruby): improve plugin — excludes, detect markers, default_src, spec/ support, README, tests (#462) * feat(ruby): improve plugin — excludes, detect markers, default_src, README, tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Co-Authored-By: Gemini <gemini@google.com> Co-Authored-By: OpenAI Codex <codex@openai.com> * feat(ruby): add spec/ test dir, bin/ exclusion; expose external_test_dirs in generic_lang - Add external_test_dirs and test_file_extensions parameters to generic_lang() so plugins can override the hardcoded ["tests", "test"] defaults - Configure Ruby plugin with external_test_dirs=["spec", "test"] (RSpec + Minitest) - Add bin/ to Ruby exclusions (binstubs/shims) - Update tests: add bin/ to excluded dirs list, add test_external_test_dirs_includes_spec Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Co-Authored-By: Gemini <gemini@google.com> Co-Authored-By: OpenAI Codex <codex@openai.com> * docs(ruby): add bin/ to exclusions list in README Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Co-Authored-By: Gemini <gemini@google.com> Co-Authored-By: OpenAI Codex <codex@openai.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Gemini <gemini@google.com> Co-authored-by: OpenAI Codex <codex@openai.com> * feat: add Factory Droid skill harness support (#451) - Add 'droid' to SKILL_TARGETS (.factory/skills/desloppify/SKILL.md) - Add .factory/skills/ to SKILL_SEARCH_PATHS for auto-discovery - Create docs/DROID.md overlay with review and triage workflow - Bump SKILL_VERSION to 6 - Add droid to README agent prompt harness list * docs(python): add user-facing section to README (#459) Co-authored-by: 
Claude Sonnet 4.6 <noreply@anthropic.com> * feat(javascript): add plugin tests and documentation (#458) Co-authored-by: Gemini <gemini@google.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(docs): correct autofix command in Ruby and JS plugin READMEs The command is `desloppify autofix`, not `desloppify fix` or `desloppify scan --fix`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update release notes with late-merged PRs and stats Add #458, #459, #462 contributions from klausagnoletti. Update stats to reflect final commit/file/test counts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(scss): replace {file_path} placeholders with glob patterns and use unix formatter The tool runner does not substitute {file_path} placeholders, so stylelint was receiving literal "{file_path}" and failing silently. Switch to glob patterns (matching every other plugin) and use --formatter unix with the gnu parser, since stylelint's JSON output doesn't match the expected json parser format. Based on findings from @klausagnoletti in PR #452. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Maximilian Scholz <dev.scholz@mailbox.org> Co-authored-by: Nick Perkins <nick@nickperkins.au> Co-authored-by: Charles Dunda <charles.dunda@perchwell.com> Co-authored-by: Klaus Agnoletti <github@agnoletti.dk> Co-authored-by: Riccardo Spagni <ric@spagni.net> Co-authored-by: Tom <tswift1991@icloud.com> Co-authored-by: Klaus Agnoletti <24544601+klausagnoletti@users.noreply.github.com> Co-authored-by: Gemini <gemini@google.com> Co-authored-by: OpenAI Codex <codex@openai.com> * chore: remove release notes file after publishing to GitHub Releases Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: stop re-injecting phantom issue IDs from action_step refs into cluster membership Step refs are traceability metadata, not membership. Merging them into issue_ids caused bare shorthand IDs from the triage runner to become phantom cluster members that don't exist in work_items and reappear after every reconcile → load cycle. Membership recovery is already handled by execution log (recovered_members) and overrides (override_members). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(update-skill): detect duplicate content when begin/end markers are missing Raises CommandError if the file already has desloppify skill content (version marker present) but is missing the begin/end markers, preventing silent duplicate appends. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(r): add R-specific code smell detectors Add 10 R-specific smell checks: setwd(), <<- global assign, attach(), rm(list=ls()), browser()/debug() leftovers, T/F ambiguity, 1:n() off-by-one, options(stringsAsFactors), and library() inside functions. 
Also adds custom_phases support to GenericLangOptions so generic plugins can inject language-specific phases without converting to full plugins. * fix(r): address PR review comments for code smell detectors - Fix _strip_r_comments to properly preserve string literals by using placeholder substitution before stripping comments - Fix _detect_library_in_function to only track function-scoped braces by finding function definitions and matching their brace pairs, eliminating false positives from if/for/while blocks at top level - Replace custom _find_r_files with framework's find_source_files to respect project-configured exclusion patterns - Add tests for the fixes: hash in strings, library in non-function blocks, nested functions * refactor(r): use tree-sitter for library_in_function detection Replace manual brace tracking with tree-sitter AST parsing for more accurate detection of library()/require() calls inside function bodies. Includes fallback to regex-based detection when tree-sitter is unavailable. Benefits: - Properly handles nested functions, strings, and edge cases - Uses existing R_SPEC tree-sitter configuration - Deduplicates matches in nested function scenarios * fix: handle generic fixers that return entries without 'removed' key Generic fixers (e.g., eslint-warning) return FixResult entries with {file, line} or {file, fixed} — no "removed" key. Four call sites in the autofix pipeline assumed "removed" was always present, causing KeyError for any generic fixer invocation. Guard all access sites with .get("removed", []) and add regression tests for the generic fixer result shape. 
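A minimal sketch of the guard described above, assuming result entries are plain dicts. The helper name `count_removed` and the sample entries are hypothetical and not taken from the autofix pipeline; only the `.get("removed", [])` pattern comes from the commit message.

```python
from typing import Any


def count_removed(entries: list[dict[str, Any]]) -> int:
    """Count removed items, tolerating entries that have no "removed" key."""
    # Generic fixers may return only {"file", "line"} or {"file", "fixed"},
    # so entry["removed"] would raise KeyError; .get() falls back to [].
    return sum(len(entry.get("removed", [])) for entry in entries)


results = [
    {"file": "a.py", "removed": ["unused_import", "dead_var"]},  # has "removed"
    {"file": "b.js", "line": 12},  # generic fixer shape, no "removed"
    {"file": "c.js", "fixed": True},  # generic fixer shape, no "removed"
]
print(count_removed(results))
```

The same `.get` fallback would apply at each call site that previously indexed `entry["removed"]` directly.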
Cherry-picked from PR #484 by @AugusteBalas Co-Authored-By: AugusteBalas <AugusteBalas@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: handle dataclass objects in json_default serializer EcosystemFrameworkDetection dataclass instances (containing Path fields) can leak into review_cache via shared dict references, causing TypeError on state serialization. Add a dataclass handler to json_default that converts via dataclasses.asdict(), letting json.dumps recurse naturally and hit the existing Path handler for nested fields. Bug identified by @0-CYBERDYNE-SYSTEMS-0 in PR #486 Reported-by: 0-CYBERDYNE-SYSTEMS-0 <0-CYBERDYNE-SYSTEMS-0@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: filter synthetic IDs from deferred skip counting Synthetic queue IDs (workflow::*, triage::*) could end up in the plan's skipped dict via migrate_deferred_to_skipped() or skip_items(), causing phantom deferred-disposition loop items. Filter them using the existing is_synthetic_id() pattern already used in 6+ other locations. Cherry-picked from PR #485 by @ryexLLC (synthetic loop fix only; dataclass serialization handled separately) Co-Authored-By: ryexLLC <ryexLLC@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent catastrophic backtracking in dart function extractor regex The annotation sub-pattern had overlapping whitespace consumption between the character class (includes \s) and trailing \s*, wrapped in ()*, causing exponential backtracking on inputs with multiple @-prefixed tokens. Possessive quantifiers (++ and *+) prevent the engine from backtracking into already-matched portions. Also fixes a latent correctness bug where annotated functions produced garbage names. 
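For the json_default dataclass handler described earlier in this batch, a rough sketch of the idea follows. The `FrameworkDetection` class is a hypothetical stand-in for EcosystemFrameworkDetection, and the function body is an assumption about the shape of the handler, not the project's actual code.

```python
import dataclasses
import json
from dataclasses import dataclass
from pathlib import Path


def json_default(obj):
    """Fallback passed to json.dumps(default=...) for non-JSON types."""
    # Dataclass instances (not dataclass types) become plain dicts.
    # Nested Path fields survive asdict() as Path objects, so json.dumps
    # calls this function again for them and reaches the Path branch.
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return dataclasses.asdict(obj)
    if isinstance(obj, Path):
        return str(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


@dataclass
class FrameworkDetection:  # hypothetical stand-in, not the real dataclass
    name: str
    root: Path


state = {"review_cache": {"framework": FrameworkDetection("nextjs", Path("apps/web"))}}
encoded = json.dumps(state, default=json_default)
print(encoded)
```

Letting json.dumps recurse means the handler does not need its own traversal of nested fields.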
Cherry-picked from PR #477 by @AvoMandjian Co-Authored-By: AvoMandjian <AvoMandjian@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: keep framework caches out of persisted review state Framework detection writes dataclass objects (EcosystemFrameworkDetection, NextjsFrameworkInfo) into review_cache, which shares a dict reference with state["review_cache"]. This caused TypeError on JSON serialization. The root fix: introduce a separate runtime_cache field for ephemeral per-scan memoization that is never persisted. Framework caching in detection.py and nextjs.py now uses runtime_cache. This cleanly separates scan-scoped data from persisted review state. Cherry-picked from PR #483 by @maciej-trebacz Co-Authored-By: Maciej Trębacz <maciej-trebacz@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: auto-resolve stale issues when zone policy reclassifies files When zone rules change (e.g. adding JS test patterns), files may be reclassified to zones where certain detectors are skipped by policy. Previously, existing open issues for those files would persist forever since verify_disappeared only auto-resolved when the source file was deleted. Now checks ZONE_POLICIES — if a file's zone says to skip the detector, the issue is auto-resolved. Also adds JS zone rules for .test./.spec./__tests__/__mocks__/ patterns. Uses the existing should_skip_issue() from zones.py rather than hardcoding detector names — works for all detector/zone combinations. Bug identified by @claytona500 in PR #478; JS zone rules also from that PR Reported-by: claytona500 <claytona500@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: rename session_token placeholder to avoid Snyk W007 false positive The placeholder <session_token_from_template> in the review JSON example triggers Snyk's credential-detection heuristic (W007). 
Renamed to <session_hmac_from_template> which is more accurate (it's a per-session HMAC, not a secret credential) and doesn't match scanner patterns. Closes #473 (reported by @mark-major) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use certifi CA bundle for update-skill SSL on macOS Bare urllib.request.urlopen uses the system cert store, which on macOS with Homebrew Python often has no CA certificates installed. Now uses certifi's CA bundle if available, with a helpful error message suggesting `pip install certifi` if SSL verification still fails. Closes #468 (reported by @Vuk97) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add Windows font fallbacks for scorecard rendering The scorecard image generator only had macOS and Linux font paths, so Windows users fell through to Pillow's load_default() bitmap font which renders tiny and different-looking. Adds Consolas, Georgia, Segoe UI, and Arial as Windows fallbacks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add README explaining work queue ordering and pre/post-triage modes Documents how items get from scan to execution queue, why test_coverage dominates pre-triage, that tier is display-only metadata, and the full sort order. Prompted by user feedback that the tool appears obsessed with test writing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add high-level process overview to README and fix work queue docs Main README now has a "How it works" section explaining the scan → score → review → triage → execute → rescan loop, and why triage matters (pre-triage queue is sorted by raw impact which can be noisy). Work queue README corrected to include lifecycle phase gating (PHASE_REVIEW_INITIAL gates objective items behind initial review). 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: document exact filter chain for default queue items Lists the 6 filters that determine what appears in `next` pre-triage: open status, not suppressed, above confidence threshold, in scan scope, mechanical_defect kind, not skipped. With file:line references. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: bias triage away from test-writing busy work Multiple users report the tool is "obsessed with writing tests" instead of cleaning up actual code issues. The root cause: triage LLMs promote test_coverage clusters because the prompt shows them first (sorted by issue count) with no guidance to defer. Changes: - Triage prompt now explicitly says: clean up code quality BEFORE test coverage. Writing tests for sloppy code locks in the slop. - Added "defer" action for auto-clusters (keeps in backlog for later) - Example shows test_coverage as "defer" not "break_up" - Scan coaching changed from "add tests" to "review gaps (fix code first)" - Catalog guidance changed from "add tests for untested modules" to "review coverage gaps — defer test writing until code quality resolved" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: skip cmd /c wrapping for .exe binaries on Windows On Windows, _resolve_executable() was unconditionally routing through cmd /c, even for .exe binaries. When prompts contain spaces, the double list2cmdline interaction (inner collapse by _wrap_cmd_c + outer by subprocess.Popen) produces \" escapes that cmd.exe doesn't understand, causing "unexpected argument" errors. Now only uses cmd /c for .cmd/.bat shims and unresolved fallback. .exe binaries are invoked directly. 
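The routing rule above can be sketched as follows. `needs_cmd_wrapper` and `build_argv` are hypothetical names, and the sketch leaves out whatever quoting _wrap_cmd_c performs; it only shows the decision of when to go through cmd /c.

```python
from pathlib import PureWindowsPath
from typing import Optional


def needs_cmd_wrapper(resolved: Optional[str]) -> bool:
    """True for .cmd/.bat shims and for executables that could not be resolved."""
    if resolved is None:
        return True  # unresolved fallback: let cmd.exe search for the command
    return PureWindowsPath(resolved).suffix.lower() in {".cmd", ".bat"}


def build_argv(resolved: Optional[str], name: str, args: list[str]) -> list[str]:
    """Build the argv list, wrapping in cmd /c only when needed."""
    if needs_cmd_wrapper(resolved):
        return ["cmd", "/c", resolved or name, *args]
    # .exe binaries are invoked directly, so arguments containing spaces
    # are quoted only once, by subprocess itself.
    return [resolved, *args]


print(build_argv(r"C:\tools\codex.exe", "codex", ["exec", "hello world"]))
print(build_argv(r"C:\npm\codex.cmd", "codex", ["exec", "hello world"]))
```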
Closes #487 (reported by @Dteyn) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add progression log — append-only lifecycle event timeline Introduces `.desloppify/progression.jsonl`, an append-only JSONL log recording lifecycle boundary events (scan completions, review imports, triage completions, queue drains, phase transitions). Each line is a self-contained JSON object with discriminated `event_type` + `payload`, timestamps, scan_count, and phase_before/phase_after for full timeline reconstruction. Event types: scan_preflight, scan_complete, postflight_scan_completed, subjective_review_completed, triage_complete, entered_planning_mode, execution_drain. Key design decisions: - Events fire on idempotent marker flips, not inferred from reconcile - Timestamps serve as join keys into state.json and plan execution_log for full detail — the log is a timeline index, not a data copy - All hooks are best-effort (try/except, never break parent command) - Advisory file locking with 2s timeout, periodic trim at 2000 lines - prev_last_scan captured before merge_scan() to correctly anchor execution summaries to the previous scan boundary Also includes queue policy, auto-cluster, and next-command improvements that were pending on this branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version to 0.9.11 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: update test paths after docs → dev reorganization ci_plan.md and DEVELOPMENT_PHILOSOPHY.md moved to dev/ but two contract tests still referenced docs/. Also adds release notes draft for v0.9.11. 
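Going back to the progression log introduced in this batch, an append-only JSONL writer with trimming might look roughly like this. The function name `append_event` and the `timestamp` key are assumptions; the sketch omits the advisory file locking and trims on every append rather than periodically.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_event(log_path: Path, event_type: str, payload: dict, max_lines: int = 2000) -> None:
    """Append one self-contained JSON object per line, then trim old lines."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # assumed key name
        "event_type": event_type,
        "payload": payload,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

    # Keep only the newest max_lines entries.
    lines = log_path.read_text(encoding="utf-8").splitlines(keepends=True)
    if len(lines) > max_lines:
        log_path.write_text("".join(lines[-max_lines:]), encoding="utf-8")
```

A caller that treats the hook as best-effort would wrap `append_event` in try/except so that a logging failure never breaks the parent command.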
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version to 0.9.12 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: move docs/ and website/ into dev/ Internal documentation (DEVELOPMENT_PHILOSOPHY, QUEUE_LIFECYCLE, ci_plan) and release infrastructure (checklist, template, examples) moved to dev/. Website separated as its own repo. Old commit summaries removed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: track dev/review/ — review pipeline prompts, schema, and results Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add `desloppify setup` for universal global skill install Bundles skill definitions via importlib.resources so `pip install desloppify && desloppify setup` installs Claude Code and Cursor skills globally (~/.claude/, ~/.cursor/) without network access. Also supports `--local` for project-level AGENTS.md, extends skill discovery to detect global installs during scan, and fixes pyproject.toml license field for PEP 621 compliance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: trim overzealous scope from setup command Remove --local mode, global skill discovery integration, and sync guard test. The setup command now does one thing: copy bundled skills to ~/.claude/ and ~/.cursor/. Per-project installs stay with update-skill. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add explicit "run next after scan" instruction to skill doc Agents were interpreting scan output themselves instead of running `desloppify next`. Added a clear directive between scan and the rest of the workflow. Also updated install strings to include `desloppify setup`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: tighten skill description to reduce false activations Removed loose keywords (code quality, naming issues, large files, etc.) 
that triggered the skill on generic programming questions. Added explicit negative guidance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: force-rescan now injects stale subjective reviews into queue force-rescan preserved plan_start_scores to protect manual clusters, but this caused cycle_just_completed to be False, perpetually deferring stale reviews behind objective backlog. Additionally, the workflow supersession check skipped sync_subjective_dimensions entirely. Fix: - Thread force_rescan param through reconcile_plan to override cycle_just_completed, bypassing deferral logic - Disable workflow supersession bypass when force_rescan is active - Add _refresh_plan_start_baseline() that reseeds scores and scan_count_at_plan_start without clearing workflow sentinels - 4 new tests covering stale injection, sentinel preservation, baseline reseeding, and end-to-end with objective backlog Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: mark clusters done when all items are resolved Clusters stayed execution_status="active" even when all their items were fixed, causing completed work to reappear in the queue after rescan. - Add EXECUTION_STATUS_DONE="done" to cluster_semantics - living_plan.py: set execution_status to "done" when cluster_done is logged via plan resolve - scan_issue_reconcile.py: add _reconcile_active_clusters_by_item_status() sweep that marks active clusters done when all items are resolved - Also fix _complete_empty_manual_clusters() which had the same bug - 2 new tests for cluster completion reconciliation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: don't supersede resolved items from plan clusters _supersede_dead_references conflated "not actionable" with "gone from state." Items with status fixed/resolved/wontfix were being superseded and stripped from clusters, causing completed clusters to appear incomplete after rescan. 
Root fix: only supersede items that don't exist in state at all (issue is None). Resolved items stay in their clusters so _reconcile_active_clusters_by_item_status can detect completion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: force-rescan bypasses queue-empty guard for reconciliation reconcile_plan() was guarded by live_planned_queue_empty() at two levels, preventing stale subjective reviews from ever being injected when ANY objective items (like test coverage gaps) remained in the queue. force_rescan=True now bypasses both guards so stale dimensions can be detected and injected regardless of objective backlog. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: strategist CEO role — trend validation, confirmation gate, strategic issues The strategize triage step now functions as a strategic overseer: 1. **Trend validation**: score_trend and debt_trend are cross-checked against computed data from score trajectory. Mismatches trigger a warning and the reported value is overridden with the computed one. 2. **Confirmation gate**: strategize can now be explicitly confirmed via --confirm strategize with 80+ char attestation (like other stages). Auto-confirm preserved for backward compat but human review is now possible. 3. **Strategic issues**: strategist can create high-priority work items via strategic_issues output field. These become strategy:: prefixed work items inserted at the front of queue_order. Downstream stages reference them as strategic priorities. strategy:: added to SYNTHETIC_PREFIXES so strategic issues don't block reconciliation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: strategist saves state for strategic issues + detects cross-cycle regression Two root causes found and fixed: 1. Strategic issues were created in memory but never persisted — triage services had no save_state method. Now calls save_state_or_exit() after creating work items. 2. 
Score trend said "improving" for a plateau because score_trajectory() only saw the sliding window (+1.9 within window) without knowing the score was 79.7 before a cycle reset. Now accepts cycle_start_score from plan_start_scores/previous_plan_start_scores and downgrades "improving" to "stable" when current score is still below the cycle baseline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: bulletproof strategist — all-time high, recovering trend, save_state 1. ScoreTrajectory now tracks all_time_high from full scan_history, not just the 5-scan window. Trend overridden to "recovering" when current score is >2pts below all-time high despite positive window delta. 2. "recovering" added as 4th trend value (improving/stable/declining/recovering). Accepted by _parse_briefing validation and documented in strategist prompt. 3. save_state added to TriageServices as first-class method. Strategize uses it to persist strategic work items. Fallback to direct import for backward compat. 4. _seed_plan_start_scores and _refresh_plan_start_baseline now preserve current plan_start_scores as previous_plan_start_scores before overwriting (only when previous is empty). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: force UTF-8 encoding in review runner log and payload reads Adds explicit encoding="utf-8", errors="replace" to all file reads in the review runner pipeline (runner_failures.py, runner_parallel/__init__.py). Prevents charmap decode errors on Windows where Python defaults to the platform encoding but Codex runners emit UTF-8. Subprocess calls in attempts.py left unchanged — they go through the deps injection seam and the runner process is already UTF-8. 
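As an illustration of the file-read change, the sketch below reads a log with an explicit encoding and a lenient error handler. The helper name `read_runner_log` and the sample bytes are made up for the example.

```python
import tempfile
from pathlib import Path


def read_runner_log(path: Path) -> str:
    """Read a log as UTF-8 regardless of the platform default encoding."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of
    # raising UnicodeDecodeError part-way through the file.
    return path.read_text(encoding="utf-8", errors="replace")


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "runner.log"
    # Valid UTF-8 check mark followed by a byte that is invalid in UTF-8.
    log.write_bytes(b"ok \xe2\x9c\x93 \xff\n")
    decoded = read_runner_log(log)

print(repr(decoded))
```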
Cherry-picked from PR #495 by @pietrondo (file-read changes only)
Co-Authored-By: pietrondo <pietrondo@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: support src-layout Python projects in test_coverage detector

resolve_import_spec now tries src/-prefixed candidates when direct match fails, and _build_prod_by_module strips the src/ prefix from relative paths before computing module names. Both changes are needed so that src-layout projects (PEP 621) correctly map tests to production files.

Adjustments: moved _SRC_PREFIXES to module-level constant (was function-local)

Cherry-picked from PR #489 by @AreboursTLS
Co-Authored-By: AreboursTLS <AreboursTLS@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent Knip hang when not installed as project dependency

Add stdin=subprocess.DEVNULL to the subprocess.run call in knip_adapter.py to prevent npx from blocking on interactive prompts. Add --yes flag to npx args as belt-and-suspenders. Add pre-check for knip in node_modules to fail fast when knip is not a local dependency. Updated existing tests to create node_modules/.bin/knip marker files where needed.

Closes #494 (reported by @goobsnake)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: stale subjective reviews always sync regardless of queue state

The live_planned_queue_empty guard was blocking ALL reconciliation (including stale review injection) when mechanical items remained in the queue. Stale reviews should coexist with mechanical items, not be blocked by them. Move the guard AFTER subjective sync so stale reviews are always detected and injected. Auto-clustering and workflow sync remain gated by queue emptiness.
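Back to the src-layout fix cherry-picked from PR #489: the two changes can be sketched as below. This is an illustration under assumptions, not the project's code. The function names `module_name` and `resolve_spec_sketch`, their signatures, and the contents of `_SRC_PREFIXES` are guesses; only the idea of stripping and retrying a src/ prefix comes from the commit message.

```python
from pathlib import PurePosixPath
from typing import Optional

_SRC_PREFIXES = ("src/",)  # assumed contents of the module-level constant


def module_name(rel_path: str) -> str:
    """Dotted module name for a production file, ignoring a leading src/."""
    for prefix in _SRC_PREFIXES:
        if rel_path.startswith(prefix):
            rel_path = rel_path[len(prefix):]
            break
    parts = list(PurePosixPath(rel_path).with_suffix("").parts)
    if parts and parts[-1] == "__init__":
        parts.pop()  # a package's __init__.py maps to the package name
    return ".".join(parts)


def resolve_spec_sketch(spec: str, production_files: set) -> Optional[str]:
    """Map an import spec such as 'pkg.foo' to a known production file."""
    base = spec.replace(".", "/")
    candidates = [f"{base}.py", f"{base}/__init__.py"]
    # First try the direct candidates, as before the fix.
    for candidate in candidates:
        if candidate in production_files:
            return candidate
    # Then retry with each src-layout prefix.
    for prefix in _SRC_PREFIXES:
        for candidate in candidates:
            if prefix + candidate in production_files:
                return prefix + candidate
    return None


production = {"src/argos_toolkit/__init__.py", "src/argos_toolkit/foo.py"}
print(resolve_spec_sketch("argos_toolkit.foo", production))
print(module_name("src/argos_toolkit/foo.py"))
```

Trying the direct candidates first keeps flat-layout projects on the original code path; the prefixed lookup only runs when nothing matched.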
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: stale subjective reviews always sync regardless of queue state Move the live_planned_queue_empty guard AFTER subjective sync so stale reviews are always detected and injected. Auto-clustering and workflow sync remain gated by queue emptiness. Remove will_inject_workflow gate and cycle_just_completed coupling from subjective sync path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: phase ordering — stale subjective reviews take priority over queued workflows/triage Pre-review workflow IDs (deferred disposition, run scan, import scores) still jump ahead of everything, but non-critical workflows (communicate score) and triage items now yield to stale subjective reviews that need refresh. Adds PRE_REVIEW_WORKFLOW_IDS constant and 3 tests covering the priority rules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: review pipeline results for PRs #495, #493, #489, #189 and issues #494-#490 Stage 1 assessments, Stage 2 challenges/advocacy, and Stage 3 execution for the current batch of open PRs and issues. Backfilled Stage 2 files for older items that only had Stage 1 results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: disable subjective anti-gaming integrity policy Blind-packet subagent reviews cannot anchor to the target score, making false positives (legitimate score convergence) more likely than actual gaming. The policy was zeroing 4 dimensions that independently scored 85.0 — a 21-point strict score drop. 
- _apply_subjective_integrity_policy now returns assessments unchanged
- Removed target_match_reset enforcement from scoring engine
- Updated 7 tests to expect status="disabled"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: always supplement graph imports with source parsing for test coverage

The import graph often resolves Python submodule imports (e.g., from megaplan.evaluation import X) to the package __init__.py rather than the actual submodule file. This caused false "transitive_only" reports for modules with dedicated test files. Previously, source parsing was only used as a fallback when the graph had no entries. Now it always runs as a supplement, catching submodule imports the graph missed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Python import regex handles parenthesized multi-line imports

PY_IMPORT_RE couldn't match `from megaplan.evaluation import ( build_evaluation, ...)` because $\w+$ expected a word immediately after 'import', not an opening paren. Added \(?\s* to optionally match the paren and whitespace before the first imported name. This was the root cause of false "transitive_only" test coverage reports for Python modules with dedicated test files that use multi-line import syntax.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version to 0.9.13

* fix: add lower-bound guard in fix_debug_logs to prevent negative-index file corruption

When entry["line"] is 0, start becomes -1, which passes the upper-bound guard and causes Python's negative indexing to silently operate on lines at the end of the file. The other three fixers already have this guard.
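The negative-index hazard is easy to reproduce in isolation. The sketch below uses a hypothetical helper, `remove_line`, rather than the real fix_debug_logs code, and assumes line numbers are 1-based.

```python
def remove_line(lines: list[str], line_no: int) -> list[str]:
    """Drop the 1-based line `line_no`; ignore out-of-range values."""
    start = line_no - 1
    # Without the `start < 0` check, line_no == 0 gives start == -1, which
    # passes the upper-bound check and would make the slices below drop
    # the second-to-last line and duplicate the last one.
    if start < 0 or start >= len(lines):
        return lines
    return lines[:start] + lines[start + 1:]


source = ["a = 1\n", "print(a)\n", "b = 2\n"]
print(remove_line(source, 0))  # unchanged: 0 is not a valid 1-based line
print(remove_line(source, 2))  # second line removed
```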
Cherry-picked from PR #499 by @FloodExLLC Co-Authored-By: FloodExLLC <FloodExLLC@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: scope .gitignore CLAUDE.md/AGENTS.md rules to root-level only The blanket `CLAUDE.md` rule on line 43 matched any file named CLAUDE.md anywhere in the repo, preventing desloppify/data/global/CLAUDE.md from being tracked. This caused 4 CI failures because the bundled package data file was absent in fresh clones while existing in local working trees (where tests passed). Change both `CLAUDE.md` and `AGENTS.md` to `/CLAUDE.md` and `/AGENTS.md` so they only match at the repo root. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: bundle all 11 skill overlays with sync guard and pre-commit hook Previously data/global/ only had 7 of 11 overlays (AMP, CLAUDE, COPILOT, DROID, OPENCODE were missing). Add the missing files copied from docs/. Add defense-in-depth to prevent drift between docs/ and data/global/: 1. test_bundled_sync.py — pytest guard that fails if files diverge (CI) 2. .githooks/pre-commit — auto-syncs data/global/ when docs/*.md staged 3. make sync-docs — convenience target for manual sync 4. make install-hooks — installs the pre-commit hook, wired into install-ci-tools and install-full-tools for automatic setup 5. 
make package-smoke — extended to verify wheel includes all bundled docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: make global install the primary skill delivery mechanism Expand desloppify setup from 2 targets (claude, cursor) to 5 verified tools with global paths confirmed against official documentation: - Claude Code: ~/.claude/skills/ (code.claude.com/docs/en/skills) - Codex CLI: ~/.codex/AGENTS.md (developers.openai.com/codex/guides/agents-md) - Gemini CLI: ~/.gemini/skills/ (geminicli.com/docs/cli/skills/) - AMP: ~/.config/agents/skills/ (ampcode.com/news/agent-skills) - OpenCode: ~/.config/opencode/skills/ (opencode.ai/docs/skills/) Cursor removed from global targets — its global rules are UI-only, not filesystem-based (cursor.com/docs/rules). Key changes: - GLOBAL_TARGETS is the single source of truth in skill_docs.py (setup/cmd.py imports it, no duplication) - Add skip-if-current: don't rewrite files already at current version - Add shared-file handling: codex AGENTS.md uses section-replace - Add global staleness detection: find_stale_global_installs(), find_any_global_install(), updated check_skill_version() - Kill silent auto-update in agent_context.py — warn-only now, following pre-commit/husky best practice - Staleness warnings recommend "desloppify setup" for global, "desloppify update-skill" for per-project - Fix codex hint in runner_failures.py to reference AGENTS.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: false positives in orphaned, hardcoded_secret_name, and cycles detectors - orphaned: recognize __all__ exports as public API (skip from detection) - hardcoded_secret_name: add entropy heuristic to filter field names, sentinels, label prefixes - cycles: mark TYPE_CHECKING-guarded imports as deferred (excluded from cycle detection) - assessments: isinstance guard before .get() on potentially corrupted state values Closes #496, closes #465 Reported-by: Git-on-my-level 
<Git-on-my-level@users.noreply.github.com>
Reported-by: Vuk97 <Vuk97@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: review pipeline - bias to action, ask maintainer when unsure

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version to 0.9.14

* fix: skip tree-sitter spec tests when grammar not available in CI

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: skip tree-sitter spec tests when grammar not available in CI

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: simplify lifecycle phases to plan/execute modes

  Collapse 8 fine-grained persisted phases to 2 modes (plan/execute). Pipeline stage derived from queue contents via derive_pipeline_stage(). Display phase mapped via stage_to_display_phase() for consumer compat.

  Key changes:
  - sync_subjective_dimensions moved to boundary-only (fixes stuck-phase bug)
  - _raw_persisted_phase uses current_lifecycle_phase for migration
  - clear_postflight_scan_completion no longer forces execute mode
  - Cluster filter exempts plan-mode items (subjective, workflow, triage)
  - Scan preflight respects live_planned_queue_empty + snapshot
  - Migration: old phases map to plan/execute, stale subjective items pruned

  Closes the assessment_postflight deadlock where stale subjective items kept being re-injected mid-cycle, preventing transition to execute.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove legacy phase inference, unify snapshot derivation

  - current_lifecycle_phase() never returns None (always plan/execute)
  - _legacy_phase_inference, _ordered_postflight_phase, _raw_persisted_phase deleted
  - _DISPLAY_PHASE_ITEM_MAP, PHASE_* snapshot aliases deleted
  - snapshot.py: 738 → 614 lines (-124)
  - Single derivation path via _derive_display_phase
  - Scan preflight respects live_planned_queue_empty
  - Cluster filter exempts plan-mode items
  - Migration prunes stale subjective items from old plans

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: hide lifecycle jargon from users, auto-resolve communicate-score

  - user_facing_mode() maps internal display phases to "plan"/"execute"
  - Workflow items show friendly labels: (Ready to scan), (Create plan), etc
  - communicate-score auto-resolves during reconcile (no manual queue item)
  - explain_queue() shows "Mode: plan/execute" instead of raw phase names
  - Sentinel preserved during auto-resolve to prevent re-firing

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: enforce single-writer lifecycle, eliminate side-channels

  - set_lifecycle_phase is now private, called only from reconcile_plan()
  - resolve_workflow no longer side-channels triage injection; it uses a plan-state marker (workflow_plan_just_resolved) read by reconcile
  - invalidate_postflight_scan only clears scan marker, never writes mode
  - All 5 invalidation sites use uniform Pattern A (invalidate + reconcile)
  - living_plan.py collapsed to single reconcile decision path
  - Triage kickoff helpers moved from app/ to engine/_plan/triage/lifecycle.py
  - AST-backed enforcement test: no production code writes lifecycle_phase outside _set_lifecycle_phase + one-shot migration
  - Derivation equivalence test: pipeline and snapshot agree on display phase

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add plan_checkpoint as canonical score checkpoint with sparkline

  Add plan_checkpoint progression event, the single canonical score snapshot in progression.jsonl. Fires when communicate-score auto-resolves after subjective reviews are cleared (review path) or when no subjective items exist (scan path). Remove redundant scores from scan_complete, review_complete, triage_complete, and execution_drain events.

  Key design: gate sync_communicate_score_needed with defer_if_subjective_queued so scan path defers when subjective items remain, routing checkpoints through the clean review-import flow.

  Also adds:
  - Delta fields (resolved_since_last, skipped_since_last, execution_summary) with last_plan_checkpoint_timestamp() helper for windowing
  - Smoothed sparkline on terminal status scorecard (≥3 checkpoints)
  - Snapshot rebaseline fields on ReconcileResult to avoid post-reconcile clearing race, save-success gating on scan path
  - Remove source_command duplication from checkpoint payload (envelope only)
  - Clean up dead prev_scores plumbing from ScanRuntime

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: stale focus counts across status/next/scan commands

  Three rendering sites showed raw cluster issue_ids count (including resolved items) instead of filtering to items still in the work queue. Also guard set_focus against focusing completed clusters.

  Closes #503 (reported by @NovaRagnarok)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: check all graph keys for normalization, not just first 3

  The sampling heuristic ([:3]) could silently skip normalization if the first few keys happened to be relative while others were absolute.

  Relates to #502

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: consolidate lifecycle phase derivation and fix force-rescan review marker

  Three related changes to the plan/execute lifecycle machinery:

  1. Fix: force-rescan no longer resets subjective review completion. Added carry_forward_subjective_review() that promotes the marker when the old review matches the cycle being replaced.
  2. Refactor: consolidate phase derivation into shared derive_display_phase() pure function. Both pipeline and snapshot now delegate to the same boolean-signal priority chain. Migration moved to load_plan() time, current_lifecycle_phase() is now a pure reader. Marker invariants documented in module docstring.
  3. Cleanup: remove snapshot _phase_for_snapshot bypasses. Mode-aware signal shaping (suppress_postflight_signals, prefer_scan) now lives in the caller, _derive_display_phase is a thin items→bools mapper.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add explicit UTF-8 encoding to external tool report readers

  Closes #505 (reported by @pietrondo)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add v0.9.14 release notes draft

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version to 0.9.15

* fix: cross-extension test-to-source mapping for TypeScript (.test.ts → .tsx)

  OverlayEditor.test.ts could not find OverlayEditor.tsx because map_test_to_source only tried the test file's own extension after stripping the .test. marker. Now tries all TS/JS extensions (.ts, .tsx, .js, .jsx) for each candidate.

  Also fixes a variable shadowing bug where _TS_EXTENSIONS (used by resolve_import_spec for /index.…
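The cross-extension lookup described in the last commit entry above can be illustrated with a minimal sketch. This is not the project's actual map_test_to_source; the function name map_test_to_source_sketch, the constant CANDIDATE_EXTENSIONS, and the assumption that production files are held as a set of relative POSIX paths are all hypothetical and chosen only for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional, Set

# Hypothetical constant: the extensions named in the commit message.
CANDIDATE_EXTENSIONS = (".ts", ".tsx", ".js", ".jsx")

# The marker the commit message says is stripped from test file names.
TEST_MARKER = ".test."


def map_test_to_source_sketch(
    test_path: str, production_files: Set[str]
) -> Optional[str]:
    """Return the production file a test file most plausibly covers.

    Every candidate extension is tried, not only the test file's own,
    so OverlayEditor.test.ts can resolve to OverlayEditor.tsx.
    """
    path = PurePosixPath(test_path)
    if TEST_MARKER not in path.name:
        return None  # not a recognised test file name

    # "OverlayEditor.test.ts" -> "OverlayEditor"
    stem = path.name.split(TEST_MARKER, 1)[0]

    for ext in CANDIDATE_EXTENSIONS:
        candidate = str(path.with_name(stem + ext))
        if candidate in production_files:
            return candidate
    return None


prod = {"src/components/OverlayEditor.tsx", "src/utils/format.js"}
overlay = map_test_to_source_sketch("src/components/OverlayEditor.test.ts", prod)
fmt = map_test_to_source_sketch("src/utils/format.test.ts", prod)
missing = map_test_to_source_sketch("src/utils/other.test.ts", prod)
not_a_test = map_test_to_source_sketch("src/utils/format.js", prod)
```

Trying the extensions in a fixed order means the first match wins when several siblings exist (for example both a .ts and a .tsx file); whether the real implementation prefers one over another is not stated in the commit message.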

AreboursTLS force-pushed the fix/src-layout-test-coverage branch from eebf9df to 30a686e on March 19, 2026 18:06

peteromallet closed this Mar 21, 2026

peteromallet added the release:v1.0 Included in v1.0 label May 13, 2026

AreboursTLS wants to merge 1 commit into peteromallet:main from AreboursTLS:fix/src-layout-test-coverage
