fix: support src-layout Python projects in test_coverage detector #489
Closed
AreboursTLS wants to merge 1 commit into
resolve_import_spec now tries src/-prefixed candidates when direct module-path candidates don't match any production file. This handles the common src-layout pattern (e.g. src/argos_toolkit/foo.py). _build_prod_by_module now strips the 'src/' prefix from relative paths before computing module names, so the module index maps 'argos_toolkit.foo' instead of 'src.argos_toolkit.foo'.
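The two behaviours described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `strip_src_prefix`, `module_name`, and `resolve_import` are hypothetical, and the contents of `_SRC_PREFIXES` are assumed (the PR only names the constant).

```python
from pathlib import PurePosixPath
from typing import Optional, Set

# Assumed contents; the PR only states that a _SRC_PREFIXES constant exists.
_SRC_PREFIXES = ("src/",)


def strip_src_prefix(rel_path: str) -> str:
    """Drop a leading src/ so module names match what tests import."""
    for prefix in _SRC_PREFIXES:
        if rel_path.startswith(prefix):
            return rel_path[len(prefix):]
    return rel_path


def module_name(rel_path: str) -> str:
    """Hypothetical helper: relative file path -> dotted module name."""
    parts = list(PurePosixPath(strip_src_prefix(rel_path)).with_suffix("").parts)
    if parts and parts[-1] == "__init__":
        parts.pop()  # a package's __init__.py maps to the package itself
    return ".".join(parts)


def resolve_import(spec: str, production_files: Set[str]) -> Optional[str]:
    """Hypothetical helper: try direct candidates first, then src/-prefixed ones."""
    base = spec.replace(".", "/")
    direct = [f"{base}.py", f"{base}/__init__.py"]
    prefixed = [f"{prefix}{path}" for prefix in _SRC_PREFIXES for path in direct]
    for candidate in direct + prefixed:
        if candidate in production_files:
            return candidate
    return None
```

With this sketch, `src/argos_toolkit/foo.py` is indexed as `argos_toolkit.foo`, and an import of `argos_toolkit.foo` resolves to the file under `src/` when no direct match exists.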
eebf9df to 30a686e
Owner
Thanks for the fix, @AreboursTLS! The src-layout gap is real — PEP 621 projects with Cherry-picked both changes into
One minor adjustment: moved All tests pass. Thanks for the clean, targeted fix!
peteromallet
added a commit
that referenced
this pull request
Mar 21, 2026
resolve_import_spec now tries src/-prefixed candidates when direct match fails, and _build_prod_by_module strips the src/ prefix from relative paths before computing module names. Both changes are needed so that src-layout projects (PEP 621) correctly map tests to production files.
Adjustments: moved _SRC_PREFIXES to module-level constant (was function-local)
Cherry-picked from PR #489 by @AreboursTLS
Co-Authored-By: AreboursTLS <AreboursTLS@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
samtuckerdavis
pushed a commit
to Open-Paws/desloppify
that referenced
this pull request
Mar 21, 2026
resolve_import_spec now tries src/-prefixed candidates when direct match fails, and _build_prod_by_module strips the src/ prefix from relative paths before computing module names. Both changes are needed so that src-layout projects (PEP 621) correctly map tests to production files.
Adjustments: moved _SRC_PREFIXES to module-level constant (was function-local)
Cherry-picked from PR peteromallet#489 by @AreboursTLS
Co-Authored-By: AreboursTLS <AreboursTLS@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Owner
Thanks for the contribution. This is included in v1.0: https://github.com/peteromallet/desloppify/releases/tag/v1.0
peteromallet
added a commit
that referenced
this pull request
May 13, 2026
Record co-author trailers for PR authors included in the v1.0 release cycle so GitHub can associate release-cycle contribution credit with the tag.
Refs: #189, #484, #485, #486, #489, #493, #495, #529, #539, #573, #580, #581, #584, #585, #589, #602, #603
Co-authored-by: R. Desmond <134018026+0-CYBERDYNE-SYSTEMS-0@users.noreply.github.com>
Co-authored-by: AreboursTLS <77301936+AreboursTLS@users.noreply.github.com>
Co-authored-by: AugusteBalas <128148269+AugusteBalas@users.noreply.github.com>
Co-authored-by: Alex Price <2804025+awprice@users.noreply.github.com>
Co-authored-by: Klaus Agnoletti <24544601+klausagnoletti@users.noreply.github.com>
Co-authored-by: Koshi <18751916+koshimazaki@users.noreply.github.com>
Co-authored-by: Pietro <6080662+pietrondo@users.noreply.github.com>
Co-authored-by: raveinid <7130195+raveinid@users.noreply.github.com>
Co-authored-by: Ryan Gerstenkorn <4079939+RyanJarv@users.noreply.github.com>
Co-authored-by: ryexLLC <217349586+ryexLLC@users.noreply.github.com>
Co-authored-by: Maximilian Scholz <6530123+sims1253@users.noreply.github.com>
Co-authored-by: Tristan Manchester <108270628+tristanmanchester@users.noreply.github.com>
samtuckerdavis
added a commit
to Open-Paws/desloppify
that referenced
this pull request
May 13, 2026
* desloppify: unify triage policy and error contracts
* desloppify: streamline holistic review prep
* desloppify: finish migration seam cleanup
* desloppify: reset language runtime state coherently
* desloppify: clarify coverage mapping and predicates
* desloppify: simplify triage validation seams
* desloppify: preserve review queue and observe evidence
* desloppify: clarify compatibility naming seams
* desloppify: reorganize framework and holistic package seams
* desloppify: narrow public language framework surface
* desloppify: simplify stale smell workflow seams
* desloppify: streamline reporting and planning helpers
* desloppify: strengthen review coverage seams
* desloppify: tighten review and queue type contracts
* fix: skip Go generated files and improve import-run error messages
Add Go zone rules for generated file patterns (.generated., _gen.go,
.pb.go, _string.go, _enumer.go) and config files (go.mod, go.sum).
Improve --import-run error to explain what's missing and suggest next steps.
Closes #402, addresses #401.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
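As an illustration of the Go zone rules in the commit above, here is a rough sketch of how such filename patterns might be matched. The patterns and config file names come from the commit message. The function name, the zone labels, and the choice of substring versus suffix matching are assumptions made for this example.

```python
# Patterns listed in the commit message.
GENERATED_SUFFIXES = ("_gen.go", ".pb.go", "_string.go", "_enumer.go")
GENERATED_INFIX = ".generated."
GO_CONFIG_FILES = ("go.mod", "go.sum")


def classify_go_file(rel_path: str) -> str:
    """Hypothetical helper: return an assumed zone label for a Go project file."""
    name = rel_path.rsplit("/", 1)[-1]  # match on the file name only
    if name in GO_CONFIG_FILES:
        return "config"
    if GENERATED_INFIX in name or name.endswith(GENERATED_SUFFIXES):
        return "generated"
    return "production"
```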
* fix: restore language registry after reset test to prevent pollution
test_reset_runtime_state_clears_registry_and_hooks called
reset_runtime_state() without saving/restoring the registry, causing
5 downstream tests (erlang, ocaml, fsharp, javascript, bash) to fail
when the language plugins couldn't re-register (already imported).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: delete 23 compat wrapper files and remove SimpleNamespace antipatterns
Per project policy ("no backward compat for import paths — remove
re-export facades, wrapper shims, compat layers"), delete all thin
wrapper files left behind by recent package reorganizations:
- 13 _framework/ wrappers (commands_base, generic, registry_state, etc.)
- 8 context_holistic/ wrappers (budget_*, selection_contexts)
- 2 helpers/ wrappers (runtime.py, persist.py)
Replace SimpleNamespace fake-module pattern in override_misc.py and
commit_log/dispatch.py with direct function calls.
Update 27 import sites and 4 test monkeypatch targets to use canonical
paths. All 5193 tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* desloppify: remove scan workflow lazy reconcile import
* desloppify: retire triage facade hot path
* desloppify: surface suggestion/evidence in show and cluster commands
Add suggestion and evidence fields to show command output and cluster
member display so triage stages can investigate issues without JSON
exports. Add investigation command hints to compact summaries (self_record
mode only), forward observe assessments to sense-check, and update
enrich/sense-check instructions to reference the new data paths.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* desloppify: split organize validation and batch context normalization
* feat: normalize C++ security tool findings
* desloppify: unify issue lifecycle status and surface history to reviewers
Add deferred/triaged_out to Status enum so state is always authoritative
for issue disposition. Previously temporary and triaged_out skips left
issue.status as "open", causing overcounting and misleading displays.
Part A — Status unification:
- Add DEFERRED, TRIAGED_OUT to Status enum and _CANONICAL_ISSUE_STATUSES
- Update FAILURE_STATUSES_BY_MODE (all 3 scoring modes)
- Map temporary skip → deferred, triaged_out skip → triaged_out in state
- Backlog/unskip reopen deferred/triaged_out back to open
- Triage dismiss sets state status to triaged_out
- Reconcile migrates existing open+skipped → correct status on scan
- Treat deferred/triaged_out as alive in reconcile (not superseded)
- Add status icons (⏸ deferred, △ triaged_out)
- Update plan header and summary_lines for new status buckets
Part B — Surface history to reviewers:
- Default --retrospective to True (--no-retrospective to opt out)
- Rewrite render_historical_focus with status grouping and CLI hints
- Add render_dimension_deferral_context for stale dimension warnings
- Wire both into batch and external review prompt paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: harden C++ security normalization fallback and deduplication
* fix: repair C++ coverage logic regex
* fix: harden C++ tool-backed security scanning
* desloppify: two-phase observe/judge review scoring with evolving characteristics
Split holistic review into Phase 1 (observe: collect characteristics and
defects) and Phase 2 (judge: synthesize dimension_character, then score).
Positive observations now persist as context insights with positive: true
and full provenance (added_at, source), replacing ephemeral strengths.
judgment.strengths is backfilled from positive insights after import.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: batch cppcheck issue scanning for C++
* Fix scan export and scoring impact crash paths
* fix: surface blind-review workflow in first anti-gaming penalty message
The anti-gaming safeguard for subjective dimensions is working as designed,
but agents weren't discovering the blind-review workflow because:
1. The first penalty message said only "Re-review objectively" with no
pointer to the blind packet or agent overlay docs
2. The blind packet hint only appeared after repeated penalties (streak >= 2)
3. SKILL.md's anti-gaming note didn't reference the overlay docs
Now the first penalty immediately surfaces:
- The blind packet path (.desloppify/review_packet_blind.json)
- Pointers to docs/CLAUDE.md and docs/HERMES.md for the full workflow
https://claude.ai/code/session_01RqpTiawULymfeVXW8X8ySq
* fix: redact numeric target from penalty messages to prevent anchoring
The penalty message "matched target 95.0" leaks the exact target score
to the agent. Even with a blind initial review, the agent infers the
target from penalty output and anchors on every subsequent re-review,
creating an unbreakable loop that burned through an entire Claude Max
session in 60 minutes.
Changes:
- Replace "matched target {N}" with "clustered on the scoring target"
- Replace "parked on target {N}" with "parked on the scoring target"
- Redact target label from summary integrity warnings
- Strengthen re-review instruction: "Launch a fresh, context-isolated
agent" instead of "Re-review objectively"
The numeric target remains available via `desloppify show subjective`
for human operators who need it.
https://claude.ai/code/session_01RqpTiawULymfeVXW8X8ySq
* fix: rust workspace rustdoc execution
* docs: add C++ full-scan requirements
* fix: close Windows and C++ review blockers
* chore: drop local C++ planning docs from PR
* code health: broad cleanup and triage/review improvements
Refactors triage validation, stage prompts, review batch scoring,
and display layout. Adds evidence parsing enhancements, confirmation
helpers, and execution constraints. Cleans up unused imports and
dead code across tests and production modules.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address remaining C++ PR review issues
* chore: gitignore .claude/ and remove tracked lock file
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve lint, mypy, and stale test expectations
- Add missing SubjectiveVisibility import (F821)
- Add missing Any import in runner_parallel/types.py (F821)
- Fix union type annotation in core_normalize.py (mypy return-value)
- Remove stale generic.py from mypy files list
- Update review batch test expectations after scoring simplification
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: restore postflight scan marker after triage skip commands
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version to 0.9.6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version to 0.9.7 and add tweet-on-release workflow
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: gate PyPI publish on main branch only
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version to 0.9.8
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: restore push triggers on PyPI publish workflow
The CI contract test expects push.tags and push.branches triggers.
Gate still blocks releases not targeting main.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* desloppify: update triage and issue semantics
* chore: update local changes
* feat: queue ownership, cluster semantics, lifecycle phase improvements
Adds cluster_semantics module, refines issue semantics, updates
work queue snapshot and plan ordering for phase isolation. Extends
lifecycle phase handling and auto-cluster sync. Updates tests across
plan, state, review, and narrative modules.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: sequential reconciliation pipeline, execution status flags, plan loading consolidation
Fix cluster tracker race on parallel updates by introducing a shared
boundary-triggered reconciliation pipeline that runs all sync steps
sequentially. Add execution_status (active/review) flags to clusters,
consolidate plan load/recovery into persistence module, and rename
reconcile modules for clarity.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: unblock objective resolves while triage is pending
* feat: auto-resolve stale issues for deleted files, triage/queue/reconcile improvements
- Fix #412: scan merge now auto-resolves open issues when the source file
no longer exists on disk (verify_disappeared + MergeScanOptions.project_root)
- Triage: sense-check orchestration, completion policy, validation stage upgrades
- Work queue: snapshot overhaul, synthetic workflow, ranking refinements
- Plan: reconcile pipeline expansion, refresh lifecycle consolidation,
phase cleanup support, scan issue reconcile enhancements
- Review: holistic cluster modules removed (inlined), import plan sync expanded
- Rust: fixer/detector cleanup, remove compat re-export wrapper
- Tests: broad coverage additions across triage, reconcile, queue, holistic
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: accept substantive work-product descriptions in triage attestations
Attestation validation for organize/enrich/sense-check stages no longer
requires literal cluster name references — detailed descriptions of the
verified work are now accepted as an alternative.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: reorganize cluster/override into subpackages, triage validation improvements, review batch enhancements
Move cluster_ops/update modules into cluster/ subpackage and override modules into
override/ subpackage. Enhance triage completion policy, stage validation, and review
batch execution phases. Add dynamic loaders, scan preflight checks, and expanded test coverage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: update scorecard image
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update renamed test reference in Makefile and CI contract
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: improve cxx detector scoping (PR #415)
- Filter C/C++ security findings to scoped first-party file set
- CMake-based test coverage mapping via add_executable/add_library/target_sources
- Disable unsound generic unused-import phase for C++
- Fix _extract_import_name for C++ header extensions (.h, .hh, .hpp)
- Remove duplicate test, add missing EOF newlines
Co-Authored-By: Dragoy <Dragoy@users.noreply.github.com>
* docs: add .desloppify/ gitignore reminder to setup instructions
Closes #416.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version to 0.9.9, update scorecard
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: stub requests module in tweet release tests
The tests load tweet_release.py which imports requests at module level.
Without a stub, all 7 tests fail with ModuleNotFoundError since requests
is a CI script dependency, not a project dependency.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* ci: retrigger checks after branch protection update
* fix: simplify PyPI publish to trigger on push to main only
Removed redundant release and tag triggers — push to main with a bumped
version in pyproject.toml is sufficient. The "check if version exists"
step makes this idempotent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: remove test_release_image.png and AGENTS.md from repo
These files were accidentally tracked — test_release_image.png is a
test artifact and AGENTS.md is a local Claude Code skill file. Added
AGENTS.md to .gitignore to prevent re-adding.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add Rust inline-test filtering: ignore clippy diagnostics in cfg(test) modules (#440)
* feat(r): improve tree-sitter R_SPEC function and import queries (#449)
- Add anonymous function detection (function definitions passed as
arguments to calls like lapply, purrr::map, etc.)
- Add namespace operator (pkg::fn) capture in import query so
dplyr::select, data.table::fread etc. are recognized as imports
- Both patterns capture @path for compatibility with the dep graph
and unused imports analysis
* feat(ruby): improve plugin — excludes, detect markers, default_src, spec/ support, README, tests (#462)
* feat(ruby): improve plugin — excludes, detect markers, default_src, README, tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Gemini <gemini@google.com>
Co-Authored-By: OpenAI Codex <codex@openai.com>
* feat(ruby): add spec/ test dir, bin/ exclusion; expose external_test_dirs in generic_lang
- Add external_test_dirs and test_file_extensions parameters to generic_lang()
so plugins can override the hardcoded ["tests", "test"] defaults
- Configure Ruby plugin with external_test_dirs=["spec", "test"] (RSpec + Minitest)
- Add bin/ to Ruby exclusions (binstubs/shims)
- Update tests: add bin/ to excluded dirs list, add test_external_test_dirs_includes_spec
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Gemini <gemini@google.com>
Co-Authored-By: OpenAI Codex <codex@openai.com>
* docs(ruby): add bin/ to exclusions list in README
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Gemini <gemini@google.com>
Co-Authored-By: OpenAI Codex <codex@openai.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Gemini <gemini@google.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
* feat: add Factory Droid skill harness support (#451)
- Add 'droid' to SKILL_TARGETS (.factory/skills/desloppify/SKILL.md)
- Add .factory/skills/ to SKILL_SEARCH_PATHS for auto-discovery
- Create docs/DROID.md overlay with review and triage workflow
- Bump SKILL_VERSION to 6
- Add droid to README agent prompt harness list
* docs(python): add user-facing section to README (#459)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(javascript): add plugin tests and documentation (#458)
Co-authored-by: Gemini <gemini@google.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* 0.9.10 (#463)
* fix: strip image blocks from release notes on website
The release notes contain a mascot image that renders as a broken
or unwanted image on the website. Strip HTML <p><img></p> blocks
from release body before rendering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: derive project root from state_file path in do_import_run follow-up scan
When running `desloppify review --import-run --scan-after-import`, the
follow-up scan was using _runtime_project_root() which could return a
contaminated path (pointing to the results directory instead of the
actual project root). This caused state to be written to the wrong
location. Instead, derive the project root from the state_file parameter
which is known to be correct: state_file.parent.parent gives us the
project root from `<root>/.desloppify/state-<lang>.json`.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
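The path derivation described in this commit amounts to walking two levels up from the state file. A minimal sketch, assuming the `<root>/.desloppify/state-<lang>.json` layout given in the message (the function name is hypothetical):

```python
from pathlib import Path


def project_root_from_state_file(state_file: Path) -> Path:
    """Hypothetical helper: <root>/.desloppify/state-<lang>.json -> <root>."""
    # parent        -> <root>/.desloppify
    # parent.parent -> <root>
    return state_file.parent.parent
```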
* fix: preserve plan_start_scores during force-rescan to protect manual clusters
_reset_cycle_for_force_rescan() was clearing plan_start_scores, which
made is_mid_cycle() return False. This caused auto_cluster_issues() to
run full cluster regeneration instead of early-returning, wiping manual
cluster items via issue ID reconciliation in scan_issue_reconcile.py.
The fix stops clearing plan_start_scores so is_mid_cycle() remains True
during force-rescan, preserving manual cluster data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: require explicit triage decisions for auto-clusters
Auto-clusters (auto/unused, auto/security, etc.) were silently left in
backlog because the triage prompt said "silence means leave in backlog"
and the output schema had no field for auto-cluster decisions. Now the
triager must make an explicit promote/skip/break_up decision for each
auto-cluster, and apply_triage_to_plan() processes those decisions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: require explicit backlog decisions for auto-clusters in staged triage
The staged triage pipeline previously treated auto-clusters as optional in
the reflect stage ("silence means it stays in backlog"). This change makes
auto-cluster decisions mandatory, matching the treatment review issues get
via the Coverage Ledger.
Changes:
- Reflect instructions: require a ## Backlog Decisions section listing every
auto-cluster with promote/skip/supersede (replaces "silence means leave")
- Organize instructions: clarify that ALL backlog decisions from reflect
must be executed, not just promotions
- Reflect validation: parse and persist BacklogDecision entries; warn (but
don't block) when auto-clusters exist without a Backlog Decisions section
- Organize validation: warn when reflect requested promotions that weren't
executed during organize
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: unified triage pipeline + step detail display improvements
Unified triage pipeline:
- Widen is_triage_finding to all defects (mechanical + review + concern)
- Sub-group auto-clusters by rule kind (auto/security-B602 instead of auto/security)
- Add MEDIUM+LOW bandit filter and skip_tests config option
- Auto-cluster statistical summaries in triage prompt (severity, confidence, samples)
- Cluster-level observe sampling (ClusterVerdict parsing)
- Blocking backlog decisions validation (every auto-cluster must have a decision)
- Threshold-based staleness (10% mechanical growth, any new review issue)
- Two-tier accounting: review issues get per-item ledger, mechanical via cluster decisions
- Auto-add manual cluster members to queue_order on add_to_cluster
Display improvements:
- cluster show: steps now show effort tag, wrapped detail (4 lines), short refs
- cluster show: members compact when steps exist (ID list, not full issue detail)
- cluster list --verbose: effort summary column (3T 1S), hide empty auto-clusters, drop noise columns
- next: cluster drill header shows step done markers and effort tags
- next: individual task shows full untruncated step detail matched via issue_refs
- next: focus mode shows cluster context + relevant step detail
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: lifecycle transition messages and agent directives
Add transition_messages config and directives CLI for phase-specific agent
instructions (model switching, constraints). Emit transition messages at
lifecycle phase changes across resolve, skip, reopen, review import, and
reconcile flows. Auto-focus cluster during mid-cluster execution so
desloppify next stays in context. Hermes reset includes cluster-aware
next-task instructions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add dev test-hermes command and bump skill doc version
desloppify dev test-hermes: smoke-test Hermes model switching by switching
to a random model and back. Skill doc version bumped to v6.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: handle missing git in review coordinator, remove unused import
Wrap git status call in try/except OSError so review coordinator doesn't
crash when git is unavailable. Remove unused triage_scoped_plan import
from stage_validation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
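The guard described in this commit could look roughly like the sketch below. The function name, its parameters, and the use of `--porcelain` are assumptions for illustration. Only the `try/except OSError` around the git status call comes from the commit message.

```python
import subprocess
from typing import Optional


def git_status_or_none(repo_root: str, git_executable: str = "git") -> Optional[str]:
    """Hypothetical helper: return `git status` output, or None if git cannot run."""
    try:
        result = subprocess.run(
            [git_executable, "status", "--porcelain"],
            cwd=repo_root,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # Raised (as FileNotFoundError, a subclass) when the executable is missing.
        return None
    return result.stdout if result.returncode == 0 else None
```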
* docs: update Hermes overlay for delegate_task, add directives docs, update website
Rewrite HERMES.md: delegate_task subagent pattern replaces worktree-based
parallel review. Add agent directives section to SKILL.md. Website:
initiative #2 now active with $1k bounty challenge details.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(r): correct shell quote escaping in lintr command (PR #424)
Co-Authored-By: Maximilian Scholz <dev.scholz@mailbox.org>
* feat(r): add Jarl as fast R linter with autofix (PR #425)
Co-Authored-By: Maximilian Scholz <dev.scholz@mailbox.org>
* fix: phpstan stderr/JSON parser fixes (PR #420)
Co-Authored-By: Nick Perkins <nick@nickperkins.au>
* fix(engine): prevent workflow::create-plan re-injection after resolution (PR #435)
Co-Authored-By: Charles Dunda <charles.dunda@perchwell.com>
* feat: add SCSS language plugin (PR #428)
Co-Authored-By: Klaus Agnoletti <github@agnoletti.dk>
* fix: Rust dep graph hangs from string-literal fake imports (PR #429)
Co-Authored-By: Riccardo Spagni <ric@spagni.net>
* fix: binding-aware unused import detection for JS/TS (PR #433)
Co-Authored-By: Tom <tswift1991@icloud.com>
* fix: project root detection, force-rescan plan wipe, and manual cluster visibility (PR #439)
* perf(scan): detector prefetch + cache for faster scans (PR #432)
Co-Authored-By: Tom <tswift1991@icloud.com>
* feat(frameworks): FrameworkSpec layer + Next.js spec (PR #414)
Co-Authored-By: Tom Swift <tswift1991@icloud.com>
* fix: allow scan when queue is fully drained regardless of lifecycle phase
Fixes #441
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: quote paths for Windows cmd /c and use utf-8 encoding in log recovery
Fixes #442
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: merge retry batch results with original run before coverage check
Fixes #443
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(docs): SKILL.md cleanup — remove unsupported frontmatter, fix file naming, generalize install
Fixes #444, #445, #446
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* cleanup: remove dead _strip_c_style_comments_preserve_lines shim from rust/tools.py
Follow-up to PR #440 (Rust inline-test filtering).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: move queue_total==0 check into score_display_mode (#441)
Move the empty-queue guard from scan_queue_preflight into
score_display_mode() so ALL callers (status, plan nudge, next flow)
benefit from the fix, not just scan preflight.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: extract anonymous functions in tree-sitter specs (R lang)
PR #449 added an R anonymous function query pattern that captures @fn
but the extractor requires @name, silently skipping all anonymous
function matches. Fix the extractor to synthesize an "<anonymous>" name
when @name is absent but @func is present.
Original R spec contributed by sims1253 in PR #449.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
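A rough sketch of the fallback described in this commit, with tree-sitter captures modelled as a plain dict from capture name to captured text. That representation and the function name are assumptions. The real extractor's data structures are not shown in the commit message.

```python
from typing import Dict, Optional


def function_name_from_captures(captures: Dict[str, str]) -> Optional[str]:
    """Hypothetical helper: pick a name for a matched function definition."""
    if "name" in captures:
        return captures["name"]
    if "func" in captures:
        # No @name capture: synthesize a placeholder instead of skipping the match.
        return "<anonymous>"
    return None
```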
* fix(docs): sync .agents SKILL.md with docs copy, add pip fallback and batch naming note
- Remove `allowed-tools` frontmatter from .agents/skills/desloppify/SKILL.md (#444)
- Add `pip install` fallback note alongside uvx in both copies (#446)
- Add batch output naming clarification (batch-N.raw.txt vs .json imports) (#445)
- Sync agent directives section and version bump to .agents copy
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: collapse cmd /c arguments into single string for proper Windows quoting
The previous fix pre-quoted the executable path, but the actual breakage was
in argument paths (-C repo_root, -o output_file) containing spaces. Pre-embedding
quotes in a subprocess list causes double-quoting because Popen's list2cmdline()
adds its own quotes.
The real issue: cmd /c concatenates everything after /c and re-parses it with its
own tokeniser. The fix introduces _wrap_cmd_c() which uses subprocess.list2cmdline()
to build the inner command as a single properly-quoted string, then passes that as
one token after /c: ["cmd", "/c", "codex exec -C \"path with spaces\" ..."].
- Revert incorrect executable pre-quoting in _resolve_executable
- Add _wrap_cmd_c() to properly collapse cmd /c commands
- Apply _wrap_cmd_c in codex_batch_command after building the full arg list
- Keep correct encoding="utf-8", errors="replace" fix in io.py
- Add tests for _wrap_cmd_c and Windows codex_batch_command path quoting
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
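The quoting approach described in this commit can be sketched as below. `subprocess.list2cmdline()` is the standard-library call named in the message. The body of the helper is an assumption about how `_wrap_cmd_c()` might work, written here under the public name `wrap_cmd_c`.

```python
import subprocess
from typing import List, Sequence


def wrap_cmd_c(argv: Sequence[str]) -> List[str]:
    """Sketch: collapse everything after `cmd /c` into one quoted string."""
    is_cmd_c = (
        len(argv) > 2
        and argv[0].lower() in ("cmd", "cmd.exe")
        and argv[1].lower() == "/c"
    )
    if not is_cmd_c:
        return list(argv)
    # list2cmdline quotes arguments containing spaces exactly once.
    inner = subprocess.list2cmdline(list(argv[2:]))
    return [argv[0], argv[1], inner]
```

Passing the inner command as a single token avoids pre-embedding quotes in individual list items, which is what the commit identifies as the cause of double-quoting.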
* fix: skip coverage gate on partial batch retry instead of merging results
Replace the 195-line merge approach (find_prior_run_merged_results +
overlay_retry_results_on_prior) with a ~5-line bypass: when --only-batches
selects a subset of the packet's batches, set allow_partial=True so the
coverage gate does not reject the partial retry.
The merge approach had multiple issues: wrong prior-run selection after
failed retry chains, dimension name normalization mismatches, and stale
metadata in combined output. The simpler fix recognizes that a partial
retry inherently cannot cover all dimensions, and the original run already
handled the rest.
Fixes #443
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
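The bypass described in this commit might be expressed along these lines. The function name and parameters are hypothetical. Only the rule itself (a subset selected via --only-batches implies allow_partial) comes from the commit message.

```python
from typing import Iterable, Optional


def effective_allow_partial(
    allow_partial: bool,
    only_batches: Optional[Iterable[int]],
    total_batches: int,
) -> bool:
    """Hypothetical helper: a partial retry should not trip the coverage gate."""
    if allow_partial or only_batches is None:
        return allow_partial
    selected = set(only_batches)
    # A strict subset of the packet's batches cannot cover every dimension.
    return 0 < len(selected) < total_batches
```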
* bump version to 0.9.10
* bump version to 0.9.10
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* bump version to 0.9.10
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: gitignore .agents/ and untrack generated skill doc
The .agents/skills/desloppify/SKILL.md is a generated file (same as
.claude/skills/). Canonical copies live under docs/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add hermes and droid to update-skill help text
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: draft 0.9.10 release notes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(ruby): improve plugin — excludes, detect markers, default_src, spec/ support, README, tests (#462)
* feat(ruby): improve plugin — excludes, detect markers, default_src, README, tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Gemini <gemini@google.com>
Co-Authored-By: OpenAI Codex <codex@openai.com>
* feat(ruby): add spec/ test dir, bin/ exclusion; expose external_test_dirs in generic_lang
- Add external_test_dirs and test_file_extensions parameters to generic_lang()
so plugins can override the hardcoded ["tests", "test"] defaults
- Configure Ruby plugin with external_test_dirs=["spec", "test"] (RSpec + Minitest)
- Add bin/ to Ruby exclusions (binstubs/shims)
- Update tests: add bin/ to excluded dirs list, add test_external_test_dirs_includes_spec
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Gemini <gemini@google.com>
Co-Authored-By: OpenAI Codex <codex@openai.com>
* docs(ruby): add bin/ to exclusions list in README
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Gemini <gemini@google.com>
Co-Authored-By: OpenAI Codex <codex@openai.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Gemini <gemini@google.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
* feat: add Factory Droid skill harness support (#451)
- Add 'droid' to SKILL_TARGETS (.factory/skills/desloppify/SKILL.md)
- Add .factory/skills/ to SKILL_SEARCH_PATHS for auto-discovery
- Create docs/DROID.md overlay with review and triage workflow
- Bump SKILL_VERSION to 6
- Add droid to README agent prompt harness list
* docs(python): add user-facing section to README (#459)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(javascript): add plugin tests and documentation (#458)
Co-authored-by: Gemini <gemini@google.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(docs): correct autofix command in Ruby and JS plugin READMEs
The command is `desloppify autofix`, not `desloppify fix` or
`desloppify scan --fix`.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update release notes with late-merged PRs and stats
Add #458, #459, #462 contributions from klausagnoletti.
Update stats to reflect final commit/file/test counts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(scss): replace {file_path} placeholders with glob patterns and use unix formatter
The tool runner does not substitute {file_path} placeholders, so
stylelint was receiving literal "{file_path}" and failing silently.
Switch to glob patterns (matching every other plugin) and use
--formatter unix with the gnu parser, since stylelint's JSON output
doesn't match the expected json parser format.
Based on findings from @klausagnoletti in PR #452.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Maximilian Scholz <dev.scholz@mailbox.org>
Co-authored-by: Nick Perkins <nick@nickperkins.au>
Co-authored-by: Charles Dunda <charles.dunda@perchwell.com>
Co-authored-by: Klaus Agnoletti <github@agnoletti.dk>
Co-authored-by: Riccardo Spagni <ric@spagni.net>
Co-authored-by: Tom <tswift1991@icloud.com>
Co-authored-by: Klaus Agnoletti <24544601+klausagnoletti@users.noreply.github.com>
Co-authored-by: Gemini <gemini@google.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
* chore: remove release notes file after publishing to GitHub Releases
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: stop re-injecting phantom issue IDs from action_step refs into cluster membership
Step refs are traceability metadata, not membership. Merging them into
issue_ids caused bare shorthand IDs from the triage runner to become
phantom cluster members that don't exist in work_items and reappear
after every reconcile → load cycle.
Membership recovery is already handled by execution log (recovered_members)
and overrides (override_members).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(update-skill): detect duplicate content when begin/end markers are missing
Raises CommandError if the file already has desloppify skill content
(version marker present) but is missing the begin/end markers, preventing
silent duplicate appends.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(r): add R-specific code smell detectors
Add 10 R-specific smell checks: setwd(), <<- global assign, attach(),
rm(list=ls()), browser()/debug() leftovers, T/F ambiguity, 1:n()
off-by-one, options(stringsAsFactors), and library() inside functions.
Also adds custom_phases support to GenericLangOptions so generic plugins
can inject language-specific phases without converting to full plugins.
* fix(r): address PR review comments for code smell detectors
- Fix _strip_r_comments to properly preserve string literals by using
placeholder substitution before stripping comments
- Fix _detect_library_in_function to only track function-scoped braces
by finding function definitions and matching their brace pairs,
eliminating false positives from if/for/while blocks at top level
- Replace custom _find_r_files with framework's find_source_files to
respect project-configured exclusion patterns
- Add tests for the fixes: hash in strings, library in non-function
blocks, nested functions
* refactor(r): use tree-sitter for library_in_function detection
Replace manual brace tracking with tree-sitter AST parsing for more
accurate detection of library()/require() calls inside function bodies.
Includes fallback to regex-based detection when tree-sitter is unavailable.
Benefits:
- Properly handles nested functions, strings, and edge cases
- Uses existing R_SPEC tree-sitter configuration
- Deduplicates matches in nested function scenarios
* fix: handle generic fixers that return entries without 'removed' key
Generic fixers (e.g., eslint-warning) return FixResult entries with
{file, line} or {file, fixed} — no "removed" key. Four call sites in
the autofix pipeline assumed "removed" was always present, causing
KeyError for any generic fixer invocation.
Guard all access sites with .get("removed", []) and add regression tests
for the generic fixer result shape.
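
A minimal sketch of the guard described above, assuming fixer results are plain dicts. The helper name `count_removed` and the sample entries are hypothetical and only illustrate the `.get("removed", [])` pattern:

```python
from typing import Any


def count_removed(entries: list[dict[str, Any]]) -> int:
    """Count removed items across fixer results, tolerating a missing 'removed' key."""
    total = 0
    for entry in entries:
        # Generic fixers may return {file, line} or {file, fixed} with no "removed" key,
        # so fall back to an empty list instead of indexing directly.
        total += len(entry.get("removed", []))
    return total


results = [
    {"file": "a.py", "removed": ["os", "sys"]},  # fixer that reports removals
    {"file": "b.js", "line": 12},                # generic fixer shape
    {"file": "c.js", "fixed": True},             # generic fixer shape
]
removed_total = count_removed(results)
```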
Cherry-picked from PR #484 by @AugusteBalas
Co-Authored-By: AugusteBalas <AugusteBalas@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: handle dataclass objects in json_default serializer
EcosystemFrameworkDetection dataclass instances (containing Path fields)
can leak into review_cache via shared dict references, causing TypeError
on state serialization. Add a dataclass handler to json_default that
converts via dataclasses.asdict(), letting json.dumps recurse naturally
and hit the existing Path handler for nested fields.
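
The handler could look roughly like the sketch below. `FrameworkDetection` is a hypothetical stand-in for the real dataclass, and this `json_default` is an illustration of the approach, not the project's actual function:

```python
import dataclasses
import json
from pathlib import Path
from typing import Any


@dataclasses.dataclass
class FrameworkDetection:  # hypothetical stand-in for EcosystemFrameworkDetection
    name: str
    root: Path


def json_default(obj: Any) -> Any:
    """Fallback serializer passed to json.dumps(default=...)."""
    # Dataclass instances become plain dicts; nested Path values are left as-is
    # and reach the Path branch below when json.dumps recurses into the dict.
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return dataclasses.asdict(obj)
    if isinstance(obj, Path):
        return str(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


encoded = json.dumps({"framework": FrameworkDetection("nextjs", Path("app"))}, default=json_default)
```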
Bug identified by @0-CYBERDYNE-SYSTEMS-0 in PR #486
Reported-by: 0-CYBERDYNE-SYSTEMS-0 <0-CYBERDYNE-SYSTEMS-0@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: filter synthetic IDs from deferred skip counting
Synthetic queue IDs (workflow::*, triage::*) could end up in the plan's
skipped dict via migrate_deferred_to_skipped() or skip_items(), causing
phantom deferred-disposition loop items. Filter them using the existing
is_synthetic_id() pattern already used in 6+ other locations.
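
As a rough illustration, assuming synthetic IDs are recognized by prefix only. The prefix tuple and `count_deferred_skips` are hypothetical; only the `is_synthetic_id()` name comes from the message above:

```python
SYNTHETIC_PREFIXES = ("workflow::", "triage::")  # assumed from the examples above


def is_synthetic_id(item_id: str) -> bool:
    """Return True for queue-only IDs that do not correspond to real issues."""
    return item_id.startswith(SYNTHETIC_PREFIXES)


def count_deferred_skips(skipped: dict[str, dict]) -> int:
    """Count skipped entries, ignoring synthetic queue IDs."""
    return sum(1 for item_id in skipped if not is_synthetic_id(item_id))


skipped = {
    "unused::src/app.py::os": {"kind": "temporary"},
    "workflow::run-scan": {"kind": "temporary"},
    "triage::observe": {"kind": "temporary"},
}
real_skips = count_deferred_skips(skipped)
```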
Cherry-picked from PR #485 by @ryexLLC (synthetic loop fix only;
dataclass serialization handled separately)
Co-Authored-By: ryexLLC <ryexLLC@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: prevent catastrophic backtracking in dart function extractor regex
The annotation sub-pattern had overlapping whitespace consumption between
the character class (includes \s) and trailing \s*, wrapped in ()*,
causing exponential backtracking on inputs with multiple @-prefixed
tokens. Possessive quantifiers (++ and *+) prevent the engine from
backtracking into already-matched portions. Also fixes a latent
correctness bug where annotated functions produced garbage names.
Cherry-picked from PR #477 by @AvoMandjian
Co-Authored-By: AvoMandjian <AvoMandjian@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: keep framework caches out of persisted review state
Framework detection writes dataclass objects (EcosystemFrameworkDetection,
NextjsFrameworkInfo) into review_cache, which shares a dict reference with
state["review_cache"]. This caused TypeError on JSON serialization.
The root fix: introduce a separate runtime_cache field for ephemeral
per-scan memoization that is never persisted. Framework caching in
detection.py and nextjs.py now uses runtime_cache. This cleanly separates
scan-scoped data from persisted review state.
Cherry-picked from PR #483 by @maciej-trebacz
Co-Authored-By: Maciej Trębacz <maciej-trebacz@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: auto-resolve stale issues when zone policy reclassifies files
When zone rules change (e.g. adding JS test patterns), files may be
reclassified to zones where certain detectors are skipped by policy.
Previously, existing open issues for those files would persist forever
since verify_disappeared only auto-resolved when the source file was
deleted. Now checks ZONE_POLICIES — if a file's zone says to skip the
detector, the issue is auto-resolved.
Also adds JS zone rules for .test./.spec./__tests__/__mocks__/ patterns.
Uses the existing should_skip_issue() from zones.py rather than
hardcoding detector names — works for all detector/zone combinations.
Bug identified by @claytona500 in PR #478; JS zone rules also from that PR
Reported-by: claytona500 <claytona500@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: rename session_token placeholder to avoid Snyk W007 false positive
The placeholder <session_token_from_template> in the review JSON example
triggers Snyk's credential-detection heuristic (W007). Renamed to
<session_hmac_from_template> which is more accurate (it's a per-session
HMAC, not a secret credential) and doesn't match scanner patterns.
Closes #473 (reported by @mark-major)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: use certifi CA bundle for update-skill SSL on macOS
Bare urllib.request.urlopen uses the system cert store, which on macOS
with Homebrew Python often has no CA certificates installed. Now uses
certifi's CA bundle if available, with a helpful error message suggesting
`pip install certifi` if SSL verification still fails.
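
A sketch of the fallback, assuming certifi is an optional dependency. The function names are hypothetical; `ssl.create_default_context(cafile=...)` and `certifi.where()` are the standard calls for this:

```python
import ssl
import urllib.request


def build_ssl_context() -> ssl.SSLContext:
    """Prefer certifi's CA bundle when installed, else use the system store."""
    try:
        import certifi  # optional dependency
    except ImportError:
        return ssl.create_default_context()
    return ssl.create_default_context(cafile=certifi.where())


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download a URL using the context built above."""
    with urllib.request.urlopen(url, timeout=timeout, context=build_ssl_context()) as response:
        return response.read()
```

A caller would still need to catch `ssl.SSLError` or `urllib.error.URLError` to print the install hint.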
Closes #468 (reported by @Vuk97)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add Windows font fallbacks for scorecard rendering
The scorecard image generator only had macOS and Linux font paths,
so Windows users fell through to Pillow's load_default() bitmap font
which renders tiny and different-looking. Adds Consolas, Georgia,
Segoe UI, and Arial as Windows fallbacks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add README explaining work queue ordering and pre/post-triage modes
Documents how items get from scan to execution queue, why test_coverage
dominates pre-triage, that tier is display-only metadata, and the full
sort order. Prompted by user feedback that the tool appears obsessed
with test writing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add high-level process overview to README and fix work queue docs
Main README now has a "How it works" section explaining the
scan → score → review → triage → execute → rescan loop, and why
triage matters (pre-triage queue is sorted by raw impact which
can be noisy).
Work queue README corrected to include lifecycle phase gating
(PHASE_REVIEW_INITIAL gates objective items behind initial review).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: document exact filter chain for default queue items
Lists the 6 filters that determine what appears in `next` pre-triage:
open status, not suppressed, above confidence threshold, in scan scope,
mechanical_defect kind, not skipped. With file:line references.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: bias triage away from test-writing busy work
Multiple users report the tool is "obsessed with writing tests" instead
of cleaning up actual code issues. The root cause: triage LLMs promote
test_coverage clusters because the prompt shows them first (sorted by
issue count) with no guidance to defer.
Changes:
- Triage prompt now explicitly says: clean up code quality BEFORE test
coverage. Writing tests for sloppy code locks in the slop.
- Added "defer" action for auto-clusters (keeps in backlog for later)
- Example shows test_coverage as "defer" not "break_up"
- Scan coaching changed from "add tests" to "review gaps (fix code first)"
- Catalog guidance changed from "add tests for untested modules" to
"review coverage gaps — defer test writing until code quality resolved"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: skip cmd /c wrapping for .exe binaries on Windows
On Windows, _resolve_executable() was unconditionally routing through
cmd /c, even for .exe binaries. When prompts contain spaces, the double
list2cmdline interaction (inner collapse by _wrap_cmd_c + outer by
subprocess.Popen) produces \" escapes that cmd.exe doesn't understand,
causing "unexpected argument" errors.
Now only uses cmd /c for .cmd/.bat shims and unresolved fallback.
.exe binaries are invoked directly.
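
The decision could be sketched as follows. `build_argv` is a hypothetical helper, not the real `_resolve_executable()`, and it assumes the executable has already been looked up (for example with `shutil.which`):

```python
from pathlib import PureWindowsPath
from typing import Optional

_SHIM_SUFFIXES = {".cmd", ".bat"}


def build_argv(resolved: Optional[str], name: str, args: list[str]) -> list[str]:
    """Build a Windows argv, wrapping in cmd /c only when a shell is required."""
    if resolved is None:
        # Unresolved fallback: let cmd.exe search for the command.
        return ["cmd", "/c", name, *args]
    if PureWindowsPath(resolved).suffix.lower() in _SHIM_SUFFIXES:
        # .cmd/.bat shims cannot be executed directly by CreateProcess.
        return ["cmd", "/c", resolved, *args]
    # Real binaries (.exe) are invoked directly, avoiding double quoting.
    return [resolved, *args]


direct = build_argv(r"C:\tools\codex.exe", "codex", ["hello world"])
shim = build_argv(r"C:\npm\codex.CMD", "codex", ["hello world"])
fallback = build_argv(None, "codex", ["hello world"])
```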
Closes #487 (reported by @Dteyn)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add progression log — append-only lifecycle event timeline
Introduces `.desloppify/progression.jsonl`, an append-only JSONL log
recording lifecycle boundary events (scan completions, review imports,
triage completions, queue drains, phase transitions). Each line is a
self-contained JSON object with discriminated `event_type` + `payload`,
timestamps, scan_count, and phase_before/phase_after for full timeline
reconstruction.
Event types: scan_preflight, scan_complete, postflight_scan_completed,
subjective_review_completed, triage_complete, entered_planning_mode,
execution_drain.
Key design decisions:
- Events fire on idempotent marker flips, not inferred from reconcile
- Timestamps serve as join keys into state.json and plan execution_log
for full detail — the log is a timeline index, not a data copy
- All hooks are best-effort (try/except, never break parent command)
- Advisory file locking with 2s timeout, periodic trim at 2000 lines
- prev_last_scan captured before merge_scan() to correctly anchor
execution summaries to the previous scan boundary
Also includes queue policy, auto-cluster, and next-command improvements
that were pending on this branch.
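
A simplified sketch of a best-effort append-only JSONL writer in this style. It deliberately omits the file locking, trimming, and phase fields described above, and `append_event` and `read_events` are hypothetical names:

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def append_event(log_path: Path, event_type: str, payload: dict[str, Any], scan_count: int) -> None:
    """Append one self-contained JSON object per line; never raise on I/O failure."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type,
        "scan_count": scan_count,
        "payload": payload,
    }
    try:
        log_path.parent.mkdir(parents=True, exist_ok=True)
        with log_path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")
    except OSError:
        pass  # best-effort: logging must not break the parent command


def read_events(log_path: Path) -> list[dict[str, Any]]:
    """Read the timeline back, skipping blank lines."""
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    demo_log = Path(tmp) / ".desloppify" / "progression.jsonl"
    append_event(demo_log, "scan_complete", {"open_issues": 3}, scan_count=1)
    append_event(demo_log, "triage_complete", {}, scan_count=1)
    event_types = [event["event_type"] for event in read_events(demo_log)]
```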
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version to 0.9.11
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: update test paths after docs → dev reorganization
ci_plan.md and DEVELOPMENT_PHILOSOPHY.md moved to dev/ but two contract
tests still referenced docs/. Also adds release notes draft for v0.9.11.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version to 0.9.12
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: move docs/ and website/ into dev/
Internal documentation (DEVELOPMENT_PHILOSOPHY, QUEUE_LIFECYCLE, ci_plan)
and release infrastructure (checklist, template, examples) moved to dev/.
Website separated as its own repo. Old commit summaries removed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: track dev/review/ — review pipeline prompts, schema, and results
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add `desloppify setup` for universal global skill install
Bundles skill definitions via importlib.resources so `pip install desloppify &&
desloppify setup` installs Claude Code and Cursor skills globally (~/.claude/,
~/.cursor/) without network access. Also supports `--local` for project-level
AGENTS.md, extends skill discovery to detect global installs during scan, and
fixes pyproject.toml license field for PEP 621 compliance.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: trim overzealous scope from setup command
Remove --local mode, global skill discovery integration, and sync guard
test. The setup command now does one thing: copy bundled skills to
~/.claude/ and ~/.cursor/. Per-project installs stay with update-skill.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add explicit "run next after scan" instruction to skill doc
Agents were interpreting scan output themselves instead of running
`desloppify next`. Added a clear directive between scan and the rest
of the workflow. Also updated install strings to include `desloppify setup`.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: tighten skill description to reduce false activations
Removed loose keywords (code quality, naming issues, large files, etc.)
that triggered the skill on generic programming questions. Added explicit
negative guidance.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: force-rescan now injects stale subjective reviews into queue
force-rescan preserved plan_start_scores to protect manual clusters,
but this caused cycle_just_completed to be False, perpetually deferring
stale reviews behind objective backlog. Additionally, the workflow
supersession check skipped sync_subjective_dimensions entirely.
Fix:
- Thread force_rescan param through reconcile_plan to override
cycle_just_completed, bypassing deferral logic
- Disable workflow supersession bypass when force_rescan is active
- Add _refresh_plan_start_baseline() that reseeds scores and
scan_count_at_plan_start without clearing workflow sentinels
- 4 new tests covering stale injection, sentinel preservation,
baseline reseeding, and end-to-end with objective backlog
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: mark clusters done when all items are resolved
Clusters stayed execution_status="active" even when all their items
were fixed, causing completed work to reappear in the queue after
rescan.
- Add EXECUTION_STATUS_DONE="done" to cluster_semantics
- living_plan.py: set execution_status to "done" when cluster_done
is logged via plan resolve
- scan_issue_reconcile.py: add _reconcile_active_clusters_by_item_status()
sweep that marks active clusters done when all items are resolved
- Also fix _complete_empty_manual_clusters() which had the same bug
- 2 new tests for cluster completion reconciliation
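
The sweep might be sketched like this, assuming clusters and work items are plain dicts and that fixed/resolved/wontfix count as resolved. `mark_completed_clusters` is a hypothetical name, not the function added in this commit:

```python
from typing import Any

EXECUTION_STATUS_ACTIVE = "active"
EXECUTION_STATUS_DONE = "done"
RESOLVED_STATUSES = {"fixed", "resolved", "wontfix"}  # assumed set of terminal statuses


def mark_completed_clusters(
    clusters: dict[str, dict[str, Any]],
    work_items: dict[str, dict[str, Any]],
) -> list[str]:
    """Flip active clusters to done when every member item is resolved."""
    completed = []
    for name, cluster in clusters.items():
        if cluster.get("execution_status") != EXECUTION_STATUS_ACTIVE:
            continue
        issue_ids = cluster.get("issue_ids", [])
        if not issue_ids:
            continue  # empty clusters are handled elsewhere in this sketch's model
        if all(work_items.get(i, {}).get("status") in RESOLVED_STATUSES for i in issue_ids):
            cluster["execution_status"] = EXECUTION_STATUS_DONE
            completed.append(name)
    return completed


clusters = {
    "cleanup": {"execution_status": "active", "issue_ids": ["a", "b"]},
    "naming": {"execution_status": "active", "issue_ids": ["c"]},
}
work_items = {"a": {"status": "fixed"}, "b": {"status": "wontfix"}, "c": {"status": "open"}}
done_now = mark_completed_clusters(clusters, work_items)
```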
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: don't supersede resolved items from plan clusters
_supersede_dead_references conflated "not actionable" with "gone from
state." Items with status fixed/resolved/wontfix were being superseded
and stripped from clusters, causing completed clusters to appear
incomplete after rescan.
Root fix: only supersede items that don't exist in state at all
(issue is None). Resolved items stay in their clusters so
_reconcile_active_clusters_by_item_status can detect completion.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: force-rescan bypasses queue-empty guard for reconciliation
reconcile_plan() was guarded by live_planned_queue_empty() at two
levels, preventing stale subjective reviews from ever being injected
when ANY objective items (like test coverage gaps) remained in the
queue. force_rescan=True now bypasses both guards so stale dimensions
can be detected and injected regardless of objective backlog.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: strategist CEO role — trend validation, confirmation gate, strategic issues
The strategize triage step now functions as a strategic overseer:
1. **Trend validation**: score_trend and debt_trend are cross-checked
against computed data from score trajectory. Mismatches trigger a
warning and the reported value is overridden with the computed one.
2. **Confirmation gate**: strategize can now be explicitly confirmed
via --confirm strategize with 80+ char attestation (like other
stages). Auto-confirm preserved for backward compat but human
review is now possible.
3. **Strategic issues**: strategist can create high-priority work items
via strategic_issues output field. These become strategy:: prefixed
work items inserted at the front of queue_order. Downstream stages
reference them as strategic priorities.
strategy:: added to SYNTHETIC_PREFIXES so strategic issues don't
block reconciliation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: strategist saves state for strategic issues + detects cross-cycle regression
Two root causes found and fixed:
1. Strategic issues were created in memory but never persisted — triage
services had no save_state method. Now calls save_state_or_exit()
after creating work items.
2. Score trend said "improving" for a plateau because score_trajectory()
only saw the sliding window (+1.9 within window) without knowing the
score was 79.7 before a cycle reset. Now accepts cycle_start_score
from plan_start_scores/previous_plan_start_scores and downgrades
"improving" to "stable" when current score is still below the cycle
baseline.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: bulletproof strategist — all-time high, recovering trend, save_state
1. ScoreTrajectory now tracks all_time_high from full scan_history,
not just the 5-scan window. Trend overridden to "recovering" when
current score is >2pts below all-time high despite positive window delta.
2. "recovering" added as 4th trend value (improving/stable/declining/recovering).
Accepted by _parse_briefing validation and documented in strategist prompt.
3. save_state added to TriageServices as first-class method. Strategize
uses it to persist strategic work items. Fallback to direct import
for backward compat.
4. _seed_plan_start_scores and _refresh_plan_start_baseline now preserve
current plan_start_scores as previous_plan_start_scores before
overwriting (only when previous is empty).
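
A rough sketch of the trend rule from item 1, using the 5-scan window and 2-point gap stated above. `classify_trend` is hypothetical, and treating any zero delta as "stable" is an assumption made for illustration:

```python
def classify_trend(scan_history: list[float], window: int = 5, recovery_gap: float = 2.0) -> str:
    """Classify the score trend, downgrading window gains that sit below the all-time high."""
    if not scan_history:
        return "stable"
    recent = scan_history[-window:]
    current = recent[-1]
    delta = current - recent[0]
    all_time_high = max(scan_history)  # computed from the full history, not the window
    if delta > 0 and all_time_high - current > recovery_gap:
        return "recovering"
    if delta > 0:
        return "improving"
    if delta < 0:
        return "declining"
    return "stable"


# The score was 79.7 before a reset; the window alone shows +1.9.
trend = classify_trend([79.7, 70.0, 70.5, 71.0, 71.5, 71.9])
```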
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: force UTF-8 encoding in review runner log and payload reads
Adds explicit encoding="utf-8", errors="replace" to all file reads in
the review runner pipeline (runner_failures.py, runner_parallel/__init__.py).
Prevents charmap decode errors on Windows where Python defaults to the
platform encoding but Codex runners emit UTF-8.
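
For illustration, a read helper with the same arguments. `read_log_text` is a hypothetical name, and the sample bytes include one invalid byte to show the replacement behavior:

```python
import tempfile
from pathlib import Path


def read_log_text(path: Path) -> str:
    """Read runner output as UTF-8 regardless of the platform default encoding."""
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return path.read_text(encoding="utf-8", errors="replace")


with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "runner.log"
    log_file.write_bytes(b"ok \xff caf\xc3\xa9")  # one invalid byte, then valid UTF-8
    text = read_log_text(log_file)
```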
Subprocess calls in attempts.py left unchanged — they go through the
deps injection seam and the runner process is already UTF-8.
Cherry-picked from PR #495 by @pietrondo (file-read changes only)
Co-Authored-By: pietrondo <pietrondo@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: support src-layout Python projects in test_coverage detector
resolve_import_spec now tries src/-prefixed candidates when direct match
fails, and _build_prod_by_module strips the src/ prefix from relative
paths before computing module names. Both changes are needed so that
src-layout projects (typically packaged via pyproject.toml, PEP 621) correctly map tests to production files.
Adjustments: moved _SRC_PREFIXES to module-level constant (was function-local)
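
The two halves of the fix can be sketched as below. Only `_SRC_PREFIXES` is named in the commit; `module_name_for` and `resolve_module` are hypothetical simplifications of `_build_prod_by_module` and `resolve_import_spec`, and the prefix value is assumed:

```python
from pathlib import PurePosixPath
from typing import Optional

_SRC_PREFIXES = ("src/",)  # assumed value of the module-level constant


def module_name_for(rel_path: str) -> str:
    """Map a relative file path to a dotted module name, ignoring a leading src/."""
    for prefix in _SRC_PREFIXES:
        if rel_path.startswith(prefix):
            rel_path = rel_path[len(prefix):]
            break
    parts = list(PurePosixPath(rel_path).with_suffix("").parts)
    if parts and parts[-1] == "__init__":
        parts.pop()  # a package's __init__.py maps to the package name
    return ".".join(parts)


def resolve_module(module: str, production_files: set[str]) -> Optional[str]:
    """Try direct candidates first, then src/-prefixed ones."""
    base = module.replace(".", "/")
    direct = [f"{base}.py", f"{base}/__init__.py"]
    prefixed = [prefix + candidate for prefix in _SRC_PREFIXES for candidate in direct]
    for candidate in direct + prefixed:
        if candidate in production_files:
            return candidate
    return None


prod = {"src/argos_toolkit/foo.py", "src/argos_toolkit/__init__.py"}
resolved = resolve_module("argos_toolkit.foo", prod)
```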
Cherry-picked from PR #489 by @AreboursTLS
Co-Authored-By: AreboursTLS <AreboursTLS@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: prevent Knip hang when not installed as project dependency
Add stdin=subprocess.DEVNULL to the subprocess.run call in knip_adapter.py
to prevent npx from blocking on interactive prompts. Add --yes flag to
npx args as belt-and-suspenders. Add pre-check for knip in node_modules
to fail fast when knip is not a local dependency.
Updated existing tests to create node_modules/.bin/knip marker files
where needed.
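
A sketch of the three safeguards together. `knip_is_local` and `run_knip` are hypothetical names, and the exact knip arguments are an assumption rather than what knip_adapter.py passes:

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Optional


def knip_is_local(project_root: Path) -> bool:
    """Fail-fast pre-check: is knip installed as a project dependency?"""
    return (project_root / "node_modules" / ".bin" / "knip").exists()


def run_knip(project_root: Path, timeout: float = 120.0) -> Optional[subprocess.CompletedProcess]:
    """Run knip through npx without ever waiting on interactive input."""
    if not knip_is_local(project_root):
        return None
    return subprocess.run(
        ["npx", "--yes", "knip", "--reporter", "json"],  # --yes skips the install prompt
        cwd=project_root,
        stdin=subprocess.DEVNULL,  # nothing to read, so a prompt cannot block
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )


with tempfile.TemporaryDirectory() as tmp:
    skipped = run_knip(Path(tmp))  # no node_modules, so the pre-check short-circuits
```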
Closes #494 (reported by @goobsnake)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: stale subjective reviews always sync regardless of queue state
The live_planned_queue_empty guard was blocking ALL reconciliation
(including stale review injection) when mechanical items remained
in the queue. Stale reviews should coexist with mechanical items,
not be blocked by them.
Move the guard AFTER subjective sync so stale reviews are always
detected and injected. Auto-clustering and workflow sync remain
gated by queue emptiness.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: stale subjective reviews always sync regardless of queue state
Move the live_planned_queue_empty guard AFTER subjective sync so stale
reviews are always detected and injected. Auto-clustering and workflow
sync remain gated by queue emptiness. Remove will_inject_workflow gate
and cycle_just_completed coupling from subjective sync path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: phase ordering — stale subjective reviews take priority over queued workflows/triage
Pre-review workflow IDs (deferred disposition, run scan, import scores) still
jump ahead of everything, but non-critical workflows (communicate score) and
triage items now yield to stale subjective reviews that need refresh.
Adds PRE_REVIEW_WORKFLOW_IDS constant and 3 tests covering the priority rules.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: review pipeline results for PRs #495, #493, #489, #189 and issues #494-#490
Stage 1 assessments, Stage 2 challenges/advocacy, and Stage 3 execution
for the current batch of open PRs and issues. Backfilled Stage 2 files
for older items that only had Stage 1 results.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: disable subjective anti-gaming integrity policy
Blind-packet subagent reviews cannot anchor to the target score,
making false positives (legitimate score convergence) more likely
than actual gaming. The policy was zeroing 4 dimensions that
independently scored 85.0 — a 21-point strict score drop.
- _apply_subjective_integrity_policy now returns assessments unchanged
- Removed target_match_reset enforcement from scoring engine
- Updated 7 tests to expect status="disabled"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: always supplement graph imports with source parsing for test coverage
The import graph often resolves Python submodule imports (e.g.,
from megaplan.evaluation import X) to the package __init__.py
rather than the actual submodule file. This caused false
"transitive_only" reports for modules with dedicated test files.
Previously, source parsing was only used as a fallback when the
graph had no entries. Now it always runs as a supplement,
catching submodule imports the graph missed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: Python import regex handles parenthesized multi-line imports
PY_IMPORT_RE couldn't match `from megaplan.evaluation import (
build_evaluation, ...)` because the capture group (\w+) expected a word
immediately after 'import', not an opening paren. Added \(?\s* to optionally
match the paren and whitespace before the first imported name.
This was the root cause of false "transitive_only" test coverage
reports for Python modules with dedicated test files that use
multi-line import syntax.
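
A simplified pattern showing the effect of the optional paren. This is not the project's actual PY_IMPORT_RE; it is a reduced sketch that captures only the module and the first imported name:

```python
import re

# Hypothetical, reduced version of the import pattern.
IMPORT_RE = re.compile(
    r"^\s*from\s+([\w.]+)\s+import\s+\(?\s*(\w+)",
    re.MULTILINE,
)

source = (
    "from megaplan.evaluation import (\n"
    "    build_evaluation,\n"
    "    score,\n"
    ")\n"
    "from os import path\n"
)

# \(?\s* consumes the opening paren plus the newline and indentation,
# so the first imported name is found on the following line.
matches = IMPORT_RE.findall(source)
```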
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version to 0.9.13
* fix: add lower-bound guard in fix_debug_logs to prevent negative-index file corruption
When entry["line"] is 0, start becomes -1, which passes the upper-bound
guard and causes Python's negative indexing to silently operate on lines
at the end of the file. The other three fixers already have this guard.
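
The guard amounts to checking both bounds before indexing. `remove_line` below is a hypothetical reduction of the fixer, assuming 1-based line numbers in the entry:

```python
def remove_line(lines: list[str], line_number: int) -> list[str]:
    """Remove a 1-based line, leaving the input untouched when the index is out of range."""
    start = line_number - 1
    # Without the lower bound, line_number == 0 gives start == -1, which Python
    # treats as "last line" and silently edits the end of the file.
    if start < 0 or start >= len(lines):
        return lines
    return lines[:start] + lines[start + 1:]


original = ["import os", "print('debug')", "run()"]
unchanged = remove_line(original, 0)
trimmed = remove_line(original, 2)
```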
Cherry-picked from PR #499 by @FloodExLLC
Co-Authored-By: FloodExLLC <FloodExLLC@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: scope .gitignore CLAUDE.md/AGENTS.md rules to root-level only
The blanket `CLAUDE.md` rule on line 43 matched any file named CLAUDE.md
anywhere in the repo, preventing desloppify/data/global/CLAUDE.md from
being tracked. This caused 4 CI failures because the bundled package
data file was absent in fresh clones while existing in local working
trees (where tests passed).
Change both `CLAUDE.md` and `AGENTS.md` to `/CLAUDE.md` and `/AGENTS.md`
so they only match at the repo root.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: bundle all 11 skill overlays with sync guard and pre-commit hook
Previously data/global/ only had 7 of 11 overlays (AMP, CLAUDE, COPILOT,
DROID, OPENCODE were missing). Add the missing files copied from docs/.
Add defense-in-depth to prevent drift between docs/ and data/global/:
1. test_bundled_sync.py — pytest guard that fails if files diverge (CI)
2. .githooks/pre-commit — auto-syncs data/global/ when docs/*.md staged
3. make sync-docs — convenience target for manual sync
4. make install-hooks — installs the pre-commit hook, wired into
install-ci-tools and install-full-tools for automatic setup
5. make package-smoke — extended to verify wheel includes all bundled docs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: make global install the primary skill delivery mechanism
Expand desloppify setup from 2 targets (claude, cursor) to 5 verified
tools with global paths confirmed against official documentation:
- Claude Code: ~/.claude/skills/ (code.claude.com/docs/en/skills)
- Codex CLI: ~/.codex/AGENTS.md (developers.openai.com/codex/guides/agents-md)
- Gemini CLI: ~/.gemini/skills/ (geminicli.com/docs/cli/skills/)
- AMP: ~/.config/agents/skills/ (ampcode.com/news/agent-skills)
- OpenCode: ~/.config/opencode/skills/ (opencode.ai/docs/skills/)
Cursor removed from global targets — its global rules are UI-only,
not filesystem-based (cursor.com/docs/rules).
Key changes:
- GLOBAL_TARGETS is the single source of truth in skill_docs.py
(setup/cmd.py imports it, no duplication)
- Add skip-if-current: don't rewrite files already at current version
- Add shared-file handling: codex AGENTS.md uses section-replace
- Add global staleness detection: find_stale_global_installs(),
find_any_global_install(), updated check_skill_version()
- Kill silent auto-update in agent_context.py — warn-only now,
following pre-commit/husky best practice
- Staleness warnings recommend "desloppify setup" for global,
"desloppify update-skill" for per-project
- Fix codex hint in runner_failures.py to reference AGENTS.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: false positives in orphaned, hardcoded_secret_name, and cycles detectors
- orphaned: recognize __all__ exports as public API (skip from detection)
- hardcoded_secret_name: add entropy heuristic to filter field names, sentinels, label prefixes
- cycles: mark TYPE_CHECKING-guarded imports as deferred (excluded from cycle detection)
- assessments: isinstance guard before .get() on potentially corrupted state values
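
The entropy heuristic for hardcoded_secret_name might resemble the sketch below. The commit does not state the formula or threshold, so Shannon entropy over characters and the 3.5-bit cutoff are both assumptions, and the function names are hypothetical:

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 3.5  # assumed cutoff in bits per character


def shannon_entropy(value: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(value: str) -> bool:
    """Treat low-entropy values (field names, sentinels, labels) as non-secrets."""
    return shannon_entropy(value) >= ENTROPY_THRESHOLD


label_entropy = shannon_entropy("password")
random_entropy = shannon_entropy("x9F$kq2LmZ8vB1nR")
```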
Closes #496, closes #465
Reported-by: Git-on-my-level <Git-on-my-level@users.noreply.github.com>
Reported-by: Vuk97 <Vuk97@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: review pipeline — bias to action, ask maintainer when unsure
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version to 0.9.14
* fix: skip tree-sitter spec tests when grammar not available in CI
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: simplify lifecycle phases to plan/execute modes
Collapse 8 fine-grained persisted phases to 2 modes (plan/execute).
Pipeline stage derived from queue contents via derive_pipeline_stage().
Display phase mapped via stage_to_display_phase() for consumer compat.
Key changes:
- sync_subjective_dimensions moved to boundary-only (fixes stuck-phase bug)
- _raw_persisted_phase uses current_lifecycle_phase for migration
- clear_postflight_scan_completion no longer forces execute mode
- Cluster filter exempts plan-mode items (subjective, workflow, triage)
- Scan preflight respects live_planned_queue_empty + snapshot
- Migration: old phases map to plan/execute, stale subjective items pruned
Closes the assessment_postflight deadlock where stale subjective items
kept being re-injected mid-cycle, preventing transition to execute.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: remove legacy phase inference, unify snapshot derivation
- current_lifecycle_phase() never returns None (always plan/execute)
- _legacy_phase_inference, _ordered_postflight_phase, _raw_persisted_phase deleted
- _DISPLAY_PHASE_ITEM_MAP, PHASE_* snapshot aliases deleted
- snapshot.py: 738 → 614 lines (-124)
- Single derivation path via _derive_display_phase
- Scan preflight respects live_planned_queue_empty
- Cluster filter exempts plan-mode items
- Migration prunes stale subjective items from old plans
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: hide lifecycle jargon from users, auto-resolve communicate-score
- user_facing_mode() maps internal display phases to "plan"/"execute"
- Workflow items show friendly labels: (Ready to scan), (Create plan), etc
- communicate-score auto-resolves during reconcile (no manual queue item)
- explain_queue() shows "Mode: plan/execute" instead of raw phase names
- Sentinel preserved during auto-resolve to prevent re-firing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: enforce single-writer lifecycle, eliminate side-channels
- set_lifecycle_phase is now private, called only from reconcile_plan()
- resolve_workflow no longer side-channels triage injection — uses
plan-state marker (workflow_plan_just_resolved) read by reconcile
- invalidate_postflight_scan only clears scan marker, never writes mode
- All 5 invalidation sites use uniform Pattern A (invalidate + reconcile)
- living_plan.py collapsed to single reconcile decision path
- Triage kickoff helpers moved from app/ to engine/_plan/triage/lifecycle.py
- AST-backed enforcement test: no production code writes lifecycle_phase
outside _set_lifecycle_phase + one-shot migration
- Derivation equivalence test: pipeline and snapshot agree on display phase
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add plan_checkpoint as canonical score checkpoint with sparkline
Add plan_checkpoint progression event — the single canonical score
snapshot in progression.jsonl. Fires when communicate-score auto-resolves
after subjective reviews are cleared (review path) or when no subjective
items exist (scan path). Remove redundant scores from scan_complete,
review_complete, triage_complete, and execution_drain events.
Key design: gate sync_communicate_score_needed with
defer_if_subjective_queued so scan path defers when subjective items
remain, routing checkpoints through the clean review-import flow.
Also adds:
- Delta fields (resolved_since_last, skipped_since_last, execution_summary)
with last_plan_checkpoint_timestamp() helper for windowing
- Smoothed sparkline on terminal status scorecard (≥3 checkpoints)
- Snapshot rebaseline fields on ReconcileResult to avoid post-reconcile
clearing race, save-success gating on scan path
- Remove source_command duplication from checkpoint payload (envelope only)
- Clean up dead prev_scores plumbing from ScanRuntime
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: stale focus counts across status/next/scan commands
Three rendering sites showed raw cluster issue_ids count (including
resolved items) instead of filtering to items still in the work queue.
Also guard set_focus against focusing completed clusters.
Closes #503 (reported by @NovaRagnarok)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: check all graph keys for normalization, not just first 3
The sampling heuristic ([:3]) could silently skip normalization if the
first few keys happened to be relative while others were absolute.
Relates to #502
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
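The failure mode of the sampling heuristic is easiest to see side by side. What follows is a minimal sketch, not the repository's code: it assumes the graph is a dict keyed by POSIX-style file paths and that "needs normalization" means "contains at least one absolute key". Both function names are hypothetical.

```python
import posixpath


def needs_normalization_sampled(graph: dict) -> bool:
    """Sampling heuristic: inspect only the first three keys."""
    return any(posixpath.isabs(key) for key in list(graph)[:3])


def needs_normalization_full(graph: dict) -> bool:
    """Inspect every key, so a late absolute path is not missed."""
    return any(posixpath.isabs(key) for key in graph)


# Three relative keys come first, so the sample never sees the absolute one.
graph = {
    "pkg/a.py": set(),
    "pkg/b.py": set(),
    "pkg/c.py": set(),
    "/repo/pkg/d.py": set(),
}

sampled = needs_normalization_sampled(graph)
full = needs_normalization_full(graph)
```

With this input the sampled check reports that nothing needs normalizing while the full check catches the absolute key. The full scan is linear in the number of keys.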
* refactor: consolidate lifecycle phase derivation and fix force-rescan review marker
Three related changes to the plan/execute lifecycle machinery:
1. Fix: force-rescan no longer resets subjective review completion. Added
carry_forward_subjective_review() that promotes the marker when the old
review matches the cycle being replaced.
2. Refactor: consolidate phase derivation into shared derive_display_phase()
pure function. Both pipeline and snapshot now delegate to the same
boolean-signal priority chain. Migration moved to load_plan() time,
current_lifecycle_phase() is now a pure reader. Marker invariants
documented in module docstring.
3. Cleanup: remove snapshot _phase_for_snapshot bypasses. Mode-aware signal
shaping (suppress_postflight_signals, prefer_scan) now lives in the
caller, _derive_display_phase is a thin items→bools mapper.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add explicit UTF-8 encoding to external tool report readers
Closes #505 (reported by @pietrondo)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add v0.9.14 release notes draft
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version to 0.9.15
* fix: cross-extension test-to-source mapping for TypeScript (.test.ts → .tsx)
OverlayEditor.test.ts could not find OverlayEditor.tsx because
map_test_to_source only tried the test file's own extension after
stripping the .test. marker. Now tries all TS/JS extensions
(.ts, .tsx, .js, .jsx) for each candidate.
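The cross-extension lookup described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual map_test_to_source implementation: the constant name, both helper names, and the use of a plain set of relative paths for production files are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Set

# Hypothetical constant: every extension tried for each candidate.
CANDIDATE_EXTENSIONS = (".ts", ".tsx", ".js", ".jsx")


def candidate_sources(test_path: str) -> List[str]:
    """List possible source paths for a *.test.* file, one per extension."""
    path = PurePosixPath(test_path)
    marker = ".test."
    if marker not in path.name:
        return []
    # "OverlayEditor.test.ts" -> "OverlayEditor"
    stem = path.name.split(marker, 1)[0]
    return [str(path.with_name(stem + ext)) for ext in CANDIDATE_EXTENSIONS]


def map_test_to_source_sketch(
    test_path: str, production_files: Set[str]
) -> Optional[str]:
    """Return the first candidate that exists among the production files."""
    for candidate in candidate_sources(test_path):
        if candidate in production_files:
            return candidate
    return None


production = {"src/ui/OverlayEditor.tsx"}
match = map_test_to_source_sketch("src/ui/OverlayEditor.test.ts", production)
```

Trying only the test file's own extension would produce the single candidate src/ui/OverlayEditor.ts and miss the .tsx source. Iterating over all four extensions finds it.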
Also fixes a variable shadowing bug where _TS_EXTENSIONS (used by
resolve_import_spec for /index.…
Problem
In Python projects using src/ layout (where production code lives under src/package_name/), the test_coverage detector fails to link test files to source files.
- resolve_import_spec converts import specs like mypackage.foo to mypackage/foo.py, but production files are stored as src/mypackage/foo.py, so the candidates never match.
- _build_prod_by_module creates module aliases with a src. prefix (e.g. src.mypackage.foo) that import specs from tests never include.
As a result, all modules in src-layout projects are flagged as untested_critical even when comprehensive test suites exist that import them.
Fix
- resolve_import_spec (languages/python/test_coverage.py): After checking direct candidates against production_files, also try src/-prefixed candidates.
- _build_prod_by_module (engine/detectors/coverage/mapping_analysis.py): Strip the src/ prefix from relative paths before computing module names, so the index maps argos_toolkit.foo instead of src.argos_toolkit.foo.
Testing
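The two changes can be exercised with a small, self-contained sketch. It is a simplified stand-in for the real functions, written under stated assumptions: the function names carry a _sketch suffix to mark them as hypothetical, production files are modelled as a set of relative POSIX paths, and the value of SRC_PREFIXES is a guess at what the prefix constant contains.

```python
from typing import List, Optional, Set

# Assumed value; the real constant may list other prefixes.
SRC_PREFIXES = ("src/",)


def spec_to_candidates(spec: str) -> List[str]:
    """Turn 'pkg.mod' into the file paths it could refer to."""
    base = spec.replace(".", "/")
    return [base + ".py", base + "/__init__.py"]


def resolve_import_spec_sketch(
    spec: str, production_files: Set[str]
) -> Optional[str]:
    """Try direct candidates first, then src/-prefixed ones."""
    candidates = spec_to_candidates(spec)
    for candidate in candidates:
        if candidate in production_files:
            return candidate
    for prefix in SRC_PREFIXES:
        for candidate in candidates:
            if prefix + candidate in production_files:
                return prefix + candidate
    return None


def module_name_sketch(rel_path: str) -> str:
    """Compute a dotted module name, dropping a leading src/ prefix."""
    for prefix in SRC_PREFIXES:
        if rel_path.startswith(prefix):
            rel_path = rel_path[len(prefix):]
            break
    if rel_path.endswith(".py"):
        rel_path = rel_path[: -len(".py")]
    return rel_path.replace("/", ".")


production = {"src/argos_toolkit/foo.py"}
resolved = resolve_import_spec_sketch("argos_toolkit.foo", production)
module = module_name_sketch("src/argos_toolkit/foo.py")
```

Checking direct candidates before the src/-prefixed ones leaves flat-layout projects unaffected, because the fallback only runs when the direct lookup finds nothing.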