Skip to content

fix(pyats): refactor pyats config file creation (incl disable EnvironmentDebugPlugin)#697

Open
oboehmer wants to merge 17 commits intomainfrom
fix/689-570-disable-env-debug-plugin
Open

fix(pyats): refactor pyats config file creation (incl disable EnvironmentDebugPlugin)#697
oboehmer wants to merge 17 commits intomainfrom
fix/689-570-disable-env-debug-plugin

Conversation

@oboehmer
Copy link
Copy Markdown
Collaborator

@oboehmer oboehmer commented Mar 22, 2026

Description

Security fix: PyATS EnvironmentDebugPlugin was writing environment variables (including passwords) to env.txt files within archive artifacts. This change disables the plugin and refactors config file management for cleaner architecture.
Also refactor pyats config file creation and remove dead code

Closes

Related Issue(s)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring / Technical debt (internal improvements with no user-facing changes)
  • Documentation update
  • Chore (build process, CI, tooling, dependencies)
  • Other (please describe):

Test Framework Affected

  • PyATS
  • Robot Framework
  • Both
  • N/A (not test-framework specific)

Network as Code (NaC) Architecture Affected

  • ACI (APIC)
  • NDO (Nexus Dashboard Orchestrator)
  • NDFC / VXLAN-EVPN (Nexus Dashboard Fabric Controller)
  • Catalyst SD-WAN (SDWAN Manager / vManage)
  • Catalyst Center (DNA Center)
  • ISE (Identity Services Engine)
  • FMC (Firepower Management Center)
  • Meraki (Cloud-managed)
  • NX-OS (Nexus Direct-to-Device)
  • IOS-XE (Direct-to-Device)
  • IOS-XR (Direct-to-Device)
  • Hyperfabric
  • All architectures
  • N/A (architecture-agnostic)

Platform Tested

  • macOS (version tested: macOS with Python 3.12)
  • Linux (distro/version tested: )

Key Changes

Security (#689)

  • Add EnvironmentDebugPlugin: enabled: False to plugin configuration
  • Add E2E regression test scanning ALL output artifacts for credential sentinels
  • Test covers both zip contents and all non-zip files

Architecture Refactoring (#570)

  • Move config file creation from execute_job() into SubprocessRunner.__init__()
  • Add explicit cleanup() method to SubprocessRunner for config file removal
  • Add __del__ for opportunistic cleanup on unexpected exits (best-effort; see Remove merged data model artifact after test execution to prevent potential credential exposure #677)
  • Extract _cleanup_subprocess_runner() helper in orchestrator to DRY up two identical call sites
  • Orchestrator calls cleanup() conditionally based on keep_artifacts policy
  • Config file attributes only set after successful file creation (safer error handling)
  • Extract config filenames as constants in constants.py
  • Write config files to output_dir with well-known names instead of temp files

Dead Code Removal

  • Remove unused _build_reporter_config() and _generate_plugin_config() from orchestrator
  • Remove unnecessary tempfile.TemporaryDirectory wrapper in _run_tests_async
  • Remove unused plugin_config_path parameter

Testing Done

  • Unit tests added/updated (26 subprocess_runner unit tests)
  • Integration tests performed (16 subprocess_runner integration tests)
  • Manual testing performed:
    • PyATS tests executed successfully
    • Robot Framework tests executed successfully
    • D2D/SSH tests executed successfully (if applicable)
    • HTML reports generated correctly
  • All existing tests pass (pytest / pre-commit run -a)

Test Commands Used

# Unit tests (26 subprocess_runner tests, 219 pyats_core tests total)
pytest tests/unit/pyats_core/execution/test_subprocess_runner.py -v
pytest tests/unit/pyats_core/ tests/pyats_core/ -v

# E2E credential exposure tests
pytest tests/e2e/test_e2e_scenarios.py -v -k "credential" -n auto --dist loadscope

# Pre-commit hooks
pre-commit run --all-files

Checklist

  • Code follows project style guidelines (pre-commit run -a passes)
  • Self-review of code completed
  • Code is commented where necessary (especially complex logic)
  • Documentation updated (if applicable)
  • No new warnings introduced
  • Changes work on both macOS and Linux
  • CHANGELOG.md updated (if applicable)

Screenshots (if applicable)

N/A

Additional Notes

Design decisions:

@oboehmer oboehmer force-pushed the fix/689-570-disable-env-debug-plugin branch from 6a826c2 to 92ca3a8 Compare March 22, 2026 17:36
…andling (#689, #570)

Security fix: PyATS EnvironmentDebugPlugin was writing environment variables
(including passwords) to env.txt files within archive artifacts. This change
disables the plugin and refactors config file management for cleaner architecture.

**Security (#689)**
- Add `EnvironmentDebugPlugin: enabled: False` to plugin configuration
- Add E2E regression test scanning ALL output artifacts for credential sentinels
- Test covers both zip contents and all non-zip files

**Architecture Refactoring (#570)**
- Move config file creation from `execute_job()` into `SubprocessRunner.__init__()`
- Add `cleanup()` method to SubprocessRunner for config file removal
- Orchestrator calls `cleanup()` conditionally based on keep_artifacts policy
- Config file attributes only set after successful file creation (safer error handling)
- Extract config filenames as constants in `constants.py`
- Write config files to output_dir with well-known names instead of temp files

**Dead Code Removal**
- Remove unused `_build_reporter_config()` and `_generate_plugin_config()` from orchestrator
- Remove unnecessary `tempfile.TemporaryDirectory` wrapper in `_run_tests_async`
- Remove unused `plugin_config_path` parameter

Closes #689, #570

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@oboehmer oboehmer force-pushed the fix/689-570-disable-env-debug-plugin branch from 92ca3a8 to 9214b78 Compare March 22, 2026 17:42
… credential check

- Remove 3 moot unit tests that checked config strings (fragile, not behavioral)
- Remove duplicate integration test for config creation failure
- Move credential sentinel test imports to module level
- Run credential check for all scenarios, not just PyATS
- Improve error message for PyATS config creation failure
…chestrator

Verify that when SubprocessRunner fails to create config files (e.g., disk full),
PyATSOrchestrator returns PyATSResults with from_error() for discovered test types.
…ix cleanup leak

- Delete ConfigFileCreationError; _create_config_files now raises
  RuntimeError, consistent with the pyats-not-found failure in __init__
- Catch RuntimeError in orchestrator so both init failures populate
  api/d2d results correctly instead of falling through to the broad
  except-Exception handler
- Call subprocess_runner.cleanup() in the report-generation failure
  branch so config files are removed regardless of outcome
- Assert EnvironmentDebugPlugin disabled in plugin config integration
  test using regex; convert all config-content checks to re.search
- Remove leftover try/finally unlink() blocks in TestConfigFileContent
  (files live in tmp_path, cleaned up by pytest)
…asserts

Review feedback from PR #697:

- Fix bug: if first config file writes successfully but second fails,
  clean up the first file to avoid orphaned files on disk
- Replace assert statements with RuntimeError in _build_command() since
  asserts are stripped in optimized mode (-O)
- Add clarifying comments explaining why cleanup isn't needed on
  SubprocessRunner init failure (attributes only set after success)
- Add docstring note about planned nac-test-wide cleanup mechanism
- Add test for partial write failure cleanup behavior
- Add SubprocessRunner.__del__ for opportunistic cleanup on unexpected exits
  (best-effort: CPython GC only, not guaranteed on SIGKILL; ref #677)
- Mark unreachable None-guard with # pragma: no cover and explanatory comment
- Update cleanup() docstring to reflect dual call sites (__del__ + orchestrator)
- Extract _cleanup_subprocess_runner() helper in orchestrator to DRY up two
  identical call sites; documents that __del__ covers unexpected exit paths
- Cast keep_artifacts to bool() to avoid bool | str | None union type
- Remove stale comment in except RuntimeError block (made obsolete by __del__)
- Strengthen test_cleanup_is_idempotent with file-existence assertions
- Add test_del_calls_cleanup to verify __del__ removes config files
@oboehmer oboehmer marked this pull request as ready for review March 24, 2026 20:55
@oboehmer oboehmer changed the title fix(pyats): disable EnvironmentDebugPlugin to prevent credential exposure fix(pyats): refactor pyats config file creation - disable EnvironmentDebugPlugin to prevent credential exposure Mar 24, 2026
@oboehmer oboehmer changed the title fix(pyats): refactor pyats config file creation - disable EnvironmentDebugPlugin to prevent credential exposure fix(pyats): refactor pyats config file creation (incl disable EnvironmentDebugPlugin) Mar 24, 2026
@oboehmer oboehmer requested a review from aitestino March 24, 2026 20:59
@oboehmer oboehmer self-assigned this Mar 25, 2026
oboehmer added a commit that referenced this pull request Mar 28, 2026
CleanupManager.register() gains a skip_if_debug parameter. Files
registered with skip_if_debug=True are deleted on normal exit but
retained when NAC_TEST_DEBUG is set, letting developers inspect
intermediate artifacts without any extra flags or code changes.

The merged data model YAML is the first consumer: it now registers with
skip_if_debug=True so it survives for inspection in debug runs. The
same mechanism is available for job scripts, testbed YAMLs, and any
other artifacts tracked in PR #697.

Internally, _files changes from set[Path] to dict[Path, bool], keeping
the flag co-located with the path and avoiding two collections to sync.

Remove the exit_with_cleanup() closure from main() — typer.Exit is
caught by Click which calls sys.exit(), reliably triggering atexit
handlers on every exit path. The closure and all cleanup_now() pre-exit
calls were redundant with the already-registered atexit handler.
Copy link
Copy Markdown
Collaborator

@aitestino aitestino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @oboehmer, thank you for the PR -- this is a well-motivated change that addresses a real security gap (EnvironmentDebugPlugin leaking credentials into archive artifacts) and takes the opportunity to clean up genuine architectural debt along the way. Moving config file ownership into SubprocessRunner so it owns both creation and cleanup is the right SRP boundary, the DRY win from eliminating those two copy-pasted config blocks in execute_job and execute_job_with_testbed is substantial (especially since they had already diverged -- one was missing the security fix), and the dead code removal of _build_reporter_config and _generate_plugin_config is overdue -- those methods were not only unused but broken (the dump_to_stream import references a module that doesn't exist). The E2E credential sentinel test is a strong regression guard, and the partial-write cleanup logic in _create_config_files is thoughtfully implemented.

I ran this through a pretty thorough review, and I think we need some adjustments before merge. The core design is sound and the security fix is correct (I traced through the PyATS source to confirm recursive_update properly merges the enabled: False override) -- these are all refinement items.

Things that need adjustment:

  1. Cleanup gap on "no archives found" early return -- In _generate_html_reports_async, the early return when archive_paths is empty skips _cleanup_subprocess_runner() entirely. The keep_artifacts variable is computed just above that block, so the fix is one line: self._cleanup_subprocess_runner(keep_artifacts) before the return. Alternatively, the two existing _cleanup_subprocess_runner call sites on the success and failure branches could be collapsed into a single try/finally -- that would also prevent this class of bug if a third return path is ever added.

  2. PLUGIN_CONFIG and PYATS_CONFIG should live in constants.py -- We try to centralize constants in constants.py -- the filename constants (PYATS_PLUGIN_CONFIG_FILENAME, PYATS_CONFIG_FILENAME) are correctly placed there, but the content constants ended up in subprocess_runner.py. Moving them to constants.py as PYATS_PLUGIN_CONFIG_CONTENT: str and PYATS_CONFIG_CONTENT: str (with type annotations, matching the other constants in that file) would keep everything in one place. As a bonus, anyone auditing the plugin configuration finds filenames and their contents together.

  3. New constants missing from __all__ in constants.py -- PYATS_PLUGIN_CONFIG_FILENAME and PYATS_CONFIG_FILENAME aren't added to the __all__ list that starts around line 113. Quick one-liner fix.

  4. Path.write_text patches are too broad in both test files -- patch.object(Path, "write_text", ...) intercepts every Path.write_text call in the process, not just the one in _create_config_files. For test_orchestrator_config_error.py, patching at SubprocessRunner._create_config_files with side_effect=RuntimeError(...) is more precise and decouples from the OSError-to-RuntimeError translation already covered in unit tests. For the unit tests, scoping to "nac_test.pyats_core.execution.subprocess_runner.Path.write_text" would be the right target.

  5. test_del_calls_cleanup tests the wrong contract -- It calls runner.__del__() and asserts files are removed, but test_cleanup_removes_config_files already covers that exact behavior. The actual __del__ contract -- the thing that distinguishes it from calling cleanup() directly -- is that it swallows exceptions silently. Replacing with a test that patches cleanup() to raise and verifies __del__() does not propagate would test the real behavior.

  6. NAC_TEST_PYATS_KEEP_REPORT_DATA env var string is scattered across 3 source files -- orchestrator.py, multi_archive_generator.py, and generator.py all evaluate os.environ.get("NAC_TEST_PYATS_KEEP_REPORT_DATA") independently. The existing DEBUG_MODE = get_bool_env("NAC_TEST_DEBUG") pattern shows exactly how to consolidate this: KEEP_REPORT_DATA: bool = get_bool_env("NAC_TEST_PYATS_KEEP_REPORT_DATA") in constants.py. The PR's keep_artifacts local variable is a step in the right direction -- this would complete it. Pre-existing debt, but the PR adds a new evaluation path so now is a natural moment.

  7. object.__new__(SubprocessRunner) in two unit tests creates fragile coupling -- test_write_failure_leaves_attributes_none_and_cleanup_is_safe and test_partial_write_failure_cleans_up_first_file bypass __init__ to test _create_config_files in isolation. This couples to internal attribute names and creates partially-initialized objects that can't exist in production. For the partial-write test, the value is anchored in the filesystem assertion so it's tolerable. For the attribute-state test, consider either mocking the pyats executable check or folding the assertion into the existing test_init_raises_runtime_error_on_write_failure.

  8. Missing _cleanup_subprocess_runner policy tests -- The helper has three behaviors worth covering: (a) no-op when subprocess_runner is None, (b) skips cleanup when keep_artifacts=True, (c) delegates to cleanup() when both conditions are met. Three quick tests, each about 5 lines.

  9. Test file placement deepens Issue #541 debt -- test_orchestrator_config_error.py is placed in tests/pyats_core/, but that directory's own conftest.py documents the intent to merge tests/pyats_core/ into tests/unit/. Placing new files there moves in the wrong direction. Worth considering tests/unit/pyats_core/ instead, or at minimum acknowledging this as #541 scope.

  10. Security advisory: consider disabling ArchiveConfigPlugin in a follow-up -- The ArchiveConfigPlugin (also in PyATS's debug.py) remains enabled and writes pyats.configuration.yaml and easypy.configuration.yaml to archives. For nac-test's current usage the risk is limited since controller credentials flow via environment variables, but for D2D tests with testbed files, device credentials could potentially be reflected. Adding ArchiveConfigPlugin: enabled: False to the plugin config would provide defense-in-depth. Not blocking -- the E2E sentinel test would catch it if credentials leak through this path.

I'd consider items 1-5 as things to address before merge -- they're all quick fixes (the cleanup gap is the most important one). Items 6-10 are lower priority and could be follow-ups if you'd rather not hold this up further.

What do you think?

P.S. — This comment was drafted using voice-to-text via Claude Code. If the tone comes across as overly direct or terse, please know that's just how it tends to phrase things. No offense or criticism is intended — this is purely an objective technical review of the PR. Thanks for understanding! 🙂

oboehmer added a commit that referenced this pull request Apr 15, 2026
… and eliminate redundant re-read (#677 #704) (#727)

* fix(pyats): prevent credential exposure in PyATS archives (#689)

Disable EnvironmentDebugPlugin which was writing environment variables
(including passwords) to env.txt files in PyATS result archives.

Add E2E regression test that scans all output artifacts (including ZIP
contents) for a sentinel password value to catch future credential leaks.

* fix(security): auto-cleanup merged_data_model_test_variables.yaml (#677)

Automatically delete merged_data_model_test_variables.yaml after test
execution to prevent potential credential exposure in shared environments.

Changes:
- Add CleanupManager singleton for registering files to delete on exit
- Register merged data file for cleanup immediately after creation
- Remove duplicate data merging in RobotWriter (CLI already merges)
- Simplify RobotWriter to accept merged_data_path instead of data_paths
- Handle cleanup on normal exit (atexit) and SIGTERM signals

The merged data file contains sensitive test variables that should not
persist after test execution completes.

Closes #677

* refactor: pass merged_data dict through Robot path, simplify DataMerger

Eliminates the redundant YAML re-read in RobotWriter (fixes #704).
Previously, main.py merged and wrote the data model to disk, then
RobotWriter re-read and re-parsed it — costing ~51 s and ~400 MB peak
RAM on large data models. The already-loaded dict is now passed through
CombinedOrchestrator → RobotOrchestrator → RobotWriter directly.

DataMerger is simplified in line with the removal of --merged-data-filename
in #677: the filename parameter is removed from write_merged_data_model()
(hardcoded to DEFAULT_MERGED_DATA_FILENAME), and load_yaml_file() is
removed entirely as it had no remaining callers.

RobotWriter._convert_data_model_for_templates() is deleted: the JSON
round-trip was intended to convert CommentedMap to plain dicts for
Jinja2, but CommentedMap is a dict subclass and is fully compatible
natively. The redundant self.template_data alias is collapsed into
self.data.

Tests:
- Add tests/unit/test_data_merger.py covering merge content correctness
  and write/roundtrip behaviour, replacing coverage lost when the
  commented-out integration tests were cleaned up
- Add tests/unit/robot/test_robot_writer.py covering constructor
  semantics and render_template context selection
- Move data_merge fixtures from tests/integration/fixtures/ to
  tests/unit/fixtures/ where they belong

* fix: correct signal handler restoration for SIG_DFL (falsy value 0)

signal.SIG_DFL has the integer value 0, which is falsy. The previous
`original if original else signal.SIG_DFL` expression would silently
treat a legitimate SIG_DFL original disposition as "not set" and still
return SIG_DFL — technically a no-op here, but masking the real issue:
any None (unrecorded handler) and SIG_DFL were indistinguishable.
Switch to `original if original is not None else signal.SIG_DFL` so the
intent is explicit and the two cases are correctly distinguished.

Add parametrized tests covering SIG_DFL (falsy), SIG_IGN (truthy), and
None (fallback) to pin the correct behaviour.

* refactor: centralise merged data file path via DataMerger

Consolidate all knowledge of where the merged data model file lives into
DataMerger, eliminating independent path reconstruction across callers.

- Rename DEFAULT_MERGED_DATA_FILENAME → MERGED_DATA_FILENAME; the
  DEFAULT_ prefix implied variability that no longer exists since the
  --merged-data-filename CLI option was removed
- Add MERGED_DATA_FILE_MODE = 0o600 constant (owner read/write only)
- Add DataMerger.merged_data_path() as the single query point for the
  file location; callers no longer construct output_dir / MERGED_DATA_FILENAME
- write_merged_data_model() now returns the Path it wrote, owns the
  chmod call, and takes no filename argument — "write this sensitive
  file securely" is a single responsibility
- Move merged_data_path injection into DeviceExecutor constructor so the
  orchestrator (which already holds the path) passes it down rather than
  DeviceExecutor re-deriving it from base_output_dir; removes the
  DataMerger import from device_executor.py entirely

* refactor: remove exit_with_cleanup closure from main()

typer.Exit is caught by Click internally, which calls sys.exit() —
this triggers atexit handlers reliably on every exit path. The
exit_with_cleanup closure (which called cleanup_now() before raising
typer.Exit) was therefore redundant with the atexit handler already
registered by CleanupManager.

Replace all exit_with_cleanup() call sites with plain raise typer.Exit(),
remove the closure, and drop the now-unused NoReturn import.

* feat: preserve debug-relevant files when NAC_TEST_DEBUG is set

CleanupManager.register() gains a skip_if_debug parameter. Files
registered with skip_if_debug=True are deleted on normal exit but
retained when NAC_TEST_DEBUG is set, letting developers inspect
intermediate artifacts without any extra flags or code changes.

The merged data model YAML is the first consumer: it now registers with
skip_if_debug=True so it survives for inspection in debug runs. The
same mechanism is available for job scripts, testbed YAMLs, and any
other artifacts tracked in PR #697.

Internally, _files changes from set[Path] to dict[Path, bool], keeping
the flag co-located with the path and avoiding two collections to sync.

Remove the exit_with_cleanup() closure from main() — typer.Exit is
caught by Click which calls sys.exit(), reliably triggering atexit
handlers on every exit path. The closure and all cleanup_now() pre-exit
calls were redundant with the already-registered atexit handler.

* test: assert merged data YAML is absent after a successful run

* restore comments in main

* refactor: place merged_data before optional params in orchestrator signatures

* fix: log full exception traceback to file while keeping terminal output clean (#573)

Add logger.exception() before the typer.echo() in the top-level exception
handler so the full stack trace (including chained causes) is captured in
the log file. Terminal output remains a single clean line for end users.

* refactor: resolve merged_data_path to absolute once at construction

Call .absolute() when assigning self.merged_data_path in PyATSOrchestrator
so all use sites get a consistent absolute path without redundant calls.
Move the explanatory comment to the assignment and rephrase it to not imply
a cwd change is always in effect.

* register DeviceExecutor temp files with CleanupManager

The per-device job (.py) and testbed (.yaml) files created with
NamedTemporaryFile(delete=False) were never deleted, accumulating in the
OS temp directory across runs. Register them with CleanupManager so they
are removed on normal exit, SIGTERM, and SIGINT.

skip_if_debug=True is used consistently with merged_data_model_test_variables.yaml
so that NAC_TEST_DEBUG retains all intermediate artifacts for debugging.

Also removes the longstanding "UNCOMMENT ME" commented-out deletion block
that this replaces.

* extract _setup_run_tests_mocks helper in test_orchestrator

The mock-wiring and artifact-creation block (pabot return value,
RobotReportGenerator stub, robot_results/ stub files) was copy-pasted
verbatim across test_verbose_flag_passed_to_pabot and
test_include_exclude_tags_passed_to_pabot. Extract it into a
_setup_run_tests_mocks static helper so future tests can reuse it
without duplicating the setup.

* tests: remove duplicate test_merged_data_path_matches_written_file

test_returns_path_to_written_file already asserts the same equality
(with operands swapped) and additionally checks that the file exists,
making the duplicate redundant.

* tests: add integration test for CleanupManager atexit via typer.Exit(0)

Parametrized subprocess test covering the production exit path with
skip_if_debug in both debug and non-debug modes — the atexit handler
can't be exercised from in-process unit tests.

* refactor: rename skip_if_debug to keep_if_debug in CleanupManager

The old name required parsing "skip [deletion] if debug" to understand
the intent. keep_if_debug=True reads directly as what it does.

* tests: add subprocess-level signal coverage for CleanupManager

Fill the contract gap where unit tests call _signal_handler() directly,
bypassing the _install_signal_handlers() registration path.

- Add test_cleanup_on_signal parametrized over SIGTERM and SIGINT:
  spawns a subprocess, waits for a "ready" handshake, delivers the
  signal, and asserts the sentinel file was deleted
- Add timeout=30 to the existing subprocess.run() call
- Rename _SCRIPT → _ATEXIT_SCRIPT, add _SIGNAL_SCRIPT at module level
- Move @pytest.mark.windows to the atexit test only; signal test runs
  Linux-only by CI convention

* remote trivial changes from new test_robot_writer.py

* fix: guard bad directive syntax, fix stale docstring, use LOG_HTML constant

- robot_writer.py: raise ValueError with file/directive context when
  iterate_list_chunked chunk_size cannot be parsed — propagates through
  the existing except Exception handler in combined_orchestrator, aborting
  execution cleanly instead of silently continuing with a bad directive
- robot/orchestrator.py: remove stale step "3. Creates merged data model
  file" from run_tests() docstring; renumber remaining steps 4-6 → 3-5
- combined_orchestrator.py: replace bare "log.html" with LOG_HTML constant

* add PRD, adjust comment

* minor adjustments in cleanup.py

* tests: restore integration rendering tests and fix fixture placement

* refactor: replace _cleanup/cleanup_now with single public run_cleanup()

Remove the private _cleanup() method and the cleanup_now() wrapper —
the split served no purpose since both had identical behaviour and
there was no reason to bypass the public API in atexit.register() or
_signal_handler(). Inline the implementation directly into run_cleanup(),
which is now the single method registered with atexit, called from the
signal handler, and available for manual invocation.

* reference MERGED_DATA_FILENAME in tests

* add comment on fork safety

* tests: improve CleanupManager unit test quality

- Parametrize test_unregister_removes_path over keep_if_debug values
- Remove test_register_multiple_files (only tested dict insertion count)
- Strengthen test_cleanup_is_idempotent — assert file deleted on first call
- Replace test_concurrent_registration with test_concurrent_register_and_cleanup:
  races register() against run_cleanup() to verify the lock prevents silent
  data loss, rather than just testing concurrent dict insertions

* update PRD

* fix: use RLock to prevent signal-handler deadlock in CleanupManager

Python signal handlers run in the main thread between bytecodes. If
SIGTERM/SIGINT arrives while the main thread holds the Lock inside
register() or unregister(), the signal handler's run_cleanup() call
attempts to acquire the same non-reentrant Lock, causing a deadlock.

Replace threading.Lock with threading.RLock (reentrant) so the same
thread can re-acquire safely from within the signal handler.

Addresses PR review comment #1.

* fix: always clean up merged data file regardless of debug mode

The merged data YAML may contain credentials resolved from !env
references. Registering it with keep_if_debug=True meant it would
persist when NAC_TEST_DEBUG is set, which CI/CD pipelines might
enable inadvertently — leaving credentials in build artifacts.

Remove the keep_if_debug flag so the sensitive file is always deleted
on exit. Debug users can still inspect it during execution; it only
needs to survive until cleanup runs.

Addresses PR review comment #2.

* fix: delete files immediately when registered after cleanup has run

Once run_cleanup() completes it sets _cleanup_done=True, making
subsequent run_cleanup() calls no-ops. Files registered after that
point were silently added to _files but never deleted — a bug since
the docstring says run_cleanup() is safe to call multiple times.

Fix: register() now checks _cleanup_done and, if True, deletes the
file immediately via the new _delete_file() helper (extracted from
run_cleanup() to share the deletion + keep_if_debug logic).

Adds test_register_after_cleanup_deletes_immediately to verify the
new behavior.

Addresses PR review comment #4.

* refactor: use _setup_run_tests_mocks helper in full execution test

test_run_tests_full_execution inlined the same mock setup that the
_setup_run_tests_mocks helper was introduced to eliminate. Delegate
to the helper and override only the specific TestResults return value.

Addresses PR review comment #5.

* docs: add nac-yaml 2.0 dependency comment in RobotWriter

Note why _convert_data_model_for_templates() was safe to remove:
nac-yaml 2.0+ guarantees plain dict/list output, so Jinja2 receives
ordinary Python types directly without a JSON round-trip conversion.

Addresses PR review comment #6.

* Revert "fix: delete files immediately when registered after cleanup has run"

This reverts commit ff191a9.

* fix(cleanup): remove redundant _cleanup_done flag

_files.clear() already makes run_cleanup() naturally idempotent.
Removing the flag ensures files registered between cleanup
invocations (e.g. signal handler → atexit) are not silently lost.

* refactor: adapt robot_writer shim to new merged_data constructor API

The canonical RobotWriter constructor changed from data_paths: list[Path]
to merged_data: dict[str, Any]. The shim now acts as an adapter:
accepts the old data_paths API, calls DataMerger.merge_data_files()
internally, and passes the merged dict to the canonical constructor.

Also removes the signature reminder tests (follow-up to PR #765 where
they should have been dropped — the integration tests now provide
strictly better coverage of canonical API changes).
Replace manual cleanup()/\__del__() with centralized CleanupManager
registration (keep_if_debug=True). This removes SubprocessRunner.cleanup(),
SubprocessRunner.__del__(), and PyATSOrchestrator._cleanup_subprocess_runner()
along with both call sites in _generate_html_reports_async().

Addresses PR #697 review comments #1 (cleanup gap on early return),
#5 (test_del_calls_cleanup wrong contract), and #8 (missing policy tests
for _cleanup_subprocess_runner — method now deleted).
- Add missing __all__ entries in constants.py (review #3)
- Narrow Path.write_text patches to module scope in test_subprocess_runner
  instead of global patch.object (review #4)
- Rewrite test_partial_write_failure and test_orchestrator_config_error to
  remove fragile object.__new__ coupling and update to current API (review #7)
Move test file from tests/pyats_core/ to tests/unit/pyats_core/ to reduce
placement debt per review comment #9. Add PyATSTestDirs and pyats_test_dirs
fixture to tests/unit/conftest.py (intentional duplication until #541 fully
consolidates the two conftest files).
@oboehmer
Copy link
Copy Markdown
Collaborator Author

Thanks for the thorough review @aitestino! Here's how we addressed each item:

# Comment Status Detail
1 Cleanup gap on early return ✅ Resolved Refactored to use CleanupManager (from #727) — cleanup is now automatic via atexit/signals, no manual call sites to miss
2 Content constants to constants.py ⏭️ Deferred Config file content in constants.py looks odd — these are multi-line YAML templates, not simple constants. Prefer to keep them co-located with their only consumer
3 Missing __all__ entries ✅ Fixed Added 3 entries
4 Path.write_text patches too broad ✅ Fixed Narrowed to module-level scope
5 test_del_calls_cleanup wrong contract ✅ Resolved Test and __del__ method both deleted — CleanupManager owns the lifecycle now
6 KEEP_REPORT_DATA env var scattered ⏭️ Deferred to #538 Pre-existing debt with its own tracking issue
7 object.__new__ fragile coupling ✅ Fixed Tests rewritten without __new__ bypass
8 Missing _cleanup_subprocess_runner tests ✅ Resolved Method deleted entirely — CleanupManager replaces it
9 Test file placement deepens #541 ✅ Fixed Moved test_orchestrator_config_error.py to tests/unit/pyats_core/
10 ArchiveConfigPlugin credential leak ⏭️ Tracked as #773 Low risk (nac-test creds are always env vars), but user-provided testbeds with hardcoded credentials could leak

The key architectural change: by adopting CleanupManager (merged in #727), we eliminated SubprocessRunner.cleanup(), SubprocessRunner.__del__(), and PyATSOrchestrator._cleanup_subprocess_runner() entirely. Config files are registered at creation time and cleaned up automatically — which also resolves the early-return gap from comment #1 by design.

@oboehmer oboehmer requested a review from aitestino April 15, 2026 13:20
@oboehmer oboehmer added re-review re-review requested labels Apr 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

prio: high re-review re-review requested

Projects

None yet

2 participants