Skip to content

Add nvidia-nat-harbor for Harbor local and inline evaluation#1892

Draft
AnuradhaKaruppiah wants to merge 34 commits intoNVIDIA:developfrom
AnuradhaKaruppiah:ak-harbor-evals
Draft

Add nvidia-nat-harbor for Harbor local and inline evaluation#1892
AnuradhaKaruppiah wants to merge 34 commits intoNVIDIA:developfrom
AnuradhaKaruppiah:ak-harbor-evals

Conversation

@AnuradhaKaruppiah
Copy link
Copy Markdown
Contributor

@AnuradhaKaruppiah AnuradhaKaruppiah commented Apr 24, 2026

Description

Adds nvidia-nat-harbor, a NAT/Harbor integration package for running NAT workflows in Harbor evaluation jobs.

This PR supports:

  • Harbor NemoAgent integration for NAT workflow configs
  • host-local Harbor execution via LocalEnvironment
  • shell compatibility mode for existing Harbor task behavior
  • library mode for in-process NAT workflow execution
  • inline ATIF verifier support through Harbor verifier import hooks
  • script bridge compatibility for tests/test.sh verifier paths
  • simple calculator Harbor adapters and E2E examples

Docs

Start here:

  • Package overview and mode diagrams: packages/nvidia_nat_harbor/README.md
  • Runnable simple calculator E2E commands: examples/evaluation_and_profiling/simple_calculator_eval/harbor-eval-readme.md
  • Harbor upstreaming plan and priority table: packages/nvidia_nat_harbor/upstream-plan.md

Upstream Dependencies

This PR currently depends on Harbor-side verifier import-hook support:

  • VerifierFactory
  • VerifierConfig.import_path
  • VerifierConfig.kwargs
  • --verifier-import-path
  • --verifier-kwarg

Until that lands in a Harbor release, the docs use the Harbor side branch described in the README.

First-class --env local is also not yet accepted by Harbor CLI enum validation, so the current local-mode workaround is:

  • --env docker
  • --environment-import-path nat_harbor.environments.local:LocalEnvironment

Validation

Unit tests:

uv run pytest packages/nvidia_nat_harbor/tests

Latest result: 36 passed.

E2E commands from harbor-eval-readme.md were run successfully:
dataset generation: 10/10 tasks generated
shell-mode smoke: mean 1.000, zero exceptions
full shell-mode local job: 10/10, mean 1.000, zero exceptions
builtin trajectory inline verifier: mean 1.000, zero exceptions
builtin tunable-rag inline verifier: mean 1.000, zero exceptions
custom evaluator-ref inline verifier: mean 1.000, zero exceptions

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features

    • Added Harbor evaluation framework for simple calculator benchmarks with local execution support.
    • Implemented library mode for in-process agent workflows.
    • Added inline verifier for ATIF evaluation with custom evaluator support.
  • Documentation

    • Comprehensive guides for Harbor evaluation setup and execution added.
  • Tests

    • Added test coverage for Harbor integration and evaluation capabilities.

Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
@AnuradhaKaruppiah AnuradhaKaruppiah self-assigned this Apr 24, 2026
@AnuradhaKaruppiah AnuradhaKaruppiah added feature request New feature or request DO NOT MERGE PR should not be merged; see PR for details labels Apr 24, 2026
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7e6b98e5-7664-4bc0-9fd9-81fd5893181f

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Walkthrough

This pull request introduces a new Harbor integration package (nvidia-nat-harbor) alongside comprehensive infrastructure for evaluating simple calculator tasks. It provides adapters, verifiers, agents, and local environments to enable NAT workflows within Harbor's ATIF evaluation framework, including library-mode execution, inline verifiers, and local execution support.

Changes

Cohort / File(s) Summary
Simple Calculator Evaluation Example
examples/evaluation_and_profiling/simple_calculator_eval/harbor-eval-readme.md, examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/README.md
Documentation for Harbor adapter setup, dataset generation workflows, and execution modes (shell, library, inline verifiers).
Harbor Adapters Core
examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/__init__.py, examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/common.py
Shared utilities: template copying, arithmetic/power-of-two question parsing, ground truth writing, and numeric formatting helpers.
Harbor Adapters: Task Generators
examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator/*, examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_nested/*, examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/*
Three adapter implementations and CLI entry points (run_adapter.py) to generate task directories, parse benchmark data, and produce Harbor-compatible task configs with ground truth.
Harbor Adapters: Template
examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/...
Task template skeleton: Dockerfile, task.toml config, instruction/solution scripts, and numeric evaluator (evaluate.py, test.sh) for verifying computed answers within tolerance.
Evaluation Configuration
examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-nested-harbor-eval.yaml, examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/scripts/harbor_sample_evaluators.py
Harbor-ready NeMo workflow config and artifact-presence evaluator function for inline evaluation lanes.
nvidia-nat-harbor: Package Setup
packages/nvidia_nat_harbor/pyproject.toml, packages/nvidia_nat_harbor/README.md, packages/nvidia_nat_harbor/src/nat_harbor/__init__.py, packages/nvidia_nat_harbor/upstream-plan.md, packages/nvidia_nat_harbor/src/nat_harbor/meta/pypi.md
Package metadata, build configuration, documentation, and upstream development roadmap.
Agent Integration
packages/nvidia_nat_harbor/src/nat_harbor/agents/__init__.py, packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/__init__.py, packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent.py, packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent_run_wrapper.py, packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/library_mode.py, packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/inline_runner.py
NemoAgent bridge extending Harbor's agent with local install policy, library-mode execution, inline runner support, and workflow package management.
Agent Policy & Utilities
packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/policy.py, packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/local_install_policy.py
Local install policy resolution (skip/allow), validation, and backward-compatibility exports.
Environment Implementation
packages/nvidia_nat_harbor/src/nat_harbor/environments/__init__.py, packages/nvidia_nat_harbor/src/nat_harbor/environments/local.py
LocalEnvironment for host-local execution with container-path translation, allowlist-based security, profile-write blocking, and async command execution with timeouts.
Verifier Bridge & Evaluator Adapter
packages/nvidia_nat_harbor/src/nat_harbor/verifier/__init__.py, packages/nvidia_nat_harbor/src/nat_harbor/verifier/bridge_runner.py, packages/nvidia_nat_harbor/src/nat_harbor/verifier/evaluator_adapter.py, packages/nvidia_nat_harbor/src/nat_harbor/verifier/inline_verifier.py
ATIF evaluator bridge CLI, custom/builtin evaluator dispatch, async evaluation, fallback modes (fail/raw_output), and inline verifier integration with metadata enrichment.
Test Suite: Adapters
packages/nvidia_nat_harbor/tests/test_harbor_adapter_template.py, packages/nvidia_nat_harbor/tests/test_harbor_inline_verifier.py
Tests for template copying with cache exclusion, inline verifier request construction from environment variables, and evaluator reference dispatch.
Test Suite: Verifier & Bridge
packages/nvidia_nat_harbor/tests/test_atif_bridge_runner.py, packages/nvidia_nat_harbor/tests/test_inline_verifier_driver.py, packages/nvidia_nat_harbor/tests/test_library_mode_contracts.py
Tests for bridge runner fallback modes (fail/raw_output), missing-artifact handling, evaluator dispatch (custom/builtin), and inline verifier driver artifact resolution and error handling.
Test Suite: Agent & Environment
packages/nvidia_nat_harbor/tests/test_nemo_agent_bridge.py, packages/nvidia_nat_harbor/tests/test_local_mode_runtime.py
Tests for NemoAgent workflow-package merging, library-mode flag resolution, install policy persistence, local environment path translation, and policy validation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

DO NOT MERGE

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.45% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and concisely describes the main feature being added: a new nvidia-nat-harbor package that enables Harbor local and inline evaluation capabilities for NAT workflows.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
@AnuradhaKaruppiah AnuradhaKaruppiah changed the title Add nvidia-nat-harbor package Add nvidia-nat-harbor for Harbor local and inline evaluation Apr 29, 2026
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
@AnuradhaKaruppiah AnuradhaKaruppiah added non-breaking Non-breaking change and removed DO NOT MERGE PR should not be merged; see PR for details labels Apr 29, 2026
@AnuradhaKaruppiah
Copy link
Copy Markdown
Contributor Author

/ok to test eddf95a

Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
@AnuradhaKaruppiah
Copy link
Copy Markdown
Contributor Author

/ok to test 2f772fe

@AnuradhaKaruppiah
Copy link
Copy Markdown
Contributor Author

@CodeRabbit review full

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

@AnuradhaKaruppiah Sure, I'll redo the entire review of the PR now.

✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai Bot added the DO NOT MERGE PR should not be merged; see PR for details label Apr 29, 2026
@AnuradhaKaruppiah AnuradhaKaruppiah removed the DO NOT MERGE PR should not be merged; see PR for details label Apr 29, 2026
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
@AnuradhaKaruppiah
Copy link
Copy Markdown
Contributor Author

/ok to test 853b12d

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (13)
packages/nvidia_nat_harbor/upstream-plan.md-141-142 (1)

141-142: ⚠️ Potential issue | 🟡 Minor

Avoid inanimate possessive wording in Markdown.

Please reword “Harbor's nemo-agent …” to a non-possessive form (for example, “the nemo-agent component of Harbor …”).

As per coding guidelines: “Documentation in Markdown files should not contain usage of a possessive 's with inanimate objects.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_harbor/upstream-plan.md` around lines 141 - 142, Replace
the possessive phrase "Harbor's `nemo-agent`" with a non-possessive construction
such as "the `nemo-agent` component of Harbor" (or similar) in the
upstream-plan.md content so the sentence reads non-possessively; locate the
sentence containing “Harbor's `nemo-agent` owns library mode directly…” and
update it accordingly.
examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/tests/evaluate.py-29-54 (1)

29-54: ⚠️ Potential issue | 🟡 Minor

Write verifier details.json on all failure paths, not only after parsing succeeds.

Line 29 through Line 43 can return early before any details artifact is emitted, which reduces debuggability for failed trials.

🔧 Suggested structure
+def _write_details(details: dict[str, object]) -> None:
+    Path("/logs/verifier").mkdir(parents=True, exist_ok=True)
+    Path("/logs/verifier/details.json").write_text(json.dumps(details, indent=2), encoding="utf-8")
+
 def main() -> int:
-    ground_truth = _load_ground_truth()
+    details: dict[str, object] = {"passed": False, "candidate_values": []}
+    ground_truth = _load_ground_truth()
     answer_path = Path("/workspace/answer.txt")
     if not answer_path.exists():
         print("Missing /workspace/answer.txt")
+        details["error"] = "missing_answer_file"
+        _write_details(details)
         return 1
@@
     if not answer_text:
         print("Answer is empty")
+        details["error"] = "empty_answer"
+        _write_details(details)
         return 1
@@
     if not candidate_values:
         print("No numeric values found in answer")
+        details["error"] = "no_numeric_values"
+        _write_details(details)
         return 1
@@
-    Path("/logs/verifier").mkdir(parents=True, exist_ok=True)
-    Path("/logs/verifier/details.json").write_text(json.dumps(details, indent=2), encoding="utf-8")
+    _write_details(details)
     return 0 if passed else 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/tests/evaluate.py`
around lines 29 - 54, The verifier currently returns early from evaluate.py
(checks around answer_path.exists(), empty answer_text, and no candidate_values)
without emitting the details artifact; update the failure paths so that before
each early return you build the same details dict (include fields:
"expected_value" from ground_truth, "tolerance", "candidate_values" (empty list
if not parsed), "passed": False, and an additional "error" or "reason" string
like "missing answer file", "empty answer", or "no numeric values") and ensure
Path("/logs/verifier").mkdir(parents=True, exist_ok=True) is called and
Path("/logs/verifier/details.json").write_text(json.dumps(details, indent=2),
encoding="utf-8") is executed so details.json is written on every exit path
(reference variables/functions: answer_path, answer_text, expected_value,
tolerance, candidate_values, details, _extract_numbers).
examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/README.md-28-29 (1)

28-29: ⚠️ Potential issue | 🟡 Minor

Replace NAT acronym in prose.

Line 28 uses “NAT examples” in Markdown text. Please spell this out as “NeMo Agent Toolkit examples” (acronym usage is only allowed inside package/code identifiers).

Suggested wording
-helpers so future NAT examples only define source-row loading, instruction
+helpers so future NeMo Agent Toolkit examples only define source-row loading, instruction

As per coding guidelines: "Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/README.md`
around lines 28 - 29, Replace the acronym "NAT" in the README sentence that
currently reads "NAT examples" with the full name "NeMo Agent Toolkit examples";
locate the phrase in the helpers description sentence in
harbor_adapters/README.md (the line containing "NAT examples") and update it so
the prose uses "NeMo Agent Toolkit examples" instead of "NAT" to comply with the
Markdown documentation guideline.
packages/nvidia_nat_harbor/src/nat_harbor/environments/local.py-53-61 (1)

53-61: ⚠️ Potential issue | 🟡 Minor

Resolve Ruff B026: Restructure super().__init__ call.

Line 59 places *args after keyword arguments, which violates Ruff B026. Reorder the call to move *args before keyword arguments (*args must precede any keyword arguments per PEP 3102).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_harbor/src/nat_harbor/environments/local.py` around lines
53 - 61, The super().__init__ call in LocalEnvironment's constructor currently
places *args after keyword arguments which triggers Ruff B026; reorder the call
so *args comes before the named keyword params and **kwargs comes last (i.e.,
call super().__init__(*args, environment_dir=..., environment_name=...,
session_id=..., trial_paths=..., task_env_config=..., **kwargs)) to comply with
PEP 3102.
packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/__init__.py-21-25 (1)

21-25: ⚠️ Potential issue | 🟡 Minor

Sort __all__ to satisfy Ruff RUF022.

__all__ is not sorted in alphabetical order and will fail linting checks.

Suggested ordering
 __all__ = [
+    "NemoAgent",
     "is_local_install_allowed",
-    "NemoAgent",
     "resolve_local_install_policy",
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/__init__.py`
around lines 21 - 25, The __all__ list is not alphabetized causing Ruff RUF022;
open the module's __all__ definition and reorder the exported symbol names
alphabetically (e.g., ensure entries for is_local_install_allowed, NemoAgent,
resolve_local_install_policy are sorted by name) so the __all__ list is
lexicographically sorted to satisfy the lint rule.
packages/nvidia_nat_harbor/src/nat_harbor/environments/local.py-190-220 (1)

190-220: ⚠️ Potential issue | 🟡 Minor

Add explicit -> None annotations on public mutating coroutines.

stop, upload_file, upload_dir, download_file, and download_dir are missing explicit return type annotations. The start method in the same class demonstrates the correct pattern with -> None. Please add return type annotations to align with the requirement that all public APIs have Python 3.11+ type hints.

Suggested signature updates
-    async def stop(self, delete: bool):
+    async def stop(self, delete: bool) -> None:
@@
-    async def upload_file(self, source_path: Path | str, target_path: str):
+    async def upload_file(self, source_path: Path | str, target_path: str) -> None:
@@
-    async def upload_dir(self, source_dir: Path | str, target_dir: str):
+    async def upload_dir(self, source_dir: Path | str, target_dir: str) -> None:
@@
-    async def download_file(self, source_path: str, target_path: Path | str):
+    async def download_file(self, source_path: str, target_path: Path | str) -> None:
@@
-    async def download_dir(self, source_dir: str, target_dir: Path | str):
+    async def download_dir(self, source_dir: str, target_dir: Path | str) -> None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_harbor/src/nat_harbor/environments/local.py` around lines
190 - 220, Add explicit return type annotations "-> None" to the public mutating
async methods in the Local environment class: stop, upload_file, upload_dir,
download_file, and download_dir (the same style as the existing start method).
Update each async def signature to include "-> None" while keeping existing
parameter types (e.g., Path | str and str) and ensure imports/type constructs
remain unchanged.
examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-nested-harbor-eval.yaml-23-45 (1)

23-45: ⚠️ Potential issue | 🟡 Minor

Use 4-space indentation for this YAML file.

Current nesting uses 2 spaces, but repo policy for .yaml requires 4 spaces.

🧹 Suggested formatting update
 function_groups:
-  calculator:
-    _type: calculator
+    calculator:
+        _type: calculator

 functions:
-  power_of_two:
-    _type: power_of_two
-    multiply_fn: calculator__multiply
+    power_of_two:
+        _type: power_of_two
+        multiply_fn: calculator__multiply

 llms:
-  nim_llm:
-    _type: nim
-    model_name: nvidia/nemotron-3-nano-30b-a3b
-    temperature: 0.0
-    max_tokens: 1024
-    chat_template_kwargs:
-      enable_thinking: false
+    nim_llm:
+        _type: nim
+        model_name: nvidia/nemotron-3-nano-30b-a3b
+        temperature: 0.0
+        max_tokens: 1024
+        chat_template_kwargs:
+            enable_thinking: false

 workflow:
-  _type: react_agent
-  tool_names: [power_of_two, calculator__multiply]
-  llm_name: nim_llm
-  verbose: true
-  parse_agent_response_max_retries: 3
+    _type: react_agent
+    tool_names: [power_of_two, calculator__multiply]
+    llm_name: nim_llm
+    verbose: true
+    parse_agent_response_max_retries: 3

As per coding guidelines, “Indent with 4 spaces, never tabs, and ensure every file ends with a single newline.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-nested-harbor-eval.yaml`
around lines 23 - 45, The YAML file uses 2-space indentation but must follow the
repo policy of 4-space indentation; update every nested block (sections like
calculator, functions -> power_of_two, llms -> nim_llm, and workflow) to use
4-space indents, ensure keys like multiply_fn: calculator__multiply and
workflow.tool_names include proper 4-space nesting, remove any tabs, and save
the file with a single trailing newline at the end.
examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/run_adapter.py-1-3 (1)

1-3: ⚠️ Potential issue | 🟡 Minor

Use the full Apache header template here as well.

Every other new source file in this PR includes the complete SPDX Apache-2.0 header block, but this script only has the two SPDX lines. Please copy the standard template for consistency and compliance. As per coding guidelines, "All source files must include the SPDX Apache-2.0 header template (copy from an existing file)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/run_adapter.py`
around lines 1 - 3, Replace the current two-line SPDX snippet at the top of
run_adapter.py with the full Apache-2.0 SPDX header template used across the
repo (copy the standard multi-line header from any other new source file in this
PR); ensure you keep the same file-level comment format, paste the full header
block in place of the existing lines, and save—no other logic changes required
in run_adapter.py.
packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent_run_wrapper.py-53-79 (1)

53-79: ⚠️ Potential issue | 🟡 Minor

Align the internal env vars with the repo-standard NAT_ naming.

This wrapper reads NVIDIA_NAT_LOG_LEVEL / NVIDIA_NAT_DEBUGPY_*, but the rest of the stack uses NAT_..., and this file even writes back to NAT_LOG_LEVEL. That means setting the standard NAT_LOG_LEVEL alone will not affect the wrapper's own logging. As per coding guidelines, "Use 'nat' for the API namespace and CLI tool, 'nvidia-nat' for the package name, 'NAT_' prefix for environment variables, and 'NeMo-Agent-Toolkit' for URLs and directory names".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent_run_wrapper.py`
around lines 53 - 79, Change the env var keys from the legacy "NVIDIA_NAT_*" to
the repo-standard "NAT_*" in the logging setup and debugpy logic: update the
top-level _log_level_str lookup to read os.environ.get("NAT_LOG_LEVEL",
"WARNING").upper() (optionally falling back to "NVIDIA_NAT_LOG_LEVEL" if present
for compatibility), change os.environ.setdefault to use "NAT_LOG_LEVEL", and
update maybe_enable_debugpy to read "NAT_DEBUGPY_PORT", "NAT_DEBUGPY_HOST" and
"NAT_DEBUGPY_WAIT_FOR_CLIENT" (again you may accept the NVIDIA_ prefixed names
as a fallback). Make these edits in the module-level code that defines
_log_level_str and in the maybe_enable_debugpy function so all references and
side-effects use the NAT_* environment variable names.
examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/run_adapter.py-11-20 (1)

11-20: ⚠️ Potential issue | 🟡 Minor

Only retry the local import when the missing module is actually adapter.

ModuleNotFoundError also fires for transitive imports inside adapter.py. In that case this fallback rewrites sys.path and retries instead of surfacing the real dependency error.

Suggested fix
-try:
-    from adapter import DEFAULT_SOURCE_DATA
-    from adapter import SimpleCalculatorPowerOfTwoAdapter
-except ModuleNotFoundError:
+try:
+    from adapter import DEFAULT_SOURCE_DATA
+    from adapter import SimpleCalculatorPowerOfTwoAdapter
+except ModuleNotFoundError as exc:
+    if exc.name != "adapter":
+        raise
     SCRIPT_DIR = Path(__file__).resolve().parent
     if str(SCRIPT_DIR) not in sys.path:
         sys.path.insert(0, str(SCRIPT_DIR))
     from adapter import DEFAULT_SOURCE_DATA
     from adapter import SimpleCalculatorPowerOfTwoAdapter
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/run_adapter.py`
around lines 11 - 20, The fallback import retry currently catches any
ModuleNotFoundError (which can be raised for transitive imports) and masks the
real error; modify the except block to catch ModuleNotFoundError as e and only
perform the sys.path rewrite and re-import when e.name (or e.args[0]) equals
"adapter" (i.e., the missing module is truly adapter); otherwise re-raise the
exception so transitive import errors surface. Apply this change around the
current import block that references DEFAULT_SOURCE_DATA and
SimpleCalculatorPowerOfTwoAdapter and keep the existing SCRIPT_DIR / sys.path
insert logic for the adapter-only case.
packages/nvidia_nat_harbor/tests/test_nemo_agent_bridge.py-69-70 (1)

69-70: ⚠️ Potential issue | 🟡 Minor

Remove @pytest.mark.asyncio decorators from async tests.

Repository policy requires async tests to run via the configured async runner without explicit decorators. Per coding guidelines: "Async tests are automatically detected and run by the async runner—the decorator is unnecessary clutter."

Proposed fix
-@pytest.mark.asyncio
 async def test_install_skips_local_mode_by_default(tmp_path: Path) -> None:
-@pytest.mark.asyncio
 async def test_install_allows_local_mode_with_opt_in(tmp_path: Path) -> None:
-@pytest.mark.asyncio
 async def test_setup_installs_multiple_local_packages_in_order(tmp_path: Path) -> None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_harbor/tests/test_nemo_agent_bridge.py` around lines 69 -
70, Remove the explicit pytest.mark.asyncio decorator from the async test
function test_install_skips_local_mode_by_default; open the test definition
(async def test_install_skips_local_mode_by_default(...)) and delete the
preceding `@pytest.mark.asyncio` line so the test relies on the repository's
configured async test runner instead.
packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent.py-47-81 (1)

47-81: ⚠️ Potential issue | 🟡 Minor

Rename the new env fallbacks to the NAT_ namespace.

NVIDIA_NAT_WORKFLOW_PACKAGES, HARBOR_LOCAL_INSTALL_POLICY, and NVIDIA_NAT_PYTHON_BIN do not follow the repo convention used elsewhere in this feature (NAT_HARBOR_...). Locking these names in now will make the Harbor bridge harder to document and keep consistent.

As per coding guidelines, "Use 'nat' for the API namespace and CLI tool, 'nvidia-nat' for the package name, 'NAT_' prefix for environment variables, and 'NeMo-Agent-Toolkit' for URLs and directory names".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent.py`
around lines 47 - 81, Update the env_fallbacks in the CLI_FLAGS list to use the
NAT_ namespace: change NVIDIA_NAT_WORKFLOW_PACKAGES -> NAT_WORKFLOW_PACKAGES for
the CliFlag "workflow_packages", change HARBOR_LOCAL_INSTALL_POLICY ->
NAT_LOCAL_INSTALL_POLICY for the CliFlag "local_install_policy", and change
NVIDIA_NAT_PYTHON_BIN -> NAT_PYTHON_BIN for the CliFlag "python_bin"; keep the
CliFlag names and other properties unchanged so only the env_fallback strings in
CLI_FLAGS are renamed to follow the NAT_ convention.
examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_nested/adapter.py-47-104 (1)

47-104: ⚠️ Potential issue | 🟡 Minor

Add the missing public-API docs and __init__ return type.

SimpleCalculatorNestedAdapter.__init__, make_local_task_id(), list_available_tasks(), generate_task(), and generate_many() are all public entry points in a new package, but they ship without the required Google-style docstrings, and __init__ is missing -> None.

As per coding guidelines, "All public APIs require Python 3.11+ type hints on parameters and return values" and "Provide Google-style docstrings for every public module, class, function and CLI command".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_nested/adapter.py`
around lines 47 - 104, Add Google-style docstrings to the public module, the
SimpleCalculatorNestedAdapter class, and its public methods (__init__,
make_local_task_id, list_available_tasks, generate_task, generate_many)
describing parameters, return values, and behavior; also update the __init__
signature to include the explicit return type "-> None". Ensure the docstrings
follow the project's Google-style format and cover parameter types (task_dir,
source_file, source_id, local_task_id, overwrite) and what each method returns
(make_local_task_id -> str, list_available_tasks -> list[str], generate_task ->
bool, generate_many -> tuple[int, int]) so the public API is fully documented.
🧹 Nitpick comments (2)
examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/adapter.py (1)

47-47: Tighten public API typing on constructor and collection parameters.

Please add -> None to __init__ and prefer Sequence[str] for generate_many input to avoid over-constraining callers.

Proposed fix
+from collections.abc import Sequence
@@
-    def __init__(self, task_dir: Path, source_file: Path | None = None):
+    def __init__(self, task_dir: Path, source_file: Path | None = None) -> None:
@@
-    def generate_many(self, source_ids: list[str], *, overwrite: bool = True) -> tuple[int, int]:
+    def generate_many(self, source_ids: Sequence[str], *, overwrite: bool = True) -> tuple[int, int]:
As per coding guidelines, "All public APIs require Python 3.11+ type hints on parameters and return values" and "Prefer collections.abc / typing abstractions (Sequence over list)".

Also applies to: 91-91

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/adapter.py`
at line 47, Update the public API type hints: add an explicit return annotation
"-> None" to the constructor definition (__init__) and change the collection
parameter type on generate_many from a concrete list/other to
typing.Sequence[str] (import Sequence from collections.abc or typing as needed);
ensure the new annotations are applied to both occurrences referenced in the
diff so the public API uses Python 3.11+ return hints and an abstract sequence
type for inputs.
examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator/adapter.py (1)

47-47: Apply the same public typing cleanup here for consistency.

__init__ should declare -> None, and generate_many should accept Sequence[str] to keep the API flexible.

As per coding guidelines, "All public APIs require Python 3.11+ type hints on parameters and return values" and "Prefer collections.abc / typing abstractions (Sequence over list)".

Also applies to: 93-93

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator/adapter.py`
at line 47, Update the public method type hints: change the constructor
signature for __init__ to declare a return type of None (def __init__(self,
task_dir: Path, source_file: Path | None = None) -> None) and modify
generate_many to accept a Sequence[str] instead of a concrete list type (e.g.,
def generate_many(self, inputs: Sequence[str]) -> ...), using typing
collections.abc abstractions to satisfy the public API typing guidelines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/common.py`:
- Around line 76-78: Before performing the division assign, add a guard that
checks if rhs == 0 to avoid ZeroDivisionError: if rhs is zero, set expected to
None (or another sentinel) and return operation, lhs, rhs, expected (or raise a
clear ValueError with a descriptive message) instead of computing lhs / rhs;
update the branch where expected is computed (the code that sets expected = lhs
/ rhs and returns operation, lhs, rhs, expected) to perform this check and
handle the zero case gracefully using the existing variables operation, lhs,
rhs, expected.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_nested/adapter.py`:
- Around line 81-90: The generate_task method currently constructs output_dir
from local_task_id and may rmtree it without validating the input; ensure
local_task_id cannot escape self.task_dir by validating/sanitizing before any
delete: convert local_task_id to a Path, reject absolute paths and any path
containing ".." (or resolve and assert
output_dir.is_relative_to(self.task_dir)), and only proceed with
shutil.rmtree/output_dir.mkdir if the resolved output_dir is inside
self.task_dir; update generate_task to raise an error on invalid local_task_id
rather than performing any filesystem operations.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/environment/Dockerfile`:
- Around line 4-12: The Dockerfile runs as root and installs packages without
--no-install-recommends; create a non-root user and switch to it, tighten
apt-get flags, and minimize image footprint: update the RUN apt-get command in
the Dockerfile to use apt-get install --no-install-recommends -y and remove
unnecessary packages, create a dedicated user (e.g., "appuser") and group,
ensure WORKDIR ownership is changed (chown) to that user, and add a USER
instruction to run the container as the non-root user while keeping CMD
["/bin/bash"].

In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor-eval-readme.md`:
- Around line 23-34: Update the README prose to conform to Markdown rules:
replace every occurrence of the acronym "NAT" with "NeMo Agent Toolkit", remove
possessive forms like "Harbor's verifier import hook" (e.g., change to "the
Harbor verifier import hook" or "Harbor verifier import hook"), change heading
text "Builtin" to "built-in", and ensure other inanimate possessive uses are
rewritten; keep the technical references intact (NemoAgent, --model, --ak
config_file=..., llms, nat_harbor.environments.local:LocalEnvironment / --env
docker, tests/test.sh, --ak library_mode=true,
nat_harbor.verifier.inline_verifier:ATIFInlineVerifier) while applying these
wording changes throughout lines ~98-165 and the shown section.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/scripts/harbor_sample_evaluators.py`:
- Line 22: The public function artifact_presence_evaluator currently leaves the
atif_samples parameter untyped; add a 3.11+ type hint (e.g., atif_samples:
list[dict[str, Any]] or Sequence[dict[str, Any]]) to the function signature and
ensure Any/Sequence are imported (or use built-in generic types) so the
signature reads something like artifact_presence_evaluator(atif_samples:
list[dict[str, Any]], artifact_path: str | None = None) -> dict[str, Any];
update the imports to include typing symbols if needed and keep the existing
return type intact.

In `@packages/nvidia_nat_harbor/README.md`:
- Around line 20-29: The README uses forbidden shorthand and possessive forms;
update all plain "NAT" occurrences to "NeMo Agent Toolkit" (e.g., change
"NAT-backed Harbor agent (NemoAgent)" to "NeMo Agent Toolkit-backed Harbor agent
(NemoAgent)" and any other "NAT" mentions) and remove possessive "'s" on
inanimate objects (e.g., replace "Harbor's verifier import hook" with "the
verifier import hook for Harbor" or "the verifier import hook used by Harbor");
apply the same normalization to all affected sections (notably the excerpt
around "nvidia-nat-harbor", the bullet list entries including "NAT-backed Harbor
agent (NemoAgent)", "inline ATIF verifier class for Harbor verifier import
hooks", and all similar phrases in the README between the noted ranges) so
wording conforms to the docs rules.

In
`@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent_run_wrapper.py`:
- Around line 172-203: The SessionManager is not guaranteed to be shut down on
exceptions; wrap the lifecycle so that after creating session_manager via
SessionManager.create (and before exiting the function) you always call await
session_manager.shutdown() in a finally block — e.g., create session_manager,
then use try: with WorkflowBuilder.from_config(...): ... await runner.result()
... finally: await session_manager.shutdown(); ensure any intermediate_task
handling (pull_intermediate, _write_trajectory) and session_manager.session /
session.run blocks remain inside the try so shutdown runs on all error paths.
- Around line 206-227: The current __main__ block manually scans sys.argv and
strips "--trajectory-output" which corrupts free-form instruction text; replace
that logic with argparse: define a positional "config_path" argument, a
positional "instruction" argument using nargs="+" (to capture the whole prompt
as a list), and an optional "--trajectory-output" (or "--trajectory_output")
that stores into trajectory_path, then join the instruction list into a single
string and call asyncio.run(main(config_path, instruction_str,
trajectory_path)); update the code paths around the existing trajectory_path
variable and the call to main to use the parsed values.

In `@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent.py`:
- Around line 47-81: Convert the mutable CLI_FLAGS list to an immutable tuple by
replacing the list literal with a tuple while preserving the unpack of
HarborNemoAgent.CLI_FLAGS and all CliFlag entries; also rename the env_fallback
value for the "local_install_policy" CliFlag from "HARBOR_LOCAL_INSTALL_POLICY"
to "NAT_HARBOR_LOCAL_INSTALL_POLICY" to follow the NAT_ prefix convention so the
symbol CLI_FLAGS (and the specific CliFlag "local_install_policy") are updated
accordingly.
- Around line 206-212: The current implementation resolves the API key via
_resolve_api_key(), injects it into _generate_config_yaml(), and writes that
YAML into logs_dir (config_path), which risks persisting secrets; instead change
the flow so the generated config with secrets is never written to logs_dir:
generate the config in a secure temporary location (e.g.,
tempfile.TemporaryDirectory) or avoid embedding the key in the file by setting
the credential via environment variable or an OS-level secret (export into
process env) and have _generate_config_yaml() produce a redacted file without
secrets; ensure _generate_config_yaml, _resolve_api_key and the code that writes
config_path are updated so only a redacted config (no API key) is written to
logs_dir or the real config is written to a temp secure path and returned as its
secure path, and add explicit redaction of any API key substring before any
write to logs_dir.

In `@packages/nvidia_nat_harbor/src/nat_harbor/verifier/inline_verifier.py`:
- Around line 102-105: The _write_details method currently calls
json.dumps(details, indent=2) which will raise TypeError for non-JSON-native
objects (Path, datetime, Pydantic models) and cause verifier failure; update
_write_details to robustly serialize details_path by replacing json.dumps with a
safe serializer (e.g., json.dumps(details, indent=2, default=str) or a small
recursive converter that calls .dict() on Pydantic models and str() on
Path/datetime) and keep writing to details_path with encoding="utf-8" so
details.json always gets a readable representation of the evaluator payload even
when it contains non-JSON types.

In `@packages/nvidia_nat_harbor/tests/test_library_mode_contracts.py`:
- Line 35: Replace the hardcoded "/tmp/config.yml" strings (causing Ruff S108)
by using pytest's tmp_path fixture to create a temporary config path and, if
needed, write the config file there; update the calls that pass
config_file="/tmp/config.yml" (both occurrences around the config_file argument)
to pass config_file=str(tmp_path / "config.yml") and ensure the test function
signature includes tmp_path and any setup writes the expected contents to
tmp_path / "config.yml".

In `@packages/nvidia_nat_harbor/upstream-plan.md`:
- Around line 20-23: Replace all prose occurrences of the acronym "NAT" with the
full name "NeMo Agent Toolkit" throughout the document; leave lowercase `nat`
only when it is part of code/package identifiers and ensure those identifiers
are wrapped in backticks (for example keep `nvidia-nat-harbor` but change any
plain NAT references to NeMo Agent Toolkit). Search for bare "NAT" tokens
(including headings and inline text) and update them to "NeMo Agent Toolkit",
verify package or code identifiers containing nat remain lowercase and
backticked, and ensure no remaining uppercase "NAT" exists in prose anywhere in
the file.

---

Minor comments:
In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/README.md`:
- Around line 28-29: Replace the acronym "NAT" in the README sentence that
currently reads "NAT examples" with the full name "NeMo Agent Toolkit examples";
locate the phrase in the helpers description sentence in
harbor_adapters/README.md (the line containing "NAT examples") and update it so
the prose uses "NeMo Agent Toolkit examples" instead of "NAT" to comply with the
Markdown documentation guideline.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_nested/adapter.py`:
- Around line 47-104: Add Google-style docstrings to the public module, the
SimpleCalculatorNestedAdapter class, and its public methods (__init__,
make_local_task_id, list_available_tasks, generate_task, generate_many)
describing parameters, return values, and behavior; also update the __init__
signature to include the explicit return type "-> None". Ensure the docstrings
follow the project's Google-style format and cover parameter types (task_dir,
source_file, source_id, local_task_id, overwrite) and what each method returns
(make_local_task_id -> str, list_available_tasks -> list[str], generate_task ->
bool, generate_many -> tuple[int, int]) so the public API is fully documented.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/run_adapter.py`:
- Around line 1-3: Replace the current two-line SPDX snippet at the top of
run_adapter.py with the full Apache-2.0 SPDX header template used across the
repo (copy the standard multi-line header from any other new source file in this
PR); ensure you keep the same file-level comment format, paste the full header
block in place of the existing lines, and save—no other logic changes required
in run_adapter.py.
- Around line 11-20: The fallback import retry currently catches any
ModuleNotFoundError (which can be raised for transitive imports) and masks the
real error; modify the except block to catch ModuleNotFoundError as e and only
perform the sys.path rewrite and re-import when e.name (or e.args[0]) equals
"adapter" (i.e., the missing module is truly adapter); otherwise re-raise the
exception so transitive import errors surface. Apply this change around the
current import block that references DEFAULT_SOURCE_DATA and
SimpleCalculatorPowerOfTwoAdapter and keep the existing SCRIPT_DIR / sys.path
insert logic for the adapter-only case.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/tests/evaluate.py`:
- Around line 29-54: The verifier currently returns early from evaluate.py
(checks around answer_path.exists(), empty answer_text, and no candidate_values)
without emitting the details artifact; update the failure paths so that before
each early return you build the same details dict (include fields:
"expected_value" from ground_truth, "tolerance", "candidate_values" (empty list
if not parsed), "passed": False, and an additional "error" or "reason" string
like "missing answer file", "empty answer", or "no numeric values") and ensure
Path("/logs/verifier").mkdir(parents=True, exist_ok=True) is called and
Path("/logs/verifier/details.json").write_text(json.dumps(details, indent=2),
encoding="utf-8") is executed so details.json is written on every exit path
(reference variables/functions: answer_path, answer_text, expected_value,
tolerance, candidate_values, details, _extract_numbers).

In
`@examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-nested-harbor-eval.yaml`:
- Around line 23-45: The YAML file uses 2-space indentation but must follow the
repo policy of 4-space indentation; update every nested block (sections like
calculator, functions -> power_of_two, llms -> nim_llm, and workflow) to use
4-space indents, ensure keys like multiply_fn: calculator__multiply and
workflow.tool_names include proper 4-space nesting, remove any tabs, and save
the file with a single trailing newline at the end.

In `@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/__init__.py`:
- Around line 21-25: The __all__ list is not alphabetized causing Ruff RUF022;
open the module's __all__ definition and reorder the exported symbol names
alphabetically (e.g., ensure entries for is_local_install_allowed, NemoAgent,
resolve_local_install_policy are sorted by name) so the __all__ list is
lexicographically sorted to satisfy the lint rule.

In
`@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent_run_wrapper.py`:
- Around line 53-79: Change the env var keys from the legacy "NVIDIA_NAT_*" to
the repo-standard "NAT_*" in the logging setup and debugpy logic: update the
top-level _log_level_str lookup to read os.environ.get("NAT_LOG_LEVEL",
"WARNING").upper() (optionally falling back to "NVIDIA_NAT_LOG_LEVEL" if present
for compatibility), change os.environ.setdefault to use "NAT_LOG_LEVEL", and
update maybe_enable_debugpy to read "NAT_DEBUGPY_PORT", "NAT_DEBUGPY_HOST" and
"NAT_DEBUGPY_WAIT_FOR_CLIENT" (again you may accept the NVIDIA_ prefixed names
as a fallback). Make these edits in the module-level code that defines
_log_level_str and in the maybe_enable_debugpy function so all references and
side-effects use the NAT_* environment variable names.

In `@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent.py`:
- Around line 47-81: Update the env_fallbacks in the CLI_FLAGS list to use the
NAT_ namespace: change NVIDIA_NAT_WORKFLOW_PACKAGES -> NAT_WORKFLOW_PACKAGES for
the CliFlag "workflow_packages", change HARBOR_LOCAL_INSTALL_POLICY ->
NAT_LOCAL_INSTALL_POLICY for the CliFlag "local_install_policy", and change
NVIDIA_NAT_PYTHON_BIN -> NAT_PYTHON_BIN for the CliFlag "python_bin"; keep the
CliFlag names and other properties unchanged so only the env_fallback strings in
CLI_FLAGS are renamed to follow the NAT_ convention.

In `@packages/nvidia_nat_harbor/src/nat_harbor/environments/local.py`:
- Around line 53-61: The super().__init__ call in LocalEnvironment's constructor
currently places *args after keyword arguments which triggers Ruff B026; reorder
the call so *args comes before the named keyword params and **kwargs comes last
(i.e., call super().__init__(*args, environment_dir=..., environment_name=...,
session_id=..., trial_paths=..., task_env_config=..., **kwargs)) to comply with
PEP 3102.
- Around line 190-220: Add explicit return type annotations "-> None" to the
public mutating async methods in the Local environment class: stop, upload_file,
upload_dir, download_file, and download_dir (the same style as the existing
start method). Update each async def signature to include "-> None" while
keeping existing parameter types (e.g., Path | str and str) and ensure
imports/type constructs remain unchanged.

In `@packages/nvidia_nat_harbor/tests/test_nemo_agent_bridge.py`:
- Around line 69-70: Remove the explicit pytest.mark.asyncio decorator from the
async test function test_install_skips_local_mode_by_default; open the test
definition (async def test_install_skips_local_mode_by_default(...)) and delete
the preceding `@pytest.mark.asyncio` line so the test relies on the repository's
configured async test runner instead.

In `@packages/nvidia_nat_harbor/upstream-plan.md`:
- Around line 141-142: Replace the possessive phrase "Harbor's `nemo-agent`"
with a non-possessive construction such as "the `nemo-agent` component of
Harbor" (or similar) in the upstream-plan.md content so the sentence reads
non-possessively; locate the sentence containing “Harbor's `nemo-agent` owns
library mode directly…” and update it accordingly.

---

Nitpick comments:
In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/adapter.py`:
- Line 47: Update the public API type hints: add an explicit return annotation
"-> None" to the constructor definition (__init__) and change the collection
parameter type on generate_many from a concrete list/other to
typing.Sequence[str] (import Sequence from collections.abc or typing as needed);
ensure the new annotations are applied to both occurrences referenced in the
diff so the public API uses Python 3.11+ return hints and an abstract sequence
type for inputs.

In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator/adapter.py`:
- Line 47: Update the public method type hints: change the constructor signature
for __init__ to declare a return type of None (def __init__(self, task_dir:
Path, source_file: Path | None = None) -> None) and modify generate_many to
accept a Sequence[str] instead of a concrete list type (e.g., def
generate_many(self, inputs: Sequence[str]) -> ...), using typing collections.abc
abstractions to satisfy the public API typing guidelines.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b6a03ab8-8699-4059-a6f6-ae6e62719273

📥 Commits

Reviewing files that changed from the base of the PR and between 2cfdf1f and 2f772fe.

⛔ Files ignored due to path filters (1)
  • packages/nvidia_nat_harbor/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (44)
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor-eval-readme.md
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/README.md
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/__init__.py
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/common.py
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator/adapter.py
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator/run_adapter.py
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_nested/adapter.py
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_nested/run_adapter.py
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/adapter.py
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/run_adapter.py
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/environment/Dockerfile
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/instruction.md
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/solution/solve.sh
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/task.toml
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/tests/evaluate.py
  • examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/tests/test.sh
  • examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-nested-harbor-eval.yaml
  • examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/scripts/harbor_sample_evaluators.py
  • packages/nvidia_nat_harbor/README.md
  • packages/nvidia_nat_harbor/pyproject.toml
  • packages/nvidia_nat_harbor/src/nat_harbor/__init__.py
  • packages/nvidia_nat_harbor/src/nat_harbor/agents/__init__.py
  • packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/__init__.py
  • packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/inline_runner.py
  • packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/library_mode.py
  • packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/local_install_policy.py
  • packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent.py
  • packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent_run_wrapper.py
  • packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/policy.py
  • packages/nvidia_nat_harbor/src/nat_harbor/environments/__init__.py
  • packages/nvidia_nat_harbor/src/nat_harbor/environments/local.py
  • packages/nvidia_nat_harbor/src/nat_harbor/meta/pypi.md
  • packages/nvidia_nat_harbor/src/nat_harbor/verifier/__init__.py
  • packages/nvidia_nat_harbor/src/nat_harbor/verifier/bridge_runner.py
  • packages/nvidia_nat_harbor/src/nat_harbor/verifier/evaluator_adapter.py
  • packages/nvidia_nat_harbor/src/nat_harbor/verifier/inline_verifier.py
  • packages/nvidia_nat_harbor/tests/test_atif_bridge_runner.py
  • packages/nvidia_nat_harbor/tests/test_harbor_adapter_template.py
  • packages/nvidia_nat_harbor/tests/test_harbor_inline_verifier.py
  • packages/nvidia_nat_harbor/tests/test_inline_verifier_driver.py
  • packages/nvidia_nat_harbor/tests/test_library_mode_contracts.py
  • packages/nvidia_nat_harbor/tests/test_local_mode_runtime.py
  • packages/nvidia_nat_harbor/tests/test_nemo_agent_bridge.py
  • packages/nvidia_nat_harbor/upstream-plan.md

Comment thread packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent.py Outdated
Comment thread packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent.py Outdated
Comment thread packages/nvidia_nat_harbor/src/nat_harbor/verifier/inline_verifier.py Outdated
Comment thread packages/nvidia_nat_harbor/tests/test_library_mode_contracts.py Outdated
Comment thread packages/nvidia_nat_harbor/upstream-plan.md Outdated
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature request New feature or request non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant