Add nvidia-nat-harbor for Harbor local and inline evaluation#1892
Add nvidia-nat-harbor for Harbor local and inline evaluation#1892AnuradhaKaruppiah wants to merge 34 commits intoNVIDIA:developfrom
Conversation
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughThis pull request introduces a new Harbor integration package ( Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
|
/ok to test eddf95a |
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
|
/ok to test 2f772fe |
|
@CodeRabbit review full |
|
✅ Actions performedFull review triggered. |
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
|
/ok to test 853b12d |
There was a problem hiding this comment.
Actionable comments posted: 13
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🟡 Minor comments (13)
packages/nvidia_nat_harbor/upstream-plan.md-141-142 (1)
141-142:⚠️ Potential issue | 🟡 MinorAvoid inanimate possessive wording in Markdown.
Please reword “Harbor's
nemo-agent…” to a non-possessive form (for example, “thenemo-agentcomponent of Harbor …”).As per coding guidelines: “Documentation in Markdown files should not contain usage of a possessive 's with inanimate objects.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_harbor/upstream-plan.md` around lines 141 - 142, Replace the possessive phrase "Harbor's `nemo-agent`" with a non-possessive construction such as "the `nemo-agent` component of Harbor" (or similar) in the upstream-plan.md content so the sentence reads non-possessively; locate the sentence containing “Harbor's `nemo-agent` owns library mode directly…” and update it accordingly.examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/tests/evaluate.py-29-54 (1)
29-54:⚠️ Potential issue | 🟡 MinorWrite verifier
details.jsonon all failure paths, not only after parsing succeeds.Line 29 through Line 43 can return early before any details artifact is emitted, which reduces debuggability for failed trials.
🔧 Suggested structure
+def _write_details(details: dict[str, object]) -> None: + Path("/logs/verifier").mkdir(parents=True, exist_ok=True) + Path("/logs/verifier/details.json").write_text(json.dumps(details, indent=2), encoding="utf-8") + def main() -> int: - ground_truth = _load_ground_truth() + details: dict[str, object] = {"passed": False, "candidate_values": []} + ground_truth = _load_ground_truth() answer_path = Path("/workspace/answer.txt") if not answer_path.exists(): print("Missing /workspace/answer.txt") + details["error"] = "missing_answer_file" + _write_details(details) return 1 @@ if not answer_text: print("Answer is empty") + details["error"] = "empty_answer" + _write_details(details) return 1 @@ if not candidate_values: print("No numeric values found in answer") + details["error"] = "no_numeric_values" + _write_details(details) return 1 @@ - Path("/logs/verifier").mkdir(parents=True, exist_ok=True) - Path("/logs/verifier/details.json").write_text(json.dumps(details, indent=2), encoding="utf-8") + _write_details(details) return 0 if passed else 1🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/tests/evaluate.py` around lines 29 - 54, The verifier currently returns early from evaluate.py (checks around answer_path.exists(), empty answer_text, and no candidate_values) without emitting the details artifact; update the failure paths so that before each early return you build the same details dict (include fields: "expected_value" from ground_truth, "tolerance", "candidate_values" (empty list if not parsed), "passed": False, and an additional "error" or "reason" string like "missing answer file", "empty answer", or "no numeric values") and ensure Path("/logs/verifier").mkdir(parents=True, exist_ok=True) is called and Path("/logs/verifier/details.json").write_text(json.dumps(details, indent=2), encoding="utf-8") is executed so details.json is written on every exit path (reference variables/functions: answer_path, answer_text, expected_value, tolerance, candidate_values, details, _extract_numbers).examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/README.md-28-29 (1)
28-29:⚠️ Potential issue | 🟡 MinorReplace
NATacronym in prose.Line 28 uses “NAT examples” in Markdown text. Please spell this out as “NeMo Agent Toolkit examples” (acronym usage is only allowed inside package/code identifiers).
Suggested wording
-helpers so future NAT examples only define source-row loading, instruction +helpers so future NeMo Agent Toolkit examples only define source-row loading, instructionAs per coding guidelines: "Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/README.md` around lines 28 - 29, Replace the acronym "NAT" in the README sentence that currently reads "NAT examples" with the full name "NeMo Agent Toolkit examples"; locate the phrase in the helpers description sentence in harbor_adapters/README.md (the line containing "NAT examples") and update it so the prose uses "NeMo Agent Toolkit examples" instead of "NAT" to comply with the Markdown documentation guideline.packages/nvidia_nat_harbor/src/nat_harbor/environments/local.py-53-61 (1)
53-61:⚠️ Potential issue | 🟡 MinorResolve Ruff
B026: Restructuresuper().__init__call.Line 59 places
*argsafter keyword arguments, which violates RuffB026. Reorder the call to move*argsbefore keyword arguments (*argsmust precede any keyword arguments per PEP 3102).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_harbor/src/nat_harbor/environments/local.py` around lines 53 - 61, The super().__init__ call in LocalEnvironment's constructor currently places *args after keyword arguments which triggers Ruff B026; reorder the call so *args comes before the named keyword params and **kwargs comes last (i.e., call super().__init__(*args, environment_dir=..., environment_name=..., session_id=..., trial_paths=..., task_env_config=..., **kwargs)) to comply with PEP 3102.packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/__init__.py-21-25 (1)
21-25:⚠️ Potential issue | 🟡 MinorSort
__all__to satisfy RuffRUF022.
__all__is not sorted in alphabetical order and will fail linting checks.Suggested ordering
__all__ = [ + "NemoAgent", "is_local_install_allowed", - "NemoAgent", "resolve_local_install_policy", ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/__init__.py` around lines 21 - 25, The __all__ list is not alphabetized causing Ruff RUF022; open the module's __all__ definition and reorder the exported symbol names alphabetically (e.g., ensure entries for is_local_install_allowed, NemoAgent, resolve_local_install_policy are sorted by name) so the __all__ list is lexicographically sorted to satisfy the lint rule.packages/nvidia_nat_harbor/src/nat_harbor/environments/local.py-190-220 (1)
190-220:⚠️ Potential issue | 🟡 MinorAdd explicit
-> Noneannotations on public mutating coroutines.
stop,upload_file,upload_dir,download_file, anddownload_dirare missing explicit return type annotations. Thestartmethod in the same class demonstrates the correct pattern with-> None. Please add return type annotations to align with the requirement that all public APIs have Python 3.11+ type hints.Suggested signature updates
- async def stop(self, delete: bool): + async def stop(self, delete: bool) -> None: @@ - async def upload_file(self, source_path: Path | str, target_path: str): + async def upload_file(self, source_path: Path | str, target_path: str) -> None: @@ - async def upload_dir(self, source_dir: Path | str, target_dir: str): + async def upload_dir(self, source_dir: Path | str, target_dir: str) -> None: @@ - async def download_file(self, source_path: str, target_path: Path | str): + async def download_file(self, source_path: str, target_path: Path | str) -> None: @@ - async def download_dir(self, source_dir: str, target_dir: Path | str): + async def download_dir(self, source_dir: str, target_dir: Path | str) -> None:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_harbor/src/nat_harbor/environments/local.py` around lines 190 - 220, Add explicit return type annotations "-> None" to the public mutating async methods in the Local environment class: stop, upload_file, upload_dir, download_file, and download_dir (the same style as the existing start method). Update each async def signature to include "-> None" while keeping existing parameter types (e.g., Path | str and str) and ensure imports/type constructs remain unchanged.examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-nested-harbor-eval.yaml-23-45 (1)
23-45:⚠️ Potential issue | 🟡 MinorUse 4-space indentation for this YAML file.
Current nesting uses 2 spaces, but repo policy for
.yamlrequires 4 spaces.🧹 Suggested formatting update
function_groups: - calculator: - _type: calculator + calculator: + _type: calculator functions: - power_of_two: - _type: power_of_two - multiply_fn: calculator__multiply + power_of_two: + _type: power_of_two + multiply_fn: calculator__multiply llms: - nim_llm: - _type: nim - model_name: nvidia/nemotron-3-nano-30b-a3b - temperature: 0.0 - max_tokens: 1024 - chat_template_kwargs: - enable_thinking: false + nim_llm: + _type: nim + model_name: nvidia/nemotron-3-nano-30b-a3b + temperature: 0.0 + max_tokens: 1024 + chat_template_kwargs: + enable_thinking: false workflow: - _type: react_agent - tool_names: [power_of_two, calculator__multiply] - llm_name: nim_llm - verbose: true - parse_agent_response_max_retries: 3 + _type: react_agent + tool_names: [power_of_two, calculator__multiply] + llm_name: nim_llm + verbose: true + parse_agent_response_max_retries: 3As per coding guidelines, “Indent with 4 spaces, never tabs, and ensure every file ends with a single newline.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-nested-harbor-eval.yaml` around lines 23 - 45, The YAML file uses 2-space indentation but must follow the repo policy of 4-space indentation; update every nested block (sections like calculator, functions -> power_of_two, llms -> nim_llm, and workflow) to use 4-space indents, ensure keys like multiply_fn: calculator__multiply and workflow.tool_names include proper 4-space nesting, remove any tabs, and save the file with a single trailing newline at the end.examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/run_adapter.py-1-3 (1)
1-3:⚠️ Potential issue | 🟡 MinorUse the full Apache header template here as well.
Every other new source file in this PR includes the complete SPDX Apache-2.0 header block, but this script only has the two SPDX lines. Please copy the standard template for consistency and compliance. As per coding guidelines, "All source files must include the SPDX Apache-2.0 header template (copy from an existing file)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/run_adapter.py` around lines 1 - 3, Replace the current two-line SPDX snippet at the top of run_adapter.py with the full Apache-2.0 SPDX header template used across the repo (copy the standard multi-line header from any other new source file in this PR); ensure you keep the same file-level comment format, paste the full header block in place of the existing lines, and save—no other logic changes required in run_adapter.py.packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent_run_wrapper.py-53-79 (1)
53-79:⚠️ Potential issue | 🟡 MinorAlign the internal env vars with the repo-standard
NAT_naming.This wrapper reads
NVIDIA_NAT_LOG_LEVEL/NVIDIA_NAT_DEBUGPY_*, but the rest of the stack usesNAT_..., and this file even writes back toNAT_LOG_LEVEL. That means setting the standardNAT_LOG_LEVELalone will not affect the wrapper's own logging. As per coding guidelines, "Use 'nat' for the API namespace and CLI tool, 'nvidia-nat' for the package name, 'NAT_' prefix for environment variables, and 'NeMo-Agent-Toolkit' for URLs and directory names".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent_run_wrapper.py` around lines 53 - 79, Change the env var keys from the legacy "NVIDIA_NAT_*" to the repo-standard "NAT_*" in the logging setup and debugpy logic: update the top-level _log_level_str lookup to read os.environ.get("NAT_LOG_LEVEL", "WARNING").upper() (optionally falling back to "NVIDIA_NAT_LOG_LEVEL" if present for compatibility), change os.environ.setdefault to use "NAT_LOG_LEVEL", and update maybe_enable_debugpy to read "NAT_DEBUGPY_PORT", "NAT_DEBUGPY_HOST" and "NAT_DEBUGPY_WAIT_FOR_CLIENT" (again you may accept the NVIDIA_ prefixed names as a fallback). Make these edits in the module-level code that defines _log_level_str and in the maybe_enable_debugpy function so all references and side-effects use the NAT_* environment variable names.examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/run_adapter.py-11-20 (1)
11-20:⚠️ Potential issue | 🟡 MinorOnly retry the local import when the missing module is actually
adapter.
ModuleNotFoundErroralso fires for transitive imports insideadapter.py. In that case this fallback rewritessys.pathand retries instead of surfacing the real dependency error.Suggested fix
-try: - from adapter import DEFAULT_SOURCE_DATA - from adapter import SimpleCalculatorPowerOfTwoAdapter -except ModuleNotFoundError: +try: + from adapter import DEFAULT_SOURCE_DATA + from adapter import SimpleCalculatorPowerOfTwoAdapter +except ModuleNotFoundError as exc: + if exc.name != "adapter": + raise SCRIPT_DIR = Path(__file__).resolve().parent if str(SCRIPT_DIR) not in sys.path: sys.path.insert(0, str(SCRIPT_DIR)) from adapter import DEFAULT_SOURCE_DATA from adapter import SimpleCalculatorPowerOfTwoAdapter🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/run_adapter.py` around lines 11 - 20, The fallback import retry currently catches any ModuleNotFoundError (which can be raised for transitive imports) and masks the real error; modify the except block to catch ModuleNotFoundError as e and only perform the sys.path rewrite and re-import when e.name (or e.args[0]) equals "adapter" (i.e., the missing module is truly adapter); otherwise re-raise the exception so transitive import errors surface. Apply this change around the current import block that references DEFAULT_SOURCE_DATA and SimpleCalculatorPowerOfTwoAdapter and keep the existing SCRIPT_DIR / sys.path insert logic for the adapter-only case.packages/nvidia_nat_harbor/tests/test_nemo_agent_bridge.py-69-70 (1)
69-70:⚠️ Potential issue | 🟡 MinorRemove
@pytest.mark.asynciodecorators from async tests.Repository policy requires async tests to run via the configured async runner without explicit decorators. Per coding guidelines: "Async tests are automatically detected and run by the async runner—the decorator is unnecessary clutter."
Proposed fix
-@pytest.mark.asyncio async def test_install_skips_local_mode_by_default(tmp_path: Path) -> None:-@pytest.mark.asyncio async def test_install_allows_local_mode_with_opt_in(tmp_path: Path) -> None:-@pytest.mark.asyncio async def test_setup_installs_multiple_local_packages_in_order(tmp_path: Path) -> None:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_harbor/tests/test_nemo_agent_bridge.py` around lines 69 - 70, Remove the explicit pytest.mark.asyncio decorator from the async test function test_install_skips_local_mode_by_default; open the test definition (async def test_install_skips_local_mode_by_default(...)) and delete the preceding `@pytest.mark.asyncio` line so the test relies on the repository's configured async test runner instead.packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent.py-47-81 (1)
47-81:⚠️ Potential issue | 🟡 MinorRename the new env fallbacks to the
NAT_namespace.
NVIDIA_NAT_WORKFLOW_PACKAGES,HARBOR_LOCAL_INSTALL_POLICY, andNVIDIA_NAT_PYTHON_BINdo not follow the repo convention used elsewhere in this feature (NAT_HARBOR_...). Locking these names in now will make the Harbor bridge harder to document and keep consistent.As per coding guidelines, "Use 'nat' for the API namespace and CLI tool, 'nvidia-nat' for the package name, 'NAT_' prefix for environment variables, and 'NeMo-Agent-Toolkit' for URLs and directory names".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent.py` around lines 47 - 81, Update the env_fallbacks in the CLI_FLAGS list to use the NAT_ namespace: change NVIDIA_NAT_WORKFLOW_PACKAGES -> NAT_WORKFLOW_PACKAGES for the CliFlag "workflow_packages", change HARBOR_LOCAL_INSTALL_POLICY -> NAT_LOCAL_INSTALL_POLICY for the CliFlag "local_install_policy", and change NVIDIA_NAT_PYTHON_BIN -> NAT_PYTHON_BIN for the CliFlag "python_bin"; keep the CliFlag names and other properties unchanged so only the env_fallback strings in CLI_FLAGS are renamed to follow the NAT_ convention.examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_nested/adapter.py-47-104 (1)
47-104:⚠️ Potential issue | 🟡 MinorAdd the missing public-API docs and
__init__return type.
SimpleCalculatorNestedAdapter.__init__,make_local_task_id(),list_available_tasks(),generate_task(), andgenerate_many()are all public entry points in a new package, but they ship without the required Google-style docstrings, and__init__is missing-> None.As per coding guidelines, "All public APIs require Python 3.11+ type hints on parameters and return values" and "Provide Google-style docstrings for every public module, class, function and CLI command".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_nested/adapter.py` around lines 47 - 104, Add Google-style docstrings to the public module, the SimpleCalculatorNestedAdapter class, and its public methods (__init__, make_local_task_id, list_available_tasks, generate_task, generate_many) describing parameters, return values, and behavior; also update the __init__ signature to include the explicit return type "-> None". Ensure the docstrings follow the project's Google-style format and cover parameter types (task_dir, source_file, source_id, local_task_id, overwrite) and what each method returns (make_local_task_id -> str, list_available_tasks -> list[str], generate_task -> bool, generate_many -> tuple[int, int]) so the public API is fully documented.
🧹 Nitpick comments (2)
examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/adapter.py (1)
47-47: Tighten public API typing on constructor and collection parameters.Please add
-> Noneto__init__and preferSequence[str]forgenerate_manyinput to avoid over-constraining callers.As per coding guidelines, "All public APIs require Python 3.11+ type hints on parameters and return values" and "Prefer collections.abc / typing abstractions (Sequence over list)".Proposed fix
+from collections.abc import Sequence @@ - def __init__(self, task_dir: Path, source_file: Path | None = None): + def __init__(self, task_dir: Path, source_file: Path | None = None) -> None: @@ - def generate_many(self, source_ids: list[str], *, overwrite: bool = True) -> tuple[int, int]: + def generate_many(self, source_ids: Sequence[str], *, overwrite: bool = True) -> tuple[int, int]:Also applies to: 91-91
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/adapter.py` at line 47, Update the public API type hints: add an explicit return annotation "-> None" to the constructor definition (__init__) and change the collection parameter type on generate_many from a concrete list/other to typing.Sequence[str] (import Sequence from collections.abc or typing as needed); ensure the new annotations are applied to both occurrences referenced in the diff so the public API uses Python 3.11+ return hints and an abstract sequence type for inputs.examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator/adapter.py (1)
47-47: Apply the same public typing cleanup here for consistency.
__init__should declare-> None, andgenerate_manyshould acceptSequence[str]to keep the API flexible.As per coding guidelines, "All public APIs require Python 3.11+ type hints on parameters and return values" and "Prefer collections.abc / typing abstractions (Sequence over list)".
Also applies to: 93-93
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator/adapter.py` at line 47, Update the public method type hints: change the constructor signature for __init__ to declare a return type of None (def __init__(self, task_dir: Path, source_file: Path | None = None) -> None) and modify generate_many to accept a Sequence[str] instead of a concrete list type (e.g., def generate_many(self, inputs: Sequence[str]) -> ...), using typing collections.abc abstractions to satisfy the public API typing guidelines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/common.py`:
- Around line 76-78: Before performing the division assign, add a guard that
checks if rhs == 0 to avoid ZeroDivisionError: if rhs is zero, set expected to
None (or another sentinel) and return operation, lhs, rhs, expected (or raise a
clear ValueError with a descriptive message) instead of computing lhs / rhs;
update the branch where expected is computed (the code that sets expected = lhs
/ rhs and returns operation, lhs, rhs, expected) to perform this check and
handle the zero case gracefully using the existing variables operation, lhs,
rhs, expected.
In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_nested/adapter.py`:
- Around line 81-90: The generate_task method currently constructs output_dir
from local_task_id and may rmtree it without validating the input; ensure
local_task_id cannot escape self.task_dir by validating/sanitizing before any
delete: convert local_task_id to a Path, reject absolute paths and any path
containing ".." (or resolve and assert
output_dir.is_relative_to(self.task_dir)), and only proceed with
shutil.rmtree/output_dir.mkdir if the resolved output_dir is inside
self.task_dir; update generate_task to raise an error on invalid local_task_id
rather than performing any filesystem operations.
In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/environment/Dockerfile`:
- Around line 4-12: The Dockerfile runs as root and installs packages without
--no-install-recommends; create a non-root user and switch to it, tighten
apt-get flags, and minimize image footprint: update the RUN apt-get command in
the Dockerfile to use apt-get install --no-install-recommends -y and remove
unnecessary packages, create a dedicated user (e.g., "appuser") and group,
ensure WORKDIR ownership is changed (chown) to that user, and add a USER
instruction to run the container as the non-root user while keeping CMD
["/bin/bash"].
In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor-eval-readme.md`:
- Around line 23-34: Update the README prose to conform to Markdown rules:
replace every occurrence of the acronym "NAT" with "NeMo Agent Toolkit", remove
possessive forms like "Harbor's verifier import hook" (e.g., change to "the
Harbor verifier import hook" or "Harbor verifier import hook"), change heading
text "Builtin" to "built-in", and ensure other inanimate possessive uses are
rewritten; keep the technical references intact (NemoAgent, --model, --ak
config_file=..., llms, nat_harbor.environments.local:LocalEnvironment / --env
docker, tests/test.sh, --ak library_mode=true,
nat_harbor.verifier.inline_verifier:ATIFInlineVerifier) while applying these
wording changes throughout lines ~98-165 and the shown section.
In
`@examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/scripts/harbor_sample_evaluators.py`:
- Line 22: The public function artifact_presence_evaluator currently leaves the
atif_samples parameter untyped; add a 3.11+ type hint (e.g., atif_samples:
list[dict[str, Any]] or Sequence[dict[str, Any]]) to the function signature and
ensure Any/Sequence are imported (or use built-in generic types) so the
signature reads something like artifact_presence_evaluator(atif_samples:
list[dict[str, Any]], artifact_path: str | None = None) -> dict[str, Any];
update the imports to include typing symbols if needed and keep the existing
return type intact.
In `@packages/nvidia_nat_harbor/README.md`:
- Around line 20-29: The README uses forbidden shorthand and possessive forms;
update all plain "NAT" occurrences to "NeMo Agent Toolkit" (e.g., change
"NAT-backed Harbor agent (NemoAgent)" to "NeMo Agent Toolkit-backed Harbor agent
(NemoAgent)" and any other "NAT" mentions) and remove possessive "'s" on
inanimate objects (e.g., replace "Harbor's verifier import hook" with "the
verifier import hook for Harbor" or "the verifier import hook used by Harbor");
apply the same normalization to all affected sections (notably the excerpt
around "nvidia-nat-harbor", the bullet list entries including "NAT-backed Harbor
agent (NemoAgent)", "inline ATIF verifier class for Harbor verifier import
hooks", and all similar phrases in the README between the noted ranges) so
wording conforms to the docs rules.
In
`@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent_run_wrapper.py`:
- Around line 172-203: The SessionManager is not guaranteed to be shut down on
exceptions; wrap the lifecycle so that after creating session_manager via
SessionManager.create (and before exiting the function) you always call await
session_manager.shutdown() in a finally block — e.g., create session_manager,
then use try: with WorkflowBuilder.from_config(...): ... await runner.result()
... finally: await session_manager.shutdown(); ensure any intermediate_task
handling (pull_intermediate, _write_trajectory) and session_manager.session /
session.run blocks remain inside the try so shutdown runs on all error paths.
- Around line 206-227: The current __main__ block manually scans sys.argv and
strips "--trajectory-output" which corrupts free-form instruction text; replace
that logic with argparse: define a positional "config_path" argument, a
positional "instruction" argument using nargs="+" (to capture the whole prompt
as a list), and an optional "--trajectory-output" (or "--trajectory_output")
that stores into trajectory_path, then join the instruction list into a single
string and call asyncio.run(main(config_path, instruction_str,
trajectory_path)); update the code paths around the existing trajectory_path
variable and the call to main to use the parsed values.
In `@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent.py`:
- Around line 47-81: Convert the mutable CLI_FLAGS list to an immutable tuple by
replacing the list literal with a tuple while preserving the unpack of
HarborNemoAgent.CLI_FLAGS and all CliFlag entries; also rename the env_fallback
value for the "local_install_policy" CliFlag from "HARBOR_LOCAL_INSTALL_POLICY"
to "NAT_HARBOR_LOCAL_INSTALL_POLICY" to follow the NAT_ prefix convention so the
symbol CLI_FLAGS (and the specific CliFlag "local_install_policy") are updated
accordingly.
- Around line 206-212: The current implementation resolves the API key via
_resolve_api_key(), injects it into _generate_config_yaml(), and writes that
YAML into logs_dir (config_path), which risks persisting secrets; instead change
the flow so the generated config with secrets is never written to logs_dir:
generate the config in a secure temporary location (e.g.,
tempfile.TemporaryDirectory) or avoid embedding the key in the file by setting
the credential via environment variable or an OS-level secret (export into
process env) and have _generate_config_yaml() produce a redacted file without
secrets; ensure _generate_config_yaml, _resolve_api_key and the code that writes
config_path are updated so only a redacted config (no API key) is written to
logs_dir or the real config is written to a temp secure path and returned as its
secure path, and add explicit redaction of any API key substring before any
write to logs_dir.
In `@packages/nvidia_nat_harbor/src/nat_harbor/verifier/inline_verifier.py`:
- Around line 102-105: The _write_details method currently calls
json.dumps(details, indent=2) which will raise TypeError for non-JSON-native
objects (Path, datetime, Pydantic models) and cause verifier failure; update
_write_details to robustly serialize details_path by replacing json.dumps with a
safe serializer (e.g., json.dumps(details, indent=2, default=str) or a small
recursive converter that calls .dict() on Pydantic models and str() on
Path/datetime) and keep writing to details_path with encoding="utf-8" so
details.json always gets a readable representation of the evaluator payload even
when it contains non-JSON types.
In `@packages/nvidia_nat_harbor/tests/test_library_mode_contracts.py`:
- Line 35: Replace the hardcoded "/tmp/config.yml" strings (causing Ruff S108)
by using pytest's tmp_path fixture to create a temporary config path and, if
needed, write the config file there; update the calls that pass
config_file="/tmp/config.yml" (both occurrences around the config_file argument)
to pass config_file=str(tmp_path / "config.yml") and ensure the test function
signature includes tmp_path and any setup writes the expected contents to
tmp_path / "config.yml".
In `@packages/nvidia_nat_harbor/upstream-plan.md`:
- Around line 20-23: Replace all prose occurrences of the acronym "NAT" with the
full name "NeMo Agent Toolkit" throughout the document; leave lowercase `nat`
only when it is part of code/package identifiers and ensure those identifiers
are wrapped in backticks (for example keep `nvidia-nat-harbor` but change any
plain NAT references to NeMo Agent Toolkit). Search for bare "NAT" tokens
(including headings and inline text) and update them to "NeMo Agent Toolkit",
verify package or code identifiers containing nat remain lowercase and
backticked, and ensure no remaining uppercase "NAT" exists in prose anywhere in
the file.
---
Minor comments:
In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/README.md`:
- Around line 28-29: Replace the acronym "NAT" in the README sentence that
currently reads "NAT examples" with the full name "NeMo Agent Toolkit examples";
locate the phrase in the helpers description sentence in
harbor_adapters/README.md (the line containing "NAT examples") and update it so
the prose uses "NeMo Agent Toolkit examples" instead of "NAT" to comply with the
Markdown documentation guideline.
In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_nested/adapter.py`:
- Around line 47-104: Add Google-style docstrings to the public module, the
SimpleCalculatorNestedAdapter class, and its public methods (__init__,
make_local_task_id, list_available_tasks, generate_task, generate_many)
describing parameters, return values, and behavior; also update the __init__
signature to include the explicit return type "-> None". Ensure the docstrings
follow the project's Google-style format and cover parameter types (task_dir,
source_file, source_id, local_task_id, overwrite) and what each method returns
(make_local_task_id -> str, list_available_tasks -> list[str], generate_task ->
bool, generate_many -> tuple[int, int]) so the public API is fully documented.
In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/run_adapter.py`:
- Around line 1-3: Replace the current two-line SPDX snippet at the top of
run_adapter.py with the full Apache-2.0 SPDX header template used across the
repo (copy the standard multi-line header from any other new source file in this
PR); ensure you keep the same file-level comment format, paste the full header
block in place of the existing lines, and save—no other logic changes required
in run_adapter.py.
- Around line 11-20: The fallback import retry currently catches any
ModuleNotFoundError (which can be raised for transitive imports) and masks the
real error; modify the except block to catch ModuleNotFoundError as e and only
perform the sys.path rewrite and re-import when e.name (or e.args[0]) equals
"adapter" (i.e., the missing module is truly adapter); otherwise re-raise the
exception so transitive import errors surface. Apply this change around the
current import block that references DEFAULT_SOURCE_DATA and
SimpleCalculatorPowerOfTwoAdapter and keep the existing SCRIPT_DIR / sys.path
insert logic for the adapter-only case.
In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/tests/evaluate.py`:
- Around line 29-54: The verifier currently returns early from evaluate.py
(checks around answer_path.exists(), empty answer_text, and no candidate_values)
without emitting the details artifact; update the failure paths so that before
each early return you build the same details dict (include fields:
"expected_value" from ground_truth, "tolerance", "candidate_values" (empty list
if not parsed), "passed": False, and an additional "error" or "reason" string
like "missing answer file", "empty answer", or "no numeric values") and ensure
Path("/logs/verifier").mkdir(parents=True, exist_ok=True) is called and
Path("/logs/verifier/details.json").write_text(json.dumps(details, indent=2),
encoding="utf-8") is executed so details.json is written on every exit path
(reference variables/functions: answer_path, answer_text, expected_value,
tolerance, candidate_values, details, _extract_numbers).
In
`@examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-nested-harbor-eval.yaml`:
- Around line 23-45: The YAML file uses 2-space indentation but must follow the
repo policy of 4-space indentation; update every nested block (sections like
calculator, functions -> power_of_two, llms -> nim_llm, and workflow) to use
4-space indents, ensure keys like multiply_fn: calculator__multiply and
workflow.tool_names include proper 4-space nesting, remove any tabs, and save
the file with a single trailing newline at the end.
In `@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/__init__.py`:
- Around line 21-25: The __all__ list is not alphabetized causing Ruff RUF022;
open the module's __all__ definition and reorder the exported symbol names
alphabetically (e.g., ensure entries for is_local_install_allowed, NemoAgent,
resolve_local_install_policy are sorted by name) so the __all__ list is
lexicographically sorted to satisfy the lint rule.
In
`@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent_run_wrapper.py`:
- Around line 53-79: Change the env var keys from the legacy "NVIDIA_NAT_*" to
the repo-standard "NAT_*" in the logging setup and debugpy logic: update the
top-level _log_level_str lookup to read os.environ.get("NAT_LOG_LEVEL",
"WARNING").upper() (optionally falling back to "NVIDIA_NAT_LOG_LEVEL" if present
for compatibility), change os.environ.setdefault to use "NAT_LOG_LEVEL", and
update maybe_enable_debugpy to read "NAT_DEBUGPY_PORT", "NAT_DEBUGPY_HOST" and
"NAT_DEBUGPY_WAIT_FOR_CLIENT" (again you may accept the NVIDIA_ prefixed names
as a fallback). Make these edits in the module-level code that defines
_log_level_str and in the maybe_enable_debugpy function so all references and
side-effects use the NAT_* environment variable names.
In `@packages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent.py`:
- Around line 47-81: Update the env_fallbacks in the CLI_FLAGS list to use the
NAT_ namespace: change NVIDIA_NAT_WORKFLOW_PACKAGES -> NAT_WORKFLOW_PACKAGES for
the CliFlag "workflow_packages", change HARBOR_LOCAL_INSTALL_POLICY ->
NAT_LOCAL_INSTALL_POLICY for the CliFlag "local_install_policy", and change
NVIDIA_NAT_PYTHON_BIN -> NAT_PYTHON_BIN for the CliFlag "python_bin"; keep the
CliFlag names and other properties unchanged so only the env_fallback strings in
CLI_FLAGS are renamed to follow the NAT_ convention.
In `@packages/nvidia_nat_harbor/src/nat_harbor/environments/local.py`:
- Around line 53-61: The super().__init__ call in LocalEnvironment's constructor
currently places *args after keyword arguments which triggers Ruff B026; reorder
the call so *args comes before the named keyword params and **kwargs comes last
(i.e., call super().__init__(*args, environment_dir=..., environment_name=...,
session_id=..., trial_paths=..., task_env_config=..., **kwargs)) to comply with
PEP 3102.
- Around line 190-220: Add explicit return type annotations "-> None" to the
public mutating async methods in the Local environment class: stop, upload_file,
upload_dir, download_file, and download_dir (the same style as the existing
start method). Update each async def signature to include "-> None" while
keeping existing parameter types (e.g., Path | str and str) and ensure
imports/type constructs remain unchanged.
In `@packages/nvidia_nat_harbor/tests/test_nemo_agent_bridge.py`:
- Around line 69-70: Remove the explicit pytest.mark.asyncio decorator from the
async test function test_install_skips_local_mode_by_default; open the test
definition (async def test_install_skips_local_mode_by_default(...)) and delete
the preceding `@pytest.mark.asyncio` line so the test relies on the repository's
configured async test runner instead.
In `@packages/nvidia_nat_harbor/upstream-plan.md`:
- Around line 141-142: Replace the possessive phrase "Harbor's `nemo-agent`"
with a non-possessive construction such as "the `nemo-agent` component of
Harbor" (or similar) in the upstream-plan.md content so the sentence reads
non-possessively; locate the sentence containing “Harbor's `nemo-agent` owns
library mode directly…” and update it accordingly.
---
Nitpick comments:
In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/adapter.py`:
- Line 47: Update the public API type hints: add an explicit return annotation
"-> None" to the constructor definition (__init__) and change the collection
parameter type on generate_many from a concrete list/other to
typing.Sequence[str] (import Sequence from collections.abc or typing as needed);
ensure the new annotations are applied to both occurrences referenced in the
diff so the public API uses Python 3.11+ return hints and an abstract sequence
type for inputs.
In
`@examples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator/adapter.py`:
- Line 47: Update the public method type hints: change the constructor signature
for __init__ to declare a return type of None (def __init__(self, task_dir:
Path, source_file: Path | None = None) -> None) and modify generate_many to
accept a Sequence[str] instead of a concrete list type (e.g., def
generate_many(self, inputs: Sequence[str]) -> ...), using typing collections.abc
abstractions to satisfy the public API typing guidelines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b6a03ab8-8699-4059-a6f6-ae6e62719273
⛔ Files ignored due to path filters (1)
packages/nvidia_nat_harbor/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (44)
examples/evaluation_and_profiling/simple_calculator_eval/harbor-eval-readme.mdexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/README.mdexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/__init__.pyexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/common.pyexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator/adapter.pyexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator/run_adapter.pyexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_nested/adapter.pyexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_nested/run_adapter.pyexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/adapter.pyexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/simple_calculator_power_of_two/run_adapter.pyexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/environment/Dockerfileexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/instruction.mdexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/solution/solve.shexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/task.tomlexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/tests/evaluate.pyexamples/evaluation_and_profiling/simple_calculator_eval/harbor_adapters/template/tests/test.shexamples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-nested-harbor-eval.yamlexamples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/scripts/harbor_sample_evaluators.pypackages/nvidia_nat_harbor/README.mdpackages/nvidia_nat_harbor/pyproject.tomlpackages/nvidia_nat_harbor/src/nat_harbor/__init__.pypackages/nvidia_nat_harbor/src/nat_harbor/agents/__init__.pypackages/nvidia_nat_harbor/src/nat_harbor/agents/installed/__init__.pypackages/nvidia_nat_harbor/src/nat_harbor/agents/installed/inline_runner.pypackages/nvidia_nat_harbor/src/nat_harbor/agents/installed/library_mode.pypackages/nvidia_nat_harbor/src/nat_harbor/agents/installed/local_install_policy.pypackages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent.pypackages/nvidia_nat_harbor/src/nat_harbor/agents/installed/nemo_agent_run_wrapper.pypackages/nvidia_nat_harbor/src/nat_harbor/agents/installed/policy.pypackages/nvidia_nat_harbor/src/nat_harbor/environments/__init__.pypackages/nvidia_nat_harbor/src/nat_harbor/environments/local.pypackages/nvidia_nat_harbor/src/nat_harbor/meta/pypi.mdpackages/nvidia_nat_harbor/src/nat_harbor/verifier/__init__.pypackages/nvidia_nat_harbor/src/nat_harbor/verifier/bridge_runner.pypackages/nvidia_nat_harbor/src/nat_harbor/verifier/evaluator_adapter.pypackages/nvidia_nat_harbor/src/nat_harbor/verifier/inline_verifier.pypackages/nvidia_nat_harbor/tests/test_atif_bridge_runner.pypackages/nvidia_nat_harbor/tests/test_harbor_adapter_template.pypackages/nvidia_nat_harbor/tests/test_harbor_inline_verifier.pypackages/nvidia_nat_harbor/tests/test_inline_verifier_driver.pypackages/nvidia_nat_harbor/tests/test_library_mode_contracts.pypackages/nvidia_nat_harbor/tests/test_local_mode_runtime.pypackages/nvidia_nat_harbor/tests/test_nemo_agent_bridge.pypackages/nvidia_nat_harbor/upstream-plan.md
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Description
Adds
nvidia-nat-harbor, a NAT/Harbor integration package for running NAT workflows in Harbor evaluation jobs.This PR supports:
NemoAgentintegration for NAT workflow configsLocalEnvironmenttests/test.shverifier pathsDocs
Start here:
packages/nvidia_nat_harbor/README.mdexamples/evaluation_and_profiling/simple_calculator_eval/harbor-eval-readme.mdpackages/nvidia_nat_harbor/upstream-plan.mdUpstream Dependencies
This PR currently depends on Harbor-side verifier import-hook support:
VerifierFactoryVerifierConfig.import_pathVerifierConfig.kwargs--verifier-import-path--verifier-kwargUntil that lands in a Harbor release, the docs use the Harbor side branch described in the README.
First-class
--env localis also not yet accepted by Harbor CLI enum validation, so the current local-mode workaround is:--env docker--environment-import-path nat_harbor.environments.local:LocalEnvironmentValidation
Unit tests:
Latest result: 36 passed.
E2E commands from harbor-eval-readme.md were run successfully:
dataset generation: 10/10 tasks generated
shell-mode smoke: mean 1.000, zero exceptions
full shell-mode local job: 10/10, mean 1.000, zero exceptions
builtin trajectory inline verifier: mean 1.000, zero exceptions
builtin tunable-rag inline verifier: mean 1.000, zero exceptions
custom evaluator-ref inline verifier: mean 1.000, zero exceptions
By Submitting this PR I confirm:
Summary by CodeRabbit
New Features
Documentation
Tests