fix(cache-middleware): bound cache size and enforce safe fuzzy threshold by ColinM-sys · Pull Request #1879 · NVIDIA/NeMo-Agent-Toolkit

ColinM-sys · 2026-04-16T22:45:53Z

Summary

`CacheMiddleware` previously had two hardening gaps:

Unbounded cache (dict-backed) — sustained unique inputs grew the cache without limit, a memory-exhaustion DoS surface, and combined with fuzzy matching a long-lived surface for cross-request confusion.
Unrestricted `similarity_threshold` — any float in [0, 1] was accepted. Low values make cache poisoning trivial: a crafted input whose difflib ratio exceeds the threshold against a legitimate cached key hijacks the legitimate user's response. At 0.5, most short/structured inputs collide on first try.

What changed

Back the cache with an `OrderedDict`. Evict the oldest entry (LRU) when the cache exceeds `max_entries`. A cache hit `move_to_end()`s the entry so frequently-useful results are preferred on eviction.
Add `max_entries` parameter (default 1024, must be positive int).
Reject `similarity_threshold < 0.85` at construct time with an actionable error. 1.0 remains recommended.
Non-numeric threshold values now raise instead of failing later.

CWE

CWE-400 — Uncontrolled Resource Consumption
CWE-345 — Insufficient Verification of Data Authenticity (cache-key confusion)

Breaking change

Configs that pass `similarity_threshold < 0.85` will fail at construct time. Intentional — any such value is a cache-poisoning foot-gun. The error message points at the two safe choices:

Use `1.0` (exact match, recommended)
Use `>= 0.85` (fuzzy, reasonable edit tolerance)

Test plan

Existing tests at sub-0.85 thresholds were updated to 0.85/0.9 so they keep exercising fuzzy mode without tripping the new floor.

New coverage in two classes:

`TestSimilarityThresholdFloor`:

Below-floor values (0.0, 0.3, 0.5, 0.7, 0.84) rejected with "outside the safe range"
At-or-above-floor values (0.85, 0.9, 0.95, 1.0) accepted
Threshold > 1.0 rejected
Non-numeric threshold rejected with "must be a number"

`TestMaxEntriesLruEviction`:

Default `max_entries` is positive
Zero or negative `max_entries` rejected
Cache stays at `max_entries` after sustained unique inserts (size = 3 after 10 inserts with max_entries=3; newest 3 retained)
Cache hit promotes entry to MRU: A→B→C, hit A, insert D, B (oldest) is evicted; A and D remain

Syntax-checked locally; pytest deferred to CI (my env is missing `pytest-asyncio`).

🤖 Generated with Claude Code

Signed-off-by: Colin McDonough cmcdonough@50words.com

Summary by CodeRabbit

New Features
- Cache gains a configurable max-entries limit (default 1024) with LRU eviction and promotion on hits.
- Configuration exposes a new max_entries field.
Improvements
- Fuzzy-match logic refined for more consistent matching and clearer insertion logging.
- Various dependency and packaging constraints updated.
Tests
- Added tests covering LRU eviction and adjusted similarity-threshold expectations.
Documentation
- Docs and notebooks updated to version 1.7 links.
Chores
- CI/workflow pins and build/script logging simplified; small container image and tooling updates.

CacheMiddleware previously: 1. Used an unbounded dict for storage — sustained unique inputs would grow the cache without limit, a memory-exhaustion DoS surface. 2. Accepted any similarity_threshold in [0, 1]. Low values make cache poisoning trivial: a crafted input whose difflib ratio exceeds the threshold against a legitimate cached key hijacks the legitimate user's response. At threshold=0.5, most short/structured inputs collide on first try. Changes: - Back the cache with an OrderedDict. Evict the oldest entry (LRU) whenever the cache exceeds max_entries. A cache hit moves the entry to the MRU end so frequently-useful results are preferred on eviction. - Add max_entries to __init__ (default 1024, must be a positive int). - Reject similarity_threshold values below 0.85 at construct time with an actionable error. 1.0 remains recommended (exact match only). - Non-numeric threshold values now raise instead of failing later. Breaking change: existing configs that pass similarity_threshold < 0.85 will fail at construct time. This is intentional — any such value is a cache-poisoning foot-gun. The error message points at the two safe choices (use 1.0, or use a value >= 0.85). Test updates: - test_similarity_computation_for_different_thresholds / _fuzzy_match_caching / test_multiple_similar_entries / test_custom_initialization: bumped from 0.5/0.7/0.8 to values >= 0.85 so they exercise fuzzy matching without tripping the new floor. - Added TestSimilarityThresholdFloor: * below-floor values (0.0, 0.3, 0.5, 0.7, 0.84) are rejected * at/above-floor values (0.85, 0.9, 0.95, 1.0) are accepted * threshold > 1.0 is rejected * non-numeric threshold is rejected - Added TestMaxEntriesLruEviction: * default max_entries is positive * zero or negative max_entries is rejected * cache stays at max_entries under sustained unique input * hit promotes entry to MRU; later insert evicts the older one CWE-400 (resource exhaustion) + CWE-345 (insufficient data authenticity / cache-key confusion). Signed-off-by: ColinM-sys <cmcdonough@50words.com>

copy-pr-bot · 2026-04-16T22:45:58Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-04-16T22:46:08Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

CacheMiddleware now uses an LRU-bounded OrderedDict with a configurable max_entries (default 1024), switches fuzzy matching to difflib.get_close_matches(...), promotes hits with move_to_end, and evicts least-recently-used entries when capacity is exceeded. Constructor gains a max_entries parameter.

Changes

Cohort / File(s)	Summary
Core Cache Middleware Implementation `packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py`	Introduce `_DEFAULT_MAX_CACHE_ENTRIES = 1024`; constructor accepts `max_entries` and stores `_max_entries`. Replace plain `dict` with `OrderedDict[str, Any]`. Replace per-key `SequenceMatcher` scanning with `difflib.get_close_matches(..., n=1, cutoff=similarity_threshold)`. On cache hit call `move_to_end(similar_key)`; on insert move new key to MRU and evict LRU entries via `popitem(last=False)` until `len(self._cache) <= self._max_entries`. Update insertion debug logging to include `size=%d/%d`.
Cache Middleware Config `packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py`	Add `max_entries: int` field (default `_DEFAULT_MAX_CACHE_ENTRIES`, `ge=1`) and import `_DEFAULT_MAX_CACHE_ENTRIES`. Clarify `similarity_threshold` description/validation and note difflib performance/collision considerations.
Middleware Registration `packages/nvidia_nat_core/src/nat/middleware/cache/register.py`	Pass `config.max_entries` into `CacheMiddleware(...)` when constructing the middleware (constructor call expanded to include the new parameter).
Cache Middleware Tests `packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py`	Adjust test thresholds (e.g., 0.8→0.9, etc.) to match changed fuzzy-match behavior. Add `TestMaxEntriesLruEviction` tests covering default `_max_entries` positivity, LRU eviction on overflow, and MRU promotion on cache hits. Update expected similarity comments accordingly.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 51.85% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main changes: bounding cache size and enforcing safe fuzzy threshold, which directly aligns with the PR's primary objectives of addressing CWE-400 (unbounded cache) and CWE-345 (cache-poisoning via unrestricted similarity_threshold).
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py`:
- Around line 86-105: The config schema in cache_middleware_config.py is out of
sync with CacheMiddleware.__init__: update the schema to enforce
similarity_threshold >= 0.85 (or reference _MIN_FUZZY_THRESHOLD if exported)
instead of [0.0,1.0], add a max_entries field with validation (int >=1) and
default matching _DEFAULT_MAX_CACHE_ENTRIES, and update the docstrings/comments
for both similarity_threshold and max_entries to mirror the constructor
semantics so config-driven instantiation cannot pass invalid values; touch the
model class/fields in cache_middleware_config.py and adjust any validators or
examples to match CacheMiddleware.__init__ and the constants used there.
- Around line 111-122: The validation for similarity_threshold should explicitly
reject booleans and non-finite floats: update the checks around
similarity_threshold (which currently reference _MIN_FUZZY_THRESHOLD) to first
reject isinstance(similarity_threshold, bool), then ensure it's an int or float,
and use math.isfinite(similarity_threshold) to reject NaN/inf before doing the
range comparison; adjust the ValueError messages accordingly. Also consider
applying the same bool-rejection to max_entries (which is currently validated as
an int) so True/False cannot pass as valid integers.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1b52b49a-f114-47f3-9289-01aca9986fbd

📥 Commits

Reviewing files that changed from the base of the PR and between a328e8a and 24de910.

📒 Files selected for processing (2)

packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py
packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py

…resholds Per CodeRabbit on NVIDIA#1879: 1. cache_middleware_config.py was out of sync with the new constructor semantics — schema allowed similarity_threshold in [0, 1] and had no max_entries field at all, so config-driven instantiation could pass values the constructor now rejects. Re-export _MIN_FUZZY_THRESHOLD and _DEFAULT_MAX_CACHE_ENTRIES from the module and wire both constants into the schema: similarity_threshold: ge=_MIN_FUZZY_THRESHOLD, le=1.0 max_entries: ge=1, default=_DEFAULT_MAX_CACHE_ENTRIES Docstrings mirror the constructor's rationale (cache-poisoning for threshold, DoS for max_entries). 2. Constructor validation did not reject `bool` (since isinstance(True, int) is True in Python) or non-finite floats (NaN / +inf / -inf). Both are classic config-bug foot-guns: a boolean silently becoming 0 or 1.0, or a parser that hands back NaN from an upstream source, would slip past the range comparison. Add explicit: - bool rejection on similarity_threshold (before the number check) - math.isfinite() check on similarity_threshold - bool rejection on max_entries New tests: - test_threshold_bool_is_rejected (parametrized True / False) - test_threshold_non_finite_is_rejected (parametrized NaN / +inf / -inf) - test_bool_max_entries_is_rejected (parametrized True / False) Signed-off-by: ColinM-sys <cmcdonough@50words.com>

coderabbitai

♻️ Duplicate comments (1)

packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py (1)
59-65: ⚠️ Potential issue | 🟠 Major

Wire max_entries through the config builder.

CacheMiddlewareConfig.max_entries is public here, but packages/nvidia_nat_core/src/nat/middleware/cache/register.py:22-33 still instantiates CacheMiddleware with only enabled_mode and similarity_threshold. Any configured max_entries value is silently ignored, so config-driven deployments always fall back to 1024.
🔧 Suggested follow-up
--- a/packages/nvidia_nat_core/src/nat/middleware/cache/register.py
+++ b/packages/nvidia_nat_core/src/nat/middleware/cache/register.py
@@
-    yield CacheMiddleware(enabled_mode=config.enabled_mode, similarity_threshold=config.similarity_threshold)
+    yield CacheMiddleware(
+        enabled_mode=config.enabled_mode,
+        similarity_threshold=config.similarity_threshold,
+        max_entries=config.max_entries,
+    )
As per coding guidelines, "Keep documentation in sync with code" and "packages/nvidia_nat_core/src/nat/**/*: Changes in packages/nvidia_nat_core/src/nat/**/* should prioritize backward compatibility as this contains core functionality."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py`
around lines 59 - 65, CacheMiddlewareConfig.max_entries is never passed into the
CacheMiddleware instance created in the registration/builder, so any configured
max_entries is ignored; update the instantiation in register.py (where
CacheMiddleware(...) is created with enabled_mode and similarity_threshold) to
also pass config.max_entries (or the corresponding builder argument) into the
CacheMiddleware constructor, and ensure CacheMiddleware's __init__ or factory
accepts and uses that parameter to set the LRU capacity instead of always using
the default.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py`:
- Around line 59-65: CacheMiddlewareConfig.max_entries is never passed into the
CacheMiddleware instance created in the registration/builder, so any configured
max_entries is ignored; update the instantiation in register.py (where
CacheMiddleware(...) is created with enabled_mode and similarity_threshold) to
also pass config.max_entries (or the corresponding builder argument) into the
CacheMiddleware constructor, and ensure CacheMiddleware's __init__ or factory
accepts and uses that parameter to set the LRU capacity instead of always using
the default.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: f69bb38a-9df8-4903-97a2-908e8482841d

📥 Commits

Reviewing files that changed from the base of the PR and between 24de910 and 9f60ef6.

📒 Files selected for processing (3)

packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py
packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py
packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py

✅ Files skipped from review due to trivial changes (1)

packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py

Per CodeRabbit on NVIDIA#1879: CacheMiddlewareConfig.max_entries was added to the config schema but register.py still instantiated CacheMiddleware with only (enabled_mode, similarity_threshold) — so a configured max_entries value was silently ignored and deployments always used the 1024 default. Pass config.max_entries through to the constructor so config-driven LRU sizing actually takes effect. Signed-off-by: ColinM-sys <cmcdonough@50words.com>

dagardner-nv · 2026-04-20T20:58:37Z

@ColinM-sys I think this PR has some value, but it really needs a GitHub issue to be opened to discuss the design of these changes (no need to close the PR).

At a first glance, I'm not sure that similarity_threshold requires a minimum value but instead a mention of the potential consequences are of a lower value.

I don't believe the type-checking around the similarity_threshold attribute is required as this should be enforced by Pydantic, as the intended use-case is to construct this class from the CacheMiddlewareConfig class, anyone attempting to construct an instance of CacheMiddleware another way is on their own.

ColinM-sys · 2026-04-20T22:24:51Z

Opened #1888 for the design discussion. Makes sense on the type-checking since Pydantic handles that at the config layer. On the threshold floor I think there is still value in at least documenting the cache poisoning risk of lower values even if a hard minimum is not appropriate for a core library. Happy to revise once the issue discussion settles on a direction.

Per design discussion in NVIDIA#1888: - Drop similarity_threshold floor from 0.85 to 0 and remove manual type/range validation from the constructor; Pydantic enforces bounds at the config layer (CacheMiddlewareConfig) and anyone constructing CacheMiddleware directly is on their own - Update CacheMiddlewareConfig.similarity_threshold to ge=0 with updated description documenting the performance cost of low values and the cache-collision risk near 0 - Replace the manual difflib.SequenceMatcher loop in _find_similar_key with difflib.get_close_matches for cleaner, more idiomatic code - Remove constructor-level validation tests that no longer apply Signed-off-by: ColinM-sys <cmcdonough@50words.com>

coderabbitai

♻️ Duplicate comments (1)

packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py (1)

79-109: ⚠️ Potential issue | 🟠 Major

Guard max_entries in constructor to prevent runtime eviction failures.

Line 238-239 can raise KeyError when max_entries is zero/negative (especially negative: loop condition stays true even when cache is empty). Since this class is also directly instantiated, add a defensive check in __init__.

🐛 Proposed fix

     def __init__(
         self,
         *,
         enabled_mode: str,
         similarity_threshold: float,
         max_entries: int = _DEFAULT_MAX_CACHE_ENTRIES,
     ) -> None:
@@
         self._enabled_mode = enabled_mode
         self._similarity_threshold = similarity_threshold
+        if not isinstance(max_entries, int) or isinstance(max_entries, bool) or max_entries < 1:
+            raise ValueError(f"max_entries must be a positive integer, got {max_entries!r}")
         # OrderedDict gives O(1) LRU: move_to_end() on hit, popitem(last=False)
         # to evict the oldest when we exceed max_entries.
         self._cache: OrderedDict[str, Any] = OrderedDict()
         self._max_entries = max_entries

Also applies to: 238-239

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py` around
lines 79 - 109, The constructor for CacheMiddleware (__init__) must guard the
max_entries parameter to avoid zero/negative values causing KeyError or infinite
loops during eviction; validate max_entries (the argument and the assigned
self._max_entries) and either raise a ValueError for non-positive values or
coerce to a minimum of 1 before assigning, so later eviction logic (where
entries are popped when len(self._cache) > self._max_entries) cannot loop or pop
from an empty OrderedDict.

🧹 Nitpick comments (3)

packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py (1)

346-349: Make cache-key assertions format-independent.

Line 346 and Line 374 currently assert by substring-matching serialized JSON text. That is brittle to serializer formatting and harder to reason about than exact key checks.

♻️ Suggested refactor

-        latest_keys = list(mw._cache.keys())  # noqa: SLF001
-        for i in range(7, 10):
-            assert any(f"unique_input_{i}" in k for k in latest_keys)
+        expected_keys = {
+            mw._serialize_input({"value": f"unique_input_{i}"})  # noqa: SLF001
+            for i in range(7, 10)
+        }
+        assert set(mw._cache.keys()) == expected_keys  # noqa: SLF001
@@
-        keys = "".join(list(mw._cache.keys()))  # noqa: SLF001
-        assert '"value": "A"' in keys
-        assert '"value": "D"' in keys
-        assert '"value": "B"' not in keys
+        cached_keys = set(mw._cache.keys())  # noqa: SLF001
+        assert mw._serialize_input({"value": "A"}) in cached_keys  # noqa: SLF001
+        assert mw._serialize_input({"value": "D"}) in cached_keys  # noqa: SLF001
+        assert mw._serialize_input({"value": "B"}) not in cached_keys  # noqa: SLF001

Also applies to: 374-377

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py`
around lines 346 - 349, The test currently checks cache keys by
substring-matching serialized JSON which is brittle; update assertions to be
format-independent by parsing or normalizing keys before comparison: grab
mw._cache.keys() in latest_keys, extract/normalize the logical identifier (e.g.,
remove JSON serialization wrapper or parse JSON if keys are JSON strings) and
then assert equality (or membership) against the exact expected strings like
"unique_input_7", "unique_input_8", "unique_input_9" instead of substring
matching; apply the same change to both occurrences around latest_keys and the
block referencing unique_input_i so the tests assert exact keys rather than
serialized fragments.

packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py (1)

22-22: Avoid schema dependency on a private runtime symbol.

Line 22 imports _DEFAULT_MAX_CACHE_ENTRIES from the implementation module. This couples config import-time behavior to middleware internals and a private name. Consider moving the constant to a small shared constants module in nat.middleware.cache and importing it from both files.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py`
at line 22, The config module is importing the private runtime symbol
_DEFAULT_MAX_CACHE_ENTRIES from cache_middleware, creating an implementation
dependency; fix this by extracting the constant into a small shared module
(e.g., create nat.middleware.cache.constants) and expose a public name like
DEFAULT_MAX_CACHE_ENTRIES there, then update both cache_middleware_config.py and
cache_middleware.py to import DEFAULT_MAX_CACHE_ENTRIES from
nat.middleware.cache.constants (and remove uses of the underscored name),
ensuring all references (e.g., in CacheMiddleware and cache config) use the new
public constant.

packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py (1)

233-235: Align the inline comment with actual eviction order.

Line 233 says eviction is enforced “BEFORE insert,” but insertion happens first and then eviction runs. Updating the comment avoids future confusion during maintenance.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py` around
lines 233 - 235, Update the inline comment in the "Phase 3: Postprocess" block
to correctly state that eviction runs after insertion (not before). Edit the
comment that currently reads "Enforce the LRU bound BEFORE insert so the new
entry always lands in a cache of size <= max_entries" to reflect the actual
operation order: insertion occurs first and then eviction/LRU trimming runs, so
reword it to indicate eviction is performed AFTER insert to maintain the cache
size bound. Target the comment in the Phase 3 section of cache_middleware.py
(around the postprocess/cache insert logic) so future readers understand the
real eviction order.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py`:
- Around line 79-109: The constructor for CacheMiddleware (__init__) must guard
the max_entries parameter to avoid zero/negative values causing KeyError or
infinite loops during eviction; validate max_entries (the argument and the
assigned self._max_entries) and either raise a ValueError for non-positive
values or coerce to a minimum of 1 before assigning, so later eviction logic
(where entries are popped when len(self._cache) > self._max_entries) cannot loop
or pop from an empty OrderedDict.

---

Nitpick comments:
In
`@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py`:
- Line 22: The config module is importing the private runtime symbol
_DEFAULT_MAX_CACHE_ENTRIES from cache_middleware, creating an implementation
dependency; fix this by extracting the constant into a small shared module
(e.g., create nat.middleware.cache.constants) and expose a public name like
DEFAULT_MAX_CACHE_ENTRIES there, then update both cache_middleware_config.py and
cache_middleware.py to import DEFAULT_MAX_CACHE_ENTRIES from
nat.middleware.cache.constants (and remove uses of the underscored name),
ensuring all references (e.g., in CacheMiddleware and cache config) use the new
public constant.

In `@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py`:
- Around line 233-235: Update the inline comment in the "Phase 3: Postprocess"
block to correctly state that eviction runs after insertion (not before). Edit
the comment that currently reads "Enforce the LRU bound BEFORE insert so the new
entry always lands in a cache of size <= max_entries" to reflect the actual
operation order: insertion occurs first and then eviction/LRU trimming runs, so
reword it to indicate eviction is performed AFTER insert to maintain the cache
size bound. Target the comment in the Phase 3 section of cache_middleware.py
(around the postprocess/cache insert logic) so future readers understand the
real eviction order.

In `@packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py`:
- Around line 346-349: The test currently checks cache keys by
substring-matching serialized JSON which is brittle; update assertions to be
format-independent by parsing or normalizing keys before comparison: grab
mw._cache.keys() in latest_keys, extract/normalize the logical identifier (e.g.,
remove JSON serialization wrapper or parse JSON if keys are JSON strings) and
then assert equality (or membership) against the exact expected strings like
"unique_input_7", "unique_input_8", "unique_input_9" instead of substring
matching; apply the same change to both occurrences around latest_keys and the
block referencing unique_input_i so the tests assert exact keys rather than
serialized fragments.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 00784370-cf08-4c25-aa8d-b9bf7e28bb18

📥 Commits

Reviewing files that changed from the base of the PR and between ee05bcd and 2036cf2.

📒 Files selected for processing (3)

packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py
packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py
packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py

…resholds Per CodeRabbit on NVIDIA#1879: 1. cache_middleware_config.py was out of sync with the new constructor semantics — schema allowed similarity_threshold in [0, 1] and had no max_entries field at all, so config-driven instantiation could pass values the constructor now rejects. Re-export _MIN_FUZZY_THRESHOLD and _DEFAULT_MAX_CACHE_ENTRIES from the module and wire both constants into the schema: similarity_threshold: ge=_MIN_FUZZY_THRESHOLD, le=1.0 max_entries: ge=1, default=_DEFAULT_MAX_CACHE_ENTRIES Docstrings mirror the constructor's rationale (cache-poisoning for threshold, DoS for max_entries). 2. Constructor validation did not reject `bool` (since isinstance(True, int) is True in Python) or non-finite floats (NaN / +inf / -inf). Both are classic config-bug foot-guns: a boolean silently becoming 0 or 1.0, or a parser that hands back NaN from an upstream source, would slip past the range comparison. Add explicit: - bool rejection on similarity_threshold (before the number check) - math.isfinite() check on similarity_threshold - bool rejection on max_entries New tests: - test_threshold_bool_is_rejected (parametrized True / False) - test_threshold_non_finite_is_rejected (parametrized NaN / +inf / -inf) - test_bool_max_entries_is_rejected (parametrized True / False) Signed-off-by: ColinM-sys <cmcdonough@50words.com>

Per CodeRabbit on NVIDIA#1879: CacheMiddlewareConfig.max_entries was added to the config schema but register.py still instantiated CacheMiddleware with only (enabled_mode, similarity_threshold) — so a configured max_entries value was silently ignored and deployments always used the 1024 default. Pass config.max_entries through to the constructor so config-driven LRU sizing actually takes effect. Signed-off-by: ColinM-sys <cmcdonough@50words.com>

review-notebook-app · 2026-04-21T14:19:37Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ci/scripts/common.sh`:
- Around line 75-77: In function build_wheel update the uv build invocation to
quote the directory argument so paths with spaces or special chars are handled;
specifically, change the --directory $1 usage in the uv build command to pass
the value quoted (reference symbols: build_wheel, uv build, --directory,
WHEELS_DIR) so the shell does not split or misinterpret the path.

In `@ci/scripts/github/build_wheel.sh`:
- Around line 40-42: The invocation expands NAT_EXAMPLE unquoted which can break
argument parsing for paths with spaces or globs; update the call site in the
loop to pass the variable quoted by changing build_wheel ${NAT_EXAMPLE}
"examples" to use a quoted parameter like build_wheel "${NAT_EXAMPLE}"
"examples" so the build_wheel function receives the full path as a single
argument (refer to NAT_EXAMPLE, NAT_EXAMPLES and build_wheel).

In `@ci/scripts/gitlab/artifactory/Dockerfile`:
- Around line 16-18: The Dockerfile currently leaves the container running as
root; update the Dockerfile (the image built from the FROM
ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm and the RUN installing JFrog
CLI) to create or use a non-root user and set USER to that non-root account
before the final image layer. Specifically, add steps to create a dedicated
non-root user/group (e.g., app user), adjust ownership of any required
directories or installed artifacts created by the RUN installation, and then add
a USER app (or chosen username) directive so the container runs as a non-root
user at runtime. Ensure the user has necessary permissions for the installed
tooling and runtime files created by the RUN command.
- Line 18: Replace the unsafe "RUN VERSION=2.99.0 curl -fL
https://install-cli.jfrog.io | sh" pattern in the Dockerfile with a direct
binary download and checksum verification: use the same VERSION variable to
construct the jfrog CLI binary URL, download the binary to a temporary path (do
not pipe to sh), download or embed the official SHA256 checksum for that
VERSION, verify the downloaded file with sha256sum (or equivalent), and only on
successful verification chmod +x and install/move the binary into
/usr/local/bin; ensure the build fails if checksum verification fails and remove
the use of an unconditional curl | sh in the RUN command.

In `@ci/scripts/gitlab/checks.sh`:
- Line 37: Fix the typo in the echo statement that prints "Runing Documentation
checks": locate the echo call containing the string "Runing Documentation
checks" and change it to "Running Documentation checks" so the CI status output
is spelled correctly.

In
`@examples/notebooks/launchables/GPU_Cluster_Sizing_with_NeMo_Agent_Toolkit.ipynb`:
- Around line 1160-1164: The four broken documentation links in the markdown
list (the link strings containing "Full Documentation", "Get Started Guide",
"Examples", "Evaluation Guide", and "Troubleshooting" that currently point to
URLs with "/nemo/agent-toolkit/1.7/") must be fixed by replacing each 1.7 URL
with a validated, working documentation URL (or remove the corresponding list
item if no valid URL exists); locate the exact link strings in the notebook cell
and either (a) update the hrefs to the correct version (confirm the intended
NeMo Agent Toolkit version) or (b) remove the broken entries and add a brief
note pointing to the repository's README or canonical docs; ensure this change
is limited to the notebook cell containing those link strings and include a
concise commit message clarifying whether the version bump to 1.7 was
intentional.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 60096a16-f0dd-4d23-bc9d-6f2f1ef22ec8

📥 Commits

Reviewing files that changed from the base of the PR and between 2036cf2 and 870acac.

⛔ Files ignored due to path filters (7)

examples/agents/uv.lock is excluded by !**/*.lock
examples/finetuning/rl_with_openpipe_art/uv.lock is excluded by !**/*.lock
examples/frameworks/agno_personal_finance/uv.lock is excluded by !**/*.lock
examples/frameworks/multi_frameworks/uv.lock is excluded by !**/*.lock
packages/nvidia_nat_agno/uv.lock is excluded by !**/*.lock
packages/nvidia_nat_openpipe_art/uv.lock is excluded by !**/*.lock
uv.lock is excluded by !**/*.lock

📒 Files selected for processing (34)

.github/workflows/ci_pipe.yml
.github/workflows/pr.yaml
.gitlab-ci.yml
ci/scripts/common.sh
ci/scripts/github/build_wheel.sh
ci/scripts/github/checks.sh
ci/scripts/github/common.sh
ci/scripts/github/docs.sh
ci/scripts/github/tests.sh
ci/scripts/gitlab/artifactory/Dockerfile
ci/scripts/gitlab/artifactory_upload.sh
ci/scripts/gitlab/build_wheel.sh
ci/scripts/gitlab/checks.sh
ci/scripts/gitlab/common.sh
ci/scripts/gitlab/docs.sh
ci/scripts/gitlab/model_health_check.sh
ci/scripts/gitlab/tests.sh
docs/source/versions1.json
examples/frameworks/agno_personal_finance/pyproject.toml
examples/frameworks/multi_frameworks/pyproject.toml
examples/notebooks/getting_started_with_nat.ipynb
examples/notebooks/launchables/GPU_Cluster_Sizing_with_NeMo_Agent_Toolkit.ipynb
examples/notebooks/observability_evaluation_and_profiling.ipynb
packages/nvidia_nat_agno/pyproject.toml
packages/nvidia_nat_core/src/nat/meta/pypi.md
packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py
packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py
packages/nvidia_nat_core/src/nat/middleware/cache/register.py
packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py
packages/nvidia_nat_memmachine/src/nat/plugins/memmachine/memmachine_editor.py
packages/nvidia_nat_openpipe_art/pyproject.toml
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/config.py
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trainer_adapter.py
packages/nvidia_nat_openpipe_art/tests/test_trainer_adapter.py

💤 Files with no reviewable changes (6)

ci/scripts/github/common.sh
examples/frameworks/multi_frameworks/pyproject.toml
packages/nvidia_nat_openpipe_art/tests/test_trainer_adapter.py
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/config.py
examples/frameworks/agno_personal_finance/pyproject.toml
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trainer_adapter.py

✅ Files skipped from review due to trivial changes (8)

ci/scripts/gitlab/docs.sh
ci/scripts/github/tests.sh
ci/scripts/github/checks.sh
examples/notebooks/observability_evaluation_and_profiling.ipynb
packages/nvidia_nat_core/src/nat/meta/pypi.md
examples/notebooks/getting_started_with_nat.ipynb
.github/workflows/pr.yaml
docs/source/versions1.json

🚧 Files skipped from review as they are similar to previous changes (4)

packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py
packages/nvidia_nat_core/src/nat/middleware/cache/register.py
packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py
packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py

coderabbitai · 2026-04-21T14:26:56Z

 function build_wheel() {
-    rapids-logger "Building Wheel for $1"
+    echo "Building Wheel for $1"
    uv build --wheel --no-progress --out-dir "${WHEELS_DIR}/$2" --directory $1


⚠️ Potential issue | 🟡 Minor

Quote the --directory parameter in build_wheel.

Line 77 passes $1 unquoted, which can break uv build argument parsing for non-trivial paths.

Suggested fix

function build_wheel() { echo "Building Wheel for $1" - uv build --wheel --no-progress --out-dir "${WHEELS_DIR}/$2" --directory $1 + uv build --wheel --no-progress --out-dir "${WHEELS_DIR}/$2" --directory "$1" }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

function build_wheel() {

rapids-logger "Building Wheel for $1"

echo "Building Wheel for $1"

uv build --wheel --no-progress --out-dir "${WHEELS_DIR}/$2" --directory $1

function build_wheel() {

echo "Building Wheel for $1"

uv build --wheel --no-progress --out-dir "${WHEELS_DIR}/$2" --directory "$1"

}

🧰 Tools

🪛 Shellcheck (0.11.0)

[info] 77-77: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ci/scripts/common.sh` around lines 75 - 77, In function build_wheel update the uv build invocation to quote the directory argument so paths with spaces or special chars are handled; specifically, change the --directory $1 usage in the uv build command to pass the value quoted (reference symbols: build_wheel, uv build, --directory, WHEELS_DIR) so the shell does not split or misinterpret the path.

coderabbitai · 2026-04-21T14:26:57Z

+for NAT_EXAMPLE in "${NAT_EXAMPLES[@]}"; do
    # places all wheels flat under example
    build_wheel ${NAT_EXAMPLE} "examples"


⚠️ Potential issue | 🟡 Minor

Quote the example path when invoking build_wheel.

Line 42 still expands ${NAT_EXAMPLE} unquoted, so a path with spaces/globs can break argument parsing despite the safer array loop.

Suggested fix

-for NAT_EXAMPLE in "${NAT_EXAMPLES[@]}"; do +for NAT_EXAMPLE in "${NAT_EXAMPLES[@]}"; do # places all wheels flat under example - build_wheel ${NAT_EXAMPLE} "examples" + build_wheel "${NAT_EXAMPLE}" "examples" done

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for NAT_EXAMPLE in "${NAT_EXAMPLES[@]}"; do

# places all wheels flat under example

build_wheel ${NAT_EXAMPLE} "examples"

for NAT_EXAMPLE in "${NAT_EXAMPLES[@]}"; do

# places all wheels flat under example

build_wheel "${NAT_EXAMPLE}" "examples"

done

🧰 Tools

🪛 Shellcheck (0.11.0)

[info] 40-40: Possible misspelling: NAT_EXAMPLES may not be assigned. Did you mean NAT_EXAMPLE?

(SC2153)

[info] 42-42: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ci/scripts/github/build_wheel.sh` around lines 40 - 42, The invocation expands NAT_EXAMPLE unquoted which can break argument parsing for paths with spaces or globs; update the call site in the loop to pass the variable quoted by changing build_wheel ${NAT_EXAMPLE} "examples" to use a quoted parameter like build_wheel "${NAT_EXAMPLE}" "examples" so the build_wheel function receives the full path as a single argument (refer to NAT_EXAMPLE, NAT_EXAMPLES and build_wheel).

coderabbitai · 2026-04-21T14:26:57Z

+FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm
+
+RUN VERSION=2.99.0 curl -fL https://install-cli.jfrog.io | sh


⚠️ Potential issue | 🟠 Major

Run the helper image as a non-root user.

The image currently does not set USER, so runtime defaults can remain root (Trivy DS-0002).

🛡️ Proposed hardening diff

FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm RUN VERSION=2.99.0 curl -fL https://install-cli.jfrog.io | sh +RUN useradd --create-home --uid 10001 appuser +USER appuser

As per coding guidelines "Check for security vulnerabilities and potential issues."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm

RUN VERSION=2.99.0 curl -fL https://install-cli.jfrog.io | sh

FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm

RUN VERSION=2.99.0 curl -fL https://install-cli.jfrog.io | sh

RUN useradd --create-home --uid 10001 appuser

USER appuser

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ci/scripts/gitlab/artifactory/Dockerfile` around lines 16 - 18, The Dockerfile currently leaves the container running as root; update the Dockerfile (the image built from the FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm and the RUN installing JFrog CLI) to create or use a non-root user and set USER to that non-root account before the final image layer. Specifically, add steps to create a dedicated non-root user/group (e.g., app user), adjust ownership of any required directories or installed artifacts created by the RUN installation, and then add a USER app (or chosen username) directive so the container runs as a non-root user at runtime. Ensure the user has necessary permissions for the installed tooling and runtime files created by the RUN command.

coderabbitai · 2026-04-21T14:26:57Z

+
+FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm
+
+RUN VERSION=2.99.0 curl -fL https://install-cli.jfrog.io | sh


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify whether the Dockerfile installs JFrog CLI via remote script without checksum verification. rg -n 'install-cli\.jfrog\.io|curl\s+-fL|sha256sum|gpg|cosign' ci/scripts/gitlab/artifactory/Dockerfile

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 133

🏁 Script executed:

# Verify the Dockerfile exists and examine full context around line 18 cat -n ci/scripts/gitlab/artifactory/Dockerfile

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 991

🏁 Script executed:

# Search for other similar curl patterns in Dockerfiles fd -e Dockerfile -x rg 'curl.*\|.*sh' {}

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 51

Harden JFrog CLI installation against supply-chain risks by removing curl | sh pattern.

Line 18 executes a remote installation script without integrity verification, which weakens supply-chain security for CI images. The -fL flags hide errors and follow redirects, increasing attack surface. Replace with direct binary download and SHA256 checksum verification:

Proposed hardening diff

FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm -RUN VERSION=2.99.0 curl -fL https://install-cli.jfrog.io | sh +ARG JF_VERSION=2.99.0 +ARG JF_SHA256 +RUN curl -fsSL "https://releases.jfrog.io/artifactory/jfrog-cli/v2-jf/${JF_VERSION}/jfrog-cli-linux-amd64/jf" -o /usr/local/bin/jf \ + && echo "${JF_SHA256} /usr/local/bin/jf" | sha256sum -c - \ + && chmod +x /usr/local/bin/jf

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ci/scripts/gitlab/artifactory/Dockerfile` at line 18, Replace the unsafe "RUN VERSION=2.99.0 curl -fL https://install-cli.jfrog.io | sh" pattern in the Dockerfile with a direct binary download and checksum verification: use the same VERSION variable to construct the jfrog CLI binary URL, download the binary to a temporary path (do not pipe to sh), download or embed the official SHA256 checksum for that VERSION, verify the downloaded file with sha256sum (or equivalent), and only on successful verification chmod +x and install/move the binary into /usr/local/bin; ensure the build fails if checksum verification fails and remove the use of an unconditional curl | sh in the RUN command.

coderabbitai · 2026-04-21T14:26:57Z



-rapids-logger "Runing Documentation checks"
+echo "Runing Documentation checks"


⚠️ Potential issue | 🟡 Minor

Fix typo in CI status output.

Line 37 prints "Runing Documentation checks"; please change to "Running Documentation checks".

✏️ Proposed diff

-echo "Runing Documentation checks" +echo "Running Documentation checks"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

echo "Runing Documentation checks"

echo "Running Documentation checks"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ci/scripts/gitlab/checks.sh` at line 37, Fix the typo in the echo statement that prints "Runing Documentation checks": locate the echo call containing the string "Runing Documentation checks" and change it to "Running Documentation checks" so the CI status output is spelled correctly.

coderabbitai · 2026-04-21T14:26:57Z

+    "- **Documentation**: [Full Documentation](https://docs.nvidia.com/nemo/agent-toolkit/1.7/index.html)\n",
+    "- **Get Started Guide**: [Getting Started](https://docs.nvidia.com/nemo/agent-toolkit/1.7/get-started/installation.html)\n",
    "- **Examples**: Check the `examples/` directory in the cloned repository\n",
-    "- **Evaluation Guide**: [Evaluate with NeMo Agent Toolkit](https://docs.nvidia.com/nemo/agent-toolkit/1.6/improve-workflows/evaluate.html)\n",
-    "- **Troubleshooting**: [Common Issues](https://docs.nvidia.com/nemo/agent-toolkit/1.6/resources/troubleshooting.html)\n"
+    "- **Evaluation Guide**: [Evaluate with NeMo Agent Toolkit](https://docs.nvidia.com/nemo/agent-toolkit/1.7/improve-workflows/evaluate.html)\n",
+    "- **Troubleshooting**: [Common Issues](https://docs.nvidia.com/nemo/agent-toolkit/1.7/resources/troubleshooting.html)\n"


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Check if the documentation URLs return successful HTTP responses. urls=( "https://docs.nvidia.com/nemo/agent-toolkit/1.7/index.html" "https://docs.nvidia.com/nemo/agent-toolkit/1.7/get-started/installation.html" "https://docs.nvidia.com/nemo/agent-toolkit/1.7/improve-workflows/evaluate.html" "https://docs.nvidia.com/nemo/agent-toolkit/1.7/resources/troubleshooting.html" ) for url in "${urls[@]}"; do echo "Checking: $url" http_code=$(curl -s -o /dev/null -w "%{http_code}" "$url") if [ "$http_code" -eq 200 ]; then echo " ✓ OK ($http_code)" else echo " ✗ FAILED ($http_code)" fi done

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 468

Update or remove broken documentation links (all return 404 errors).

The four documentation URLs in lines 1160–1164 have been updated to version 1.7, but all return HTTP 404 errors:

Full Documentation

Getting Started / installation

Evaluate with NeMo Agent Toolkit

Troubleshooting

Either update the URLs to point to valid documentation or remove the broken references. Additionally, clarify whether this documentation version update is intentional for this PR or should be addressed separately, as it appears unrelated to the stated cache-middleware objectives.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@examples/notebooks/launchables/GPU_Cluster_Sizing_with_NeMo_Agent_Toolkit.ipynb` around lines 1160 - 1164, The four broken documentation links in the markdown list (the link strings containing "Full Documentation", "Get Started Guide", "Examples", "Evaluation Guide", and "Troubleshooting" that currently point to URLs with "/nemo/agent-toolkit/1.7/") must be fixed by replacing each 1.7 URL with a validated, working documentation URL (or remove the corresponding list item if no valid URL exists); locate the exact link strings in the notebook cell and either (a) update the hrefs to the correct version (confirm the intended NeMo Agent Toolkit version) or (b) remove the broken entries and add a brief note pointing to the repository's README or canonical docs; ensure this change is limited to the notebook cell containing those link strings and include a concise commit message clarifying whether the version bump to 1.7 was intentional.

…resholds Per CodeRabbit on NVIDIA#1879: 1. cache_middleware_config.py was out of sync with the new constructor semantics — schema allowed similarity_threshold in [0, 1] and had no max_entries field at all, so config-driven instantiation could pass values the constructor now rejects. Re-export _MIN_FUZZY_THRESHOLD and _DEFAULT_MAX_CACHE_ENTRIES from the module and wire both constants into the schema: similarity_threshold: ge=_MIN_FUZZY_THRESHOLD, le=1.0 max_entries: ge=1, default=_DEFAULT_MAX_CACHE_ENTRIES Docstrings mirror the constructor's rationale (cache-poisoning for threshold, DoS for max_entries). 2. Constructor validation did not reject `bool` (since isinstance(True, int) is True in Python) or non-finite floats (NaN / +inf / -inf). Both are classic config-bug foot-guns: a boolean silently becoming 0 or 1.0, or a parser that hands back NaN from an upstream source, would slip past the range comparison. Add explicit: - bool rejection on similarity_threshold (before the number check) - math.isfinite() check on similarity_threshold - bool rejection on max_entries New tests: - test_threshold_bool_is_rejected (parametrized True / False) - test_threshold_non_finite_is_rejected (parametrized NaN / +inf / -inf) - test_bool_max_entries_is_rejected (parametrized True / False) Signed-off-by: ColinM-sys <cmcdonough@50words.com>

Per CodeRabbit on NVIDIA#1879: CacheMiddlewareConfig.max_entries was added to the config schema but register.py still instantiated CacheMiddleware with only (enabled_mode, similarity_threshold) — so a configured max_entries value was silently ignored and deployments always used the 1024 default. Pass config.max_entries through to the constructor so config-driven LRU sizing actually takes effect. Signed-off-by: ColinM-sys <cmcdonough@50words.com>

dagardner-nv · 2026-04-21T15:39:42Z

        self._cache[input_str] = result
-        logger.debug("Cached result for function %s", context.name)
+        self._cache.move_to_end(input_str)


Is this call to self._cache.move_to_end(input_str) needed? If it was just inserted on line 236, shouldn't it already be at the end?

dagardner-nv · 2026-04-21T15:41:57Z

+# exhaustion DoS and, combined with fuzzy matching, a long-lived surface for
+# cross-request confusion. OrderedDict-backed LRU evicts the oldest entry
+# when the cache exceeds this bound.
+_DEFAULT_MAX_CACHE_ENTRIES = 1024


I don't think this is needed as a global const, the default value should be in the config class, since we only construct CacheMiddleware from the config object, the constructor similarly doesn't need a default value.

dagardner-nv · 2026-04-21T15:42:44Z

+            "inputs return the same cached response."),
+    )
+    max_entries: int = Field(
+        default=_DEFAULT_MAX_CACHE_ENTRIES,


Suggested change

default=_DEFAULT_MAX_CACHE_ENTRIES,

default=1024,

Remove the const global, 1024 isn't a number that needs to be a const.

dagardner-nv · 2026-04-21T15:43:27Z

    def test_custom_initialization(self):
        """Test custom initialization."""
-        middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.8)
+        middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.9)


Revert, I don't think this is needed

dagardner-nv · 2026-04-21T15:43:39Z

    async def test_fuzzy_match_caching(self, middleware_context):
        """Test fuzzy matching with similarity_threshold < 1.0."""
-        middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.8)
+        middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.9)


Same, not needed

dagardner-nv · 2026-04-21T15:43:51Z

-        middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.5)
+        middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.9)


dagardner-nv · 2026-04-21T15:44:07Z

-        middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.7)
+        middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.85)


ColinM-sys requested a review from a team as a code owner April 16, 2026 22:45

coderabbitai Bot reviewed Apr 16, 2026

View reviewed changes

Comment thread packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py Outdated

Comment thread packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py Outdated

coderabbitai Bot reviewed Apr 16, 2026

View reviewed changes

ColinM-sys mentioned this pull request Apr 20, 2026

Cache middleware: design discussion for similarity_threshold floor and max_entries bound #1888

Open

coderabbitai Bot reviewed Apr 20, 2026

View reviewed changes

ColinM-sys force-pushed the fix/cache-middleware-poisoning-and-bounds branch from 2036cf2 to 870acac Compare April 21, 2026 14:19

ColinM-sys requested a review from a team as a code owner April 21, 2026 14:19

coderabbitai Bot reviewed Apr 21, 2026

View reviewed changes

ColinM-sys force-pushed the fix/cache-middleware-poisoning-and-bounds branch 2 times, most recently from ec97055 to 2036cf2 Compare April 21, 2026 14:37

dagardner-nv reviewed Apr 21, 2026

View reviewed changes

dagardner-nv self-assigned this Apr 22, 2026

		FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm

		RUN VERSION=2.99.0 curl -fL https://install-cli.jfrog.io \| sh



		rapids-logger "Runing Documentation checks"
		echo "Runing Documentation checks"

	echo "Runing Documentation checks"
	echo "Running Documentation checks"

		middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.5)
		middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.9)

Conversation

ColinM-sys commented Apr 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

CWE

Breaking change

Test plan

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented Apr 16, 2026

Uh oh!

coderabbitai Bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

dagardner-nv commented Apr 20, 2026

Uh oh!

ColinM-sys commented Apr 20, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

review-notebook-app Bot commented Apr 21, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

dagardner-nv Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

dagardner-nv Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

dagardner-nv Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

dagardner-nv Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

dagardner-nv Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

dagardner-nv Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

dagardner-nv Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

ColinM-sys commented Apr 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 16, 2026 •

edited

Loading