Skip to content

fix(cache-middleware): bound cache size and enforce safe fuzzy threshold#1879

Open
ColinM-sys wants to merge 4 commits intoNVIDIA:developfrom
ColinM-sys:fix/cache-middleware-poisoning-and-bounds
Open

fix(cache-middleware): bound cache size and enforce safe fuzzy threshold#1879
ColinM-sys wants to merge 4 commits intoNVIDIA:developfrom
ColinM-sys:fix/cache-middleware-poisoning-and-bounds

Conversation

@ColinM-sys
Copy link
Copy Markdown

@ColinM-sys ColinM-sys commented Apr 16, 2026

Summary

`CacheMiddleware` previously had two hardening gaps:

  1. Unbounded cache (dict-backed) — sustained unique inputs grew the cache without limit, a memory-exhaustion DoS surface, and combined with fuzzy matching a long-lived surface for cross-request confusion.
  2. Unrestricted `similarity_threshold` — any float in [0, 1] was accepted. Low values make cache poisoning trivial: a crafted input whose difflib ratio exceeds the threshold against a legitimate cached key hijacks the legitimate user's response. At 0.5, most short/structured inputs collide on first try.

What changed

  • Back the cache with an `OrderedDict`. Evict the oldest entry (LRU) when the cache exceeds `max_entries`. A cache hit `move_to_end()`s the entry so frequently-useful results are preferred on eviction.
  • Add `max_entries` parameter (default 1024, must be positive int).
  • Reject `similarity_threshold < 0.85` at construct time with an actionable error. 1.0 remains recommended.
  • Non-numeric threshold values now raise instead of failing later.

CWE

  • CWE-400 — Uncontrolled Resource Consumption
  • CWE-345 — Insufficient Verification of Data Authenticity (cache-key confusion)

Breaking change

Configs that pass `similarity_threshold < 0.85` will fail at construct time. Intentional — any such value is a cache-poisoning foot-gun. The error message points at the two safe choices:

  • Use `1.0` (exact match, recommended)
  • Use `>= 0.85` (fuzzy, reasonable edit tolerance)

Test plan

Existing tests at sub-0.85 thresholds were updated to 0.85/0.9 so they keep exercising fuzzy mode without tripping the new floor.

New coverage in two classes:

`TestSimilarityThresholdFloor`:

  • Below-floor values (0.0, 0.3, 0.5, 0.7, 0.84) rejected with "outside the safe range"
  • At-or-above-floor values (0.85, 0.9, 0.95, 1.0) accepted
  • Threshold > 1.0 rejected
  • Non-numeric threshold rejected with "must be a number"

`TestMaxEntriesLruEviction`:

  • Default `max_entries` is positive
  • Zero or negative `max_entries` rejected
  • Cache stays at `max_entries` after sustained unique inserts (size = 3 after 10 inserts with max_entries=3; newest 3 retained)
  • Cache hit promotes entry to MRU: A→B→C, hit A, insert D, B (oldest) is evicted; A and D remain

Syntax-checked locally; pytest deferred to CI (my env is missing `pytest-asyncio`).

🤖 Generated with Claude Code

Signed-off-by: Colin McDonough cmcdonough@50words.com

Summary by CodeRabbit

  • New Features

    • Cache gains a configurable max-entries limit (default 1024) with LRU eviction and promotion on hits.
    • Configuration exposes a new max_entries field.
  • Improvements

    • Fuzzy-match logic refined for more consistent matching and clearer insertion logging.
    • Various dependency and packaging constraints updated.
  • Tests

    • Added tests covering LRU eviction and adjusted similarity-threshold expectations.
  • Documentation

    • Docs and notebooks updated to version 1.7 links.
  • Chores

    • CI/workflow pins and build/script logging simplified; small container image and tooling updates.

CacheMiddleware previously:
  1. Used an unbounded dict for storage — sustained unique inputs would
     grow the cache without limit, a memory-exhaustion DoS surface.
  2. Accepted any similarity_threshold in [0, 1]. Low values make cache
     poisoning trivial: a crafted input whose difflib ratio exceeds the
     threshold against a legitimate cached key hijacks the legitimate
     user's response. At threshold=0.5, most short/structured inputs
     collide on first try.

Changes:
  - Back the cache with an OrderedDict. Evict the oldest entry (LRU)
    whenever the cache exceeds max_entries. A cache hit moves the entry
    to the MRU end so frequently-useful results are preferred on eviction.
  - Add max_entries to __init__ (default 1024, must be a positive int).
  - Reject similarity_threshold values below 0.85 at construct time with
    an actionable error. 1.0 remains recommended (exact match only).
  - Non-numeric threshold values now raise instead of failing later.

Breaking change: existing configs that pass similarity_threshold < 0.85
will fail at construct time. This is intentional — any such value is a
cache-poisoning foot-gun. The error message points at the two safe
choices (use 1.0, or use a value >= 0.85).

Test updates:
  - test_similarity_computation_for_different_thresholds / _fuzzy_match_caching
    / test_multiple_similar_entries / test_custom_initialization: bumped
    from 0.5/0.7/0.8 to values >= 0.85 so they exercise fuzzy matching
    without tripping the new floor.
  - Added TestSimilarityThresholdFloor:
      * below-floor values (0.0, 0.3, 0.5, 0.7, 0.84) are rejected
      * at/above-floor values (0.85, 0.9, 0.95, 1.0) are accepted
      * threshold > 1.0 is rejected
      * non-numeric threshold is rejected
  - Added TestMaxEntriesLruEviction:
      * default max_entries is positive
      * zero or negative max_entries is rejected
      * cache stays at max_entries under sustained unique input
      * hit promotes entry to MRU; later insert evicts the older one

CWE-400 (resource exhaustion) + CWE-345 (insufficient data authenticity /
cache-key confusion).

Signed-off-by: ColinM-sys <cmcdonough@50words.com>
@ColinM-sys ColinM-sys requested a review from a team as a code owner April 16, 2026 22:45
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 16, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

CacheMiddleware now uses an LRU-bounded OrderedDict with a configurable max_entries (default 1024), switches fuzzy matching to difflib.get_close_matches(...), promotes hits with move_to_end, and evicts least-recently-used entries when capacity is exceeded. Constructor gains a max_entries parameter.

Changes

Cohort / File(s) Summary
Core Cache Middleware Implementation
packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py
Introduce _DEFAULT_MAX_CACHE_ENTRIES = 1024; constructor accepts max_entries and stores _max_entries. Replace plain dict with OrderedDict[str, Any]. Replace per-key SequenceMatcher scanning with difflib.get_close_matches(..., n=1, cutoff=similarity_threshold). On cache hit call move_to_end(similar_key); on insert move new key to MRU and evict LRU entries via popitem(last=False) until len(self._cache) <= self._max_entries. Update insertion debug logging to include size=%d/%d.
Cache Middleware Config
packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py
Add max_entries: int field (default _DEFAULT_MAX_CACHE_ENTRIES, ge=1) and import _DEFAULT_MAX_CACHE_ENTRIES. Clarify similarity_threshold description/validation and note difflib performance/collision considerations.
Middleware Registration
packages/nvidia_nat_core/src/nat/middleware/cache/register.py
Pass config.max_entries into CacheMiddleware(...) when constructing the middleware (constructor call expanded to include the new parameter).
Cache Middleware Tests
packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py
Adjust test thresholds (e.g., 0.8→0.9, etc.) to match changed fuzzy-match behavior. Add TestMaxEntriesLruEviction tests covering default _max_entries positivity, LRU eviction on overflow, and MRU promotion on cache hits. Update expected similarity comments accordingly.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 51.85% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main changes: bounding cache size and enforcing safe fuzzy threshold, which directly aligns with the PR's primary objectives of addressing CWE-400 (unbounded cache) and CWE-345 (cache-poisoning via unrestricted similarity_threshold).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py`:
- Around line 86-105: The config schema in cache_middleware_config.py is out of
sync with CacheMiddleware.__init__: update the schema to enforce
similarity_threshold >= 0.85 (or reference _MIN_FUZZY_THRESHOLD if exported)
instead of [0.0,1.0], add a max_entries field with validation (int >=1) and
default matching _DEFAULT_MAX_CACHE_ENTRIES, and update the docstrings/comments
for both similarity_threshold and max_entries to mirror the constructor
semantics so config-driven instantiation cannot pass invalid values; touch the
model class/fields in cache_middleware_config.py and adjust any validators or
examples to match CacheMiddleware.__init__ and the constants used there.
- Around line 111-122: The validation for similarity_threshold should explicitly
reject booleans and non-finite floats: update the checks around
similarity_threshold (which currently reference _MIN_FUZZY_THRESHOLD) to first
reject isinstance(similarity_threshold, bool), then ensure it's an int or float,
and use math.isfinite(similarity_threshold) to reject NaN/inf before doing the
range comparison; adjust the ValueError messages accordingly. Also consider
applying the same bool-rejection to max_entries (which is currently validated as
an int) so True/False cannot pass as valid integers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1b52b49a-f114-47f3-9289-01aca9986fbd

📥 Commits

Reviewing files that changed from the base of the PR and between a328e8a and 24de910.

📒 Files selected for processing (2)
  • packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py
  • packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py

Comment thread packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py Outdated
Comment thread packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py Outdated
…resholds

Per CodeRabbit on NVIDIA#1879:

1. cache_middleware_config.py was out of sync with the new constructor
   semantics — schema allowed similarity_threshold in [0, 1] and had no
   max_entries field at all, so config-driven instantiation could pass
   values the constructor now rejects. Re-export
   _MIN_FUZZY_THRESHOLD and _DEFAULT_MAX_CACHE_ENTRIES from the module
   and wire both constants into the schema:
     similarity_threshold: ge=_MIN_FUZZY_THRESHOLD, le=1.0
     max_entries: ge=1, default=_DEFAULT_MAX_CACHE_ENTRIES
   Docstrings mirror the constructor's rationale (cache-poisoning for
   threshold, DoS for max_entries).

2. Constructor validation did not reject `bool` (since isinstance(True, int)
   is True in Python) or non-finite floats (NaN / +inf / -inf). Both are
   classic config-bug foot-guns: a boolean silently becoming 0 or 1.0,
   or a parser that hands back NaN from an upstream source, would slip
   past the range comparison. Add explicit:
     - bool rejection on similarity_threshold (before the number check)
     - math.isfinite() check on similarity_threshold
     - bool rejection on max_entries

New tests:
  - test_threshold_bool_is_rejected (parametrized True / False)
  - test_threshold_non_finite_is_rejected (parametrized NaN / +inf / -inf)
  - test_bool_max_entries_is_rejected (parametrized True / False)

Signed-off-by: ColinM-sys <cmcdonough@50words.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py (1)

59-65: ⚠️ Potential issue | 🟠 Major

Wire max_entries through the config builder.

CacheMiddlewareConfig.max_entries is public here, but packages/nvidia_nat_core/src/nat/middleware/cache/register.py:22-33 still instantiates CacheMiddleware with only enabled_mode and similarity_threshold. Any configured max_entries value is silently ignored, so config-driven deployments always fall back to 1024.

🔧 Suggested follow-up
--- a/packages/nvidia_nat_core/src/nat/middleware/cache/register.py
+++ b/packages/nvidia_nat_core/src/nat/middleware/cache/register.py
@@
-    yield CacheMiddleware(enabled_mode=config.enabled_mode, similarity_threshold=config.similarity_threshold)
+    yield CacheMiddleware(
+        enabled_mode=config.enabled_mode,
+        similarity_threshold=config.similarity_threshold,
+        max_entries=config.max_entries,
+    )

As per coding guidelines, "Keep documentation in sync with code" and "packages/nvidia_nat_core/src/nat/**/*: Changes in packages/nvidia_nat_core/src/nat/**/* should prioritize backward compatibility as this contains core functionality."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py`
around lines 59 - 65, CacheMiddlewareConfig.max_entries is never passed into the
CacheMiddleware instance created in the registration/builder, so any configured
max_entries is ignored; update the instantiation in register.py (where
CacheMiddleware(...) is created with enabled_mode and similarity_threshold) to
also pass config.max_entries (or the corresponding builder argument) into the
CacheMiddleware constructor, and ensure CacheMiddleware's __init__ or factory
accepts and uses that parameter to set the LRU capacity instead of always using
the default.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py`:
- Around line 59-65: CacheMiddlewareConfig.max_entries is never passed into the
CacheMiddleware instance created in the registration/builder, so any configured
max_entries is ignored; update the instantiation in register.py (where
CacheMiddleware(...) is created with enabled_mode and similarity_threshold) to
also pass config.max_entries (or the corresponding builder argument) into the
CacheMiddleware constructor, and ensure CacheMiddleware's __init__ or factory
accepts and uses that parameter to set the LRU capacity instead of always using
the default.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: f69bb38a-9df8-4903-97a2-908e8482841d

📥 Commits

Reviewing files that changed from the base of the PR and between 24de910 and 9f60ef6.

📒 Files selected for processing (3)
  • packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py
  • packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py
  • packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py
✅ Files skipped from review due to trivial changes (1)
  • packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py

Per CodeRabbit on NVIDIA#1879: CacheMiddlewareConfig.max_entries was added
to the config schema but register.py still instantiated CacheMiddleware
with only (enabled_mode, similarity_threshold) — so a configured
max_entries value was silently ignored and deployments always used the
1024 default.

Pass config.max_entries through to the constructor so config-driven
LRU sizing actually takes effect.

Signed-off-by: ColinM-sys <cmcdonough@50words.com>
@dagardner-nv
Copy link
Copy Markdown
Contributor

@ColinM-sys I think this PR has some value, but it really needs a GitHub issue to be opened to discuss the design of these changes (no need to close the PR).

At a first glance, I'm not sure that similarity_threshold requires a minimum value but instead a mention of the potential consequences are of a lower value.

I don't believe the type-checking around the similarity_threshold attribute is required as this should be enforced by Pydantic, as the intended use-case is to construct this class from the CacheMiddlewareConfig class, anyone attempting to construct an instance of CacheMiddleware another way is on their own.

@ColinM-sys
Copy link
Copy Markdown
Author

Opened #1888 for the design discussion. Makes sense on the type-checking since Pydantic handles that at the config layer. On the threshold floor I think there is still value in at least documenting the cache poisoning risk of lower values even if a hard minimum is not appropriate for a core library. Happy to revise once the issue discussion settles on a direction.

Per design discussion in NVIDIA#1888:

- Drop similarity_threshold floor from 0.85 to 0 and remove manual
  type/range validation from the constructor; Pydantic enforces bounds
  at the config layer (CacheMiddlewareConfig) and anyone constructing
  CacheMiddleware directly is on their own
- Update CacheMiddlewareConfig.similarity_threshold to ge=0 with
  updated description documenting the performance cost of low values
  and the cache-collision risk near 0
- Replace the manual difflib.SequenceMatcher loop in _find_similar_key
  with difflib.get_close_matches for cleaner, more idiomatic code
- Remove constructor-level validation tests that no longer apply

Signed-off-by: ColinM-sys <cmcdonough@50words.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py (1)

79-109: ⚠️ Potential issue | 🟠 Major

Guard max_entries in constructor to prevent runtime eviction failures.

Line 238-239 can raise KeyError when max_entries is zero/negative (especially negative: loop condition stays true even when cache is empty). Since this class is also directly instantiated, add a defensive check in __init__.

🐛 Proposed fix
     def __init__(
         self,
         *,
         enabled_mode: str,
         similarity_threshold: float,
         max_entries: int = _DEFAULT_MAX_CACHE_ENTRIES,
     ) -> None:
@@
         self._enabled_mode = enabled_mode
         self._similarity_threshold = similarity_threshold
+        if not isinstance(max_entries, int) or isinstance(max_entries, bool) or max_entries < 1:
+            raise ValueError(f"max_entries must be a positive integer, got {max_entries!r}")
         # OrderedDict gives O(1) LRU: move_to_end() on hit, popitem(last=False)
         # to evict the oldest when we exceed max_entries.
         self._cache: OrderedDict[str, Any] = OrderedDict()
         self._max_entries = max_entries

Also applies to: 238-239

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py` around
lines 79 - 109, The constructor for CacheMiddleware (__init__) must guard the
max_entries parameter to avoid zero/negative values causing KeyError or infinite
loops during eviction; validate max_entries (the argument and the assigned
self._max_entries) and either raise a ValueError for non-positive values or
coerce to a minimum of 1 before assigning, so later eviction logic (where
entries are popped when len(self._cache) > self._max_entries) cannot loop or pop
from an empty OrderedDict.
🧹 Nitpick comments (3)
packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py (1)

346-349: Make cache-key assertions format-independent.

Line 346 and Line 374 currently assert by substring-matching serialized JSON text. That is brittle to serializer formatting and harder to reason about than exact key checks.

♻️ Suggested refactor
-        latest_keys = list(mw._cache.keys())  # noqa: SLF001
-        for i in range(7, 10):
-            assert any(f"unique_input_{i}" in k for k in latest_keys)
+        expected_keys = {
+            mw._serialize_input({"value": f"unique_input_{i}"})  # noqa: SLF001
+            for i in range(7, 10)
+        }
+        assert set(mw._cache.keys()) == expected_keys  # noqa: SLF001
@@
-        keys = "".join(list(mw._cache.keys()))  # noqa: SLF001
-        assert '"value": "A"' in keys
-        assert '"value": "D"' in keys
-        assert '"value": "B"' not in keys
+        cached_keys = set(mw._cache.keys())  # noqa: SLF001
+        assert mw._serialize_input({"value": "A"}) in cached_keys  # noqa: SLF001
+        assert mw._serialize_input({"value": "D"}) in cached_keys  # noqa: SLF001
+        assert mw._serialize_input({"value": "B"}) not in cached_keys  # noqa: SLF001

Also applies to: 374-377

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py`
around lines 346 - 349, The test currently checks cache keys by
substring-matching serialized JSON which is brittle; update assertions to be
format-independent by parsing or normalizing keys before comparison: grab
mw._cache.keys() in latest_keys, extract/normalize the logical identifier (e.g.,
remove JSON serialization wrapper or parse JSON if keys are JSON strings) and
then assert equality (or membership) against the exact expected strings like
"unique_input_7", "unique_input_8", "unique_input_9" instead of substring
matching; apply the same change to both occurrences around latest_keys and the
block referencing unique_input_i so the tests assert exact keys rather than
serialized fragments.
packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py (1)

22-22: Avoid schema dependency on a private runtime symbol.

Line 22 imports _DEFAULT_MAX_CACHE_ENTRIES from the implementation module. This couples config import-time behavior to middleware internals and a private name. Consider moving the constant to a small shared constants module in nat.middleware.cache and importing it from both files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py`
at line 22, The config module is importing the private runtime symbol
_DEFAULT_MAX_CACHE_ENTRIES from cache_middleware, creating an implementation
dependency; fix this by extracting the constant into a small shared module
(e.g., create nat.middleware.cache.constants) and expose a public name like
DEFAULT_MAX_CACHE_ENTRIES there, then update both cache_middleware_config.py and
cache_middleware.py to import DEFAULT_MAX_CACHE_ENTRIES from
nat.middleware.cache.constants (and remove uses of the underscored name),
ensuring all references (e.g., in CacheMiddleware and cache config) use the new
public constant.
packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py (1)

233-235: Align the inline comment with actual eviction order.

Line 233 says eviction is enforced “BEFORE insert,” but insertion happens first and then eviction runs. Updating the comment avoids future confusion during maintenance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py` around
lines 233 - 235, Update the inline comment in the "Phase 3: Postprocess" block
to correctly state that eviction runs after insertion (not before). Edit the
comment that currently reads "Enforce the LRU bound BEFORE insert so the new
entry always lands in a cache of size <= max_entries" to reflect the actual
operation order: insertion occurs first and then eviction/LRU trimming runs, so
reword it to indicate eviction is performed AFTER insert to maintain the cache
size bound. Target the comment in the Phase 3 section of cache_middleware.py
(around the postprocess/cache insert logic) so future readers understand the
real eviction order.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py`:
- Around line 79-109: The constructor for CacheMiddleware (__init__) must guard
the max_entries parameter to avoid zero/negative values causing KeyError or
infinite loops during eviction; validate max_entries (the argument and the
assigned self._max_entries) and either raise a ValueError for non-positive
values or coerce to a minimum of 1 before assigning, so later eviction logic
(where entries are popped when len(self._cache) > self._max_entries) cannot loop
or pop from an empty OrderedDict.

---

Nitpick comments:
In
`@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py`:
- Line 22: The config module is importing the private runtime symbol
_DEFAULT_MAX_CACHE_ENTRIES from cache_middleware, creating an implementation
dependency; fix this by extracting the constant into a small shared module
(e.g., create nat.middleware.cache.constants) and expose a public name like
DEFAULT_MAX_CACHE_ENTRIES there, then update both cache_middleware_config.py and
cache_middleware.py to import DEFAULT_MAX_CACHE_ENTRIES from
nat.middleware.cache.constants (and remove uses of the underscored name),
ensuring all references (e.g., in CacheMiddleware and cache config) use the new
public constant.

In `@packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py`:
- Around line 233-235: Update the inline comment in the "Phase 3: Postprocess"
block to correctly state that eviction runs after insertion (not before). Edit
the comment that currently reads "Enforce the LRU bound BEFORE insert so the new
entry always lands in a cache of size <= max_entries" to reflect the actual
operation order: insertion occurs first and then eviction/LRU trimming runs, so
reword it to indicate eviction is performed AFTER insert to maintain the cache
size bound. Target the comment in the Phase 3 section of cache_middleware.py
(around the postprocess/cache insert logic) so future readers understand the
real eviction order.

In `@packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py`:
- Around line 346-349: The test currently checks cache keys by
substring-matching serialized JSON which is brittle; update assertions to be
format-independent by parsing or normalizing keys before comparison: grab
mw._cache.keys() in latest_keys, extract/normalize the logical identifier (e.g.,
remove JSON serialization wrapper or parse JSON if keys are JSON strings) and
then assert equality (or membership) against the exact expected strings like
"unique_input_7", "unique_input_8", "unique_input_9" instead of substring
matching; apply the same change to both occurrences around latest_keys and the
block referencing unique_input_i so the tests assert exact keys rather than
serialized fragments.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 00784370-cf08-4c25-aa8d-b9bf7e28bb18

📥 Commits

Reviewing files that changed from the base of the PR and between ee05bcd and 2036cf2.

📒 Files selected for processing (3)
  • packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py
  • packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py
  • packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py

@ColinM-sys ColinM-sys force-pushed the fix/cache-middleware-poisoning-and-bounds branch from 2036cf2 to 870acac Compare April 21, 2026 14:19
@ColinM-sys ColinM-sys requested a review from a team as a code owner April 21, 2026 14:19
ColinM-sys added a commit to ColinM-sys/NeMo-Agent-Toolkit1 that referenced this pull request Apr 21, 2026
…resholds

Per CodeRabbit on NVIDIA#1879:

1. cache_middleware_config.py was out of sync with the new constructor
   semantics — schema allowed similarity_threshold in [0, 1] and had no
   max_entries field at all, so config-driven instantiation could pass
   values the constructor now rejects. Re-export
   _MIN_FUZZY_THRESHOLD and _DEFAULT_MAX_CACHE_ENTRIES from the module
   and wire both constants into the schema:
     similarity_threshold: ge=_MIN_FUZZY_THRESHOLD, le=1.0
     max_entries: ge=1, default=_DEFAULT_MAX_CACHE_ENTRIES
   Docstrings mirror the constructor's rationale (cache-poisoning for
   threshold, DoS for max_entries).

2. Constructor validation did not reject `bool` (since isinstance(True, int)
   is True in Python) or non-finite floats (NaN / +inf / -inf). Both are
   classic config-bug foot-guns: a boolean silently becoming 0 or 1.0,
   or a parser that hands back NaN from an upstream source, would slip
   past the range comparison. Add explicit:
     - bool rejection on similarity_threshold (before the number check)
     - math.isfinite() check on similarity_threshold
     - bool rejection on max_entries

New tests:
  - test_threshold_bool_is_rejected (parametrized True / False)
  - test_threshold_non_finite_is_rejected (parametrized NaN / +inf / -inf)
  - test_bool_max_entries_is_rejected (parametrized True / False)

Signed-off-by: ColinM-sys <cmcdonough@50words.com>
ColinM-sys added a commit to ColinM-sys/NeMo-Agent-Toolkit1 that referenced this pull request Apr 21, 2026
Per CodeRabbit on NVIDIA#1879: CacheMiddlewareConfig.max_entries was added
to the config schema but register.py still instantiated CacheMiddleware
with only (enabled_mode, similarity_threshold) — so a configured
max_entries value was silently ignored and deployments always used the
1024 default.

Pass config.max_entries through to the constructor so config-driven
LRU sizing actually takes effect.

Signed-off-by: ColinM-sys <cmcdonough@50words.com>
@review-notebook-app
Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ci/scripts/common.sh`:
- Around line 75-77: In function build_wheel update the uv build invocation to
quote the directory argument so paths with spaces or special chars are handled;
specifically, change the --directory $1 usage in the uv build command to pass
the value quoted (reference symbols: build_wheel, uv build, --directory,
WHEELS_DIR) so the shell does not split or misinterpret the path.

In `@ci/scripts/github/build_wheel.sh`:
- Around line 40-42: The invocation expands NAT_EXAMPLE unquoted which can break
argument parsing for paths with spaces or globs; update the call site in the
loop to pass the variable quoted by changing build_wheel ${NAT_EXAMPLE}
"examples" to use a quoted parameter like build_wheel "${NAT_EXAMPLE}"
"examples" so the build_wheel function receives the full path as a single
argument (refer to NAT_EXAMPLE, NAT_EXAMPLES and build_wheel).

In `@ci/scripts/gitlab/artifactory/Dockerfile`:
- Around line 16-18: The Dockerfile currently leaves the container running as
root; update the Dockerfile (the image built from the FROM
ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm and the RUN installing JFrog
CLI) to create or use a non-root user and set USER to that non-root account
before the final image layer. Specifically, add steps to create a dedicated
non-root user/group (e.g., app user), adjust ownership of any required
directories or installed artifacts created by the RUN installation, and then add
a USER app (or chosen username) directive so the container runs as a non-root
user at runtime. Ensure the user has necessary permissions for the installed
tooling and runtime files created by the RUN command.
- Line 18: Replace the unsafe "RUN VERSION=2.99.0 curl -fL
https://install-cli.jfrog.io | sh" pattern in the Dockerfile with a direct
binary download and checksum verification: use the same VERSION variable to
construct the jfrog CLI binary URL, download the binary to a temporary path (do
not pipe to sh), download or embed the official SHA256 checksum for that
VERSION, verify the downloaded file with sha256sum (or equivalent), and only on
successful verification chmod +x and install/move the binary into
/usr/local/bin; ensure the build fails if checksum verification fails and remove
the use of an unconditional curl | sh in the RUN command.

In `@ci/scripts/gitlab/checks.sh`:
- Line 37: Fix the typo in the echo statement that prints "Runing Documentation
checks": locate the echo call containing the string "Runing Documentation
checks" and change it to "Running Documentation checks" so the CI status output
is spelled correctly.

In
`@examples/notebooks/launchables/GPU_Cluster_Sizing_with_NeMo_Agent_Toolkit.ipynb`:
- Around line 1160-1164: The four broken documentation links in the markdown
list (the link strings containing "Full Documentation", "Get Started Guide",
"Examples", "Evaluation Guide", and "Troubleshooting" that currently point to
URLs with "/nemo/agent-toolkit/1.7/") must be fixed by replacing each 1.7 URL
with a validated, working documentation URL (or remove the corresponding list
item if no valid URL exists); locate the exact link strings in the notebook cell
and either (a) update the hrefs to the correct version (confirm the intended
NeMo Agent Toolkit version) or (b) remove the broken entries and add a brief
note pointing to the repository's README or canonical docs; ensure this change
is limited to the notebook cell containing those link strings and include a
concise commit message clarifying whether the version bump to 1.7 was
intentional.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 60096a16-f0dd-4d23-bc9d-6f2f1ef22ec8

📥 Commits

Reviewing files that changed from the base of the PR and between 2036cf2 and 870acac.

⛔ Files ignored due to path filters (7)
  • examples/agents/uv.lock is excluded by !**/*.lock
  • examples/finetuning/rl_with_openpipe_art/uv.lock is excluded by !**/*.lock
  • examples/frameworks/agno_personal_finance/uv.lock is excluded by !**/*.lock
  • examples/frameworks/multi_frameworks/uv.lock is excluded by !**/*.lock
  • packages/nvidia_nat_agno/uv.lock is excluded by !**/*.lock
  • packages/nvidia_nat_openpipe_art/uv.lock is excluded by !**/*.lock
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (34)
  • .github/workflows/ci_pipe.yml
  • .github/workflows/pr.yaml
  • .gitlab-ci.yml
  • ci/scripts/common.sh
  • ci/scripts/github/build_wheel.sh
  • ci/scripts/github/checks.sh
  • ci/scripts/github/common.sh
  • ci/scripts/github/docs.sh
  • ci/scripts/github/tests.sh
  • ci/scripts/gitlab/artifactory/Dockerfile
  • ci/scripts/gitlab/artifactory_upload.sh
  • ci/scripts/gitlab/build_wheel.sh
  • ci/scripts/gitlab/checks.sh
  • ci/scripts/gitlab/common.sh
  • ci/scripts/gitlab/docs.sh
  • ci/scripts/gitlab/model_health_check.sh
  • ci/scripts/gitlab/tests.sh
  • docs/source/versions1.json
  • examples/frameworks/agno_personal_finance/pyproject.toml
  • examples/frameworks/multi_frameworks/pyproject.toml
  • examples/notebooks/getting_started_with_nat.ipynb
  • examples/notebooks/launchables/GPU_Cluster_Sizing_with_NeMo_Agent_Toolkit.ipynb
  • examples/notebooks/observability_evaluation_and_profiling.ipynb
  • packages/nvidia_nat_agno/pyproject.toml
  • packages/nvidia_nat_core/src/nat/meta/pypi.md
  • packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py
  • packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py
  • packages/nvidia_nat_core/src/nat/middleware/cache/register.py
  • packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py
  • packages/nvidia_nat_memmachine/src/nat/plugins/memmachine/memmachine_editor.py
  • packages/nvidia_nat_openpipe_art/pyproject.toml
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/config.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trainer_adapter.py
  • packages/nvidia_nat_openpipe_art/tests/test_trainer_adapter.py
💤 Files with no reviewable changes (6)
  • ci/scripts/github/common.sh
  • examples/frameworks/multi_frameworks/pyproject.toml
  • packages/nvidia_nat_openpipe_art/tests/test_trainer_adapter.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/config.py
  • examples/frameworks/agno_personal_finance/pyproject.toml
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trainer_adapter.py
✅ Files skipped from review due to trivial changes (8)
  • ci/scripts/gitlab/docs.sh
  • ci/scripts/github/tests.sh
  • ci/scripts/github/checks.sh
  • examples/notebooks/observability_evaluation_and_profiling.ipynb
  • packages/nvidia_nat_core/src/nat/meta/pypi.md
  • examples/notebooks/getting_started_with_nat.ipynb
  • .github/workflows/pr.yaml
  • docs/source/versions1.json
🚧 Files skipped from review as they are similar to previous changes (4)
  • packages/nvidia_nat_core/tests/nat/middleware/test_cache_middleware.py
  • packages/nvidia_nat_core/src/nat/middleware/cache/register.py
  • packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware_config.py
  • packages/nvidia_nat_core/src/nat/middleware/cache/cache_middleware.py

Comment thread ci/scripts/common.sh
Comment on lines 75 to 77
function build_wheel() {
rapids-logger "Building Wheel for $1"
echo "Building Wheel for $1"
uv build --wheel --no-progress --out-dir "${WHEELS_DIR}/$2" --directory $1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Quote the --directory parameter in build_wheel.

Line 77 passes $1 unquoted, which can break uv build argument parsing for non-trivial paths.

Suggested fix
 function build_wheel() {
     echo "Building Wheel for $1"
-    uv build --wheel --no-progress --out-dir "${WHEELS_DIR}/$2" --directory $1
+    uv build --wheel --no-progress --out-dir "${WHEELS_DIR}/$2" --directory "$1"
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function build_wheel() {
rapids-logger "Building Wheel for $1"
echo "Building Wheel for $1"
uv build --wheel --no-progress --out-dir "${WHEELS_DIR}/$2" --directory $1
function build_wheel() {
echo "Building Wheel for $1"
uv build --wheel --no-progress --out-dir "${WHEELS_DIR}/$2" --directory "$1"
}
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 77-77: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/scripts/common.sh` around lines 75 - 77, In function build_wheel update
the uv build invocation to quote the directory argument so paths with spaces or
special chars are handled; specifically, change the --directory $1 usage in the
uv build command to pass the value quoted (reference symbols: build_wheel, uv
build, --directory, WHEELS_DIR) so the shell does not split or misinterpret the
path.

Comment on lines +40 to 42
for NAT_EXAMPLE in "${NAT_EXAMPLES[@]}"; do
# places all wheels flat under example
build_wheel ${NAT_EXAMPLE} "examples"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Quote the example path when invoking build_wheel.

Line 42 still expands ${NAT_EXAMPLE} unquoted, so a path with spaces/globs can break argument parsing despite the safer array loop.

Suggested fix
-for NAT_EXAMPLE in "${NAT_EXAMPLES[@]}"; do
+for NAT_EXAMPLE in "${NAT_EXAMPLES[@]}"; do
     # places all wheels flat under example
-    build_wheel ${NAT_EXAMPLE} "examples"
+    build_wheel "${NAT_EXAMPLE}" "examples"
 done
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for NAT_EXAMPLE in "${NAT_EXAMPLES[@]}"; do
# places all wheels flat under example
build_wheel ${NAT_EXAMPLE} "examples"
for NAT_EXAMPLE in "${NAT_EXAMPLES[@]}"; do
# places all wheels flat under example
build_wheel "${NAT_EXAMPLE}" "examples"
done
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 40-40: Possible misspelling: NAT_EXAMPLES may not be assigned. Did you mean NAT_EXAMPLE?

(SC2153)


[info] 42-42: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/scripts/github/build_wheel.sh` around lines 40 - 42, The invocation
expands NAT_EXAMPLE unquoted which can break argument parsing for paths with
spaces or globs; update the call site in the loop to pass the variable quoted by
changing build_wheel ${NAT_EXAMPLE} "examples" to use a quoted parameter like
build_wheel "${NAT_EXAMPLE}" "examples" so the build_wheel function receives the
full path as a single argument (refer to NAT_EXAMPLE, NAT_EXAMPLES and
build_wheel).

Comment on lines +16 to +18
FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm

RUN VERSION=2.99.0 curl -fL https://install-cli.jfrog.io | sh
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Run the helper image as a non-root user.

The image currently does not set USER, so runtime defaults can remain root (Trivy DS-0002).

🛡️ Proposed hardening diff
 FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm
 
 RUN VERSION=2.99.0 curl -fL https://install-cli.jfrog.io | sh
+RUN useradd --create-home --uid 10001 appuser
+USER appuser

As per coding guidelines "Check for security vulnerabilities and potential issues."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm
RUN VERSION=2.99.0 curl -fL https://install-cli.jfrog.io | sh
FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm
RUN VERSION=2.99.0 curl -fL https://install-cli.jfrog.io | sh
RUN useradd --create-home --uid 10001 appuser
USER appuser
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/scripts/gitlab/artifactory/Dockerfile` around lines 16 - 18, The
Dockerfile currently leaves the container running as root; update the Dockerfile
(the image built from the FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm
and the RUN installing JFrog CLI) to create or use a non-root user and set USER
to that non-root account before the final image layer. Specifically, add steps
to create a dedicated non-root user/group (e.g., app user), adjust ownership of
any required directories or installed artifacts created by the RUN installation,
and then add a USER app (or chosen username) directive so the container runs as
a non-root user at runtime. Ensure the user has necessary permissions for the
installed tooling and runtime files created by the RUN command.


FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm

RUN VERSION=2.99.0 curl -fL https://install-cli.jfrog.io | sh
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether the Dockerfile installs JFrog CLI via remote script without checksum verification.
rg -n 'install-cli\.jfrog\.io|curl\s+-fL|sha256sum|gpg|cosign' ci/scripts/gitlab/artifactory/Dockerfile

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 133


🏁 Script executed:

# Verify the Dockerfile exists and examine full context around line 18
cat -n ci/scripts/gitlab/artifactory/Dockerfile

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 991


🏁 Script executed:

# Search for other similar curl patterns in Dockerfiles
fd -e Dockerfile -x rg 'curl.*\|.*sh' {}

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 51


Harden JFrog CLI installation against supply-chain risks by removing curl | sh pattern.

Line 18 executes a remote installation script without integrity verification, which weakens supply-chain security for CI images. The -fL flags hide errors and follow redirects, increasing attack surface. Replace with direct binary download and SHA256 checksum verification:

Proposed hardening diff
 FROM ghcr.io/astral-sh/uv:0.9.28-python3.13-bookworm
 
-RUN VERSION=2.99.0 curl -fL https://install-cli.jfrog.io | sh
+ARG JF_VERSION=2.99.0
+ARG JF_SHA256
+RUN curl -fsSL "https://releases.jfrog.io/artifactory/jfrog-cli/v2-jf/${JF_VERSION}/jfrog-cli-linux-amd64/jf" -o /usr/local/bin/jf \
+    && echo "${JF_SHA256}  /usr/local/bin/jf" | sha256sum -c - \
+    && chmod +x /usr/local/bin/jf
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/scripts/gitlab/artifactory/Dockerfile` at line 18, Replace the unsafe "RUN
VERSION=2.99.0 curl -fL https://install-cli.jfrog.io | sh" pattern in the
Dockerfile with a direct binary download and checksum verification: use the same
VERSION variable to construct the jfrog CLI binary URL, download the binary to a
temporary path (do not pipe to sh), download or embed the official SHA256
checksum for that VERSION, verify the downloaded file with sha256sum (or
equivalent), and only on successful verification chmod +x and install/move the
binary into /usr/local/bin; ensure the build fails if checksum verification
fails and remove the use of an unconditional curl | sh in the RUN command.



rapids-logger "Runing Documentation checks"
echo "Runing Documentation checks"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typo in CI status output.

Line 37 prints "Runing Documentation checks"; please change to "Running Documentation checks".

✏️ Proposed diff
-echo "Runing Documentation checks"
+echo "Running Documentation checks"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo "Runing Documentation checks"
echo "Running Documentation checks"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/scripts/gitlab/checks.sh` at line 37, Fix the typo in the echo statement
that prints "Runing Documentation checks": locate the echo call containing the
string "Runing Documentation checks" and change it to "Running Documentation
checks" so the CI status output is spelled correctly.

Comment on lines +1160 to +1164
"- **Documentation**: [Full Documentation](https://docs.nvidia.com/nemo/agent-toolkit/1.7/index.html)\n",
"- **Get Started Guide**: [Getting Started](https://docs.nvidia.com/nemo/agent-toolkit/1.7/get-started/installation.html)\n",
"- **Examples**: Check the `examples/` directory in the cloned repository\n",
"- **Evaluation Guide**: [Evaluate with NeMo Agent Toolkit](https://docs.nvidia.com/nemo/agent-toolkit/1.6/improve-workflows/evaluate.html)\n",
"- **Troubleshooting**: [Common Issues](https://docs.nvidia.com/nemo/agent-toolkit/1.6/resources/troubleshooting.html)\n"
"- **Evaluation Guide**: [Evaluate with NeMo Agent Toolkit](https://docs.nvidia.com/nemo/agent-toolkit/1.7/improve-workflows/evaluate.html)\n",
"- **Troubleshooting**: [Common Issues](https://docs.nvidia.com/nemo/agent-toolkit/1.7/resources/troubleshooting.html)\n"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check if the documentation URLs return successful HTTP responses.

urls=(
  "https://docs.nvidia.com/nemo/agent-toolkit/1.7/index.html"
  "https://docs.nvidia.com/nemo/agent-toolkit/1.7/get-started/installation.html"
  "https://docs.nvidia.com/nemo/agent-toolkit/1.7/improve-workflows/evaluate.html"
  "https://docs.nvidia.com/nemo/agent-toolkit/1.7/resources/troubleshooting.html"
)

for url in "${urls[@]}"; do
  echo "Checking: $url"
  http_code=$(curl -s -o /dev/null -w "%{http_code}" "$url")
  if [ "$http_code" -eq 200 ]; then
    echo "  ✓ OK ($http_code)"
  else
    echo "  ✗ FAILED ($http_code)"
  fi
done

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 468


Update or remove broken documentation links (all return 404 errors).

The four documentation URLs in lines 1160–1164 have been updated to version 1.7, but all return HTTP 404 errors:

  • Full Documentation
  • Getting Started / installation
  • Evaluate with NeMo Agent Toolkit
  • Troubleshooting

Either update the URLs to point to valid documentation or remove the broken references. Additionally, clarify whether this documentation version update is intentional for this PR or should be addressed separately, as it appears unrelated to the stated cache-middleware objectives.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/notebooks/launchables/GPU_Cluster_Sizing_with_NeMo_Agent_Toolkit.ipynb`
around lines 1160 - 1164, The four broken documentation links in the markdown
list (the link strings containing "Full Documentation", "Get Started Guide",
"Examples", "Evaluation Guide", and "Troubleshooting" that currently point to
URLs with "/nemo/agent-toolkit/1.7/") must be fixed by replacing each 1.7 URL
with a validated, working documentation URL (or remove the corresponding list
item if no valid URL exists); locate the exact link strings in the notebook cell
and either (a) update the hrefs to the correct version (confirm the intended
NeMo Agent Toolkit version) or (b) remove the broken entries and add a brief
note pointing to the repository's README or canonical docs; ensure this change
is limited to the notebook cell containing those link strings and include a
concise commit message clarifying whether the version bump to 1.7 was
intentional.

ColinM-sys added a commit to ColinM-sys/NeMo-Agent-Toolkit1 that referenced this pull request Apr 21, 2026
…resholds

Per CodeRabbit on NVIDIA#1879:

1. cache_middleware_config.py was out of sync with the new constructor
   semantics — schema allowed similarity_threshold in [0, 1] and had no
   max_entries field at all, so config-driven instantiation could pass
   values the constructor now rejects. Re-export
   _MIN_FUZZY_THRESHOLD and _DEFAULT_MAX_CACHE_ENTRIES from the module
   and wire both constants into the schema:
     similarity_threshold: ge=_MIN_FUZZY_THRESHOLD, le=1.0
     max_entries: ge=1, default=_DEFAULT_MAX_CACHE_ENTRIES
   Docstrings mirror the constructor's rationale (cache-poisoning for
   threshold, DoS for max_entries).

2. Constructor validation did not reject `bool` (since isinstance(True, int)
   is True in Python) or non-finite floats (NaN / +inf / -inf). Both are
   classic config-bug foot-guns: a boolean silently becoming 0 or 1.0,
   or a parser that hands back NaN from an upstream source, would slip
   past the range comparison. Add explicit:
     - bool rejection on similarity_threshold (before the number check)
     - math.isfinite() check on similarity_threshold
     - bool rejection on max_entries

New tests:
  - test_threshold_bool_is_rejected (parametrized True / False)
  - test_threshold_non_finite_is_rejected (parametrized NaN / +inf / -inf)
  - test_bool_max_entries_is_rejected (parametrized True / False)

Signed-off-by: ColinM-sys <cmcdonough@50words.com>
ColinM-sys added a commit to ColinM-sys/NeMo-Agent-Toolkit1 that referenced this pull request Apr 21, 2026
Per CodeRabbit on NVIDIA#1879: CacheMiddlewareConfig.max_entries was added
to the config schema but register.py still instantiated CacheMiddleware
with only (enabled_mode, similarity_threshold) — so a configured
max_entries value was silently ignored and deployments always used the
1024 default.

Pass config.max_entries through to the constructor so config-driven
LRU sizing actually takes effect.

Signed-off-by: ColinM-sys <cmcdonough@50words.com>
@ColinM-sys ColinM-sys force-pushed the fix/cache-middleware-poisoning-and-bounds branch 2 times, most recently from ec97055 to 2036cf2 Compare April 21, 2026 14:37
Comment on lines 236 to +237
self._cache[input_str] = result
logger.debug("Cached result for function %s", context.name)
self._cache.move_to_end(input_str)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this call to self._cache.move_to_end(input_str) needed? If it was just inserted on line 236, shouldn't it already be at the end?

# exhaustion DoS and, combined with fuzzy matching, a long-lived surface for
# cross-request confusion. OrderedDict-backed LRU evicts the oldest entry
# when the cache exceeds this bound.
_DEFAULT_MAX_CACHE_ENTRIES = 1024
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed as a global const, the default value should be in the config class, since we only construct CacheMiddleware from the config object, the constructor similarly doesn't need a default value.

"inputs return the same cached response."),
)
max_entries: int = Field(
default=_DEFAULT_MAX_CACHE_ENTRIES,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
default=_DEFAULT_MAX_CACHE_ENTRIES,
default=1024,

Remove the const global, 1024 isn't a number that needs to be a const.

def test_custom_initialization(self):
"""Test custom initialization."""
middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.8)
middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.9)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert, I don't think this is needed

async def test_fuzzy_match_caching(self, middleware_context):
"""Test fuzzy matching with similarity_threshold < 1.0."""
middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.8)
middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.9)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, not needed

Comment on lines -271 to +270
middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.5)
middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.9)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert

Comment on lines -288 to +287
middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.7)
middleware = CacheMiddleware(enabled_mode="always", similarity_threshold=0.85)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert

@dagardner-nv dagardner-nv self-assigned this Apr 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants