Bound MongoDB sort memory in lifecycle current listing#2603

Open
delthas wants to merge 1 commit into development/8.3 from improvement/ARSN-565/bound-mongo-sort-lifecycle

@delthas delthas commented Apr 1, 2026

Summary

When lifecycle v2 indexes are used for current listings, MongoDB must
re-sort results by _id in memory because the indexes are ordered by
value.last-modified. Without a cursor .limit(), this SORT stage
collects all matching documents before returning any — which can
exceed the 100MB memLimit and spill to disk.

On systems with low disk space (the scenario described in BB-753),
this disk spill fails, breaking lifecycle processing entirely. The
problem is not index creation failing, but index usage triggering
an unbounded sort that spills to a full disk.

Fix

Add a cursor limit based on maxScannedLifecycleListingEntries
(default 10,000) in DelimiterCurrent.genMDParamsV0(). This flows
through to MongoReadStream which already supports options.limit
on the cursor.
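
A minimal sketch of the idea, not the actual Arsenal code: the `ListingParams` type and the `withCursorLimit` helper are hypothetical names used only for illustration. Only `limit` and `maxScannedLifecycleListingEntries` come from this PR; the `+ 1` and the falsy guard follow what the review comments further down describe.

```typescript
// Hypothetical shape of the listing params handed to the read stream.
interface ListingParams {
    gt?: string;
    limit?: number;
}

// Hypothetical helper: returns a copy of the params with a cursor limit set
// when a scan bound is configured, and an unchanged copy otherwise.
function withCursorLimit(
    params: ListingParams,
    maxScannedLifecycleListingEntries?: number,
): ListingParams {
    // undefined and 0 both mean "no bound configured": leave the cursor unlimited.
    if (!maxScannedLifecycleListingEntries) {
        return { ...params };
    }
    // One extra document so the caller can tell whether more entries remain.
    return { ...params, limit: maxScannedLifecycleListingEntries + 1 };
}
```

With the default of 10,000 this would yield a cursor limit of 10,001, and no `limit` key at all when the bound is unset.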

With the limit, MongoDB uses a bounded top-k heap sort — keeping only
~10,000 documents in memory regardless of total matches. We use
maxScannedLifecycleListingEntries rather than maxKeys because it
counts documents scanned (not results), which maps directly to cursor
documents regardless of bucket key format (v0 or v1).
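
To illustrate why a limit bounds memory, here is a small standalone sketch of top-k selection with a max-heap. It is an analogy for the behaviour described above, not MongoDB's implementation; `topKSmallest` is a hypothetical name, and plain strings stand in for `_id` values.

```typescript
// Returns the k smallest keys in ascending order while holding at most
// k keys in memory at any time (a max-heap of the best candidates so far).
function topKSmallest(keys: readonly string[], k: number): string[] {
    const heap: string[] = []; // max-heap: heap[0] is the largest key kept

    const swap = (i: number, j: number): void => {
        const tmp = heap[i];
        heap[i] = heap[j];
        heap[j] = tmp;
    };
    const siftUp = (start: number): void => {
        let i = start;
        while (i > 0) {
            const parent = (i - 1) >> 1;
            if (heap[parent] >= heap[i]) break;
            swap(parent, i);
            i = parent;
        }
    };
    const siftDown = (start: number): void => {
        let i = start;
        for (;;) {
            const left = 2 * i + 1;
            const right = left + 1;
            let largest = i;
            if (left < heap.length && heap[left] > heap[largest]) largest = left;
            if (right < heap.length && heap[right] > heap[largest]) largest = right;
            if (largest === i) break;
            swap(i, largest);
            i = largest;
        }
    };

    if (k <= 0) return [];
    for (const key of keys) {
        if (heap.length < k) {
            // Still room: keep the key.
            heap.push(key);
            siftUp(heap.length - 1);
        } else if (key < heap[0]) {
            // Better than the worst key kept: replace it.
            heap[0] = key;
            siftDown(0);
        }
        // Otherwise the key cannot be in the first k results and is dropped.
    }
    return heap.slice().sort();
}
```

However many keys are scanned, the heap never grows past `k` entries, which is the property the cursor limit relies on.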

Impact

Tested on a 100k-object bucket with explain("executionStats"):

| | Without .limit() | With .limit(1001) |
|---|---|---|
| nReturned | 448,497 | 1,001 |
| totalDataSizeSorted | 478 MB | 1 MB |
| usedDisk | true (spilled!) | false |
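
For anyone repeating the measurement, a rough sketch of pulling the sort figures out of an explain result. It assumes the plan is a tree of stages linked by `inputStage` / `inputStages`, with `usedDisk` and `totalDataSizeSorted` reported on the `SORT` stage; `findSortStage` and the sample plan are hypothetical, and the sample values are made up, not the measurements above.

```typescript
// Subset of an explain stage node that this sketch cares about.
interface ExplainStage {
    stage: string;
    usedDisk?: boolean;
    totalDataSizeSorted?: number;
    inputStage?: ExplainStage;
    inputStages?: ExplainStage[];
}

// Depth-first search for the first SORT stage in the plan tree.
function findSortStage(root: ExplainStage): ExplainStage | undefined {
    if (root.stage === 'SORT') {
        return root;
    }
    const children: ExplainStage[] = [];
    if (root.inputStage) children.push(root.inputStage);
    if (root.inputStages) children.push(...root.inputStages);
    for (const child of children) {
        const found = findSortStage(child);
        if (found) return found;
    }
    return undefined;
}

// Illustrative plan only: values are invented for the example.
const samplePlan: ExplainStage = {
    stage: 'PROJECTION_SIMPLE',
    inputStage: {
        stage: 'SORT',
        usedDisk: false,
        totalDataSizeSorted: 1048576,
        inputStage: { stage: 'FETCH', inputStage: { stage: 'IXSCAN' } },
    },
};

const sortStage = findSortStage(samplePlan);
console.log(sortStage?.usedDisk, sortStage?.totalDataSizeSorted);
```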

Issue: ARSN-565

Contributor

bert-e commented Apr 1, 2026

Hello delthas,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options

| name | description | privileged | authored |
|---|---|---|---|
| /after_pull_request | Wait for the given pull request id to be merged before continuing with the current one. | | |
| /bypass_author_approval | Bypass the pull request author's approval | | |
| /bypass_build_status | Bypass the build and test status | | |
| /bypass_commit_size | Bypass the check on the size of the changeset TBA | | |
| /bypass_incompatible_branch | Bypass the check on the source branch prefix | | |
| /bypass_jira_check | Bypass the Jira issue check | | |
| /bypass_peer_approval | Bypass the pull request peers' approval | | |
| /bypass_leader_approval | Bypass the pull request leaders' approval | | |
| /approve | Instruct Bert-E that the author has approved the pull request. | | ✍️ |
| /create_pull_requests | Allow the creation of integration pull requests. | | |
| /create_integration_branches | Allow the creation of integration branches. | | |
| /no_octopus | Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead | | |
| /unanimity | Change review acceptance criteria from one reviewer at least to all reviewers | | |
| /wait | Instruct Bert-E not to run until further notice. | | |

Available commands

| name | description | privileged |
|---|---|---|
| /help | Print Bert-E's manual in the pull request. | |
| /status | Print Bert-E's current status in the pull request TBA | |
| /clear | Remove all comments from Bert-E from the history TBA | |
| /retry | Re-start a fresh build TBA | |
| /build | Re-start a fresh build TBA | |
| /force_reset | Delete integration branches & pull requests, and restart merge process from the beginning. | |
| /reset | Try to remove integration branches unless there are commits on them which do not appear on the source branch. | |

Status report is not available.


claude bot commented Apr 1, 2026

LGTM

The change is well-targeted: adding params.limit = this.maxKeys + 1 bounds the MongoDB in-memory sort to a top-k heap, preventing the 100MB memLimit spill. The limit is applied post-filter in MongoDB (after lastModified and dataStoreName predicates), so the count is correct. genMDParamsV1() delegates to genMDParamsV0(), so both versioning formats benefit. maxKeys defaults to 1000 and can never be undefined/NaN, so limit is always a valid positive integer.

Review by Claude Code

@scality scality deleted a comment from bert-e Apr 1, 2026
Contributor

bert-e commented Apr 1, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

@delthas delthas changed the title from "ARSN-565: Bound MongoDB sort memory in lifecycle current listing" to "Bound MongoDB sort memory in lifecycle current listing" Apr 1, 2026
@delthas delthas requested a review from francoisferrand April 1, 2026 08:18

claude bot commented Apr 1, 2026

  • Import in delimiterCurrent.ts goes through the barrel index file instead of directly importing from ../../versioning/constants, risking circular dependencies and inconsistent with every other file in lib/algos/list/.
    • Import directly from ../../versioning/constants as other files in the same directory do.

Review by Claude Code

@delthas delthas marked this pull request as draft April 1, 2026 10:21
@delthas delthas removed the request for review from francoisferrand April 1, 2026 12:40

claude bot commented Apr 1, 2026

LGTM

The fix correctly bounds MongoDB's in-memory sort by adding .limit(maxKeys + 1) for v1 bucket format. Key observations:

- The beforeDate and excludedDataStoreName filters are applied server-side in MongoDB (via readStream.js query construction) before the cursor limit, so no valid documents are lost.
- The client-side filtering in addContents() is redundant/defensive — the limit interacts safely with it.
- The v1-only guard is appropriate since v0 interleaves master/version keys.

Review by Claude Code


codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.37%. Comparing base (eefeb8b) to head (617c843).

Additional details and impacted files
@@               Coverage Diff                @@
##           development/8.3    #2603   +/-   ##
================================================
  Coverage            73.37%   73.37%           
================================================
  Files                  222      222           
  Lines                18160    18162    +2     
  Branches              3761     3786   +25     
================================================
+ Hits                 13324    13326    +2     
  Misses                4831     4831           
  Partials                 5        5           


@delthas delthas force-pushed the improvement/ARSN-565/bound-mongo-sort-lifecycle branch from ab16800 to 0fd00c1 Compare April 1, 2026 13:47

claude bot commented Apr 1, 2026

LGTM

Review by Claude Code


claude bot commented Apr 1, 2026

LGTM

The fix correctly bounds the MongoDB cursor via params.limit in genMDParamsV0(), which flows through mainStreamParams to MongoReadStream where it is applied as cursor.limit(). The use of maxScannedLifecycleListingEntries + 1 is appropriate — it matches the existing scan-limit semantics and the +1 preserves truncation detection. Test coverage is updated accordingly.
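
The "+1 preserves truncation detection" point can be sketched with the generic fetch-one-extra pattern. This is an illustration of the pattern only: `pageWithTruncation` is a hypothetical helper, and how Arsenal itself computes truncation is not shown in this thread.

```typescript
// Given documents fetched with a cursor limit of maxEntries + 1, return at
// most maxEntries of them and report whether at least one more was available.
function pageWithTruncation<T>(
    fetched: readonly T[],
    maxEntries: number,
): { entries: T[]; isTruncated: boolean } {
    return {
        entries: fetched.slice(0, maxEntries),
        // The extra document is never returned; it only signals "more to come".
        isTruncated: fetched.length > maxEntries,
    };
}
```

If exactly `maxEntries` documents come back, the listing is complete; if `maxEntries + 1` come back, the caller knows to continue from the last returned key.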

Review by Claude Code

@delthas delthas marked this pull request as ready for review April 1, 2026 14:00
Contributor

francoisferrand commented Apr 1, 2026

is this on the right branch?
ideally we may want this in zenko 2.13, which is still on backbeat 9.1, which is using artesca 8.2.26.
→ this is a long way off; we should assess whether there is a risk in going from 8.2.26 to 8.2.49...
→ how confident are we in this change? can we just put it in prod ASAP, or should we be slightly more careful?

@delthas delthas force-pushed the improvement/ARSN-565/bound-mongo-sort-lifecycle branch 2 times, most recently from abd7044 to 79cd9e0 Compare April 2, 2026 16:37

claude bot commented Apr 2, 2026

LGTM

The fix correctly bounds the MongoDB cursor via params.limit = maxScannedLifecycleListingEntries + 1 in genMDParamsV0(), which flows through to MongoReadStream's existing .limit() support. The +1 to detect truncation is correct. The guard if (this.maxScannedLifecycleListingEntries) properly skips when the value is undefined, 0, or falsy. Both unit and functional tests cover the new behavior, including ordering verification.

Review by Claude Code


claude bot commented Apr 2, 2026

LGTM

The cursor limit is correctly applied in genMDParamsV0(), which is also called by the V1 path via genMDParamsV1() → listingParamsMasterKeysV0ToV1(), so the limit propagates to both bucket formats. The falsiness guard on maxScannedLifecycleListingEntries correctly handles undefined/null/0. Unit and functional tests cover the new behavior well.

Review by Claude Code

Add a cursor limit to DelimiterCurrent.genMDParamsV0() based on
maxScannedLifecycleListingEntries (default 10,000) so MongoDB can
use a bounded top-k sort instead of sorting all matching documents.

The lifecycle indexes (V2LifecycleLastModifiedPrefixed and
V2LifecycleDataStoreNamePrefixed) order by value.last-modified,
which forces MongoDB to perform an in-memory re-sort on _id. Without
a cursor limit, the SORT stage collects all matching documents before
returning any results. On large buckets this can exceed the 100MB
memLimit, causing MongoDB to spill to disk — which fails when the
volume is under storage pressure.

Issue: ARSN-565
@delthas delthas force-pushed the improvement/ARSN-565/bound-mongo-sort-lifecycle branch from 79cd9e0 to 617c843 Compare April 2, 2026 16:51

claude bot commented Apr 2, 2026

LGTM

Review by Claude Code


3 participants