Skip to content

fix(archiver): pass context through rearchiveRange for cancellation#61

Open
faisalnugroho wants to merge 1 commit into
base:masterfrom
faisalnugroho:fix/rearchive-context-cancellation
Open

fix(archiver): pass context through rearchiveRange for cancellation#61
faisalnugroho wants to merge 1 commit into
base:masterfrom
faisalnugroho:fix/rearchive-context-cancellation

Conversation

@faisalnugroho

Copy link
Copy Markdown

What

Passes a context.Context through rearchiveRange so that cancellation propagates from both the HTTP handler and service shutdown.

The problem

rearchiveRange hardcodes context.Background() for both the retry.Do call and persistBlobsForBlockToS3:

rewritten, err := retry.Do(context.Background(), ...)
    _, _, e := a.persistBlobsForBlockToS3(context.Background(), id, true)

This causes two issues:

  1. Shutdown leaks: When the archiver is stopped, any in-progress rearchive continues running indefinitely. It holds references to the beacon client and storage, preventing clean shutdown.

  2. HTTP timeout leaks: The /rearchive handler has a 60-second middleware timeout. When the timeout fires, the HTTP response is aborted, but the underlying rearchive goroutine keeps running in the background — a silent resource leak for large slot ranges.

The fix

  • Added ctx context.Context parameter to rearchiveRange
  • Replaced both context.Background() calls with ctx
  • Updated the HTTP handler to pass r.Context(), which is cancelled on client disconnect or server shutdown
  • Updated the test to pass context.Background() (no cancellation needed in tests)

Fixes #58 (Bug 1)

rearchiveRange was using context.Background() for both the retry loop
and the persistBlobsForBlockToS3 call. This meant:

- Shutdown is not respected: in-progress rearchive continues running
  after the archiver is stopped, holding references to the beacon
  client and storage.
- HTTP timeout is not respected: the /rearchive handler has a 60s
  middleware timeout, but the underlying goroutine keeps running after
  the response is aborted.

Now accepts a context parameter and passes it through to both the retry
loop and the persist call. The HTTP handler passes r.Context() so
cancellation propagates correctly from both client disconnects and
server shutdown.

Fixes base#58 (Bug 1)
@cb-heimdall

Copy link
Copy Markdown
Collaborator

🟡 Heimdall Review Status

Requirement Status More Info
Reviews 🟡 0/1
Denominator calculation
Show calculation
1 if user is bot 0
1 if user is external 0
2 if repo is sensitive 0
From .codeflow.yml 1
Additional review requirements
Show calculation
Max 0
0
From CODEOWNERS 0
Global minimum 0
Max 1
1
1 if commit is unverified 1
Sum 2

@faisalnugroho

Copy link
Copy Markdown
Author

Hi! Following up on this bug fix PR. This PR passes the caller's context through rearchiveRange so that cancellation signals propagate correctly. Without this, calling Stop() during a rearchive operation leaves goroutines running with context.Background(). Happy to make adjustments.

All CI checks pass (StepSecurity ✅). Would love to get your review when you have a moment. Thanks!

@faisalnugroho

Copy link
Copy Markdown
Author

Hi! I'd love to get this PR reviewed when you have a moment. This PR addresses pass context through rearchiveRange for cancellation. Happy to make any adjustments based on your feedback. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(archiver): rearchiveRange ignores context — cannot be cancelled on shutdown or HTTP timeout

2 participants