rpc: set_validated_blocks lacks request timeout — stalled report endpoint hangs validation_reporter task #139

@Nexory

Description

set_validated_blocks in crates/stateless-common/src/rpc_client.rs:593-612 and its caller in bin/stateless-validator/src/workers.rs:161-166 lack any request timeout. A TCP connection to the report endpoint that is accepted but never replies hangs the validation_reporter task indefinitely, silently severing all subsequent upstream reports for the session.

Current code

// crates/stateless-common/src/rpc_client.rs:593-612
pub async fn set_validated_blocks(
    &self,
    blocks: Vec<ValidatedBlock>,
) -> Result<(), ProviderError> {
    let provider = ...;
    provider.client().request("...", (blocks,)).await
    //       ^^^^^^^^^^^^^^^^^^ — no tokio::time::timeout wrapper, no round_robin_with_backoff
}

// bin/stateless-validator/src/workers.rs:161-166
client.set_validated_blocks(reports).await
//                                  ^^^^^ — also no outer timeout

The alloy RootProvider built via connect_http uses reqwest (0.12/0.13) with no default request timeout.
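
The stalled-endpoint condition can be reproduced at the socket level. The following is a minimal std-only sketch, not the validator's code: `stalled_endpoint` and `send_and_wait` are hypothetical names, and a plain `TcpStream` stands in for the reqwest client. It assumes the OS completes the TCP handshake from the listen backlog even though the process never calls `accept` or replies.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::time::Duration;

/// Hypothetical helper: binds a listener on an ephemeral local port.
/// The kernel completes incoming handshakes from the backlog, but this
/// sketch never calls `accept` or writes a reply, so peers see a
/// connected socket that stays silent.
fn stalled_endpoint() -> io::Result<TcpListener> {
    TcpListener::bind("127.0.0.1:0")
}

/// Hypothetical helper: sends one request and waits for the first reply bytes.
/// `read_timeout = None` mirrors the unbounded wait described above;
/// `Some(d)` bounds the wait to roughly `d`.
fn send_and_wait(addr: SocketAddr, read_timeout: Option<Duration>) -> io::Result<usize> {
    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(read_timeout)?;
    stream.write_all(b"POST / HTTP/1.1\r\nHost: localhost\r\nContent-Length: 0\r\n\r\n")?;
    let mut buf = [0u8; 64];
    // With no timeout set, this read blocks until the peer replies or closes.
    stream.read(&mut buf)
}
```

Calling `send_and_wait(addr, Some(Duration::from_millis(200)))` against the stalled listener returns an error of kind `WouldBlock` or `TimedOut` (the kind is platform-dependent). Passing `None` blocks for as long as the listener stays open, which is the socket-level analogue of the hung `.await`.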

Why this is not covered by existing PRs

  • PR #129 ("fix: bound per-attempt RPC timeout and retry empty get_code responses", merged 2026-04-30) added per_attempt_timeout (default 20s), applied via tokio::time::timeout inside round_robin_with_backoff. set_validated_blocks does not go through that helper, so the new guard does not apply.
  • PR #110 ("feat: add per-method RPC timeout configuration", open since 2026-03-31) adds block_timeout, witness_timeout, and code_timeout as per-method configuration. It does not add a timeout for reports or set_validated_blocks.
  • The module-level doc at line 12 explicitly states "set_validated_blocks is unthrottled", so bypassing the throttled path is the intended design, not an oversight. However, "unthrottled" extending to "no timeout at all" appears unintended, given that the other methods gained per_attempt_timeout in parallel.

Impact

The validation_reporter is a separate task::spawn (workers.rs:55), so the main block validation pipeline is unaffected. What actually happens on a hung report call:

  1. The reporter task's loop blocks on the .await indefinitely.
  2. All subsequent validation reports for the session are silently dropped — upstream monitoring/coordination sees the validator go quiet without explicit error.
  3. At shutdown (workers.rs:106), the reporter JoinHandle is awaited under a 3-second tokio::time::timeout. The handle does not resolve (the inner .await is still hung), so shutdown proceeds after the 3-second budget, leaving the task running until process exit.

The net effect: all upstream reports are silently lost for the rest of the session, and shutdown waits out the full 3-second join budget before abandoning the reporter task.

Suggested fix

Three options, increasing in invasiveness:

  1. Caller-side timeout (minimal):

    // workers.rs:161-166
    match tokio::time::timeout(
        Duration::from_secs(30),
        client.set_validated_blocks(reports),
    ).await { ... }
  2. Reuse per_attempt_timeout inside set_validated_blocks:

    tokio::time::timeout(
        self.config.per_attempt_timeout,
        provider.client().request(...),
    ).await
  3. Route set_validated_blocks through round_robin_with_backoff with n=1 (no retry). This reuses the existing per_attempt_timeout machinery and keeps the timeout policy centralized.

Option 2 is the smallest patch that fixes the root cause; option 3 is the cleanest architectural fit.
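
Whichever option is chosen, the caller ends up distinguishing three outcomes: delivered, failed, and timed out. The sketch below illustrates that handling with std only. It is a thread-based stand-in for `tokio::time::timeout`, not the proposed patch, and `ReportOutcome` and `call_with_timeout` are hypothetical names. One difference worth noting: on timeout the worker thread here keeps running in the background, whereas `tokio::time::timeout` drops the inner future.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical result type for one report attempt.
#[derive(Debug, PartialEq)]
enum ReportOutcome {
    Delivered,
    Failed(String),
    TimedOut,
}

/// Hypothetical helper: runs `call` on a worker thread and waits at most
/// `budget` for its result, so the caller's loop is never blocked longer
/// than `budget` by a single report.
fn call_with_timeout<F>(budget: Duration, call: F) -> ReportOutcome
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(call());
    });
    match rx.recv_timeout(budget) {
        Ok(Ok(())) => ReportOutcome::Delivered,
        Ok(Err(e)) => ReportOutcome::Failed(e),
        Err(RecvTimeoutError::Timeout) => ReportOutcome::TimedOut,
        // The worker exited (for example, panicked) without sending a result.
        Err(RecvTimeoutError::Disconnected) => {
            ReportOutcome::Failed("worker exited without a result".to_string())
        }
    }
}
```

In a reporter loop, the `TimedOut` and `Failed` arms would log the error and move on to the next batch, so a single stalled report no longer silences the rest of the session.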

Secondary observation (related, informational)

The module-level doc at rpc_client.rs:21-25 claims: "every public method has a _with_deadline variant that takes an Option<Instant>." This is not quite accurate: get_block_unchecked (lines 460-469) has no _with_deadline companion. Worth either adding one (for trace-server callers that may want a bounded wait) or amending the module doc to note get_block_unchecked as an intentional exception.
