
Simplify block production (leanSpec PR #464)#251

Open
pablodeymo wants to merge 3 commits into main from
simplify-block-production

Conversation

@pablodeymo
Collaborator

Motivation

leanSpec PR #464 ("simplify block production") rewrites build_block() to work directly with aggregated payloads keyed by AttestationData instead of individual Attestation objects. This eliminates an unnecessary round-trip: the old approach extracted individual attestations from aggregated payloads, filtered them, then re-aggregated. The new approach filters and selects proofs directly from the payload map.

Spec PR: leanEthereum/leanSpec#464

Description

Storage layer (crates/storage/src/store.rs)

  • PayloadBuffer::grouped_by_data_root() — New method that groups buffer entries by data_root (the tree hash root of AttestationData), deduplicating proofs by comparing their participant bitfield bytes. Returns HashMap<H256, Vec<AggregatedSignatureProof>>.
  • Store::known_payloads_by_data_root() — Public wrapper exposing the grouped payloads to the blockchain crate.
  • Added AggregatedSignatureProof to the import list.
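A minimal sketch of the grouping-and-dedup idea, using toy stand-ins for `H256` and `AggregatedSignatureProof` (the real method compares SSZ-serialised participant bitfield bytes; everything here is simplified for illustration):

```rust
use std::collections::{HashMap, HashSet};

// Toy stand-ins for the real types.
type DataRoot = [u8; 32];

#[derive(Clone, Debug)]
struct Proof {
    participants: Vec<u8>, // serialized participant bitfield bytes
}

// Group (data_root, proof) entries by root, dropping any proof whose
// participant bitfield bytes were already seen under that root.
fn grouped_by_data_root(entries: &[(DataRoot, Proof)]) -> HashMap<DataRoot, Vec<Proof>> {
    let mut seen: HashMap<DataRoot, HashSet<Vec<u8>>> = HashMap::new();
    let mut grouped: HashMap<DataRoot, Vec<Proof>> = HashMap::new();
    for (root, proof) in entries {
        // HashSet::insert returns false if this exact bitfield was already present.
        if seen.entry(*root).or_default().insert(proof.participants.clone()) {
            grouped.entry(*root).or_default().push(proof.clone());
        }
    }
    grouped
}
```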

Blockchain layer (crates/blockchain/src/store.rs)

New: extend_proofs_greedily()

Greedy set-cover algorithm that selects proofs maximizing new validator coverage for a given attestation data entry. Each selected proof produces one AggregatedAttestation in the block body. This replaces both aggregate_attestations_by_data() and select_aggregated_proofs().
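The greedy set-cover selection can be sketched with simplified types. Here a proof is reduced to its set of participating validator ids; the real proofs carry bitfields and signature data, so this is an illustration of the selection loop only:

```rust
use std::collections::HashSet;

// Hypothetical reduced form of an aggregated proof.
#[derive(Clone, Debug)]
struct Proof {
    participants: Vec<u64>,
}

// Greedy set cover: repeatedly select the proof that adds the most
// not-yet-covered validators; stop when no proof adds new coverage.
fn extend_proofs_greedily(proofs: &[Proof]) -> Vec<usize> {
    let mut covered: HashSet<u64> = HashSet::new();
    let mut remaining: Vec<usize> = (0..proofs.len()).collect();
    let mut selected = Vec::new();
    while !remaining.is_empty() {
        // Score each remaining proof by how many new validators it covers.
        let (pos, idx, gain) = remaining
            .iter()
            .enumerate()
            .map(|(pos, &idx)| {
                let gain = proofs[idx]
                    .participants
                    .iter()
                    .filter(|v| !covered.contains(v))
                    .count();
                (pos, idx, gain)
            })
            .max_by_key(|&(_, _, gain)| gain)
            .expect("remaining is non-empty");
        if gain == 0 {
            break; // every remaining proof is fully subsumed
        }
        covered.extend(proofs[idx].participants.iter().copied());
        selected.push(idx);
        remaining.swap_remove(pos);
    }
    selected
}
```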

Rewritten: build_block()

Before: Took &[Attestation] (individual per-validator attestations) and a HashMap<SignatureKey, Vec<AggregatedSignatureProof>> keyed by (validator_id, data_root). Used a fixed-point loop to collect individual attestations, then called aggregate_attestations_by_data() to group them, then select_aggregated_proofs() to find proofs. Returned (Block, State, Vec<AggregatedSignatureProof>).

After: Takes HashMap<H256, (AttestationData, Vec<AggregatedSignatureProof>)> keyed by data_root. Sorts entries by target.slot for deterministic order, filters by known block roots and source checkpoint match, then calls extend_proofs_greedily() directly. Handles genesis edge case (slot 0 justified root = parent_root). Loops when justification advances to unlock attestation data with new source checkpoints. Returns (Block, Vec<AggregatedSignatureProof>).
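A toy model of the fixed-point control flow described above. All names and the "state transition" here are hypothetical stand-ins; the real loop runs process_slots/process_block and compares latest_justified:

```rust
use std::collections::HashSet;

// Toy entry: the source checkpoint it votes from, and the checkpoint
// justified once it is included (stand-in for the state transition).
struct Entry {
    data_root: u32,
    source: u32,
    justifies: Option<u32>,
}

fn include_entries(entries: &[Entry], mut current_justified: u32) -> Vec<u32> {
    let mut processed: HashSet<u32> = HashSet::new();
    let mut included = Vec::new();
    loop {
        let mut found_new = false;
        // Include every unprocessed entry matching the current source checkpoint.
        for e in entries {
            if processed.contains(&e.data_root) || e.source != current_justified {
                continue;
            }
            processed.insert(e.data_root);
            included.push(e.data_root);
            found_new = true;
        }
        // Toy "state transition": the highest checkpoint justified by any
        // included entry becomes the new justified checkpoint.
        let next = included
            .iter()
            .filter_map(|root| entries.iter().find(|e| e.data_root == *root))
            .filter_map(|e| e.justifies)
            .max()
            .unwrap_or(current_justified);
        let advanced = next > current_justified;
        current_justified = next;
        // Fixed point: loop again only if new entries were found AND
        // justification advanced (which may unlock new source checkpoints).
        if !(found_new && advanced) {
            break;
        }
    }
    included
}
```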

Simplified: produce_block_with_signatures()

Before: Called iter_known_aggregated_payloads(), then extract_latest_attestations() to reconstruct per-validator Attestation objects, then re-keyed the payloads by SignatureKey for build_block().

After: Calls known_payloads_by_data_root() to get deduplicated proofs grouped by data_root, resolves AttestationData for each via get_attestation_data_by_root(), and passes the result directly to build_block().

Routing change: aggregate_committee_signatures()

Own aggregation output now goes directly to known_payloads (immediately usable for block building and fork choice) instead of new_payloads. Gossip-received aggregated attestations still go through new -> known promotion at intervals 0/4.

Deleted

  • aggregate_attestations_by_data() — Functionality absorbed into extend_proofs_greedily()
  • select_aggregated_proofs() — Functionality absorbed into extend_proofs_greedily()

Backward Compatibility

  • Block format: Unchanged. Wire-compatible.
  • Storage format: Unchanged. No migration needed.
  • Public API: produce_block_with_signatures() same signature and return type.
  • Gossip handling: on_gossip_attestation, on_gossip_aggregated_attestation, on_block_core unchanged.
  • Fork choice: extract_latest_attestations and related methods unchanged.

How to Test

make fmt    # ✅ Passes
make lint   # ✅ Passes
make test   # ✅ All 107 tests pass (27 forkchoice, 8 signature, 14 STF, rest unit)

For full validation:

  • make run-devnet with --is-aggregator: verify blocks contain attestations, justified_slot and finalized_slot advance
  • Multi-client devnet: .claude/skills/test-pr-devnet/scripts/test-branch.sh

…leanSpec PR #464)

Rewrite build_block() to filter and select proofs directly from a
payload map keyed by data_root, eliminating the round-trip of extracting
individual attestations from aggregated payloads, filtering them, then
re-aggregating. This matches the simplified approach in leanSpec.

Key changes:
- Add PayloadBuffer::grouped_by_data_root() for proof deduplication
- Replace aggregate_attestations_by_data() + select_aggregated_proofs()
  with extend_proofs_greedily() greedy set-cover selection
- Simplify produce_block_with_signatures() to resolve AttestationData
  per data_root instead of reconstructing per-validator attestations
- Route own aggregation output directly to known_payloads (immediately
  usable for block building) instead of going through new -> known
@github-actions

🤖 Codex Code Review

  1. High: block production can now include multiple votes from the same validator across different attestation data, which can over-count justification/finalization weight.

    The old path reduced the pool to the latest attestation per validator before block assembly via crates/storage/src/store.rs:867. This PR replaces that with known_payloads_by_data_root() in crates/blockchain/src/store.rs:798 and crates/storage/src/store.rs:151, then build_block greedily includes proofs per data_root without any global per-validator dedup across roots in crates/blockchain/src/store.rs:948 and crates/blockchain/src/store.rs:1037. If validator v first attests to target A and later to target B, both roots can now be packed into one block. process_attestations counts votes independently per target root in crates/blockchain/state_transition/src/lib.rs:283, so the same validator can contribute to multiple checkpoint justifications in the same block. For consensus code, that is a correctness bug. The fix is to restore a latest-per-validator filter before block assembly, or maintain a global “already represented validators” set across all selected proofs.

I did not run the test suite: cargo/rustup could not create temp/toolchain files in this sandbox. A regression test for “same validator has two known attestations on different roots; only the latest may be included in a produced block” would be important here.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

🤖 Claude Code Review

Here is the review:


Review: Simplify block production (leanSpec PR #464)

The refactoring is well-motivated and the logic is substantially cleaner. The core ideas are sound. A few observations below.


Non-determinism in extend_proofs_greedily (blockchain/store.rs:959)

let mut remaining_indices: HashSet<usize> = (0..proofs.len()).collect();

HashSet<usize> iteration order is non-deterministic due to Rust's hash randomization. When two proofs have equal new coverage, max_by_key breaks ties by the last element encountered in iteration order — which is undefined. Two proposers building on the same head could include different proof subsets.

Since proofs arrives ordered by insertion (FIFO from PayloadBuffer), using Vec<usize> for remaining_indices and swapping out exhausted entries would preserve insertion order and make selection deterministic.
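One deterministic shape for that suggestion, sketched over plain slices. Note that `Iterator::max_by_key` returns the last maximum on ties, so a strict-comparison scan is used here to prefer the earliest-inserted proof instead (names and signature are hypothetical):

```rust
// Walk candidate indices in insertion order; keep the first strict maximum,
// so ties resolve to the earliest-inserted proof deterministically.
fn pick_best(gains: &[usize], remaining: &[usize]) -> Option<usize> {
    let mut best: Option<(usize, usize)> = None; // (proof index, gain)
    for &idx in remaining {
        match best {
            Some((_, g)) if gains[idx] <= g => {} // not strictly better: keep earlier
            _ => best = Some((idx, gains[idx])),
        }
    }
    best.map(|(idx, _)| idx)
}
```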


Sort instability when target.slot ties (blockchain/store.rs:1032)

sorted_entries.sort_by_key(|(_, (data, _))| data.target.slot);

When two entries share the same target.slot, their relative order depends on HashMap's iteration order. Adding a secondary key (e.g., data_root) makes the sort fully deterministic:

sorted_entries.sort_by_key(|(data_root, (data, _))| (data.target.slot, *data_root));

Pre-existing metrics double-count (not introduced here, but worth noting)

inc_pq_sig_aggregated_signatures and inc_pq_sig_attestations_in_aggregated_signatures are called in both:

  • aggregate_committee_signatures at lines 202–203 (aggregation time)
  • extend_proofs_greedily at lines 988–989 (inclusion time)

This double-counting was present before this PR (select_aggregated_proofs also called these metrics). The counter semantics are ambiguous — "aggregated" could mean "was aggregated" or "was included in a block". Worth resolving separately, but not introduced by this change.


Silent drop of unresolvable data roots (blockchain/store.rs:805)

.filter_map(|(data_root, proofs)| {
    let data = store.get_attestation_data_by_root(&data_root)?;
    Some((data_root, (data, proofs)))
})

Missing attestation data silently drops valid proofs. This is acceptable in practice (the payload was stored without its data, which shouldn't happen), but a warn! log here would surface inconsistencies in devnet debugging without overhead in the happy path.
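A hedged sketch of that suggestion, with `u32` standing in for `H256` and `AttestationData`, and `eprintln!` standing in for tracing's `warn!`:

```rust
// Hypothetical variant of the filter_map that surfaces dropped roots
// instead of discarding them silently.
fn resolve_attestation_data(
    entries: Vec<(u32, Vec<u32>)>,
    lookup: impl Fn(u32) -> Option<u32>,
) -> Vec<(u32, (u32, Vec<u32>))> {
    entries
        .into_iter()
        .filter_map(|(data_root, proofs)| match lookup(data_root) {
            Some(data) => Some((data_root, (data, proofs))),
            None => {
                // Surfaces the store inconsistency in devnet logs.
                eprintln!(
                    "missing attestation data for root {data_root}; dropping {} proofs",
                    proofs.len()
                );
                None
            }
        })
        .collect()
}
```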


Routing change correctness: own aggregation → known directly

aggregate_committee_signatures now inserts directly to known_payloads, bypassing the new → known promotion cycle. This is correct: the promotion cycle exists to batch-accept gossip-received attestations at intervals 0/4; own aggregations at interval 2 have no reason to wait. Fork choice weights now reflect own attestations immediately after interval 2. The comment at line 206–208 correctly documents this distinction.


Genesis edge case handling (blockchain/store.rs:1019)

The pre-computation of current_justified for the genesis case is correct. When head_state.latest_block_header.slot == 0, process_block_header will update the justified root to parent_root, so pre-loading that root ensures the first iteration's source filter matches what the state transition will produce.


Fixed-point loop termination

Guaranteed: processed_data_roots grows monotonically, and iteration continues only when found_new && justification_advanced. Since the payload map is finite and each data_root is processed at most once, the loop terminates. ✓


Minor: redundant state transition at loop end

The final process_slots + process_block at lines 1099–1101 replicates the last loop iteration's state transition in cases where the loop ran at least once. The extra work is minimal (common path is 1–2 iterations), but the intermediate post_state from the last loop iteration is discarded rather than reused. Acceptable given the simplicity gain.


Overall

The simplification is correct and a net improvement. The two determinism points (Items 1 and 2) are worth addressing before merge — non-deterministic block content between proposers is not a consensus safety issue here (any valid block is accepted), but it makes testing and debugging harder.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@pablodeymo pablodeymo requested a review from MegaRedHand March 26, 2026 19:42
@pablodeymo pablodeymo marked this pull request as ready for review March 26, 2026 19:42
@github-actions

🤖 Kimi Code Review

File: crates/blockchain/src/store.rs

Lines 948–1004 (extend_proofs_greedily):

  • Performance: The greedy selection algorithm has O(n²) worst-case complexity due to the nested iteration over remaining_indices (line 972) and the filtering of participants (line 969). For large committees, this could block the block production hot path. Consider pre-computing participant sets as bitvectors to accelerate coverage calculations.
  • Cloning: proof.clone() (line 985) and att_data.clone() (line 983) allocate in a tight loop. If AggregatedSignatureProof is large, this pressure on the allocator may be significant.
  • API design: Using output parameters (&mut Vec<...>) instead of returning values is unidiomatic in Rust. Consider returning (Vec<AggregatedAttestation>, Vec<AggregatedSignatureProof>).

Lines 1009–1100 (build_block):

  • Missing validation: The function filters attestations by known_block_roots and source checkpoint (lines 1060–1066), but does not validate target epoch, attestation slot bounds, or FFG vote consistency before inclusion. Invalid attestations waste block space and may cause the produced block to fail process_block. Add explicit validation against the current state's epoch and slot.
  • Inefficient cloning: aggregated_attestations.clone() (line 1047) occurs inside the justification advancement loop. Each iteration clones the entire attestation list. Instead, rebuild the candidate from the accumulated vectors or use std::mem::take patterns to avoid repeated allocations.
  • State root computation: The final state root computation (lines 1058–1061) correctly recomputes the post-state after finalizing the attestation list.

Lines 206–209 (aggregate_committee_signatures):

  • Semantic change: Changing from `insert_new

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Claude Code Review

Now I have enough context for a complete review. Here is my assessment:


Review: Simplify block production (leanSpec PR #464)

The refactoring is well-motivated and cleaner overall. A few correctness and determinism concerns worth addressing before merge.


Correctness: same validator may appear in multiple data roots

crates/blockchain/src/store.rs:1034–1057 (build_block inner loop)

The old path called extract_latest_attestations (crates/storage/src/store.rs:867), which reduced the pool to one attestation per validator (latest by slot). The new code groups by data_root and processes each group independently, with no cross-group per-validator deduplication.

In practice, known_payloads can hold proofs from multiple slots, so a validator may appear in proofs for two different data_root values (e.g., slot 5 targeting checkpoint A, and slot 6 targeting checkpoint B), both passing the att_data.source == current_justified filter. extend_proofs_greedily then includes both.

In process_attestations (state_transition/src/lib.rs:283–303), votes are accumulated per target.root independently:

let votes = justifications.entry(target.root).or_insert_with(|| ...);
votes[validator_id] = true;

A validator in two included attestations with different targets would contribute to both targets' vote counts — which is equivocation. Whether leanSpec PR #464 explicitly permits this needs clarification. If not, the fix is to maintain a global seen_validators: HashSet<u64> across all selected proofs and skip any proof whose participants overlap with already-included validators. The old extract_latest_attestations was enforcing exactly this invariant.
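The suggested guard could be sketched as follows. The `seen_validators` name comes from the review above; the types are toy stand-ins (each group is the list of proofs for one data_root, each proof reduced to its validator ids):

```rust
use std::collections::HashSet;

// Cross-root dedup: skip any proof whose participants overlap validators
// already represented anywhere in the block.
fn select_without_equivocation(groups: &[Vec<Vec<u64>>]) -> Vec<(usize, usize)> {
    let mut seen_validators: HashSet<u64> = HashSet::new();
    let mut selected = Vec::new();
    for (g, proofs) in groups.iter().enumerate() {
        for (p, participants) in proofs.iter().enumerate() {
            if participants.iter().any(|v| seen_validators.contains(v)) {
                continue; // validator already represented under another root
            }
            seen_validators.extend(participants.iter().copied());
            selected.push((g, p));
        }
    }
    selected
}
```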


Non-determinism in extend_proofs_greedily

crates/blockchain/src/store.rs:959

let mut remaining_indices: HashSet<usize> = (0..proofs.len()).collect();

HashSet<usize> iteration order is randomized per run (Rust HashMap/HashSet use SipHash with a random seed). When two proofs have equal new coverage, max_by_key picks the last element found during iteration — which varies between runs. Two proposers on the same head could select different proof subsets and produce blocks with different attestation content. Using BTreeSet<usize> or a sorted Vec<usize> for remaining_indices would make selection deterministic.


Sort instability on target.slot ties

crates/blockchain/src/store.rs:1032

sorted_entries.sort_by_key(|(_, (data, _))| data.target.slot);

aggregated_payloads is a HashMap, so iteration order before sorting is non-deterministic. When two entries share the same target.slot, the tie is broken by whatever order HashMap yielded them, which varies per run. A secondary key resolves this:

sorted_entries.sort_by_key(|(data_root, (data, _))| (data.target.slot, *data_root));

Silent drop on missing attestation data

crates/blockchain/src/store.rs:805

.filter_map(|(data_root, proofs)| {
    let data = store.get_attestation_data_by_root(&data_root)?;
    Some((data_root, (data, proofs)))
})

Dropping proofs silently when get_attestation_data_by_root returns None is acceptable (shouldn't happen if the store is consistent), but a warn! here would surface the inconsistency in devnet runs. Low priority.


Routing change: own aggregation → known directly

crates/blockchain/src/store.rs:206–210

Inserting own aggregations directly into known_payloads is correct. The new → known promotion cycle exists for gossip-received attestations that should only be promoted at intervals 0/4. Own aggregations at interval 2 are locally validated and ready immediately. Fork choice weights now reflect own attestations immediately after interval 2, which is correct and faster.


Genesis edge case

crates/blockchain/src/store.rs:1019–1026

The pre-computation of current_justified for slot 0 is correct. process_block_header updates latest_justified.root = parent_root for genesis, so pre-loading that root before the first iteration ensures attestation source filtering matches what the state transition will produce.


Loop termination and redundant final state transition

Loop termination is guaranteed: processed_data_roots grows monotonically, the payload map is finite, and each data_root is consumed at most once. The final process_slots + process_block at lines 1099–1101 re-runs the state transition on the already-stable attestation set. This is a minor inefficiency (the last loop iteration's post_state is discarded rather than reused), but the extra work is bounded and acceptable.


Overall

The simplification aligns with the spec and is a clear improvement. The non-determinism points (Items 2 and 3) are straightforward fixes. The double-vote concern (Item 1) needs spec clarification — if leanSpec PR #464 does not explicitly allow multiple attestations per validator in a block, restoring a cross-data_root validator dedup is needed before merge.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

🤖 Codex Code Review

Findings

  1. High: block production no longer enforces “latest attestation per validator”, so one validator can be included multiple times across different data_roots in the same block. The old path collapsed known payloads through extract_latest_attestations(...); this PR replaces that with known_payloads_by_data_root() and then greedily selects proofs per root, with no cross-root validator dedup. See crates/blockchain/src/store.rs:799, crates/blockchain/src/store.rs:1037, crates/blockchain/src/store.rs:1051. That is dangerous because process_attestations records votes per target independently and does not remove a validator’s earlier target vote, so the same validator can contribute to multiple justifications/finalizations in one transition. See crates/blockchain/state_transition/src/lib.rs:283, crates/blockchain/state_transition/src/lib.rs:302, crates/blockchain/state_transition/src/lib.rs:309. I’d keep a global latest-per-validator filter before proof selection.

  2. Medium: interval-2 local aggregation now writes straight into known, which changes fork-choice timing and breaks the documented new -> known promotion boundary. See crates/blockchain/src/store.rs:209 and the existing scheduling at crates/blockchain/src/store.rs:328, crates/blockchain/src/store.rs:346. known is explicitly the fork-choice-active pool in storage, so these locally aggregated votes can now affect update_head() when block import or other paths call it before the normal acceptance point. See crates/storage/src/store.rs:904, crates/blockchain/src/store.rs:628. If this timing change is intentional, it needs targeted tests; otherwise block-building access should be separated from fork-choice activation.

Testing: I could not run cargo test -p ethlambda-blockchain --lib --tests here because the sandbox blocks rustup from creating temp files under /home/runner/.rustup.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps
Contributor

greptile-apps bot commented Mar 26, 2026

Greptile Summary

This PR implements leanSpec #464, replacing the old build_block round-trip (individual Attestation objects → re-aggregate → select proofs) with a direct, proof-map–based pipeline. The logic is sound and the refactoring is clean.

Key changes:

  • PayloadBuffer::grouped_by_data_root() (storage) — groups buffer entries by data_root, deduplicating proofs by SSZ-serialised participant bitfield. Deduplication key is correct; the seen map is per-call so there is no cross-call state leak.
  • extend_proofs_greedily() (blockchain) — greedy set-cover that selects proofs by new-validator coverage. Loop termination is correct: max_by_key always picks the globally best remaining proof, and new_covered.is_empty() fires only when no proof provides new coverage.
  • build_block() rewrite — filters directly against att_data.source == current_justified and re-checks for justification advancement inside a fixed-point loop. The genesis edge case is correctly handled. The processed_data_roots set prevents re-processing entries in subsequent loop iterations.
  • Routing change in aggregate_committee_signatures — own-created proofs now go directly to known_payloads, bypassing new → known promotion. Gossip-received aggregations still follow the promotion path.
  • Three P2 style/efficiency suggestions: redundant final state transition on the common non-advancing path, non-deterministic secondary ordering within same-target.slot entries, and the unnecessary final O(n) scan in extend_proofs_greedily before breaking.

Confidence Score: 4/5

Safe to merge; no correctness or security issues found — all flagged items are non-blocking P2 efficiency/reproducibility suggestions.

The refactoring is logically equivalent to the original for correct inputs, all 107 tests pass, wire format and public API are unchanged. The three comments (redundant state transition, non-deterministic sort secondary key, stale remaining_indices) are style/performance observations that do not affect block validity or liveness. Score is 4 rather than 5 because the secondary-sort non-determinism is a mild reproducibility concern worth a one-line fix before merge.

Minor attention to crates/blockchain/src/store.rs for the sort determinism fix (line 1032); crates/storage/src/store.rs is clean.

Important Files Changed

Filename Overview
crates/blockchain/src/store.rs Core block production rewrite: new extend_proofs_greedily set-cover algorithm, simplified build_block that works directly on proof maps, routing change in aggregate_committee_signatures (own proofs now bypass new→known promotion). Logic is correct; one redundant state transition and non-deterministic secondary sort within same-slot entries are minor efficiency/reproducibility concerns.
crates/storage/src/store.rs Adds PayloadBuffer::grouped_by_data_root() and the public Store::known_payloads_by_data_root() wrapper; deduplication by SSZ-bytes of participants is sound. Also imports AggregatedSignatureProof.

Sequence Diagram

sequenceDiagram
    participant P as Proposer
    participant BC as blockchain::store
    participant ST as storage::store

    P->>BC: produce_block_with_signatures(slot, validator_index)
    BC->>ST: known_payloads_by_data_root()
    ST-->>BC: HashMap<H256, Vec<AggregatedSignatureProof>>
    BC->>ST: get_attestation_data_by_root(data_root) [per entry]
    ST-->>BC: Option<AttestationData>
    Note over BC: Build HashMap<H256, (AttestationData, Vec<Proof>)>
    BC->>BC: build_block(head_state, slot, ...)
    Note over BC: Sort entries by target.slot
    loop Until no new found OR justification stable
        Note over BC: Filter by known_block_roots & source == current_justified
        BC->>BC: extend_proofs_greedily(proofs, ...)
        alt found_new
            BC->>BC: process_slots + process_block (candidate)
            alt justification advanced
                Note over BC: Update current_justified, continue
            else no advance
                Note over BC: break
            end
        else not found_new
            Note over BC: break immediately
        end
    end
    BC->>BC: process_slots + process_block (final block, for state_root)
    BC-->>P: (Block, Vec<AggregatedSignatureProof>)

P2 `remaining_indices` is never pruned of zero-coverage entries

crates/blockchain/src/store.rs, lines 958–993

After the best proof is selected and its covered validators added to `covered`, all other indices that are now fully subsumed remain in `remaining_indices`. On the next iteration the inner `.map()` re-evaluates their coverage (all zero), `max_by_key` picks an arbitrary zero-coverage entry, and the `new_covered.is_empty()` guard fires and breaks the loop. This is correct, but it means the final iteration always does O(remaining) work to discover what could be known immediately.

A small improvement: after `covered.extend(new_covered)`, remove any index whose participants are now fully covered, or filter `remaining_indices` before the `max_by_key` call to skip already-subsumed proofs.

Comment on lines +1079 to +1104
if post_state.latest_justified != current_justified {
current_justified = post_state.latest_justified;
// Continue: new checkpoint may unlock more attestation data
} else {
break;
}
}

// Fixed point reached: no new attestations found
if new_attestations.is_empty() {
break;
}

// Add new attestations and continue iteration
included_attestations.extend(new_attestations);
}

// Select existing proofs for the attestations to include in the block.
let (aggregated_attestations, aggregated_signatures) =
select_aggregated_proofs(&included_attestations, aggregated_payloads)?;

// Build final block
let attestations: AggregatedAttestations = aggregated_attestations
.try_into()
.expect("attestation count exceeds limit");

let mut final_block = Block {
slot,
proposer_index,
parent_root,
state_root: H256::ZERO,
body: BlockBody { attestations },
};

// Recompute post-state with final block to get correct state root
let mut post_state = head_state.clone();
process_slots(&mut post_state, slot)?;
process_block(&mut post_state, &final_block)?;

final_block.state_root = post_state.tree_hash_root();

Ok((final_block, post_state, aggregated_signatures))
}

/// Select existing aggregated proofs for attestations to include in a block.
///
/// Fresh gossip aggregation happens at interval 2 (`aggregate_committee_signatures`).
/// This function only selects from existing proofs in the known aggregated payloads buffer
/// (proofs from previously received blocks and promoted gossip aggregations).
///
/// Returns a list of (attestation, proof) pairs ready for block inclusion.
fn select_aggregated_proofs(
attestations: &[Attestation],
aggregated_payloads: &HashMap<SignatureKey, Vec<AggregatedSignatureProof>>,
) -> Result<(Vec<AggregatedAttestation>, Vec<AggregatedSignatureProof>), StoreError> {
let mut results = vec![];

for aggregated in aggregate_attestations_by_data(attestations) {
let data = &aggregated.data;
let message = data.tree_hash_root();

let mut remaining: HashSet<u64> = validator_indices(&aggregated.aggregation_bits).collect();

// Select existing proofs that cover the most remaining validators
while !remaining.is_empty() {
let Some(&target_id) = remaining.iter().next() else {
break;
};

let Some(candidates) = aggregated_payloads
.get(&(target_id, message))
.filter(|v| !v.is_empty())
else {
break;
};

let (proof, covered) = candidates
.iter()
.map(|p| {
let covered: Vec<_> = validator_indices(&p.participants)
.filter(|vid| remaining.contains(vid))
.collect();
(p, covered)
})
.max_by_key(|(_, covered)| covered.len())
.expect("candidates is not empty");

// No proof covers any remaining validator
if covered.is_empty() {
break;
}

let aggregate = AggregatedAttestation {
aggregation_bits: proof.participants.clone(),
data: data.clone(),
};
results.push((aggregate, proof.clone()));

metrics::inc_pq_sig_aggregated_signatures();
metrics::inc_pq_sig_attestations_in_aggregated_signatures(covered.len() as u64);

for vid in covered {
remaining.remove(&vid);
}
}
}

Ok(results.into_iter().unzip())
Ok((final_block, aggregated_signatures))
P2 Redundant final state transition

When the loop exits via the "justification didn't advance" branch (line 1083), the inner-loop state transition (lines 1075–1077) and the unconditional final transition at lines 1099–1101 are applied to an identical block: aggregated_attestations is not mutated between that break and the final computation. This means process_slots + process_block are called twice on the same inputs in that common case.

Consider caching the post_state from the last loop iteration and reusing it when the block contents didn't change after breaking. This is not a correctness issue, but it doubles the cost of the most common execution path (a single round of attestations with no justification advance).


Comment on lines +1030 to +1032
// Sort by target.slot for deterministic processing order
let mut sorted_entries: Vec<_> = aggregated_payloads.iter().collect();
sorted_entries.sort_by_key(|(_, (data, _))| data.target.slot);

P2: **Non-deterministic ordering within same `target.slot`**

`aggregated_payloads` is a `HashMap<H256, …>`, so `aggregated_payloads.iter()` yields entries in arbitrary order. The sort on `data.target.slot` provides partial determinism, but entries that share the same `target.slot` retain their relative HashMap iteration order, which varies across runs.

Adding a secondary sort key (e.g., `data_root`) would make block composition fully deterministic and improve test reproducibility:

Suggested change:

```diff
 // Sort by target.slot for deterministic processing order
 let mut sorted_entries: Vec<_> = aggregated_payloads.iter().collect();
-sorted_entries.sort_by_key(|(_, (data, _))| data.target.slot);
+sorted_entries.sort_by_key(|(data_root, (data, _))| (data.target.slot, *data_root));
```
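The effect of the tuple key can be shown with a self-contained sketch (simplified types: `u64` roots and slots stand in for `H256` data roots and `data.target.slot`; `deterministic_order` is a hypothetical helper):

```rust
use std::collections::HashMap;

// Minimal sketch: sorting HashMap entries by (slot, root) yields a fully
// deterministic order even when several entries share the same slot.
fn deterministic_order(payloads: &HashMap<u64, u64>) -> Vec<(u64, u64)> {
    let mut entries: Vec<(u64, u64)> =
        payloads.iter().map(|(&root, &slot)| (root, slot)).collect();
    // Primary key: slot; tie-breaker: root. Without the tie-breaker,
    // entries with equal slots keep arbitrary HashMap iteration order.
    entries.sort_by_key(|&(root, slot)| (slot, root));
    entries
}

fn main() {
    let mut payloads = HashMap::new();
    payloads.insert(0xb, 5); // root 0xb at slot 5
    payloads.insert(0xa, 5); // root 0xa at the same slot
    payloads.insert(0xc, 3);
    // Slot 3 first, then the two slot-5 entries ordered by root.
    assert_eq!(
        deterministic_order(&payloads),
        vec![(0xc, 3), (0xa, 5), (0xb, 5)]
    );
    println!("ok");
}
```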

Comment on lines +958 to +993
```rust
let mut covered: HashSet<u64> = HashSet::new();
let mut remaining_indices: HashSet<usize> = (0..proofs.len()).collect();

while !remaining_indices.is_empty() {
    // Pick proof covering the most uncovered validators
    let best = remaining_indices
        .iter()
        .map(|&idx| {
            let new_coverage: Vec<u64> = proofs[idx]
                .participant_indices()
                .filter(|vid| !covered.contains(vid))
                .collect();
            (idx, new_coverage)
        })
        .max_by_key(|(_, cov)| cov.len());

    let Some((best_idx, new_covered)) = best else {
        break;
    };
    if new_covered.is_empty() {
        break;
    }

    let proof = &proofs[best_idx];
    attestations.push(AggregatedAttestation {
        aggregation_bits: proof.participants.clone(),
        data: att_data.clone(),
    });
    selected_proofs.push(proof.clone());

    metrics::inc_pq_sig_aggregated_signatures();
    metrics::inc_pq_sig_attestations_in_aggregated_signatures(new_covered.len() as u64);

    covered.extend(new_covered);
    remaining_indices.remove(&best_idx);
}
```

P2: **`remaining_indices` is never pruned of zero-coverage entries**

After the best proof is selected and its covered validators added to `covered`, all other indices that are now fully subsumed remain in `remaining_indices`. On the next iteration the inner `.map()` re-evaluates their coverage (all zero), `max_by_key` picks an arbitrary zero-coverage one, and the `new_covered.is_empty()` guard fires and breaks the loop. This is correct, but it means the final iteration always does O(remaining) work to discover what could be known immediately.

A small improvement: after `covered.extend(new_covered)`, remove any index whose participants are now fully covered, or filter `remaining_indices` before the `max_by_key` call to skip already-subsumed proofs.
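The pruning idea can be sketched on a simplified version of the greedy loop, with each proof reduced to its participant list (`greedy_cover` is a hypothetical stand-in for `extend_proofs_greedily`):

```rust
use std::collections::HashSet;

// Sketch of the greedy cover loop with subsumed proofs pruned eagerly.
// Each "proof" here is just its participant indices; returns selected indices.
fn greedy_cover(proofs: &[Vec<u64>]) -> Vec<usize> {
    let mut covered: HashSet<u64> = HashSet::new();
    let mut remaining: Vec<usize> = (0..proofs.len()).collect();
    let mut selected = Vec::new();

    while !remaining.is_empty() {
        // Score each remaining proof by how many uncovered validators it adds.
        let best = remaining
            .iter()
            .map(|&idx| {
                let gain = proofs[idx]
                    .iter()
                    .filter(|v| !covered.contains(v))
                    .count();
                (idx, gain)
            })
            .max_by_key(|&(_, gain)| gain);

        let Some((best_idx, gain)) = best else {
            break;
        };
        if gain == 0 {
            break;
        }
        covered.extend(proofs[best_idx].iter().copied());
        selected.push(best_idx);

        // Prune: drop the chosen proof and any proof now fully covered, so the
        // next iteration never re-scores zero-gain entries.
        remaining.retain(|&idx| {
            idx != best_idx && proofs[idx].iter().any(|v| !covered.contains(v))
        });
    }
    selected
}

fn main() {
    let proofs = vec![vec![1, 2, 3], vec![2, 3], vec![3, 4]];
    // Picks proof 0 (gain 3), prunes proof 1 (fully subsumed), then proof 2.
    assert_eq!(greedy_cover(&proofs), vec![0, 2]);
    println!("ok");
}
```

With the `retain` in place, the loop terminates as soon as `remaining` is empty instead of spending one extra pass discovering that every leftover proof has zero gain.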

