Performance regression (30%) between Rust 1.90 and 1.91 #153154

@plafer

We noticed a ~30% performance regression in our benchmarks when going from Rust 1.90 to Rust 1.91. The benchmarks were run on a MacBook Pro M4. The regression is still present in nightly (2026-02-26).

Code

I created a reduced version of our benchmark, although it is not yet minimal. Let me know how I can refine it if needed. The code is available here (link points to the right branch). To observe the regression, compare the results of running

$ cargo bench --bench blake3_1to1_fast --features internal

between versions 1.90 and 1.91 (i.e. changing only the version in rust-toolchain.toml).

//! file: miden-vm/benches/blake3_1to1_fast.rs
use criterion::{BatchSize, Criterion, criterion_group, criterion_main};
use miden_core::Felt;
use miden_core_lib::CoreLibrary;
use miden_processor::{FastProcessor, advice::AdviceInputs};
use miden_vm::{Assembler, DefaultHost, StackInputs};
use tokio::runtime::Runtime;


fn blake3_1to1_fast(c: &mut Criterion) {
    let mut group = c.benchmark_group("blake3_1to1_fast");

    // operand_stack: 8 words of 0xFFFFFFFF
    let stack_inputs =
        StackInputs::new(&[Felt::new(u64::from(u32::MAX)); 8]).unwrap();
    // advice_stack: 100 iterations
    let advice_inputs = AdviceInputs::default().with_stack([Felt::new(100)]);

    let mut assembler = Assembler::default();
    assembler
        .link_dynamic_library(CoreLibrary::default())
        .expect("failed to load core library");
    let program = assembler
        .assemble_program(BLAKE3_1TO1_MASM)
        .expect("Failed to compile test source.");

    group.bench_function("blake3_1to1", |bench| {
        bench.to_async(Runtime::new().unwrap()).iter_batched(
            || {
                let host =
                    DefaultHost::default().with_library(&CoreLibrary::default()).unwrap();
                let processor =
                    FastProcessor::new(stack_inputs).with_advice(advice_inputs.clone());
                (host, program.clone(), processor)
            },
            |(mut host, program, processor)| async move {
                processor.execute(&program, &mut host).await.unwrap();
            },
            BatchSize::SmallInput,
        );
    });

    group.finish();
}

const BLAKE3_1TO1_MASM: &str = "\
use miden::core::crypto::hashes::blake3
use miden::core::sys

begin
    # Push the number of iterations on the stack, and assess if we should loop
    adv_push.1 dup neq.0

    while.true
        # Move loop counter down
        movdn.8

        # Execute blake3 hash function
        exec.blake3::hash

        # Decrement counter, and check if we loop again
        movup.8 sub.1 dup neq.0
    end

    # Drop counter
    drop

    # Truncate stack to make constraints happy
    exec.sys::truncate_stack
end
";


criterion_group!(benchmark, blake3_1to1_fast);
criterion_main!(benchmark);

On my MacBook Pro M4, Rust 1.90 yields

$ cargo bench --bench blake3_1to1_fast --features internal
program_execution_fast/blake3_1to1
                        time:   [2.0877 ms 2.0909 ms 2.0942 ms]

while on version 1.91,

$ cargo bench --bench blake3_1to1_fast --features internal
program_execution_fast/blake3_1to1
                        time:   [2.7507 ms 2.7549 ms 2.7594 ms]
                        change: [+31.472% +31.756% +32.074%] (p = 0.00 < 0.05)
                        Performance has regressed.

Note that the performance is similarly poor on nightly 2026-02-26,

$ cargo bench --bench blake3_1to1_fast --features internal
blake3_1to1_fast/blake3_1to1
                        time:   [2.6636 ms 2.6697 ms 2.6762 ms]
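
As a quick sanity check on the figures above, the relative slowdown can be recomputed from the point estimates (the middle value of each Criterion interval). This is only a sketch: the function name `relative_change_pct` is made up for illustration, and Criterion derives its own `change` line from the full sample sets rather than from these three rounded numbers, so small differences are expected.

```rust
/// Relative change of `new` with respect to `baseline`, in percent.
/// (Hypothetical helper, not part of Criterion or the benchmark.)
fn relative_change_pct(baseline: f64, new: f64) -> f64 {
    (new / baseline - 1.0) * 100.0
}

fn main() {
    // Point estimates in milliseconds, copied from the benchmark output above.
    let rust_1_90 = 2.0909;
    let rust_1_91 = 2.7549;
    let nightly_2026_02_26 = 2.6697;

    // Roughly +31.76%, in line with the change reported by Criterion.
    println!(
        "1.91 vs 1.90:    {:+.2}%",
        relative_change_pct(rust_1_90, rust_1_91)
    );
    // Roughly +27.68%, i.e. nightly recovers only a small part of the gap.
    println!(
        "nightly vs 1.90: {:+.2}%",
        relative_change_pct(rust_1_90, nightly_2026_02_26)
    );
}
```

By this measure nightly is still about 28% slower than 1.90, assuming the three runs are otherwise comparable.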

Version it worked on

It most recently worked on: Rust 1.90

rustc --version --verbose:

rustc 1.90.0 (1159e78c4 2025-09-14)
binary: rustc
commit-hash: 1159e78c4747b02ef996e55082b704c09b970588
commit-date: 2025-09-14
host: aarch64-apple-darwin
release: 1.90.0
LLVM version: 20.1.8

Version with regression

rustc --version --verbose:

rustc 1.91.1 (ed61e7d7e 2025-11-07)
binary: rustc
commit-hash: ed61e7d7e242494fb7057f2657300d9e77bb4fcb
commit-date: 2025-11-07
host: aarch64-apple-darwin
release: 1.91.1
LLVM version: 21.1.2

Metadata

Labels

A-LLVM - Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-bug - Category: This is a bug.
E-needs-mcve - Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example
I-slow - Issue: Problems and improvements with respect to performance of generated code.
P-high - High priority
needs-triage - This issue may need triage. Remove it if it has been sufficiently triaged.
regression-untriaged - Untriaged performance or correctness regression.
