Open
Labels
A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-bug: Category: This is a bug.
E-needs-mcve: Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example
I-slow: Issue: Problems and improvements with respect to performance of generated code.
P-high: High priority
needs-triage: This issue may need triage. Remove it if it has been sufficiently triaged.
regression-untriaged: Untriaged performance or correctness regression.
Description
We noticed a ~30% performance regression in our benchmarks when going from Rust 1.90 to Rust 1.91. The benchmarks were run on a MacBook Pro M4. The regression still appears to be present on nightly (2026-02-26).
Code
I created a reduced version of our benchmark, although it is not minimal. Let me know how I can refine it if needed. The code is available here (the link points to the right branch). To observe the regression, run
$ cargo bench --bench blake3_1to1_fast --features internal
with versions 1.90 and 1.91 and compare the results (i.e., changing only the version in rust-toolchain.toml).
//! file: miden-vm/benches/blake3_1to1_fast.rs
use criterion::{BatchSize, Criterion, criterion_group, criterion_main};
use miden_core::Felt;
use miden_core_lib::CoreLibrary;
use miden_processor::{FastProcessor, advice::AdviceInputs};
use miden_vm::{Assembler, DefaultHost, StackInputs};
use tokio::runtime::Runtime;

fn blake3_1to1_fast(c: &mut Criterion) {
    let mut group = c.benchmark_group("blake3_1to1_fast");

    // operand_stack: 8 words of 0xFFFFFFFF
    let stack_inputs =
        StackInputs::new(&[Felt::new(u64::from(u32::MAX)); 8]).unwrap();

    // advice_stack: 100 iterations
    let advice_inputs = AdviceInputs::default().with_stack([Felt::new(100)]);

    let mut assembler = Assembler::default();
    assembler
        .link_dynamic_library(CoreLibrary::default())
        .expect("failed to load core library");
    let program = assembler
        .assemble_program(BLAKE3_1TO1_MASM)
        .expect("Failed to compile test source.");

    group.bench_function("blake3_1to1", |bench| {
        bench.to_async(Runtime::new().unwrap()).iter_batched(
            || {
                let host =
                    DefaultHost::default().with_library(&CoreLibrary::default()).unwrap();
                let processor =
                    FastProcessor::new(stack_inputs).with_advice(advice_inputs.clone());
                (host, program.clone(), processor)
            },
            |(mut host, program, processor)| async move {
                processor.execute(&program, &mut host).await.unwrap();
            },
            BatchSize::SmallInput,
        );
    });

    group.finish();
}

const BLAKE3_1TO1_MASM: &str = "\
use miden::core::crypto::hashes::blake3
use miden::core::sys
begin
    # Push the number of iterations on the stack, and assess if we should loop
    adv_push.1 dup neq.0
    while.true
        # Move loop counter down
        movdn.8
        # Execute blake3 hash function
        exec.blake3::hash
        # Decrement counter, and check if we loop again
        movup.8 sub.1 dup neq.0
    end
    # Drop counter
    drop
    # Truncate stack to make constraints happy
    exec.sys::truncate_stack
end
";

criterion_group!(benchmark, blake3_1to1_fast);
criterion_main!(benchmark);

On my MacBook Pro M4, Rust 1.90 yields:
$ cargo bench --bench blake3_1to1_fast --features internal
program_execution_fast/blake3_1to1
time: [2.0877 ms 2.0909 ms 2.0942 ms]
while Rust 1.91 yields:
$ cargo bench --bench blake3_1to1_fast --features internal
program_execution_fast/blake3_1to1
time: [2.7507 ms 2.7549 ms 2.7594 ms]
change: [+31.472% +31.756% +32.074%] (p = 0.00 < 0.05)
Performance has regressed.
Note that performance is similarly poor on nightly 2026-02-26:
$ cargo bench --bench blake3_1to1_fast --features internal
blake3_1to1_fast/blake3_1to1
time: [2.6636 ms 2.6697 ms 2.6762 ms]
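As a rough cross-check of the numbers above, the relative slowdown can be computed directly from the middle (point) estimate of each Criterion `time` line. This is only a sketch: Criterion derives its own `change` figure from its sample statistics, so the point-estimate ratio approximates it rather than reproducing it exactly, and the nightly figure is not reported by Criterion against 1.90 but derived here. The helper `relative_change_pct` is a hypothetical name introduced for illustration.

```rust
/// Hypothetical helper: relative change, in percent, from a baseline timing
/// to a new timing (positive means slower).
fn relative_change_pct(baseline_ms: f64, new_ms: f64) -> f64 {
    (new_ms - baseline_ms) / baseline_ms * 100.0
}

fn main() {
    // Middle (point) estimates copied from the Criterion output above, in ms.
    let rust_1_90 = 2.0909;
    let rust_1_91 = 2.7549;
    let nightly_2026_02_26 = 2.6697;

    // 1.90 -> 1.91: about +31.76%, close to Criterion's reported +31.756%.
    println!("1.90 -> 1.91:    {:+.2}%", relative_change_pct(rust_1_90, rust_1_91));
    // 1.90 -> nightly: about +27.68%, derived here rather than reported.
    println!(
        "1.90 -> nightly: {:+.2}%",
        relative_change_pct(rust_1_90, nightly_2026_02_26)
    );
}
```

So nightly recovers only a small part of the slowdown relative to 1.91.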
Version it worked on
It most recently worked on: Rust 1.90
rustc --version --verbose:
rustc 1.90.0 (1159e78c4 2025-09-14)
binary: rustc
commit-hash: 1159e78c4747b02ef996e55082b704c09b970588
commit-date: 2025-09-14
host: aarch64-apple-darwin
release: 1.90.0
LLVM version: 20.1.8
Version with regression
rustc --version --verbose:
rustc 1.91.1 (ed61e7d7e 2025-11-07)
binary: rustc
commit-hash: ed61e7d7e242494fb7057f2657300d9e77bb4fcb
commit-date: 2025-11-07
host: aarch64-apple-darwin
release: 1.91.1
LLVM version: 21.1.2