refactor(network): use async for registry network operations by arlosi · Pull Request #16745 · rust-lang/cargo

arlosi · 2026-03-13T05:23:14Z

This PR introduces async into cargo, specifically for handling registry network operations.

Why use async?

Clearer code: the existing model for registry index operations uses Poll + block_until_ready to enable parallelism. This model is somewhat awkward to work with, especially when parallelism is desired. Using async simplifies creating and working with registries. When parallelism is not needed, the call can be wrapped in block_on
Migration path to have alternate http backends beyond curl: we may want to use reqwest or other rust-native HTTP clients that use async.
Performance improvements: this change shows slight performance improvements when accessing sparse registries, since requests can start slightly sooner. If the resolver is refactored to be more async-aware, more significant improvements are possible.

What executor is used?

This change uses futures::executor running on the current thread. It does not use a thread pool. Local I/O (filesystem), still uses std in blocking mode, even when in an async context. This choice was made to minimize dependencies and size of this change. In the future, we may decide to move to a larger executor like tokio.

What does this PR change?

Convert registry methods taking `&mut self` to `&self`

In order to execute multiple futures in parallel (using FuturesUnordered or similar), there must be multiple active borrows. Many of the registry querying methods used &mut self, which prevents this.

Any mutations are now handled by using Cell / RefCell. Since the executor is single-threaded, the operations aren't actually happening in parallel, but when passing an .await point, a different future could begin executing. We must be careful now not to hold mut borrows on RefCells across await points, as it will cause a panic.

Convert methods returning `Poll<_>` to return `Future<_>` (`async`).

When we want to block (and we're not in an async function), the async call is wrapped with crate::util::block_on. If this is attempted within an existing async executor, cargo will panic.

Remove all `block_until_ready` methods from all `Source`, `RegistryData` traits

For most of the sources, the contents of the block_until_ready function was moved into a a private update function that's called once before making any queries.

Add new HTTP async `http_async` module

Wraps cURL's Multi interface to provide an async interface for making HTTP requests through cURL.

Add `async-trait` crate

This is used to enable dynamic dispatch on traits that use async fn.

Error message output changes

Due to refactoring around removing block_until_ready, some error messages are changed, usually by having additional context information. This change makes the error messages more consistent between various code paths. I attempted to minimize the changes in this PR for ease of review. We may want to simplify the error messages (by removing one level of context) in a follow up.

Resolver

The resolver is the only portion still using the Poll approach. An adapter layer LocalPollAdapter is added that translates Polls into async.

Converting the resolver be async native is left as future work. In theory this conversion could provide performance improvements, since the resolution would not need to be restarted. It would also let the resolver start working immediately as new dependencies became available.

`.crate` downloading

The downloading of crate files using PackageSet is not yet converted to async. This can be done separately.

Progress reporting

The http_remote no longer has the ability to track how many block_until_ready calls have been made, so we can't do the trick where we assume we have a dependency tree depth of 10 to make a fake progress bar. Since the progress bar was fake anyway, this change removes it in favor of only displaying the information we know.

The new behavior is:

    Updating crates.io index
       Fetch 121 complete; 175 pending

Performance

This change is not expected to impact performance. My local measurements show neutral to possible 1% improvement. Release binary size increased about 1%.

How to review

Commit by commit. The final commit contains the majority of the change that was not straightforward to separate out.

weihanglo · 2026-03-13T15:12:45Z

Impressive. Net -20 LoC changes!

arlosi · 2026-03-24T16:22:20Z

@rustbot ready

I've addressed a couple performance issues since the last update in the async http client. On my machine, performance is now as good or slightly better with this change applied.

Pending investigation:

Check how difficult changing executors is to make sure we're not getting locked in here.

r? @weihanglo

weihanglo

Looks pretty good! I'll do a second pass in the next few days.

View changes since this review

crates/resolver-tests/src/lib.rs

src/cargo/core/resolver/dep_cache.rs

src/cargo/util/network/http_async.rs

+
+/// Configure a libcurl http handle with the defaults options for Cargo
+/// Note: keep in sync with `http::configure_http_handle`.
+fn configure_http_handle(


src/cargo/util/network/http_async.rs

arlosi · 2026-03-26T19:46:21Z

src/cargo/sources/registry/index/mod.rs

+            return Ok(summaries.clone());
        }

+        // Check if this request has already started. If so, return a oneshot that hands out the same data.


I'm leaning towards using adding a dependency on quick_cache here. It should reduce this whole function to something like:

self.summaries_cache.get_or_insert_async(name, self.load_summaries_uncached(name, load)).await

arlosi · 2026-03-27T01:16:21Z

Benchmarks show improvements or no change in all areas measured.

Criterion benchmarks which test the performance of the resolver when there is no blocking

git checkout async/bench/master
cargo bench -p benchsuite --bench resolve_core -- --save-baseline master
git checkout async/bench/test
cargo bench -p benchsuite --bench resolve_core -- --baseline master

resolve_core/linear_chain/depth_10
                        time:   [85.482 µs 86.182 µs 87.483 µs]
                        change: [−0.5604% +0.0804% +0.9429%] (p = 0.87 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
resolve_core/linear_chain/depth_50
                        time:   [294.51 µs 294.86 µs 295.23 µs]
                        change: [−0.8029% −0.4998% −0.2283%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
Benchmarking resolve_core/linear_chain/depth_200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.8s, enable flat sampling, or reduce sample count to 60.
resolve_core/linear_chain/depth_200
                        time:   [1.1489 ms 1.1587 ms 1.1782 ms]
                        change: [+0.0532% +0.6599% +1.5036%] (p = 0.09 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  8 (8.00%) high mild
  4 (4.00%) high severe

resolve_core/wide_fan/10pkg_5ver
                        time:   [98.010 µs 98.244 µs 98.524 µs]
                        change: [−1.0657% +0.1889% +1.1756%] (p = 0.77 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe
resolve_core/wide_fan/50pkg_5ver
                        time:   [370.69 µs 371.51 µs 372.25 µs]
                        change: [−1.7939% −1.3826% −1.0013%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
resolve_core/wide_fan/100pkg_10ver
                        time:   [902.88 µs 904.07 µs 905.33 µs]
                        change: [−0.9531% −0.7585% −0.5490%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

resolve_core/diamond/5_diamonds
                        time:   [121.60 µs 121.69 µs 121.79 µs]
                        change: [−0.1601% +0.1002% +0.3582%] (p = 0.45 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
resolve_core/diamond/20_diamonds
                        time:   [411.44 µs 411.80 µs 412.20 µs]
                        change: [−2.7003% −2.4730% −2.2579%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
Benchmarking resolve_core/diamond/50_diamonds: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.1s, enable flat sampling, or reduce sample count to 60.
resolve_core/diamond/50_diamonds
                        time:   [1.0192 ms 1.0233 ms 1.0281 ms]
                        change: [−0.5022% −0.0666% +0.3574%] (p = 0.77 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

Hyperfine with local webserver and sample project at 100ms simulated latency

With simulated 100ms delay on the following assorted dependencies:

[dependencies]
anyhow = "1.0.102"
futures = "0.3.32"
futures-timer = "3.0.3"
serde = { version = "1", features = ["derive"] }
reqwest = "0.13"
hyper = "1.8.1"
http = "1.4.0"
cargo = "0.95.0"

hyperfine "~/git/cargo/target/release/cargo-master update --dry-run" "~/git/cargo/target/release/cargo-async update --dry-run" --warmup 5

Benchmark 1: ~/git/cargo/target/release/cargo-master update --dry-run
  Time (mean ± σ):      1.943 s ±  0.070 s    [User: 0.511 s, System: 0.106 s]
  Range (min … max):    1.856 s …  2.070 s    10 runs
 
Benchmark 2: ~/git/cargo/target/release/cargo-async update --dry-run
  Time (mean ± σ):      1.791 s ±  0.044 s    [User: 0.572 s, System: 0.125 s]
  Range (min … max):    1.718 s …  1.849 s    10 runs
 
Summary
  ~/git/cargo/target/release/cargo-async update --dry-run ran
    1.09 ± 0.05 times faster than ~/git/cargo/target/release/cargo-master update --dry-run

Hyperfine on `cargo` itself hitting live crates.io

hyperfine "target/release/cargo-master update --dry-run" "target/release/cargo-async update --dry-run" --warmup 5
Benchmark 1: target/release/cargo-master update --dry-run
  Time (mean ± σ):     986.2 ms ± 129.5 ms    [User: 540.7 ms, System: 70.4 ms]
  Range (min … max):   864.0 ms … 1246.0 ms    10 runs
 
Benchmark 2: target/release/cargo-async update --dry-run
  Time (mean ± σ):     757.5 ms ±  23.7 ms    [User: 531.9 ms, System: 67.5 ms]
  Range (min … max):   725.6 ms … 786.7 ms    10 runs
 
Summary
  target/release/cargo-async update --dry-run ran
    1.30 ± 0.18 times faster than target/release/cargo-master update --dry-run

weihanglo

I feel like it is pretty close to merge! This round I only have some diagnostic-related issues.

View changes since this review

tests/testsuite/patch.rs

weihanglo · 2026-03-27T19:41:18Z

tests/testsuite/registry_auth.rs

 [UPDATING] `alternative` index
 [ERROR] failed to get `bar` as a dependency of package `foo v0.0.1 ([ROOT]/foo)`

+Caused by:


Do we feel like these new context are good or redundant?

The context for auth errors now better matches other errors. It does feel somewhat redundant though. Removing one level of the error chain in a follow up PR (see my other comment) would somewhat mitigate it.

In the current Cargo, the additional context of "failed to load source" / "unable to update registry"
doesn't appear only for sparse registries.

The reason is that the context only gets added to errors that come from block_until_ready,
and sparse registry errors come through query.

So this change makes sparse registries behave like the other source types.

Example showing git vs sparse errors on existing cargo

Updating `sparse` index error: failed to get `anyhow` as a dependency of package `sample v0.1.0 (/home/arlo/git/sample-baseline)` Caused by: failed to query replaced source registry `crates-io` Caused by: download of config.json failed Caused by: failed to download from `https://localhost:34433/config.json` Caused by: [7] Could not connect to server (Failed to connect to localhost port 34433 after 0 ms: Could not connect to server)

Updating `git` index error: failed to get `anyhow` as a dependency of package `sample v0.1.0 (/home/arlo/git/sample-baseline)` Caused by: failed to load source for dependency `anyhow` Caused by: unable to update registry `crates-io` Caused by: failed to update replaced source registry `crates-io` Caused by: failed to fetch `https://localhost:34433/` Caused by: network failure seems to have happened if a proxy or similar is necessary `net.git-fetch-with-cli` may help here https://doc.rust-lang.org/cargo/reference/config.html#netgit-fetch-with-cli Caused by: [7] Could not connect to server (Failed to connect to localhost port 34433 after 0 ms: Could not connect to server); class=Net (12)

Cargo.toml

tests/testsuite/registry_auth.rs

weihanglo · 2026-03-27T21:11:14Z

tests/testsuite/git.rs

        .with_stderr_data(str![[r#"
 [UPDATING] git repository `[ROOTURL]/bar`
-[ERROR] unable to update [ROOTURL]/bar#0.1.2
+[ERROR] failed to get `bar` as a dependency of package `dep1 v0.5.0 ([ROOT]/foo/dep1)`


Not sure whether this is an improvement or regression. Personally feel like this is more an diagnostic improvement because now people know why git were cloning bar, but OTOH it is a bit too verbose.

Overall I think the additional context is good, but things are getting too verbose. I'd like to make a follow-up change that removes one level of this chain everywhere. Doing that in this PR would cause a significant diff though.

Changing most

Caused by: failed to load source for dependency `bar` Caused by: unable to update [ROOTURL]/bar#0.1.2

to

Caused by: failed to update [ROOTURL]/bar#0.1.2 for dependency `bar`

rustbot · 2026-03-31T19:00:36Z

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

This is only used by the resolver, which is the remaining non-async native component that accesses registries.

We need to be able to have multiple async futures in-flight, which means shared access is required for registry methods.

This is used to enable dynamic dispatch on traits that use async functions.

The method bodies are not changed. This is in preparation to remove the block_until_ready functions.

This error occurs when a registry returns "CacheValid", but we didn't provide a cache key. This is only a refactor, does not impact any functionality. This error path is only possible with http (sparse) registries currently.

Converts Poll into async. Some error messages are impacted due to slightly different code paths being taken. This is a major change to the sparse registry backend in http_remote.rs.

arlosi · 2026-03-31T20:47:54Z

I've minimized the tests diffs further, and fixed a couple typos in the last push. This should be ready to go now.

arlosi force-pushed the async-pr branch 2 times, most recently from 5072ba5 to e09814c Compare March 13, 2026 15:03