
Conversation

@arjunr2 (Collaborator) commented Nov 14, 2025

Notes

  • Support for all entrypoints through Func, TypedFunc, and ComponentFunc
  • Support for all component builtins
  • Support for all instantiation entrypoints
  • Support for both sync and async
  • Fully embedding-agnostic replay driver

Structure

  • All core capabilities are feature-gated behind the rr feature. The only thing exposed without the feature is rr::hooks, which offers convenient methods to hook the recording/replaying mechanisms into the rest of the engine.
    As a result, rr (everything apart from the hooks) should be able to land with no impact on existing Wasmtime whatsoever; a rough sketch of the gating is shown below.
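
A minimal sketch of what that gating could look like (the module layout and names here are assumptions for illustration, not taken from the PR):

```rust
pub mod rr {
    /// Hook points compiled unconditionally so the rest of the engine can
    /// call into them; without the `rr` feature they compile down to no-ops.
    pub mod hooks {
        pub fn on_host_call(_name: &str) {
            #[cfg(feature = "rr")]
            {
                // forward to the recording/replay machinery here
            }
        }
    }

    /// The record/replay machinery itself is only built with `--features rr`.
    #[cfg(feature = "rr")]
    pub mod core {
        // recording and replay drivers live here
    }
}
```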

Observed overheads

Between 4% and 8%.

alexcrichton and others added 30 commits August 6, 2025 13:55
  • Helps get some binaries to test with
  • Things left to complete:
    * Core wasm support
    * Improved error handling
    * Wasmtime CLI integration

```rust
impl<T: Read + Seek + Send + Sync> ReplayReader for T {}

/// In `no_std`, types must provide explicit read/seek capabilities
/// to an underlying byte slice through these methods.
pub trait ReplayReader: AsRef<[u8]> + Send + Sync {
```
Member commented:

Rather than AsRef<[u8]>, which I think means that the reader must materialize the rest of the trace into memory, could we have a separate fn read(&mut self, len: usize) -> &[u8] that reads len bytes and returns a slice? If EOF then the slice may be shorter than len. This would allow an implementation with a buffer fed by underlying storage on demand (e.g. from a file).
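
For concreteness, a rough sketch of the suggested shape (names and the EOF convention here are illustrative, not a concrete proposal):

```rust
/// Reader that yields the trace incrementally instead of materializing it.
pub trait ReplayReader: Send + Sync {
    /// Returns up to `len` bytes; a shorter slice signals EOF. The
    /// implementation may refill an internal buffer from underlying
    /// storage (e.g. a file) on demand.
    fn read(&mut self, len: usize) -> &[u8];
}

/// Trivial in-memory implementation, mostly to show the EOF convention.
pub struct SliceReader<'a> {
    data: &'a [u8],
    pos: usize,
}

impl ReplayReader for SliceReader<'_> {
    fn read(&mut self, len: usize) -> &[u8] {
        let end = (self.pos + len).min(self.data.len());
        let chunk = &self.data[self.pos..end];
        self.pos = end;
        chunk
    }
}
```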

Collaborator (Author) commented:

I thought the point of no_std was that you don't have file-like storage? Isn't the implication that everything is materialized in memory?

Member commented:

Well, no, the literal implication is that there isn't a standard library. The user's custom environment may very well still have storage abstractions. Consider e.g. dropping Wasmtime into a big embedded system with a custom OS, or something like that.

@cfallin (Member) commented Dec 2, 2025

Thanks for the quick response! I skimmed over most of the new commits; the one that I want to understand more closely (with non late-night brain) is the change to permit reentrancy (522bb60). I'll take a look at that one separately in a bit.

@arjunr2 (Collaborator, Author) commented Dec 3, 2025

Btw, I have a commented-out line of code in the repo now that throws a compile error if component-model-async and rr are enabled together (currently disabled since CI needs to pass). We should figure out how to add that to CI as well.
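
For reference, the guard in question is the usual compile_error! pattern; a sketch (the exact feature names and message are whatever the PR uses):

```rust
// Uncommenting this makes the incompatible feature combination a hard
// compile-time error rather than a silent misconfiguration.
#[cfg(all(feature = "component-model-async", feature = "rr"))]
compile_error!("the `rr` feature is not currently compatible with `component-model-async`");
```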

@cfallin (Member) commented Dec 3, 2025

> Btw, I have a commented-out line of code in the repo now that throws a compile error if component-model-async and rr are enabled together (currently disabled since CI needs to pass). We should figure out how to add that to CI as well.

Hmm, looking through the CI config, it's more than just the one cargo check (the one I would expect is on line 373: checks -p wasmtime --all-features). It's e.g. the docs build as well, and probably other things that assume one can turn on all features.

Perhaps we can put something in Store::run_concurrent (the main entrypoint for actually running async component model code) that errors out if RR is enabled on the store?
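
Something like the following, as a rough sketch (a stand-in Store type and a hypothetical rr_enabled flag; the real run_concurrent signature differs):

```rust
use anyhow::{bail, Result};

/// Stand-in for wasmtime's Store, just to illustrate the guard.
struct Store {
    rr_enabled: bool,
}

impl Store {
    /// Entry point for async component-model execution: bail early if
    /// record/replay is active, instead of relying on a feature conflict.
    fn run_concurrent(&mut self) -> Result<()> {
        if self.rr_enabled {
            bail!("record/replay is not supported together with the async component model");
        }
        // ... existing concurrent-execution logic ...
        Ok(())
    }
}
```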

@cfallin (Member) commented Dec 3, 2025

I left one comment in the reentrancy commit re: lifetimes (can we try harder to avoid the unsafe, basically; happy to help think through this). More generally though I'm not sure I understand the nested event loop thing in the return hook -- I mean, I understand how architecturally one could be forced into it, but ideally I'd want replay to be one loop at one level -- that also makes eventual trace seeking (which we'll need for reversible debugging) feasible, versus having to have a nested series of stack frames in certain states.

@arjunr2 (Collaborator, Author) commented Dec 4, 2025

Yeah, this is challenging; I thought quite a bit about how to get re-entrancy to work and settled on this.

The unsafe doesn't really have to do with lifetimes. To accomplish re-entrancy, the replay stubs for the host functions need access to the ReplayInstance which created them. The way I accomplish this right now is by tying a special ReplayHostContext to the Store; it is fully opaque to users and only creatable by the ReplayInstance. This context lets us access instance state and run the appropriate re-entrancy call. So basically, if we are replaying, we can assume the data has to be a ReplayHostContext, which is what the unsafe is asserting. The other option is to have a field in the Store holding a weak pointer to a ReplayInstance, and access things through that. Maybe this works better?
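
A rough sketch of that second option (the types here are stand-ins, not the PR's actual API):

```rust
use std::sync::{Arc, Weak};

/// Stand-in for the instance that drives replay.
struct ReplayInstance {
    // recorded events, cursor, re-entrancy state, ...
}

/// Per-store replay state: instead of an opaque context recovered through an
/// unsafe downcast, the store holds a weak handle back to its ReplayInstance.
struct StoreReplayState {
    replay_instance: Weak<ReplayInstance>,
}

impl StoreReplayState {
    /// Host-call stubs upgrade the weak handle when they need instance state;
    /// the upgrade fails gracefully (returns None) if the instance is gone,
    /// rather than relying on an unchecked cast.
    fn instance(&self) -> Option<Arc<ReplayInstance>> {
        self.replay_instance.upgrade()
    }
}
```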

As for replay being one loop at one level, is it possible to do that? I'm not sure how that's even possible with re-entrancy, because there is no getting around actually invoking the wasm function from the return hook, right? The "loop" in the return hook is just because re-entrancy can go arbitrarily many levels deep. The only thing it expects is a balanced set of entry/return events.
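
For concreteness, a rough sketch of that balanced-event shape (the event types and names here are made up for illustration, not the PR's actual representation):

```rust
/// Illustrative trace events seen by a host-call replay stub.
enum Event {
    /// The recorded execution re-entered a wasm export from this host call.
    WasmReentry { func_index: u32 },
    /// The matching return for this host call, carrying recorded results.
    HostCallReturn { results: Vec<u64> },
}

/// Replay stub for one host call: consume events until the matching return.
/// Re-entry events actually invoke wasm, which may hit further host-call
/// stubs and nest arbitrarily deep, as long as entries and returns balance.
fn replay_host_call(
    trace: &mut impl Iterator<Item = Event>,
    invoke_wasm: &mut dyn FnMut(u32),
) -> Vec<u64> {
    loop {
        match trace.next().expect("trace ended inside a host call") {
            Event::WasmReentry { func_index } => invoke_wasm(func_index),
            Event::HostCallReturn { results } => return results,
        }
    }
}
```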

@cfallin (Member) commented Dec 4, 2025

FYI, I mentioned this PR to Alex today in the Wasmtime biweekly and he will take a look sometime in the next few weeks -- cc @alexcrichton.

@cfallin (Member) commented Dec 4, 2025

> As for replay being one loop at one level, is it possible to do that? I'm not sure how that's even possible with re-entrancy, because there is no getting around actually invoking the wasm function from the return hook, right? The "loop" in the return hook is just because re-entrancy can go arbitrarily many levels deep. The only thing it expects is a balanced set of entry/return events.

Thinking more about this, it seems like a very core challenge for reversible debugging. We fundamentally need to snapshot and restore to earlier execution states. Our plan-of-record for the active stack frames has been to actually memcpy out the fiber stack (the active parts of it, down to saved SP) and copy it back in; if no pointers move, then everything is still valid.

But if we have native frames between Wasm activations, all that goes right out. There's no safe way at all to restore a set of native frames that core Wasm called out to, that called back into Wasm.

Maybe we don't support reentrancy during record or replay; maybe the component model got this right. (Side-eyeing upcoming changes to support reentrancy that I've heard rumblings about -- I don't know any details.) In this case, we'll want to (I think) trap on attempted re-entry into Wasm when recording...

@arjunr2 (Collaborator, Author) commented Dec 4, 2025

Actually yeah, I just realized it's more than native frames: we'd actually need native snapshots between activations (which ends up being just like RR!), since Store/Instance state can be modified too.

@cfallin (Member) commented Dec 18, 2025

@arjunr2 I see your new commits here while we wait for review from Alex. Quick question: what's the story behind the new name wasm-crimp?

@arjunr2 (Collaborator, Author) commented Dec 18, 2025

crimp (the name is a work in progress, if you have suggestions :)) is now just a new crate that contains the record/replay interface specification. This was just a refactor that decouples the interface specification from wasmtime itself, so I can write replay interpreters and recording embedders for the same crimp spec in different engines. My plan is to write a replayer in Wizard now with this.

But essentially in terms of logic, it didn't change anything at all. It was purely a move of almost everything in rr/core/* within wasmtime to a separate crate.

@arjunr2 (Collaborator, Author) commented Dec 18, 2025

Also, crimp currently does depend on a few things in wasmtime-environ (for things like checksum, component id, etc.). That's ok for now, and probably ok long term as well, but maybe it'd make more sense for it to live without any dependency on environ.

I also think this crate should live in a separate repo (under the Bytecode Alliance maybe, or my personal account), but I figure we can wait until review is complete to do that.

@cfallin (Member) commented Dec 19, 2025

Ah, I see. IMHO whether we want to export a set of types that other engines also use is a bigger discussion -- that's a big new public API surface.

Would you mind reverting that for now on this branch at least? I want to try to keep the review target stable, without new development, until we land it.

Speaking of which -- @alexcrichton I know you're extremely busy and the holidays are coming up but do you have any estimate on when you might be able to give this PR a review?

@alexcrichton (Member) commented:

Oh, no worries. I ended up not having the energy on planes, but I was planning to get to this at the beginning of next week, when things are quieter with other folks on vacation.

@arjunr2 (Collaborator, Author) commented Dec 19, 2025

I'll move any additional development to a different branch from here on, then, if necessary. But for now, do you want me to also revert the new internal rr/crimp crate and move it back into wasmtime? Almost nothing has changed code-wise within these files with the move, since it mostly didn't rely on wasmtime internals anyway.

@cfallin (Member) commented Dec 19, 2025

Yes; that's a big change to the public API (even if it's not much of a change to code structure) and I think we'll want to have a discussion about the way we handle this.

@arjunr2 (Collaborator, Author) commented Dec 19, 2025

Ok, it's moved back into wasmtime, leaving this branch stable for review and fixes.
