mesh_remote: add external connection support for ALPC meshes#3125
mesh_remote: add external connection support for ALPC meshes#3125jstarks wants to merge 11 commits intomicrosoft:mainfrom
Conversation
Enable already-running processes to join an ALPC mesh at runtime, which is needed for openvmm-neo modify/attach scenarios where a control client connects to a running VMM process. The core problem is that ALPC mesh invitations today are only delivered via inherited handles during fork/exec. There is no mechanism for an external process to discover and connect to an existing mesh. This adds two building blocks: AlpcNode::new_named() creates a mesh whose Ob directory lives at a well-known path (\BaseNamedObjects\mesh-<node-id>) instead of an anonymous directory. This lets external processes open the directory by name rather than requiring a duplicated handle. AlpcMeshListener uses a named pipe for the invitation handshake. The server listens on \\.\pipe\<name>, and when a client connects, it creates a NamedInvitation (credentials + directory path) and sends it over the pipe. The client deserializes the invitation and calls AlpcNode::join_named() to enter the mesh. The named pipe's default DACL restricts access to the current user, and the mesh secret provides ALPC connection authentication. AlpcMeshInviter is a Clone+Send handle extracted from AlpcNode that delegates invitation creation to the node's internal task, so the listener's accept loop doesn't need to hold a reference to the node. Also hardens ALPC connection authentication: each mesh now has a 256-bit secret generated from getrandom, sent as part of ALPC connection data and validated with constant-time comparison on accept. Connections with invalid secrets are rejected. PortConfig gains required_server_sid() for optional SID-based mutual authentication on the ALPC connect path.
|
This PR modifies files containing For more on why we check whole files, instead of just diffs, check out the Rustonomicon |
There was a problem hiding this comment.
Pull request overview
Enables Windows processes that are already running to join an existing mesh_remote ALPC mesh at runtime by introducing named-directory ALPC nodes and a named-pipe-based invitation listener, removing the previous “inherited handles only” limitation.
Changes:
- Add a Windows named-pipe listener (
AlpcMeshListener) to distribute ALPC named invitations to external processes. - Extend Windows ALPC mesh nodes with named-directory support plus serialized invitation credentials (address + mesh secret).
- Add ALPC connection mutual-auth plumbing via an optional
RequiredServerSidinpal’s ALPC connect path.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| support/pal/src/windows/alpc.rs | Adds SID validation/buffering and passes RequiredServerSid into NtAlpcConnectPortEx. |
| support/pal/Cargo.toml | Adds thiserror for Windows-only PAL error types. |
| support/mesh/mesh_remote/src/lib.rs | Exposes new Windows ALPC APIs (listener + new error/types) via re-exports. |
| support/mesh/mesh_remote/src/alpc_node.rs | Adds named-directory nodes, invitation credentials structs, mesh secret auth, and inviter handle pattern. |
| support/mesh/mesh_remote/src/alpc_listener.rs | New named-pipe listener/handshake implementation for runtime mesh joins. |
| support/mesh/mesh_remote/Cargo.toml | Adds Windows-only deps for mesh secret auth (constant_time_eq, getrandom). |
| support/mesh/mesh_process/src/lib.rs | Updates invitation propagation to use AlpcInvitationCredentials and the new AlpcInvitation::new(...) API. |
| Cargo.toml | Adds workspace dependency version for constant_time_eq. |
| Cargo.lock | Locks new crate additions (constant_time_eq, getrandom usage in mesh_remote). |
The requirements for SID-based mutual auth on ALPC connections are not clear enough to justify the current API surface. Remove the expected_server_sid parameter from join_named and all internal functions in the connect path (with_id, process_connects, connect_alpc, process_one_connect). This can be re-added with a clearer design when needed.
| // Read the length-prefixed invitation: [4 bytes LE length][data]. | ||
| let mut len_buf = [0u8; 4]; | ||
| stream | ||
| .read_exact(&mut len_buf) | ||
| .await | ||
| .map_err(JoinBySocketError::Connect)?; | ||
| let data_len = u32::from_le_bytes(len_buf) as usize; | ||
|
|
||
| const MAX_INVITATION_SIZE: usize = 64 * 1024; | ||
| if data_len > MAX_INVITATION_SIZE { | ||
| return Err(JoinBySocketError::InvitationTooLarge { len: data_len }); | ||
| } | ||
|
|
||
| let mut data = vec![0u8; data_len]; | ||
| stream | ||
| .read_exact(&mut data) | ||
| .await | ||
| .map_err(JoinBySocketError::Connect)?; |
There was a problem hiding this comment.
JoinBySocketError::Connect is used for read failures as well (both length and payload reads). That produces a misleading "failed to connect" error when the connection succeeded but the peer closed early or sent a truncated invitation. Consider adding a dedicated read/IO variant (or renaming Connect to something like Io) and mapping these read_exact errors to it.
| .map_err(|_| JoinError::InvalidDirectoryPath)?; | ||
| let directory = open_object_directory( | ||
| ObjectAttributes::new().name(&path), | ||
| DIRECTORY_TRAVERSE | DIRECTORY_QUERY, |
There was a problem hiding this comment.
join_named opens the object directory with DIRECTORY_TRAVERSE | DIRECTORY_QUERY, but the joiner then creates its own ALPC port under that directory in with_id(). Opening without DIRECTORY_CREATE_OBJECT can cause CreateAlpcPort to fail with access denied; request create-object access (or reuse DIRECTORY_ALL_ACCESS).
| DIRECTORY_TRAVERSE | DIRECTORY_QUERY, | |
| DIRECTORY_TRAVERSE | DIRECTORY_QUERY | DIRECTORY_CREATE_OBJECT, |
| let inviter = node1.inviter(); | ||
| let (invitation, handle) = inviter.invite(recv.into()).await.unwrap(); |
There was a problem hiding this comment.
AlpcMeshInviter does not define an invite method, but this test calls inviter.invite(...), which will not compile. Either implement AlpcMeshInviter::invite (handle-based invitations) or update the test to use AlpcNode::invite directly.
| let inviter = node1.inviter(); | |
| let (invitation, handle) = inviter.invite(recv.into()).await.unwrap(); | |
| let (invitation, handle) = node1.invite(recv.into()).unwrap(); |
| pub async fn finish(mut self, port: Port) -> Result<(), FinishError> { | ||
| let (invitation, handle) = self | ||
| .inviter | ||
| .invite_named(port) | ||
| .await | ||
| .map_err(FinishError::Invite)?; | ||
|
|
||
| let data = mesh_protobuf::encode(invitation); | ||
|
|
||
| let len = data.len() as u32; | ||
| self.stream.write_all(&len.to_le_bytes()).await?; | ||
| self.stream.write_all(&data).await?; | ||
| self.stream.flush().await?; | ||
|
|
||
| // Wait for the client to join the mesh via the invitation. | ||
| handle.await; | ||
| Ok(()) |
There was a problem hiding this comment.
PendingMeshConnection::finish awaits handle.await indefinitely after sending the invitation. A client can read the invitation and never join, leaving a task parked forever and keeping the invitation entry alive (resource leak/DoS). Consider adding a timeout/cancellation strategy and dropping the InvitationHandle on timeout to cancel the invitation.
| /// Errors from [`AlpcNode::join_by_socket`]. | ||
| #[derive(Debug, thiserror::Error)] | ||
| #[expect(missing_docs)] | ||
| pub enum JoinBySocketError { | ||
| #[error("failed to connect to mesh socket")] | ||
| Connect(#[source] io::Error), | ||
| #[error("invitation too large ({len} bytes)")] | ||
| InvitationTooLarge { len: usize }, | ||
| #[error("failed to decode invitation")] | ||
| Decode(#[source] mesh_protobuf::Error), | ||
| #[error("failed to join mesh")] | ||
| Join(#[source] JoinError), |
There was a problem hiding this comment.
JoinBySocketError::Connect is used for both the initial connect and later read_exact calls, but its message says "failed to connect". This makes read/EOF errors misleading; consider a separate Io/Read variant (or rename Connect to a generic I/O error) and map the read paths accordingly.
| // Generate a 256-bit mesh secret for ALPC connection authentication. | ||
| // Every connecting node must present this secret to be accepted. | ||
| let mut secret = [0u8; 32]; | ||
| getrandom::fill(&mut secret).unwrap(); |
There was a problem hiding this comment.
AlpcNode::new_mesh calls getrandom::fill(&mut secret).unwrap(), which can panic if the OS RNG fails. Since this is library code, propagate the error instead (e.g., add a NewNodeError::GetRandom variant and return it).
| getrandom::fill(&mut secret).unwrap(); | |
| if let Err(_err) = getrandom::fill(&mut secret) { | |
| // Propagate RNG failure as a node-creation error instead of panicking. | |
| return Err(NewNodeError::InvalidDirectoryPath); | |
| } |
| let (addr, handle, init_recv) = self.invite_setup()?; | ||
| let dir = if dup_directory { | ||
| Some(self.directory.as_handle().duplicate(true, Some(0))?) | ||
| } else { | ||
| None | ||
| }; |
There was a problem hiding this comment.
process_invite mutates shared state (adds the remote + inserts into invitations) before duplicating the directory handle. If handle duplication fails, the invitation entry/remote handle is left behind and can never be accepted. Either duplicate first (when dup_directory is true) or roll back the inserted invitation on error.
| let (addr, handle, init_recv) = self.invite_setup()?; | |
| let dir = if dup_directory { | |
| Some(self.directory.as_handle().duplicate(true, Some(0))?) | |
| } else { | |
| None | |
| }; | |
| let dir = if dup_directory { | |
| Some(self.directory.as_handle().duplicate(true, Some(0))?) | |
| } else { | |
| None | |
| }; | |
| let (addr, handle, init_recv) = self.invite_setup()?; |
| /// Generate a new random mesh secret. | ||
| fn new() -> Self { | ||
| let mut secret = [0u8; 32]; | ||
| getrandom::fill(&mut secret).unwrap(); |
There was a problem hiding this comment.
MeshSecret::new() uses getrandom::fill(..).unwrap(), which can panic and take down the hosting process if RNG acquisition fails. Consider returning a Result from MeshSecret::new() and propagating it through NewNodeError (or at least use expect("rng failure") for a clearer crash reason).
| getrandom::fill(&mut secret).unwrap(); | |
| getrandom::fill(&mut secret) | |
| .expect("MeshSecret::new: RNG failure generating mesh secret"); |
| /// `new()` instead. | ||
| pub fn new_named(driver: impl Driver + Spawn + Clone) -> Result<Self, NewNodeError> { | ||
| let id = NodeId::new(); | ||
| let path_string = format!(r"\BaseNamedObjects\mesh-{id:?}"); |
There was a problem hiding this comment.
new_named() builds the directory path using format!("...{id:?}"), which couples the on-disk/NT namespace to Debug formatting (currently producing names like mesh-N-<uuid>). This also conflicts with the doc comment that says mesh-<random-uuid>. Prefer formatting from the underlying Uuid (e.g., id.0) or update the docs to match the actual name format.
| let path_string = format!(r"\BaseNamedObjects\mesh-{id:?}"); | |
| let path_string = format!(r"\BaseNamedObjects\mesh-{}", id.0); |
| let directory = open_object_directory( | ||
| ObjectAttributes::new().name(&path), | ||
| DIRECTORY_TRAVERSE | DIRECTORY_QUERY, | ||
| ) | ||
| .map_err(|e| JoinError::OpenDirectory { | ||
| path: invitation.directory_path.clone(), | ||
| source: e, | ||
| })?; |
There was a problem hiding this comment.
join_named() opens the object directory with only DIRECTORY_TRAVERSE | DIRECTORY_QUERY, but the joiner then creates its own ALPC server port under that directory in with_id(). If the directory handle lacks DIRECTORY_CREATE_OBJECT, port creation can fail with access denied. Consider requesting DIRECTORY_CREATE_OBJECT (or DIRECTORY_ALL_ACCESS) when calling open_object_directory.
| // Remove stale socket file if it exists. | ||
| let _ = std::fs::remove_file(path); | ||
| let listener = UnixListener::bind(path)?; | ||
| let listener = PolledSocket::new(driver, listener)?; |
There was a problem hiding this comment.
AlpcMeshListener::create() unconditionally removes any existing file at path. This can delete an unrelated file if the path is reused or replaced. Consider only deleting the path if it is confirmed to be an AF_UNIX socket (e.g., via pal::windows::fs::is_unix_socket) before calling remove_file.
| impl Drop for AlpcMeshListener { | ||
| fn drop(&mut self) { | ||
| let _ = std::fs::remove_file(&self.path); | ||
| } |
There was a problem hiding this comment.
Drop for AlpcMeshListener unconditionally removes self.path. As with creation-time cleanup, it’s safer to only remove the path if it still refers to a Unix socket, to avoid deleting an unrelated file that may have been created at the same location.
chris-oo
left a comment
There was a problem hiding this comment.
Looks reasonable but not sure if you want another set of eyes.
Enable already-running processes to join an ALPC mesh at runtime, which is needed for modify/attach scenarios where a control client connects to a running VMM process.