Skip to content

mesh_remote: add external connection support for ALPC meshes#3125

Open
jstarks wants to merge 11 commits intomicrosoft:mainfrom
jstarks:mesh-reconnect-windows
Open

mesh_remote: add external connection support for ALPC meshes#3125
jstarks wants to merge 11 commits intomicrosoft:mainfrom
jstarks:mesh-reconnect-windows

Conversation

@jstarks
Copy link
Copy Markdown
Member

@jstarks jstarks commented Mar 24, 2026

Enable already-running processes to join an ALPC mesh at runtime, which is needed for modify/attach scenarios where a control client connects to a running VMM process.

Enable already-running processes to join an ALPC mesh at runtime, which
is needed for openvmm-neo modify/attach scenarios where a control client
connects to a running VMM process.

The core problem is that ALPC mesh invitations today are only delivered
via inherited handles during fork/exec. There is no mechanism for an
external process to discover and connect to an existing mesh.

This adds two building blocks:

AlpcNode::new_named() creates a mesh whose Ob directory lives at a
well-known path (\BaseNamedObjects\mesh-<node-id>) instead of an
anonymous directory. This lets external processes open the directory by
name rather than requiring a duplicated handle.

AlpcMeshListener uses a named pipe for the invitation handshake. The
server listens on \\.\pipe\<name>, and when a client connects, it
creates a NamedInvitation (credentials + directory path) and sends it
over the pipe. The client deserializes the invitation and calls
AlpcNode::join_named() to enter the mesh. The named pipe's default DACL
restricts access to the current user, and the mesh secret provides ALPC
connection authentication.

AlpcMeshInviter is a Clone+Send handle extracted from AlpcNode that
delegates invitation creation to the node's internal task, so the
listener's accept loop doesn't need to hold a reference to the node.

Also hardens ALPC connection authentication: each mesh now has a 256-bit
secret generated from getrandom, sent as part of ALPC connection data
and validated with constant-time comparison on accept. Connections with
invalid secrets are rejected. PortConfig gains required_server_sid()
for optional SID-based mutual authentication on the ALPC connect path.
Copilot AI review requested due to automatic review settings March 24, 2026 23:40
@github-actions github-actions bot added the unsafe Related to unsafe code label Mar 24, 2026
@github-actions
Copy link
Copy Markdown

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Enables Windows processes that are already running to join an existing mesh_remote ALPC mesh at runtime by introducing named-directory ALPC nodes and a named-pipe-based invitation listener, removing the previous “inherited handles only” limitation.

Changes:

  • Add a Windows named-pipe listener (AlpcMeshListener) to distribute ALPC named invitations to external processes.
  • Extend Windows ALPC mesh nodes with named-directory support plus serialized invitation credentials (address + mesh secret).
  • Add ALPC connection mutual-auth plumbing via an optional RequiredServerSid in pal’s ALPC connect path.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
support/pal/src/windows/alpc.rs Adds SID validation/buffering and passes RequiredServerSid into NtAlpcConnectPortEx.
support/pal/Cargo.toml Adds thiserror for Windows-only PAL error types.
support/mesh/mesh_remote/src/lib.rs Exposes new Windows ALPC APIs (listener + new error/types) via re-exports.
support/mesh/mesh_remote/src/alpc_node.rs Adds named-directory nodes, invitation credentials structs, mesh secret auth, and inviter handle pattern.
support/mesh/mesh_remote/src/alpc_listener.rs New named-pipe listener/handshake implementation for runtime mesh joins.
support/mesh/mesh_remote/Cargo.toml Adds Windows-only deps for mesh secret auth (constant_time_eq, getrandom).
support/mesh/mesh_process/src/lib.rs Updates invitation propagation to use AlpcInvitationCredentials and the new AlpcInvitation::new(...) API.
Cargo.toml Adds workspace dependency version for constant_time_eq.
Cargo.lock Locks new crate additions (constant_time_eq, getrandom usage in mesh_remote).

@jstarks jstarks requested a review from Copilot March 25, 2026 00:14
@jstarks jstarks changed the title mesh_remote: add ALPC mesh listener and named-directory nodes mesh_remote: add external connection support for ALPC meshes Mar 25, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

jstarks added 3 commits March 25, 2026 15:56
The requirements for SID-based mutual auth on ALPC connections are not
clear enough to justify the current API surface. Remove the
expected_server_sid parameter from join_named and all internal functions
in the connect path (with_id, process_connects, connect_alpc,
process_one_connect). This can be re-added with a clearer design when
needed.
Copilot AI review requested due to automatic review settings March 27, 2026 23:26
@jstarks jstarks marked this pull request as ready for review March 27, 2026 23:26
@jstarks jstarks requested a review from a team as a code owner March 27, 2026 23:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.

Comment on lines +168 to +185
// Read the length-prefixed invitation: [4 bytes LE length][data].
let mut len_buf = [0u8; 4];
stream
.read_exact(&mut len_buf)
.await
.map_err(JoinBySocketError::Connect)?;
let data_len = u32::from_le_bytes(len_buf) as usize;

const MAX_INVITATION_SIZE: usize = 64 * 1024;
if data_len > MAX_INVITATION_SIZE {
return Err(JoinBySocketError::InvitationTooLarge { len: data_len });
}

let mut data = vec![0u8; data_len];
stream
.read_exact(&mut data)
.await
.map_err(JoinBySocketError::Connect)?;
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JoinBySocketError::Connect is used for read failures as well (both length and payload reads). That produces a misleading "failed to connect" error when the connection succeeded but the peer closed early or sent a truncated invitation. Consider adding a dedicated read/IO variant (or renaming Connect to something like Io) and mapping these read_exact errors to it.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 30, 2026 18:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 7 comments.

.map_err(|_| JoinError::InvalidDirectoryPath)?;
let directory = open_object_directory(
ObjectAttributes::new().name(&path),
DIRECTORY_TRAVERSE | DIRECTORY_QUERY,
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

join_named opens the object directory with DIRECTORY_TRAVERSE | DIRECTORY_QUERY, but the joiner then creates its own ALPC port under that directory in with_id(). Opening without DIRECTORY_CREATE_OBJECT can cause CreateAlpcPort to fail with access denied; request create-object access (or reuse DIRECTORY_ALL_ACCESS).

Suggested change
DIRECTORY_TRAVERSE | DIRECTORY_QUERY,
DIRECTORY_TRAVERSE | DIRECTORY_QUERY | DIRECTORY_CREATE_OBJECT,

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this real or no?

Comment on lines +1388 to +1389
let inviter = node1.inviter();
let (invitation, handle) = inviter.invite(recv.into()).await.unwrap();
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AlpcMeshInviter does not define an invite method, but this test calls inviter.invite(...), which will not compile. Either implement AlpcMeshInviter::invite (handle-based invitations) or update the test to use AlpcNode::invite directly.

Suggested change
let inviter = node1.inviter();
let (invitation, handle) = inviter.invite(recv.into()).await.unwrap();
let (invitation, handle) = node1.invite(recv.into()).unwrap();

Copilot uses AI. Check for mistakes.
Comment on lines +117 to +133
pub async fn finish(mut self, port: Port) -> Result<(), FinishError> {
let (invitation, handle) = self
.inviter
.invite_named(port)
.await
.map_err(FinishError::Invite)?;

let data = mesh_protobuf::encode(invitation);

let len = data.len() as u32;
self.stream.write_all(&len.to_le_bytes()).await?;
self.stream.write_all(&data).await?;
self.stream.flush().await?;

// Wait for the client to join the mesh via the invitation.
handle.await;
Ok(())
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PendingMeshConnection::finish awaits handle.await indefinitely after sending the invitation. A client can read the invitation and never join, leaving a task parked forever and keeping the invitation entry alive (resource leak/DoS). Consider adding a timeout/cancellation strategy and dropping the InvitationHandle on timeout to cancel the invitation.

Copilot uses AI. Check for mistakes.
Comment on lines +195 to +206
/// Errors from [`AlpcNode::join_by_socket`].
#[derive(Debug, thiserror::Error)]
#[expect(missing_docs)]
pub enum JoinBySocketError {
#[error("failed to connect to mesh socket")]
Connect(#[source] io::Error),
#[error("invitation too large ({len} bytes)")]
InvitationTooLarge { len: usize },
#[error("failed to decode invitation")]
Decode(#[source] mesh_protobuf::Error),
#[error("failed to join mesh")]
Join(#[source] JoinError),
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JoinBySocketError::Connect is used for both the initial connect and later read_exact calls, but its message says "failed to connect". This makes read/EOF errors misleading; consider a separate Io/Read variant (or rename Connect to a generic I/O error) and map the read paths accordingly.

Copilot uses AI. Check for mistakes.
// Generate a 256-bit mesh secret for ALPC connection authentication.
// Every connecting node must present this secret to be accepted.
let mut secret = [0u8; 32];
getrandom::fill(&mut secret).unwrap();
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AlpcNode::new_mesh calls getrandom::fill(&mut secret).unwrap(), which can panic if the OS RNG fails. Since this is library code, propagate the error instead (e.g., add a NewNodeError::GetRandom variant and return it).

Suggested change
getrandom::fill(&mut secret).unwrap();
if let Err(_err) = getrandom::fill(&mut secret) {
// Propagate RNG failure as a node-creation error instead of panicking.
return Err(NewNodeError::InvalidDirectoryPath);
}

Copilot uses AI. Check for mistakes.
Comment on lines +197 to +202
let (addr, handle, init_recv) = self.invite_setup()?;
let dir = if dup_directory {
Some(self.directory.as_handle().duplicate(true, Some(0))?)
} else {
None
};
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process_invite mutates shared state (adds the remote + inserts into invitations) before duplicating the directory handle. If handle duplication fails, the invitation entry/remote handle is left behind and can never be accepted. Either duplicate first (when dup_directory is true) or roll back the inserted invitation on error.

Suggested change
let (addr, handle, init_recv) = self.invite_setup()?;
let dir = if dup_directory {
Some(self.directory.as_handle().duplicate(true, Some(0))?)
} else {
None
};
let dir = if dup_directory {
Some(self.directory.as_handle().duplicate(true, Some(0))?)
} else {
None
};
let (addr, handle, init_recv) = self.invite_setup()?;

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 30, 2026 21:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 5 comments.

/// Generate a new random mesh secret.
fn new() -> Self {
let mut secret = [0u8; 32];
getrandom::fill(&mut secret).unwrap();
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MeshSecret::new() uses getrandom::fill(..).unwrap(), which can panic and take down the hosting process if RNG acquisition fails. Consider returning a Result from MeshSecret::new() and propagating it through NewNodeError (or at least use expect("rng failure") for a clearer crash reason).

Suggested change
getrandom::fill(&mut secret).unwrap();
getrandom::fill(&mut secret)
.expect("MeshSecret::new: RNG failure generating mesh secret");

Copilot uses AI. Check for mistakes.
/// `new()` instead.
pub fn new_named(driver: impl Driver + Spawn + Clone) -> Result<Self, NewNodeError> {
let id = NodeId::new();
let path_string = format!(r"\BaseNamedObjects\mesh-{id:?}");
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_named() builds the directory path using format!("...{id:?}"), which couples the on-disk/NT namespace to Debug formatting (currently producing names like mesh-N-<uuid>). This also conflicts with the doc comment that says mesh-<random-uuid>. Prefer formatting from the underlying Uuid (e.g., id.0) or update the docs to match the actual name format.

Suggested change
let path_string = format!(r"\BaseNamedObjects\mesh-{id:?}");
let path_string = format!(r"\BaseNamedObjects\mesh-{}", id.0);

Copilot uses AI. Check for mistakes.
Comment on lines +878 to +885
let directory = open_object_directory(
ObjectAttributes::new().name(&path),
DIRECTORY_TRAVERSE | DIRECTORY_QUERY,
)
.map_err(|e| JoinError::OpenDirectory {
path: invitation.directory_path.clone(),
source: e,
})?;
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

join_named() opens the object directory with only DIRECTORY_TRAVERSE | DIRECTORY_QUERY, but the joiner then creates its own ALPC server port under that directory in with_id(). If the directory handle lacks DIRECTORY_CREATE_OBJECT, port creation can fail with access denied. Consider requesting DIRECTORY_CREATE_OBJECT (or DIRECTORY_ALL_ACCESS) when calling open_object_directory.

Copilot uses AI. Check for mistakes.
Comment on lines +70 to +73
// Remove stale socket file if it exists.
let _ = std::fs::remove_file(path);
let listener = UnixListener::bind(path)?;
let listener = PolledSocket::new(driver, listener)?;
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AlpcMeshListener::create() unconditionally removes any existing file at path. This can delete an unrelated file if the path is reused or replaced. Consider only deleting the path if it is confirmed to be an AF_UNIX socket (e.g., via pal::windows::fs::is_unix_socket) before calling remove_file.

Copilot uses AI. Check for mistakes.
Comment on lines +102 to +105
impl Drop for AlpcMeshListener {
fn drop(&mut self) {
let _ = std::fs::remove_file(&self.path);
}
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop for AlpcMeshListener unconditionally removes self.path. As with creation-time cleanup, it’s safer to only remove the path if it still refers to a Unix socket, to avoid deleting an unrelated file that may have been created at the same location.

Copilot uses AI. Check for mistakes.
chris-oo
chris-oo previously approved these changes Mar 30, 2026
Copy link
Copy Markdown
Member

@chris-oo chris-oo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable but not sure if you want another set of eyes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

unsafe Related to unsafe code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants