fix: dial schedulers and peers over https when TLS is configured#1859
Open
rtainaan wants to merge 1 commit into
Open
fix: dial schedulers and peers over https when TLS is configured#1859rtainaan wants to merge 1 commit into
rtainaan wants to merge 1 commit into
Conversation
tonic derives a channel's transport security from the URL scheme, not from whether tls_config() was set: an "http://" endpoint dials cleartext even with a TLS config attached. The scheduler- and peer-dialing paths hardcoded the "http" scheme, so a client configured with a CA cert connected in cleartext and the handshake failed against a TLS-enforcing server. The host-announce and host-delete paths went further and never attached a TLS config at all. Add a net::scheme_for_tls helper that selects the scheme from a fully configured client TLS identity (ca_cert, cert, and key all present, the same predicate load_client_tls_config uses), and apply it at each dial site: the scheduler address in dynconfig, and the peer upload dials in parent_selector and piece_collector. Gating on ca_cert alone would emit an https URL with no TLS config and fail the dial. For the scheduler gRPC client, consolidate the four channel-construction sites into one connect_to_scheduler() helper that derives the scheme the same way and applies load_client_tls_config() uniformly, so the announce and delete paths get the same mTLS treatment as the request path. Tests: cargo fmt --all -- --check clean cargo clippy --all --all-targets -- -D warnings clean cargo test -p dragonfly-client-util net::tests 6 passed Signed-off-by: Raymond (Kai) Tainaan <282025710+rtainaan@users.noreply.github.com>
Contributor
|
good catch — the announce and delete paths at scheduler.rs L188/L242/L301 never loaded TLS config at all, which is the more serious side of this. did you test end to end with mTLS between dfdaemon and scheduler? the scheme switch looks right from the code but just want to make sure tonic picks up the https:// correctly with the config attached |
Author
|
Yeah I’m surprised no one noticed before. I was verifying end to end mtls when I caught it. And yep, tested this as a custom patch, dfdaemon handshakes with the scheduler now, fails without it. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
The scheduler and peer dial paths build their gRPC channel from an
http://URL, so tonic dials cleartext even when a TLS config is attached. A dfdaemon configured for mTLS then connects in cleartext and fails against a TLS-enforcing server (tls: first record does not look like a TLS handshake), and the host-announce/delete paths never attach a TLS config at all. This selects thehttpsscheme when client TLS is configured and routes the scheduler dials through a single helper so announce and delete get the same mTLS handling as the request path.Motivation and Context
With mTLS enabled between dfdaemon and the scheduler/peers, the TLS config is loaded but ignored on these paths today — the scheduler and upload-client certificates have no effect and traffic falls back to cleartext.