feat!: integrate obfuscation to the stats exporter [APMSP-2764]#1819
feat!: integrate obfuscation to the stats exporter [APMSP-2764]#1819
Conversation
📚 Documentation Check Results📦
|
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
🔒 Cargo Deny Results📦
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1819 +/- ##
==========================================
- Coverage 71.70% 71.61% -0.09%
==========================================
Files 429 429
Lines 67916 68095 +179
==========================================
+ Hits 48697 48767 +70
- Misses 19219 19328 +109
🚀 New features to boost your workflow:
|
🎉 All green!❄️ No new flaky tests detected 🔗 Commit SHA: fd1a97e | Docs | Datadog PR Page | Give us feedback! |
|
@codex review |
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f2621f3041
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| client_side_stats.store(Arc::new(StatsComputationStatus::Enabled { | ||
| stats_concentrator: stats_concentrator.clone(), | ||
| cancellation_token: cancellation_token.clone(), | ||
| obfuscation_active: new_obfuscation_active, | ||
| })); |
There was a problem hiding this comment.
Recreate stats worker when obfuscation mode flips
When /info changes obfuscation support, this branch only updates client_side_stats and leaves the running StatsExporter untouched. The exporter’s obfuscation_active flag is fixed at construction time in create_and_start_stats_worker, so it keeps sending the old datadog-obfuscation-version header state even after process_traces_for_stats switches to the new obfuscation behavior. In the toggle scenario (e.g., agent enables obfuscation after startup), stats payload resources and header version can diverge, which makes the agent interpret stats under the wrong format/version.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR integrates obfuscation support into the trace exporter for client-side stats computation. It enables SDKs to obfuscate spans before sending them to the agent when the agent supports it, reducing data transfer between SDKs and the agent.
Changes:
- Adds
obfuscate_resource_for_statsfunction to obfuscate resource names for SQL, Redis, Cassandra, and Valkey spans - Implements obfuscation state tracking through the stats computation lifecycle
- Creates an
ObfuscatedStatSpanwrapper to provide obfuscated resources to the stats concentrator without modifying original spans - Adds HTTP header
datadog-obfuscation-version: 1when obfuscation is active - Adds schema field for agent's obfuscation version support
- Fixes a typo in documentation ("will" → "while")
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| libdd-trace-stats/src/span_concentrator/mod.rs | Exports the stat_span module publicly for use in the stats exporter |
| libdd-trace-obfuscation/src/obfuscate.rs | Adds obfuscate_resource_for_stats function with comprehensive tests for SQL, Redis, Cassandra, and Valkey obfuscation |
| libdd-data-pipeline/src/trace_exporter/stats.rs | Implements obfuscation state management and applies obfuscation to spans during stats computation |
| libdd-data-pipeline/src/trace_exporter/mod.rs | Passes obfuscation state through the stats handler call chain |
| libdd-data-pipeline/src/stats_exporter.rs | Adds obfuscation header to HTTP requests when obfuscation is active, includes test coverage |
| libdd-data-pipeline/src/agent_info/schema.rs | Adds obfuscation_version field to capture agent capabilities |
| libdd-data-pipeline/Cargo.toml | Adds dependency on libdd-trace-obfuscation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…scation feature flag
…tate between StatsExporter and stats enum
This config field is needed for CSS obfuscation. In fact this is the only needed obfuscation config field AFAIK. ### What does this PR do? - adds a "sql_obfuscation_mode" field to /info endpoint for using it when obfuscating stats in libraries. ### Motivation - finish [integrating obfuscation to the trace exporter](DataDog/libdatadog#1819) ### Describe how you validated your changes - [x] unit test - [x] system test ### Additional Notes Co-authored-by: oscar.ledauphin <oscar.ledauphin@datadoghq.com>
| client_side_stats.store(Arc::new(StatsComputationStatus::Enabled { | ||
| stats_concentrator: stats_concentrator.clone(), | ||
| cancellation_token: cancellation_token.clone(), | ||
| obfuscation_active: new_obfuscation_active, |
There was a problem hiding this comment.
If you want to avoid having to recreate the StatsExporter when obfuscation status is updated, you could use an Arc<AtomicBool> or ArcSwap<AtomicBool> (since it's not updated often). And share between the trace exporter and the stats exporter.
| { | ||
| obfuscation_active.load(Ordering::Relaxed) | ||
| } else { | ||
| unreachable!() |
There was a problem hiding this comment.
Use an error! log instead. We want to avoid panicking as much as possible.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e7b04a68d3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| impl StatsComputationObfuscationConfig { | ||
| pub fn disabled() -> SharedStatsComputationObfuscationConfig { |
There was a problem hiding this comment.
Guard obfuscation helper impl with feature flag
StatsComputationObfuscationConfig and SharedStatsComputationObfuscationConfig are declared only under #[cfg(feature = "stats-obfuscation")], but this impl block is unconditional. Building libdd-trace-stats without that feature (its default feature set) will fail with missing-type errors, so consumers that do not opt into obfuscation can no longer compile the crate.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 14 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #[derive(Clone, Hash, PartialEq, Eq)] | ||
| /// Represent a stats aggregation key borrowed from span data | ||
| pub(super) struct BorrowedAggregationKey<'a> { | ||
| resource_name: &'a str, | ||
| resource_name: String, | ||
| service_name: &'a str, |
There was a problem hiding this comment.
BorrowedAggregationKey now owns resource_name: String, forcing an allocation per span and making the “borrowed” key concept misleading. Consider using Cow<'a, str> (borrow by default, allocate only when obfuscation changes it) to keep the common case allocation-free.
| let mut resource_name = span.resource().to_owned(); | ||
|
|
||
| #[cfg(feature = "stats-obfuscation")] | ||
| if self.obfuscation_config.load().enabled { | ||
| let dbms_hint: Option<&str> = span.get_meta("db.type"); |
There was a problem hiding this comment.
This unconditionally allocates a new String for every eligible span (span.resource().to_owned()), even when stats-obfuscation is disabled and even when no obfuscation is applied. This is on the hot path for stats aggregation and can be a significant regression; consider using Cow<'a, str> for the resource name (borrow by default, allocate only when obfuscation changes it) or otherwise avoid cloning unless needed.
| let stats_exporter = StatsExporter::new( | ||
| BUCKETS_DURATION, | ||
| Arc::new(Mutex::new(get_test_concentrator())), | ||
| get_test_metadata(), | ||
| Endpoint::from_url(stats_url_from_agent_url(&server.url("/")).unwrap()), | ||
| NativeCapabilities::new_client(), | ||
| StatsComputationObfuscationConfig::disabled(), |
There was a problem hiding this comment.
This test expects the datadog-obfuscation-version header, but it builds the exporter with StatsComputationObfuscationConfig::disabled() (which sets enabled = false). As written, the header won’t be inserted and the test should fail; set enabled = true in the shared config (or add an enabled() helper) before calling send.
| let stats_exporter = StatsExporter::new( | |
| BUCKETS_DURATION, | |
| Arc::new(Mutex::new(get_test_concentrator())), | |
| get_test_metadata(), | |
| Endpoint::from_url(stats_url_from_agent_url(&server.url("/")).unwrap()), | |
| NativeCapabilities::new_client(), | |
| StatsComputationObfuscationConfig::disabled(), | |
| let mut obfuscation_config = StatsComputationObfuscationConfig::disabled(); | |
| obfuscation_config.enabled = true; | |
| let stats_exporter = StatsExporter::new( | |
| BUCKETS_DURATION, | |
| Arc::new(Mutex::new(get_test_concentrator())), | |
| get_test_metadata(), | |
| Endpoint::from_url(stats_url_from_agent_url(&server.url("/")).unwrap()), | |
| NativeCapabilities::new_client(), | |
| obfuscation_config, |
| @@ -299,6 +319,7 @@ mod tests { | |||
| get_test_metadata(), | |||
| Endpoint::from_url(stats_url_from_agent_url(&server.url("/")).unwrap()), | |||
| NativeCapabilities::new_client(), | |||
There was a problem hiding this comment.
This StatsExporter::new call passes StatsComputationObfuscationConfig::disabled() without a #[cfg(feature = "stats-obfuscation")] guard. When the feature is off, StatsExporter::new won’t take this extra argument and the test won’t compile. Gate this argument (or cfg-gate the entire test) the same way as in test_send_stats.
| NativeCapabilities::new_client(), | |
| NativeCapabilities::new_client(), | |
| #[cfg(feature = "stats-obfuscation")] |
| get_test_metadata(), | ||
| Endpoint::from_url(stats_url_from_agent_url(&server.url("/")).unwrap()), | ||
| NativeCapabilities::new_client(), | ||
| StatsComputationObfuscationConfig::disabled(), | ||
| ); |
There was a problem hiding this comment.
This test passes StatsComputationObfuscationConfig::disabled() without a #[cfg(feature = "stats-obfuscation")] guard. With the feature disabled, the call signature changes and this test won’t compile; wrap the argument in #[cfg(feature = "stats-obfuscation")] or cfg-gate the test.
| status: ArcSwap::new(stats.into()), | ||
| #[cfg(feature = "stats-obfuscation")] | ||
| obfuscation_config: Arc::new(ArcSwap::from_pointee( | ||
| StatsComputationObfuscationConfig::default(), |
There was a problem hiding this comment.
Isn't this only imported with the stats-obfuscation feature?
There was a problem hiding this comment.
Yes, StatsComputationObfuscationConfig is only needed with the stats-obfuscation feature otherwise it's not relevant. Now that I'm reading it again maybe StatsComputationObfuscationConfig is too similar to StatsComputationConfig which can make it confusing to read
What does this PR do?
Integrate the work that has been done on obfuscate_span into the trace exporter
Motivation
Let the sdks obfuscate spans before sending them to the agent when possible, reducing the amount of data transfered from the sdks to the agent.
Additional Notes
Anything else we should know when reviewing?
How to test the change?
TODO
obfuscate more than just sql and redisnot for stats