Conversation
a7773cc to
070ae12
Compare
2056c19 to
d5a3f08
Compare
006cb8e to
4142915
Compare
5a64507 to
3562c93
Compare
|
@codex review |
There was a problem hiding this comment.
Pull request overview
This PR adds trace stats computation capabilities to the Serverless Compatibility Layer. It enables the system to compute and send trace statistics when tracers haven't already computed them, enhancing stats availability across all supported serverless environments.
Changes:
- Adds a new
StatsConcentratorServicemodule that manages span concentration and stats bucket generation using thelibdd-trace-statslibrary - Adds a new
StatsGeneratormodule that extracts metadata and spans from traces and sends them to the stats concentrator - Extends
ServerlessTraceProcessorto optionally call the stats generator when the tracer hasn't already computed stats - Extends
ServerlessStatsFlusherto flush both tracer-computed and serverless-computed stats - Adds integration tests to verify stats computation and to verify stats are skipped when client has already computed them
- Adds configuration options for service, env, and version fields, and an environment variable to enable/disable stats computation
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
src/stats_concentrator_service.rs |
New module implementing stats concentration service with metadata tracking and flush logic |
src/stats_generator.rs |
New module for extracting trace information and sending to stats concentrator |
src/trace_processor.rs |
Modified to conditionally invoke stats generator based on client-computed-stats header |
src/stats_flusher.rs |
Modified to flush both channel and concentrator-based stats with proper flow control |
src/config.rs |
Added service, env, version fields and corresponding environment variable reading |
src/main.rs |
Added initialization of stats computation with environment variable toggle |
src/lib.rs |
Exported new modules |
src/mini_agent.rs |
Minor code style improvements (let-else pattern) |
Cargo.toml |
Added thiserror and libdd-trace-stats dependencies |
tests/integration_test.rs |
Added comprehensive integration tests for stats generation scenarios |
tests/common/mock_server.rs |
Updated to handle stats intake with proper 202 status code response |
tests/common/helpers.rs |
Extended send_tcp_request to support additional headers |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ac8d049051
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| self.flush_stats(config.clone(), buffer.to_vec()).await; | ||
| buffer.clear(); | ||
| // copy the batch for this flush | ||
| let channel_stats = buffer.to_vec(); |
There was a problem hiding this comment.
Nit: LLM recommends let channel_stats = std::mem::take(&mut *buffer); to replace let channel_stats = buffer.to_vec(); buffer.clear(); but that's unrelated to the actual intent of the change here, so.
There was a problem hiding this comment.
Applied this change!
| } | ||
| Ok(None) => {} | ||
| Err(e) => { | ||
| error!("Failed to flush stats concentrator: {e}"); |
There was a problem hiding this comment.
Do we want a log in a loop here or a error to be caught and have the whole process restarted/shut down? The trace flusher and stats flusher both have a a Err("<this code> task terminated unexpectedly".into());, would be consistent to have that for the concentrator as well.
There was a problem hiding this comment.
I added a task terminated unexpectedly check for the span concentrator service to be consistent with these other background tasks.
9e4544f to
c2191f8
Compare
df2f714 to
3b74133
Compare
…nt-computed-stats (#1900) # What does this PR do? Treat an empty string as a falsy value for the `Datadog-Client-Computed-Stats` header. # Motivation We want to support agent computed stats in the Serverless Compatibility Layer. Currently when the `Datadog-Client-Computed-Stats` header is sent it always disables agent computed stats, even when the value of the header is an empty string. https://datadoghq.atlassian.net/browse/SVLS-8789 # Additional Notes - [Java Tracer sends an empty string](https://github.com/DataDog/dd-trace-java/blob/6e28457d70c41bf847c4af7b3ba4e2f6c1371070/dd-trace-core/src/main/java/datadog/trace/common/writer/ddagent/DDAgentApi.java#L110-L117) for `Datadog-Client-Computed-Stats` header where other tracers omit the header altogether - Go Agent [treats an empty string as a falsy value](https://github.com/DataDog/datadog-agent/blob/76aff83162011a15e5ee50295ac835f708e8ffa9/pkg/trace/api/api.go#L1049) - See DataDog/serverless-components#51 for adding agent computed stats in the Serverless Compatibility Layer. # How to test the change? Added a debug log in a test build: Before change with `DD_TRACE_STATS_COMPUTATION_ENABLED=false` ``` DEBUG datadog_trace_agent::trace_processor: Resolved tracer header tags: TracerHeaderTags { lang: "java", lang_version: "21.0.6", lang_interpreter: "OpenJDK 64-Bit Server VM", lang_vendor: "Microsoft", tracer_version: "1.61.1~e32291a78b", container_id: "", client_computed_top_level: true, client_computed_stats: true, dropped_p0_traces: 0, dropped_p0_spans: 0 } ``` After change with `DD_TRACE_STATS_COMPUTATION_ENABLED=false` ``` DEBUG datadog_trace_agent::trace_processor: Resolved tracer header tags: TracerHeaderTags { lang: "java", lang_version: "21.0.6", lang_interpreter: "OpenJDK 64-Bit Server VM", lang_vendor: "Microsoft", tracer_version: "1.61.1~e32291a78b", container_id: "", client_computed_top_level: true, client_computed_stats: false, dropped_p0_traces: 0, dropped_p0_spans: 0 } ``` Co-authored-by: duncan.harvey <duncan.harvey@datadoghq.com>
| } | ||
| let _ = flusher_shutdown_tx.send(()); | ||
| if let Err(e) = stats_flusher_handle.await { | ||
| error!("Stats flusher task failed during shutdown: {e:?}"); |
There was a problem hiding this comment.
in this case we don't return an error?
There was a problem hiding this comment.
I updated this stats flusher shutdown to return an error so it is with the other task terminated unexpectedly errors.
| PROFILING_ENDPOINT_PATH | ||
| ], | ||
| "client_drop_p0s": true, | ||
| "client_drop_p0s": false, |
There was a problem hiding this comment.
what does this mean?
There was a problem hiding this comment.
client_drop_p0s tells the tracer whether it should drop unsampled p0 traces before sending them. In order to actually support this, the Serverless Compatibility Layer needs to be able to send sampling rates back to the tracer. This feature is not currently supported today:
Essentially client_drop_p0s: true allows for stats computation in the tracer and client_drop_p0s: false disables stats computation in the tracer, with the expectation that stats computation will be handled in the agent.
There was a problem hiding this comment.
might be worth adding a note in a comment about that.
…before asserting the results
3b74133 to
e46f752
Compare
| pub fn new(config: Arc<Config>) -> (Self, StatsConcentratorHandle) { | ||
| let (tx, rx) = mpsc::unbounded_channel(); | ||
| let handle = StatsConcentratorHandle::new(tx); | ||
| // TODO: set span_kinds_stats_computed and peer_tag_keys |
There was a problem hiding this comment.
when will we do this todo?
| // Drain client stats in buffer and stats from concentrator on interval | ||
| _ = interval.tick() => { | ||
| let client_stats = std::mem::take(&mut buffer); | ||
| if !client_stats.is_empty() || self.stats_concentrator.is_some() { |
There was a problem hiding this comment.
why do we check for the stats concentrator to be some here?
| tracer_language: tracer_payload.language_name.clone(), | ||
| tracer_version: tracer_payload.tracer_version.clone(), | ||
| hostname: String::new(), | ||
| service_name: None, |
There was a problem hiding this comment.
the payload doesn't have a service name to pull?
| } | ||
| } | ||
| } else { | ||
| error!("Unsupported tracer payload version. Failed to send trace stats."); |
There was a problem hiding this comment.
we should probably say which version we saw that we don't know how to handle. i saw a proposal for a new tracer payload version coming down the pike
What does this PR do?
Motivation
Some tracers support trace stats computation, but all don't support it, or have it disabled by default. Trace stats computation in the Serverless Compatibility Layer allows all environments supported by this package to have trace stats.
https://datadoghq.atlassian.net/browse/SVLS-8032
Additional Notes
DD_SERVERLESS_STATS_COMPUTATION_ENABLED=falseclient_drop_p0s: falsein agent/infoendpointclient_computed: falseto stats intakeDescribe how to test/QA your changes
Datadog-Client-Computed-Statsis present in the trace payload