Skip to content

feat: add trace stats computation#51

Open
duncanpharvey wants to merge 13 commits intomainfrom
duncan-harvey/trace-stats
Open

feat: add trace stats computation#51
duncanpharvey wants to merge 13 commits intomainfrom
duncan-harvey/trace-stats

Conversation

@duncanpharvey
Copy link
Copy Markdown
Collaborator

@duncanpharvey duncanpharvey commented Dec 3, 2025

What does this PR do?

  • Add trace stats computation in the Serverless Compatibility Layer.
  • Disables tracer computed stats

Motivation

Some tracers support trace stats computation, but all don't support it, or have it disabled by default. Trace stats computation in the Serverless Compatibility Layer allows all environments supported by this package to have trace stats.

https://datadoghq.atlassian.net/browse/SVLS-8032

Additional Notes

Describe how to test/QA your changes

@duncanpharvey duncanpharvey force-pushed the duncan-harvey/trace-stats branch 2 times, most recently from a7773cc to 070ae12 Compare December 11, 2025 14:51
@duncanpharvey duncanpharvey force-pushed the duncan-harvey/trace-stats branch from 2056c19 to d5a3f08 Compare December 17, 2025 22:16
@duncanpharvey duncanpharvey force-pushed the duncan-harvey/trace-stats branch 2 times, most recently from 006cb8e to 4142915 Compare March 27, 2026 14:16
@duncanpharvey duncanpharvey force-pushed the duncan-harvey/trace-stats branch from 5a64507 to 3562c93 Compare March 31, 2026 17:50
@duncanpharvey duncanpharvey marked this pull request as ready for review April 1, 2026 20:48
@duncanpharvey duncanpharvey requested a review from a team as a code owner April 1, 2026 20:48
Copilot AI review requested due to automatic review settings April 1, 2026 20:48
@duncanpharvey duncanpharvey requested a review from a team as a code owner April 1, 2026 20:48
@duncanpharvey duncanpharvey requested review from Lewis-E and litianningdatadog and removed request for a team April 1, 2026 20:48
@duncanpharvey
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds trace stats computation capabilities to the Serverless Compatibility Layer. It enables the system to compute and send trace statistics when tracers haven't already computed them, enhancing stats availability across all supported serverless environments.

Changes:

  • Adds a new StatsConcentratorService module that manages span concentration and stats bucket generation using the libdd-trace-stats library
  • Adds a new StatsGenerator module that extracts metadata and spans from traces and sends them to the stats concentrator
  • Extends ServerlessTraceProcessor to optionally call the stats generator when the tracer hasn't already computed stats
  • Extends ServerlessStatsFlusher to flush both tracer-computed and serverless-computed stats
  • Adds integration tests to verify stats computation and to verify stats are skipped when client has already computed them
  • Adds configuration options for service, env, and version fields, and an environment variable to enable/disable stats computation

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/stats_concentrator_service.rs New module implementing stats concentration service with metadata tracking and flush logic
src/stats_generator.rs New module for extracting trace information and sending to stats concentrator
src/trace_processor.rs Modified to conditionally invoke stats generator based on client-computed-stats header
src/stats_flusher.rs Modified to flush both channel and concentrator-based stats with proper flow control
src/config.rs Added service, env, version fields and corresponding environment variable reading
src/main.rs Added initialization of stats computation with environment variable toggle
src/lib.rs Exported new modules
src/mini_agent.rs Minor code style improvements (let-else pattern)
Cargo.toml Added thiserror and libdd-trace-stats dependencies
tests/integration_test.rs Added comprehensive integration tests for stats generation scenarios
tests/common/mock_server.rs Updated to handle stats intake with proper 202 status code response
tests/common/helpers.rs Extended send_tcp_request to support additional headers

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ac8d049051

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/datadog-trace-agent/src/stats_concentrator_service.rs
Comment thread crates/datadog-trace-agent/src/stats_concentrator_service.rs Outdated
Comment thread crates/datadog-trace-agent/src/stats_concentrator_service.rs Outdated
Comment thread crates/datadog-trace-agent/src/stats_concentrator_service.rs Outdated
Comment thread crates/datadog-trace-agent/src/stats_concentrator_service.rs
Comment thread crates/datadog-trace-agent/src/stats_concentrator_service.rs Outdated
Comment thread crates/datadog-serverless-compat/src/main.rs Outdated
self.flush_stats(config.clone(), buffer.to_vec()).await;
buffer.clear();
// copy the batch for this flush
let channel_stats = buffer.to_vec();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: LLM recommends let channel_stats = std::mem::take(&mut *buffer); to replace let channel_stats = buffer.to_vec(); buffer.clear(); but that's unrelated to the actual intent of the change here, so.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied this change!

}
Ok(None) => {}
Err(e) => {
error!("Failed to flush stats concentrator: {e}");
Copy link
Copy Markdown
Contributor

@Lewis-E Lewis-E Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want a log in a loop here or a error to be caught and have the whole process restarted/shut down? The trace flusher and stats flusher both have a a Err("<this code> task terminated unexpectedly".into());, would be consistent to have that for the concentrator as well.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a task terminated unexpectedly check for the span concentrator service to be consistent with these other background tasks.

@duncanpharvey duncanpharvey force-pushed the duncan-harvey/trace-stats branch from 9e4544f to c2191f8 Compare April 7, 2026 15:50
@duncanpharvey duncanpharvey marked this pull request as draft April 9, 2026 20:51
@duncanpharvey duncanpharvey force-pushed the duncan-harvey/trace-stats branch 3 times, most recently from df2f714 to 3b74133 Compare April 20, 2026 17:14
@duncanpharvey duncanpharvey marked this pull request as ready for review April 20, 2026 17:21
gh-worker-dd-mergequeue-cf854d Bot pushed a commit to DataDog/libdatadog that referenced this pull request Apr 22, 2026
…nt-computed-stats (#1900)

# What does this PR do?

Treat an empty string as a falsy value for the `Datadog-Client-Computed-Stats` header.

# Motivation

We want to support agent computed stats in the Serverless Compatibility Layer. Currently when the `Datadog-Client-Computed-Stats` header is sent it always disables agent computed stats, even when the value of the header is an empty string.

https://datadoghq.atlassian.net/browse/SVLS-8789

# Additional Notes

- [Java Tracer sends an empty string](https://github.com/DataDog/dd-trace-java/blob/6e28457d70c41bf847c4af7b3ba4e2f6c1371070/dd-trace-core/src/main/java/datadog/trace/common/writer/ddagent/DDAgentApi.java#L110-L117) for `Datadog-Client-Computed-Stats` header where other tracers omit the header altogether
- Go Agent [treats an empty string as a falsy value](https://github.com/DataDog/datadog-agent/blob/76aff83162011a15e5ee50295ac835f708e8ffa9/pkg/trace/api/api.go#L1049)
- See DataDog/serverless-components#51 for adding agent computed stats in the Serverless Compatibility Layer.

# How to test the change?

Added a debug log in a test build:

Before change with `DD_TRACE_STATS_COMPUTATION_ENABLED=false`
```
DEBUG datadog_trace_agent::trace_processor: Resolved tracer header tags: TracerHeaderTags { lang: "java", lang_version: "21.0.6", lang_interpreter: "OpenJDK 64-Bit Server VM", lang_vendor: "Microsoft", tracer_version: "1.61.1~e32291a78b", container_id: "", client_computed_top_level: true, client_computed_stats: true, dropped_p0_traces: 0, dropped_p0_spans: 0 }
```

After change with `DD_TRACE_STATS_COMPUTATION_ENABLED=false`
```
DEBUG datadog_trace_agent::trace_processor: Resolved tracer header tags: TracerHeaderTags { lang: "java", lang_version: "21.0.6", lang_interpreter: "OpenJDK 64-Bit Server VM", lang_vendor: "Microsoft", tracer_version: "1.61.1~e32291a78b", container_id: "", client_computed_top_level: true, client_computed_stats: false, dropped_p0_traces: 0, dropped_p0_spans: 0 }
```


Co-authored-by: duncan.harvey <duncan.harvey@datadoghq.com>
Comment thread crates/datadog-serverless-compat/src/main.rs Outdated
Comment thread crates/datadog-serverless-compat/src/main.rs Outdated
Comment thread crates/datadog-trace-agent/src/mini_agent.rs Outdated
Comment thread crates/datadog-trace-agent/src/mini_agent.rs
}
let _ = flusher_shutdown_tx.send(());
if let Err(e) = stats_flusher_handle.await {
error!("Stats flusher task failed during shutdown: {e:?}");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case we don't return an error?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this stats flusher shutdown to return an error so it is with the other task terminated unexpectedly errors.

PROFILING_ENDPOINT_PATH
],
"client_drop_p0s": true,
"client_drop_p0s": false,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this mean?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

client_drop_p0s tells the tracer whether it should drop unsampled p0 traces before sending them. In order to actually support this, the Serverless Compatibility Layer needs to be able to send sampling rates back to the tracer. This feature is not currently supported today:

let body = json!({"rate_by_service":{"service:,env:":1}}).to_string();

Essentially client_drop_p0s: true allows for stats computation in the tracer and client_drop_p0s: false disables stats computation in the tracer, with the expectation that stats computation will be handled in the agent.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth adding a note in a comment about that.

Comment thread crates/datadog-trace-agent/src/stats_concentrator_service.rs
Comment thread crates/datadog-trace-agent/src/stats_flusher.rs Outdated
Comment thread crates/datadog-trace-agent/src/stats_generator.rs Outdated
@duncanpharvey duncanpharvey force-pushed the duncan-harvey/trace-stats branch from 3b74133 to e46f752 Compare April 24, 2026 18:23
pub fn new(config: Arc<Config>) -> (Self, StatsConcentratorHandle) {
let (tx, rx) = mpsc::unbounded_channel();
let handle = StatsConcentratorHandle::new(tx);
// TODO: set span_kinds_stats_computed and peer_tag_keys
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when will we do this todo?

// Drain client stats in buffer and stats from concentrator on interval
_ = interval.tick() => {
let client_stats = std::mem::take(&mut buffer);
if !client_stats.is_empty() || self.stats_concentrator.is_some() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we check for the stats concentrator to be some here?

tracer_language: tracer_payload.language_name.clone(),
tracer_version: tracer_payload.tracer_version.clone(),
hostname: String::new(),
service_name: None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the payload doesn't have a service name to pull?

}
}
} else {
error!("Unsupported tracer payload version. Failed to send trace stats.");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably say which version we saw that we don't know how to handle. i saw a proposal for a new tracer payload version coming down the pike

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants