-
Notifications
You must be signed in to change notification settings - Fork 948
feat: adding telemetry-trace-sample-rate cli arg and set default to 1%
#8561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: unstable
Are you sure you want to change the base?
Conversation
Adding a `new telemetry-trace-sample-rate` cli arg for both VC and BN to adjust sampling rate for OpenTelemetry tracing spans sigp#8554
eserilev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few minor things and it should be ready for another round of review. Thanks!
lighthouse/src/main.rs
Outdated
| let sample_rate = matches | ||
| .get_one::<f64>("telemetry-trace-sample-rate") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should keep this a usize and allow values between 0 - 100. we try to avoid using f64 in our codebase in general and i dont think we need more granularity here than full percentage points
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm ok i'm down to convert it to u8 but this needs to be convert to ratio in f64 when we set the variant.
see below
TraceIdRatioBased([f64](https://doc.rust-lang.org/nightly/std/primitive.f64.html))
Sample a given fraction of traces. Fractions >= 1 will always sample. If the parent span is sampled, then it’s child spans will automatically be sampled. Fractions < 0 are treated as zero, but spans may still be sampled if their parent is. Note: If this is used then all Spans in a trace will become sampled assuming that the first span is sampled as it is based on the trace_id not the span_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gonna use u8 instead of usize given ValueParserFactory does not support usize for range, if its cool hehe
| .display_order(0) | ||
| ) | ||
| .arg( | ||
| Arg::new("telemetry-trace-sample-rate") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please write a test case for this flag. we have some example flag specific tests you can use as an example
| ) | ||
| .requires("telemetry-collector-url") | ||
| .value_parser(clap::value_parser!(f64)) | ||
| .action(ArgAction::Set) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets set the default here explicitly via .default_value("1")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can also set a min and max value explicitly at the flag level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure and we can use the range of value parser for min and max
book/src/help_general.md
Outdated
| Override the OpenTelemetry service name. Defaults to 'lighthouse-bn' | ||
| for beacon node, 'lighthouse-vc' for validator client, or 'lighthouse' | ||
| for other subcommands. | ||
| --telemetry-trace-sample-rate <RATE> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in order to update this file you should actually just run make cli-local.
|
tagging as backwards incompatible because the default trace span sampling is reduced from 100% to 1% |
Fix an issue where a kurtosis testnet script was failing because no supernodes were provided ``` There was an error interpreting Starlark code Evaluation error: fail: Fulu fork is enabled (epoch: 0) but no supernodes are configured, no nodes have 128 or more validators, and perfect_peerdas_enabled is not enabled. Either configure a supernode, ensure at least one node has 128+ validators, or enable perfect_peerdas_enabled in network_params with 16 participants. at [github.com/ethpandaops/ethereum-package/main.star:83:57]: run at [github.com/ethpandaops/ethereum-package/src/package_io/input_parser.star:377:17]: input_parser at [0:0]: fail ``` Co-Authored-By: Eitan Seri-Levi <[email protected]> Co-Authored-By: Pawan Dhananjay <[email protected]>
…8559) Co-Authored-By: Tan Chee Keong <[email protected]>
|
@eserilev should be good to go for another review. Just used the config in the BN to keep the tests simple and following similar pattern as the other cli flag tests. |
Adding a new
telemetry-trace-sample-ratecli arg for both VC and BN to adjust sampling rate for OpenTelemetry tracing spans. Default will now be 1%.#8554
Proposed Changes
Simply add the cli arg then set the sampler arg in the open telemetry object constructor via below variant
https://docs.rs/opentelemetry_sdk/0.30.0/opentelemetry_sdk/trace/enum.Sampler.html#variant.TraceIdRatioBased
Additional Info
Didn't add unit tests for this given low risk and low ROI to added meaningfully test in this case
Testing
ran cargo nextest run -p lighthouse --release and all tests passed, see below