Skip to content

Conversation

@0xmrree
Copy link

@0xmrree 0xmrree commented Dec 10, 2025

Adding a new telemetry-trace-sample-rate cli arg for both VC and BN to adjust sampling rate for OpenTelemetry tracing spans. Default will now be 1%.

#8554

Proposed Changes

Simply add the cli arg then set the sampler arg in the open telemetry object constructor via below variant
https://docs.rs/opentelemetry_sdk/0.30.0/opentelemetry_sdk/trace/enum.Sampler.html#variant.TraceIdRatioBased

Additional Info

Didn't add unit tests for this given low risk and low ROI to added meaningfully test in this case

Testing

ran cargo nextest run -p lighthouse --release and all tests passed, see below

...
        PASS [   0.021s] lighthouse::lighthouse_tests validator_manager::validator_import_defaults
        PASS [   0.022s] lighthouse::lighthouse_tests validator_manager::validator_import_misc_flags
        PASS [   0.013s] lighthouse::lighthouse_tests validator_manager::validator_import_missing_both_file_flags
        PASS [   0.017s] lighthouse::lighthouse_tests validator_manager::validator_import_missing_token
        PASS [   0.014s] lighthouse::lighthouse_tests validator_manager::validator_import_using_both_file_flags
        PASS [   0.019s] lighthouse::lighthouse_tests validator_manager::validator_list_defaults
        PASS [   0.019s] lighthouse::lighthouse_tests validator_manager::validator_move_count
        PASS [   0.019s] lighthouse::lighthouse_tests validator_manager::validator_move_defaults
        PASS [   0.021s] lighthouse::lighthouse_tests validator_manager::validator_move_misc_flags_0
        PASS [   0.021s] lighthouse::lighthouse_tests validator_manager::validator_move_misc_flags_1
        PASS [   0.023s] lighthouse::lighthouse_tests validator_manager::validator_move_misc_flags_2
        PASS [  25.886s] lighthouse::lighthouse_tests beacon_node::validator_monitor_file_flag
        PASS [  26.837s] lighthouse::lighthouse_tests beacon_node::validator_monitor_metrics_threshold_custom
        PASS [  26.960s] lighthouse::lighthouse_tests beacon_node::validator_monitor_metrics_threshold_default
        PASS [  23.081s] lighthouse::lighthouse_tests beacon_node::validator_monitor_pubkeys_flag
        PASS [  47.206s] lighthouse::lighthouse_tests beacon_node::test_builder_disable_ssz_flag
        PASS [  20.507s] lighthouse::lighthouse_tests beacon_node::wss_checkpoint_flag
        PASS [  15.336s] lighthouse::lighthouse_tests beacon_node::zero_ports_flag
────────────
     Summary [ 696.662s] 310 tests run: 310 passed, 0 skipped

Adding a `new telemetry-trace-sample-rate` cli arg for both VC and BN to adjust sampling rate for OpenTelemetry tracing spans

sigp#8554
Copy link
Member

@eserilev eserilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few minor things and it should be ready for another round of review. Thanks!

Comment on lines 695 to 696
let sample_rate = matches
.get_one::<f64>("telemetry-trace-sample-rate")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should keep this a usize and allow values between 0 - 100. we try to avoid using f64 in our codebase in general and i dont think we need more granularity here than full percentage points

Copy link
Author

@0xmrree 0xmrree Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm ok i'm down to convert it to u8 but this needs to be convert to ratio in f64 when we set the variant.

see below

TraceIdRatioBased([f64](https://doc.rust-lang.org/nightly/std/primitive.f64.html))

Sample a given fraction of traces. Fractions >= 1 will always sample. If the parent span is sampled, then it’s child spans will automatically be sampled. Fractions < 0 are treated as zero, but spans may still be sampled if their parent is. Note: If this is used then all Spans in a trace will become sampled assuming that the first span is sampled as it is based on the trace_id not the span_id

Copy link
Author

@0xmrree 0xmrree Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gonna use u8 instead of usize given ValueParserFactory does not support usize for range, if its cool hehe

.display_order(0)
)
.arg(
Arg::new("telemetry-trace-sample-rate")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please write a test case for this flag. we have some example flag specific tests you can use as an example

)
.requires("telemetry-collector-url")
.value_parser(clap::value_parser!(f64))
.action(ArgAction::Set)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets set the default here explicitly via .default_value("1")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can also set a min and max value explicitly at the flag level

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure and we can use the range of value parser for min and max

Override the OpenTelemetry service name. Defaults to 'lighthouse-bn'
for beacon node, 'lighthouse-vc' for validator client, or 'lighthouse'
for other subcommands.
--telemetry-trace-sample-rate <RATE>
Copy link
Member

@eserilev eserilev Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in order to update this file you should actually just run make cli-local.

@eserilev eserilev added UX-and-logs waiting-on-author The reviewer has suggested changes and awaits thier implementation. tracing backwards-incompat Backwards-incompatible API change labels Dec 11, 2025
@eserilev
Copy link
Member

tagging as backwards incompatible because the default trace span sampling is reduced from 100% to 1%

eserilev and others added 5 commits December 11, 2025 23:58
Fix an issue where a kurtosis testnet script was failing because no supernodes were provided


```
There was an error interpreting Starlark code
Evaluation error: fail: Fulu fork is enabled (epoch: 0) but no supernodes are configured, no nodes have 128 or more validators, and perfect_peerdas_enabled is not enabled. Either configure a supernode, ensure at least one node has 128+ validators, or enable perfect_peerdas_enabled in network_params with 16 participants.
at [github.com/ethpandaops/ethereum-package/main.star:83:57]: run
at [github.com/ethpandaops/ethereum-package/src/package_io/input_parser.star:377:17]: input_parser
at [0:0]: fail
```


  


Co-Authored-By: Eitan Seri-Levi <[email protected]>

Co-Authored-By: Pawan Dhananjay <[email protected]>
@0xmrree
Copy link
Author

0xmrree commented Dec 12, 2025

@eserilev should be good to go for another review. Just used the config in the BN to keep the tests simple and following similar pattern as the other cli flag tests.

@0xmrree 0xmrree requested a review from eserilev December 12, 2025 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backwards-incompat Backwards-incompatible API change tracing UX-and-logs waiting-on-author The reviewer has suggested changes and awaits thier implementation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants