
extend hotblocks metrics with client information #63

Open

tmcgroul wants to merge 1 commit into master from hotblocks-metrics

Conversation

@tmcgroul (Contributor)

This PR extends the existing metrics with information about the client. To use it, the client should add an x-client-id header.
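As a std-only sketch of what the PR describes (a HashMap stands in for the real header map, and the "unknown" fallback is an illustrative assumption, not taken from the diff):

```rust
use std::collections::HashMap;

/// Extract the client id from request headers, falling back to "unknown"
/// when the header is absent. A HashMap stands in for the real header map
/// (e.g. an HTTP framework's HeaderMap); the header name matches the PR.
fn client_id(headers: &HashMap<String, String>) -> String {
    headers
        .get("x-client-id")
        .cloned()
        .unwrap_or_else(|| "unknown".to_string())
}

fn main() {
    let mut headers = HashMap::new();
    headers.insert("x-client-id".to_string(), "portal-1".to_string());
    println!("{}", client_id(&headers)); // prints "portal-1"
    println!("{}", client_id(&HashMap::new())); // prints "unknown"
}
```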

tmcgroul requested a review from kalabukdima on March 30, 2026 at 18:09

let client_id = req
    .headers()
    .get("x-client-id")

Contributor Author

We already have a few headers prefixed with x-sqd-: x-sqd-finalized-head-number, x-sqd-finalized-head-hash, etc. Would it be better to change it to x-sqd-client-id, or do you insist on client-id?

     LazyLock::new(|| Family::new_with_constructor(|| Histogram::new(buckets(1., 20))));
-pub static STREAM_BYTES_PER_SECOND: LazyLock<Histogram> =
-    LazyLock::new(|| Histogram::new(exponential_buckets(100., 3.0, 20)));
+pub static STREAM_BYTES_PER_SECOND: LazyLock<Family<Labels, Histogram>> = LazyLock::new(|| {
Contributor

Why was this one not broken down by dataset?

Contributor Author

Maybe it wasn't required 🤷‍♂️
Now I'm wondering if it will break the existing chart that uses the hotblocks_stream_bytes_per_second_bucket metric. Should I revert this change?

     LazyLock::new(|| Family::new_with_constructor(|| Histogram::new(buckets(1., 20))));
-pub static STREAM_BYTES_PER_SECOND: LazyLock<Histogram> =
-    LazyLock::new(|| Histogram::new(exponential_buckets(100., 3.0, 20)));
+pub static STREAM_BYTES_PER_SECOND: LazyLock<Family<Labels, Histogram>> = LazyLock::new(|| {
Collaborator

This metric measures a meaningless value, as throughput totally depends on the query.

Apart from that, it is a memory leak in the case of unrestricted client ids.
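The unbounded-cardinality concern can be illustrated with a std-only sketch, where a HashMap of per-label bucket vectors stands in for a labeled histogram family: each distinct client id creates a new series that is never dropped, so a client sending fresh ids grows memory without bound.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a labeled histogram family: one Vec of bucket
/// counters per (dataset, client_id) label pair. Like a real metrics
/// family, a series is created on first use and never removed
/// (bucket updates are omitted; the point here is cardinality).
struct FamilyStandIn {
    series: HashMap<(String, String), Vec<u64>>,
}

impl FamilyStandIn {
    fn observe(&mut self, dataset: &str, client_id: &str) {
        self.series
            .entry((dataset.to_string(), client_id.to_string()))
            .or_insert_with(|| vec![0; 20]); // 20 buckets per series
    }
}

fn main() {
    let mut family = FamilyStandIn { series: HashMap::new() };
    // A buggy or hostile client sending a fresh x-client-id per request
    // grows the map without bound:
    for i in 0..10_000 {
        family.observe("solana", &format!("spoofed-{}", i));
    }
    println!("{} series retained", family.series.len()); // prints "10000 series retained"
}
```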

Contributor Author

The client id is supposed to be controlled by ourselves, and it represents portals and tooling around hotblocks, so the number of clients is expected to be small.

> This metric measures meaningless value as throughput totally depends on a query.

idk... it already lets us see whether there were high-throughput streams. Not sure we need it split by client id, though.

Collaborator

> client id supposed to be controlled by ourselves and it represents portals and tooling around hotblocks so the amount of clients expected to be small

That's the assumption about the usage context which all portal deployers must be aware of.

This is what "accidental complexity" is all about.

Contributor Author

Then we could pass a list of expected clients via the CLI. Is that a good solution, and should I implement it?
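The CLI-allowlist idea above could be sketched like this (std-only; the function name, the "other" fallback label, and the example ids are illustrative assumptions, not from the PR):

```rust
use std::collections::HashSet;

/// Collapse unexpected client ids into a single "other" label so metric
/// cardinality stays bounded by the allowlist size. The allowlist would
/// come from a CLI flag per the suggestion above.
fn normalize_client_id(allowed: &HashSet<String>, raw: Option<&str>) -> String {
    match raw {
        Some(id) if allowed.contains(id) => id.to_string(),
        _ => "other".to_string(),
    }
}

fn main() {
    let allowed: HashSet<String> =
        ["portal-1", "indexer"].iter().map(|s| s.to_string()).collect();
    println!("{}", normalize_client_id(&allowed, Some("portal-1"))); // prints "portal-1"
    println!("{}", normalize_client_id(&allowed, Some("spoofed"))); // prints "other"
    println!("{}", normalize_client_id(&allowed, None)); // prints "other"
}
```

This keeps the worst case at allowlist size + 1 label values, addressing the memory-leak concern while still attributing traffic from known clients.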

Collaborator

I've always treated metrics as a health check instrument and nothing more.

Metric analysis should tell whether the system is OK or how specifically the deployment needs to be changed to fix the situation.

From that perspective, none of the touched metrics makes sense or should exist in the first place.
Those metrics look a lot like descendants of the portal benchmark harness put in the wrong place,
which is why things do not align here.

So, what is the problem you are trying to solve?

Contributor Author

For context: we've concluded that launching hotblocks for every portal is heavy and "overkill", so the idea was to deploy and scale portals and hotblocks separately. Say we want to set up a dedicated portal for a specific client; then why not reuse a hotblocks instance that already supports all the required datasets and isn't overloaded?
Considering this, we want to see which client sends invalid or heavy requests, so we can act accordingly.
For example, on this chart we can see the number of requests but have no understanding of where they come from.
[image]

Collaborator

An access log will solve such problems in a comprehensive manner.
