extend hotblocks metrics with client information #63
Conversation
part of NET-68
```rust
let client_id = req
    .headers()
    .get("x-client-id")
```
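For illustration, here is a minimal std-only sketch of the header lookup with a fallback, using a plain string map in place of the actual HTTP header type (the function name and the `"unknown"` default are assumptions, not the project's code):

```rust
use std::collections::HashMap;

// Minimal sketch: look up the client id header, defaulting to "unknown"
// when it is absent. A plain HashMap stands in for the real header map.
fn client_id(headers: &HashMap<String, String>) -> String {
    headers
        .get("x-client-id")
        .cloned()
        .unwrap_or_else(|| "unknown".to_string())
}

fn main() {
    let mut headers = HashMap::new();
    headers.insert("x-client-id".to_string(), "portal-1".to_string());
    assert_eq!(client_id(&headers), "portal-1");
    assert_eq!(client_id(&HashMap::new()), "unknown");
}
```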
I think "x-" prefixes are deprecated now:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers
https://www.keycdn.com/support/custom-http-headers#naming-conventions
I suggest using just "client-id"
we already have a few headers that are prefixed with x-sqd-: x-sqd-finalized-head-number, x-sqd-finalized-head-hash, etc. would it be better to change it to x-sqd-client-id, or do you insist on client-id?
```diff
     LazyLock::new(|| Family::new_with_constructor(|| Histogram::new(buckets(1., 20))));
-pub static STREAM_BYTES_PER_SECOND: LazyLock<Histogram> =
-    LazyLock::new(|| Histogram::new(exponential_buckets(100., 3.0, 20)));
+pub static STREAM_BYTES_PER_SECOND: LazyLock<Family<Labels, Histogram>> = LazyLock::new(|| {
```
Why was this one not broken down by dataset?
maybe it wasn't required 🤷‍♂️
now i am wondering whether it will break the existing chart that uses the hotblocks_stream_bytes_per_second_bucket metric. should i revert this change?
```diff
     LazyLock::new(|| Family::new_with_constructor(|| Histogram::new(buckets(1., 20))));
-pub static STREAM_BYTES_PER_SECOND: LazyLock<Histogram> =
-    LazyLock::new(|| Histogram::new(exponential_buckets(100., 3.0, 20)));
+pub static STREAM_BYTES_PER_SECOND: LazyLock<Family<Labels, Histogram>> = LazyLock::new(|| {
```
This metric measures a meaningless value, as throughput depends entirely on the query.
Apart from that, it is a memory leak in the case of unrestricted client ids.
the client id is supposed to be controlled by us; it represents portals and tooling around hotblocks, so the number of clients is expected to be small

> This metric measures meaningless value as throughput totally depends on a query.

idk... it already allows us to see whether there were high-throughput streams or not. not sure if we need it split by client id though
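The cardinality concern raised above can be mitigated without trusting the header at all. A std-only sketch, assuming a hypothetical allowlist (the set contents and function name are illustrative): any client id outside the allowlist collapses into a single "other" label, so arbitrary ids cannot create unbounded metric series.

```rust
use std::collections::HashSet;

// Bound label cardinality: ids outside the fixed allowlist all map to
// one "other" label, keeping the metric family's size bounded.
fn label_for<'a>(allowed: &HashSet<&str>, client_id: &'a str) -> &'a str {
    if allowed.contains(client_id) {
        client_id
    } else {
        "other"
    }
}

fn main() {
    let allowed: HashSet<&str> = ["portal", "tooling"].into_iter().collect();
    assert_eq!(label_for(&allowed, "portal"), "portal");
    assert_eq!(label_for(&allowed, "attacker-123"), "other");
}
```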
> client id supposed to be controlled by ourselves and it represents portals and tooling around hotblocks so the amount of clients expected to be small

That's an assumption about the usage context which all portal deployers must be aware of.
This is what "accidental complexity" is all about.
then we could pass a list of expected clients via cli. is that a good solution, and should i implement it?
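A std-only sketch of what such a CLI option could look like, assuming a comma-separated list; the option format and function name are hypothetical, not the project's actual interface:

```rust
use std::collections::HashSet;

// Parse a comma-separated "--expected-clients" style value into a set,
// trimming whitespace and dropping empty entries.
fn parse_expected_clients(arg: &str) -> HashSet<String> {
    arg.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| s.to_string())
        .collect()
}

fn main() {
    let clients = parse_expected_clients("portal-1, portal-2,tooling");
    assert_eq!(clients.len(), 3);
    assert!(clients.contains("portal-2"));
}
```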
I've always treated metrics as a health check instrument and nothing more.
Metric analysis should tell whether the system is OK or how specifically the deployment needs to be changed to fix the situation.
From that perspective, none of the touched metrics makes sense or should exist in the first place.
Those metrics look a lot like descendants of the portal benchmark harness put in the wrong place,
and that's why things do not align here.
So, what is the problem you are trying to solve?
for the context: we've concluded that launching hotblocks for every portal is heavy and "overkill", so the idea was to deploy and scale both portals and hotblocks separately. let's say we want to set up a dedicated portal for a specific client; then why not reuse a hotblocks instance that already supports all the required datasets and isn't overloaded.
considering this, we want to see which clients send invalid or heavy requests so we can act accordingly.
for example, on this chart we can see the number of requests but have no understanding of where they come from

Access log will solve such problems in a comprehensive manner.
this pr extends the existing metrics with information about the client. in order to use it, the client should add the
x-client-id header