Description
We're using Remora for exporting consumer group lag to CloudWatch metrics. Thanks for open sourcing this!
The issue
Metrics are currently exported as follows:
This limits how they can be queried (for example, in Grafana). To build a single graph that shows the lag of every partition in a consumer group, you have to add a separate query for each partition, because you can't do wildcard searches on a metric name. Grafana allows up to 5 CloudWatch searches in a single panel, so at most 5 partitions can be plotted.
It is possible to do wildcard searches on dimensions, though. That way, a single query could display the offsets of all partitions, regardless of how many there are.
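With the partition exposed as a dimension, one CloudWatch search expression can match every partition at once. The sketch below builds such an expression. It assumes the dimension layout proposed below (Topic and Partition); the helper name partition_search_expression and the namespace "Remora" are hypothetical, and inputs are not escaped, so they must not contain quote characters.

```python
def partition_search_expression(namespace: str, metric_name: str, topic: str,
                                stat: str = "Maximum", period: int = 60) -> str:
    """Build a CloudWatch SEARCH() expression matching every partition of a topic.

    Hypothetical helper: assumes metrics carry exactly the dimensions Topic and
    Partition. Inputs are not escaped, so they must not contain quote characters.
    """
    # The schema {namespace,Topic,Partition} limits results to metrics that have
    # exactly these dimension names; Partition itself is left unconstrained.
    schema = "{" + f"{namespace},Topic,Partition" + "}"
    # Exact-match terms pin down the metric name and the topic.
    terms = f'MetricName="{metric_name}" Topic="{topic}"'
    return f"SEARCH('{schema} {terms}', '{stat}', {period})"


# One expression yields one time series per partition, however many there are.
expr = partition_search_expression("Remora", "By consumer group.my-group.lag", "MyTopic")
print(expr)
```

Because the partition is not part of the search terms, adding partitions to the topic does not require touching the query.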
Proposed solution
I propose we change how metrics are exported to CloudWatch:
- Metric name: By consumer group.<Consumer group id>.<metric>, where <metric> is one of 'lag', 'logend' and 'offset'
- Metric dimensions: Topic (e.g. 'MyTopic') and Partition (e.g. '2')
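A minimal sketch of how a consumer group metric could be shaped under this proposal, written as the MetricData dictionaries that boto3's CloudWatch put_metric_data call accepts. The helper name consumer_group_datum, the 'Count' unit and the namespace in the commented-out call are assumptions; this only illustrates the data layout, not Remora's actual reporter code.

```python
from typing import Dict, List

# Values allowed for <metric> in the proposal.
CONSUMER_GROUP_METRICS = ("lag", "logend", "offset")


def consumer_group_datum(group_id: str, metric: str, topic: str,
                         partition: int, value: float) -> Dict[str, object]:
    """Build one MetricData entry for a consumer group metric (hypothetical helper)."""
    if metric not in CONSUMER_GROUP_METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    return {
        # The name only identifies the consumer group and the kind of metric...
        "MetricName": f"By consumer group.{group_id}.{metric}",
        # ...while topic and partition become dimensions (dimension values are strings).
        "Dimensions": [
            {"Name": "Topic", "Value": topic},
            {"Name": "Partition", "Value": str(partition)},
        ],
        "Value": float(value),
        "Unit": "Count",  # assumed unit
    }


# Lag for three partitions of one topic: one metric name, three dimension combinations.
lags = {0: 5, 1: 0, 2: 12}
metric_data: List[Dict[str, object]] = [
    consumer_group_datum("my-group", "lag", "MyTopic", p, lag) for p, lag in lags.items()
]
# Sending them would look roughly like this (requires boto3 and AWS credentials;
# the namespace "Remora" is hypothetical):
# boto3.client("cloudwatch").put_metric_data(Namespace="Remora", MetricData=metric_data)
```

All three entries share a single metric name, so a dimension wildcard on Partition covers them with one query.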
For internal metrics like KafkaClientActor.receiveCounter:
- Metric name: Remora internals.<metric>, where <metric> is the same as what it is now
- Metric dimensions: metricType (e.g. 'gauge' or 'counterCount')
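The same idea applied to internal metrics: the existing name is kept and the metric type moves into a dimension. Again a hypothetical sketch in boto3's MetricData shape; internal_datum is an illustrative name, not an existing function.

```python
from typing import Dict


def internal_datum(name: str, metric_type: str, value: float) -> Dict[str, object]:
    """Build one MetricData entry for a Remora internal metric (hypothetical helper)."""
    return {
        # <metric> keeps its current name, e.g. "KafkaClientActor.receiveCounter".
        "MetricName": f"Remora internals.{name}",
        # The kind of measurement (e.g. 'gauge' or 'counterCount') becomes a dimension.
        "Dimensions": [{"Name": "metricType", "Value": metric_type}],
        "Value": float(value),
    }


datum = internal_datum("KafkaClientActor.receiveCounter", "counterCount", 42)
```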
This would be a breaking change, so we'd have to change the version to 2.0.0.
More info on CloudWatch dimensions: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html
What do you think?

