[CelerData] Add slow_lock_held_time_ms and slow_lock_wait_time_ms metrics #3021

Open
jaogoy wants to merge 3 commits into DataDog:master from jaogoy:feat.celerdata-slow-lock-metrics

jaogoy commented May 29, 2026

What does this PR do?

The CelerData/StarRocks FE introduced two summary metrics for slow-lock observability in StarRocks/starrocks#66027:

  • starrocks_fe_slow_lock_held_time_ms — how long locks were held when slow locks were detected (max across owners).
  • starrocks_fe_slow_lock_wait_time_ms — how long waiters waited before the lock was acquired.

Both are emitted as Prometheus summaries (quantiles 0.75/0.95/0.98/0.99/0.999 plus _sum / _count), so this PR maps them with the same three-line pattern already used by other histogram metrics in this integration, and adds the corresponding metadata.csv entries.
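For reviewers unfamiliar with the pattern, here is a minimal sketch of what a three-line mapping for a summary could look like. It assumes the three lines are the base series (quantile samples), `_sum`, and `_count`. The helper `three_line_mapping`, the `METRIC_MAP` dict, and the Datadog-side names on the right are hypothetical and not copied from the integration's source.

```python
# Hypothetical sketch: names on the right-hand side are illustrative only,
# not the integration's actual Datadog metric names.
SUMMARY_METRICS = ("slow_lock_held_time_ms", "slow_lock_wait_time_ms")


def three_line_mapping(raw_prefix, short_name):
    """Return three raw-name -> mapped-name entries for one Prometheus summary."""
    raw = f"{raw_prefix}_{short_name}"
    return {
        raw: short_name,                         # quantile samples
        f"{raw}_sum": f"{short_name}.sum",       # running total of observations
        f"{raw}_count": f"{short_name}.count",   # number of observations
    }


# Build the combined map for both slow-lock summaries.
METRIC_MAP = {}
for name in SUMMARY_METRICS:
    METRIC_MAP.update(three_line_mapping("starrocks_fe", name))
```

Each summary contributes three entries, so the two new metrics add six entries in total under this assumption.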

Motivation

Lock behavior needs to be monitored when the cluster has many loads and queries pending.

Review checklist

  • PR has a meaningful title or PR has the no-changelog label attached
  • Feature or bugfix has tests
  • Git history is clean
  • If PR impacts documentation, docs team has been notified or an issue has been opened on the documentation repo
  • If this PR includes a log pipeline, please add a description describing the remappers and processors.

Additional Notes

Anything else we should know when reviewing?

jaogoy added 2 commits May 28, 2026 18:58
…rics

StarRocks FE introduced two summary metrics for slow-lock observability
in StarRocks/starrocks#66027 (back-ported to 4.0/3.5):

- starrocks_fe_slow_lock_held_time_ms — how long locks were held when slow
  locks were detected (max across owners).
- starrocks_fe_slow_lock_wait_time_ms — how long waiters waited before the
  lock was acquired.

Both are emitted as Prometheus summary (quantiles 0.75/0.95/0.98/0.99/0.999
plus _sum / _count), so map them with the same three-line pattern already
used by other histogram metrics in this integration. Add corresponding
metadata.csv entries.

Bump version to 1.3.0 and update CHANGELOG + README.

Signed-off-by: Planck Li <jaogoy@gmail.com>
@jaogoy jaogoy requested review from a team as code owners May 29, 2026 02:18
@jaogoy jaogoy requested review from bgoldberg122 and removed request for a team May 29, 2026 02:18

datadog-prod-us1-4 Bot commented May 29, 2026

Pipelines

⚠️ Warnings

🚦 3 Pipeline jobs failed

PR | test / test-minimum-base-package (linux, ubuntu-22.04, celerdata, celerdata (py3.13), py3.13) / minimum-base-package-celerdata (py3.13)-py3.13

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. Failed to build `ddtrace==2.10.6`: ModuleNotFoundError: No module named 'pkg_resources' during installation process.

Validate repository | run / Validate

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. File `conf.yaml.example` is not in sync, run "ddev validate config celerdata -s".

PR | test / check

Commit SHA: 4b456f1

…rics

StarRocks FE introduced two summary metrics for slow-lock observability
in StarRocks/starrocks#66027 (back-ported to 4.0/3.5):

- starrocks_fe_slow_lock_held_time_ms — how long locks were held when slow
  locks were detected (max across owners).
- starrocks_fe_slow_lock_wait_time_ms — how long waiters waited before the
  lock was acquired.

Both are emitted as Prometheus summary (quantiles 0.75/0.95/0.98/0.99/0.999
plus _sum / _count), so map them with the same three-line pattern already
used by other histogram metrics in this integration. Add corresponding
metadata.csv entries.

Bump version to 1.3.0 and update CHANGELOG + README.

Also regenerate config_models/{defaults,instance}.py via `ddev validate
models celerdata -s` (ddev 16.1.1) — the generated files had drifted from
their spec.yaml source since the previous sync in 1.2.0, which CI flags
as out-of-sync. These changes are auto-generated and not behavioral.

Signed-off-by: Planck Li <jaogoy@gmail.com>