Skip to content

libpcp_web, pmproxy: support for optional labels#2591

Open
lmchilton wants to merge 1 commit into
performancecopilot:mainfrom
lmchilton:pmproxy-optional-labels
Open

libpcp_web, pmproxy: support for optional labels#2591
lmchilton wants to merge 1 commit into
performancecopilot:mainfrom
lmchilton:pmproxy-optional-labels

Conversation

@lmchilton
Copy link
Copy Markdown
Contributor

@lmchilton lmchilton commented May 13, 2026

Added support for optional labels to be ingested into a key-value server by pmproxy. SHA1 hash calculations are not affected.

libpcp_web code additions include creating a new buffer to separate hash calculation from display calculation and creating a new context hash function to separate context hash w/ identifying labels and context hash w/ identifying + optional labels.

Added qa test 1744 to test optional label ingestion and display.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Enhanced label handling in web API with support for optional labels across contexts, metrics, instances, and domains. Improved identity construction for better consistency.
  • Tests

    • Added comprehensive test for pmseries label storage behavior.
    • Improved key server detection reliability in test infrastructure.

Walkthrough

Refactors label handling in libpcp_web to cache all labels while using only identifying labels for SHA1 identity hashing. Fixes key server detection in QA helpers, registers new test 1744 in the group list, and adds a comprehensive QA test script that validates pmseries label storage by configuring a temporary key server, loading an archive, and querying metrics.

Changes

Label caching and hashing refactoring

Layer / File(s) Summary
Label filter and context label caching
src/libpcp_web/src/util.c
Adds all_labels() callback to include optional labels, updates context_labels_str() to cache full labels via all_labels() while context_labels_str_identifying() continues using identifying-only labels.
Hash functions using identifying labels for SHA1
src/libpcp_web/src/util.c
pmwebapi_context_hash(), pmwebapi_metric_hash(), and pmwebapi_instance_hash() now cache full labels but compute identifying-only label JSON separately and use that for SHA1 hash inputs. pmwebapi_add_indom_labels() caches indom labels via all_labels().

QA test 1744 and infrastructure updates

Layer / File(s) Summary
Key server detection fix
qa/common.check, qa/group
_check_key_server_version() now uses _get_pids_by_name command substitution to reliably detect both valkey-server and redis-server processes. Test 1744 added to group associations.
QA test 1744 implementation
qa/1744
New test script verifies pmseries label storage by configuring a temporary key server, loading a test archive, querying nvidia.gpuactive, and validating label normalization via filter helpers and output validation.

🐰 A test for labels, so grand,
With keys that serve and filters stand,
The pmseries dances in the sand,
All metrics marked with identity's hand,
Where optional whispers help us understand! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'libpcp_web, pmproxy: support for optional labels' directly summarizes the main change: adding support for optional labels across multiple components.
Description check ✅ Passed The description explains the purpose (optional label support in pmproxy), key implementation details (separate buffers for hashing vs display, context hash function), and testing additions (QA test 1744).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
qa/openmetrics/samples/pmseries_label_test.txt (1)

1-9: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Change # Type to # TYPE per the OpenMetrics/Prometheus exposition spec.

The PMDA parser uses case-sensitive matching for the TYPE directive (line 798 of pmdaopenmetrics.python: elif lp[1] == 'TYPE':). Using # Type instead of # TYPE causes the parser to silently ignore the type metadata, leaving the metric without a declared type. This impacts proper type handling in the archive.

Suggested fix
-# Type test_metric1 gauge
+# TYPE test_metric1 gauge
@@
-# Type test_metric2 gauge
+# TYPE test_metric2 gauge
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@qa/openmetrics/samples/pmseries_label_test.txt` around lines 1 - 9, The
sample exposition uses lowercase "# Type" which the PMDA parser's case-sensitive
check (see the token handling that inspects lp[1] == 'TYPE' in
pmdaopenmetrics.python) will ignore; update the sample lines in
pmseries_label_test.txt to use uppercase "# TYPE" for each metric definition
(e.g., change "# Type test_metric1 gauge" and "# Type test_metric2 gauge" to "#
TYPE test_metric1 gauge" and "# TYPE test_metric2 gauge") so the parser
correctly recognizes and applies metric type metadata.
🧹 Nitpick comments (4)
src/libpcp_web/src/util.c (2)

460-474: 💤 Low value

idlabels is recomputed on every call, even when context->labels is cached.

Previously the pmMergeLabelSets/sds construction was guarded by context->labels == NULL. Now context_labels_str_identifying() runs unconditionally on every invocation of pmwebapi_context_hash(). This is fine functionally (the SHA1 input is identical to the prior identifying-only JSON), but it does mean repeated label-set merges if this is ever called more than once per context. If callers may invoke this on a hot path, consider caching the identifying-label digest input alongside context->labels, or hoisting the computation to be guarded by an identity-cache flag.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/libpcp_web/src/util.c` around lines 460 - 474, pmwebapi_context_hash
currently always calls context_labels_str_identifying() causing repeated merges;
change it to compute and cache the identifying-label string only when it is
first needed (same condition used for caching context->labels) by adding a
cached sds (e.g. context->idlabels or similar) or an identity-cache flag on the
context struct, populate that cache inside the block where context->labels is
set (or guarded by context->labels == NULL), and then pass the cached
identifying string and length to pmwebapi_source_hash instead of recomputing
each call; ensure the new cache is allocated with sdsnewlen and cleaned up where
context->labels is freed.

487-513: 💤 Low value

Dead len assignment on the idbuf fallback path.

idbuf is consumed via %s in sdscatfmt (null-terminated), so the len = pmsprintf(idbuf, sizeof(idbuf), "null") reassignment is never read. Either drop the assignment or rename the second variable to keep the intent obvious that len only tracks the cached metric->labels length. Also note that the same pattern is duplicated in pmwebapi_instance_hash() (lines 548-550) — extracting a small helper that returns an "identifying labels" C-string would remove the duplication across context_hash, metric_hash, and instance_hash.

♻️ Sketch of minimal cleanup
-    len = metric_labelsets(metric, idbuf, sizeof(idbuf), labels, NULL);
-	if (len <= 0)
-	    len = pmsprintf(idbuf, sizeof(idbuf), "null");
+    if (metric_labelsets(metric, idbuf, sizeof(idbuf), labels, NULL) <= 0)
+	(void)pmsprintf(idbuf, sizeof(idbuf), "null");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/libpcp_web/src/util.c` around lines 487 - 513, The second pmsprintf
assigns to the same local `len` but that value is never used (idbuf is consumed
as a C-string), so either remove the redundant `len = pmsprintf(idbuf, ...)` or
use a separate variable name to make the intent clear; better yet, extract a
small helper (e.g. get_identifying_labels or similar) that encapsulates the
logic around `metric_labelsets(...)` / fallback `pmsprintf("null")` and returns
the C-string to use in sdscatfmt, then call that helper from
context_hash/metric_hash/instance_hash (and from pmwebapi_instance_hash) to
eliminate duplication and avoid clobbering `len`.
qa/1744 (2)

27-161: 💤 Low value

Inconsistent use of $seq.full vs $seq_full.

The script mixes $here/$seq.full (lines 27-35, 82, 143, 144, 147, 150) with the $seq_full shell variable (lines 133, 161, 190-193). Most other PCP QA tests pick one form consistently. If $seq_full ends up unset (depending on which common helper sources it), the tee -a $seq_full on line 161 will tee to a file literally named "" and the cat >>$seq_full lines will silently lose output. Pick one form to keep the test's debug log path predictable.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@qa/1744` around lines 27 - 161, The test mixes two log variables ($seq.full
and $seq_full) causing silent loss of output; pick one name and replace all
occurrences for consistency (I suggest using $seq_full since other tests use it)
— update the echo/redirects that write to "$here/$seq.full" (and any uses of
$seq.full) to use "$here/$seq_full" and change the tee and any concatenations
that reference $seq_full if you instead choose $seq.full; ensure all references
in the key-server shutdown block, pmlogger start/stop messages, archive
listings, flush/verify logs, and the pmseries load tee (symbols: _cleanup,
_prepare_pmda, _pmdaopenmetrics_install, pmloggerpid, flush_result, pmseries
invocation) are updated so every write uses the same variable.

124-140: ⚡ Quick win

Race: archive load may run before pmlogger has actually flushed.

pmsleep 3 then kill -TERM then pmsleep 1 is a fragile way to ensure the archive is complete. kill -TERM is asynchronous and pmlogger needs time to flush its final volume — on slow/loaded CI, this can yield an incomplete archive and the later pmseries --load will then store fewer/no labels, causing intermittent ERROR: No series found for test_metricN failures rather than reproducible diffs. Consider $sudo $signal -s $pmloggerpid; wait $pmloggerpid (or use pmlogger -s 3 for a fixed-sample run instead of background + SIGTERM).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@qa/1744` around lines 124 - 140, The test currently starts pmlogger in
background (pmlogger, pmloggerpid) and uses pmsleep + kill -TERM + pmsleep which
can cause a race where pmlogger hasn't flushed before pmseries --load; replace
the async stop with a synchronous shutdown: after starting pmlogger (the
pmlogger/pmloggerpid code) send the termination signal and then wait for
pmlogger to exit (use wait on pmloggerpid) OR alternatively run pmlogger in
fixed-sample mode (pmlogger -s 3) instead of background + SIGTERM so the archive
is guaranteed complete before continuing; update the stop sequence that
references kill -TERM, pmsleep, and pmloggerpid accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@qa/1744`:
- Around line 22-43: The cleanup function unconditionally removes
$PCP_ETC_DIR/pcp/labels/* which can delete real system files if _save_config
never ran; modify the test so that _save_config sets a guard flag (e.g.,
labels_saved=1) after it successfully completes (reference the _save_config call
and the _restore_config function) and change _cleanup to only perform the
destructive "$sudo rm -rf $PCP_ETC_DIR/pcp/labels/*" when that guard is set (or
alternatively move the rm into the same conditional block that runs the test
body), ensuring _restore_config is only relied on when a prior backup was made.
- Around line 170-187: The script sets status=1 when a series is missing (in the
checks around series1 and series2) but later unconditionally resets status=0,
masking failures; fix by preserving failure state instead of clearing it: remove
the unconditional status=0 reset (or introduce a separate boolean like
had_failure that you set to true on any "ERROR" and only set status=0 at the
very end if had_failure is false), ensure the checks in the pmseries blocks
(series1, series2) only set status=1/had_failure and never overwrite it with
success later so the exit value reflects any missing-series errors.
- Line 17: The test currently gates execution on presence of pmseries_import
(checked via the test invoking pmseries_import and calling _notrun), but the
test never calls pmseries_import; instead it uses pmseries to load archives, so
remove the pmseries_import existence check and either drop that _notrun gate
entirely or replace it with a check for pmseries (the binary actually used by
the test). Update the check that calls _notrun to reference pmseries (or remove
it) and keep the rest of the test logic unchanged.
- Around line 102-108: The FILTER line in the test config uses the label value
instead of the label name so pmdaopenmetrics (pmdaopenmetrics.python matching
logic) will never match; update the here-doc content that adds "FILTER: OPTIONAL
LABEL some_optional_label" to use the label name "optional_label" instead (i.e.,
replace some_optional_label with optional_label in the cat <<EOF block that
writes $tmp.tmp) and verify the optional-label feature is enabled (the test
comment "*** NOTE: optional labels are not implemented yet") before merging so
the test will exercise the behavior.

---

Outside diff comments:
In `@qa/openmetrics/samples/pmseries_label_test.txt`:
- Around line 1-9: The sample exposition uses lowercase "# Type" which the PMDA
parser's case-sensitive check (see the token handling that inspects lp[1] ==
'TYPE' in pmdaopenmetrics.python) will ignore; update the sample lines in
pmseries_label_test.txt to use uppercase "# TYPE" for each metric definition
(e.g., change "# Type test_metric1 gauge" and "# Type test_metric2 gauge" to "#
TYPE test_metric1 gauge" and "# TYPE test_metric2 gauge") so the parser
correctly recognizes and applies metric type metadata.

---

Nitpick comments:
In `@qa/1744`:
- Around line 27-161: The test mixes two log variables ($seq.full and $seq_full)
causing silent loss of output; pick one name and replace all occurrences for
consistency (I suggest using $seq_full since other tests use it) — update the
echo/redirects that write to "$here/$seq.full" (and any uses of $seq.full) to
use "$here/$seq_full" and change the tee and any concatenations that reference
$seq_full if you instead choose $seq.full; ensure all references in the
key-server shutdown block, pmlogger start/stop messages, archive listings,
flush/verify logs, and the pmseries load tee (symbols: _cleanup, _prepare_pmda,
_pmdaopenmetrics_install, pmloggerpid, flush_result, pmseries invocation) are
updated so every write uses the same variable.
- Around line 124-140: The test currently starts pmlogger in background
(pmlogger, pmloggerpid) and uses pmsleep + kill -TERM + pmsleep which can cause
a race where pmlogger hasn't flushed before pmseries --load; replace the async
stop with a synchronous shutdown: after starting pmlogger (the
pmlogger/pmloggerpid code) send the termination signal and then wait for
pmlogger to exit (use wait on pmloggerpid) OR alternatively run pmlogger in
fixed-sample mode (pmlogger -s 3) instead of background + SIGTERM so the archive
is guaranteed complete before continuing; update the stop sequence that
references kill -TERM, pmsleep, and pmloggerpid accordingly.

In `@src/libpcp_web/src/util.c`:
- Around line 460-474: pmwebapi_context_hash currently always calls
context_labels_str_identifying() causing repeated merges; change it to compute
and cache the identifying-label string only when it is first needed (same
condition used for caching context->labels) by adding a cached sds (e.g.
context->idlabels or similar) or an identity-cache flag on the context struct,
populate that cache inside the block where context->labels is set (or guarded by
context->labels == NULL), and then pass the cached identifying string and length
to pmwebapi_source_hash instead of recomputing each call; ensure the new cache
is allocated with sdsnewlen and cleaned up where context->labels is freed.
- Around line 487-513: The second pmsprintf assigns to the same local `len` but
that value is never used (idbuf is consumed as a C-string), so either remove the
redundant `len = pmsprintf(idbuf, ...)` or use a separate variable name to make
the intent clear; better yet, extract a small helper (e.g.
get_identifying_labels or similar) that encapsulates the logic around
`metric_labelsets(...)` / fallback `pmsprintf("null")` and returns the C-string
to use in sdscatfmt, then call that helper from
context_hash/metric_hash/instance_hash (and from pmwebapi_instance_hash) to
eliminate duplication and avoid clobbering `len`.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ff2ebaba-8485-46fa-a894-b5aff961bcb3

📥 Commits

Reviewing files that changed from the base of the PR and between 6aa5eb1 and 10c68e2.

⛔ Files ignored due to path filters (1)
  • qa/1744.out is excluded by !**/*.out
📒 Files selected for processing (5)
  • qa/1744
  • qa/common.check
  • qa/group
  • qa/openmetrics/samples/pmseries_label_test.txt
  • src/libpcp_web/src/util.c

Comment thread qa/1744 Outdated
Comment thread qa/1744
Comment thread qa/1744 Outdated
Comment thread qa/1744 Outdated
@lmchilton lmchilton requested review from natoscott and wcohen May 14, 2026 14:52
@natoscott
Copy link
Copy Markdown
Member

@lmchilton please go through all the coderabbit reported issues and resolve/comment on each, then I'll take a look (so I don't double up on anything already reported). Thanks!

Added support for optional labels to be ingested into a key-value
server by pmproxy. SHA1 hashes calculations are not affected.

Added qa test 1744 to test optional label ingestion and display.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@lmchilton lmchilton force-pushed the pmproxy-optional-labels branch from 10c68e2 to f8ff5b1 Compare May 20, 2026 21:03
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/libpcp_web/src/util.c (1)

494-496: 💤 Low value

Consider adding pmDebugOptions diagnostic for identifying labels.

When the identifying-label filter produces different output from the full labels cache, it may be useful for debugging to log this under pmDebugOptions.series. This follows the pattern seen elsewhere in this file (e.g., lines 959-963 for instance hash).

As per coding guidelines, C files should "Add permanent diagnostics using pmDebugOptions framework."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/libpcp_web/src/util.c` around lines 494 - 496, Add a
pmDebugOptions.series diagnostic when the identifying-label filter output
differs from the full labels cache: after calling metric_labelsets(metric,
idbuf, sizeof(idbuf), labels, NULL) and when it returns <=0 causing the
pmsprintf fallback, or when it returns a value but the returned id differs from
the full-labels representation, log both representations using
pmDebugOptions.series (or pmDebug) so debuggers can see the identifying-labels
vs full-labels; locate the metric_labelsets/idbuf/pmsprintf call in util.c and
add a conditional pmDebugOptions.series call that prints the idbuf and the full
labels string (or error) to follow the existing pattern used for instance hash
logging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@qa/1744`:
- Around line 33-40: The cleanup routine omits restoring the pmcd auto-restart
policy after it was disabled by the call to _stop_auto_restart pmcd; update the
_cleanup function to call _restore_auto_restart pmcd (using the same service
name "pmcd") before exiting/returning so the service auto-restart state is
reverted, and ensure this new call is invoked in the trap path that references
_cleanup so later QA tests do not inherit the modified service state.

---

Nitpick comments:
In `@src/libpcp_web/src/util.c`:
- Around line 494-496: Add a pmDebugOptions.series diagnostic when the
identifying-label filter output differs from the full labels cache: after
calling metric_labelsets(metric, idbuf, sizeof(idbuf), labels, NULL) and when it
returns <=0 causing the pmsprintf fallback, or when it returns a value but the
returned id differs from the full-labels representation, log both
representations using pmDebugOptions.series (or pmDebug) so debuggers can see
the identifying-labels vs full-labels; locate the
metric_labelsets/idbuf/pmsprintf call in util.c and add a conditional
pmDebugOptions.series call that prints the idbuf and the full labels string (or
error) to follow the existing pattern used for instance hash logging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 630ac389-a953-4047-9069-3c02d6d24d98

📥 Commits

Reviewing files that changed from the base of the PR and between 10c68e2 and f8ff5b1.

⛔ Files ignored due to path filters (1)
  • qa/1744.out is excluded by !**/*.out
📒 Files selected for processing (4)
  • qa/1744
  • qa/common.check
  • qa/group
  • src/libpcp_web/src/util.c

Comment thread qa/1744
Comment on lines +33 to +40
_sighup_pmcd
_restore_config $PCP_SYSCONF_DIR/pmseries
$sudo rm -rf $tmp $tmp.*
}

trap "_cleanup; exit \$status" 0 1 2 3 15
_stop_auto_restart pmcd

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore pmcd auto-restart policy in cleanup.

Line 39 disables pmcd auto-restart via _stop_auto_restart pmcd, but _cleanup() never calls _restore_auto_restart pmcd. This can leak modified service state into later QA tests.

Suggested fix
 _cleanup()
 {
     cd $here
     [ -n "$pmloggerpid" ] && $sudo kill -TERM $pmloggerpid 2>/dev/null
@@
     fi
     _sighup_pmcd
+    _restore_auto_restart pmcd
     _restore_config $PCP_SYSCONF_DIR/pmseries
     $sudo rm -rf $tmp $tmp.*
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_sighup_pmcd
_restore_config $PCP_SYSCONF_DIR/pmseries
$sudo rm -rf $tmp $tmp.*
}
trap "_cleanup; exit \$status" 0 1 2 3 15
_stop_auto_restart pmcd
_sighup_pmcd
_restore_auto_restart pmcd
_restore_config $PCP_SYSCONF_DIR/pmseries
$sudo rm -rf $tmp $tmp.*
}
trap "_cleanup; exit \$status" 0 1 2 3 15
_stop_auto_restart pmcd
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@qa/1744` around lines 33 - 40, The cleanup routine omits restoring the pmcd
auto-restart policy after it was disabled by the call to _stop_auto_restart
pmcd; update the _cleanup function to call _restore_auto_restart pmcd (using the
same service name "pmcd") before exiting/returning so the service auto-restart
state is reverted, and ensure this new call is invoked in the trap path that
references _cleanup so later QA tests do not inherit the modified service state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants