libpcp_web, pmproxy: support for optional labels#2591
Conversation
📝 WalkthroughSummary by CodeRabbit
WalkthroughRefactors label handling in libpcp_web to cache all labels while using only identifying labels for SHA1 identity hashing. Fixes key server detection in QA helpers, registers new test 1744 in the group list, and adds a comprehensive QA test script that validates pmseries label storage by configuring a temporary key server, loading an archive, and querying metrics. ChangesLabel caching and hashing refactoring
QA test 1744 and infrastructure updates
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
qa/openmetrics/samples/pmseries_label_test.txt (1)
1-9:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winChange
# Typeto# TYPEper the OpenMetrics/Prometheus exposition spec.The PMDA parser uses case-sensitive matching for the TYPE directive (line 798 of pmdaopenmetrics.python:
elif lp[1] == 'TYPE':). Using# Typeinstead of# TYPEcauses the parser to silently ignore the type metadata, leaving the metric without a declared type. This impacts proper type handling in the archive.Suggested fix
-# Type test_metric1 gauge +# TYPE test_metric1 gauge @@ -# Type test_metric2 gauge +# TYPE test_metric2 gauge🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@qa/openmetrics/samples/pmseries_label_test.txt` around lines 1 - 9, The sample exposition uses lowercase "# Type" which the PMDA parser's case-sensitive check (see the token handling that inspects lp[1] == 'TYPE' in pmdaopenmetrics.python) will ignore; update the sample lines in pmseries_label_test.txt to use uppercase "# TYPE" for each metric definition (e.g., change "# Type test_metric1 gauge" and "# Type test_metric2 gauge" to "# TYPE test_metric1 gauge" and "# TYPE test_metric2 gauge") so the parser correctly recognizes and applies metric type metadata.
🧹 Nitpick comments (4)
src/libpcp_web/src/util.c (2)
460-474: 💤 Low value
idlabelsis recomputed on every call, even whencontext->labelsis cached.Previously the
pmMergeLabelSets/sds construction was guarded bycontext->labels == NULL. Nowcontext_labels_str_identifying()runs unconditionally on every invocation ofpmwebapi_context_hash(). This is fine functionally (the SHA1 input is identical to the prior identifying-only JSON), but it does mean repeated label-set merges if this is ever called more than once per context. If callers may invoke this on a hot path, consider caching the identifying-label digest input alongsidecontext->labels, or hoisting the computation to be guarded by an identity-cache flag.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/libpcp_web/src/util.c` around lines 460 - 474, pmwebapi_context_hash currently always calls context_labels_str_identifying() causing repeated merges; change it to compute and cache the identifying-label string only when it is first needed (same condition used for caching context->labels) by adding a cached sds (e.g. context->idlabels or similar) or an identity-cache flag on the context struct, populate that cache inside the block where context->labels is set (or guarded by context->labels == NULL), and then pass the cached identifying string and length to pmwebapi_source_hash instead of recomputing each call; ensure the new cache is allocated with sdsnewlen and cleaned up where context->labels is freed.
487-513: 💤 Low valueDead
lenassignment on theidbuffallback path.
idbufis consumed via%sinsdscatfmt(null-terminated), so thelen = pmsprintf(idbuf, sizeof(idbuf), "null")reassignment is never read. Either drop the assignment or rename the second variable to keep the intent obvious thatlenonly tracks the cachedmetric->labelslength. Also note that the same pattern is duplicated inpmwebapi_instance_hash()(lines 548-550) — extracting a small helper that returns an "identifying labels" C-string would remove the duplication acrosscontext_hash,metric_hash, andinstance_hash.♻️ Sketch of minimal cleanup
- len = metric_labelsets(metric, idbuf, sizeof(idbuf), labels, NULL); - if (len <= 0) - len = pmsprintf(idbuf, sizeof(idbuf), "null"); + if (metric_labelsets(metric, idbuf, sizeof(idbuf), labels, NULL) <= 0) + (void)pmsprintf(idbuf, sizeof(idbuf), "null");🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/libpcp_web/src/util.c` around lines 487 - 513, The second pmsprintf assigns to the same local `len` but that value is never used (idbuf is consumed as a C-string), so either remove the redundant `len = pmsprintf(idbuf, ...)` or use a separate variable name to make the intent clear; better yet, extract a small helper (e.g. get_identifying_labels or similar) that encapsulates the logic around `metric_labelsets(...)` / fallback `pmsprintf("null")` and returns the C-string to use in sdscatfmt, then call that helper from context_hash/metric_hash/instance_hash (and from pmwebapi_instance_hash) to eliminate duplication and avoid clobbering `len`.qa/1744 (2)
27-161: 💤 Low valueInconsistent use of
$seq.fullvs$seq_full.The script mixes
$here/$seq.full(lines 27-35, 82, 143, 144, 147, 150) with the$seq_fullshell variable (lines 133, 161, 190-193). Most other PCP QA tests pick one form consistently. If$seq_fullends up unset (depending on which common helper sources it), thetee -a $seq_fullon line 161 will tee to a file literally named""and thecat >>$seq_fulllines will silently lose output. Pick one form to keep the test's debug log path predictable.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@qa/1744` around lines 27 - 161, The test mixes two log variables ($seq.full and $seq_full) causing silent loss of output; pick one name and replace all occurrences for consistency (I suggest using $seq_full since other tests use it) — update the echo/redirects that write to "$here/$seq.full" (and any uses of $seq.full) to use "$here/$seq_full" and change the tee and any concatenations that reference $seq_full if you instead choose $seq.full; ensure all references in the key-server shutdown block, pmlogger start/stop messages, archive listings, flush/verify logs, and the pmseries load tee (symbols: _cleanup, _prepare_pmda, _pmdaopenmetrics_install, pmloggerpid, flush_result, pmseries invocation) are updated so every write uses the same variable.
124-140: ⚡ Quick winRace: archive load may run before pmlogger has actually flushed.
pmsleep 3thenkill -TERMthenpmsleep 1is a fragile way to ensure the archive is complete.kill -TERMis asynchronous and pmlogger needs time to flush its final volume — on slow/loaded CI, this can yield an incomplete archive and the laterpmseries --loadwill then store fewer/no labels, causing intermittentERROR: No series found for test_metricNfailures rather than reproducible diffs. Consider$sudo $signal -s $pmloggerpid; wait $pmloggerpid(or usepmlogger -s 3for a fixed-sample run instead of background + SIGTERM).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@qa/1744` around lines 124 - 140, The test currently starts pmlogger in background (pmlogger, pmloggerpid) and uses pmsleep + kill -TERM + pmsleep which can cause a race where pmlogger hasn't flushed before pmseries --load; replace the async stop with a synchronous shutdown: after starting pmlogger (the pmlogger/pmloggerpid code) send the termination signal and then wait for pmlogger to exit (use wait on pmloggerpid) OR alternatively run pmlogger in fixed-sample mode (pmlogger -s 3) instead of background + SIGTERM so the archive is guaranteed complete before continuing; update the stop sequence that references kill -TERM, pmsleep, and pmloggerpid accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@qa/1744`:
- Around line 22-43: The cleanup function unconditionally removes
$PCP_ETC_DIR/pcp/labels/* which can delete real system files if _save_config
never ran; modify the test so that _save_config sets a guard flag (e.g.,
labels_saved=1) after it successfully completes (reference the _save_config call
and the _restore_config function) and change _cleanup to only perform the
destructive "$sudo rm -rf $PCP_ETC_DIR/pcp/labels/*" when that guard is set (or
alternatively move the rm into the same conditional block that runs the test
body), ensuring _restore_config is only relied on when a prior backup was made.
- Around line 170-187: The script sets status=1 when a series is missing (in the
checks around series1 and series2) but later unconditionally resets status=0,
masking failures; fix by preserving failure state instead of clearing it: remove
the unconditional status=0 reset (or introduce a separate boolean like
had_failure that you set to true on any "ERROR" and only set status=0 at the
very end if had_failure is false), ensure the checks in the pmseries blocks
(series1, series2) only set status=1/had_failure and never overwrite it with
success later so the exit value reflects any missing-series errors.
- Line 17: The test currently gates execution on presence of pmseries_import
(checked via the test invoking pmseries_import and calling _notrun), but the
test never calls pmseries_import; instead it uses pmseries to load archives, so
remove the pmseries_import existence check and either drop that _notrun gate
entirely or replace it with a check for pmseries (the binary actually used by
the test). Update the check that calls _notrun to reference pmseries (or remove
it) and keep the rest of the test logic unchanged.
- Around line 102-108: The FILTER line in the test config uses the label value
instead of the label name so pmdaopenmetrics (pmdaopenmetrics.python matching
logic) will never match; update the here-doc content that adds "FILTER: OPTIONAL
LABEL some_optional_label" to use the label name "optional_label" instead (i.e.,
replace some_optional_label with optional_label in the cat <<EOF block that
writes $tmp.tmp) and verify the optional-label feature is enabled (the test
comment "*** NOTE: optional labels are not implemented yet") before merging so
the test will exercise the behavior.
---
Outside diff comments:
In `@qa/openmetrics/samples/pmseries_label_test.txt`:
- Around line 1-9: The sample exposition uses lowercase "# Type" which the PMDA
parser's case-sensitive check (see the token handling that inspects lp[1] ==
'TYPE' in pmdaopenmetrics.python) will ignore; update the sample lines in
pmseries_label_test.txt to use uppercase "# TYPE" for each metric definition
(e.g., change "# Type test_metric1 gauge" and "# Type test_metric2 gauge" to "#
TYPE test_metric1 gauge" and "# TYPE test_metric2 gauge") so the parser
correctly recognizes and applies metric type metadata.
---
Nitpick comments:
In `@qa/1744`:
- Around line 27-161: The test mixes two log variables ($seq.full and $seq_full)
causing silent loss of output; pick one name and replace all occurrences for
consistency (I suggest using $seq_full since other tests use it) — update the
echo/redirects that write to "$here/$seq.full" (and any uses of $seq.full) to
use "$here/$seq_full" and change the tee and any concatenations that reference
$seq_full if you instead choose $seq.full; ensure all references in the
key-server shutdown block, pmlogger start/stop messages, archive listings,
flush/verify logs, and the pmseries load tee (symbols: _cleanup, _prepare_pmda,
_pmdaopenmetrics_install, pmloggerpid, flush_result, pmseries invocation) are
updated so every write uses the same variable.
- Around line 124-140: The test currently starts pmlogger in background
(pmlogger, pmloggerpid) and uses pmsleep + kill -TERM + pmsleep which can cause
a race where pmlogger hasn't flushed before pmseries --load; replace the async
stop with a synchronous shutdown: after starting pmlogger (the
pmlogger/pmloggerpid code) send the termination signal and then wait for
pmlogger to exit (use wait on pmloggerpid) OR alternatively run pmlogger in
fixed-sample mode (pmlogger -s 3) instead of background + SIGTERM so the archive
is guaranteed complete before continuing; update the stop sequence that
references kill -TERM, pmsleep, and pmloggerpid accordingly.
In `@src/libpcp_web/src/util.c`:
- Around line 460-474: pmwebapi_context_hash currently always calls
context_labels_str_identifying() causing repeated merges; change it to compute
and cache the identifying-label string only when it is first needed (same
condition used for caching context->labels) by adding a cached sds (e.g.
context->idlabels or similar) or an identity-cache flag on the context struct,
populate that cache inside the block where context->labels is set (or guarded by
context->labels == NULL), and then pass the cached identifying string and length
to pmwebapi_source_hash instead of recomputing each call; ensure the new cache
is allocated with sdsnewlen and cleaned up where context->labels is freed.
- Around line 487-513: The second pmsprintf assigns to the same local `len` but
that value is never used (idbuf is consumed as a C-string), so either remove the
redundant `len = pmsprintf(idbuf, ...)` or use a separate variable name to make
the intent clear; better yet, extract a small helper (e.g.
get_identifying_labels or similar) that encapsulates the logic around
`metric_labelsets(...)` / fallback `pmsprintf("null")` and returns the C-string
to use in sdscatfmt, then call that helper from
context_hash/metric_hash/instance_hash (and from pmwebapi_instance_hash) to
eliminate duplication and avoid clobbering `len`.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: ff2ebaba-8485-46fa-a894-b5aff961bcb3
⛔ Files ignored due to path filters (1)
qa/1744.outis excluded by!**/*.out
📒 Files selected for processing (5)
qa/1744qa/common.checkqa/groupqa/openmetrics/samples/pmseries_label_test.txtsrc/libpcp_web/src/util.c
|
@lmchilton please go through all the coderabbit reported issues and resolve/comment on each, then I'll take a look (so I don't double up on anything already reported). Thanks! |
Added support for optional labels to be ingested into a key-value server by pmproxy. SHA1 hashes calculations are not affected. Added qa test 1744 to test optional label ingestion and display. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
10c68e2 to
f8ff5b1
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/libpcp_web/src/util.c (1)
494-496: 💤 Low valueConsider adding pmDebugOptions diagnostic for identifying labels.
When the identifying-label filter produces different output from the full labels cache, it may be useful for debugging to log this under
pmDebugOptions.series. This follows the pattern seen elsewhere in this file (e.g., lines 959-963 for instance hash).As per coding guidelines, C files should "Add permanent diagnostics using pmDebugOptions framework."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/libpcp_web/src/util.c` around lines 494 - 496, Add a pmDebugOptions.series diagnostic when the identifying-label filter output differs from the full labels cache: after calling metric_labelsets(metric, idbuf, sizeof(idbuf), labels, NULL) and when it returns <=0 causing the pmsprintf fallback, or when it returns a value but the returned id differs from the full-labels representation, log both representations using pmDebugOptions.series (or pmDebug) so debuggers can see the identifying-labels vs full-labels; locate the metric_labelsets/idbuf/pmsprintf call in util.c and add a conditional pmDebugOptions.series call that prints the idbuf and the full labels string (or error) to follow the existing pattern used for instance hash logging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@qa/1744`:
- Around line 33-40: The cleanup routine omits restoring the pmcd auto-restart
policy after it was disabled by the call to _stop_auto_restart pmcd; update the
_cleanup function to call _restore_auto_restart pmcd (using the same service
name "pmcd") before exiting/returning so the service auto-restart state is
reverted, and ensure this new call is invoked in the trap path that references
_cleanup so later QA tests do not inherit the modified service state.
---
Nitpick comments:
In `@src/libpcp_web/src/util.c`:
- Around line 494-496: Add a pmDebugOptions.series diagnostic when the
identifying-label filter output differs from the full labels cache: after
calling metric_labelsets(metric, idbuf, sizeof(idbuf), labels, NULL) and when it
returns <=0 causing the pmsprintf fallback, or when it returns a value but the
returned id differs from the full-labels representation, log both
representations using pmDebugOptions.series (or pmDebug) so debuggers can see
the identifying-labels vs full-labels; locate the
metric_labelsets/idbuf/pmsprintf call in util.c and add a conditional
pmDebugOptions.series call that prints the idbuf and the full labels string (or
error) to follow the existing pattern used for instance hash logging.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 630ac389-a953-4047-9069-3c02d6d24d98
⛔ Files ignored due to path filters (1)
qa/1744.outis excluded by!**/*.out
📒 Files selected for processing (4)
qa/1744qa/common.checkqa/groupsrc/libpcp_web/src/util.c
| _sighup_pmcd | ||
| _restore_config $PCP_SYSCONF_DIR/pmseries | ||
| $sudo rm -rf $tmp $tmp.* | ||
| } | ||
|
|
||
| trap "_cleanup; exit \$status" 0 1 2 3 15 | ||
| _stop_auto_restart pmcd | ||
|
|
There was a problem hiding this comment.
Restore pmcd auto-restart policy in cleanup.
Line 39 disables pmcd auto-restart via _stop_auto_restart pmcd, but _cleanup() never calls _restore_auto_restart pmcd. This can leak modified service state into later QA tests.
Suggested fix
_cleanup()
{
cd $here
[ -n "$pmloggerpid" ] && $sudo kill -TERM $pmloggerpid 2>/dev/null
@@
fi
_sighup_pmcd
+ _restore_auto_restart pmcd
_restore_config $PCP_SYSCONF_DIR/pmseries
$sudo rm -rf $tmp $tmp.*
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| _sighup_pmcd | |
| _restore_config $PCP_SYSCONF_DIR/pmseries | |
| $sudo rm -rf $tmp $tmp.* | |
| } | |
| trap "_cleanup; exit \$status" 0 1 2 3 15 | |
| _stop_auto_restart pmcd | |
| _sighup_pmcd | |
| _restore_auto_restart pmcd | |
| _restore_config $PCP_SYSCONF_DIR/pmseries | |
| $sudo rm -rf $tmp $tmp.* | |
| } | |
| trap "_cleanup; exit \$status" 0 1 2 3 15 | |
| _stop_auto_restart pmcd |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@qa/1744` around lines 33 - 40, The cleanup routine omits restoring the pmcd
auto-restart policy after it was disabled by the call to _stop_auto_restart
pmcd; update the _cleanup function to call _restore_auto_restart pmcd (using the
same service name "pmcd") before exiting/returning so the service auto-restart
state is reverted, and ensure this new call is invoked in the trap path that
references _cleanup so later QA tests do not inherit the modified service state.
Added support for optional labels to be ingested into a key-value server by pmproxy. SHA1 hash calculations are not affected.
libpcp_web code additions include creating a new buffer to separate hash calculation from display calculation and creating a new context hash function to separate context hash w/ identifying labels and context hash w/ identifying + optional labels.
Added qa test 1744 to test optional label ingestion and display.