Conversation

@keivenchang
Contributor

@keivenchang keivenchang commented Dec 19, 2025

Overview:

Document LMCache + vLLM (0.12.0) metrics/log troubleshooting.

Details:

  • Add troubleshooting to docs/backends/vllm/LMCache_Integration.md:
    • PrometheusLogger ... different metadata (log-only; 0.12.0 repro, not seen on 0.11.0)
    • PROMETHEUS_MULTIPROC_DIR warning guidance

Where should the reviewer start?

docs/backends/vllm/LMCache_Integration.md

Related Issues

DIS-1172

/coderabbit profile chill

Summary by CodeRabbit

  • Documentation
    • Added troubleshooting guidance to the LMCache and Prometheus documentation. The new sections cover PrometheusLogger instance-creation warnings (with version compatibility notes) and PROMETHEUS_MULTIPROC_DIR configuration problems when running under Dynamo or in containers, and include reproduction steps and remediation commands.

@coderabbitai
Contributor

coderabbitai bot commented Dec 19, 2025

Walkthrough

Two vLLM documentation files were updated with troubleshooting sections. The LMCache Integration guide now details the PrometheusLogger and PROMETHEUS_MULTIPROC_DIR issues with mitigations, and the Prometheus metrics documentation references the new troubleshooting guidance. No functional code changes were introduced.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: LMCache Troubleshooting Guidance (docs/backends/vllm/LMCache_Integration.md, docs/backends/vllm/prometheus.md) | Added Troubleshooting sections documenting LMCache-related log issues (PrometheusLogger metadata conflicts and PROMETHEUS_MULTIPROC_DIR environment variable warnings), including reproduction steps, impact notes, and mitigation commands. Cross-referenced between guides. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

  • Pure documentation additions with no code logic changes
  • Straightforward informational content (troubleshooting guidance, environment variable notes)
  • Consistent formatting across both files
  • No functional or structural concerns to evaluate

Poem

🐰✨ Docs grew whiskers, guidance bloomed,
LMCache troubles clearly groomed!
With mitigations, commands, and care,
Users won't pull out their hair! 📚🔧

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: documenting LMCache Prometheus metrics troubleshooting for vLLM 0.12.0. |
| Description check | ✅ Passed | The description includes all required template sections: Overview, Details, Where should the reviewer start, and Related Issues. All sections are complete with specific information. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1247fe3 and f45d3e9.

📒 Files selected for processing (2)
  • docs/backends/vllm/LMCache_Integration.md (1 hunks)
  • docs/backends/vllm/prometheus.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3035
File: lib/runtime/src/metrics/prometheus_names.rs:49-53
Timestamp: 2025-09-16T00:26:37.092Z
Learning: keivenchang prefers consistency in metric naming standardization over strict adherence to Prometheus conventions about gauge vs counter suffixes. When standardizing metrics naming, prioritize consistency across the codebase rather than technical pedantry about individual metric type conventions.
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3051
File: container/templates/Dockerfile.trtllm.j2:424-437
Timestamp: 2025-09-16T17:16:03.785Z
Learning: keivenchang prioritizes maintaining exact backward compatibility during migration/refactoring PRs, even when bugs are identified in the original code. Fixes should be deferred to separate PRs after the migration is complete.
🪛 markdownlint-cli2 (0.18.1)
docs/backends/vllm/LMCache_Integration.md

169-169: Fenced code blocks should have a language specified

(MD040, fenced-code-language)
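
For readers who want to find such fences locally without installing markdownlint, the following is a minimal Python sketch of the same idea. It is an approximation of the MD040 rule, not markdownlint's implementation, and the function name `unlabeled_fences` is made up for this example.

```python
import re

# An opening or closing fence: up to three spaces of indent, then three or
# more backticks or tildes, then an optional info string (the language).
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def unlabeled_fences(markdown_text):
    """Return 1-based line numbers of opening fences that name no language."""
    flagged = []
    open_marker = None  # fence characters of the block we are inside, if any
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        match = FENCE.match(line)
        if not match:
            continue
        marker, info = match.group(1), match.group(2).strip()
        if open_marker is None:
            # Opening fence: flag it when the info string is empty.
            open_marker = marker
            if not info:
                flagged.append(number)
        elif (
            marker[0] == open_marker[0]
            and len(marker) >= len(open_marker)
            and not info
        ):
            # Closing fence: same character, at least as long, no info string.
            open_marker = None
    return flagged
```

Calling `unlabeled_fences(open(path).read())` on a Markdown file returns the line numbers to fix; adding a language such as `bash` or `text` after the opening fence clears the finding.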

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
docs/backends/vllm/prometheus.md (1)

132-137: Excellent cross-referencing and section structure.

The new Troubleshooting subsection appropriately directs users to the detailed LMCache Integration Guide while maintaining a concise overview in the Prometheus metrics documentation. The placement and content are well-organized.

docs/backends/vllm/LMCache_Integration.md (2)

163-193: Excellent documentation of the PrometheusLogger singleton issue.

The PrometheusLogger section is well-structured and provides essential context: version-specific behavior (v0.12.0), clear cause explanation, impact assessment (log-only, non-blocking), concrete reproduction steps, and practical mitigation. The note distinguishing this issue's vLLM version scope is particularly helpful.
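
To illustrate why a singleton can produce a log-only message of this kind, here is a small Python sketch. It is a hypothetical stand-in, not LMCache's actual PrometheusLogger code: the class name `MetricsLoggerSketch`, the `get_or_create` method, and the warning text are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("metrics_sketch")


class MetricsLoggerSketch:
    """Hypothetical process-wide singleton; not LMCache's real class."""

    _instance = None

    def __init__(self, metadata):
        self.metadata = metadata

    @classmethod
    def get_or_create(cls, metadata):
        if cls._instance is None:
            # First caller in this process creates the shared instance.
            cls._instance = cls(metadata)
        elif cls._instance.metadata != metadata:
            # A later caller passed different metadata. The first instance is
            # kept and the mismatch is only logged, so the caller is not blocked.
            logger.warning("metrics logger already exists with different metadata")
        return cls._instance
```

In this sketch a second call with different metadata returns the original instance and emits a warning, which matches the "log-only, non-blocking" impact described in the guide. Whether the real implementation behaves the same way should be checked against the LMCache source.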


194-200: Clear guidance for PROMETHEUS_MULTIPROC_DIR in Dynamo context.

The PROMETHEUS_MULTIPROC_DIR section appropriately distinguishes between user-managed and Dynamo-managed scenarios, with clear remediation steps. Directing users to check their shell/container environment is practical and actionable.
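
As a rough aid for the "check your shell/container environment" step, the Python sketch below reports what the current process sees for PROMETHEUS_MULTIPROC_DIR. The helper name `describe_multiproc_dir` is hypothetical, and the sketch does not decide which state triggers the warning; the guide remains the reference for that.

```python
import os


def describe_multiproc_dir(environ=None):
    """Summarize how PROMETHEUS_MULTIPROC_DIR looks in the given environment."""
    environ = os.environ if environ is None else environ
    value = environ.get("PROMETHEUS_MULTIPROC_DIR")
    if value is None:
        return "unset"
    if not os.path.isdir(value):
        return "set, but not an existing directory"
    if not os.access(value, os.W_OK):
        return "set, but not writable"
    return "set and writable"
```

Running `print(describe_multiproc_dir())` inside the same shell or container that launches the worker shows whether the variable was inherited from the user's environment.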

@keivenchang keivenchang force-pushed the keivenchang/DIS-1172__document-vLLM-v0.12.0-Prometheus-error branch from f45d3e9 to 001ef15 Compare December 19, 2025 02:02
@keivenchang keivenchang merged commit 4d0b1a1 into main Dec 19, 2025
26 checks passed
@keivenchang keivenchang deleted the keivenchang/DIS-1172__document-vLLM-v0.12.0-Prometheus-error branch December 19, 2025 16:57