fix(ci): add Dependabot for docker-compose, pin Doltgres, disable OTEL in Cypress#2240
fix(ci): add Dependabot for docker-compose, pin Doltgres, disable OTEL in Cypress#2240amikofalvy wants to merge 1 commit intomainfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
There was a problem hiding this comment.
PR Review Summary
(2) Total Issues | Risk: Low
🟠⚠️ Major (1) 🟠⚠️
🟠 1) .github/workflows/ci.yml create-agents-e2e job may hit same OTEL interference
Issue: The create-agents-e2e job in ci.yml does NOT set ENVIRONMENT: test (unlike the main ci job on line 97), nor does it set OTEL_SDK_DISABLED: true. Since this job builds and runs agents-manage-ui via Turbo, the OTEL initialization could still interfere with Playwright-based tests, causing similar timeout issues to what was fixed in Cypress.
Why: This creates an inconsistency where Cypress tests are protected from OTEL interference but the create-agents-e2e job is not. The job runs Playwright tests which may also suffer from HTTP module monkey-patching if OTEL loads.
Fix: Add ENVIRONMENT: test and/or OTEL_SDK_DISABLED: true to the create-agents-e2e job's env vars. The ENVIRONMENT check in instrumentation.ts will then prevent OTEL from loading.
# In .github/workflows/ci.yml, add to the create-agents-e2e job's env:
ENVIRONMENT: test
OTEL_SDK_DISABLED: trueRefs:
🟡 Minor (1) 🟡
Inline Comments:
- 🟡 Minor:
.github/workflows/cypress.yml:16Missing explicit permissions block
💭 Consider (2) 💭
💭 1) agents-api/src/instrumentation.ts OTEL guard pattern not applied to agents-api
Issue: The PR adds an ENVIRONMENT !== 'test' guard to agents-manage-ui but agents-api/src/instrumentation.ts unconditionally initializes OTEL SDK.
Why: For consistency, sibling instrumentation files should follow the same pattern. The PR description already notes this as a follow-up.
Refs: agents-api/src/index.ts:4
💭 2) .github/dependabot.yml Dependabot only tracks standard compose files
Issue: Dependabot's docker-compose ecosystem only tracks files named docker-compose.yml/compose.yml. The repo also has docker-compose.dbs.yml which contains Docker images that won't be tracked.
Why: Those images (doltgres, postgres, spicedb) will need manual updates.
Refs: docker-compose.dbs.yml
💡 APPROVE WITH SUGGESTIONS
Summary: This is a solid fix for the Cypress CI failures! The root cause analysis is thorough and the multi-layered approach (code guard + env var) provides good defense-in-depth. The one actionable item is ensuring the create-agents-e2e job in ci.yml also gets the same OTEL protection to prevent similar failures there. The other items are minor consistency improvements that can be addressed in follow-up work. 🎉
Discarded (4)
| Location | Issue | Reason Discarded |
|---|---|---|
agents-manage-ui/src/otel.ts:25 |
BatchSpanProcessor retry loop when no collector exists | Addressed by this PR — the guard in instrumentation.ts prevents otel.ts from loading entirely in test environments |
agents-manage-ui/src/instrumentation.ts:6 |
Dual mechanism (OTEL_SDK_DISABLED + ENVIRONMENT check) | This is intentional defense-in-depth; the code guard prevents module load while the env var is a backup |
docker-compose.dbs.yml |
Doltgres version pinning inconsistency across compose files | The Dependabot addition will help converge versions over time; informational only |
agents-manage-ui/src/instrumentation.ts |
OTEL guard pattern is sound | Positive feedback, not an issue |
Reviewers (3)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
pr-review-devops |
5 | 1 | 1 | 0 | 1 | 0 | 2 |
pr-review-sre |
4 | 0 | 0 | 0 | 0 | 0 | 4 |
pr-review-consistency |
2 | 0 | 1 | 0 | 0 | 0 | 1 |
| Total | 11 | 1 | 2 | 0 | 1 | 0 | 7 |
Note: SRE findings largely overlapped with DevOps findings and were merged/deduplicated.
b75fd12 to
b4c8e57
Compare
b4c8e57 to
5085d21
Compare
- Add Dependabot config to track Docker image versions in docker-compose files (root and create-agents-template) - Pin Doltgres to 0.54.10 in cypress.yml to match docker-compose.yml - Skip OTEL instrumentation when ENVIRONMENT=test in instrumentation.ts to prevent getNodeAutoInstrumentations() from breaking Next.js SSR - Add OTEL_SDK_DISABLED=true env var in Cypress workflow (belt-and-suspenders) - Add 10-min timeout to Cypress test step so artifact uploads can run before the 30-min job timeout - Always upload Cypress screenshots/videos (if: always() + if-no-files-found: ignore) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5085d21 to
56c2343
Compare
There was a problem hiding this comment.
PR Review Summary
(3) Total Issues | Risk: Low
🟡 Minor (1) 🟡
Inline Comments:
- 🟡 Minor:
.github/composite-actions/cypress-e2e/action.yml:118Hardcoded timeout value without documentation
💭 Consider (3) 💭
💭 1) .github/dependabot.yml Dependabot docker-compose coverage has limitations
Issue: Dependabot's docker-compose ecosystem only scans standard filenames (docker-compose.yml, compose.yml). Files like docker-compose.dbs.yml and create-agents-template/docker-compose.db.yml will NOT receive automated update PRs.
Why: The Doltgres, Postgres, and SpiceDB images in these files will require manual version tracking.
Fix: Consider adding a comment in dependabot.yml documenting which files are tracked vs. not tracked.
Refs: Dependabot docker-compose docs
💭 2) docker-compose.*.yml Doltgres version drift across compose files
Issue: The PR pins Doltgres to 0.54.10 in cypress.yml and docker-compose.yml, but docker-compose.dbs.yml, create-agents-template/*.yml, and ci.yml:156 still use :latest.
Why: This creates a split-world where CI and production use pinned versions while local dev uses floating tags.
Fix: Either pin all references to 0.54.10 or document the intentional asymmetry (CI pinned, local dev uses latest).
Refs: docker-compose.dbs.yml:3 · ci.yml:156
💭 3) .github/dependabot.yml Consider adding github-actions ecosystem
Issue: The new Dependabot config tracks docker-compose but not github-actions.
Why: Adding github-actions tracking would automate updates for action versions, improving supply chain security.
Fix: Consider adding:
- package-ecosystem: 'github-actions'
directory: '/'
schedule:
interval: 'weekly'🕐 Pending Recommendations (2)
- 🟠
.github/workflows/ci.yml:276-286create-agents-e2ejob lacksENVIRONMENT: testorOTEL_SDK_DISABLED: true, may hit same OTEL interference - 🟡
.github/workflows/cypress.yml:16Missing explicitpermissions:block (consistency with other workflows)
💡 APPROVE WITH SUGGESTIONS
Summary: The core Cypress fix is solid — the OTEL guard and belt-and-suspenders approach effectively address the root cause. The prior review finding about ci.yml's create-agents-e2e job lacking OTEL protection remains unaddressed. This job also runs agents-manage-ui E2E tests and could experience the same OTEL interference. Recommend adding ENVIRONMENT: test and OTEL_SDK_DISABLED: true to that job's env vars. The other items are minor documentation suggestions that can be addressed in follow-up work. 🎉
Discarded (6)
| Location | Issue | Reason Discarded |
|---|---|---|
.github/composite-actions/cypress-e2e/action.yml:121-136 |
Artifact upload reliability improvements | Positive feedback — no issue |
agents-manage-ui/src/instrumentation.ts:6 |
OTEL guard pattern is defense-in-depth | Positive feedback — no issue |
agents-api/src/instrumentation.ts |
OTEL pattern divergence between packages | Intentional per PR description; agents-api relies on OTEL_SDK_DISABLED env var |
.github/composite-actions/cypress-e2e/action.yml:55,66 |
ENVIRONMENT split (development for setup, test for execution) | Working as expected; the guard activates when tests run |
.github/workflows/cypress.yml:53-84 |
Mutable action tags (@v4 vs SHA pins) | Pre-existing pattern across all workflows; not introduced by this PR |
agents-manage-ui/src/otel.ts:25 |
OTLP exporter instantiated at module load | Addressed by this PR's guard in instrumentation.ts which prevents module load |
Reviewers (3)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
pr-review-devops |
6 | 0 | 3 | 0 | 1 | 0 | 2 |
pr-review-consistency |
4 | 0 | 0 | 0 | 0 | 2 | 2 |
pr-review-sre |
4 | 0 | 0 | 0 | 0 | 0 | 4 |
| Total | 14 | 0 | 3 | 0 | 1 | 2 | 8 |
Note: Many findings overlapped across reviewers (Doltgres versioning, Dependabot coverage) and were deduplicated. Prior review findings re-surfaced as Pending Recommendations.
| - name: Run Cypress E2E Tests | ||
| shell: bash | ||
| run: pnpm --filter @inkeep/agents-manage-ui test:e2e:ci | ||
| run: timeout 600 pnpm --filter @inkeep/agents-manage-ui test:e2e:ci |
There was a problem hiding this comment.
🟡 Minor: Hardcoded timeout value without documentation
Issue: The timeout 600 uses a magic number (10 minutes) without inline explanation.
Why: Future maintainers may not understand why 600 was chosen or what the relationship is to the 30-minute job timeout.
Fix: Add a brief comment:
# 10-min timeout: fail gracefully before the 30-min job timeout
run: timeout 600 pnpm --filter @inkeep/agents-manage-ui test:e2e:ciRefs:
Summary
docker-compose.yml(root) andcreate-agents-template/docker-compose.ymlfor automated dependency updatesdolthub/doltgresql:0.54.10incypress.ymlto matchdocker-compose.ymlinstrumentation.tswhenENVIRONMENT=test— preventsgetNodeAutoInstrumentations()from monkeypatching HTTP modules during Next.js SSROTEL_SDK_DISABLED=trueenv var in Cypress workflow (belt-and-suspenders)if: always()withif-no-files-found: ignoreso screenshots/videos are uploaded on failure, cancellation, or timeoutRoot cause analysis (Cypress failures since Feb 19)
The OTEL commit (#2168,
0aea45a3) was the first commit after the last passing main run. It:instrumentation.ts→otel.tsgetNodeAutoInstrumentations()which patcheshttp/undicimodules globallyBatchSpanProcessorthat retries exporting to a non-existentlocalhost:4318every 500msEvidence: All 10 Cypress specs fail with "element not found" errors (
#id,#description,.react-flow__node,[data-uri=...] textarea) — consistent with a global rendering failure, not individual test issues.Test plan
🤖 Generated with Claude Code