Skip to content

Comments

fix(ci): add Dependabot for docker-compose, pin Doltgres, disable OTEL in Cypress#2240

Closed
amikofalvy wants to merge 1 commit intomainfrom
worktree-doltgres-upgrade
Closed

fix(ci): add Dependabot for docker-compose, pin Doltgres, disable OTEL in Cypress#2240
amikofalvy wants to merge 1 commit intomainfrom
worktree-doltgres-upgrade

Conversation

@amikofalvy
Copy link
Collaborator

@amikofalvy amikofalvy commented Feb 21, 2026

Summary

  • Add Dependabot for docker-compose: Track Docker image versions in docker-compose.yml (root) and create-agents-template/docker-compose.yml for automated dependency updates
  • Pin Doltgres in Cypress CI: Pin dolthub/doltgresql:0.54.10 in cypress.yml to match docker-compose.yml
  • Disable OTEL in test environments:
    • Skip OTEL import in instrumentation.ts when ENVIRONMENT=test — prevents getNodeAutoInstrumentations() from monkeypatching HTTP modules during Next.js SSR
    • Add OTEL_SDK_DISABLED=true env var in Cypress workflow (belt-and-suspenders)
  • Cypress CI reliability:
    • Add 10-min timeout to the Cypress test step so it fails before the 30-min job timeout
    • Change artifact upload to if: always() with if-no-files-found: ignore so screenshots/videos are uploaded on failure, cancellation, or timeout

Root cause analysis (Cypress failures since Feb 19)

The OTEL commit (#2168, 0aea45a3) was the first commit after the last passing main run. It:

  1. Loads unconditionally during SSR via instrumentation.tsotel.ts
  2. Calls getNodeAutoInstrumentations() which patches http/undici modules globally
  3. Creates a BatchSpanProcessor that retries exporting to a non-existent localhost:4318 every 500ms
  4. The HTTP patching interferes with Next.js internal request handling, causing all pages to fail to render

Evidence: All 10 Cypress specs fail with "element not found" errors (#id, #description, .react-flow__node, [data-uri=...] textarea) — consistent with a global rendering failure, not individual test issues.

Test plan

  • Verify Cypress CI passes on this PR (or at least progresses past the rendering failures)
  • Verify screenshots/videos are uploaded as artifacts on failure
  • Verify Dependabot creates PRs for docker-compose image updates

🤖 Generated with Claude Code

@vercel
Copy link

vercel bot commented Feb 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Feb 21, 2026 10:13am
agents-docs Ready Ready Preview, Comment Feb 21, 2026 10:13am
agents-manage-ui Ready Ready Preview, Comment Feb 21, 2026 10:13am

Request Review

@changeset-bot
Copy link

changeset-bot bot commented Feb 21, 2026

⚠️ No Changeset found

Latest commit: 56c2343

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(2) Total Issues | Risk: Low

🟠⚠️ Major (1) 🟠⚠️

🟠 1) .github/workflows/ci.yml create-agents-e2e job may hit same OTEL interference

Issue: The create-agents-e2e job in ci.yml does NOT set ENVIRONMENT: test (unlike the main ci job on line 97), nor does it set OTEL_SDK_DISABLED: true. Since this job builds and runs agents-manage-ui via Turbo, the OTEL initialization could still interfere with Playwright-based tests, causing similar timeout issues to what was fixed in Cypress.

Why: This creates an inconsistency where Cypress tests are protected from OTEL interference but the create-agents-e2e job is not. The job runs Playwright tests which may also suffer from HTTP module monkey-patching if OTEL loads.

Fix: Add ENVIRONMENT: test and/or OTEL_SDK_DISABLED: true to the create-agents-e2e job's env vars. The ENVIRONMENT check in instrumentation.ts will then prevent OTEL from loading.

# In .github/workflows/ci.yml, add to the create-agents-e2e job's env:
ENVIRONMENT: test
OTEL_SDK_DISABLED: true

Refs:

🟡 Minor (1) 🟡

Inline Comments:

  • 🟡 Minor: .github/workflows/cypress.yml:16 Missing explicit permissions block

💭 Consider (2) 💭

💭 1) agents-api/src/instrumentation.ts OTEL guard pattern not applied to agents-api
Issue: The PR adds an ENVIRONMENT !== 'test' guard to agents-manage-ui but agents-api/src/instrumentation.ts unconditionally initializes OTEL SDK.
Why: For consistency, sibling instrumentation files should follow the same pattern. The PR description already notes this as a follow-up.
Refs: agents-api/src/index.ts:4

💭 2) .github/dependabot.yml Dependabot only tracks standard compose files
Issue: Dependabot's docker-compose ecosystem only tracks files named docker-compose.yml/compose.yml. The repo also has docker-compose.dbs.yml which contains Docker images that won't be tracked.
Why: Those images (doltgres, postgres, spicedb) will need manual updates.
Refs: docker-compose.dbs.yml


💡 APPROVE WITH SUGGESTIONS

Summary: This is a solid fix for the Cypress CI failures! The root cause analysis is thorough and the multi-layered approach (code guard + env var) provides good defense-in-depth. The one actionable item is ensuring the create-agents-e2e job in ci.yml also gets the same OTEL protection to prevent similar failures there. The other items are minor consistency improvements that can be addressed in follow-up work. 🎉

Discarded (4)
Location Issue Reason Discarded
agents-manage-ui/src/otel.ts:25 BatchSpanProcessor retry loop when no collector exists Addressed by this PR — the guard in instrumentation.ts prevents otel.ts from loading entirely in test environments
agents-manage-ui/src/instrumentation.ts:6 Dual mechanism (OTEL_SDK_DISABLED + ENVIRONMENT check) This is intentional defense-in-depth; the code guard prevents module load while the env var is a backup
docker-compose.dbs.yml Doltgres version pinning inconsistency across compose files The Dependabot addition will help converge versions over time; informational only
agents-manage-ui/src/instrumentation.ts OTEL guard pattern is sound Positive feedback, not an issue
Reviewers (3)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-devops 5 1 1 0 1 0 2
pr-review-sre 4 0 0 0 0 0 4
pr-review-consistency 2 0 1 0 0 0 1
Total 11 1 2 0 1 0 7

Note: SRE findings largely overlapped with DevOps findings and were merged/deduplicated.

- Add Dependabot config to track Docker image versions in docker-compose
  files (root and create-agents-template)
- Pin Doltgres to 0.54.10 in cypress.yml to match docker-compose.yml
- Skip OTEL instrumentation when ENVIRONMENT=test in instrumentation.ts
  to prevent getNodeAutoInstrumentations() from breaking Next.js SSR
- Add OTEL_SDK_DISABLED=true env var in Cypress workflow (belt-and-suspenders)
- Add 10-min timeout to Cypress test step so artifact uploads can run
  before the 30-min job timeout
- Always upload Cypress screenshots/videos (if: always() + if-no-files-found: ignore)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(3) Total Issues | Risk: Low

🟡 Minor (1) 🟡

Inline Comments:

  • 🟡 Minor: .github/composite-actions/cypress-e2e/action.yml:118 Hardcoded timeout value without documentation

💭 Consider (3) 💭

💭 1) .github/dependabot.yml Dependabot docker-compose coverage has limitations

Issue: Dependabot's docker-compose ecosystem only scans standard filenames (docker-compose.yml, compose.yml). Files like docker-compose.dbs.yml and create-agents-template/docker-compose.db.yml will NOT receive automated update PRs.
Why: The Doltgres, Postgres, and SpiceDB images in these files will require manual version tracking.
Fix: Consider adding a comment in dependabot.yml documenting which files are tracked vs. not tracked.
Refs: Dependabot docker-compose docs

💭 2) docker-compose.*.yml Doltgres version drift across compose files

Issue: The PR pins Doltgres to 0.54.10 in cypress.yml and docker-compose.yml, but docker-compose.dbs.yml, create-agents-template/*.yml, and ci.yml:156 still use :latest.
Why: This creates a split-world where CI and production use pinned versions while local dev uses floating tags.
Fix: Either pin all references to 0.54.10 or document the intentional asymmetry (CI pinned, local dev uses latest).
Refs: docker-compose.dbs.yml:3 · ci.yml:156

💭 3) .github/dependabot.yml Consider adding github-actions ecosystem

Issue: The new Dependabot config tracks docker-compose but not github-actions.
Why: Adding github-actions tracking would automate updates for action versions, improving supply chain security.
Fix: Consider adding:

  - package-ecosystem: 'github-actions'
    directory: '/'
    schedule:
      interval: 'weekly'

🕐 Pending Recommendations (2)


💡 APPROVE WITH SUGGESTIONS

Summary: The core Cypress fix is solid — the OTEL guard and belt-and-suspenders approach effectively address the root cause. The prior review finding about ci.yml's create-agents-e2e job lacking OTEL protection remains unaddressed. This job also runs agents-manage-ui E2E tests and could experience the same OTEL interference. Recommend adding ENVIRONMENT: test and OTEL_SDK_DISABLED: true to that job's env vars. The other items are minor documentation suggestions that can be addressed in follow-up work. 🎉

Discarded (6)
Location Issue Reason Discarded
.github/composite-actions/cypress-e2e/action.yml:121-136 Artifact upload reliability improvements Positive feedback — no issue
agents-manage-ui/src/instrumentation.ts:6 OTEL guard pattern is defense-in-depth Positive feedback — no issue
agents-api/src/instrumentation.ts OTEL pattern divergence between packages Intentional per PR description; agents-api relies on OTEL_SDK_DISABLED env var
.github/composite-actions/cypress-e2e/action.yml:55,66 ENVIRONMENT split (development for setup, test for execution) Working as expected; the guard activates when tests run
.github/workflows/cypress.yml:53-84 Mutable action tags (@v4 vs SHA pins) Pre-existing pattern across all workflows; not introduced by this PR
agents-manage-ui/src/otel.ts:25 OTLP exporter instantiated at module load Addressed by this PR's guard in instrumentation.ts which prevents module load
Reviewers (3)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-devops 6 0 3 0 1 0 2
pr-review-consistency 4 0 0 0 0 2 2
pr-review-sre 4 0 0 0 0 0 4
Total 14 0 3 0 1 2 8

Note: Many findings overlapped across reviewers (Doltgres versioning, Dependabot coverage) and were deduplicated. Prior review findings re-surfaced as Pending Recommendations.

- name: Run Cypress E2E Tests
shell: bash
run: pnpm --filter @inkeep/agents-manage-ui test:e2e:ci
run: timeout 600 pnpm --filter @inkeep/agents-manage-ui test:e2e:ci
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Hardcoded timeout value without documentation

Issue: The timeout 600 uses a magic number (10 minutes) without inline explanation.

Why: Future maintainers may not understand why 600 was chosen or what the relationship is to the 30-minute job timeout.

Fix: Add a brief comment:

      # 10-min timeout: fail gracefully before the 30-min job timeout
      run: timeout 600 pnpm --filter @inkeep/agents-manage-ui test:e2e:ci

Refs:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant