Fix/msrv tts dockerfile healthcheck circuitbreaker #1012

Open
Valreb001 wants to merge 4 commits into solutions-plug:main from Valreb001:fix/msrv-tts-dockerfile-healthcheck-circuitbreaker

Valreb001 commented Jun 27, 2026

Summary

Closes #990
Closes #991
Closes #992
Closes #993

  • MSRV: added rust-version = "1.75" to services/api/Cargo.toml, a CI workflow that builds against it, and docs in CONTRIBUTING.md

  • TTS Dockerfile: replaced the single-stage Dockerfile with a multi-stage build (builder + slim runtime), a non-root user, a HEALTHCHECK on /health/live, Node 20 pinned in package.json, and the TTS service added to docker-compose.tracing.yml

  • Real health checks: /health/ready now runs actual probes (writes a test file to verify the output directory, calls ElevenLabs GET /v1/user to verify the API key, validates the Google credentials JSON, and checks job queue depth against a 500-job limit) and returns 503 with a structured body if any probe fails

  • Circuit breaker: wrapped both TTS providers with opossum circuit breakers (5 failures / 30 s window, 30 s reset timeout before half-open), which fast-fail with 503 when open and expose their state in /health/ready; covered by 5 passing tests
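
The readiness contract summarized above (individual probes feeding a checks map, with any failure turning /health/ready into a 503) can be sketched as follows. This is a minimal TypeScript sketch, not the PR's code: the type names, the ok/warn/error status values, the ready/not_ready body status, and the helpers checkJobQueueDepth, checkBreaker, and aggregateReadiness are hypothetical, and it assumes only error entries (not the 80% queue warning) trigger the 503. The output-directory, ElevenLabs, and Google probes are I/O-bound and omitted here; their results would be added to the same map.

```typescript
// Hypothetical types: the PR's real HealthCheck types may be shaped differently.
type CheckStatus = "ok" | "warn" | "error";

interface CheckResult {
  status: CheckStatus;
  message: string;
}

interface ReadinessResponse {
  httpStatus: 200 | 503;
  body: {
    status: "ready" | "not_ready";
    message: string;
    timestamp: string;
    checks: Record<string, CheckResult>;
  };
}

const MAX_QUEUE_DEPTH = 500;
const WARN_DEPTH = (MAX_QUEUE_DEPTH * 80) / 100; // 400 jobs, i.e. 80% of the limit

// Compare pending + processing jobs against the queue limit.
function checkJobQueueDepth(pending: number, processing: number): CheckResult {
  const depth = pending + processing;
  if (depth >= MAX_QUEUE_DEPTH) {
    return { status: "error", message: `queue depth ${depth} reached the ${MAX_QUEUE_DEPTH}-job limit` };
  }
  if (depth >= WARN_DEPTH) {
    return { status: "warn", message: `queue depth ${depth} is at or above 80% of the limit` };
  }
  return { status: "ok", message: `queue depth ${depth}` };
}

// An open circuit breaker is reported as a failing check.
function checkBreaker(provider: string, state: "closed" | "open" | "halfOpen"): CheckResult {
  if (state === "open") {
    return { status: "error", message: `${provider} circuit breaker is open` };
  }
  return { status: "ok", message: `${provider} circuit breaker is ${state}` };
}

// Aggregate every probe; any "error" entry turns the readiness response into a 503.
function aggregateReadiness(checks: Record<string, CheckResult>, now: Date = new Date()): ReadinessResponse {
  const failing = Object.keys(checks).filter((name) => checks[name].status === "error");
  const ready = failing.length === 0;
  return {
    httpStatus: ready ? 200 : 503,
    body: {
      status: ready ? "ready" : "not_ready",
      message: ready ? "all checks passed" : `failing checks: ${failing.join(", ")}`,
      timestamp: now.toISOString(),
      checks,
    },
  };
}
```

For example, aggregating a 410-job queue with an open ElevenLabs breaker yields a 503 whose message names only the breaker check, because the queue entry is merely a warning.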

- Add rust-version = "1.75" to services/api/Cargo.toml (determined by
  axum 0.7 / sqlx 0.8 / tower-http 0.6 requirements, matching the
  rust:1.75-slim image already pinned in the root Dockerfile)
- Add .github/workflows/msrv.yml that reads the MSRV dynamically from
  Cargo.toml, installs the toolchain via dtolnay/rust-toolchain, and
  runs cargo check + cargo build --release
- Document MSRV policy in CONTRIBUTING.md
- Replace the TTS service's single-stage Dockerfile with a two-stage build:
  builder (node:20-alpine) compiles TypeScript; runtime (node:20-alpine)
  installs production deps only and runs as non-root user ttsuser
- Add engines field to package.json to pin Node.js 20 LTS
- Add HEALTHCHECK calling /health/live with 30s interval
- Add tts service to docker-compose.tracing.yml with OTEL Collector
  dependency, tts-output volume, resource limits, and healthcheck
- checkOutputDirectory: write + delete a probe file to confirm writability
- checkElevenLabs: make a live GET /v1/user call (5 s timeout) to verify
  API key validity and network reachability; a 401 or a network error fails
  the check, so readiness returns 503
- checkGoogle: validate that the key file exists, parses as JSON, and contains
  project_id and private_key; apply the same checks to an inline credentials object
- checkJobQueueDepth: count pending + processing jobs against MAX_QUEUE_DEPTH
  (500); warns at 80% and errors at 100%, surfaced in /health/ready
- readiness() now aggregates all four probes and returns 503 with a
  structured JSON body listing every failing check when any fail
- /health/ready response body now includes a checks map alongside status,
  message, and timestamp
- Import opossum and add CircuitBreakerState/CircuitBreakerConfig types
- Default config: 5-call threshold (volumeThreshold, the minimum number of
  calls in the window before the breaker can trip), 30 s rolling window
  (rollingCountTimeout), 30 s before an open breaker goes half-open
  (resetTimeout), 10 s per-call timeout
- TTSService._initCircuitBreakers builds one opossum CircuitBreaker per
  configured provider (elevenlabs, google) and wires open/halfOpen/close
  log events
- TTSService._callProvider routes all provider calls through breaker.fire()
  so an open circuit fast-fails with HTTP 503 without making a network call
- TTSService.getCircuitBreakerStates returns state snapshots exposed in
  /health/ready response body
- HealthCheck.readiness includes circuit breaker states; open breakers
  produce an 'error' check entry causing a 503 readiness response
- Add circuit-breaker.test.ts with 5 tests: initial closed state, trips
  open after threshold, fast-fail without network call, half-open
  transition, and empty state when no providers configured
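
The breaker behaviour those tests cover (closed at start, open after the threshold, fast-fail without a network call, then half-open) can be illustrated with a small hand-rolled breaker. This is a sketch under stated assumptions, not opossum and not the PR's implementation: SimpleCircuitBreaker, BreakerOptions, and the statusCode field on the thrown error are hypothetical names. opossum decides when to trip from errorThresholdPercentage once at least volumeThreshold calls fall inside its rolling window, whereas this sketch simply counts failures inside the window, and it does not limit how many trial calls pass while half-open. The clock is injectable so the 30 s transitions can be exercised without waiting.

```typescript
// Hypothetical minimal breaker for illustration; it is not opossum and not the PR's code.
type BreakerState = "closed" | "open" | "halfOpen";

interface BreakerOptions {
  failureThreshold: number; // failures inside the window that trip the breaker
  windowMs: number; // rolling window used to count failures
  resetTimeoutMs: number; // time spent open before allowing half-open trial calls
}

class SimpleCircuitBreaker {
  private state: BreakerState = "closed";
  private failures: number[] = []; // timestamps of recent failures
  private openedAt = 0;
  private readonly name: string;
  private readonly opts: BreakerOptions;
  private readonly now: () => number;

  constructor(name: string, opts: BreakerOptions, now: () => number = Date.now) {
    this.name = name;
    this.opts = opts;
    this.now = now;
  }

  // Report the state, moving open -> halfOpen once resetTimeoutMs has elapsed.
  getState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.opts.resetTimeoutMs) {
      this.state = "halfOpen";
    }
    return this.state;
  }

  recordSuccess(): void {
    this.state = "closed";
    this.failures = [];
  }

  recordFailure(): void {
    const t = this.now();
    if (this.getState() === "halfOpen") {
      this.trip(t); // a failed trial call reopens the circuit immediately
      return;
    }
    // Drop failures that have aged out of the rolling window, then count this one.
    this.failures = this.failures.filter((ts) => t - ts < this.opts.windowMs);
    this.failures.push(t);
    if (this.failures.length >= this.opts.failureThreshold) {
      this.trip(t);
    }
  }

  // Run the call unless the circuit is open; an open circuit fails fast with a 503-style error.
  async fire<T>(action: () => Promise<T>): Promise<T> {
    if (this.getState() === "open") {
      const err = new Error(`${this.name} circuit breaker is open`) as Error & { statusCode?: number };
      err.statusCode = 503;
      throw err;
    }
    try {
      const result = await action();
      this.recordSuccess();
      return result;
    } catch (e) {
      this.recordFailure();
      throw e;
    }
  }

  private trip(t: number): void {
    this.state = "open";
    this.openedAt = t;
    this.failures = [];
  }
}
```

In a service, a provider call would be wrapped as breaker.fire(() => synthesize(text)), where synthesize stands in for a hypothetical provider client; the error thrown while the circuit is open is what an HTTP layer could map to a 503 response.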

drips-wave Bot commented Jun 27, 2026

@Valreb001 Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can already apply to more issues while this PR awaits review. Keep up the great work! 🚀

Learn more about application limits
