Fix/msrv tts dockerfile healthcheck circuitbreaker#1012
Open
Valreb001 wants to merge 4 commits into
- Add rust-version = '1.75' to services/api/Cargo.toml (determined by axum 0.7 / sqlx 0.8 / tower-http 0.6 requirements, matching the rust:1.75-slim image already pinned in the root Dockerfile)
- Add .github/workflows/msrv.yml that reads the MSRV dynamically from Cargo.toml, installs the toolchain via dtolnay/rust-toolchain, and runs cargo check + cargo build --release
- Document MSRV policy in CONTRIBUTING.md

- Replace single-stage Dockerfile with a two-stage build: builder (node:20-alpine) compiles TypeScript; runtime (node:20-alpine) installs production deps only and runs as non-root user ttsuser
- Add engines field to package.json to pin Node.js 20 LTS
- Add HEALTHCHECK calling /health/live with 30s interval
- Add tts service to docker-compose.tracing.yml with OTEL Collector dependency, tts-output volume, resource limits, and healthcheck

- checkOutputDirectory: write + delete a probe file to confirm writability
- checkElevenLabs: make a live GET /v1/user call (5 s timeout) to verify API key validity and network reachability; returns 503 on 401 or network errors
- checkGoogle: validate key file exists, parses as JSON, and contains project_id + private_key; same for inline credentials object
- checkJobQueueDepth: count pending+processing jobs against MAX_QUEUE_DEPTH (500); warns at 80 %, errors at 100 %; surfaced in /health/ready
- readiness() now aggregates all four probes and returns 503 with a structured JSON body listing every failing check when any fail
- /health/ready response body now includes a checks map alongside status, message, and timestamp

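
The queue-depth thresholds and the readiness aggregation described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `CheckResult` and `ReadinessResponse` shapes, the `aggregateReadiness` helper, the status strings, and the choice that a warning does not by itself cause a 503 are all assumptions. Only the 500-job limit, the 80 % / 100 % thresholds, and the `status` / `message` / `timestamp` / `checks` fields come from the description above.

```typescript
// Hypothetical result shape for a single probe (an assumption, not from the PR).
type CheckStatus = "ok" | "warn" | "error";

interface CheckResult {
  status: CheckStatus;
  message: string;
}

// From the PR description: 500-job limit, warn at 80 %, error at 100 %.
const MAX_QUEUE_DEPTH = 500;
const WARN_PERCENT = 80;

// Classify the combined pending + processing count against the limit.
// Integer arithmetic avoids floating-point surprises at the 80 % boundary.
function checkJobQueueDepth(pending: number, processing: number): CheckResult {
  const depth = pending + processing;
  if (depth >= MAX_QUEUE_DEPTH) {
    return { status: "error", message: `queue full: ${depth}/${MAX_QUEUE_DEPTH}` };
  }
  if (depth * 100 >= MAX_QUEUE_DEPTH * WARN_PERCENT) {
    return { status: "warn", message: `queue nearly full: ${depth}/${MAX_QUEUE_DEPTH}` };
  }
  return { status: "ok", message: `queue depth ${depth}/${MAX_QUEUE_DEPTH}` };
}

// Hypothetical response shape; the field names follow the PR description.
interface ReadinessResponse {
  httpStatus: 200 | 503;
  body: {
    status: "ready" | "not_ready";
    message: string;
    timestamp: string;
    checks: Record<string, CheckResult>;
  };
}

// Aggregate named probe results. Any "error" yields a 503 whose message
// lists every failing check; "warn" entries are reported but do not fail
// readiness (an assumption of this sketch).
function aggregateReadiness(checks: Record<string, CheckResult>): ReadinessResponse {
  const failing = Object.entries(checks)
    .filter(([, result]) => result.status === "error")
    .map(([name]) => name);
  const ready = failing.length === 0;
  return {
    httpStatus: ready ? 200 : 503,
    body: {
      status: ready ? "ready" : "not_ready",
      message: ready ? "all checks passed" : `failing checks: ${failing.join(", ")}`,
      timestamp: new Date().toISOString(),
      checks,
    },
  };
}
```

In a real handler the `checks` map would be filled from the four probes (output directory, ElevenLabs, Google credentials, queue depth) before being passed to the aggregator.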
- Import opossum and add CircuitBreakerState/CircuitBreakerConfig types
- Default config: 5-failure threshold (volumeThreshold), 30 s rolling window (rollingCountTimeout), 30 s half-open interval (resetTimeout), 10 s per-call timeout
- TTSService._initCircuitBreakers builds one opossum CircuitBreaker per configured provider (elevenlabs, google) and wires open/halfOpen/close log events
- TTSService._callProvider routes all provider calls through breaker.fire() so the open circuit fast-fails with HTTP 503 without making a network call
- TTSService.getCircuitBreakerStates returns state snapshots exposed in /health/ready response body
- HealthCheck.readiness includes circuit breaker states; open breakers produce an 'error' check entry causing a 503 readiness response
- Add circuit-breaker.test.ts with 5 tests: initial closed state, trips open after threshold, fast-fail without network call, half-open transition, and empty state when no providers configured

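
The closed / open / half-open cycle that this commit relies on can be illustrated with a small hand-rolled breaker. This is deliberately not opossum and not the PR's `TTSService` code: `SimpleBreaker`, `CircuitOpenError`, and `BreakerOptions` are hypothetical names, and the sketch trips on a plain consecutive-failure count rather than opossum's rolling-window statistics, with no per-call timeout. Only the 5-failure threshold, the 30 s reset interval, and the 503 fast-fail come from the description above.

```typescript
type BreakerState = "closed" | "open" | "halfOpen";

// Hypothetical options; defaults mirror the numbers quoted in the PR.
interface BreakerOptions {
  failureThreshold: number; // failures before the circuit opens
  resetTimeoutMs: number;   // time spent open before a trial call is allowed
  now: () => number;        // injectable clock, useful for tests
}

// Error thrown when a call is rejected without reaching the provider.
class CircuitOpenError extends Error {
  readonly statusCode = 503;
  constructor(provider: string) {
    super(`circuit open for provider "${provider}"`);
    this.name = "CircuitOpenError";
  }
}

class SimpleBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly provider: string;
  private readonly options: BreakerOptions;

  constructor(provider: string, options: Partial<BreakerOptions> = {}) {
    this.provider = provider;
    this.options = {
      failureThreshold: 5,
      resetTimeoutMs: 30000,
      now: () => Date.now(),
      ...options,
    };
  }

  // Report the current state, moving open -> halfOpen once the reset
  // interval has elapsed.
  getState(): BreakerState {
    const { resetTimeoutMs, now } = this.options;
    if (this.state === "open" && now() - this.openedAt >= resetTimeoutMs) {
      this.state = "halfOpen";
    }
    return this.state;
  }

  // A success closes the circuit and clears the failure count.
  recordSuccess(): void {
    this.state = "closed";
    this.failures = 0;
  }

  // A failure during the half-open trial reopens immediately; otherwise the
  // circuit opens once the failure count reaches the threshold.
  recordFailure(): void {
    const state = this.getState();
    this.failures += 1;
    if (state === "halfOpen" || this.failures >= this.options.failureThreshold) {
      this.state = "open";
      this.openedAt = this.options.now();
    }
  }

  // Run the action through the breaker; while open, reject without calling it.
  async fire<T>(action: () => Promise<T>): Promise<T> {
    if (this.getState() === "open") {
      throw new CircuitOpenError(this.provider);
    }
    try {
      const result = await action();
      this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }
}
```

A caller would wrap each provider request as `breaker.fire(() => callProvider(...))`, where `callProvider` stands in for the real request function, and map `CircuitOpenError` to an HTTP 503. Injecting `now` lets tests drive the half-open transition without waiting 30 seconds.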
@Valreb001 Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits. You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀
Summary
Closes #990
Closes #991
Closes #992
Closes #993
- MSRV: added rust-version = "1.75" to Cargo.toml, a CI workflow that builds against it, and docs in CONTRIBUTING.md
- TTS Dockerfile: replaced the single-stage Dockerfile with a multi-stage build (builder + slim runtime), non-root user, HEALTHCHECK on /health/live, pinned Node 20 in package.json, and added the TTS service to docker-compose.tracing.yml
- Real health checks: /health/ready now runs actual probes. It writes a test file to verify the output dir, calls ElevenLabs GET /v1/user to verify the API key, validates the Google credentials JSON, and checks job queue depth against a 500-job limit. It returns 503 with a structured body if anything fails
- Circuit breaker: wrapped both TTS providers with opossum circuit breakers (5 failures / 30 s window, 30 s half-open interval), fast-failing with 503 when open, exposing breaker state in /health/ready, with 5 passing tests