# Improve Logging and Observability (Django + Celery + Docker)

We currently rely on many `print()` statements across backend paths and Celery execution, which makes logs noisy, hard to correlate, and difficult to query in production.
## Examples

- `api/services/job_service.py` (around line 140+)
- `api/views/similarity_views.py` (line 14+)
- `api/services/progress_service.py` (line 51+)
- `api/tasks.py`, `api/prediction_engines/*`, and subprocess runner paths
## Goal

Implement a proper logging/observability system that is:

- Structured (JSON logs, not raw `print`)
- Correlated across request + job + task boundaries
- Production-friendly for our Docker deployment (`backend`, `celery`, `celery-beat`, `redis`, `frontend`)
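To make "structured" concrete, here is a minimal stdlib-only sketch of a JSON log formatter. The `JsonFormatter` class name and the exact field set are illustrative choices, not an existing API in the repo; a library such as `python-json-logger` could serve the same role.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Correlation fields are optional; include them only when a caller
        # attached them, e.g. logger.info(..., extra={"job_public_id": ...}).
        for field in ("request_id", "job_public_id", "celery_task_id"):
            value = getattr(record, field, None)
            if value is not None:
                payload[field] = value
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)
```

One JSON object per line keeps the output trivially parseable by Promtail/Loki or any other aggregator.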
## Proposed Scope

- Add centralized Django logging config (`LOGGING`) with consistent format and levels.
- Replace backend `print()` usage with logging calls in API/service/task code.
- Add correlation fields where possible:
  - `request_id`
  - `job_public_id`
  - `celery_task_id`
  - `method_key`, `target`
- Add Celery task lifecycle logging (`task_prerun`, `task_postrun`, `task_failure`).
- Keep SSE/session UX logs (`push_line`) working, but separate from infra logs.
- Add production log aggregation stack in Docker:
  - Loki + Promtail + Grafana (or equivalent)
- Document how to query logs (by service, job id, task id, error level).
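One possible shape for the centralized config, paired with a `contextvars`-based filter that injects the correlation fields above onto every record. The filter name and context-variable names are placeholders; in a real `settings.py`, the `"()"` factory would be a dotted-path string to wherever the filter lives.

```python
import contextvars
import logging
import logging.config

# Set at request/task entry; every log line emitted in that context
# then picks the IDs up automatically, with no per-call `extra=` needed.
request_id_var = contextvars.ContextVar("request_id", default=None)
job_public_id_var = contextvars.ContextVar("job_public_id", default=None)


class CorrelationFilter(logging.Filter):
    """Copy the current correlation context onto each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        record.job_public_id = job_public_id_var.get()
        return True  # never drop records, only annotate them


LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "filters": {"correlation": {"()": CorrelationFilter}},
    "formatters": {
        "plain": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "filters": ["correlation"],
            "formatter": "plain",
        },
    },
    "loggers": {
        "api": {"handlers": ["console"], "level": "INFO", "propagate": False},
    },
}
```

A middleware would set `request_id_var` per request, and the Celery task entrypoint would set `job_public_id_var`; `contextvars` keeps the values isolated across concurrent requests and tasks.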
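For the Celery lifecycle logging item, a sketch of handlers for the real `task_prerun`, `task_postrun`, and `task_failure` signals. The handler and helper names are placeholders, and the `celery` import is guarded so the module also loads in environments without Celery installed (e.g. plain unit tests):

```python
import logging

logger = logging.getLogger("api.celery")


def task_context(task_id, task_name):
    """Build the `extra` dict attached to every lifecycle log line."""
    return {"celery_task_id": task_id, "task_name": task_name}


def on_task_prerun(task_id=None, task=None, **kwargs):
    logger.info("task started",
                extra=task_context(task_id, getattr(task, "name", None)))


def on_task_postrun(task_id=None, task=None, state=None, **kwargs):
    logger.info("task finished state=%s", state,
                extra=task_context(task_id, getattr(task, "name", None)))


def on_task_failure(task_id=None, exception=None, sender=None, **kwargs):
    logger.error("task failed: %r", exception,
                 extra=task_context(task_id, getattr(sender, "name", None)))


try:
    # Wiring up requires Celery; guarded so importing this module
    # never fails when Celery is absent.
    from celery.signals import task_failure, task_postrun, task_prerun

    task_prerun.connect(on_task_prerun)
    task_postrun.connect(on_task_postrun)
    task_failure.connect(on_task_failure)
except ImportError:
    pass
```

With the JSON formatter and correlation filter in place, every task start/finish/failure then becomes a queryable structured event.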
## Nice-to-have

- Sentry integration for exception grouping/alerting.
- Dashboard for task failures + durations by method.
## Acceptance Criteria

- No new backend operational logging via bare `print()` in `api/` runtime paths.
- Logs are searchable by `job_public_id` and `celery_task_id`.
- Contributor docs include:
  - how to run the stack locally/in dev
  - how to inspect logs in prod
  - sample queries for common incidents
## Notes for Contributors

Deployment topology lives in:

- `docker-compose.prod.yml`
- `Dockerfile.web` (backend + celery-beat image)
- `Dockerfile` (full Celery worker image with model envs)

Task orchestration entrypoint: `api/services/job_service.py` → `run_multi_prediction.delay(...)` → `api/tasks.py`
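Since the job's public ID already crosses the `.delay(...)` boundary, one lightweight way to stamp it onto every log line inside the task body is a `LoggerAdapter`. The helper name below is hypothetical, and the commented `run_multi_prediction` signature is an assumption about the existing task, not its actual definition:

```python
import logging

logger = logging.getLogger("api.tasks")


def bind_job_logging(job_public_id, celery_task_id):
    """Return a LoggerAdapter that stamps both IDs on every line.

    A LoggerAdapter is a lightweight alternative to contextvars when the
    IDs are only needed inside one task body.
    """
    return logging.LoggerAdapter(
        logger,
        {"job_public_id": job_public_id, "celery_task_id": celery_task_id},
    )


# Inside api/tasks.py the task body could then look roughly like
# (decorator and signature assumed for illustration):
#
# @shared_task(bind=True)
# def run_multi_prediction(self, job_public_id, ...):
#     log = bind_job_logging(job_public_id, self.request.id)
#     log.info("prediction started")
```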