
Replace print()-based logging with structured production observability (Django + Celery + Docker) #5

@saleha1wer

Description


Improve Logging and Observability (Django + Celery + Docker)

We currently rely on many print() statements across backend paths and Celery execution, which makes logs noisy, hard to correlate, and difficult to query in production.

Examples

  • api/services/job_service.py (around line 140+)
  • api/views/similarity_views.py (line 14+)
  • api/services/progress_service.py (line 51+)
  • api/tasks.py, api/prediction_engines/*, and subprocess runner paths
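A typical migration in these files swaps each `print()` for a module-level logger with correlation fields attached via `extra=`. A minimal sketch (the `start_job` function and its arguments are illustrative, not code from the repo):

```python
import logging

# One logger per module, named after the module path
# (e.g. "api.services.job_service" when placed in that file).
logger = logging.getLogger(__name__)

def start_job(job_public_id: str) -> None:
    # Before: print(f"starting job {job_public_id}")
    # After: a structured call; correlation fields ride along as record attributes.
    logger.info("starting job", extra={"job_public_id": job_public_id})
```

Fields passed through `extra=` become attributes on the log record, so a JSON formatter can emit them as queryable keys.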

Goal

Implement a proper logging/observability system that is:

  • Structured (JSON logs, not raw print)
  • Correlated across request + job + task boundaries
  • Production-friendly for our Docker deployment (backend, celery, celery-beat, redis, frontend)

Proposed Scope

  • Add centralized Django logging config (LOGGING) with consistent format and levels.
  • Replace backend print() usage with logging calls in API/service/task code.
  • Add correlation fields where possible:
    • request_id
    • job_public_id
    • celery_task_id
    • method_key, target
  • Add Celery task lifecycle logging (task_prerun, task_postrun, task_failure).
  • Keep SSE/session UX logs (push_line) working, but separate from infra logs.
  • Add production log aggregation stack in Docker:
    • Loki + Promtail + Grafana (or equivalent)
  • Document how to query logs (by service, job id, task id, error level).
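The LOGGING config could take roughly this shape. A minimal sketch, assuming a hand-rolled JSON formatter for illustration (in practice you would likely use `python-json-logger` or similar); the correlation field names come from the scope list above:

```python
import json
import logging
import logging.config


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (stand-in for a real JSON formatter)."""

    CORRELATION_FIELDS = ("request_id", "job_public_id", "celery_task_id", "method_key", "target")

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Correlation fields arrive via `extra=` and may be absent on any given record.
        for field in self.CORRELATION_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


# In settings.py this dict would be assigned to the LOGGING setting;
# Django feeds it to logging.config.dictConfig at startup.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"json": {"()": JsonFormatter}},
    "handlers": {"console": {"class": "logging.StreamHandler", "formatter": "json"}},
    "root": {"handlers": ["console"], "level": "INFO"},
}

logging.config.dictConfig(LOGGING)
logger = logging.getLogger("api.services.job_service")
logger.info("job started", extra={"job_public_id": "abc123", "celery_task_id": "t-1"})
```

Logging to stdout/stderr keeps the Docker side simple: Promtail (or an equivalent agent) can scrape container output directly, and each JSON key becomes a queryable label or field.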
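For the task lifecycle logging, handlers like these could be connected to Celery's `task_prerun`, `task_postrun`, and `task_failure` signals. They are written as plain functions here so the sketch runs without Celery installed; in the app they would be registered via `celery.signals.task_prerun.connect(...)` etc.:

```python
import logging

logger = logging.getLogger("celery.lifecycle")

def log_task_prerun(sender=None, task_id=None, task=None, **kwargs):
    # Fires just before a worker starts executing a task.
    logger.info(
        "task started",
        extra={"celery_task_id": task_id, "task_name": getattr(task, "name", sender)},
    )

def log_task_postrun(sender=None, task_id=None, task=None, state=None, **kwargs):
    # Fires after the task returns, regardless of outcome.
    logger.info("task finished", extra={"celery_task_id": task_id, "state": state})

def log_task_failure(sender=None, task_id=None, exception=None, **kwargs):
    # Fires when the task raises; exc_info captures the traceback for grouping.
    logger.error("task failed", extra={"celery_task_id": task_id}, exc_info=exception)
```

With `celery_task_id` on every lifecycle record, a single task's start, finish, and failure lines can be joined in the aggregation stack.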

Nice-to-have

  • Sentry integration for exception grouping/alerting.
  • Dashboard for task failures + durations by method.

Acceptance Criteria

  • No new backend operational logging via bare print() in api/ runtime paths.
  • Logs are searchable by job_public_id and celery_task_id.
  • Contributor docs include:
    • how to run stack locally/dev
    • how to inspect logs in prod
    • sample queries for common incidents
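One way to enforce the first criterion in CI is a small AST check over files in `api/` (a sketch, not something that exists in the repo):

```python
import ast

def find_print_calls(source: str):
    """Return (line, col) of every bare print() call in the given Python source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "print"):
            hits.append((node.lineno, node.col_offset))
    return hits

# A CI step could run this over every *.py file under api/ and fail on any hit.
sample = "def run(job):\n    print('starting', job)\n    return job\n"
print_calls = find_print_calls(sample)  # one offender on line 2
```

A linter rule (e.g. flake8's `T201` from flake8-print, or an equivalent Ruff rule) would achieve the same without custom code.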

Notes for Contributors

Deployment topology lives in:

  • docker-compose.prod.yml
  • Dockerfile.web (backend + celery-beat image)
  • Dockerfile (full celery worker image with model envs)

Task orchestration entrypoint:

api/services/job_service.py → run_multi_prediction.delay(...) → api/tasks.py
