
PSv2: Show online / offline status for async processing services#1146

Open
mihow wants to merge 10 commits into main from
issue-1122-rename-last-checked

Conversation


@mihow mihow commented Feb 21, 2026

Summary

Closes #1122

  • Renames last_checked → last_seen, last_checked_live → last_seen_live, last_checked_latency → last_seen_latency on the ProcessingService model
  • Adds mark_seen() method for async/pull-mode services to record liveness when they register pipelines
  • Updates the pipeline registration endpoint (POST /api/v2/projects/{id}/pipelines/) to call mark_seen(live=True) after successful registration
  • Updates all backend references (serializers, pipeline queryset, tasks docstring)
  • Updates all frontend references (TypeScript models, table columns, dialog, language strings)
  • Includes a Django migration using RenameField (data-preserving; no data loss)

The naming better reflects the semantic difference between:

  • Sync/push services (with endpoint_url): Antenna actively checks the service via the periodic Celery Beat task, updating last_seen/last_seen_live from the health check response
  • Async/pull services (without endpoint_url): Workers report in by registering pipelines, and we record when we last heard from them via mark_seen()
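
Based on the description above, a minimal sketch of what the mark_seen() heartbeat could look like. This is illustrative only: the stand-in class and its save() are assumptions, and the real implementation lives on the Django model in ami/ml/models/processing_service.py.

```python
from datetime import datetime, timezone


class ProcessingService:
    """Minimal stand-in for the Django model (illustration only)."""

    def __init__(self):
        self.last_seen = None
        self.last_seen_live = None
        self.saved_fields = None

    def save(self, update_fields=None):
        # In Django this would issue an UPDATE limited to update_fields.
        self.saved_fields = update_fields

    def mark_seen(self, live: bool = True):
        """Record that we just heard from this pull-mode service."""
        self.last_seen = datetime.now(timezone.utc)
        self.last_seen_live = live
        self.save(update_fields=["last_seen", "last_seen_live"])
```

Passing update_fields keeps the write narrow, so frequent heartbeats do not clobber other columns written concurrently.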

Test plan

  • New unit tests for mark_seen() method (live=True and live=False)
  • New integration test verifying pipeline registration updates last_seen/last_seen_live
  • Existing TestProcessingServiceAPI tests pass with renamed fields
  • Existing TestPipeline and TestProjectPipelinesAPI tests pass
  • Verify UI columns show "Last seen" instead of "Last checked"
  • Verify processing service detail dialog shows "Last seen" label

Summary by CodeRabbit

  • New Features

    • Heartbeat tracking for asynchronous (pull-mode) processing services; pipeline registration now updates a service's "last seen" status.
  • Refactor

    • Renamed monitoring concept from "last checked" to "last seen" across the app.
    • Monitoring now distinguishes async (pull) vs sync (push) services and handles them separately.
  • UI

    • Labels, columns and status displays show "Last seen" instead of "Last checked".
  • Tests

    • Added tests for last-seen behavior and registration side-effects.


netlify bot commented Feb 21, 2026

Deploy Preview for antenna-ssec canceled.

Name Link
🔨 Latest commit a159ba3
🔍 Latest deploy log https://app.netlify.com/projects/antenna-ssec/deploys/69a2278e45c252000829b872


netlify bot commented Feb 21, 2026

Deploy Preview for antenna-preview canceled.

Name Link
🔨 Latest commit a159ba3
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/69a2278ec0775400088b87f3

coderabbitai bot commented Feb 21, 2026

📝 Walkthrough

Walkthrough

Renamed ProcessingService monitoring fields from last_checked* to last_seen*; added is_async and mark_seen() to record heartbeats for pull-mode services; updated models, serializers, views, tasks, migrations, tests, and UI to use last_seen* and emit heartbeats on pipeline registration and job endpoints.

Changes

Cohort / File(s) Summary
Database Schema & Migration
.agents/DATABASE_SCHEMA.md, ami/ml/migrations/0027_rename_last_checked_to_last_seen.py
Replace last_checked* fields with last_seen (timestamp), last_seen_live, and last_seen_latency; migration renames the three fields.
Backend Model Layer
ami/ml/models/processing_service.py
Add ProcessingServiceQuerySet/manager with async_services()/sync_services(); rename last_checked* → last_seen*; add is_async property and mark_seen(); update get_status() to use last_seen fields.
Pipeline Model
ami/ml/models/pipeline.py
Switch pipeline online/latency selection to use last_seen_live and last_seen_latency.
Views & Job Flow
ami/ml/views.py, ami/jobs/views.py
Call processing_service.mark_seen(live=True) after pipeline registration; add _mark_pipeline_pull_services_seen(job) and invoke it in job task/result endpoints to heartbeat pull-mode services.
Background Tasks
ami/ml/tasks.py
Rework check_processing_services_online() to separately handle async (pull) vs sync (push) services; add _BEAT_STATUS_TIMEOUT and _BEAT_TASK_EXPIRES; mark stale async services offline via last_seen.
Serializers
ami/ml/serializers.py
Expose last_seen, last_seen_live, and is_async in serializers; remove deprecated last_checked* fields.
Backend Tests
ami/ml/tests.py
Add tests for mark_seen() and pipeline registration updating last_seen; adjust existing status tests to expect server_live semantics and new fields.
Views (project pipelines)
ami/ml/views.py
Invoke mark_seen() when creating pipelines for a processing service during registration.
Frontend Models
ui/src/data-services/models/processing-service.ts, ui/src/data-services/models/pipeline.ts
Rename getters and logic from lastChecked/lastCheckedLive → lastSeen/lastSeenLive; add isAsync getter; adjust endpointUrl typing.
Frontend UI & i18n
ui/src/pages/.../*.tsx, ui/src/utils/language.ts
Update labels, column ids, sort fields, and display strings from "Last checked" → "Last seen" and bind UI to lastSeen fields.
Documentation
.agents/AGENTS.md
Documented the rename and new mark_seen() heartbeat behavior for async services.
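
The queryset helpers summarized above can be sketched as follows. This is a pure-Python stand-in using dicts, assuming the filter semantics described in this walkthrough ("endpoint_url null or empty" means pull-mode); the real class subclasses django.db.models.QuerySet.

```python
class ProcessingServiceQuerySet(list):
    """Dict-based stand-in; the real class subclasses django.db.models.QuerySet."""

    def async_services(self):
        # Canonical filter: no endpoint_url (null or empty) means async/pull-mode.
        return ProcessingServiceQuerySet(
            s for s in self if not s.get("endpoint_url")
        )

    def sync_services(self):
        return ProcessingServiceQuerySet(
            s for s in self if s.get("endpoint_url")
        )
```

Centralizing the null-or-empty check here avoids the ad-hoc Q expressions mentioned in the commit messages below.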

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant JobsView as "Jobs View"
  participant Pipeline as "Pipeline Service"
  participant PSModel as "ProcessingService"
  participant DB as "Database"

  Client->>JobsView: POST /jobs/{id}/tasks or /results
  JobsView->>Pipeline: validate job.pipeline_id
  JobsView->>PSModel: _mark_pipeline_pull_services_seen(job)
  PSModel->>PSModel: for each async service: mark_seen(live=True)
  PSModel->>DB: UPDATE processing_service SET last_seen=now(), last_seen_live=TRUE
  DB-->>PSModel: OK
  PSModel-->>JobsView: done
  JobsView-->>Client: continue tasks/results flow

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Poem

🐰 I hopped in quick to tap a beat,
Seen, not checked — a gentler heartbeat.
Pipelines register, workers reply,
Last-seen timestamps wink an eye.
Tiny rabbit cheers: "All systems nigh!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 76.67%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
  • Title check ✅ Passed: the title clearly and specifically describes the main change: renaming fields and adding functionality to track online/offline status for async processing services.
  • Description check ✅ Passed: the PR description provides a clear summary, lists key changes, references the related issue (#1122), explains the semantic rationale, and documents the test plan.
  • Linked Issues check ✅ Passed: the PR fully addresses issue #1122 by renaming last_checked* fields to last_seen*, adding mark_seen() for async services, updating all backend/frontend references, and preserving the polling mechanism for sync services.
  • Out of Scope Changes check ✅ Passed: all changes relate directly to renaming last_checked fields to last_seen, implementing the mark_seen() method for async services, and updating the corresponding references as required by issue #1122.




mihow and others added 2 commits February 26, 2026 17:01
…1122)

Rename fields to better reflect the semantic difference between sync and
async processing service status tracking:
- last_checked → last_seen
- last_checked_live → last_seen_live
- last_checked_latency → last_seen_latency

For sync services with endpoint URLs, fields are updated by the periodic
status checker. For async/pull-mode services, a new mark_seen() method
is called when the service registers pipelines, recording that we heard
from it.

Also updates all references in serializers, pipeline queryset, views,
frontend models, columns, dialog, and language strings.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow force-pushed the issue-1122-rename-last-checked branch from faf0a9e to af6e267 on February 27, 2026 01:01
Add structured queryset methods and a heartbeat mechanism so async
(pull-mode) processing services stay in sync with their actual liveness.

ProcessingService:
- New ProcessingServiceQuerySet with async_services() / sync_services()
  methods — single canonical filter for endpoint_url null-or-empty, used
  everywhere instead of ad-hoc Q expressions
- is_async property (derived from endpoint_url, no DB column)
- Docstrings reference Job.dispatch_mode ASYNC_API / SYNC_API for context

Liveness tracking:
- PROCESSING_SERVICE_LAST_SEEN_MAX = 60s constant (12× the worker's 5s
  poll interval) — async services are considered offline after this
- check_processing_services_online task now handles both modes:
  sync → active /readyz poll; async → bulk mark stale via async_services()
- _mark_pipeline_pull_services_seen() helper in jobs/views.py: single bulk
  UPDATE via job.pipeline.processing_services.async_services(), called at
  the top of both /jobs/{id}/tasks/ and /jobs/{id}/result/ so every worker
  poll cycle refreshes last_seen without needing a separate registration

Async job cleanup (from carlosg/redisatomic):
- Rename _cleanup_job_if_needed → cleanup_async_job_if_needed and export
  it so Job.cancel() can call it directly without a local import
- JobLogHandler: refresh_from_db before appending to avoid last-writer-
  wins race across concurrent worker processes
- Job.logger: update existing handler's job reference instead of always
  adding a new handler (process-level singleton leak fix)

Co-Authored-By: Claude <noreply@anthropic.com>
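
The _mark_pipeline_pull_services_seen() helper described above could take roughly this shape. The dict-based services and the return count are assumptions for a runnable sketch; the real helper in ami/jobs/views.py performs one bulk UPDATE via the pipeline's async_services() queryset.

```python
from datetime import datetime, timezone


def mark_pipeline_pull_services_seen(services):
    """Heartbeat every async (no endpoint_url) service; returns count updated.

    `services` is a list of dicts standing in for ProcessingService rows.
    """
    now = datetime.now(timezone.utc)
    updated = 0
    for svc in services:
        if not svc.get("endpoint_url"):  # async/pull-mode only
            svc["last_seen"] = now
            svc["last_seen_live"] = True
            updated += 1
    return updated
```

Calling this at the top of the task-fetch and result-submit endpoints means every worker poll cycle refreshes last_seen, with no separate registration step required.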
@mihow mihow force-pushed the issue-1122-rename-last-checked branch from 77dd024 to 2c029f8 on February 27, 2026 02:16
- PROCESSING_SERVICE_LAST_SEEN_MAX = 60s constant (12x the worker's 5s
  poll interval) used by check_processing_services_online to expire stale
  async service heartbeats
- get_status() pull-mode branch: derives server_live from staleness check,
  populates pipelines_online from registered pipelines, uses
  `not self.endpoint_url` to also handle empty-string endpoints
- endpointUrl getter: returns undefined instead of stringified "null" so
  async services show a blank cell in the endpoint column

Co-Authored-By: Claude <noreply@anthropic.com>
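
The staleness check described in the commits above can be sketched as below. The constant name and 60-second value come from this PR; the standalone function shape is an assumption (the real check runs inside the check_processing_services_online task as a bulk queryset update).

```python
from datetime import datetime, timedelta, timezone

# Constant from this PR: 12x the worker's 5s poll interval.
PROCESSING_SERVICE_LAST_SEEN_MAX = timedelta(seconds=60)


def is_stale(last_seen, now=None):
    """An async service with no heartbeat inside the window counts as offline."""
    now = now or datetime.now(timezone.utc)
    return last_seen is None or (now - last_seen) > PROCESSING_SERVICE_LAST_SEEN_MAX
```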
@mihow mihow marked this pull request as ready for review February 27, 2026 02:27
Copilot AI review requested due to automatic review settings February 27, 2026 02:27
@mihow mihow changed the title Rename ProcessingService last_checked fields to last_seen PSv2: Show online / offline status for async processing services Feb 27, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
ui/src/pages/project/processing-services/processing-services-columns.tsx (1)

54-59: Consider using the translation key for consistency.

The status details use a hardcoded 'Last seen ' string, while other parts of the codebase (e.g., processing-service-details-dialog.tsx at line 81) use translate(STRING.FIELD_LABEL_LAST_SEEN). For i18n consistency, consider using the translation function here as well.

Suggested change
     renderCell: (item: ProcessingService) => (
       <StatusTableCell
         color={item.status.color}
-        details={'Last seen ' + item.lastSeen}
+        details={translate(STRING.FIELD_LABEL_LAST_SEEN) + ' ' + item.lastSeen}
         label={item.status.label}
       />
     ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui/src/pages/project/processing-services/processing-services-columns.tsx`
around lines 54 - 59, Replace the hardcoded "Last seen " prefix in the
StatusTableCell details with the i18n translation key usage; locate the
StatusTableCell usage where details is built (details={'Last seen ' +
item.lastSeen}) and change it to use translate(STRING.FIELD_LABEL_LAST_SEEN)
concatenated or interpolated with item.lastSeen (e.g.,
translate(STRING.FIELD_LABEL_LAST_SEEN) + item.lastSeen) so it matches other
components like processing-service-details-dialog.tsx and uses the same
STRING.FIELD_LABEL_LAST_SEEN key.
ami/ml/models/pipeline.py (1)

1072-1092: Field renames look correct, but note pre-existing edge case.

The field references are correctly updated to last_seen_live and last_seen_latency. However, there's a pre-existing edge case: if all online processing services have last_seen_latency as None, processing_service_lowest_latency will never be assigned, causing an UnboundLocalError at line 1088.

This isn't introduced by this PR, but consider addressing it as a follow-up if async/pull-mode services may not have latency values.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/ml/models/pipeline.py` around lines 1072 - 1092, The loop can leave
processing_service_lowest_latency unassigned when every online
processing_service has last_seen_latency == None; after iterating
processing_services check if processing_service_lowest_latency is still None and
if so pick a fallback online service (e.g., the first processing_service where
last_seen_live is True) and set lowest_latency to None or a sentinel, then log
that latency is unknown for the chosen service using task_logger (include
pipeline_name and the selected processing_service) before returning it; update
references to processing_services, processing_service_lowest_latency,
last_seen_live, last_seen_latency, pipeline_name, and task_logger in this fix.
ami/ml/tasks.py (1)

121-123: Use logger.exception() to preserve traceback when service health checks fail.

At line 122, logger.error() logs only the exception message, losing the full stack trace. When debugging intermittent polling failures in production, the traceback is essential. Use logger.exception() instead, which is designed for exception handlers and automatically includes the full context.

♻️ Proposed refactor
        except Exception as e:
-            logger.error(f"Error checking service {service}: {e}")
+            logger.exception("Error checking service %s", service)
            continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/ml/tasks.py` around lines 121 - 123, Replace the logger.error call inside
the exception handler that catches service health-check failures so the full
traceback is preserved: in the except Exception as e: block where
logger.error(f"Error checking service {service}: {e}") is used (in
ami/ml/tasks.py, referencing logger and the service variable), call
logger.exception with a clear message (e.g., "Error checking service {service}")
so the stack trace is included, then continue as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/jobs/views.py`:
- Around line 49-52: The bulk update on
job.pipeline.processing_services.async_services().update(...) incorrectly
updates services across all projects; restrict the update to services linked to
the current job's project by adding an M2M filter on the ProcessingService
projects relation (e.g., replace async_services().update(...) with
async_services().filter(projects=job.project).update(...) or
.filter(projects__in=[job.project]).update(...) as appropriate), ensuring you
reference the existing job.pipeline and job.project objects so only services
associated with this job's project are marked last_seen/last_seen_live.

In `@ui/src/data-services/models/processing-service.ts`:
- Around line 53-54: The endpointUrl getter currently returns
this._processingService.endpoint_url using nullish coalescing, which leaves
empty string values intact; update the endpointUrl getter to normalize both null
and "" (and optionally whitespace-only strings) to undefined by reading const v
= this._processingService.endpoint_url and returning undefined when v is null or
v.trim() === "" (otherwise return v) so the UI treats empty endpoints as
async/pull-mode; modify the endpointUrl getter implementation accordingly.

---

Nitpick comments:
In `@ami/ml/models/pipeline.py`:
- Around line 1072-1092: The loop can leave processing_service_lowest_latency
unassigned when every online processing_service has last_seen_latency == None;
after iterating processing_services check if processing_service_lowest_latency
is still None and if so pick a fallback online service (e.g., the first
processing_service where last_seen_live is True) and set lowest_latency to None
or a sentinel, then log that latency is unknown for the chosen service using
task_logger (include pipeline_name and the selected processing_service) before
returning it; update references to processing_services,
processing_service_lowest_latency, last_seen_live, last_seen_latency,
pipeline_name, and task_logger in this fix.

In `@ami/ml/tasks.py`:
- Around line 121-123: Replace the logger.error call inside the exception
handler that catches service health-check failures so the full traceback is
preserved: in the except Exception as e: block where logger.error(f"Error
checking service {service}: {e}") is used (in ami/ml/tasks.py, referencing
logger and the service variable), call logger.exception with a clear message
(e.g., "Error checking service {service}") so the stack trace is included, then
continue as before.

In `@ui/src/pages/project/processing-services/processing-services-columns.tsx`:
- Around line 54-59: Replace the hardcoded "Last seen " prefix in the
StatusTableCell details with the i18n translation key usage; locate the
StatusTableCell usage where details is built (details={'Last seen ' +
item.lastSeen}) and change it to use translate(STRING.FIELD_LABEL_LAST_SEEN)
concatenated or interpolated with item.lastSeen (e.g.,
translate(STRING.FIELD_LABEL_LAST_SEEN) + item.lastSeen) so it matches other
components like processing-service-details-dialog.tsx and uses the same
STRING.FIELD_LABEL_LAST_SEEN key.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b01f01b and b5309ec.

📒 Files selected for processing (16)
  • .agents/AGENTS.md
  • .agents/DATABASE_SCHEMA.md
  • ami/jobs/views.py
  • ami/ml/migrations/0027_rename_last_checked_to_last_seen.py
  • ami/ml/models/pipeline.py
  • ami/ml/models/processing_service.py
  • ami/ml/serializers.py
  • ami/ml/tasks.py
  • ami/ml/tests.py
  • ami/ml/views.py
  • ui/src/data-services/models/pipeline.ts
  • ui/src/data-services/models/processing-service.ts
  • ui/src/pages/processing-service-details/processing-service-details-dialog.tsx
  • ui/src/pages/project/pipelines/pipelines-columns.tsx
  • ui/src/pages/project/processing-services/processing-services-columns.tsx
  • ui/src/utils/language.ts


Copilot AI left a comment


Pull request overview

Renames ProcessingService.last_checked* fields to last_seen* across backend and UI, and adds heartbeat-style liveness tracking for pull-mode (async/no-endpoint) processing services.

Changes:

  • Backend: rename model fields + add mark_seen() and async/sync query helpers; update polling task + endpoints to update last_seen.
  • Frontend: rename model accessors/columns/labels from “last checked” to “last seen”.
  • Data migration: add Django RenameField migration to preserve existing data.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
ami/ml/models/processing_service.py Renames fields, adds mark_seen(), and changes get_status() semantics for pull-mode services.
ami/ml/tasks.py Updates periodic liveness checker to poll sync services + mark stale async services offline.
ami/ml/views.py Marks service seen after successful async pipeline registration.
ami/jobs/views.py Adds heartbeat updates on async worker task-fetch/result-submit endpoints.
ami/ml/serializers.py Renames serialized fields to last_seen/last_seen_live.
ami/ml/models/pipeline.py Updates queryset filters and comments to use last_seen*.
ami/ml/migrations/0027_rename_last_checked_to_last_seen.py Data-preserving field renames via RenameField.
ami/ml/tests.py Adds unit/integration tests for mark_seen() and pipeline registration.
ui/src/data-services/models/processing-service.ts Renames getters to lastSeen* and makes endpointUrl nullable-safe.
ui/src/data-services/models/pipeline.ts Renames computed fields to processingServicesOnlineLastSeen.
ui/src/pages/project/processing-services/processing-services-columns.tsx Updates status details display to use lastSeen.
ui/src/pages/project/pipelines/pipelines-columns.tsx Renames “last checked” column to “last seen”.
ui/src/pages/processing-service-details/processing-service-details-dialog.tsx Updates label/value bindings to lastSeen.
ui/src/utils/language.ts Adds FIELD_LABEL_LAST_SEEN string and English translation.
.agents/DATABASE_SCHEMA.md / .agents/AGENTS.md Updates documentation references to last_seen_live and pull-mode heartbeat.


- Fix ImportError: import PROCESSING_SERVICE_LAST_SEEN_MAX directly from
  ami.ml.models.processing_service (not re-exported from ami.ml.models)
- Fix null last_seen causing epoch timestamp in processingServicesOnlineLastSeen
  getter — filter out null values before Math.max
- Fix "Last seen undefined" rendered in status column when lastSeen is undefined

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow
Copy link
Collaborator Author

mihow commented Feb 27, 2026

Code review

Found 2 new issues; fixed 3 in total (including the critical one). Responding to existing CodeRabbit/Copilot comments inline below.


Fixed in commit 671da1a:

  1. ImportError on PROCESSING_SERVICE_LAST_SEEN_MAX: ami/ml/tasks.py:112 imports from ami.ml.models, but the constant is defined in the submodule and not re-exported from __init__.py. This would crash the check_processing_services_online Celery beat task on every run, meaning async services would never be marked offline. Fixed by importing from ami.ml.models.processing_service directly.

    antenna/ami/ml/tasks.py

    Lines 110 to 113 in b5309ec

    import datetime
    from ami.ml.models import PROCESSING_SERVICE_LAST_SEEN_MAX, ProcessingService

  2. Null last_seen producing epoch timestamp: processingServicesOnlineLastSeen in pipeline.ts pushed new Date(null).getTime() (= 0, i.e. Jan 1 1970) into the last_seen_times array without filtering, causing Math.max(...) to silently return an ancient timestamp for any pipeline with an async service that had never checked in. Fixed by filtering out null values; returns undefined when no valid timestamps exist.

    const last_seen_times = []
    for (const processingService of processingServices) {
      last_seen_times.push(new Date(processingService.last_seen).getTime())
    }
    return getFormatedDateTimeString({
      date: new Date(Math.max(...last_seen_times)),
    })
    }

  3. "Last seen undefined" in status column — (Copilot comment 2862206485) Valid. Fixed with the suggested guard.


Needs your input:

  1. pipelines_online returns all pipelines even when service is offline — In the async branch of get_status(), pipelines_online=pipeline_names is always populated regardless of is_live. When the heartbeat expires and is_live=False, the response still reports all pipelines as online. Sync services only populate pipelines_online when actually live. Suggest pipelines_online=pipeline_names if is_live else [] for consistency.

    self.save(update_fields=["last_seen_live"])
    pipeline_names = list(self.pipelines.values_list("name", flat=True))
    return ProcessingServiceStatusResponse(
        timestamp=self.last_seen or datetime.datetime.now(),
        request_successful=is_live,
        server_live=is_live,
        pipelines_online=pipeline_names,
        pipeline_configs=[],
        endpoint_url=None,
        latency=0.0,
    )

  2. Heartbeat marks all async services on a pipeline (Copilot/CodeRabbit comment 2862206476) — _mark_pipeline_pull_services_seen() does a bulk update on all async services linked to the pipeline, not just the caller. The docstring acknowledges this ("A future application-token scheme…"). This seems acceptable for now given the single-service-per-pipeline deployment model, but worth tracking.


Responding to other comments:

  • CodeRabbit: i18n 'Last seen ' string — Valid nitpick. translate(STRING.FIELD_LABEL_LAST_SEEN) would be consistent with the details dialog. Worth a follow-up.

  • Copilot: test makes real network call (2862206503) — get_status(timeout=1) will still hit retry logic in create_session. Worth mocking, but not blocking.

  • Copilot: server_live contract change (2862206515) — The prior behavior returned server_live=None with an error string for no-endpoint services. Now it returns server_live=False with no error. If any client checks server_live is None to detect "no endpoint configured", that would break. The UI only checks truthiness so it's fine there, but worth noting for external API consumers.

🤖 Generated with Claude Code

If this code review was useful, please react with 👍. Otherwise, react with 👎.

mihow and others added 4 commits February 26, 2026 19:25
Move the async service stale-check to the top of check_processing_services_online
so it always runs, even if a slow sync service check hits the time limit.

Reduce the per-request timeout for the beat task from 90s (designed for cold-start
waits) to 8s — if a sync service is starting up it will recover on the next cycle.
Raise soft_time_limit/time_limit accordingly to give the sync loop room to complete
(worst case ~30s per service with retries).

Co-Authored-By: Claude <noreply@anthropic.com>
Async services now derive liveness from heartbeats rather than returning
an error message. Update assertions: server_live=False (not None) when
no heartbeat has been received, and remove error message checks.

Co-Authored-By: Claude <noreply@anthropic.com>
Filter async services by project when marking them as seen, preventing
cross-project contamination when a pipeline is shared across projects.
Clarify in the docstring that this still marks all async services on the
pipeline within the project, not the individual caller, until
application-token auth (PR #1117) is available.

Co-Authored-By: Claude <noreply@anthropic.com>
Add is_async to ProcessingServiceSerializer and
ProcessingServiceNestedSerializer so the frontend can distinguish
pull-mode from push-mode services. Also normalize empty endpoint_url
strings to undefined in the FE model for consistency with the backend.

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
ui/src/data-services/models/processing-service.ts (1)

86-88: Consider guarding against null last_seen_live.

The backend model defines last_seen_live = models.BooleanField(null=True), meaning it can be null. The return type boolean doesn't reflect this possibility. While the current status getter handles falsy values gracefully (treating null as OFFLINE), direct consumers of lastSeenLive may expect a definite boolean.

♻️ Optional: coerce to boolean
   get lastSeenLive(): boolean {
-    return this._processingService.last_seen_live
+    return this._processingService.last_seen_live ?? false
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui/src/data-services/models/processing-service.ts` around lines 86 - 88, The
getter lastSeenLive currently declares a boolean return but directly returns
_processingService.last_seen_live which can be null; change the getter signature
to return boolean | null and keep returning
this._processingService.last_seen_live (or, if you prefer to coerce, return
!!this._processingService.last_seen_live and keep boolean) so consumers and the
type system reflect the backend's nullable models.BooleanField; update any
callers if you choose the nullable variant.
ami/ml/tests.py (1)

192-204: Consider mocking the network call for deterministic tests.

This test makes a real network request to a nonexistent host, which depends on DNS resolution and socket timeouts. While it works, mocking requests or the underlying HTTP call would make this test faster and more reliable in CI environments with restricted network access.

Example mock approach
from unittest.mock import patch

def test_get_status_updates_last_seen_for_sync_service(self):
    """Test that get_status() updates last_seen fields for sync services."""
    service = ProcessingService.objects.create(
        name="Sync Service", 
        endpoint_url="http://nonexistent-host:9999"
    )
    service.projects.add(self.project)

    with patch('requests.Session.get') as mock_get:
        mock_get.side_effect = Exception("Connection refused")
        service.get_status(timeout=1)
    
    service.refresh_from_db()
    # assertions...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/ml/tests.py` around lines 192 - 204, Replace the real network call in
test_get_status_updates_last_seen_for_sync_service with a deterministic mock:
patch requests.Session.get (or the exact HTTP call used by
ProcessingService.get_status) to raise an exception or return a controlled
response, then call service.get_status(timeout=1), refresh_from_db and assert
last_seen, last_seen_live and last_seen_latency; this ensures the test for
ProcessingService.get_status runs fast and reliably without performing
DNS/socket I/O.
ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5309ec and cb368ac.

📒 Files selected for processing (7)
  • ami/jobs/views.py
  • ami/ml/serializers.py
  • ami/ml/tasks.py
  • ami/ml/tests.py
  • ui/src/data-services/models/pipeline.ts
  • ui/src/data-services/models/processing-service.ts
  • ui/src/pages/project/processing-services/processing-services-columns.tsx
🚧 Files skipped from review as they are similar to previous changes (3)
  • ui/src/pages/project/processing-services/processing-services-columns.tsx
  • ami/jobs/views.py
  • ui/src/data-services/models/pipeline.ts

@mihow
Copy link
Collaborator Author

mihow commented Feb 27, 2026

🧹 Nitpick comments (2)

ui/src/data-services/models/processing-service.ts (1)

> 86-88: Consider guarding against null last_seen_live.

The backend model defines last_seen_live = models.BooleanField(null=True), meaning it can be null. The return type boolean doesn't reflect this possibility. While the current status getter handles falsy values gracefully (treating null as OFFLINE), direct consumers of lastSeenLive may expect a definite boolean.

♻️ Optional: coerce to boolean

   get lastSeenLive(): boolean {
-    return this._processingService.last_seen_live
+    return this._processingService.last_seen_live ?? false
   }

ami/ml/tests.py (1)

> 192-204: Consider mocking the network call for deterministic tests.

This test makes a real network request to a nonexistent host, which depends on DNS resolution and socket timeouts. While it works, mocking requests or the underlying HTTP call would make this test faster and more reliable in CI environments with restricted network access.

Example mock approach

from unittest.mock import patch

def test_get_status_updates_last_seen_for_sync_service(self):
    """Test that get_status() updates last_seen fields for sync services."""
    service = ProcessingService.objects.create(
        name="Sync Service", 
        endpoint_url="http://nonexistent-host:9999"
    )
    service.projects.add(self.project)

    with patch('requests.Session.get') as mock_get:
        mock_get.side_effect = Exception("Connection refused")
        service.get_status(timeout=1)
    
    service.refresh_from_db()
    # assertions...

I agree that a boolean with a default of Null is confusing. I think I chose that to represent "not checked" or "unknown". So the frontend could show blank instead of Offline. But I think False would also be a fine simplification.

…le copies

- Run async stale-check before the sync loop so it always executes
  regardless of how long sync checks take
- Reduce per-request timeout for the beat task from 90s (designed for
  cold-start waits) to 8s — a slow or unreachable service just waits
  for the next 5-minute cycle
- Add expires=240s so copies queued during a worker outage are discarded
  when the worker returns; only the most recent firing runs

Co-Authored-By: Claude <noreply@anthropic.com>
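The scheduling behaviour described in this commit can be sketched as a Celery beat entry. This is a hypothetical sketch: the entry name and task path are assumptions for illustration, not the project's actual configuration.

```python
# Hypothetical beat entry for the periodic status-check task described above.
# The task path and entry name are assumed; only the timing values come from
# the commit message (5-minute cycle, expires=240s).
CELERY_BEAT_SCHEDULE = {
    "check-processing-services-status": {
        "task": "ami.ml.tasks.check_processing_services_status",  # assumed path
        "schedule": 300.0,  # fire every 5 minutes
        # Discard queued copies older than 4 minutes: after a worker outage,
        # only the most recent firing runs when the worker returns.
        "options": {"expires": 240},
    },
}
```

Keeping `expires` below the schedule interval is what guarantees at most one pending copy survives an outage.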
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
ami/ml/tasks.py (1)

98-101: Minor: Comment arithmetic may be slightly off.

The comment states worst case is "8 + 2 + 8 + 4 + 8 = 30s" but with retries=3 and backoff_factor=2, there are 4 total attempts (initial + 3 retries). The actual sum shown equals 30s, but if there are 4 attempts at 8s each plus backoff delays (0s, 2s, 4s), worst case would be closer to 38s. This doesn't affect functionality—just a documentation nit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/ml/tasks.py` around lines 98 - 101, The inline comment for
_BEAT_STATUS_TIMEOUT miscalculates the worst-case total; update the comment to
state there are 4 attempts (initial + 3 retries) so the total is 4 * 8s = 32s
plus backoff delays (0s + 2s + 4s = 6s) => ~38s; edit the comment above
_BEAT_STATUS_TIMEOUT to reflect this corrected arithmetic and briefly note the
calculation (attempts * timeout + backoff delays).
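The corrected worst-case arithmetic can be checked directly. The constants mirror the review's reading of the retry configuration (8s per-request timeout, 3 retries, backoff_factor=2); the 0s/2s/4s inter-attempt delays are the review comment's assumption about the backoff schedule, not values verified against urllib3.

```python
TIMEOUT = 8                  # per-attempt timeout, seconds
RETRIES = 3                  # retries after the initial attempt
ATTEMPTS = 1 + RETRIES       # 4 attempts total
BACKOFF_DELAYS = [0, 2, 4]   # sleeps between attempts, per the review comment

worst_case = ATTEMPTS * TIMEOUT + sum(BACKOFF_DELAYS)
print(worst_case)  # 38 seconds, not the 30s stated in the original comment
```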

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb368ac and a159ba3.

📒 Files selected for processing (1)
  • ami/ml/tasks.py


Development

Successfully merging this pull request may close these issues.

Update how the status of processing services are checked & reported

2 participants