Skip to content

feat: add monitoring to cronjobs#1835

Open
MarceloRobert wants to merge 1 commit intokernelci:mainfrom
MarceloRobert:feat/email-monitoring
Open

feat: add monitoring to cronjobs#1835
MarceloRobert wants to merge 1 commit intokernelci:mainfrom
MarceloRobert:feat/email-monitoring

Conversation

@MarceloRobert
Copy link
Copy Markdown
Collaborator

@MarceloRobert MarceloRobert commented Apr 1, 2026

Changes

  • Added healthcheck_id parameter to relevant commands used in the cronjobs
  • Added map from healthcheck ids to healthcheck.io uuid, the ids reflect the defined cronjobs and are set via environment_variable
  • Added wrapper to run functions with healthcheck.io pings for start, success and failure
  • Applied healthcheck changes to main cronjob execution

How to test

For this PR, the ideal way to test would be with a healthcheck account and uuid so you can monitor the pings. Follow the instructions there to create an account and a check.

  • In the code, set one of the cronjobs to notifications --action=fake-report and set a small interval, such as */2 * * * * (run every 2 minutes). That is:
(
    "*/2 * * * *",
    "django.core.management.call_command",
    [
        "notifications",
        "--monitoring-id=notifications_new_issues",
        "--action=fake_report",
    ],
),
  • You won't need any other cronjob for testing, but if you want to make more than one check you're welcome
  • Check that the cronjobs are not going to send email reports accidentally
  • Get your healthcheck id and use it in the cronjobs
  • Start the cronjobs either via docker compose up backend --build or locally with poetry run ./manage.py crontab add (docker is the recommended way)
  • In the healthcheck.io dashboard, see the details of your healthcheck and monitor the received pings

The cronjobs should even if the ping fails or the monitoring_id is empty

Required production changes

  • Create 6 different checks for the 6 monitored cronjobs
  • Set up on Discord a desired webhook (could reuse the already-present ones)
  • Set up in the checks the integration to webhook, POSTing to the discord webhook when a check goes down. Example of body:
{
  "avatar_url": "https://avatars.githubusercontent.com/u/11725450?s=200&v=4",
  "username": "KernelCI Healthcheck Notifications",
  "content": "Healthcheck.io Alert: $NAME service went down!"
}

Also set the request headers to Content-Type: application/json

  • Disable the default email notification integration in the checks

Closes #1792

@MarceloRobert MarceloRobert self-assigned this Apr 1, 2026
@MarceloRobert MarceloRobert added enhancement New feature or request notifications Backend Most or all of the changes for this issue will be in the backend code. labels Apr 1, 2026
@bhcopeland
Copy link
Copy Markdown
Member

bhcopeland commented Apr 1, 2026

Awesome to see healthchecks.io being used. Have you setup an account? We need to make sure one global KCI account is used with me/denys (@nuclearcat) (and you guys on it). Lets talk about this in tomorrows meeting

@MarceloRobert
Copy link
Copy Markdown
Collaborator Author

Awesome to see healthchecks.io being used. Have you setup an account? We need to make sure one global KCI account is used with me/denys (@ nuclearcat) (and you guys on it). Lets talk about this in tomorrows meeting

Yeah, I have setup a personal account so I can test this later. Since this change won't stop the cronjobs that don't have monitoring we can go checking the accounts without blocking the PR.

if monitoring_path.startswith("http://") or monitoring_path.startswith("https://"):
return monitoring_path

return f"{HEALTHCHECK_BASE_URL.rstrip('/')}/{monitoring_path.lstrip('/')}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can use urllib.parse.urljoin

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

urljoin actually was a bit worse because if you join url/middle with end, it will result in url/end instead of url/middle/end. Formatting the strings seems to be safe for this operation

if monitoring_path.startswith("http://") or monitoring_path.startswith("https://"):
return monitoring_path

return f"{HEALTHCHECK_BASE_URL.rstrip('/')}/{monitoring_path.lstrip('/')}"

This comment was marked as duplicate.

@MarceloRobert MarceloRobert force-pushed the feat/email-monitoring branch 2 times, most recently from ed28904 to d912983 Compare April 7, 2026 18:01
@MarceloRobert MarceloRobert marked this pull request as ready for review April 7, 2026 18:56
@MarceloRobert MarceloRobert force-pushed the feat/email-monitoring branch from d912983 to dcbc86a Compare April 7, 2026 19:00
@MarceloRobert
Copy link
Copy Markdown
Collaborator Author

@bhcopeland all set up and tested, this is now ready for review. If you want to move forward with the healthcheck.io integration, I left some instructions on what to do.

cc @nuclearcat

@nuclearcat
Copy link
Copy Markdown
Member

I will proceed tomorrow with that, thank you very much!

Copy link
Copy Markdown
Contributor

@gustavobtflores gustavobtflores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just update the docs to reflect the variables set on .env.example

Comment on lines +110 to +115
- `HEALTHCHECK_MONITORING_NOTIFICATIONS_NEW_ISSUES_HOURLY`
- `HEALTHCHECK_MONITORING_NOTIFICATIONS_SUMMARY_MICROSOFT_DAILY`
- `HEALTHCHECK_MONITORING_NOTIFICATIONS_SUMMARY_MAESTRO_DAILY`
- `HEALTHCHECK_MONITORING_NOTIFICATIONS_HARDWARE_SUMMARY_WEEKLY`
- `HEALTHCHECK_MONITORING_DELETE_UNUSED_HARDWARE_STATUS_WEEKLY`
- `HEALTHCHECK_MONITORING_NOTIFICATIONS_METRICS_SUMMARY_WEEKLY`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these seem different from the ones in .env.example

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shortened the names in settings but forgot about them here, fixed.

@MarceloRobert MarceloRobert force-pushed the feat/email-monitoring branch from dcbc86a to 1007550 Compare April 9, 2026 17:13
return

# Success just needs to ping base healthcheck.io url + uuid, no subpath
monitoring_status_url = (
Copy link
Copy Markdown
Contributor

@dede999 dede999 Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The success path special-cases status != "success" to avoid appending a
subpath, making the URL-building logic fragile. healthcheck.io also accepts
/{uuid}/0 for success, but more importantly, this branching makes the code
harder to follow and test.

Consider extracting a helper:

    def _build_status_url(base: str, status: str) -> str:
        if status == "success":
            return base
        return f"{base.rstrip('/')}/{status}"

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included the status into the _resolve_monitoring_url function


import requests

from kernelCI_app.helpers.logger import out
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: out() is used for both info and error messages. The rest of the
backend uses logging.getLogger(__name__) with appropriate levels. Consider:

logger = logging.getLogger(__name__)
# then: logger.info(...), logger.warning(...), logger.error(...)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was having problems with a simple logger before, but I replaced out for log_message and it seems to be ok now

"Monitoring ID configured in settings for healthcheck.io pings "
"(optional, used only for monitoring the command execution over time)"
)
HEALTHCHECK_BASE_URL = "https://hc-ping.com"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: HEALTHCHECK_BASE_URL is a module-level constant but is patched directly
in tests via @patch(... "HEALTHCHECK_BASE_URL" ...). This couples tests to
internals and makes it harder to change the constant's location later.

Consider moving it to settings.py alongside HEALTHCHECK_MONITORING_PATH_MAP,
which would make _resolve_monitoring_url purely functional and easier to test

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@MarceloRobert MarceloRobert force-pushed the feat/email-monitoring branch from 1007550 to 0a63c20 Compare April 10, 2026 17:35
@MarceloRobert MarceloRobert force-pushed the feat/email-monitoring branch from 0a63c20 to 574d87b Compare April 10, 2026 17:39
Copy link
Copy Markdown
Contributor

@alanpeixinho alanpeixinho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got the pings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Backend Most or all of the changes for this issue will be in the backend code. enhancement New feature or request notifications

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add healthcheck to cronjobs

6 participants