feat: add monitoring to cronjobs#1835
Conversation
|
Awesome to see healthchecks.io being used. Have you setup an account? We need to make sure one global KCI account is used with me/denys (@nuclearcat) (and you guys on it). Lets talk about this in tomorrows meeting |
Yeah, I have setup a personal account so I can test this later. Since this change won't stop the cronjobs that don't have monitoring we can go checking the accounts without blocking the PR. |
backend/kernelCI_app/management/commands/helpers/healthcheck.py
Outdated
Show resolved
Hide resolved
backend/kernelCI_app/management/commands/helpers/healthcheck.py
Outdated
Show resolved
Hide resolved
| if monitoring_path.startswith("http://") or monitoring_path.startswith("https://"): | ||
| return monitoring_path | ||
|
|
||
| return f"{HEALTHCHECK_BASE_URL.rstrip('/')}/{monitoring_path.lstrip('/')}" |
There was a problem hiding this comment.
nit: we can use urllib.parse.urljoin
There was a problem hiding this comment.
urljoin actually was a bit worse because if you join url/middle with end, it will result in url/end instead of url/middle/end. Formatting the strings seems to be safe for this operation
backend/kernelCI_app/management/commands/helpers/healthcheck.py
Outdated
Show resolved
Hide resolved
backend/kernelCI_app/management/commands/helpers/healthcheck.py
Outdated
Show resolved
Hide resolved
| if monitoring_path.startswith("http://") or monitoring_path.startswith("https://"): | ||
| return monitoring_path | ||
|
|
||
| return f"{HEALTHCHECK_BASE_URL.rstrip('/')}/{monitoring_path.lstrip('/')}" |
This comment was marked as duplicate.
This comment was marked as duplicate.
Sorry, something went wrong.
ed28904 to
d912983
Compare
d912983 to
dcbc86a
Compare
|
@bhcopeland all set up and tested, this is now ready for review. If you want to move forward with the healthcheck.io integration, I left some instructions on what to do. cc @nuclearcat |
|
I will proceed tomorrow with that, thank you very much! |
gustavobtflores
left a comment
There was a problem hiding this comment.
LGTM, just update the docs to reflect the variables set on .env.example
docs/monitoring.md
Outdated
| - `HEALTHCHECK_MONITORING_NOTIFICATIONS_NEW_ISSUES_HOURLY` | ||
| - `HEALTHCHECK_MONITORING_NOTIFICATIONS_SUMMARY_MICROSOFT_DAILY` | ||
| - `HEALTHCHECK_MONITORING_NOTIFICATIONS_SUMMARY_MAESTRO_DAILY` | ||
| - `HEALTHCHECK_MONITORING_NOTIFICATIONS_HARDWARE_SUMMARY_WEEKLY` | ||
| - `HEALTHCHECK_MONITORING_DELETE_UNUSED_HARDWARE_STATUS_WEEKLY` | ||
| - `HEALTHCHECK_MONITORING_NOTIFICATIONS_METRICS_SUMMARY_WEEKLY` |
There was a problem hiding this comment.
these seem different from the ones in .env.example
There was a problem hiding this comment.
I shortened the names in settings but forgot about them here, fixed.
dcbc86a to
1007550
Compare
| return | ||
|
|
||
| # Success just needs to ping base healthcheck.io url + uuid, no subpath | ||
| monitoring_status_url = ( |
There was a problem hiding this comment.
The success path special-cases status != "success" to avoid appending a
subpath, making the URL-building logic fragile. healthcheck.io also accepts
/{uuid}/0 for success, but more importantly, this branching makes the code
harder to follow and test.
Consider extracting a helper:
def _build_status_url(base: str, status: str) -> str:
if status == "success":
return base
return f"{base.rstrip('/')}/{status}"There was a problem hiding this comment.
I included the status into the _resolve_monitoring_url function
|
|
||
| import requests | ||
|
|
||
| from kernelCI_app.helpers.logger import out |
There was a problem hiding this comment.
nit: out() is used for both info and error messages. The rest of the
backend uses logging.getLogger(__name__) with appropriate levels. Consider:
logger = logging.getLogger(__name__)
# then: logger.info(...), logger.warning(...), logger.error(...)
There was a problem hiding this comment.
I think I was having problems with a simple logger before, but I replaced out for log_message and it seems to be ok now
| "Monitoring ID configured in settings for healthcheck.io pings " | ||
| "(optional, used only for monitoring the command execution over time)" | ||
| ) | ||
| HEALTHCHECK_BASE_URL = "https://hc-ping.com" |
There was a problem hiding this comment.
nit: HEALTHCHECK_BASE_URL is a module-level constant but is patched directly
in tests via @patch(... "HEALTHCHECK_BASE_URL" ...). This couples tests to
internals and makes it harder to change the constant's location later.
Consider moving it to settings.py alongside HEALTHCHECK_MONITORING_PATH_MAP,
which would make _resolve_monitoring_url purely functional and easier to test
1007550 to
0a63c20
Compare
0a63c20 to
574d87b
Compare
Changes
How to test
For this PR, the ideal way to test would be with a healthcheck account and uuid so you can monitor the pings. Follow the instructions there to create an account and a check.
notifications --action=fake-reportand set a small interval, such as*/2 * * * *(run every 2 minutes). That is:docker compose up backend --buildor locally withpoetry run ./manage.py crontab add(docker is the recommended way)The cronjobs should even if the ping fails or the monitoring_id is empty
Required production changes
Also set the request headers to
Content-Type: application/jsonCloses #1792