Background Context
I'm working in a hosting environment that at a high level works like this:
- The load balancer monitors the /health endpoint, exposed via the admin port, to identify whether hosts are healthy and routable.
- If the health check fails for an instance some number of times in a row, the instance is marked as unhealthy and traffic is no longer routed to it.
- During deployments or instance replacements, a SIGTERM is sent to the service to trigger a shutdown.
Observed behavior
- When the service receives a SIGTERM, it appears that a few things happen:
Pain points
- This is problematic because there is a period of time where:
  - The admin server has shut down and won't report healthy.
  - The application server has shut down and won't accept any requests.
  - The load balancer is still routing traffic to this instance, because more than one consecutive failed health check is required before the instance is taken out of rotation. These requests fail, and it is not always safe to retry them because they may not be idempotent.
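To make the size of that window concrete, here is a rough back-of-the-envelope sketch. The function name and the example numbers (a 5 second check interval, 3 consecutive failures) are assumptions for illustration only, not values from my actual load balancer configuration.

```python
def worst_case_routing_window(check_interval_s, failures_to_unhealthy, check_timeout_s=0.0):
    """Estimate how long the load balancer may keep routing to an instance
    after its health endpoint starts failing.

    Assumes the worst case: the shutdown begins right after a successful
    check, so the load balancer must wait for `failures_to_unhealthy` more
    checks, and the last one may take up to `check_timeout_s` to fail.
    """
    return failures_to_unhealthy * check_interval_s + check_timeout_s


# Hypothetical settings: checks every 5 seconds, 3 consecutive failures required.
window = worst_case_routing_window(check_interval_s=5, failures_to_unhealthy=3)
print(f"Requests may still arrive for up to ~{window} seconds after health checks start failing")
```

Any request that arrives inside this window hits an application server that has already stopped.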
Desired behavior
- Is there a way to have fine-grained control over the shutdown sequence? Ideally, I would:
  - Shut down the admin server (or tell the admin server to report a non-200 status code) and keep the application server up for a configurable amount of time (say, 15 seconds).
  - After that period, shut down the application server and then gracefully terminate the program.
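The sequence I have in mind could be sketched roughly as follows. This is a framework-agnostic illustration, not the API of any particular server: `GracefulShutdown`, `health_status`, `stop_app`, and `drain_seconds` are hypothetical names, and the 503 status is just one possible non-200 response.

```python
import signal
import threading
import time


class GracefulShutdown:
    """Two-phase shutdown: fail health checks first, stop serving later."""

    def __init__(self, drain_seconds=15.0, stop_app=lambda: None, sleep=time.sleep):
        self.drain_seconds = drain_seconds
        self._stop_app = stop_app      # hypothetical hook that stops the application server
        self._sleep = sleep            # injectable so the delay can be faked in tests
        self._draining = threading.Event()

    def health_status(self):
        """Status code the /health handler should return."""
        return 503 if self._draining.is_set() else 200

    def shutdown(self):
        # Phase 1: start reporting unhealthy while the application server keeps serving.
        self._draining.set()
        # Phase 2: give the load balancer time to take the instance out of rotation.
        self._sleep(self.drain_seconds)
        # Phase 3: stop the application server; the process can then exit.
        self._stop_app()

    def install(self):
        """Register a SIGTERM handler (must be called from the main thread)."""
        def handler(signum, frame):
            # Run the drain on a separate thread so the signal handler returns quickly.
            threading.Thread(target=self.shutdown, name="drain").start()
        signal.signal(signal.SIGTERM, handler)
```

In this sketch the /health handler would call `health_status()` on every request, and `install()` would be called once at startup. The drain period would need to be at least as long as the time the load balancer takes to register enough failed checks.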