
Worker timeout needs to be set too long to account for worker startup #3436

@toddpalino


(note - I'm not asking for someone to create a solution on this. I want to gauge if a proposed solution would be acceptable as a PR that I would create)

I've run into a situation where my worker startup takes a fairly long time, and I need to increase the worker timeout to cover that. But once the worker is up, that timeout is far too long to catch a problem, especially with gevent workers (where the timeout is not tied to request times). For example, my workers take 2-3 minutes to start up normally (with some outliers), but I really only want to have the timeout at 3-5 seconds so a worker failure is detected and handled quickly enough to not cause serious impact.

For a variety of reasons (a complex code base, preload not being a working solution for us, and not being able to get below 10 seconds even then), shortening worker startup isn't an option. What I would like to have is a configurable grace period for the timeout. If I set the grace period to 5 minutes, this is what would happen:

  1. arbiter starts a new worker. The time of the start is tracked
  2. worker starts up - this will take 2 minutes to complete before the timeout loop is active
  3. arbiter checks whether the worker has checked in, which it has not. Since the current time is still earlier than start time + grace period, the arbiter ignores the missing checkin
  4. Repeat step 3 for 2 minutes
  5. worker completes startup and starts the timeout checkin loop
  6. arbiter checks for the worker checking in, which it sees at start time + 2 minutes. It clears the grace period
  7. arbiter now expects to see the worker check in at least every timeout seconds, or it will reap the worker.

If for some reason the worker did not start properly and never reached the timeout checkin loop, the arbiter would see this at "start time + grace period" and follow the normal process for reaping the worker and restarting it.
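
To make the proposal concrete, here is a rough sketch of the reap decision described in the steps above. It is plain, standalone Python and not gunicorn's actual arbiter code: `WorkerState`, `should_reap`, and the `grace_period` parameter are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerState:
    """Hypothetical per-worker bookkeeping kept by the arbiter."""
    started_at: float                     # when the arbiter spawned the worker
    last_checkin: Optional[float] = None  # None until the first checkin is seen


def should_reap(worker: WorkerState, now: float,
                timeout: float, grace_period: float) -> bool:
    """Return True if the arbiter should reap this worker at time `now`."""
    if worker.last_checkin is None:
        # Worker is still starting up: tolerate silence until
        # start time + grace period, then treat it as a failed start.
        return now - worker.started_at > grace_period
    # First checkin has been seen, so the grace period no longer applies:
    # the worker must check in at least every `timeout` seconds.
    return now - worker.last_checkin > timeout


# Example with a 5 second timeout and a 5 minute grace period.
w = WorkerState(started_at=0.0)
print(should_reap(w, now=120.0, timeout=5.0, grace_period=300.0))  # still starting
w.last_checkin = 120.0  # worker finished startup and checked in
print(should_reap(w, now=126.0, timeout=5.0, grace_period=300.0))  # silent for 6s
```

Treating "no checkin seen yet" as the signal that the grace period is still active means there is no separate flag to clear in step 6: the first checkin switches the worker over to the normal timeout.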

If I've missed a way to do this already, I'd appreciate a pointer. Otherwise, if this sounds like a reasonable approach I am happy to create a PR for it for further review and refining.
