LoopingCall sleep causes graceful process shutdown delay
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
oslo.service | Fix Released | Undecided | Allain Legacy |
Bug Description
When trying to do a SIGTERM graceful shutdown of various OpenStack services (Cinder, Neutron, etc.), it was noticed that the processes can take a minute or more to shut down.
When looking into this, the delay seems to be caused by the fact that the loopingcall.py _run_loop method sleeps for the entire time it wants to wait (both for the initial delay and for the time until the next interval). Because it sleeps for this entire time (which could be 60 seconds or even 5 minutes in some cases), the process takes that long to die (unless it is killed before then), even though it really isn't doing any processing or cleanup during that time.
There might be a better way to do this, but it seems like it would be better to have the sleep/wait depend on being notified that the loop is stopping (for example, using a threading.Condition wait/notify). There is some overhead involved in waking up more often, but since the loop isn't processing anything while it waits, it shouldn't use a lot of CPU. So I was thinking something like this in loopingcall.py could eliminate most of that delay:
def stop(self):
def _run_loop(self, idle_for_func,
.....
if initial_delay:
if self._running:
.....
if self._running:
except LoopingCallDone as e:
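The idea described above, waiting on a stop notification rather than sleeping for the full interval, can be sketched in standalone Python as follows. This is only an illustration under stated assumptions: `StoppableLoop` and its methods are hypothetical names, it is not the oslo.service code or the proposed patch, and it uses `threading.Event` in place of the `threading.Condition` mentioned above because an event is the simpler primitive for a one-shot stop signal.

```python
import threading
import time


class StoppableLoop:
    """Hypothetical stand-in for a looping call; not oslo.service code."""

    def __init__(self, func, interval, initial_delay=0.0):
        self._func = func
        self._interval = interval
        self._initial_delay = initial_delay
        self._stop_event = threading.Event()
        self._thread = None

    def start(self):
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run_loop, daemon=True)
        self._thread.start()

    def stop(self):
        # Setting the event wakes any pending wait() right away,
        # so the loop does not have to finish its full sleep first.
        self._stop_event.set()

    def wait(self, timeout=None):
        # Block until the loop thread has exited (or the timeout expires).
        self._thread.join(timeout)

    def _run_loop(self):
        # Event.wait(timeout) returns True if the event was set (stop
        # requested) and False if the timeout elapsed normally.
        if self._initial_delay and self._stop_event.wait(self._initial_delay):
            return
        while not self._stop_event.is_set():
            self._func()
            if self._stop_event.wait(self._interval):
                return
```

With a 60-second interval, calling `stop()` while the loop is idle lets `wait()` return almost immediately, where a plain `time.sleep(60)` would hold the thread until the interval ran out.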
Changed in oslo.service:
status: New → Confirmed
Changed in oslo.service:
assignee: nobody → Allain Legacy (alegacy)
status: Confirmed → In Progress
I have proposed the following change as a possible fix.
https://review.openstack.org/#/c/469859
Not sure why a comment wasn't added automatically to point to it, so I figured I would add it here manually.