specifically, the three methods, _heartbeat_start, _heartbeat_stop,
and _heartbeat_thread_job.
In _heartbeat_start, it creates a threading.Event (this event ends up
being the root of the problem), a threading.Thread for the heartbeat
thread, and then starts the heartbeat thread.
The heartbeat thread does work, and then at the end executes:
The _heartbeat_stop method sets the exit event when the connection is
stopped, this is where it receives that and shuts down.
With the default heartbeat configuration, _heartbeat_wait_timeout ends
up being 15.0 seconds. This sounds reasonable, but explodes under the
hood.
Event.wait() ultimately calls Condition.wait(), which runs this gem[1] of
code:
# Balancing act: We can't afford a pure busy loop, so we
# have to sleep; but if we sleep the whole timeout time,
# we'll be unresponsive. The scheme here sleeps very
# little at first, longer as time goes on, but never longer
# than 20 times per second (or the timeout time remaining).
endtime = _time() + timeout
delay = 0.0005 # 500 us -> initial delay of 1 ms
while True:
gotit = waiter.acquire(0)
if gotit:
break
remaining = endtime - _time()
if remaining <= 0:
break
delay = min(delay * 2, remaining, .05) _sleep(delay)
Now, the problem is that _sleep here is time.sleep, which has been
previously monkey-patched by eventlet to be eventlet.greenthread.sleep
instead.
When you call sleep() in eventlet, it adds a timer onto an internal
heap that it maintains[2]. When the main event loop runs, it checks
whatever the shortest timer is, and uses that delta as the argument
for waiting (this ultimately ends up to epoll_wait). Here's the
crucial part of the code[3]:
wakeup_when = self.sleep_until()
if wakeup_when is None:
sleep_time = self.default_sleep()
else:
sleep_time = wakeup_when - self.clock()
if sleep_time > 0: self.wait(sleep_time)
else: self.wait(0)
Because Condition.wait() uses such small sleep times between
0.0005..0.05, these tiny timers are almost always the next timer to
fire, and sleep_time above ends up being negative (or small enough to
practically be zero anyways). The self.wait() call propagates down to
epoll_wait() and you get the behavior we're seeing.
So, the root of the problem is that using threading.Event inside of
eventlet is bad.
Copy/paste from https:/ /bugzilla. redhat. com/show_ bug.cgi? id=1384183# c10:
In the oslo.messaging rabbit driver, there is a thread for heartbeats:
https:/ /git.openstack. org/cgit/ openstack/ oslo.messaging/ tree/oslo_ messaging/ _drivers/ impl_rabbit. py?id=293084f9b 5ef59c4b2dcd8c7 180f7053b47337a 9#n968
specifically, the three methods, _heartbeat_start, _heartbeat_stop, thread_ job.
and _heartbeat_
In _heartbeat_start, it creates a threading.Event (this event ends up
being the root of the problem), a threading.Thread for the heartbeat
thread, and then starts the heartbeat thread.
The heartbeat thread does work, and then at the end executes:
self. _heartbeat_ exit_event. wait(
timeout= self._heartbeat _wait_timeout)
The _heartbeat_stop method sets the exit event when the connection is
stopped, this is where it receives that and shuts down.
With the default heartbeat configuration, _heartbeat_ wait_timeout ends
up being 15.0 seconds. This sounds reasonable, but explodes under the
hood.
Event.wait() ultimately calls Condition.wait(), which runs this gem[1] of
code:
# Balancing act: We can't afford a pure busy loop, so we
_sleep( delay)
# have to sleep; but if we sleep the whole timeout time,
# we'll be unresponsive. The scheme here sleeps very
# little at first, longer as time goes on, but never longer
# than 20 times per second (or the timeout time remaining).
endtime = _time() + timeout
delay = 0.0005 # 500 us -> initial delay of 1 ms
while True:
gotit = waiter.acquire(0)
if gotit:
break
remaining = endtime - _time()
if remaining <= 0:
break
delay = min(delay * 2, remaining, .05)
Now, the problem is that _sleep here is time.sleep, which has been greenthread. sleep
previously monkey-patched by eventlet to be eventlet.
instead.
When you call sleep() in eventlet, it adds a timer onto an internal
heap that it maintains[2]. When the main event loop runs, it checks
whatever the shortest timer is, and uses that delta as the argument
for waiting (this ultimately ends up to epoll_wait). Here's the
crucial part of the code[3]:
wakeup_when = self.sleep_until() sleep()
self.wait( sleep_time)
self.wait( 0)
if wakeup_when is None:
sleep_time = self.default_
else:
sleep_time = wakeup_when - self.clock()
if sleep_time > 0:
else:
Because Condition.wait() uses such small sleep times between
0.0005..0.05, these tiny timers are almost always the next timer to
fire, and sleep_time above ends up being negative (or small enough to
practically be zero anyways). The self.wait() call propagates down to
epoll_wait() and you get the behavior we're seeing.
So, the root of the problem is that using threading.Event inside of
eventlet is bad.
[1] https:/ /hg.python. org/cpython/ file/v2. 7.12/Lib/ threading. py#l344 /github. com/eventlet/ eventlet/ blob/7343f2e817 fbc09d07c36727d fb2e4655a37901f /eventlet/ hubs/hub. py#L389- L392 /github. com/eventlet/ eventlet/ blob/7343f2e817 fbc09d07c36727d fb2e4655a37901f /eventlet/ hubs/hub. py#L340- L348
[2] https:/
[3] https:/