Eventlet 0.19.0 with Python 2.7 blocking a main-thread greenthread
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| oslo.service | New | Undecided | Unassigned | |
Bug Description
I don't know much about oslo.service and hope to get help from the community. Thank you.
OS: CentOS 8
Package versions:
eventlet (0.19.0)
oslo.log (3.16.1)
oslo.service (1.16.1)
I start my process with oslo.service and use eventlet.
For convenience, here is the main code logic:
```python
from oslo_log import log as logging
from oslo_service import periodic_task

CONF = config.CONF
LOG = logging.

class NodeManager(
    target = messaging.

    def __init__(self, host=None, service_
        ...

    def add_tasks(self, tg):

    def do_update_
        while True:
            try:
            except Exception as ex:

    def do_heartbeat(self):
        while True:
            try:
            except Exception as ex:
```
eventlet.spawn setup do_set_
Sometimes the do_update_host_info greenthread blocks and produces no log output, while the do_heartbeat native thread keeps running fine.
I can't reproduce this problem reliably; it usually takes days or even months to occur.
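Because the hang only shows up after days or months, it may help to detect it automatically and capture diagnostics at that moment instead of waiting to notice missing logs. A minimal sketch, assuming a hypothetical `ProgressWatchdog` helper (not part of oslo.service or eventlet): the monitored loop calls `beat()` on every iteration, and some other periodic loop calls `check()`, which invokes `on_stall` once per stall.

```python
import time

# Prefer a monotonic clock (Python 3.3+); fall back to wall time on Python 2.7.
_default_clock = getattr(time, "monotonic", time.time)


class ProgressWatchdog(object):
    """Hypothetical helper: report once when a loop stops calling beat()."""

    def __init__(self, timeout, on_stall, clock=_default_clock):
        self.timeout = timeout      # seconds of silence tolerated
        self.on_stall = on_stall    # callback(idle_seconds), e.g. log or dump stacks
        self._clock = clock
        self._last_beat = clock()
        self._reported = False

    def beat(self):
        """Call from the monitored loop on every iteration."""
        self._last_beat = self._clock()
        self._reported = False

    def check(self):
        """Call periodically from another loop; True when a new stall is reported."""
        idle = self._clock() - self._last_beat
        if idle > self.timeout and not self._reported:
            self._reported = True
            self.on_stall(idle)
            return True
        return False
```

In this setup, `beat()` would go at the top of the `while True:` loop in `do_update_host_info`, and `check()` would run every few seconds elsewhere. Whether that checker should be a greenthread or a native thread depends on how the process is monkey-patched; the watchdog deliberately takes no lock so it does not add another lock to a suspected lock problem.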
I found that, when the greenthread hangs, self.cluster_
```python
class ClusterInfo(
    def __init__(self, hostha_
        ...

    def update_
        for network, client in self.heartbeat_
            if network == constant.MGMT_NET and mgmt_remotes:

    def get_heartbeat_
        for controller in self.controller
            mgmt_ip = controller.
            if mgmt_ip:
        return mgmt_remotes
```
I suspect that mixing greenthreads with native threads causes a deadlock in logging, but I don't know how to analyze it.
I dumped the thread stacks over a few minutes and ran strace on the process:
```
Stack for thread 140586048190208
  File "/usr/lib64/
    self.__
  File "/usr/lib64/
    self.run()
  File "", line 167, in run
  File "/usr/lib64/
    more = self.push(line)
  File "/usr/lib64/
    more = self.runsource(
  File "/usr/lib64/
    self.runcode(code)
  File "/usr/lib64/
    exec code in self.locals
  File "", line 3, in
Stack for thread 140586056582912
  File "/usr/lib/
    self.wait(
  File "/usr/lib/
    presult = self.do_
  File "/usr/lib/
    return self.poll.
Stack for thread 140586521663296
  File "/usr/lib/
    self.wait(
  File "/usr/lib/
    presult = self.do_
  File "/usr/lib/
    return self.poll.
```
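The "Stack for thread ..." dump above appears to list only native (OS) threads; its format suggests it was produced by iterating `sys._current_frames()`, though that is an assumption. Under eventlet, a suspended greenthread is not an OS thread, so its frame is not returned by `sys._current_frames()`, and the stuck `do_update_host_info` greenthread would be missing from such a dump. A sketch with hypothetical helper names that prints both: native-thread stacks, and the stacks of suspended greenlets found by scanning `gc.get_objects()` for `greenlet.greenlet` instances and formatting their `gr_frame`.

```python
import gc
import sys
import traceback


def _format_frame(header, frame):
    """Return a header line followed by the formatted stack of `frame`."""
    lines = [header]
    for entry in traceback.format_stack(frame):
        lines.extend(entry.rstrip("\n").split("\n"))
    return lines


def dump_native_stacks():
    """Stacks of every OS thread, i.e. what sys._current_frames() can see."""
    out = []
    for thread_id, frame in sys._current_frames().items():
        out.extend(_format_frame("Stack for thread %d" % thread_id, frame))
    return "\n".join(out)


def dump_green_stacks():
    """Stacks of suspended greenlets; empty string if greenlet is not installed."""
    try:
        from greenlet import greenlet
    except ImportError:
        return ""
    out = []
    for obj in gc.get_objects():
        # issubclass(type(obj), ...) avoids touching obj.__class__ on proxy objects.
        # gr_frame is None for the running, unstarted, or finished greenlets.
        if issubclass(type(obj), greenlet) and obj.gr_frame is not None:
            out.extend(_format_frame("Greenlet at 0x%x" % id(obj), obj.gr_frame))
    return "\n".join(out)
```

Calling both functions from the same place that produced the dump above (for example, from the `on_stall` callback of a watchdog) would show where the hung greenthread is actually suspended.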
```
[root@node-1 /]# strace -t -p 38688
strace: Process 38688 attached
20:34:13 epoll_pwait(3, [], 1023, 22085, NULL, 8) = 0
20:34:35 epoll_pwait(3, [], 1023, 59999, NULL, 8) = 0
20:35:35 epoll_pwait(3, [], 1023, 59999, NULL, 8) = 0
20:36:35 epoll_pwait(3, ^Cstrace: Process 38688 detached
 <detached ...>
```
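Read together, the strace lines suggest the process is alive but idle: each `epoll_pwait` call returns 0 (no ready file descriptors), and the next call starts roughly when the previous timeout (22085 ms, then 59999 ms) expires. Assuming fd 3 is the eventlet hub's epoll descriptor (not confirmed by the report), this is consistent with the hub waking only for timers while the stuck greenthread waits on something that never wakes the hub. A small stdlib sketch, with hypothetical function names, that checks this reading mechanically:

```python
import re
from datetime import datetime

# Matches lines such as: 20:34:13 epoll_pwait(3, [], 1023, 22085, NULL, 8) = 0
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) epoll_pwait\(\d+, \[.*?\], \d+, (-?\d+), .*\) = (\d+)$"
)


def parse_epoll_lines(lines):
    """Return (start_time, timeout_seconds, ready_fds) for each complete call."""
    calls = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            start = datetime.strptime(m.group(1), "%H:%M:%S")
            calls.append((start, int(m.group(2)) / 1000.0, int(m.group(3))))
    return calls


def timed_out_every_call(calls, slack=1.0):
    """True if every call returned 0 fds and the next call began when it timed out."""
    for (start, timeout, ready), (next_start, _, _) in zip(calls, calls[1:]):
        elapsed = (next_start - start).total_seconds()
        if ready != 0 or abs(elapsed - timeout) > slack:
            return False
    return True
```

On the output above, the three complete calls parse with timeouts of 22.085 s, 59.999 s and 59.999 s, the gaps between timestamps (22 s, then 60 s) match them within `strace -t`'s one-second resolution, and `timed_out_every_call` returns `True`; the interrupted fourth line is skipped.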