Terminating neutron-server leads to waitall() inside a GreenPool's greenthread

Bug #1423250 reported by Elena Ezhova on 2015-02-18
This bug affects 2 people
Affects                        Importance  Assigned to
OpenStack DBaaS (Trove)        Medium      Unassigned
OpenStack Identity (keystone)  High        Elena Ezhova
neutron                        Undecided   Elena Ezhova

Bug Description

Sending a termination signal (e.g. SIGHUP or SIGTERM) to neutron-server with more than one api/rpc worker leads to the following traceback:

2015-02-18 16:05:18.411 CRITICAL neutron [-] AssertionError: Calling waitall() from within one of the GreenPool's greenthreads will never terminate.

2015-02-18 16:05:18.411 TRACE neutron Traceback (most recent call last):
2015-02-18 16:05:18.411 TRACE neutron File "/usr/local/bin/neutron-server", line 9, in <module>
2015-02-18 16:05:18.411 TRACE neutron load_entry_point('neutron==2015.1.dev117', 'console_scripts', 'neutron-server')()
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/server/__init__.py", line 46, in main
2015-02-18 16:05:18.411 TRACE neutron neutron_api = service.serve_wsgi(service.NeutronApiService)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/service.py", line 106, in serve_wsgi
2015-02-18 16:05:18.411 TRACE neutron LOG.exception(_LE('Unrecoverable error: please check log '
2015-02-18 16:05:18.411 TRACE neutron File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in __exit__
2015-02-18 16:05:18.411 TRACE neutron six.reraise(self.type_, self.value, self.tb)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/service.py", line 103, in serve_wsgi
2015-02-18 16:05:18.411 TRACE neutron service.start()
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/service.py", line 74, in start
2015-02-18 16:05:18.411 TRACE neutron self.wsgi_app = _run_wsgi(self.app_name)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/service.py", line 175, in _run_wsgi
2015-02-18 16:05:18.411 TRACE neutron workers=cfg.CONF.api_workers)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/wsgi.py", line 224, in start
2015-02-18 16:05:18.411 TRACE neutron self._launch(application, workers)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/wsgi.py", line 238, in _launch
2015-02-18 16:05:18.411 TRACE neutron self._server.launch_service(service, workers=workers)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/openstack/common/service.py", line 333, in launch_service
2015-02-18 16:05:18.411 TRACE neutron self._start_child(wrap)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/openstack/common/service.py", line 314, in _start_child
2015-02-18 16:05:18.411 TRACE neutron status, signo = self._child_wait_for_exit_or_signal(launcher)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/openstack/common/service.py", line 272, in _child_wait_for_exit_or_signal
2015-02-18 16:05:18.411 TRACE neutron launcher.stop()
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/openstack/common/service.py", line 126, in stop
2015-02-18 16:05:18.411 TRACE neutron self.services.stop()
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/openstack/common/service.py", line 462, in stop
2015-02-18 16:05:18.411 TRACE neutron service.stop()
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/wsgi.py", line 117, in stop
2015-02-18 16:05:18.411 TRACE neutron self._server.kill()
2015-02-18 16:05:18.411 TRACE neutron File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 240, in kill
2015-02-18 16:05:18.411 TRACE neutron return kill(self, *throw_args)
2015-02-18 16:05:18.411 TRACE neutron File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 294, in kill
2015-02-18 16:05:18.411 TRACE neutron g.throw(*throw_args)
2015-02-18 16:05:18.411 TRACE neutron File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
2015-02-18 16:05:18.411 TRACE neutron result = function(*args, **kwargs)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/wsgi.py", line 263, in _run
2015-02-18 16:05:18.411 TRACE neutron socket_timeout=self.client_socket_timeout)
2015-02-18 16:05:18.411 TRACE neutron File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 828, in server
2015-02-18 16:05:18.411 TRACE neutron pool.waitall()
2015-02-18 16:05:18.411 TRACE neutron File "/usr/local/lib/python2.7/dist-packages/eventlet/greenpool.py", line 117, in waitall
2015-02-18 16:05:18.411 TRACE neutron "Calling waitall() from within one of the " \
2015-02-18 16:05:18.411 TRACE neutron AssertionError: Calling waitall() from within one of the GreenPool's greenthreads will never terminate.
2015-02-18 16:05:18.411 TRACE neutron

Elena Ezhova (eezhova) on 2015-02-18
Changed in neutron:
assignee: nobody → Elena Ezhova (eezhova)
Elena Ezhova (eezhova) on 2015-02-18
Changed in neutron:
status: New → In Progress
Matt Riedemann (mriedem) wrote :

I'm also seeing this in keystone when running with wsgi/eventlet and 16 workers (32 CPUs on the host).

Matt Riedemann (mriedem) wrote :

I'm on eventlet 0.16.1 and greenlet 0.4.1 if that helps, running basically trunk kilo code on RHEL 7 ppc64 BE.

Brant Knudson (blk-u) wrote :

Here's the commit in eventlet that added the pool.waitall(): http://mylokin.com/rebelmouse/eventlet/commit/1c30e9b39f4330babcb3665df4c1a3ba694bc6c1

Brant Knudson (blk-u) wrote :

It looks like the "bug" is down in eventlet.

1) The server's accepting connections here: https://github.com/eventlet/eventlet/blob/master/eventlet/wsgi.py#L796

2) Keystone wants to break out of the accept loop gracefully, so the wsgi thread drops into the finally: block where it calls pool.waitall(), which then fails an assertion because "Calling waitall() from within one of the GreenPool's greenthreads will never terminate.": https://github.com/eventlet/eventlet/blob/master/eventlet/greenpool.py#L116

Also, here's a better link to the commit: https://github.com/eventlet/eventlet/commit/1c30e9b39f4330babcb3665df4c1a3ba694bc6c1

Brant Knudson (blk-u) wrote :

I opened a bug against eventlet: https://github.com/eventlet/eventlet/issues/207

Elena Ezhova (eezhova) wrote :

It seems that the problem in keystone has the same nature as the one in neutron. A wsgi server is started in a green thread from the same green pool that is passed to the server as a custom pool for handling client requests [1], [2]. So when we try to stop the server, waitall() is eventually called on all green threads in that pool, including the thread that issued the call, which results in the exception observed in the logs.

There is a working fix for neutron [3], currently under review, that simply uses different pools for spawning a wsgi server and for its internal usage. Comments are highly appreciated.

[1] https://github.com/openstack/keystone/blob/b72fb2feeb066ec09ee4b6c6b6cc680e8a8f37b7/keystone/common/environment/eventlet_server.py#L138
[2] https://github.com/openstack/keystone/blob/b72fb2feeb066ec09ee4b6c6b6cc680e8a8f37b7/keystone/common/environment/eventlet_server.py#L178
[3] https://review.openstack.org/#/c/157320

Fix proposed to branch: master
Review: https://review.openstack.org/160720

Changed in keystone:
assignee: nobody → Elena Ezhova (eezhova)
status: Confirmed → In Progress
Elena Ezhova (eezhova) wrote :

I have verified the same approach for keystone running in standalone mode and it also works.

Brant Knudson (blk-u) on 2015-03-08
Changed in keystone:
importance: Undecided → High
milestone: none → kilo-3

Reviewed: https://review.openstack.org/160720
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=71310ba309b4c79908ea209ffe8e19d27bf5116e
Submitter: Jenkins
Branch: master

commit 71310ba309b4c79908ea209ffe8e19d27bf5116e
Author: Elena Ezhova <email address hidden>
Date: Tue Mar 3 12:58:25 2015 +0300

    Prevent calling waitall() inside a GreenPool's greenthread

    Keystone eventlet wsgi server is started in a green thread from
    the same green pool that is passed to the server itself
    to be used for spawning client green threads.

    To avoid it, it is reasonable to use different pools for spawning
    a wsgi server and for its internal usage.

    Change-Id: I1246e2b730a522480ed495fbb53ae376ee7593d5
    Closes-Bug: #1423250

Changed in keystone:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/157320
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f9b2791735863b58171ee8be46c47ef3ed2c5edd
Submitter: Jenkins
Branch: master

commit f9b2791735863b58171ee8be46c47ef3ed2c5edd
Author: Elena Ezhova <email address hidden>
Date: Thu Feb 19 13:34:17 2015 +0300

    Prevent calling waitall() inside a GreenPool's greenthread

    When neutron-server is running with several api workers
    sending it a termination signal (SIGTERM, SIGHUP or SIGINT)
    leads to waitall() being called inside a GreenPool's greenthread.

    The reason is that a wsgi server is started in a green thread
    from the same green pool that is passed to the server itself
    to be used for spawning client green threads.

    To avoid it, it is reasonable to use different pools for spawning
    a wsgi server and for its internal usage.

    This is also the case for metadata agent running with several
    metadata workers.

    Change-Id: I38174396f06fcb29ac0776534ac6494dabb00df6
    Closes-Bug: #1423250

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2015-03-19
Changed in keystone:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2015-03-19
Changed in neutron:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Priti Desai (priti-desai) wrote :

I am seeing the same issue in Kilo Keystone; can we backport these changes to Kilo?

File "/tmp/jenkins-continuous_review-59/openstack/keystone/.venv/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 115, in wait
    listener.cb(fileno)
  File "/tmp/jenkins-continuous_review-59/openstack/keystone/.venv/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "keystone/common/environment/eventlet_server.py", line 182, in _run
    debug=False)
  File "/tmp/jenkins-continuous_review-59/openstack/keystone/.venv/lib/python2.7/site-packages/eventlet/wsgi.py", line 842, in server
    pool.waitall()
  File "/tmp/jenkins-continuous_review-59/openstack/keystone/.venv/lib/python2.7/site-packages/eventlet/greenpool.py", line 117, in waitall
    "Calling waitall() from within one of the " \
AssertionError: Calling waitall() from within one of the GreenPool's greenthreads will never terminate.

Thierry Carrez (ttx) on 2015-04-30
Changed in keystone:
milestone: kilo-3 → 2015.1.0
Thierry Carrez (ttx) on 2015-04-30
Changed in neutron:
milestone: kilo-3 → 2015.1.0
Tom Verdaat (tom-verdaat) wrote :

Seeing the same thing in Trove:

ERROR trove.openstack.common.threadgroup [-] Calling waitall() from within one of the GreenPool's greenthreads will never terminate.
TRACE trove.openstack.common.threadgroup Traceback (most recent call last):
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/trove/openstack/common/threadgroup.py", line 98, in _stop_threads
TRACE trove.openstack.common.threadgroup x.stop()
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/trove/openstack/common/threadgroup.py", line 44, in stop
TRACE trove.openstack.common.threadgroup self.thread.kill()
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 240, in kill
TRACE trove.openstack.common.threadgroup return kill(self, *throw_args)
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 294, in kill
TRACE trove.openstack.common.threadgroup g.throw(*throw_args)
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
TRACE trove.openstack.common.threadgroup result = function(*args, **kwargs)
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/trove/common/base_wsgi.py", line 162, in _run
TRACE trove.openstack.common.threadgroup log=logging.WritableLogger(logger))
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 828, in server
TRACE trove.openstack.common.threadgroup pool.waitall()
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/greenpool.py", line 117, in waitall
TRACE trove.openstack.common.threadgroup "Calling waitall() from within one of the " \
TRACE trove.openstack.common.threadgroup AssertionError: Calling waitall() from within one of the GreenPool's greenthreads will never terminate.
TRACE trove.openstack.common.threadgroup

Amrith Kumar (amrith) on 2016-04-05
Changed in trove:
importance: Undecided → Medium
status: New → Confirmed
milestone: none → newton-1
Amrith Kumar (amrith) on 2016-09-18
Changed in trove:
milestone: newton-1 → ongoing