Terminating neutron-server leads to waitall() inside a GreenPool's greenthread

Bug #1423250 reported by Elena Ezhova
Affects                        Status        Importance  Assigned to    Milestone
OpenStack DBaaS (Trove)        Confirmed     Medium      Unassigned     ongoing
OpenStack Identity (keystone)  Fix Released  High        Elena Ezhova   2015.1.0
neutron                        Fix Released  Undecided   Elena Ezhova   2015.1.0

Bug Description

Sending a termination signal (e.g. SIGHUP or SIGTERM) to neutron-server with more than one api/rpc worker leads to the following traceback:

2015-02-18 16:05:18.411 CRITICAL neutron [-] AssertionError: Calling waitall() from within one of the GreenPool's greenthreads will never terminate.

2015-02-18 16:05:18.411 TRACE neutron Traceback (most recent call last):
2015-02-18 16:05:18.411 TRACE neutron File "/usr/local/bin/neutron-server", line 9, in <module>
2015-02-18 16:05:18.411 TRACE neutron load_entry_point('neutron==2015.1.dev117', 'console_scripts', 'neutron-server')()
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/server/__init__.py", line 46, in main
2015-02-18 16:05:18.411 TRACE neutron neutron_api = service.serve_wsgi(service.NeutronApiService)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/service.py", line 106, in serve_wsgi
2015-02-18 16:05:18.411 TRACE neutron LOG.exception(_LE('Unrecoverable error: please check log '
2015-02-18 16:05:18.411 TRACE neutron File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in __exit__
2015-02-18 16:05:18.411 TRACE neutron six.reraise(self.type_, self.value, self.tb)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/service.py", line 103, in serve_wsgi
2015-02-18 16:05:18.411 TRACE neutron service.start()
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/service.py", line 74, in start
2015-02-18 16:05:18.411 TRACE neutron self.wsgi_app = _run_wsgi(self.app_name)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/service.py", line 175, in _run_wsgi
2015-02-18 16:05:18.411 TRACE neutron workers=cfg.CONF.api_workers)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/wsgi.py", line 224, in start
2015-02-18 16:05:18.411 TRACE neutron self._launch(application, workers)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/wsgi.py", line 238, in _launch
2015-02-18 16:05:18.411 TRACE neutron self._server.launch_service(service, workers=workers)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/openstack/common/service.py", line 333, in launch_service
2015-02-18 16:05:18.411 TRACE neutron self._start_child(wrap)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/openstack/common/service.py", line 314, in _start_child
2015-02-18 16:05:18.411 TRACE neutron status, signo = self._child_wait_for_exit_or_signal(launcher)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/openstack/common/service.py", line 272, in _child_wait_for_exit_or_signal
2015-02-18 16:05:18.411 TRACE neutron launcher.stop()
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/openstack/common/service.py", line 126, in stop
2015-02-18 16:05:18.411 TRACE neutron self.services.stop()
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/openstack/common/service.py", line 462, in stop
2015-02-18 16:05:18.411 TRACE neutron service.stop()
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/wsgi.py", line 117, in stop
2015-02-18 16:05:18.411 TRACE neutron self._server.kill()
2015-02-18 16:05:18.411 TRACE neutron File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 240, in kill
2015-02-18 16:05:18.411 TRACE neutron return kill(self, *throw_args)
2015-02-18 16:05:18.411 TRACE neutron File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 294, in kill
2015-02-18 16:05:18.411 TRACE neutron g.throw(*throw_args)
2015-02-18 16:05:18.411 TRACE neutron File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
2015-02-18 16:05:18.411 TRACE neutron result = function(*args, **kwargs)
2015-02-18 16:05:18.411 TRACE neutron File "/opt/stack/neutron/neutron/wsgi.py", line 263, in _run
2015-02-18 16:05:18.411 TRACE neutron socket_timeout=self.client_socket_timeout)
2015-02-18 16:05:18.411 TRACE neutron File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 828, in server
2015-02-18 16:05:18.411 TRACE neutron pool.waitall()
2015-02-18 16:05:18.411 TRACE neutron File "/usr/local/lib/python2.7/dist-packages/eventlet/greenpool.py", line 117, in waitall
2015-02-18 16:05:18.411 TRACE neutron "Calling waitall() from within one of the " \
2015-02-18 16:05:18.411 TRACE neutron AssertionError: Calling waitall() from within one of the GreenPool's greenthreads will never terminate.
2015-02-18 16:05:18.411 TRACE neutron
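
For reference, a minimal standalone sketch (not part of the original report) that triggers the same eventlet assertion in isolation: waitall() is called on a GreenPool from a greenthread that the pool itself spawned.

    import eventlet

    pool = eventlet.GreenPool(10)

    def worker():
        # waitall() is called from a greenthread that is itself a member of
        # `pool`; eventlet asserts, because the pool can never drain while
        # one of its own greenthreads is blocked waiting for it to drain.
        pool.waitall()

    gt = pool.spawn(worker)
    gt.wait()  # re-raises the AssertionError seen in the traceback above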

Elena Ezhova (eezhova)
Changed in neutron:
assignee: nobody → Elena Ezhova (eezhova)
Elena Ezhova (eezhova)
Changed in neutron:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/157320

Revision history for this message
Matt Riedemann (mriedem) wrote :

I'm also seeing this in keystone when running with wsgi/eventlet and 16 workers (32 CPUs on the host).

Revision history for this message
Matt Riedemann (mriedem) wrote :

I'm on eventlet 0.16.1 and greenlet 0.4.1 if that helps, running basically trunk kilo code on RHEL 7 ppc64 BE.

Revision history for this message
Matt Riedemann (mriedem) wrote :
Changed in keystone:
status: New → Confirmed
Revision history for this message
Brant Knudson (blk-u) wrote :

Here's the commit in eventlet that added the pool.waitall(): http://mylokin.com/rebelmouse/eventlet/commit/1c30e9b39f4330babcb3665df4c1a3ba694bc6c1

Revision history for this message
Brant Knudson (blk-u) wrote :

It looks like the "bug" is down in eventlet.

1) The server's accepting connections here: https://github.com/eventlet/eventlet/blob/master/eventlet/wsgi.py#L796

2) Keystone wants to break out of the accept loop gracefully, so the wsgi thread drops into the finally: block where it calls pool.waitall(), which then fails an assertion because "Calling waitall() from within one of the GreenPool's greenthreads will never terminate.": https://github.com/eventlet/eventlet/blob/master/eventlet/greenpool.py#L116

Also, here's a better link to the commit: https://github.com/eventlet/eventlet/commit/1c30e9b39f4330babcb3665df4c1a3ba694bc6c1
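
To illustrate the shutdown path described above, here is a rough, hypothetical sketch (not code from keystone or neutron) of the problematic pattern: the wsgi server greenthread is spawned from the very pool it hands to eventlet.wsgi.server() as custom_pool. It assumes an eventlet version that calls pool.waitall() in the server's finally block, as in the linked commit.

    import eventlet
    from eventlet import wsgi

    pool = eventlet.GreenPool(1000)

    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'ok\n']

    sock = eventlet.listen(('127.0.0.1', 0))

    # The server greenthread is itself a member of `pool`, yet the same pool
    # is passed to wsgi.server() as custom_pool for client requests.
    server_gt = pool.spawn(wsgi.server, sock, app, custom_pool=pool)

    eventlet.sleep(0.1)  # let the server reach its accept loop
    server_gt.kill()     # unwinds into wsgi.server's finally block, which calls
                         # pool.waitall() from inside the pool -> AssertionError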

Revision history for this message
Brant Knudson (blk-u) wrote :

I opened a bug against eventlet: https://github.com/eventlet/eventlet/issues/207

Revision history for this message
Elena Ezhova (eezhova) wrote :

It seems that the problem in keystone has the same nature as the one in neutron. The wsgi server is started in a green thread from the same green pool that is passed to the server as a custom pool for handling client requests [1],[2]. So when we try to stop the server, waitall() is eventually called for all green threads in that pool, including the thread that issued the call, which results in the exception observed in the logs.

There is a working fix for neutron [3], currently under review, that simply uses different pools for spawning the wsgi server and for its internal use. Comments are highly appreciated.

[1] https://github.com/openstack/keystone/blob/b72fb2feeb066ec09ee4b6c6b6cc680e8a8f37b7/keystone/common/environment/eventlet_server.py#L138
[2] https://github.com/openstack/keystone/blob/b72fb2feeb066ec09ee4b6c6b6cc680e8a8f37b7/keystone/common/environment/eventlet_server.py#L178
[3] https://review.openstack.org/#/c/157320
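
For illustration only, a sketch of the two-pool approach described in the comment above (this is not the actual neutron or keystone patch; the names launcher_pool and client_pool are made up here):

    import eventlet
    from eventlet import wsgi

    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'ok\n']

    # One pool hosts the server greenthread; a separate pool handles client
    # requests, so stopping the server never calls waitall() on a pool that
    # the calling greenthread belongs to.
    launcher_pool = eventlet.GreenPool()
    client_pool = eventlet.GreenPool(1000)

    sock = eventlet.listen(('127.0.0.1', 0))
    server_gt = launcher_pool.spawn(wsgi.server, sock, app,
                                    custom_pool=client_pool)

    # Shutdown: killing server_gt makes wsgi.server() call
    # client_pool.waitall(), which is safe because server_gt is not a
    # member of client_pool.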

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (master)

Fix proposed to branch: master
Review: https://review.openstack.org/160720

Changed in keystone:
assignee: nobody → Elena Ezhova (eezhova)
status: Confirmed → In Progress
Revision history for this message
Elena Ezhova (eezhova) wrote :

I have verified that the same approach also works for keystone running in standalone mode.

Brant Knudson (blk-u)
Changed in keystone:
importance: Undecided → High
milestone: none → kilo-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (master)

Reviewed: https://review.openstack.org/160720
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=71310ba309b4c79908ea209ffe8e19d27bf5116e
Submitter: Jenkins
Branch: master

commit 71310ba309b4c79908ea209ffe8e19d27bf5116e
Author: Elena Ezhova <email address hidden>
Date: Tue Mar 3 12:58:25 2015 +0300

    Prevent calling waitall() inside a GreenPool's greenthread

    Keystone eventlet wsgi server is started in a green thread from
    the same green pool that is passed to the server itself
    to be used for spawning client green threads.

    To avoid it, it is reasonable to use different pools for spawning
    a wsgi server and for its internal usage.

    Change-Id: I1246e2b730a522480ed495fbb53ae376ee7593d5
    Closes-Bug: #1423250

Changed in keystone:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/157320
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f9b2791735863b58171ee8be46c47ef3ed2c5edd
Submitter: Jenkins
Branch: master

commit f9b2791735863b58171ee8be46c47ef3ed2c5edd
Author: Elena Ezhova <email address hidden>
Date: Thu Feb 19 13:34:17 2015 +0300

    Prevent calling waitall() inside a GreenPool's greenthread

    When neutron-server is running with several api workers
    sending it a termination signal (SIGTERM, SIGHUP or SIGINT)
    leads to waitall() being called inside a GreenPool's greenthread.

    The reason is that a wsgi server is started in a green thread
    from the same green pool that is passed to the server itself
    to be used for spawning client green threads.

    To avoid it, it is reasonable to use different pools for spawning
    a wsgi server and for its internal usage.

    This is also the case for metadata agent running with several
    metadata workers.

    Change-Id: I38174396f06fcb29ac0776534ac6494dabb00df6
    Closes-Bug: #1423250

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in keystone:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Revision history for this message
Priti Desai (priti-desai) wrote :

I am seeing the same issue in Kilo Keystone. Can we backport these changes to Kilo?

File "/tmp/jenkins-continuous_review-59/openstack/keystone/.venv/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 115, in wait
    listener.cb(fileno)
  File "/tmp/jenkins-continuous_review-59/openstack/keystone/.venv/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "keystone/common/environment/eventlet_server.py", line 182, in _run
    debug=False)
  File "/tmp/jenkins-continuous_review-59/openstack/keystone/.venv/lib/python2.7/site-packages/eventlet/wsgi.py", line 842, in server
    pool.waitall()
  File "/tmp/jenkins-continuous_review-59/openstack/keystone/.venv/lib/python2.7/site-packages/eventlet/greenpool.py", line 117, in waitall
    "Calling waitall() from within one of the " \
AssertionError: Calling waitall() from within one of the GreenPool's greenthreads will never terminate.

Thierry Carrez (ttx)
Changed in keystone:
milestone: kilo-3 → 2015.1.0
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-3 → 2015.1.0
Revision history for this message
Tom Verdaat (tom-verdaat) wrote :

Seeing the same thing in Trove:

ERROR trove.openstack.common.threadgroup [-] Calling waitall() from within one of the GreenPool's greenthreads will never terminate.
TRACE trove.openstack.common.threadgroup Traceback (most recent call last):
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/trove/openstack/common/threadgroup.py", line 98, in _stop_threads
TRACE trove.openstack.common.threadgroup x.stop()
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/trove/openstack/common/threadgroup.py", line 44, in stop
TRACE trove.openstack.common.threadgroup self.thread.kill()
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 240, in kill
TRACE trove.openstack.common.threadgroup return kill(self, *throw_args)
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 294, in kill
TRACE trove.openstack.common.threadgroup g.throw(*throw_args)
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
TRACE trove.openstack.common.threadgroup result = function(*args, **kwargs)
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/trove/common/base_wsgi.py", line 162, in _run
TRACE trove.openstack.common.threadgroup log=logging.WritableLogger(logger))
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 828, in server
TRACE trove.openstack.common.threadgroup pool.waitall()
TRACE trove.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/greenpool.py", line 117, in waitall
TRACE trove.openstack.common.threadgroup "Calling waitall() from within one of the " \
TRACE trove.openstack.common.threadgroup AssertionError: Calling waitall() from within one of the GreenPool's greenthreads will never terminate.
TRACE trove.openstack.common.threadgroup

Amrith Kumar (amrith)
Changed in trove:
importance: Undecided → Medium
status: New → Confirmed
milestone: none → newton-1
Amrith Kumar (amrith)
Changed in trove:
milestone: newton-1 → ongoing