Glance hangs in qpid notification when adding/removing an image

Bug #1100317 reported by Daniel Berrange
24
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Glance
Fix Released
High
Russell Bryant
Grizzly
Fix Released
High
Russell Bryant

Bug Description

I am using DevStack to setup openstack. The following is my devstack config

# cat localrc
DEST=$HOME/src/cloud
DATA_DIR=$DEST/data

LOGFILE=$DATA_DIR/logs/stack.log
SCREEN_LOGDIR=$DATA_DIR/logs
VERBOSE=False

disable_service rabbit
enable_service qpid

enable_service n-spice
disable_service n-novnc
disable_service n-xvnc

MYSQL_PASSWORD=123456
SERVICE_TOKEN=123456
SERVICE_PASSWORD=123456
ADMIN_PASSWORD=123456

IMAGE_URLS="http://berrange.fedorapeople.org/images/2012-02-29/f16-x86_64-openstack-sda.qcow2,http://launchpad.net/cirros/trunk/0.3.0/+download/cirros-0.3.0-x86_64-uec.tar.gz"

When running 'devstack.sh' everything works fine until 'glance add' is invoked

 # glance add name=f16-jeos is_public=true disk_format=qcow2 container_format=bare < files/f16-x86_64-openstack-sda.qcow2

At which point both glance api / register services appear to be hung processing the request. This eventually times out on the client, but no further commands can be run against glance until it is restarted.

See also

http://lists.openstack.org/pipermail/openstack-dev/2013-January/004467.html

Revision history for this message
Daniel Berrange (berrange) wrote :
Revision history for this message
Daniel Berrange (berrange) wrote :
Revision history for this message
Daniel Berrange (berrange) wrote :

With further debugging I've determined that the hang occurs in glance/notifier/notify_qpid.py in the call

        self.sender_info.send(qpid_msg)

This is calling off into the qpid messaging APIs, and GDB shows it gets stuck in a poll() /select() syscall.

Russell suggests that glance may not be properly dealing with eventlet monkey patching, but adding

  import eventlet
  eventlet.monkey_patch()

as the first line of 'glance-api' does not help.

NB, it turns out that the glance-registry is operating fine. Only 'glance-api' is hung in the above notify path

Changing

  notifier_strategy = noop

in glance-api.conf avoids the hang.

summary: - Glance hangs when adding an image
+ Glance hangs in qpid notification when adding/removing an image
Revision history for this message
Daniel Berrange (berrange) wrote :

Adding Russell since I don't really understand any of the qpid or eventlet patching code

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :
Download full text (3.5 KiB)

Using the dumpstacks trick here - http://stackoverflow.com/questions/132058/showing-the-stack-trace-from-a-running-python-application - found the stack trace of where the select() was being called from

2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images File "/opt/stack/glance/glance/api/v1/images.py", line 457, in _upload
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images self.notifier.info('image.upload', image_meta)
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images File "/opt/stack/glance/glance/notifier/__init__.py", line 84, in info
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images self.strategy.info(msg)
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images File "/opt/stack/glance/glance/notifier/notify_qpid.py", line 139, in info
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images self.sender_info.send(qpid_msg)
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images File "<string>", line 6, in send
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images File "/usr/lib/python2.7/dist-packages/qpid/messaging/endpoints.py", line 874, in send
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images self.sync()
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images File "<string>", line 6, in sync
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images File "/usr/lib/python2.7/dist-packages/qpid/messaging/endpoints.py", line 885, in sync
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images if not self._ewait(lambda: self.acked >= mno, timeout=timeout):
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images File "/usr/lib/python2.7/dist-packages/qpid/messaging/endpoints.py", line 799, in _ewait
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images result = self.session._ewait(lambda: self.error or predicate(), timeout)
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images File "/usr/lib/python2.7/dist-packages/qpid/messaging/endpoints.py", line 566, in _ewait
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images result = self.connection._ewait(lambda: self.error or predicate(), timeout)
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images File "/usr/lib/python2.7/dist-packages/qpid/messaging/endpoints.py", line 208, in _ewait
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images result = self._wait(lambda: self.error or predicate(), timeout)
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images File "/usr/lib/python2.7/dist-packages/qpid/messaging/endpoints.py", line 193, in _wait
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images return self._waiter.wait(predicate, timeout=timeout)
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images File "/usr/lib/python2.7/dist-packages/qpid/concurrency.py", line 57, in wait
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images self.condition.wait(3)
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images File "/usr/lib/python2.7/dist-packages/qpid/concurrency.py", line 96, in wait
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images sw.wait(timeout)
2013-02-12 23:46:46.934 5091 TRACE glance.api.v1.images File "/usr/lib/python2.7/dist-packages/qpid/compat.py", line 53, in wait...

Read more...

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (master)

Fix proposed to branch: master
Review: https://review.openstack.org/21852

Changed in glance:
assignee: nobody → Davanum Srinivas (DIMS) (dims-v)
status: New → In Progress
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

I am not making progress, unassigning just in case someone knows this better.

Changed in glance:
assignee: Davanum Srinivas (DIMS) (dims-v) → nobody
Revision history for this message
Daniel Berrange (berrange) wrote :
Download full text (7.3 KiB)

Here is a stack trace showing where glance has hung

ThreadID: 140483818166016
  File "/usr/lib64/python2.7/threading.py", line 524, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib64/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/site-packages/qpid/selector.py", line 124, in run
    rd, wr, ex = select(self.reading, self.writing, (), timeout)

ThreadID: 140484070934336
  File "/usr/lib/python2.7/site-packages/eventlet/greenpool.py", line 80, in _spawn_n_impl
    func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 567, in process_request
    proto = self.protocol(socket, address, self)
  File "/usr/lib64/python2.7/SocketServer.py", line 638, in __init__
    self.handle()
  File "/usr/lib64/python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 278, in handle_one_request
    self.handle_one_response()
  File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 382, in handle_one_response
    result = self.application(self.environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 130, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 195, in call_func
    return self.func(req, *args, **kwargs)
  File "/home/berrange/src/cloud/glance/glance/common/wsgi.py", line 335, in __call__
    response = req.get_response(self.application)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1296, in send
    application, catch_exc_info=False)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1260, in call_application
    app_iter = application(self.environ, start_response)
  File "/home/berrange/src/cloud/python-keystoneclient/keystoneclient/middleware/auth_token.py", line 348, in __call__
    return self.app(env, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 130, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 195, in call_func
    return self.func(req, *args, **kwargs)
  File "/home/berrange/src/cloud/glance/glance/common/wsgi.py", line 335, in __call__
    response = req.get_response(self.application)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1296, in send
    application, catch_exc_info=False)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1260, in call_application
    app_iter = application(self.environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 130, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 195, in call_func
    return self.func(req, *args, **kwargs)
  File "/home/berrange/src/cloud/glance/glance/common/wsgi.py", line 335, in __call__
    response = req.get_response(self.application)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1296, in send...

Read more...

Revision history for this message
Russell Bryant (russellb) wrote :

Setting "workers = 0" in glance-api.conf is a workaround for this issue in test environments.

The attached patch seems to fix it for me. Can someone else verify? Then I'll submit it if it seems ok ...

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Russell,

Patch works like a charm. please go ahead and submit

-- dims

Changed in glance:
importance: Undecided → High
assignee: nobody → Russell Bryant (russellb)
Revision history for this message
Mark Vanderwiel (vanderwl) wrote :

Russell, I can also verify that your patch worked. But I have a question on the patch approach. Why not make the simple change to have Glance use the new common notifier code base to be more consistent with the other projects? I see the glance/openstack/common/notifier looks like it's ready to go, but I don't see the glance/openstack/common/rpc included yet, so that would also be needed. Or are there plans to do this for Havana?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance (master)

Reviewed: https://review.openstack.org/22206
Committed: http://github.com/openstack/glance/commit/60be7133544afa5e63c0b742d31b9998688530b6
Submitter: Jenkins
Branch: master

commit 60be7133544afa5e63c0b742d31b9998688530b6
Author: Russell Bryant <email address hidden>
Date: Fri Feb 15 17:50:02 2013 -0500

    Create connection for each qpid notification.

    Previously the qpid notifier created a connection to the qpid broker
    when the notifier was initialized and reused that for each notification.
    This is broken when API workers are used. Instead, just create a new
    connection on demand when a notification gets sent out.

    Fix bug 1100317.

    Change-Id: I7a17e5e4bd8efd771ff8cd0e1d04107a79789763

Changed in glance:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in glance:
milestone: none → grizzly-3
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.