Periodic ValueErrors when using qpid

Bug #1185465 reported by David Ripton
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Low
David Ripton
Grizzly
Fix Released
Low
Unassigned

Bug Description

When using qpid, I see periodic tracebacks like this in various nova logs:

2013-05-26 03:56:03.821 2785 INFO nova.service [-] Caught SIGTERM, exiting
2013-05-26 03:56:03.831 2785 CRITICAL nova [-] need more than 0 values to unpack
2013-05-26 03:56:03.831 2785 TRACE nova Traceback (most recent call last):
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/bin/nova-compute", line 85, in <module>
2013-05-26 03:56:03.831 2785 TRACE nova service.wait()
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/nova/service.py", line 69
6, in wait
2013-05-26 03:56:03.831 2785 TRACE nova _launcher.wait()
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/nova/service.py", line 21
9, in wait
2013-05-26 03:56:03.831 2785 TRACE nova rpc.cleanup()
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc
/__init__.py", line 240, in cleanup
2013-05-26 03:56:03.831 2785 TRACE nova return _get_impl().cleanup()
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc
/impl_qpid.py", line 649, in cleanup
2013-05-26 03:56:03.831 2785 TRACE nova return rpc_amqp.cleanup(Connection.pool)
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc
/amqp.py", line 670, in cleanup
2013-05-26 03:56:03.831 2785 TRACE nova connection_pool.empty()
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc
/amqp.py", line 80, in empty
2013-05-26 03:56:03.831 2785 TRACE nova self.get().close()
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc
/impl_qpid.py", line 386, in close
2013-05-26 03:56:03.831 2785 TRACE nova self.connection.close()
2013-05-26 03:56:03.831 2785 TRACE nova File "<string>", line 6, in close
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.
py", line 316, in close
2013-05-26 03:56:03.831 2785 TRACE nova ssn.close(timeout=timeout)
2013-05-26 03:56:03.831 2785 TRACE nova File "<string>", line 6, in close
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 749, in close
2013-05-26 03:56:03.831 2785 TRACE nova if not self._ewait(lambda: self.closed, timeout=timeout):
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 566, in _ewait
2013-05-26 03:56:03.831 2785 TRACE nova result = self.connection._ewait(lambda: self.error or predicate(), timeout)
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 208, in _ewait
2013-05-26 03:56:03.831 2785 TRACE nova result = self._wait(lambda: self.error or predicate(), timeout)
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 193, in _wait
2013-05-26 03:56:03.831 2785 TRACE nova return self._waiter.wait(predicate, timeout=timeout)
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/qpid/concurrency.py", line 57, in wait
2013-05-26 03:56:03.831 2785 TRACE nova self.condition.wait(3)
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/qpid/concurrency.py", line 96, in wait
2013-05-26 03:56:03.831 2785 TRACE nova sw.wait(timeout)
2013-05-26 03:56:03.831 2785 TRACE nova File "/usr/lib/python2.6/site-packages/qpid/compat.py", line 53, in wait
2013-05-26 03:56:03.831 2785 TRACE nova ready, _, _ = select([self], [], [], timeout)
2013-05-26 03:56:03.831 2785 TRACE nova ValueError: need more than 0 values to unpack
2013-05-26 03:56:03.831 2785 TRACE nova

Revision history for this message
David Ripton (dripton) wrote :

The actual bug may be in the python-qpid package, not in Nova proper. It's unpacking the return value from select(), which it expects to have length 3, but which actually has length 1 for some reason.

Revision history for this message
David Ripton (dripton) wrote :

Also seen in http://lists.openstack.org/pipermail/openstack-dev/2013-January/004740.html

I'm not sure why select() is ever returning the wrong number of items, but it would be simple enough to check the length of the return value.

The code for python-qpid is inside the python directory in the main Apache Qpid svn repository, not a separate project. (Though some distributions package it separately.)

Revision history for this message
Russell Bryant (russellb) wrote :

Note that it's actually select() from eventlet, since it's monkey_patched

Revision history for this message
Russell Bryant (russellb) wrote :

I would actually start by looking into the eventlet code to see if you can find where it be returning nothing.

Revision history for this message
David Ripton (dripton) wrote :

eventlet.green.select's select only explicitly returns hub.switch(), which is Magic. It can implicitly return None by falling off the end of the function, but that first invokes two finally blocks that remove all listeners and timers.

Basically, it looks like we're getting an exception from trying to use a dead socket during cleanup.

Getting cleanup timing perfect and race-free across Nova, Oslo, python-qpid, and eventlet is very difficult, and not really worth a lot of effort, since you're shutting down that connection anyway, and software can *never* guarantee clean shutdown. (No matter how clever your cleanup logic, if I cut power to your CPU before it's done running, you still lose. Real cleanup has to happen after the next restart, not on shutdown. Once you're done that, you might as well just always crash hard rather than even trying to clean up on shutdown. http://en.wikipedia.org/wiki/Crash-only_software )

So, I think the correct solution is to squelch exceptions during shutdown rather than logging them. I will propose a patch.

David Ripton (dripton)
Changed in nova:
assignee: nobody → David Ripton (dripton)
status: New → In Progress
importance: Undecided → Low
Revision history for this message
David Ripton (dripton) wrote :

The change is in oslo-incubator: 15d8d698b7c67

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/31496

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/31496
Committed: http://github.com/openstack/nova/commit/361d116d820939207a0b01399c3ce50c7bc30dcd
Submitter: Jenkins
Branch: master

commit 361d116d820939207a0b01399c3ce50c7bc30dcd
Author: David Ripton <email address hidden>
Date: Mon Jun 3 09:23:13 2013 -0400

    Silence exceptions from qpid connection.close() (from oslo)

    This is commit 15d8d698b7 in oslo-incubator.

    Fixes bug 1185465

    Don't log exceptions that happen during connection.close(). They're
    not surprising (a closed socket can't be read or written), don't
    require sysadmin attention, and provide red herrings in the logs
    to distract from finding real root causes of problems.

    Change-Id: Ia0e25fbc11094b5ad5d6d2cf7d0e786acb3df1f6

Changed in nova:
status: In Progress → Fix Committed
tags: added: grizzly-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/grizzly)

Fix proposed to branch: stable/grizzly
Review: https://review.openstack.org/33997

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/grizzly)

Reviewed: https://review.openstack.org/33997
Committed: http://github.com/openstack/nova/commit/19fbc29754f72de6c178ba008f2b330ac24ef1c7
Submitter: Jenkins
Branch: stable/grizzly

commit 19fbc29754f72de6c178ba008f2b330ac24ef1c7
Author: David Ripton <email address hidden>
Date: Mon Jun 3 09:23:13 2013 -0400

    Silence exceptions from qpid connection.close() (from oslo)

    This is commit 15d8d698b7 in oslo-incubator.

    Fixes bug 1185465

    Don't log exceptions that happen during connection.close(). They're
    not surprising (a closed socket can't be read or written), don't
    require sysadmin attention, and provide red herrings in the logs
    to distract from finding real root causes of problems.

    Change-Id: Ia0e25fbc11094b5ad5d6d2cf7d0e786acb3df1f6
    (cherry picked from commit 361d116d820939207a0b01399c3ce50c7bc30dcd)

tags: added: in-stable-grizzly
Thierry Carrez (ttx)
Changed in nova:
milestone: none → havana-2
status: Fix Committed → Fix Released
Alan Pevec (apevec)
tags: removed: grizzly-backport-potential in-stable-grizzly
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-2 → 2013.2
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.