Failed to stop nova-api in grenade tests

Bug #1538204 reported by Thomas Herve
This bug affects 1 person
Affects                   Status   Importance  Assigned to  Milestone
OpenStack Compute (nova)  Invalid  Undecided   Unassigned
oslo.service              Invalid  Critical    Unassigned

Bug Description

Saw this during a grenade run:

2016-01-26 16:12:58.553 22016 ERROR oslo_service.threadgroup File "/usr/local/lib/python2.7/dist-packages/oslo_service/service.py", line 143, in clear
2016-01-26 16:12:58.553 22016 ERROR oslo_service.threadgroup for sig in self._signal_handlers:
2016-01-26 16:12:58.553 22016 ERROR oslo_service.threadgroup RuntimeError: dictionary changed size during iteration

(From http://logs.openstack.org/25/272425/1/gate/gate-grenade-dsvm-heat/b32eda2/).

This may be due to a change in oslo, but the error is in the old process, so I'm not sure it should be using that change.

Thomas Herve (therve) wrote :

Ah, in fact it's possible that this is https://bugs.launchpad.net/oslo.service/+bug/1524907

Victor Stinner (vstinner) wrote :

I looked at the traceback. I can't work out exactly why and how the bug was triggered, but it really looks like a reentrant call to the _sigterm() method. A reentrant call to _sigterm() means a reentrant call to clear(), which can explain the RuntimeError exception: the second call modifies the dictionary while the first call is still iterating over it.

If it is a reentrant call, the second (reentrant) call has already finished by the time the RuntimeError is raised in the first call, so the reentrant call cannot appear in the traceback.
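
The reentrancy scenario described above can be illustrated with a toy class. This is a minimal sketch, not oslo.service code: `ToySignalHandler` and `_simulate_signal` are hypothetical names, and the nested `self.clear()` call stands in for a second SIGTERM arriving while the first `clear()` is still looping.

```python
class ToySignalHandler:
    """Hypothetical stand-in for an object that tracks signal handlers."""

    def __init__(self):
        self._signal_handlers = {"SIGTERM": [], "SIGHUP": [], "SIGINT": []}
        # Assumption for the demo: pretend one signal arrives mid-loop.
        self._simulate_signal = True

    def clear(self):
        for sig in self._signal_handlers:
            if self._simulate_signal:
                self._simulate_signal = False
                # Reentrant call: runs to completion and empties the dict
                # while the outer loop is still iterating over it.
                self.clear()
        self._signal_handlers.clear()


def demo():
    """Return the error message raised by the outer clear() call."""
    handler = ToySignalHandler()
    try:
        handler.clear()
    except RuntimeError as exc:
        return str(exc)
    return None
```

Here `demo()` returns the same message as in the log above. The exception is raised in the outer `clear()` frame after the nested call has returned, which matches the point that the reentrant call would not show up in the traceback.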

I proposed the change https://review.openstack.org/#/c/272718/ which may (or may not) fix this bug.

Victor Stinner (vstinner) wrote :

To reproduce the bug, try sending the same signal twice in quick succession. SIGTERM is a good candidate: its signal handler calls clear(), and clear() modifies the dictionary of signal handlers, which could explain the bug.

Reproducing the bug would help to understand it and also to test my change.
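
A reproduction attempt along these lines could use a small helper like the one below. It is only a sketch: `send_signal_twice` is a hypothetical helper, the target PID has to be supplied by hand, and nothing here guarantees that the second signal lands while the first handler is still running.

```python
import os
import signal
import time


def send_signal_twice(pid, sig=signal.SIGTERM, delay=0.0):
    """Send `sig` to process `pid` twice, optionally pausing in between."""
    os.kill(pid, sig)
    if delay:
        time.sleep(delay)
    os.kill(pid, sig)
```

For example, `send_signal_twice(pid)` with the PID of the service's parent process sends two back-to-back SIGTERMs; varying `delay` may help to hit the timing window.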

Marian Horban (mhorban) wrote :

Why can't we just add locks in oslo_service.threadgroup to avoid changing the dict during iteration?
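
For comparison with the lock idea, one lock-free way to avoid this particular error is to iterate over a snapshot of the keys. The sketch below is an assumption for illustration, not what oslo.service does: `SnapshotSignalHandler`, `_simulate_signal` and `reset` are hypothetical names, and the nested `self.clear()` again stands in for a signal arriving mid-loop.

```python
class SnapshotSignalHandler:
    """Hypothetical handler registry whose clear() tolerates reentrancy."""

    def __init__(self):
        self._signal_handlers = {"SIGTERM": [], "SIGHUP": [], "SIGINT": []}
        # Assumption for the demo: pretend one signal arrives mid-loop.
        self._simulate_signal = True
        # Records which signals were visited, for inspection only.
        self.reset = []

    def clear(self):
        # list(...) copies the keys, so the loop no longer depends on the
        # live dictionary that a nested call may empty.
        for sig in list(self._signal_handlers):
            self.reset.append(sig)
            if self._simulate_signal:
                self._simulate_signal = False
                self.clear()  # simulated reentrant call
        self._signal_handlers.clear()
```

With the snapshot, the outer loop finishes without a RuntimeError even though the nested call emptied the dictionary. Whether that is sufficient for the real shutdown path is not established by this sketch.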

Sean Dague (sdague) wrote :

This is definitely a core oslo.service issue with shutting down, and it keeps tripping us up.

Changed in oslo.service:
status: New → Confirmed
importance: Undecided → Critical
Changed in nova:
status: New → Invalid
Davanum Srinivas (DIMS) (dims-v) wrote :

The logstash query message:"dictionary changed size during iteration" returned 16 hits in the last week. Only one of them is a FAILURE, and it is unrelated to this bug.

-- Dims

Changed in oslo.service:
status: Confirmed → Invalid