nova.tests.functional.notification_sample_tests.test_instance.TestInstanceNotificationSample.test_volume_swap_server_with_error intermittently fails with IndexError

Bug #1814177 reported by Matt Riedemann
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Matt Riedemann
Rocky
Fix Committed
Medium
Matt Riedemann

Bug Description

Seen here:

http://logs.openstack.org/86/633086/1/gate/nova-tox-functional/8a2c597/job-output.txt.gz#_2019-01-31_20_37_18_577225

2019-01-31 20:37:18.641058 | ubuntu-xenial | 2019-01-31 20:37:17,911 ERROR [nova.compute.manager] Failed to connect to volume 9c6d9c2d-7a8f-4c80-938d-3bf062b8d489 with volume at /dev/sdb
2019-01-31 20:37:18.641191 | ubuntu-xenial | Traceback (most recent call last):
2019-01-31 20:37:18.641370 | ubuntu-xenial | File "nova/compute/manager.py", line 5710, in _swap_volume
2019-01-31 20:37:18.641508 | ubuntu-xenial | bdm, new_attachment_id, mountpoint)
2019-01-31 20:37:18.641707 | ubuntu-xenial | File "nova/compute/manager.py", line 5697, in _init_volume_connection
2019-01-31 20:37:18.641850 | ubuntu-xenial | new_cinfo['serial'] = new_volume_id
2019-01-31 20:37:18.642025 | ubuntu-xenial | TypeError: 'tuple' object does not support item assignment
2019-01-31 20:37:18.642275 | ubuntu-xenial | 2019-01-31 20:37:18,369 ERROR [oslo_messaging.rpc.server] Exception during message handling
2019-01-31 20:37:18.642402 | ubuntu-xenial | Traceback (most recent call last):
2019-01-31 20:37:18.642783 | ubuntu-xenial | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
2019-01-31 20:37:18.642931 | ubuntu-xenial | res = self.dispatcher.dispatch(message)
2019-01-31 20:37:18.643307 | ubuntu-xenial | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2019-01-31 20:37:18.643490 | ubuntu-xenial | return self._do_dispatch(endpoint, method, ctxt, args)
2019-01-31 20:37:18.643874 | ubuntu-xenial | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2019-01-31 20:37:18.644007 | ubuntu-xenial | result = func(ctxt, **new_args)
2019-01-31 20:37:18.644175 | ubuntu-xenial | File "nova/exception_wrapper.py", line 79, in wrapped
2019-01-31 20:37:18.644314 | ubuntu-xenial | function_name, call_dict, binary, tb)
2019-01-31 20:37:18.644739 | ubuntu-xenial | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-01-31 20:37:18.644850 | ubuntu-xenial | self.force_reraise()
2019-01-31 20:37:18.645209 | ubuntu-xenial | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-01-31 20:37:18.645365 | ubuntu-xenial | six.reraise(self.type_, self.value, self.tb)
2019-01-31 20:37:18.645532 | ubuntu-xenial | File "nova/exception_wrapper.py", line 69, in wrapped
2019-01-31 20:37:18.645668 | ubuntu-xenial | return f(self, context, *args, **kw)
2019-01-31 20:37:18.645864 | ubuntu-xenial | File "nova/compute/utils.py", line 1141, in decorated_function
2019-01-31 20:37:18.646051 | ubuntu-xenial | return function(self, context, *args, **kwargs)
2019-01-31 20:37:18.646263 | ubuntu-xenial | File "nova/compute/manager.py", line 216, in decorated_function
2019-01-31 20:37:18.646406 | ubuntu-xenial | kwargs['instance'], e, sys.exc_info())
2019-01-31 20:37:18.646755 | ubuntu-xenial | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-01-31 20:37:18.646864 | ubuntu-xenial | self.force_reraise()
2019-01-31 20:37:18.647224 | ubuntu-xenial | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-01-31 20:37:18.647379 | ubuntu-xenial | six.reraise(self.type_, self.value, self.tb)
2019-01-31 20:37:18.647564 | ubuntu-xenial | File "nova/compute/manager.py", line 204, in decorated_function
2019-01-31 20:37:18.647721 | ubuntu-xenial | return function(self, context, *args, **kwargs)
2019-01-31 20:37:18.647894 | ubuntu-xenial | File "nova/compute/manager.py", line 5868, in swap_volume
2019-01-31 20:37:18.647997 | ubuntu-xenial | is_cinder_migration)
2019-01-31 20:37:18.648170 | ubuntu-xenial | File "nova/compute/manager.py", line 5766, in _swap_volume
2019-01-31 20:37:18.648288 | ubuntu-xenial | context, new_attachment_id)
2019-01-31 20:37:18.648676 | ubuntu-xenial | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-01-31 20:37:18.648784 | ubuntu-xenial | self.force_reraise()
2019-01-31 20:37:18.649144 | ubuntu-xenial | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-01-31 20:37:18.649298 | ubuntu-xenial | six.reraise(self.type_, self.value, self.tb)
2019-01-31 20:37:18.649473 | ubuntu-xenial | File "nova/compute/manager.py", line 5710, in _swap_volume
2019-01-31 20:37:18.649606 | ubuntu-xenial | bdm, new_attachment_id, mountpoint)
2019-01-31 20:37:18.649809 | ubuntu-xenial | File "nova/compute/manager.py", line 5697, in _init_volume_connection
2019-01-31 20:37:18.649966 | ubuntu-xenial | new_cinfo['serial'] = new_volume_id
2019-01-31 20:37:18.650149 | ubuntu-xenial | TypeError: 'tuple' object does not support item assignment
2019-01-31 20:37:18.650208 | ubuntu-xenial |
2019-01-31 20:37:18.650254 | ubuntu-xenial |
2019-01-31 20:37:18.650337 | ubuntu-xenial | Captured traceback:
2019-01-31 20:37:18.650423 | ubuntu-xenial | ~~~~~~~~~~~~~~~~~~~
2019-01-31 20:37:18.650545 | ubuntu-xenial | Traceback (most recent call last):
2019-01-31 20:37:18.650846 | ubuntu-xenial | File "nova/tests/functional/notification_sample_tests/test_instance.py", line 1443, in test_volume_swap_server_with_error
2019-01-31 20:37:18.651028 | ubuntu-xenial | tb2 = fake_notifier.VERSIONED_NOTIFICATIONS[7]['payload'][
2019-01-31 20:37:18.651153 | ubuntu-xenial | IndexError: list index out of range

The TypeError is expected, it's part of the swap.volume.error notification test. The bug is the IndexError in the expected notifications. We should probably debug that with a patch to check the length and dump the notifications if the length isn't what we expect.

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22tb2%20%3D%20fake_notifier.VERSIONED_NOTIFICATIONS%5B7%5D%5B'payload'%5D%5B%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=10d

6 hits in 10 days, check and gate, all failures.

Tags: gate-failure
Revision history for this message
Matt Riedemann (mriedem) wrote :

Looks like we already hit this once with bug 1704392.

Revision history for this message
Matt Riedemann (mriedem) wrote :

The difference since that bug was fixed is the check here:

https://review.openstack.org/#/c/564092/16/nova/tests/functional/notification_sample_tests/test_instance.py@1348

We're already checking the size of the list, but we are accepting anything less than or equal to 7:

        # Seven versioned notifications are generated. We only rely on the
        # first six because _wait_until_swap_volume_error will return True
        # after volume_api.unreserve is called on the cinder fixture, and that
        # happens before the instance fault is handled in the compute manager
        # which generates the last notification (compute.exception).
        # 0. instance-create-start
        # 1. instance-create-end
        # 2. instance-update
        # 3. instance-volume_attach-start
        # 4. instance-volume_attach-end
        # 5. instance-volume_swap-start
        # 6. instance-volume_swap-error
        # 7. compute.exception
        self.assertLessEqual(7, len(fake_notifier.VERSIONED_NOTIFICATIONS),
                             'Unexpected number of versioned notifications. '
                             'Got: %s' % fake_notifier.VERSIONED_NOTIFICATIONS)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/634334

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/634334
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=03e4e3ce1357123f9f53a3901a17a64f160653a4
Submitter: Zuul
Branch: master

commit 03e4e3ce1357123f9f53a3901a17a64f160653a4
Author: Matt Riedemann <email address hidden>
Date: Thu Jan 31 17:36:47 2019 -0500

    Fix race in test_volume_swap_server_with_error

    Change I03c8e8225e51fd80580772752c0b292987e34218 added
    another notification to the list checked in
    test_volume_swap_server_with_error and then change
    Id587967ea4f9980c292492e2f659bf55fb037b28 relied on
    checking the last notification in the list (compute.exception)
    but per the note in the test, that last exception can
    race to show up by the time we check the size of the
    notification list which can result in an IndexError.

    This change fixes the race by just waiting for the
    compute.exception notification rather than the racy
    _wait_until_swap_volume_error which depended on the
    cinder fixture swap_error variable (which is removed
    in this change).

    Change-Id: I8165fcd98e11b9155640559a36ae90a38f63c0dd
    Closes-Bug: #1814177

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/640595

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/640595
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=40771bc155fe28977c805aed5f60b5d84b1ac2bc
Submitter: Zuul
Branch: stable/rocky

commit 40771bc155fe28977c805aed5f60b5d84b1ac2bc
Author: Matt Riedemann <email address hidden>
Date: Thu Jan 31 17:36:47 2019 -0500

    Fix race in test_volume_swap_server_with_error

    Change I03c8e8225e51fd80580772752c0b292987e34218 added
    another notification to the list checked in
    test_volume_swap_server_with_error and then change
    Id587967ea4f9980c292492e2f659bf55fb037b28 relied on
    checking the last notification in the list (compute.exception)
    but per the note in the test, that last exception can
    race to show up by the time we check the size of the
    notification list which can result in an IndexError.

    This change fixes the race by just waiting for the
    compute.exception notification rather than the racy
    _wait_until_swap_volume_error which depended on the
    cinder fixture swap_error variable (which is removed
    in this change).

    Change-Id: I8165fcd98e11b9155640559a36ae90a38f63c0dd
    Closes-Bug: #1814177
    (cherry picked from commit 03e4e3ce1357123f9f53a3901a17a64f160653a4)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.0

This issue was fixed in the openstack/nova 18.2.0 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.