volume-backed resize intermittently fails with rbd imagebackend

Bug #1580625 reported by Matt Riedemann
This bug affects 2 people
Affects                   Status        Importance  Assigned to      Milestone
OpenStack Compute (nova)  Fix Released  High        Nicolas Simonds
Liberty                   Fix Released  High        Matt Riedemann
Mitaka                    Fix Released  High        Matt Riedemann

Bug Description

After this Tempest change landed to test volume-backed resize:

https://review.openstack.org/#/c/314816/

The ceph plugin job has been intermittently failing the test:

http://logs.openstack.org/00/314600/3/check/gate-tempest-dsvm-full-devstack-plugin-ceph/9dad224/logs/screen-n-cpu.txt.gz?level=TRACE#_2016-05-11_12_14_26_874

2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server [req-1eef0755-362f-4136-ab7c-695b0bb4f0bb tempest-TestServerAdvancedOps-1194878159 tempest-TestServerAdvancedOps-1912438683] Exception during handling message
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 153, in dispatch
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 122, in _do_dispatch
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/exception.py", line 110, in wrapped
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server payload)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 221, in __exit__
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server self.force_reraise()
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 197, in force_reraise
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/exception.py", line 89, in wrapped
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/manager.py", line 232, in decorated_function
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/manager.py", line 210, in decorated_function
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 221, in __exit__
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server self.force_reraise()
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 197, in force_reraise
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/manager.py", line 198, in decorated_function
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/manager.py", line 3338, in confirm_resize
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server do_confirm_resize(context, instance, migration.id)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/manager.py", line 3336, in do_confirm_resize
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server migration=migration)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/manager.py", line 3362, in _confirm_resize
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server network_info)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 7288, in confirm_migration
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server self._cleanup_resize(instance, network_info)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1061, in _cleanup_resize
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server ignore_errors=True)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/virt/libvirt/imagebackend.py", line 903, in remove_snap
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server return self.driver.remove_snap(self.rbd_name, name, ignore_errors)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/virt/libvirt/storage/rbd_utils.py", line 396, in remove_snap
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server with RBDVolumeProxy(self, str(volume), pool=pool) as vol:
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/virt/libvirt/storage/rbd_utils.py", line 65, in __init__
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server driver._disconnect_from_rados(client, ioctx)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 221, in __exit__
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server self.force_reraise()
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 197, in force_reraise
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/virt/libvirt/storage/rbd_utils.py", line 61, in __init__
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server read_only=read_only)
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/rbd.py", line 374, in __init__
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server raise make_ex(ret, 'error opening image %s at snapshot %s' % (name, snapshot))
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server ImageNotFound: error opening image 14decc0d-ac5a-4c29-95cd-a8b1bd48e5e8_disk at snapshot None
2016-05-11 12:14:26.874 17925 ERROR oslo_messaging.rpc.server

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ImageNotFound%3A%20error%20opening%20image%5C%22%20AND%20message%3A%5C%22at%20snapshot%20None%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d

So far there are 29 hits in the 24 hours since the Tempest change merged. Because the test lives in Tempest, it is hitting all branches of any project that runs the ceph job, which is at least nova, cinder, glance, glance-store, os-brick and gnocchi.

Matt Riedemann (mriedem) wrote :

Tempest change to skip the test for now based on this bug:

https://review.openstack.org/#/c/315106/

Lee Yarwood (lyarwood) wrote :

Drive-by comment, but AFAICT the following change is responsible, as it attempts to remove a snapshot for a root rbd image that doesn't exist in the boot-from-rbd-volume use case now tested by Tempest:

libvirt: Fix/implement revert-resize for RBD-backed images
https://review.openstack.org/#/c/187395/
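The traceback in the description suggests why passing ignore_errors=True did not help: the exception is raised while the image is being opened (in RBDVolumeProxy.__init__), before any snapshot removal is attempted. Below is a minimal, self-contained sketch of that failure mode. FakeCluster, FakeVolumeProxy, this remove_snap, the image name and the snapshot name are all hypothetical stand-ins, not nova's rbd_utils or the rbd bindings, and the behavior for a missing snapshot is an assumption.

```python
class ImageNotFound(Exception):
    """Stand-in for the rbd library's ImageNotFound."""


class FakeCluster:
    """Hypothetical in-memory pool: image name -> set of snapshot names."""

    def __init__(self, images=None):
        self.images = dict(images or {})


class FakeVolumeProxy:
    """Opens an image on construction; raises if the image is absent."""

    def __init__(self, cluster, name):
        if name not in cluster.images:
            raise ImageNotFound(
                "error opening image %s at snapshot None" % name)
        self.snaps = cluster.images[name]

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False


def remove_snap(cluster, image, snap, ignore_errors=False):
    # The image is opened first. ignore_errors only covers the
    # "snapshot is missing" case handled inside the with block.
    with FakeVolumeProxy(cluster, image) as vol:
        if snap in vol.snaps:
            vol.snaps.remove(snap)
        elif not ignore_errors:
            raise LookupError("no snapshot %s on %s" % (snap, image))


def demo():
    # Volume-backed instance: the pool holds no root disk image at all.
    cluster = FakeCluster()
    try:
        remove_snap(cluster, "instance-uuid_disk", "resize-snap",
                    ignore_errors=True)
    except ImageNotFound as exc:
        return str(exc)
    return None
```

In this sketch, demo() returns the error message instead of None, because the open fails before the ignore_errors branch is ever reached.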

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/315187

Changed in nova:
assignee: nobody → Nicolas Simonds (nicolas.simonds)
status: Confirmed → In Progress
Changed in nova:
assignee: Nicolas Simonds (nicolas.simonds) → melanie witt (melwitt)
melanie witt (melwitt)
Changed in nova:
assignee: melanie witt (melwitt) → Nicolas Simonds (nicolas.simonds)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/315596

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/315694

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/315187
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=671bb2651d9d1a07947690468825581d30482dcb
Submitter: Jenkins
Branch: master

commit 671bb2651d9d1a07947690468825581d30482dcb
Author: Nicolas Simonds <email address hidden>
Date: Wed May 11 10:52:52 2016 -0700

    imagebackend: Check that the RBD image exists before trying to cleanup

    In volume-backed setups, there is no image to clean up, so any
    attempts to cleanup the resize snapshots will fail by definition.
    Make sure the image exists first.

    Change-Id: I25f65bcc76b83f31a8fce77c2b751d2d167ffc7e
    Closes-Bug: 1580625
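
The guard described in the commit message can be sketched as follows. This is not the merged patch, only an illustration of the idea under stated assumptions: FakeRbdImage, cleanup_resize_snapshot and the "resize-snap" name are hypothetical, and only the check_image_exists() method name is taken from the backport note in this report.

```python
class FakeRbdImage:
    """Hypothetical stand-in for an RBD image backend object."""

    def __init__(self, exists):
        self._exists = exists
        self.removed = []

    def check_image_exists(self):
        return self._exists

    def remove_snap(self, name, ignore_errors=False):
        # Mirrors the reported failure: a missing image cannot be opened,
        # regardless of ignore_errors.
        if not self._exists:
            raise RuntimeError("error opening image")
        self.removed.append(name)


def cleanup_resize_snapshot(root_disk, snap_name="resize-snap"):
    """Remove the resize snapshot only if the root image exists.

    Returns True if removal was attempted, False if it was skipped.
    """
    if not root_disk.check_image_exists():
        # Volume-backed instance: there is no image, so nothing to clean up.
        return False
    root_disk.remove_snap(snap_name, ignore_errors=True)
    return True
```

With a volume-backed instance (exists=False) the cleanup is skipped instead of raising; with an image-backed instance the snapshot removal proceeds as before.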

Changed in nova:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/mitaka)

Reviewed: https://review.openstack.org/315596
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8b256fb50a0bb9c6ddcbff2099a744045688b652
Submitter: Jenkins
Branch: stable/mitaka

commit 8b256fb50a0bb9c6ddcbff2099a744045688b652
Author: Nicolas Simonds <email address hidden>
Date: Wed May 11 10:52:52 2016 -0700

    imagebackend: Check that the RBD image exists before trying to cleanup

    In volume-backed setups, there is no image to clean up, so any
    attempts to cleanup the resize snapshots will fail by definition.
    Make sure the image exists first.

    NOTE(mriedem): The backport has to add an additional mock that wasn't
    in the original change because on master the fake_imagebackend.Raw
    object wasn't used, simply a mock.Mock object which handled the
    check_image_exists() method call on the fake image backend.

    Change-Id: I25f65bcc76b83f31a8fce77c2b751d2d167ffc7e
    Closes-Bug: 1580625
    (cherry picked from commit 671bb2651d9d1a07947690468825581d30482dcb)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/liberty)

Reviewed: https://review.openstack.org/315694
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=743d5efccaa99e3b4873831a8f43c216a31c7113
Submitter: Jenkins
Branch: stable/liberty

commit 743d5efccaa99e3b4873831a8f43c216a31c7113
Author: Nicolas Simonds <email address hidden>
Date: Wed May 11 10:52:52 2016 -0700

    imagebackend: Check that the RBD image exists before trying to cleanup

    In volume-backed setups, there is no image to clean up, so any
    attempts to cleanup the resize snapshots will fail by definition.
    Make sure the image exists first.

    NOTE(mriedem): The test in the backport is slightly different from
    mitaka because of some minor things with stubs and config options.

    Change-Id: I25f65bcc76b83f31a8fce77c2b751d2d167ffc7e
    Closes-Bug: 1580625
    (cherry picked from commit 671bb2651d9d1a07947690468825581d30482dcb)
    (cherry picked from commit 8b256fb50a0bb9c6ddcbff2099a744045688b652)

Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/nova 14.0.0.0b1

This issue was fixed in the openstack/nova 14.0.0.0b1 development milestone.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 12.0.4

This issue was fixed in the openstack/nova 12.0.4 release.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 13.1.0

This issue was fixed in the openstack/nova 13.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/382030
Reason: This review is > 4 weeks without comment, and is not mergeable in its current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
