I saw this on a change I'm working on to fix the ceph job on master:
https://review.openstack.org/563238
where the ceph job failed with this trace [0]:
Apr 20 19:47:40.765248 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Apr 20 19:47:40.765384 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
Apr 20 19:47:40.765508 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Apr 20 19:47:40.765633 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
Apr 20 19:47:40.765780 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
Apr 20 19:47:40.765917 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
Apr 20 19:47:40.766046 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Apr 20 19:47:40.766174 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "<decorator-gen-242>", line 2, in delete_volume
Apr 20 19:47:40.766294 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/coordination.py", line 151, in _synchronized
Apr 20 19:47:40.766413 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server return f(*a, **k)
Apr 20 19:47:40.766539 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "<decorator-gen-241>", line 2, in delete_volume
Apr 20 19:47:40.766658 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/objects/cleanable.py", line 207, in wrapper
Apr 20 19:47:40.766788 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server result = f(*args, **kwargs)
Apr 20 19:47:40.766923 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/manager.py", line 872, in delete_volume
Apr 20 19:47:40.767054 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server self.publish_service_capabilities(context)
Apr 20 19:47:40.767168 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/manager.py", line 2527, in publish_service_capabilities
Apr 20 19:47:40.767314 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server self._report_driver_status(context)
Apr 20 19:47:40.767487 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/manager.py", line 2424, in _report_driver_status
Apr 20 19:47:40.767611 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server volume_stats = self.driver.get_volume_stats(refresh=True)
Apr 20 19:47:40.767747 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/drivers/rbd.py", line 528, in get_volume_stats
Apr 20 19:47:40.767877 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server self._update_volume_stats()
Apr 20 19:47:40.768006 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/drivers/rbd.py", line 514, in _update_volume_stats
Apr 20 19:47:40.768124 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server total_gbi = self._get_usage_info()
Apr 20 19:47:40.768250 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/drivers/rbd.py", line 408, in _get_usage_info
Apr 20 19:47:40.768375 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server ioctx=client.ioctx) as v:
Apr 20 19:47:40.768495 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/drivers/rbd.py", line 147, in __init__
Apr 20 19:47:40.768647 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server read_only=read_only)
Apr 20 19:47:40.768777 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server File "rbd.pyx", line 1392, in rbd.Image.__init__ (/build/ceph-7hHYeL/ceph-12.2.4/obj-x86_64-linux-gnu/src/pybind/rbd/pyrex/rbd.c:13545)
Apr 20 19:47:40.768897 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server ImageNotFound: [errno 2] error opening image volume-ac0586ce-26c6-4e04-a4d6-78a941a93ad7 at snapshot None
Apr 20 19:47:40.769016 ubuntu-xenial-ovh-bhs1-0003655341 cinder-volume[2422]: ERROR oslo_messaging.rpc.server
I found an old related bug:
https://bugs.launchpad.net/cinder/+bug/1698786
that was fixed in the past by:
https://review.openstack.org/475400
but then I found that this later change re-introduced the problem:
https://review.openstack.org/486734/4/cinder/volume/drivers/rbd.py@381
Chatting on IRC with e0ne, he pointed out this is likely a race:
[13:50:31] <e0ne> do you see this error on each patch?
[13:50:49] <melwitt> no, it's intermittent
[13:55:27] <e0ne> it could be some race between delete_volume and get_pool_stats calls
[13:56:56] <melwitt> ah yeah, that would make sense
[13:57:34] <e0ne> 19:47:40.530030 - deleting volume
[13:57:53] <e0ne> 19:47:40.765109 - ImageNotFound exception raised
[14:00:50] <melwitt> yeah. definitely makes sense to be resilient there because all it's doing is listing images, then assuming each item in the list exists, when really it could be deleted out from underneath
[14:02:22] <melwitt> between the RBDProxy().list and the RBDVolumeProxy()
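To illustrate the race e0ne describes, here is a small runnable sketch of the resilient pattern: list the pool, open each image, and skip any image that was deleted between the list and the open. The `ImageNotFound` class and `FakePool` below are stand-ins for the real rbd bindings, so the names are illustrative, not the actual driver code or the proposed fix.

```python
class ImageNotFound(Exception):
    """Stand-in for rbd.ImageNotFound ([errno 2] opening a deleted image)."""


class FakePool:
    """Minimal stand-in for an RBD pool: list image names, open by name."""

    def __init__(self, sizes, deleted=()):
        self._sizes = dict(sizes)     # name -> provisioned bytes
        self._deleted = set(deleted)  # still listed, but gone by open() time

    def list(self):
        # A listing can include images that a concurrent delete_volume
        # removes before we get around to opening them -- that's the race.
        return list(self._sizes) + list(self._deleted)

    def open(self, name):
        if name in self._deleted:
            raise ImageNotFound(name)
        return self._sizes[name]


def get_usage_info(pool):
    """Sum provisioned bytes, tolerating images deleted mid-iteration."""
    total = 0
    for name in pool.list():
        try:
            total += pool.open(name)
        except ImageNotFound:
            # Deleted between list() and open(); skip it instead of
            # letting the periodic stats task blow up.
            continue
    return total
```

The key point is that the except clause belongs around the per-image open, not around the whole loop, so one vanished image doesn't abort the entire stats collection.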
[0] http://logs.openstack.org/38/563238/1/check/legacy-tempest-dsvm-full-devstack-plugin-ceph/e42009b/logs/screen-c-vol.txt.gz?level=TRACE#_Apr_20_19_47_40_765248
Fix is proposed here: https://review.openstack.org/563313