Nova fails to clean up objects in CEPH as resize operation fails

Bug #1830088 reported by Josue Palmerin
This bug affects 3 people
Affects             Status   Importance  Assigned to        Milestone
Mirantis OpenStack  Invalid  High        Oleksiy Molchanov

Bug Description

Description
===========
During a nova resize operation, a snapshot object is created in Ceph (rbd) and moved to its destination. When the resize fails for any reason, nova never cleans up this rados object, and every subsequent attempt to resize the instance fails.
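
The failure mode described above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: `FakeVolume`, `ImageExists` and `resize` are hypothetical stand-ins, not the real rbd bindings or nova code, and the sketch assumes the snapshot is only removed on the success path, as the report describes.

```python
RESIZE_SNAPSHOT_NAME = "nova-resize"  # snapshot name seen in the traceback


class ImageExists(Exception):
    """Hypothetical stand-in for the rbd ImageExists error."""


class FakeVolume:
    """Hypothetical in-memory stand-in for an rbd image."""

    def __init__(self, name):
        self.name = name
        self.snaps = set()

    def create_snap(self, snap):
        # Creating a snapshot whose name already exists is an error.
        if snap in self.snaps:
            raise ImageExists(
                "error creating snapshot %s from %s" % (snap, self.name))
        self.snaps.add(snap)


def resize(vol, fail=False):
    """Snapshot the disk, then optionally fail before any cleanup runs."""
    vol.create_snap(RESIZE_SNAPSHOT_NAME)
    if fail:
        raise RuntimeError("resize failed after the snapshot was created")
    vol.snaps.discard(RESIZE_SNAPSHOT_NAME)  # assumed: cleanup only on success


vol = FakeVolume("84e8d232-5451-46d1-82a5-873f74fd69c1_disk")

try:
    resize(vol, fail=True)
except RuntimeError:
    pass  # the first resize fails and leaves the snapshot behind

try:
    resize(vol)
    second_attempt = "ok"
except ImageExists as exc:
    second_attempt = str(exc)

print(second_attempt)
```

In this simulation the second attempt fails only because the first one left its snapshot behind, which matches the behaviour reported here.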

Steps to reproduce
==================
- Create an instance.
- Resize the instance to a flavor with a smaller disk.
- This causes the resize to fail.
- Attempt to resize the instance again.
- The resize operation fails with the following error:

Error resizing server
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4041, in finish_resize
    disk_info, image_meta)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4006, in _finish_resize
    old_instance_type)
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4001, in _finish_resize
    block_device_info, power_on)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 7423, in finish_migration
    fallback_from_host=migration.source_compute)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3194, in _create_image
    backend.create_snap(libvirt_utils.RESIZE_SNAPSHOT_NAME)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 940, in create_snap
    return self.driver.create_snap(self.rbd_name, name)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/storage/rbd_utils.py", line 397, in create_snap
    vol.create_snap(name)
  File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
    six.reraise(c, e, tb)
  File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/rbd.py", line 594, in create_snap
    raise make_ex(ret, 'error creating snapshot %s from %s' % (name, self.name))
ImageExists: error creating snapshot nova-resize from 84e8d232-5451-46d1-82a5-873f74fd69c1_disk

Expected result
===============
The instance is resized successfully.

Actual result
=============
The resize fails with the error above.

Workaround
==========
Manually remove the leftover rbd snapshot from the compute pool:
1. List the snapshots of the instance disk and look for nova-resize
   rbd snap ls -p compute 84e8d232-5451-46d1-82a5-873f74fd69c1_disk
2. Remove the nova-resize snapshot
   rbd snap rm compute/84e8d232-5451-46d1-82a5-873f74fd69c1_disk@nova-resize
3. Attempt to resize again.
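
The manual steps above could be scripted roughly as follows. This is a minimal sketch, not a tested tool: the helper names `workaround_commands` and `run_workaround` are hypothetical, the snapshot name `nova-resize` is taken from the traceback, and the pool name and instance UUID are the ones from this report. By default it only prints the commands (dry run).

```python
import subprocess

RESIZE_SNAPSHOT_NAME = "nova-resize"  # snapshot name seen in the traceback


def workaround_commands(pool, instance_uuid):
    """Build the rbd commands that list and remove the leftover snapshot."""
    image = "%s_disk" % instance_uuid
    return [
        ["rbd", "snap", "ls", "-p", pool, image],
        ["rbd", "snap", "rm",
         "%s/%s@%s" % (pool, image, RESIZE_SNAPSHOT_NAME)],
    ]


def run_workaround(pool, instance_uuid, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in workaround_commands(pool, instance_uuid):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)


# Dry run: shows what would be executed without touching the cluster.
run_workaround("compute", "84e8d232-5451-46d1-82a5-873f74fd69c1")
```

Keeping the dry run as the default makes it possible to review the exact snapshot that would be removed before running anything against the pool.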

Environment
===========

1. OpenStack Mitaka

2. Which storage type did you use?
   CEPH

3. Which networking type did you use?
Not relevant

Logs & Configs
==============
The above exception trace should give a good understanding of where the issue is occurring.

Changed in mos:
assignee: nobody → Oleksiy Molchanov (omolchanov)
Changed in mos:
importance: Undecided → High
status: New → Confirmed
Oleksiy Molchanov (omolchanov) wrote :

Cannot reproduce.

Changed in mos:
status: Confirmed → Incomplete
Oleksiy Molchanov (omolchanov) wrote :

Moving to invalid

Changed in mos:
status: Incomplete → Invalid
yasin (lachini) wrote :

I have the same problem in Queens.

Laurent Dumont (baconpackets) wrote :

We hit the same issue in Ocata.

- A failed resize left the VM in ERROR state because the new flavor was missing the Video Memory parameters (it's a Windows VM).
- After resetting the state of the VM and retrying the resize, the VM state changed to ERROR again with the "ImageExists" error message.

\nImageExists: error creating snapshot nova-resize from fb8f7f4a-0fcf-4603-8742-7212f718cc15_disk\n', 'message': 'ImageExists', 'created': '2020-06-03T16:01:44Z', 'code': 500} |

Deleted the extra snapshot from Ceph and all was okay.

sudo rbd ls -l nova-ssd | grep fb8f7f4a
fb8f7f4a-0fcf-4603-8742-7212f718cc15_disk 160G glance/412bf28a-c547-494d-a236-b3a85d4bf177@snap 2
fb8f7f4a-0fcf-4603-8742-7212f718cc15_disk@nova-resize 136G glance/412bf28a-c547-494d-a236-

sudo rbd snap rm nova-ssd/fb8f7f4a-0fcf-4603-8742-7212f718cc15_disk@nova-resize

sudo rbd ls -l nova-ssd | grep fb8f7f4a
fb8f7f4a-0fcf-4603-8742-7212f718cc15_disk 160G glance/412bf28a-c547-494d-a236-b3a85d4bf177@snap 2
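
The check above can also be done programmatically. The following is a rough sketch that assumes the `rbd ls -l` output has the image or snapshot name in the first whitespace-separated column, as in the listing above; `stale_resize_snapshots` is a hypothetical helper name, and the sample text is copied from that listing (including its truncated second line).

```python
def stale_resize_snapshots(rbd_ls_output, snap_name="nova-resize"):
    """Return first-column entries that end in @<snap_name>."""
    suffix = "@" + snap_name
    found = []
    for line in rbd_ls_output.splitlines():
        fields = line.split()
        # Only the first column names the image or snapshot itself;
        # later columns (size, parent) are ignored.
        if fields and fields[0].endswith(suffix):
            found.append(fields[0])
    return found


sample = """\
fb8f7f4a-0fcf-4603-8742-7212f718cc15_disk 160G glance/412bf28a-c547-494d-a236-b3a85d4bf177@snap 2
fb8f7f4a-0fcf-4603-8742-7212f718cc15_disk@nova-resize 136G glance/412bf28a-c547-494d-a236-
"""

print(stale_resize_snapshots(sample))
```

Each returned entry, prefixed with the pool name, is in the form that `rbd snap rm` expects.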

Khuong Luu (organic-doge) wrote :

Thanks, Laurent. I had the same issue today and your workaround resolves it.
