Unable to remove snapshots after an instance is unshelved when using the rbd imagebackend
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | Lee Yarwood |
Queens | Fix Committed | Medium | Lee Yarwood |
Rocky | Fix Committed | Medium | Lee Yarwood |
Stein | Fix Committed | Medium | Lee Yarwood |
Bug Description
Description
===========
I'm not entirely convinced that this is a bug, but I wanted to document and discuss this upstream.
When using the rbd imagebackend, the snapshot used to shelve an instance cannot be removed after unshelving: the recreated instance disk is cloned from the snapshot, which leaves the snapshot as the parent of that disk.
This is in line with the behaviour of the imagebackend when initially spawning an instance from an image, but it has caused confusion for operators downstream who assume that the snapshot can be removed once the instance has been unshelved.
We could flatten the instance disk when spawning during an unshelve, but doing so would mean extending the imagebackend to handle yet another corner case for rbd.
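The parent/child relationship described above can be sketched with a toy model. This is purely illustrative code, not nova or librbd: the class and method names are invented to mirror how an RBD clone pins its parent snapshot until it is flattened.

```python
class RbdImage:
    """Toy stand-in for an RBD image; tracks clone -> parent snapshot links."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent:
            parent.children.append(self)

    def flatten(self):
        # Copy-up semantics: the clone absorbs the parent's data and the
        # parent link is dropped, freeing the snapshot of its dependant.
        if self.parent:
            self.parent.children.remove(self)
            self.parent = None

    def can_delete(self):
        # A snapshot with dependent clones cannot be removed.
        return not self.children


snap = RbdImage("shelve-snapshot")
disk = RbdImage("instance-disk", parent=snap)
assert not snap.can_delete()   # image-delete fails while the clone exists
disk.flatten()
assert snap.can_delete()       # after flattening, the snapshot is removable
```

This mirrors the manual workaround below: `rbd flatten` severs exactly this parent link, after which Glance can delete the snapshot.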
Steps to reproduce
==================
$ nova boot --image cirros-raw --flavor 1 test-shelve
[..]
$ nova shelve test-shelve
[..]
$ nova unshelve test-shelve
[..]
$ sudo rbd -p vms ls -l
NAME SIZE PARENT FMT PROT LOCK
4c843671-
$ glance image-delete df96af36-
Unable to delete image 'df96af36-
We can easily work around this by manually flattening the instance disk:
$ nova stop test-shelve
$ sudo rbd -p vms flatten 4c843671-
Image flatten: 100% complete...done.
$ nova start test-shelve
$ glance image-delete df96af36-
Expected result
===============
Able to remove the shelved snapshot from Glance after unshelve.
Actual result
=============
Unable to remove the shelved snapshot from Glance after unshelve.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://
$ pwd
/opt/stack/nova
$ git rev-parse HEAD
d768bfa2c2fb
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
libvirt + kvm
3. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
ceph
4. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Logs & Configs
==============
tags: | added: ceph |
Changed in nova: | |
assignee: | nobody → Ritesh (rsritesh) |
Changed in nova: | |
status: | New → In Progress |
Changed in nova: | |
status: | In Progress → Fix Committed |
Changed in nova: | |
assignee: | Ritesh (rsritesh) → Vladyslav Drok (vdrok) |
Changed in nova: | |
assignee: | Vladyslav Drok (vdrok) → melanie witt (melwitt) |
Changed in nova: | |
assignee: | melanie witt (melwitt) → Lee Yarwood (lyarwood) |
Changed in nova: | |
assignee: | Lee Yarwood (lyarwood) → Vladyslav Drok (vdrok) |
Changed in nova: | |
assignee: | Vladyslav Drok (vdrok) → Lee Yarwood (lyarwood) |
no longer affects: | nova/pike |
My first thought on this is that the cloned image is flattened as part of direct_snapshot, so I'm interested in finding out why that's not sufficient for the image-delete to work.
To investigate, I tried to reproduce this myself in devstack and hit multiple other bugs: the snapshot never completes successfully, so the instance does not get shelved.
The first trace I get is:
2017-01-05 01:14:50.499 ERROR nova.virt.libvirt.driver [req-a00f11f9-344f-40b7-b2ef-27b46825d9d6 admin admin] Failed to snapshot image
2017-01-05 01:14:50.499 TRACE nova.virt.libvirt.driver Traceback (most recent call last):
2017-01-05 01:14:50.499 TRACE nova.virt.libvirt.driver   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1559, in snapshot
2017-01-05 01:14:50.499 TRACE nova.virt.libvirt.driver     instance.image_ref)
2017-01-05 01:14:50.499 TRACE nova.virt.libvirt.driver   File "/opt/stack/nova/nova/virt/libvirt/imagebackend.py", line 990, in direct_snapshot
2017-01-05 01:14:50.499 TRACE nova.virt.libvirt.driver     self.driver.clone(location, image_id, dest_pool=parent_pool)
2017-01-05 01:14:50.499 TRACE nova.virt.libvirt.driver   File "/opt/stack/nova/nova/virt/libvirt/storage/rbd_utils.py", line 234, in clone
2017-01-05 01:14:50.499 TRACE nova.virt.libvirt.driver     with RADOSClient(self, dest_pool) as dest_client:
2017-01-05 01:14:50.499 TRACE nova.virt.libvirt.driver   File "/opt/stack/nova/nova/virt/libvirt/storage/rbd_utils.py", line 105, in __init__
2017-01-05 01:14:50.499 TRACE nova.virt.libvirt.driver     self.cluster, self.ioctx = driver._connect_to_rados(pool)
2017-01-05 01:14:50.499 TRACE nova.virt.libvirt.driver   File "/opt/stack/nova/nova/virt/libvirt/storage/rbd_utils.py", line 138, in _connect_to_rados
2017-01-05 01:14:50.499 TRACE nova.virt.libvirt.driver     ioctx = client.open_ioctx(pool_to_open)
2017-01-05 01:14:50.499 TRACE nova.virt.libvirt.driver   File "/usr/lib/python2.7/dist-packages/rados.py", line 662, in open_ioctx
2017-01-05 01:14:50.499 TRACE nova.virt.libvirt.driver     raise TypeError('the name of the pool must be a string')
2017-01-05 01:14:50.499 TRACE nova.virt.libvirt.driver TypeError: the name of the pool must be a string
which I remedied locally by s/dest_pool/str(dest_pool)/ in the clone function in rbd_utils.py. In my environment, dest_pool is of type unicode.
After that, I hit the next trace:
2017-01-05 01:20:31.675 ERROR nova.virt.libvirt.driver [req-152c9895-39cc-464a-ae38-0e6f613278b5 admin admin] Failed to snapshot image
2017-01-05 01:20:31.675 TRACE nova.virt.libvirt.driver Traceback (most recent call last):
2017-01-05 01:20:31.675 TRACE nova.virt.libvirt.driver   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1559, in snapshot
2017-01-05 01:20:31.675 TRACE nova.virt.libvirt.driver     instance.image_ref)
2017-01-05 01:20:31.675 TRACE nova.virt.libvirt.driver   File "/opt/stack/nova/nova/virt/libvirt/imagebackend.py", line 995, in direct_snapshot
2017-01-05 01:20:31.675 TRACE nova.virt.libvirt.driver     self.cleanup_direct_snapshot(location)
2017-01-05 01:20:31.675 TRACE nova.virt.libvirt.driver   File "/opt/stack/nova/nova/virt/libvirt/imagebackend.py", line 1017, in cleanup_d...