Comment 0 for bug 2012622

Enrico Bocchi (ebocchi) wrote :

Description
===========

The Cinder controller provides the ability to make backups of volumes. For an "in-use" volume, it is still possible to create the backup by "--force"-ing it, which leads to the creation of a temporary snapshot and a temporary volume to back up from. When the backup process completes, Cinder deletes the intermediate snapshot and volume.

The deletion of the snapshot may fail when deferred deletion is enabled. This happens because the temporary volume, which is a child of the snapshot, is only moved to the trash and therefore still prevents the deletion of the snapshot.
The temporary volume remains in the "error_deleting" state in Cinder, although its RBD image is eventually deleted once the asynchronous trash purging kicks in.
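
The ordering problem described above can be illustrated with a small, self-contained toy model. This is only a sketch: `ToyPool`, `ImageBusy` and `cleanup_after_backup` are hypothetical names invented for illustration, and none of this is the Cinder RBD driver or the librbd API. The single assumption it encodes is the one stated in this report, namely that a clone sitting in the trash still counts as a child of its parent snapshot.

```python
import errno


class ImageBusy(Exception):
    """Hypothetical stand-in for the "[errno 16] RBD image is busy" error."""

    code = errno.EBUSY


class ToyPool:
    """Toy model: one protected snapshot, its clones, and a trash."""

    def __init__(self):
        self.children = set()  # live clones of the snapshot
        self.trash = set()     # clones moved to the trash
        self.snapshot_exists = True

    def clone(self, name):
        self.children.add(name)

    def remove_image(self, name, deferred):
        self.children.discard(name)
        if deferred:
            # Deferred deletion: the image is only moved to the trash.
            self.trash.add(name)

    def purge_trash(self):
        # Models the asynchronous trash purging.
        self.trash.clear()

    def unprotect_and_remove_snapshot(self):
        # Assumption from the report: trashed clones are still children.
        if self.children or self.trash:
            raise ImageBusy("RBD image is busy")
        self.snapshot_exists = False


def cleanup_after_backup(pool, temp_volume, deferred):
    """Delete the temporary volume, then the snapshot, as the cleanup does."""
    pool.remove_image(temp_volume, deferred=deferred)
    try:
        pool.unprotect_and_remove_snapshot()
    except ImageBusy:
        return False  # the snapshot is left behind
    return True
```

In this model, `cleanup_after_backup(pool, "temp", deferred=False)` succeeds, while the same call with `deferred=True` leaves the snapshot in place until `purge_trash()` has run and the snapshot removal is attempted again.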

This issue has been identified with:
- Cinder 18.1.0 (and is still present in master)
- Ceph RBD Pacific, 16.2.9

Steps to reproduce
==================
* Configure one Ceph RBD cluster to be used with Cinder for the provisioning of volumes and enable deferred deletion.
* Make a backup of an in-use volume using the `--force` flag. This will generate a snapshot and a temporary volume (created from the snapshot) that will be used to make the backup.
* Once the backup process completes, Cinder tries to delete the temporary volume and the snapshot to clean up. Because deferred deletion is enabled, the temporary volume is moved to the trash instead of being deleted immediately.
* When Cinder tries to unprotect and delete the snapshot of the original volume, Ceph librbd refuses and returns "[errno 16] RBD image is busy", because the temporary volume in the trash is still a child of the snapshot.


Expected result
===============
The cleanup procedure that runs when a backup completes successfully deletes both the temporary volume and the temporary snapshot, leaving only the original volume and its backup.

Actual result
=============
* The temporary volume is eventually deleted from the trash thanks to the asynchronous trash purging, but it remains in the Cinder volume list with state "error_deleting".
* The snapshot is never visible through `volume snapshot list`, but it remains on Ceph RBD because Cinder failed to unprotect and remove it (the temporary volume was blocking the operation).
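
The resulting state can be cross-checked with a few lines of Python once the relevant listings have been collected. The sketch below is hypothetical: `find_leftovers` and its plain dict/list inputs are made up for illustration, the sample names are invented, and gathering the listings from Cinder and Ceph is left to the operator.

```python
def find_leftovers(cinder_volumes, cinder_snapshots, rbd_snapshots):
    """Return (stuck_volumes, orphan_snapshots).

    cinder_volumes:   dict mapping volume name -> Cinder status
    cinder_snapshots: iterable of snapshot names known to Cinder
    rbd_snapshots:    iterable of snapshot names present on Ceph RBD
    """
    # Volumes whose Cinder record is stuck after the failed cleanup.
    stuck_volumes = sorted(
        name
        for name, status in cinder_volumes.items()
        if status == "error_deleting"
    )
    # Snapshots that exist on RBD but that Cinder does not list.
    orphan_snapshots = sorted(set(rbd_snapshots) - set(cinder_snapshots))
    return stuck_volumes, orphan_snapshots


# Made-up sample data mirroring the state described above.
stuck, orphans = find_leftovers(
    cinder_volumes={"original": "in-use", "temp-backup-vol": "error_deleting"},
    cinder_snapshots=[],
    rbd_snapshots=["temp-backup-snap"],
)
```

With the sample data, `stuck` contains the temporary volume and `orphans` contains the temporary snapshot, which matches the two symptoms listed above.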

Further comments
================
- I am attaching a patch against master (last commit for rbd.py being c827e5f8867fe71ca121b5671284b852c218aa23)