cinder-backup ceph snapshot delete

Bug #1933265 reported by masterpe
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Cinder
New
Low
Unassigned

Bug Description

We create backups in OpenStack (observed on Train and Ussuri) with a Ceph-based volume backend and a Ceph-based backup backend.

Ceph is running Nautilus.

When we create 4 backups of a volume, that volume gets 4 snapshots on the source cluster. When we delete all of the backups, the 4 snapshots still exist on the source volume.

To reproduce we do:
openstack volume create --size 1 vol1
This volume gets ID 4dad898a-a670-4745-a142-da884e7b45da.
Then, 4 times: openstack volume backup create 4dad898a-a670-4745-a142-da884e7b45da, to create 4 backups.

On the Ceph cluster we then see with rbd info -p volumes volume-4dad898a-a670-4745-a142-da884e7b45da | grep "snapshot_count:"
snapshot_count: 4
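The leftover snapshots can also be spotted programmatically by their name prefix. A minimal sketch, assuming the "backup.<backup UUID>.snap.<timestamp>" naming described later in this thread; the name list here is hard-coded example data (on a live cluster it would come from `rbd.Image(ioctx, name).list_snaps()`):

```python
import re

# Example snapshot names as they would appear on the source volume;
# on a live cluster these would come from rbd.Image(ioctx, name).list_snaps().
snap_names = [
    "backup.03fc2dfb-9654-4f00-ac11-a8f471c599d0.snap.1625049437.645906",
    "backup.e224e8d2-5bda-4a8a-8841-0c578737f865.snap.1625049686.5710375",
    "snapshot-722ad9a3-e1bf-416f-81d4-01a4f32c02ee",
]

# cinder-backup names its source snapshots "backup.<backup UUID>.snap.<timestamp>"
backup_snap_re = re.compile(r"^backup\.[0-9a-f-]{36}\.snap\.[0-9.]+$")

leftover = [s for s in snap_names if backup_snap_re.match(s)]
print(f"{len(leftover)} backup snapshot(s) left on the volume")
```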

Changed in cinder:
importance: Undecided → Low
tags: added: backup-service ceph rbd snapshot
Revision history for this message
Christian Rohmann (christian-rohmann) wrote :

Yes, I have been wondering about this myself.

Apparently there was a major change in snapshot management to the cinder-backup ceph driver:

    https://github.com/openstack/cinder/commit/bc9ab142da919087c71525978f960115ff0259b9
    Bug: https://bugs.launchpad.net/cinder/+bug/1703011

This causes all snapshots created on the source volume of a backup to remain (see https://github.com/openstack/cinder/commit/bc9ab142da919087c71525978f960115ff0259b9#diff-ad4253afa319b756b369399880a46ef8b186e46dbeb6ee06896c243f1f349e6fL716) together with "their" backups. Only when the backups are deleted should both snapshots (target as well as source) be cleaned up.

also see the discussion at https://bugs.launchpad.net/cinder/+bug/1907542

There is a discussion about volumes and their (user-hidden) relation to the snapshots they originated from, and how to resolve this transparently for the user while maintaining storage efficiency.

* https://bugs.launchpad.net/cinder/+bug/1677525
* https://review.opendev.org/c/openstack/cinder/+/754397

Revision history for this message
masterpe (michiel-y) wrote :

I think the biggest problem is that when a backup gets deleted, the source snapshot never gets deleted. This means we ended up with Ceph images (Cinder volumes) that have 200 or more snapshots, all of them with the name prefix "backup".

Revision history for this message
masterpe (michiel-y) wrote :

This bug started in Stein.

I have written a script that deletes the snapshots that were created for the backups, except for the last backup snapshot.

#!/usr/bin/python3

import rados
import sys
import rbd
import re
import time

"""
Volumes can have two types of names:
volume-5b2851f8-4722-42c2-a6dd-4dc359afbf6c
volume-81895053-a4d4-4cdb-9e1f-4e8340d56949.deleted

Volumes with the suffix .deleted are volumes that have been deleted but still have related Glance images or child Cinder volumes.

A Cinder volume can have snapshots, Glance images and cinder-backup backups.
In Ceph, a Cinder snapshot is a Ceph snapshot with the name prefix "snapshot-", with protection set, followed by the ID of the snapshot:
snapshot-722ad9a3-e1bf-416f-81d4-01a4f32c02ee

When a Glance image gets created from a Cinder volume, a new Cinder volume gets created and in Ceph a new snapshot gets
created with the name volume-<UUID of the new cinder_volume>.clone_snap

When a cinder-backup full backup gets created:
    On the source Ceph cluster the image gets a snapshot with the syntax "backup.<UUID of the backup>.snap.<timestamp>":
    backup.03fc2dfb-9654-4f00-ac11-a8f471c599d0.snap.1625049437.645906

    On the destination Ceph cluster the image name is volume-<UUID of the volume>.backup.<UUID of the backup> and the name of
    the snapshot is backup.<UUID of the backup>.snap.<timestamp>

When an incremental cinder backup is created:
    on the source the naming of the snapshot is the same as with a full backup: backup.<UUID of the backup>.snap.<timestamp>
    on the destination the volume of the full backup is reused and a snapshot is created with the syntax "backup.<UUID of the backup>.snap.<timestamp>"

When a backup also has incremental backups, you are unable to delete the full backup in Cinder.

So on the source Ceph cluster, a volume can have three types of snapshots:
    snapshot-722ad9a3-e1bf-416f-81d4-01a4f32c02ee
    volume-7048a7bf-530b-42ee-bfe9-6221c3b9f384.clone_snap
    backup.e224e8d2-5bda-4a8a-8841-0c578737f865.snap.1625049686.5710375
"""

class Cleanup:
    dryRun = True

    def __init__(self):
        self.__cluster__ = rados.Rados(conffile='/etc/ceph/ceph.conf')
        print("\nlibrados version: {}".format(str(self.__cluster__.version())))
        print("Will attempt to connect to: {}".format(str(self.__cluster__.conf_get('mon host'))))

        self.__cluster__.connect()
        print("\nCluster ID: {}".format(self.__cluster__.get_fsid()))

        print("\n\nCluster Statistics")
        print("==================")
        cluster_stats = self.__cluster__.get_cluster_stats()
        print(cluster_stats)

        ## open pool
        self.__ioctx__ = self.__cluster__.open_ioctx('volumes')
        self.__rbd_inst__ = rbd.RBD()
        self.__allImages__ = self.__rbd_inst__.list(self.__ioctx__)

    def getAllCinderVolumes(self):
        allFoundImages = []
        for imageName in self.__allImages__:
            if re.search('^volume-[a-z0-9-]{36}$', imageName) is not None:
                allFoundImages.append(imageName)
        return allFoundImages

    def getSnapshotsOfImage(self, imageName):
        snapshots = []
        image = rbd.Image(self.__ioctx__, name=imageName)

 ...

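The "delete all backup snapshots except the last one" selection the script implements can be sketched as a pure function. `pick_snaps_to_delete` is a hypothetical helper (not part of the actual script) that sorts backup snapshots by the timestamp embedded in their name and keeps the newest:

```python
import re

# "backup.<backup UUID>.snap.<timestamp>" as described in the docstring above
BACKUP_SNAP_RE = re.compile(r"^backup\.[0-9a-f-]{36}\.snap\.(?P<ts>[0-9.]+)$")

def pick_snaps_to_delete(snap_names):
    """Return the backup.* snapshots to delete, keeping only the newest one.

    Other snapshot types (snapshot-*, *.clone_snap) are never touched.
    """
    backups = []
    for name in snap_names:
        m = BACKUP_SNAP_RE.match(name)
        if m:
            backups.append((float(m.group("ts")), name))
    backups.sort()                               # oldest first
    return [name for _, name in backups[:-1]]    # keep the newest

snaps = [
    "snapshot-722ad9a3-e1bf-416f-81d4-01a4f32c02ee",
    "backup.03fc2dfb-9654-4f00-ac11-a8f471c599d0.snap.1625049437.645906",
    "backup.e224e8d2-5bda-4a8a-8841-0c578737f865.snap.1625049686.5710375",
]
print(pick_snaps_to_delete(snaps))
```

The actual deletion would then call `image.remove_snap(name)` for each returned entry, unprotecting first with `image.unprotect_snap(name)` where the snapshot is protected.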

Revision history for this message
zhou jielei (leonunix) wrote :

This bug is still present in Yoga.

Revision history for this message
masterpe (michiel-y) wrote :

I think it is best to bring these undeleted source Ceph snapshots under Cinder's management. Cinder could then run a maintenance task to delete these Ceph snapshots.

Revision history for this message
garcetto2 (garcetto2) wrote :

Good morning,
same problem here (quite a big European CSP based on OpenStack). Any news?
Thank you.
