Cannot delete snapshot: invalid backing file

Bug #2009849 reported by Nicolas Melot
Affects: Cinder
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I fail to remove a volume snapshot using either Horizon or the CLI. Cinder fails to remove the snapshot but does not report any error; the snapshot simply remains after deletion, in state "error_deleting", and the CLI command returns 0. See below for a detailed trace:

[root@phoenix ~]# openstack volume snapshot list
+--------------------------------------+------------------------+-------------+--------+------+
| ID | Name | Description | Status | Size |
+--------------------------------------+------------------------+-------------+--------+------+
| 3df6ef28-7f8d-45bc-a9e2-d67703c020ae | Debian Bullseye LVM 8G | | error | 8 |
+--------------------------------------+------------------------+-------------+--------+------+
[root@phoenix ~]# openstack volume snapshot delete 3df6ef28-7f8d-45bc-a9e2-d67703c020ae; echo $?
0
[root@phoenix ~]# openstack volume snapshot list
+--------------------------------------+------------------------+-------------+----------------+------+
| ID | Name | Description | Status | Size |
+--------------------------------------+------------------------+-------------+----------------+------+
| 3df6ef28-7f8d-45bc-a9e2-d67703c020ae | Debian Bullseye LVM 8G | | error_deleting | 8 |
+--------------------------------------+------------------------+-------------+----------------+------+
[root@phoenix ~]# #openstack volume snapshot delete --force 3df6ef28-7f8d-45bc-a9e2-d67703c020ae; echo $?
[root@phoenix ~]# openstack volume snapshot set --state error 3df6ef28-7f8d-45bc-a9e2-d67703c020ae; echo $?
0
[root@phoenix ~]# openstack volume snapshot delete --force 3df6ef28-7f8d-45bc-a9e2-d67703c020ae; echo $?
0
[root@phoenix ~]# openstack volume snapshot list
+--------------------------------------+------------------------+-------------+----------------+------+
| ID | Name | Description | Status | Size |
+--------------------------------------+------------------------+-------------+----------------+------+
| 3df6ef28-7f8d-45bc-a9e2-d67703c020ae | Debian Bullseye LVM 8G | | error_deleting | 8 |
+--------------------------------------+------------------------+-------------+----------------+------+
[root@phoenix ~]# vim /etc/cinder/cinder.conf
[root@phoenix ~]# openstack volume snapshot set --state error 3df6ef28-7f8d-45bc-a9e2-d67703c020ae; echo $?
0
[root@phoenix ~]# openstack volume snapshot delete --force 3df6ef28-7f8d-45bc-a9e2-d67703c020ae; echo $?
0
[root@phoenix ~]# openstack volume snapshot list
+--------------------------------------+------------------------+-------------+----------------+------+
| ID | Name | Description | Status | Size |
+--------------------------------------+------------------------+-------------+----------------+------+
| 3df6ef28-7f8d-45bc-a9e2-d67703c020ae | Debian Bullseye LVM 8G | | error_deleting | 8 |
+--------------------------------------+------------------------+-------------+----------------+------+
[root@phoenix ~]#
[root@phoenix ~]# openstack volume snapshot delete --force 3df6ef28-7f8d-45bc-a9e2-d67703c020ae; echo $?
0
[root@phoenix ~]# openstack volume snapshot list
+--------------------------------------+------------------------+-------------+----------------+------+
| ID | Name | Description | Status | Size |
+--------------------------------------+------------------------+-------------+----------------+------+
| 3df6ef28-7f8d-45bc-a9e2-d67703c020ae | Debian Bullseye LVM 8G | | error_deleting | 8 |
+--------------------------------------+------------------------+-------------+----------------+------+

journalctl does not yield any output, despite full logging having been activated in /etc/cinder/cinder.conf:
#log_config_append = <None>

...

default_log_levels = amqp=DEBUG,amqplib=DEBUG,boto=DEBUG,qpid=DEBUG,sqlalchemy=DEBUG,suds=DEBUG,oslo.messaging=DEBUG,oslo_messaging=DEBUG,iso8601=DEBUG,requests.packages.urllib3.connectionpool=DEBUG,urllib3.connectionpool=DEBUG,websocket=DEBUG,requests.packages.urllib3.util.retry=DEBUG,urllib3.util.retry=DEBUG,keystonemiddleware=DEBUG,routes.middleware=DEBUG,stevedore=DEBUG,taskflow=DEBUG,keystoneauth=DEBUG,oslo.cache=DEBUG,oslo_policy=DEBUG,dogpile.core.dogpile=DEBUG
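
Note: oslo.log writes to log files under log_dir by default rather than to the systemd journal, which likely explains the empty journalctl output; journal logging has to be enabled explicitly. A sketch of the relevant cinder.conf options (standard oslo.log settings; the values shown are assumptions, not taken from this deployment):

debug = True
use_journal = True
log_dir = /var/log/cinder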

But errors are reported in /var/log/cinder/volume.log:
var/log/cinder/volume.log:103148:2023-03-09 15:18:20.274 3152 ERROR oslo_messaging.rpc.server cinder.exception.RemoteFSInvalidBackingFile: File /media/os-stores/cinder/mnt/ea003e1a1bee1aaf573aafc40cd188ff/volume-e67af022-0d5b-4253-9981-c8b34622e058.3df6ef28-7f8d-45bc-a9e2-d67703c020ae has invalid backing file /media/os-stores/cinder/mnt/ea003e1a1bee1aaf573aafc40cd188ff/volume-e67af022-0d5b-4253-9981-c8b34622e058

I tried to create a new qcow2 image to replace /media/os-stores/cinder/mnt/ea003e1a1bee1aaf573aafc40cd188ff/volume-e67af022-0d5b-4253-9981-c8b34622e058, and a new snapshot based on this image to replace /media/os-stores/cinder/mnt/ea003e1a1bee1aaf573aafc40cd188ff/volume-e67af022-0d5b-4253-9981-c8b34622e058.3df6ef28-7f8d-45bc-a9e2-d67703c020ae. I then tried to remove the snapshot again using Horizon, but the deletion still fails and the same trace is produced in the logs.
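
For reference, recreating such an overlay boils down to something like the following; this is a reconstruction of the idea, not the exact commands used at the time:

cd /media/os-stores/cinder/mnt/ea003e1a1bee1aaf573aafc40cd188ff
# create a fresh qcow2 overlay on top of the (relative) backing file
qemu-img create -f qcow2 -b volume-e67af022-0d5b-4253-9981-c8b34622e058 -F qcow2 \
    volume-e67af022-0d5b-4253-9981-c8b34622e058.3df6ef28-7f8d-45bc-a9e2-d67703c020ae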

As far as I can see, the only option remaining to me is to manually delete the snapshot and the volume, in a way similar to https://wiki.infn.it/progetti/cloud-areapd/operations/manual_delete_cinder_volumes. This is far from ideal though, as the risk of leaving the database in an inconsistent state is high.
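
For reference, the manual cleanup in that link essentially marks the rows deleted in the cinder database. A minimal sketch, assuming the default MariaDB setup and the standard cinder schema (the on-disk qcow2 chain still has to be reconciled separately, e.g. with qemu-img commit, before touching any files):

# mark the snapshot row deleted in the cinder database (back up the DB first)
mysql cinder -e "UPDATE snapshots SET status='deleted', deleted=1, deleted_at=NOW() \
    WHERE id='3df6ef28-7f8d-45bc-a9e2-d67703c020ae';"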

Can you suggest a safe way to remove the stuck snapshot?

Tags: drivers nfs
Eric Harney (eharney) wrote :

> ERROR oslo_messaging.rpc.server cinder.exception.RemoteFSInvalidBackingFile: File /media/os-stores/cinder/mnt/ea003e1a1bee1aaf573aafc40cd188ff/volume-e67af022-0d5b-4253-9981-c8b34622e058.3df6ef28-7f8d-45bc-a9e2-d67703c020ae has invalid backing file /media/os-stores/cinder/mnt/ea003e1a1bee1aaf573aafc40cd188ff/volume-e67af022-0d5b-4253-9981-c8b34622e058

I think this failure occurs because the backing file should just be "volume-e67af022-0d5b-4253-9981-c8b34622e058" without a path specified. Do you know what sequence of operations led to this state?
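
If the header really does contain an absolute path, one way to rewrite just the backing-file string without copying any data would be an unsafe-mode rebase. A sketch, assuming a reasonably recent qemu-img and that it is run from the mount directory so the new reference stays relative:

# -u rewrites only the backing-file header field; no data is moved
qemu-img rebase -u -f qcow2 -F qcow2 -b volume-e67af022-0d5b-4253-9981-c8b34622e058 \
    volume-e67af022-0d5b-4253-9981-c8b34622e058.3df6ef28-7f8d-45bc-a9e2-d67703c020ae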

tags: added: drivers nfs
Nicolas Melot (nicmel) wrote :

I re-created the snapshot manually (see info below) but the issue persists.

[root@phoenix ea003e1a1bee1aaf573aafc40cd188ff]# qemu-img info volume-e67af022-0d5b-4253-9981-c8b34622e058.3df6ef28-7f8d-45bc-a9e2-d67703c020ae
image: volume-e67af022-0d5b-4253-9981-c8b34622e058.3df6ef28-7f8d-45bc-a9e2-d67703c020ae
file format: qcow2
virtual size: 1 GiB (1073741824 bytes)
disk size: 196 KiB
cluster_size: 65536
backing file: volume-e67af022-0d5b-4253-9981-c8b34622e058
backing file format: qcow2
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false
[root@phoenix ea003e1a1bee1aaf573aafc40cd188ff]#

I created this snapshot from a volume attached to a VM while the VM was shut down. Openstack would still not allow the operation while the volume was attached, so I used the "force" switch via Horizon (the CLI equivalent is sketched below). Since then, the VM has been deleted. Creating new snapshots from the same volume also fails, but I can remove those failed snapshots.
Maybe worth mentioning: my setup is an all-in-one setup, manually installed; no Ansible involved. Cinder is configured to use the NFS backend, and the NFS server is hosted on the same machine.
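
For the record, a sketch of the CLI equivalent of that Horizon force switch (snapshot name and volume ID taken from the listings above):

openstack volume snapshot create --volume e67af022-0d5b-4253-9981-c8b34622e058 \
    --force "Debian Bullseye LVM 8G"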

Nicolas Melot (nicmel) wrote :

It turns out cinder expected the backing file path to begin with /var/lib/cinder/mnt/[0-9a-f]+/, which matched neither of the values from my attempts reported here: /media/os-stores/cinder/mnt/ea003e1a1bee1aaf573aafc40cd188ff/volume-e67af022-0d5b-4253-9981-c8b34622e058, nor just volume-e67af022-0d5b-4253-9981-c8b34622e058.

I found this out by reading the code referenced in the last lines of the trace (below; not reported so far): I opened /usr/lib/python3.9/site-packages/cinder/volume/drivers/remotefs.py at line 880 and edited the code so that the exception shows both the value it got and the pattern it was matched against and failed to match. The modification is a gross but harmless hack, so I would not suggest merging it. However, it did turn the log trace from vaguely explaining what the problem is into showing it exactly.

File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/remotefs.py", line 880, in _qemu_img_info_base
2023-03-11 12:34:25.228 315328 ERROR oslo_messaging.rpc.server raise exception.RemoteFSInvalidBackingFile(
2023-03-11 12:34:25.228 315328 ERROR oslo_messaging.rpc.server cinder.exception.RemoteFSInvalidBackingFile: File /media/os-stores/cinder/mnt/ea003e1a1bee1aaf573aafc40cd188ff/volume-e67af022-0d5b-4253-9981-c8b34622e058.3df6ef28-7f8d-45bc-a9e2-d67703c020ae has invalid backing file backing file: "/media/os-stores/cinder/mnt/ea003e1a1bee1aaf573aafc40cd188ff/volume-e67af022-0d5b-4253-9981-c8b34622e058, expected: (/var/lib/cinder/mnt/[0-9a-f]+/)?volume-e67af022-0d5b-4253-9981-c8b34622e058(.(tmp-snap-)?[0-9a-f-]+)?$".
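
The mismatch is easy to reproduce outside cinder; a sketch using the pattern from the exception, with grep -E standing in for the Python re.match call:

# the value cinder saw: an absolute path under /media, not /var/lib/cinder/mnt
echo "/media/os-stores/cinder/mnt/ea003e1a1bee1aaf573aafc40cd188ff/volume-e67af022-0d5b-4253-9981-c8b34622e058" \
    | grep -E '^(/var/lib/cinder/mnt/[0-9a-f]+/)?volume-e67af022-0d5b-4253-9981-c8b34622e058(.(tmp-snap-)?[0-9a-f-]+)?$' \
    || echo "no match"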

/Nicolas

Nicolas Melot (nicmel) wrote :

After having deleted the snapshot, I created a new one using Horizon, then spawned a new VM on the volume. The VM failed to spawn with the same message as above ("Invalid backing file"), and deleting the snapshot resulted in the same error. qemu-img info showed the information below:

[root@phoenix ea003e1a1bee1aaf573aafc40cd188ff]# qemu-img info volume-e67af022-0d5b-4253-9981-c8b34622e058.cf3311ad-05eb-42fd-9b9c-1e9012a4a1f3
image: volume-e67af022-0d5b-4253-9981-c8b34622e058.cf3311ad-05eb-42fd-9b9c-1e9012a4a1f3
file format: qcow2
virtual size: 8 GiB (8589934592 bytes)
disk size: 196 KiB
cluster_size: 65536
backing file: volume-e67af022-0d5b-4253-9981-c8b34622e058
backing file format: qcow2
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false
[root@phoenix ea003e1a1bee1aaf573aafc40cd188ff]#

So it would appear that either cinder creates a snapshot file with a wrong backing filename (the bare volume filename, with no path), or the check it performs before spawning a VM or deleting a snapshot is wrong (it expects /var/lib/cinder/mnt/.../volume-...).
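
Possibly relevant: the /var/lib/cinder/mnt prefix in the expected pattern is the default value of the NFS driver's nfs_mount_point_base option ($state_path/mnt, with state_path defaulting to /var/lib/cinder), while the share here is mounted under /media/os-stores/cinder/mnt. A hedged guess at the cinder.conf line involved, not verified against this deployment:

nfs_mount_point_base = /media/os-stores/cinder/mnt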

Sofia Enriquez (lsofia-enriquez) wrote :

Hello Nicolas Melot,

I hope you are doing well. To solve the problem you are experiencing, I need some more information from you. Could you please provide me with the following details:

Can you explain the exact steps you are taking using the CLI? This will help me understand the sequence of operations that led to this issue.
Did you create a Nova instance from the volume or from the snapshot? If you created the instance from the volume, did you create the snapshot before or after?
What version of Openstack are you currently using?

Thank you, and have a great day!

Changed in cinder:
status: New → Incomplete