Activity log for bug #1716920

Date Who What changed Old value New value Message
2017-09-13 13:24:10 Silvan Kaiser bug added bug
2017-09-13 13:24:26 Silvan Kaiser tags snapshot quobyte remotefs snapshot
2017-09-13 13:25:12 Silvan Kaiser bug task added nova
2017-09-13 13:26:23 Silvan Kaiser description The deletion of online snapshots of remotefs based volumes breaks the .info file/backing chain of these volumes. Logs can be seen in any current Quobyte CI run in Cinder/Nova/OS-Brick . Afaics the the other driver using this (VZstorage) has it's CI skip the affected tests (e.g. test_snapshot_create_delete_with_volume_in_use). I ran a lot of tests and so far i can say that the first deletion of a member in the backing chain works (snapshot is deleted) but seemingly leaves the .info files content and/or the backing chain of the volume file in a broken state. The error can be identified e.g. by the following log pattern: This is the first snapshot chain deletion that runs successfully (the snapshots id is 91755e5f-e573-4ddb-84af-3712d69a). The ID of the snapshot and its snapshot_file name do match their uuids: 2017-09-13 08:28:59.436 20467 DEBUG cinder.volume.drivers.remotefs [req-eda7ddf5-217d-490d-a8d4-1813df68d8db tempest-VolumesSnapshotTestJSON-708947401 -] Deleting online snapshot 91755e5f-e573-4ddb-84af-3712d69a fc89 of volume 94598844-418c-4b5d-b034-5330e24e7421 _delete_snapshot /opt/stack/cinder/cinder/volume/drivers/remotefs.py:1099 2017-09-13 08:28:59.487 20467 DEBUG cinder.volume.drivers.remotefs [req-eda7ddf5-217d-490d-a8d4-1813df68d8db tempest-VolumesSnapshotTestJSON-708947401 -] snapshot_file for this snap is: volume-94598844-418c-4b5d-b034-5330e24e7421.91755e5f-e573-4ddb-84af-3712d69afc89 _delete_snapshot /opt/stack/cinder/cinder/volume/drivers/remotefs.py:1124 The next snapshot to be deleted (138a1f62-7582-4aaa-9d72-9eada34b) shows that a wrong snapshot_file is read from the volumes .info file. In fact it shows the file of the previous snapshot : 2017-09-13 08:29:01.857 20467 DEBUG cinder.volume.drivers.remotefs [req-6ad4add9-34b8-41b9-a1f0-7dc2d6bb1862 tempest-VolumesSnapshotTestJSON-708947401 -] Deleting online snapshot 138a1f62-7582-4aaa-9d72-9eada34b eeaf of volume 94598844-418c-4b5d-b034-5330e24e7421 _delete_snapshot /opt/stack/cinder/cinder/volume/drivers/remotefs.py:1099 2017-09-13 08:29:01.872 20467 DEBUG cinder.volume.drivers.remotefs [req-6ad4add9-34b8-41b9-a1f0-7dc2d6bb1862 tempest-VolumesSnapshotTestJSON-708947401 -] snapshot_file for this snap is: volume-94598844-418c-4b5d-b034-5330e24e7421.91755e5f-e573-4ddb-84af-3712d69afc89 _delete_snapshot /opt/stack/cinder/cinder/volume/drivers/remotefs.py:1124 Now this second snapshot deletion fails because the snapshot file for 138a1f62-7582-4aaa-9d72-9eada34b no longer exists: 2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server ProcessExecutionError: Unexpected error while running command. 2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server Command: /usr/bin/python -m oslo_concurrency.prlimit --as=1073741824 --cpu=8 -- env LC_ALL=C qemu-img info /opt/stack/data/cinder/mnt/a1e3635ffba9fce1b854369f1a255d7b/volume-94598844-418c-4b5d-b034-5330e24e7421.138a1f62-7582-4aaa-9d72-9eada34beeaf 2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server Exit code: 1 2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server Stdout: u'' 2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server Stderr: u"qemu-img: Could not open '/opt/stack/data/cinder/mnt/a1e3635ffba9fce1b854369f1a255d7b/volume-94598844-418c-4b5d-b034-5330e24e7421.138a1f62-7582-4aaa-9d72-9eada34beeaf': Could not open '/opt/stack/data/cinder/mnt/a1e3635ffba9fce1b854369f1a255d7b/volume-94598844-418c-4b5d-b034-5330e24e7421.138a1f62-7582-4aaa-9d72-9eada34beeaf': No such file or directory\n" The referenced tempest test fails 100% of the time in our CIs. I manually tested the scenario and found the same results. Furthermore i was able, by creating three consecutive snapshots from a single volume and deleting them one after the other, to create a snapshot file with a broken backing file link. In the end i was left with a volume file and an overlay file referencing a removed backing file (previous snapshot of the same volume). I was able to run the scenario without issues when using offline snapshots. Thus this seems to be related to the usage of the online snapshot deletion via the Nova API. The deletion of online snapshots of remotefs based volumes breaks the .info file/backing chain of these volumes. Logs can be seen in any current Quobyte CI run in Cinder/Nova/OS-Brick . Afaics the the other driver using this (VZstorage) has it's CI skip the affected tests (e.g. test_snapshot_create_delete_with_volume_in_use). I ran a lot of tests and so far i can say that the first deletion of a member in the backing chain works (snapshot is deleted) but seemingly leaves the .info files content and/or the backing chain of the volume file in a broken state. The error can be identified e.g. by the following log pattern: This is the first snapshot deletion that runs successfully (the snapshots id is 91755e5f-e573-4ddb-84af-3712d69a). The ID of the snapshot and its snapshot_file name do match their uuids: 2017-09-13 08:28:59.436 20467 DEBUG cinder.volume.drivers.remotefs [req-eda7ddf5-217d-490d-a8d4-1813df68d8db tempest-VolumesSnapshotTestJSON-708947401 -] Deleting online snapshot 91755e5f-e573-4ddb-84af-3712d69a fc89 of volume 94598844-418c-4b5d-b034-5330e24e7421 _delete_snapshot /opt/stack/cinder/cinder/volume/drivers/remotefs.py:1099 2017-09-13 08:28:59.487 20467 DEBUG cinder.volume.drivers.remotefs [req-eda7ddf5-217d-490d-a8d4-1813df68d8db tempest-VolumesSnapshotTestJSON-708947401 -] snapshot_file for this snap is: volume-94598844-418c-4b5d-b034-5330e24e7421.91755e5f-e573-4ddb-84af-3712d69afc89 _delete_snapshot /opt/stack/cinder/cinder/volume/drivers/remotefs.py:1124 The next snapshot to be deleted (138a1f62-7582-4aaa-9d72-9eada34b) shows that a wrong snapshot_file is read from the volumes .info file. In fact it shows the file of the previous snapshot : 2017-09-13 08:29:01.857 20467 DEBUG cinder.volume.drivers.remotefs [req-6ad4add9-34b8-41b9-a1f0-7dc2d6bb1862 tempest-VolumesSnapshotTestJSON-708947401 -] Deleting online snapshot 138a1f62-7582-4aaa-9d72-9eada34b eeaf of volume 94598844-418c-4b5d-b034-5330e24e7421 _delete_snapshot /opt/stack/cinder/cinder/volume/drivers/remotefs.py:1099 2017-09-13 08:29:01.872 20467 DEBUG cinder.volume.drivers.remotefs [req-6ad4add9-34b8-41b9-a1f0-7dc2d6bb1862 tempest-VolumesSnapshotTestJSON-708947401 -] snapshot_file for this snap is: volume-94598844-418c-4b5d-b034-5330e24e7421.91755e5f-e573-4ddb-84af-3712d69afc89 _delete_snapshot /opt/stack/cinder/cinder/volume/drivers/remotefs.py:1124 Now this second snapshot deletion fails because the snapshot file for 138a1f62-7582-4aaa-9d72-9eada34b no longer exists: 2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server ProcessExecutionError: Unexpected error while running command. 2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server Command: /usr/bin/python -m oslo_concurrency.prlimit --as=1073741824 --cpu=8 -- env LC_ALL=C qemu-img info /opt/stack/data/cinder/mnt/a1e3635ffba9fce1b854369f1a255d7b/volume-94598844-418c-4b5d-b034-5330e24e7421.138a1f62-7582-4aaa-9d72-9eada34beeaf 2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server Exit code: 1 2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server Stdout: u'' 2017-09-13 08:29:02.674 20467 ERROR oslo_messaging.rpc.server Stderr: u"qemu-img: Could not open '/opt/stack/data/cinder/mnt/a1e3635ffba9fce1b854369f1a255d7b/volume-94598844-418c-4b5d-b034-5330e24e7421.138a1f62-7582-4aaa-9d72-9eada34beeaf': Could not open '/opt/stack/data/cinder/mnt/a1e3635ffba9fce1b854369f1a255d7b/volume-94598844-418c-4b5d-b034-5330e24e7421.138a1f62-7582-4aaa-9d72-9eada34beeaf': No such file or directory\n" The referenced tempest test fails 100% of the time in our CIs. I manually tested the scenario and found the same results. Furthermore i was able, by creating three consecutive snapshots from a single volume and deleting them one after the other, to create a snapshot file with a broken backing file link. In the end i was left with a volume file and an overlay file referencing a removed backing file (previous snapshot of the same volume). I was able to run the scenario without issues when using offline snapshots. Thus this seems to be related to the usage of the online snapshot deletion via the Nova API.
2017-09-14 09:22:12 Silvan Kaiser attachment added buglog-1716920.txt https://bugs.launchpad.net/nova/+bug/1716920/+attachment/4950014/+files/buglog-1716920.txt
2017-09-18 13:40:05 Kashyap Chamarthy bug added subscriber Kashyap Chamarthy
2017-10-12 13:50:34 Silvan Kaiser attachment added console.html https://bugs.launchpad.net/cinder/+bug/1716920/+attachment/4968684/+files/console.html
2017-10-13 13:22:05 Silvan Kaiser tags quobyte remotefs snapshot nfs quobyte remotefs snapshot
2017-10-13 13:22:29 Silvan Kaiser summary online snapshot deletion breaks volume info and backing chain (with remotefs drivers?) online snapshot deletion breaks backing chain with remotefs drivers
2017-11-07 14:21:22 Eric Harney cinder: importance Undecided High
2017-11-16 08:12:09 Silvan Kaiser tags nfs quobyte remotefs snapshot bugsmash nfs quobyte remotefs snapshot
2018-06-12 15:30:41 Eric Harney bug added subscriber Eric Harney
2018-08-17 12:01:35 Silvan Kaiser nova: status New Incomplete
2018-08-17 12:01:39 Silvan Kaiser cinder: status New Incomplete
2018-10-17 04:17:29 Launchpad Janitor nova: status Incomplete Expired
2018-12-17 10:50:02 Silvan Kaiser cinder: status Incomplete Invalid
2018-12-17 11:02:33 Silvan Kaiser nova: status Expired Invalid