Cinder fails to create image-based volume if mirroring is enabled
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Cinder | | Undecided | Corey Bryant |
Ubuntu Cloud Archive | Status tracked in Victoria | | |
Mitaka | | High | Unassigned |
Queens | | High | Unassigned |
Stein | | High | Unassigned |
Train | | High | Unassigned |
Ussuri | | High | Unassigned |
Victoria | | High | Unassigned |
charm-ceph-rbd-mirror | | Undecided | Unassigned |
cinder (Ubuntu) | | High | Unassigned |
Xenial | | High | Unassigned |
Bionic | | High | Unassigned |
Focal | | High | Unassigned |
Groovy | | High | Unassigned |
Bug Description
OpenStack Train, Ceph Nautilus; ceph-rbd-mirror is deployed for two-way mirroring.
Cinder uses Ceph as the backend for volumes.
The failure occurs when creating a volume from a qcow2 image.
The current flow is the following:
1. Cinder creates an empty volume in Ceph
2. Cinder downloads the image
3. The image is converted to raw
4. The volume is deleted: https:/
5. Cinder performs "rbd import" using the unpacked raw image as the source.
Apparently the rbd-mirror daemon creates a snapshot when the image is created in Ceph (presumably for mirroring purposes), and for an empty image that snapshot exists for only about a second.
Very often, step 4 ends up being performed during the window in which that snapshot exists, and it then fails with a "Cannot delete the volume with snapshots" error.
The only way we found to avoid this behaviour is to disable mirroring on the backend pool, which is not desired.
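As a rough illustration of the race, here is a minimal Python sketch of the sequence above; it is not the Cinder code itself, it assumes the `rbd` CLI is installed, and the pool and volume names ("cinder-ceph", "volume-demo") are made up for the example.

# Illustrative reproduction of the race described above, not Cinder code.
# Assumes the `rbd` CLI is installed and a mirrored (journaled) pool named
# "cinder-ceph" exists; pool/volume names and the size are made up.
import subprocess
import time

POOL = "cinder-ceph"
VOLUME = "volume-demo"

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Step 1: an empty RBD volume is created in the mirrored pool.
run("rbd", "create", "%s/%s" % (POOL, VOLUME), "--size", "1024")

# Steps 2-3: the image would be downloaded and converted to raw (omitted here).
time.sleep(0.5)  # roughly the window in which rbd-mirror's snapshot may exist

# Step 4: the empty volume is deleted before the raw data is re-imported.
# If rbd-mirror still holds its short-lived snapshot at this moment, the
# removal fails because the image has snapshots, which Cinder surfaces as a
# VolumeIsBusy error.
run("rbd", "remove", "%s/%s" % (POOL, VOLUME))

# Step 5 would then be: rbd import <raw-image> cinder-ceph/volume-demo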
Vern Hart (vern) wrote : | #1 |
Changed in charm-ceph-rbd-mirror: | |
assignee: | nobody → Aurelien Lourot (aurelien-lourot) |
Aurelien Lourot (aurelien-lourot) wrote : | #2 |
IMHO this is a bug in https:/
1. volume/
2. calls volume_
a. fetches the image,
b. deletes the volume, and
c. runs `rbd import`.
If a snapshot of the volume got created between 1 and 2b (e.g. by rbd mirror), 2b will fail. So I think we need one of these solutions:
1. Instead of creating a volume and immediately deleting it, let's not create it in the first place.
2. Is there a way to delete a volume and all its snapshots atomically?
3. Can we force the deletion of a volume even if it has snapshots?
Changed in charm-ceph-rbd-mirror: | |
status: | New → Invalid |
assignee: | Aurelien Lourot (aurelien-lourot) → nobody |
Corey Bryant (corey.bryant) wrote : | #3 |
It's worth noting that the ceph-rbd-mirror charm uses RBD journaling to mirror RBD images between two Ceph clusters. [1] The ceph-rbd-mirror charm doesn't use the snapshot-based approach to RBD Mirroring.
Corey Bryant (corey.bryant) wrote : | #4 |
Documentation on ceph RBD mirroring modes (journal-based and snapshot-based): https:/
Andrey Grebennikov (agrebennikov) wrote : | #5 |
This is correct; however, watching the snap list filtered by the volume ID makes it clear that a snapshot does get created for this empty volume, with a name of "rbd_mirror.
Corey Bryant (corey.bryant) wrote : | #6 |
Andrey/Vern, can you help us understand where the snapshot is coming from? Perhaps we could also get confirmation that journaling is in fact configured for the RBD mirroring. I'm not sure how we can get that information; perhaps 'rbd mirror pool status' will show details.
Corey Bryant (corey.bryant) wrote : | #7 |
I have a bionic-train ceph-rbd-mirror deployment up (the smoke test from charm-ceph-
ceph-mon/0
----------
$ sudo ceph osd lspools
1 glance
2 cinder-ceph
ceph-rbd-mirror/0
-----------------
$ ls /etc/ceph
ceph.client.
$ sudo rbd --id rbd-mirror.
Total Images: 1
Total Snapshots: 1
Provisioned Size: 342 MiB
$ sudo rbd --id rbd-mirror.
Total Images: 1
Total Snapshots: 0
Provisioned Size: 8 GiB
$ sudo rbd --id rbd-mirror.
health: OK
images: 1 total
1 stopped
$ sudo rbd --id rbd-mirror.
health: OK
images: 1 total
1 stopped
$ sudo rbd --id rbd-mirror.
NAME SIZE PARENT FMT PROT LOCK
db355184-
db355184-
$ sudo rbd --id rbd-mirror.
NAME SIZE PARENT FMT PROT LOCK
volume-
$ sudo rbd --id rbd-mirror.
SNAPID NAME SIZE PROTECTED TIMESTAMP
24 snap 342 MiB yes Wed Oct 21 20:29:56 2020
Andrey Grebennikov (agrebennikov) wrote : | #8 |
Corey, as I mentioned, the snapshot lives for only about 1 second. In order to reproduce that:
1. Create a pool
2. Set up mirror
3. Create a volume from the image
4. IMMEDIATELY grab the volume id
5. run "watch -n0.2 rbd -p <pool> snap ls|grep <vol_id>" (alternatively just run watch without grep permanently). You will see the snapshot come and go.
Adding a sleep to the Cinder code could only paper over the issue (it would be a dirty hack and would never be accepted upstream), and it doesn't help with watching the snapshot lifecycle.
The replication is indeed configured with journaling, because this is Nautilus (no snapshot-based mirroring yet).
James Page (james-page) wrote : | #9 |
The exception is raised in the following code block:
https:/
(maybe).
As this is an expected situation, having some sort of retry around this might work OK.
Corey Bryant (corey.bryant) wrote : | #10 |
@Andrey you're right, I think the sleep would have to be in Ceph. @James: yes, we might be able to do something like:
diff --git a/cinder/
index 5b868e4fe.
--- a/cinder/
+++ b/cinder/
@@ -1594,6 +1594,9 @@ class RBDDriver(
+ @utils.
+ self.configurat
+ self.configurat
def _copy_image_
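Since the diff above is truncated in this report, here is a self-contained sketch of the general retry approach it proposes; the exception class, interval, and retry count below are illustrative stand-ins, not the exact upstream patch (see the fix proposed below for the real change to Cinder's RBD driver).

# Illustrative sketch only: a generic retry decorator in the spirit of the
# truncated diff above. VolumeIsBusy here is a stand-in for
# cinder.exception.VolumeIsBusy; the interval/retries values are made up.
import functools
import time

class VolumeIsBusy(Exception):
    """Stand-in for cinder.exception.VolumeIsBusy."""

def retry(exc_type, interval=1, retries=3):
    """Retry the wrapped call when exc_type is raised, sleeping in between."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(retries):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == retries - 1:
                        raise
                    time.sleep(interval)
        return wrapper
    return decorator

@retry(VolumeIsBusy, interval=1, retries=5)
def copy_image_to_volume(volume_name):
    # Placeholder for the driver work: delete the empty volume and re-import
    # the raw image. If rbd-mirror's transient snapshot makes the delete fail,
    # VolumeIsBusy propagates here and the whole call is retried.
    print("copying image into %s" % volume_name)

With such a retry in place, the short-lived mirror snapshot only delays the delete by a second or two instead of failing the whole volume creation.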
Corey Bryant (corey.bryant) wrote : | #11 |
and the sleep was just a way to force recreation, not a fix :)
Fix proposed to branch: master
Review: https:/
Changed in cinder: | |
assignee: | nobody → Corey Bryant (corey.bryant) |
status: | New → In Progress |
Changed in cinder (Ubuntu Groovy): | |
status: | New → Triaged |
Changed in cinder (Ubuntu Focal): | |
status: | New → Triaged |
Changed in cinder (Ubuntu Bionic): | |
status: | New → Triaged |
Changed in cinder (Ubuntu Xenial): | |
status: | New → Triaged |
importance: | Undecided → High |
Changed in cinder (Ubuntu Focal): | |
importance: | Undecided → High |
Changed in cinder (Ubuntu Groovy): | |
importance: | Undecided → High |
Changed in cinder (Ubuntu Bionic): | |
importance: | Undecided → High |
Vern Hart (vern) wrote : | #13 |
I tested Corey's fix (via PPA) side-by-side with a cinder unit running version 2:15.4.
I used a bionic qcow2 image with:
openstack volume create --image bionic --size 3 --type stable2-az2 vernvol
Again, glance is backed by the az1 ceph cluster, so this is a cross-cluster copy. In our previous tests, qcow2 images tend to fail at a much higher rate than raw images.
On the unpatched cinder, a 77% failure rate, with 23 out of 30 tries failing with:
Exception during message handling: cinder.
On Corey's patched version, 0% failure. 30 out of 30 volumes succeeded.
So, in this deployment, this simple retry solves the problem.
Corey Bryant (corey.bryant) wrote : | #14 |
Vern, Thanks very much for testing. I'm glad it worked. I just wanted to give you an update on this. I've been monitoring the upstream gate status which has been broken and blocking all new patches. We want to land patches upstream first so they can get a good upstream review before starting SRUs. This helps reduce regression potential. I'll continue to monitor the gate status and once the gate is back in order I'll work on pushing to get a review of the patch.
Michael Skalka (mskalka) wrote : | #15 |
Bumping this down to high as a workaround exists
Corey Bryant (corey.bryant) wrote : | #16 |
This fix has been merged into upstream master. I will propose backports now.
Corey Bryant (corey.bryant) wrote : | #18 |
This is in Ubuntu hirsute (wallaby) in cinder version 2:17.0.
Changed in cinder (Ubuntu): | |
status: | Triaged → Fix Released |
Our environment has two ceph clusters (az1 and az2) with a separate ceph-rbd-mirror charm deployed to each. The ceph-rbd-mirror charm enables pool mirroring on all rbd pools.
Glance is backed by only one of the ceph clusters, az1.
We have volume types named after the ceph cluster, arbor-az1 and arbor-az2.
When we copy an image to cinder using the following openstack cli, it's fast and reliable when copying from az1 to az1, but we have greater than 50% failure when copying from az1 to az2.
openstack volume create --image bionic --size 3 --type arbor-az1 volume-az1 # works
openstack volume create --image bionic --size 3 --type arbor-az2 volume-az2 # fails >50%
When the az2 image upload fails, it's always with "cinder.exception.VolumeIsBusy: deleting volume volume-<id> that has snapshot"
The full log from the cinder-volume log follows.
2020-10-19 10:15:35.871 45458 INFO cinder.volume.flows.manager.create_volume [req-a4905f43-c6f8-42e4-a602-a00238068492 f2ee9e8060e54d058060d220fab84088 0c0bb65d03f446ca8a845d1ae9229790 - 1e59a4057fb24003a902bc4df6247d3c 1e59a4057fb24003a902bc4df6247d3c] Volume 63e4d69c-0722-40fc-b44f-cb59fca0d435: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-63e4d69c-0722-40fc-b44f-cb59fca0d435', 'volume_size': 20, 'image_id': 'e93f97e0-f514-435d-a277-0ac288ed0c6c', 'image_location': ('rbd://36bd979e-0511-11eb-bced-ecebb88db476/glance/e93f97e0-f514-435d-a277-0ac288ed0c6c/snap', [{'url': 'rbd://36bd979e-0511-11eb-bced-ecebb88db476/glance/e93f97e0-f514-435d-a277-0ac288ed0c6c/snap', 'metadata': {}}]), 'image_meta': {'name': 'UBUNTU-18.04', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 359923712, 'virtual_size': None, 'status': 'active', 'checksum': '9aa011b2b79b1fe42a7c306555923b1b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '86fcc0b3839b45029dd325641ddc2a09', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '03786c51866c1c6c50ef671502b265afb71084e0d018748ca9b29f871ca5445c9eefffffdf927d540bddaf0e8597adc673e218e785fcf16013fba7e7d9898e6e', 'id': 'e93f97e0-f514-435d-a277-0ac288ed0c6c', 'created_at': datetime.datetime(2020, 10, 15, 16, 49, tzinfo=<iso8601.Utc>), 'updated_at': datetime.datetime(2020, 10, 15, 16, 49, 18, tzinfo=<iso8601.Utc>), 'locations': [{'url': 'rbd://36bd979e-0511-11eb-bced-ecebb88db476/glance/e93f97e0-f514-435d-a277-0ac288ed0c6c/snap', 'metadata': {}}], 'direct_url': 'rbd://36bd979e-0511-11eb-bced-ecebb88db476/glance/e93f97e0-f514-435d-a277-0ac288ed0c6c/snap', 'tags': [], 'file': '/v2/images/e93f97e0-f514-435d-a277-0ac288ed0c6c/file', 'properties': {}}, 'image_service': <cinder.image.glance.GlanceImageService object at 0x7fb92712cda0>} image.image_utils [req-a4905f43-c6f8-42e4-a602-a00238068492 f2ee9e8060e54d058060d220fab84088 0c0bb65d03f446ca8a845d1ae9229790 - 1e59a4057fb24003a902bc4df6247d3c 1e59a4057fb24003a902bc4df6247d3c] Image download 343.25 MB at 70.20 MB/s image.image_utils [req-a4905f43-c6f8-42e4-a602-a00238068492 f2ee9e8060e54d058060d220fab84088 0c0bb65d03f446ca8a845d1ae9229790 - 1e59a4057fb24003a902bc...
2020-10-19 10:15:40.810 45458 INFO cinder.
2020-10-19 10:15:44.685 45458 INFO cinder.