Cinder-backed images occasionally fail to clone in A-A

Bug #1906286 reported by Mohammed Naser
This bug affects 3 people
Affects: Cinder
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

When using Cinder-backed images in Glance in combination with an Active-Active Cinder deployment, the clone from the original image-volume can fail and fall back to the image download flow, which can be very taxing on environments.

The root cause is that Cinder searches for matching volumes that match _only_ on the host the volume is being created on. This makes sense in a non-clustered Cinder environment, but not in an environment where the base volume may be accessible within the cluster, just owned by a different host.

The code in question is the following:

https://github.com/openstack/cinder/blob/075ab6c85f6d6a069e96f175a1f4950eeedfbcbb/cinder/volume/flows/manager/create_volume.py#L680-L681
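For context, the lookup at those lines (the same lines quoted in the patches further down in this report) restricts the search to the exact host the new volume is scheduled on, roughly:

```
# Only image-volumes whose 'host' matches the new volume's host are returned.
# In an Active-Active cluster the image-volume may be owned by another
# cinder-volume service, so the list comes back empty, Cinder logs
# "No accessible image volume for image <image_id> found" and the flow
# falls back to downloading the image from Glance.
image_volumes = self.db.volume_get_all_by_host(
    context, volume['host'], filters={'id': image_volume_ids})
```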

Changed in cinder:
status: New → Triaged
importance: Undecided → Medium
tags: added: active-active backup glance
Rajat Dhasmana (whoami-rajat) wrote :

This should be fixed by the patch [1].
I found the same issue while testing multiple Glance cinder stores with different backend configurations; see my comment on the review for that scenario.

[1] https://review.opendev.org/c/openstack/cinder/+/755654

Yusuf Güngör (yusuf2) wrote (last edit):

Hi,

That patch does not fix the problem.

In the review comments @whoami-rajat commented about different backends, but this case is about having an image-volume with multiple Cinder service hosts for the same backend.

Imagine having two volume services for one backend. When an image-volume is created, it is assigned to one of the hosts. When creating a volume from a Cinder-backed image, if the request goes to the Cinder service host that the image-volume is assigned to, it works as expected and creates the volume very fast. But if the request goes to the other Cinder service host, it logs "No accessible image volume for image <image_id> found".

It seems that the code looks up image-volumes by host rather than by cluster_name.

https://github.com/openstack/cinder/blob/zed-eom/cinder/volume/flows/manager/create_volume.py#L745

Is an HA cluster not supported for Cinder-backed images? How can we make it always clone the volume from the image-volume?

Thanks

$ openstack volume service list
+------------------+-----------------------------------------+------+---------+-------+
| Binary           | Host                                    | Zone | Status  | State |
+------------------+-----------------------------------------+------+---------+-------+
| cinder-scheduler | controller-01.mycompany.dmz             | nova | enabled | up    |
| cinder-scheduler | controller-02.mycompany.dmz             | nova | enabled | up    |
| cinder-volume    | controller-01.mycompany.dmz@purestorage | nova | enabled | up    |
| cinder-volume    | controller-02.mycompany.dmz@purestorage | nova | enabled | up    |
+------------------+-----------------------------------------+------+---------+-------+
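To illustrate with the services above (the cluster and pool names below are made up for the example), the two cinder-volume services report different host values but would share a single cluster_name, so a lookup keyed on host only succeeds when the request happens to land on the service that owns the image-volume:

```
# Hypothetical volume records for the two cinder-volume services listed above,
# which share the same Pure Storage backend.
image_volume = {
    'host': 'controller-01.mycompany.dmz@purestorage#purestorage',
    'cluster_name': 'mycluster@purestorage',
}
new_volume = {
    'host': 'controller-02.mycompany.dmz@purestorage#purestorage',
    'cluster_name': 'mycluster@purestorage',
}

# Filtering by host misses the image-volume even though controller-02 can
# reach the same backend; filtering by cluster_name would match.
assert new_volume['host'] != image_volume['host']
assert new_volume['cluster_name'] == image_volume['cluster_name']
```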

Rajat's comments on the Gerrit review:

Rajat Dhasmana
Patchset 7
Nov 02, 2020
This only works when the volume being created is in the same backend as the image-volume.
E.g. the image-volume's host is hostname@cephdriver-1#cephdriver-1 (created in the Ceph backend),
and we request a volume to be created in the LVM backend; this then searches for a volume with a host value of hostname@lvmdriver-1#lvmdriver-1,
which returns an empty list.

Rajat Dhasmana
Patchset 7
Aug 18, 2021
Ignore this; since we are performing a clone operation, the backend should be the same as the source image-volume's backend.

Yusuf Güngör (yusuf2) wrote :

Hi Cinder team,

Modifying cinder/volume/flows/manager/create_volume.py as below fixes the problem. Is it OK to do that? Should we submit this change upstream?

Line 744 (zed branch)

-        image_volumes = self.db.volume_get_all_by_host(
-            context, volume['host'], filters={'id': image_volume_ids})

+        image_volumes = self.db.volume_get_all(context, filters={'id': image_volume_ids})

Gorka Eguileor (gorka) wrote :

Yusuf, that solution is pretty close, but it would also return volumes that are on a different backend, which cannot be cloned efficiently. We need to be a bit more restrictive to support deployments with multiple Cinder backends.

I think this would work:

```
        filters = {'id': image_volume_ids}
        if volume.cluster_name:
            filters['cluster_name'] = volume.cluster_name
        else:
            filters['host'] = volume.host
        image_volumes = self.db.volume_get_all(context, filters=filters)
```

Yusuf Güngör (yusuf2) wrote :

Hi Gorka, thanks for your code. We have tested it and it worked as expected for the HA cluster scenario.

We do not have non-HA backends; can someone test it with a non-HA backend too?

For Cinder version 9.1.1, our patch to "cinder/volume/flows/manager/create_volume.py" is as below:

744,745c744,753
<         image_volumes = self.db.volume_get_all_by_host(
<             context, volume['host'], filters={'id': image_volume_ids})
---
>         # Below code replaced by us because of bug:
>         # https://bugs.launchpad.net/cinder/+bug/1906286
>         # image_volumes = self.db.volume_get_all_by_host(
>         #     context, volume['host'], filters={'id': image_volume_ids})
>         filters = {'id': image_volume_ids}
>         if volume.cluster_name:
>             filters['cluster_name'] = volume.cluster_name
>         else:
>             filters['host'] = volume.host
>         image_volumes = self.db.volume_get_all(context, filters=filters)
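As a quick sanity check of both branches without a full deployment, the filter-selection logic from the patch can be exercised on its own (a standalone sketch with made-up host and cluster names, not actual Cinder test code):

```
from types import SimpleNamespace


def build_image_volume_filters(volume, image_volume_ids):
    # Same branching as the patch above: prefer cluster_name in an
    # Active-Active (HA) deployment, fall back to host otherwise.
    filters = {'id': image_volume_ids}
    if volume.cluster_name:
        filters['cluster_name'] = volume.cluster_name
    else:
        filters['host'] = volume.host
    return filters


# Clustered (HA) volume: the lookup should use cluster_name.
ha_volume = SimpleNamespace(host='controller-01@purestorage#purestorage',
                            cluster_name='mycluster@purestorage')
assert build_image_volume_filters(ha_volume, ['vol-1']) == {
    'id': ['vol-1'], 'cluster_name': 'mycluster@purestorage'}

# Non-clustered volume: the lookup keeps the old host-based behaviour.
standalone = SimpleNamespace(host='controller-01@lvm#lvm', cluster_name=None)
assert build_image_volume_filters(standalone, ['vol-1']) == {
    'id': ['vol-1'], 'host': 'controller-01@lvm#lvm'}
```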

Yusuf Güngör (yusuf2) wrote :

We have disabled the Cinder cluster config, tested it as non-HA, and it behaved as before; there were no exceptions.
