raising error during image clone doesn't delete cache

Bug #1801595 reported by Shay Halsband
This bug affects 4 people
Affects: Cinder
Status: Fix Released
Importance: Undecided
Assigned to: Alan Bishop

Bug Description

XtremIO, like other arrays, has a snapshot limit per volume group. Volume snapshots are the mechanism used for volume cloning and the image cache, so the driver needs to restart the cache from time to time.
Until Pike, in order to restart the cache, the driver would raise a CinderException from create_cloned_volume and the manager would delete the cache entry, create a full clone, and create a new cache entry.
I suspect change I547fb4bcdd4783225b8ca96d157c61ca3bcf4ef4 broke this behavior, and currently the driver can't reset the cache, which is a serious bug for some drivers.

Revision history for this message
shuki (kimhi) wrote :

When we return from this code in
/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py:

            else:
                model_update, cloned = self._create_from_image_cache(

a CinderException has been raised and cloned = False (because we reached our snapshot limit per volume).

The enclosing function is entered with its default argument value:

    def _create_from_image_cache_or_download(self, context, volume,
                                             image_location, image_id,
                                             image_meta, image_service,
                                             update_cache=False):

So when you get here:

                if not cloned and update_cache:
                    should_create_cache_entry = True

should_create_cache_entry remains False, and we do not fall back to the default behavior in the rest of the code that runs when should_create_cache_entry is True.

Revision history for this message
shuki (kimhi) wrote :

It seems that the issue is here, in
/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py.

We enter:

    def _create_from_image_cache_or_download(self, context, volume,
                                             image_location, image_id,
                                             image_meta, image_service,
                                             update_cache=False):

Now the XtremIO Cinder driver raises a CinderException here, so cloned = False:

            else:
                model_update, cloned = self._create_from_image_cache(
                    context,
                    internal_context,
                    volume,
                    image_id,
                    image_meta
                )

Then:

                if not cloned and update_cache:
                    should_create_cache_entry = True

leaves should_create_cache_entry = False, so the rest of the code in this function that depends on it:

            if should_create_cache_entry:
                if virtual_size and virtual_size != original_size:
                    volume.size = virtual_size
                    volume.save()

...

            if should_create_cache_entry:
                # Update the newly created volume db entry before we clone it
                # for the image-volume creation.
                if model_update:
                    volume.update(model_update)
                    volume.save()
                self.manager._create_image_cache_volume_entry(internal_context,
                                                              volume,
                                                              image_id,
                                                              image_meta)

is not executed.
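The path described above can be condensed into a standalone sketch. The names mirror the Cinder flow for readability, but this is an illustrative model only, not the actual code:

```python
# Illustrative model of the broken flow in
# _create_from_image_cache_or_download: a failed clone only triggers a new
# cache entry when update_cache is True, but update_cache defaults to False.

class CinderException(Exception):
    pass


def _create_from_image_cache(limit_reached):
    # Stand-in for the driver call; XtremIO raises once the per-volume
    # snapshot limit is hit.
    if limit_reached:
        raise CinderException("snapshot limit reached")
    return {}, True


def create_from_image_cache_or_download(update_cache=False):
    should_create_cache_entry = False
    try:
        model_update, cloned = _create_from_image_cache(limit_reached=True)
    except CinderException:
        cloned = False

    if not cloned and update_cache:
        should_create_cache_entry = True

    # With update_cache left at its default (False), the fallback that
    # downloads the image and seeds a new cache entry never runs.
    return should_create_cache_entry


print(create_from_image_cache_or_download())                   # False
print(create_from_image_cache_or_download(update_cache=True))  # True
```

The sketch shows why the driver-side exception alone is no longer enough: unless the caller passes update_cache=True, the failure is swallowed without re-seeding the cache.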

Revision history for this message
arkady kanevsky (arkady-kanevsky) wrote :

How is that handled in other drivers?

description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/618698

Changed in cinder:
assignee: nobody → Shay Halsband (shay-halsband)
status: New → In Progress
Revision history for this message
Shay Halsband (shay-halsband) wrote :

XtremIO has a specific parameter to limit the number of snapshots; it can also be set to 0 to fall back to the array limit. I don't know of other drivers that have this behavior.

Revision history for this message
Alan Bishop (alan-bishop) wrote :

I have some thoughts on how this might be fixed, but want to be sure I understand the expected behavior. The bug description uses the phrases "restart the cache" and "reset the cache," and I think you mean "create a new (fresh) cache entry." Is that true?

Here's my current understanding.
- _prepare_cache_entry() is responsible for seeding the cache
- When a cache entry exists, the driver is expected to clone the cached volume each time a new volume is created
- Drivers may fail to clone the cached volume due to factors like a snapshot limit has been reached

If a new cache entry for the same image is added, there will be two entries in the cache: the old "bad" one that reached its snapshot limit, and a new "good" one that can be used for another round of cloning. When the second entry reaches the snapshot limit, a third cache entry is added, and so on. The cache will grow (albeit slowly), with multiple entries per image. All but the latest entry will be unused because they're "bad" (can't be cloned).
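The growth scenario above can be modeled in a few lines. This is a hypothetical toy model (the limit of 2 and the function names are made up for illustration), not Cinder code:

```python
# Toy model of cache growth: each entry can serve a limited number of
# clones; once exhausted, a new entry is added and the old "bad" one
# lingers in the cache, unused.

SNAPSHOT_LIMIT = 2  # assumed per-entry clone limit, for illustration only


def create_volume(cache, image_id):
    """Clone from the newest usable entry, else add a fresh entry."""
    entries = cache.setdefault(image_id, [])
    if entries and entries[-1] < SNAPSHOT_LIMIT:
        entries[-1] += 1   # clone from the newest entry
    else:
        entries.append(1)  # exhausted entries stay behind, unused
    return cache


cache = {}
for _ in range(5):
    create_volume(cache, "image-1")
print(cache)  # {'image-1': [2, 2, 1]} -- two exhausted entries remain
```

After five volume creations the image has three cache entries, only the last of which is still clonable, which is exactly the slow growth described above.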

Is my understanding correct?

Revision history for this message
Shay Halsband (shay-halsband) wrote :

Partially. "Reset the cache" means taking a new full copy of the image and starting a new cache entry from it; we can't use an existing volume on the array, since it would hit the same snapshot limit.

I was under the impression that a cache entry we failed to clone from would be deleted, but I might be wrong, and the scenario you described may happen instead.

Revision history for this message
Alan Bishop (alan-bishop) wrote :

Sorry, I missed your update on Dec-20, and I've been taking an extended holiday break.

I attached a patch file with an alternate approach to solving the problem. I have not had a chance to test it, and of course it warrants additional unit tests. If this resolves the problem then I wouldn't object to you using it in your review, https://review.openstack.org/618698.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by Jay Bryant (<email address hidden>) on branch: master
Review: https://review.openstack.org/618698
Reason: This has not been touched in a long time. abishop will submit a new patch.

Revision history for this message
Alan Bishop (alan-bishop) wrote :

I should have a new patch available for submission soon.

Changed in cinder:
assignee: Shay Halsband (shay-halsband) → Alan Bishop (alan-bishop)
Revision history for this message
Shay Halsband (shay-halsband) wrote :

Hi Alan,
just wanted to update: we managed to test your patch and it works!

Thanks,
Shay

Revision history for this message
Alan Bishop (alan-bishop) wrote : Re: [Bug 1801595] Re: raising error during image clone doesn't delete cache

Hi Shay,

Thanks for the feedback! I hope to have my patch ready soon (I need to add
a unit test), and I'll add you to the review.

Alan


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/630753

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/630753
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=93519a02c0c9b5dcb9d5139761780bcbdceef982
Submitter: Zuul
Branch: master

commit 93519a02c0c9b5dcb9d5139761780bcbdceef982
Author: Alan Bishop <email address hidden>
Date: Mon Jan 14 12:42:40 2019 -0500

    Create new image volume cache entry when cloning fails

    This patch fixes the image volume cache so that a new cache entry is
    created whenever cloning an existing entry fails (which can happen, for
    example, when a cached volume reaches its snapshot limit). This restores
    the original behavior, which was broken by [1].

    [1] I547fb4bcdd4783225b8ca96d157c61ca3bcf4ef4

    Closes-Bug: #1801595
    Change-Id: Ib5947e2c7300730adb851ad58e898a29f2b88525

Changed in cinder:
status: In Progress → Fix Released
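The intent of the fix, as described in the commit message, can be paraphrased with a small sketch. This is my own standalone model of the restored behavior, not the actual patch:

```python
# Paraphrased model of the restored behavior: when cloning from a cache
# entry fails, fall back to a full image download and create a new cache
# entry so subsequent requests can clone again.

class CinderException(Exception):
    pass


def create_volume(cache_entries, image_id, clone_ok):
    """clone_ok models whether the cached volume can still be cloned
    (False once its snapshot limit is reached)."""
    cloned = False
    if image_id in cache_entries:
        try:
            if not clone_ok:
                raise CinderException("snapshot limit reached")
            cloned = True
        except CinderException:
            cloned = False

    if not cloned:
        # Fall back: full download from Glance (omitted here) plus a
        # fresh cache entry -- the behavior the patch restores.
        cache_entries[image_id] = cache_entries.get(image_id, 0) + 1
    return cloned


entries = {}
create_volume(entries, "img", clone_ok=True)       # first request seeds the cache
ok = create_volume(entries, "img", clone_ok=False)
print(entries["img"], ok)  # 2 False -- a new entry was created after the failure
```

The key difference from the broken flow is that the failed clone itself now triggers creation of a new cache entry, regardless of the update_cache argument.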
Revision history for this message
Shay Halsband (shay-halsband) wrote :

thank you Alan for picking this up.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 14.0.0.0rc1

This issue was fixed in the openstack/cinder 14.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/663514

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/663518

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/rocky)

Reviewed: https://review.opendev.org/663514
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=b0cf233d808507707f3ccf7c614d3930f2e3eb32
Submitter: Zuul
Branch: stable/rocky

commit b0cf233d808507707f3ccf7c614d3930f2e3eb32
Author: Alan Bishop <email address hidden>
Date: Mon Jan 14 12:42:40 2019 -0500

    Create new image volume cache entry when cloning fails

    This patch fixes the image volume cache so that a new cache entry is
    created whenever cloning an existing entry fails (which can happen, for
    example, when a cached volume reaches its snapshot limit). This restores
    the original behavior, which was broken by [1].

    [1] I547fb4bcdd4783225b8ca96d157c61ca3bcf4ef4

    Closes-Bug: #1801595
    Change-Id: Ib5947e2c7300730adb851ad58e898a29f2b88525
    (cherry picked from commit 93519a02c0c9b5dcb9d5139761780bcbdceef982)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/queens)

Reviewed: https://review.opendev.org/663518
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=1accbbc7a0912de9faffc35e1249bc4cf7d315ff
Submitter: Zuul
Branch: stable/queens

commit 1accbbc7a0912de9faffc35e1249bc4cf7d315ff
Author: Alan Bishop <email address hidden>
Date: Mon Jan 14 12:42:40 2019 -0500

    Create new image volume cache entry when cloning fails

    This patch fixes the image volume cache so that a new cache entry is
    created whenever cloning an existing entry fails (which can happen, for
    example, when a cached volume reaches its snapshot limit). This restores
    the original behavior, which was broken by [1].

    [1] I547fb4bcdd4783225b8ca96d157c61ca3bcf4ef4

    Conflicts:
            cinder/tests/unit/volume/flows/test_create_volume_flow.py

    Closes-Bug: #1801595
    Change-Id: Ib5947e2c7300730adb851ad58e898a29f2b88525
    (cherry picked from commit 93519a02c0c9b5dcb9d5139761780bcbdceef982)
    (cherry picked from commit b0cf233d808507707f3ccf7c614d3930f2e3eb32)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.opendev.org/667735

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (stable/pike)

Change abandoned by Keith Berger (<email address hidden>) on branch: stable/pike
Review: https://review.opendev.org/667735
Reason: not able to backport to pike without changes to glance

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 13.0.6

This issue was fixed in the openstack/cinder 13.0.6 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 12.0.8

This issue was fixed in the openstack/cinder 12.0.8 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/700176
