OpenStack Compute (Nova)

libvirt image cache manager doesn't handle shared storage during cleanup

Reported by Michael Still on 2012-11-14
This bug affects 3 people
Affects                    Importance   Assigned to
OpenStack Compute (nova)   Critical     Michael Still
Folsom                     Critical     Mark McLoughlin

Bug Description

On 11/14/2012 04:03 PM, Sam Morrison wrote:
> After the upgrade which went relatively smoothly (a lot easier than
> diablo -> essex) almost all our base images were deleted by the image
> cache clean up.
> I can't explain how this happened. We lost a total of about 70 images
> that affected ~200 running instances.
>
> We have since disabled this flag until we can find out what went wrong.
> I can see it in the logs and if this flag is enabled it would delete a
> lot of in use base files still.
>
> We have an nfs mounted /var/lib/nova/instances directory where the _base
> dir is located so I'm wondering if this had something to do with it?
> Is the image cache cleanup meant to work in a shared instance storage
> environment?

Tags: ops

Fix proposed to branch: master
Review: https://review.openstack.org/16134

Changed in nova:
status: Triaged → In Progress

Reviewed: https://review.openstack.org/16134
Committed: http://github.com/openstack/nova/commit/c2de33a0a2132774dc295861cef138ec24bb0cf9
Submitter: Jenkins
Branch: master

commit c2de33a0a2132774dc295861cef138ec24bb0cf9
Author: Michael Still <email address hidden>
Date: Wed Nov 14 18:37:04 2012 +1100

    Detect shared storage; handle base cleanup better.

    If base image storage is shared, we need to care about remote
    instances when we clean up. This patch "learns" which storage is
    shared, and then decides what base images are in use anywhere
    on the set of compute nodes which share that base storage.

    This is complicated because shared instance storage doesn't have
    to be per-cluster. It could for example be per rack. We need to
    handle that properly.

    This should resolve bug 1078594.

    Change-Id: I36d0d6e965b114bb68c8f7b7fd43f8e96b2dd8f5
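The cleanup rule described in the commit message above can be sketched roughly as follows. This is not the code from the patch: the function name, the parameters, and the idea of passing in a ready-made set of hosts that share storage are all assumptions made for illustration. In the real patch the set of sharing hosts is "learned" rather than supplied by the caller.

```python
from typing import Dict, Set


def removable_base_images(
    local_host: str,
    cached_images: Set[str],
    images_in_use_by_host: Dict[str, Set[str]],
    hosts_sharing_storage: Set[str],
) -> Set[str]:
    """Return cached base images that no host on the same storage still uses.

    All names here are hypothetical; this only illustrates the rule that an
    image on shared storage is safe to remove only if it is unused on every
    compute node that shares that storage.
    """
    # Always count the local host, even if the caller did not list it.
    hosts = set(hosts_sharing_storage) | {local_host}

    # Union of the base images referenced by instances on any sharing host.
    in_use: Set[str] = set()
    for host in hosts:
        in_use |= images_in_use_by_host.get(host, set())

    return cached_images - in_use


# Example: node1 and node2 share a per-rack NFS mount; node3 does not.
usage = {"node1": {"img-a"}, "node2": {"img-b"}, "node3": {"img-c"}}
cached = {"img-a", "img-b", "img-c", "img-d"}
print(sorted(removable_base_images("node1", cached, usage, {"node1", "node2"})))
```

With only the local host considered, img-b would be treated as unused and deleted even though an instance on node2 still depends on it, which is the failure reported in this bug.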

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2012-11-21
Changed in nova:
milestone: none → grizzly-1
status: Fix Committed → Fix Released
tags: removed: folsom-backport-potential
Mark McLoughlin (markmc) wrote:

Note the workaround documented in http://wiki.openstack.org/ReleaseNotes/2012.2.1

Set 'image_cache_manager_interval = 0' in nova.conf
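For anyone who wants to confirm the workaround is in place on a compute node, a small check along these lines may help. It is only a sketch: it assumes nova.conf is in INI format with the option under a [DEFAULT] section, and the function name is made up for this example. It reports only an explicit setting and makes no claim about what the default is when the option is absent.

```python
import configparser


def image_cache_manager_disabled(conf_text: str) -> bool:
    """Return True if the config text explicitly sets the interval to 0.

    Hypothetical helper; assumes INI-style nova.conf with a [DEFAULT] section.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    value = parser.get("DEFAULT", "image_cache_manager_interval", fallback=None)
    return value is not None and int(value) == 0


sample = """
[DEFAULT]
image_cache_manager_interval = 0
"""
print(image_cache_manager_disabled(sample))
```

To check a real node, read the file contents first (for example with open("/etc/nova/nova.conf").read(), assuming that is where the file lives) and pass the text to the function.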

Fix proposed to branch: stable/folsom
Review: https://review.openstack.org/17064

Reviewed: https://review.openstack.org/17064
Committed: http://github.com/openstack/nova/commit/22d7c3bb0e522503a648f279e222f595c351fba2
Submitter: Jenkins
Branch: stable/folsom

commit 22d7c3bb0e522503a648f279e222f595c351fba2
Author: Mark McLoughlin <email address hidden>
Date: Wed Nov 28 17:47:38 2012 +0000

    Disable the image cache manager by default

    bug #1075018 and bug #1078594 are very serious issues with the image
    cache manager when using shared storage. We couldn't address the
    issues in time for 2012.2.1 because of the riskiness of the changes
    required.

    As a workaround, disable that code by default using:

      image_cache_manager_interval=0

    Change-Id: Iab78abf855e919bc3d3278a39882ff6d39bd3c1c

Thierry Carrez (ttx) on 2013-04-04
Changed in nova:
milestone: grizzly-1 → 2013.1