Race between imagebackend and imagecache

Bug #1256838 reported by Arata Notsu
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Fix Released | Medium | Ankit Agrawal |
Liberty | Fix Released | Undecided | Unassigned |

Bug Description

After ImageCacheManager decides that a base image has not been used recently and marks it for removal, some time passes before the image is actually removed. If an instance that uses the image is launched during that window, the image is still removed, even though it is now in use.

Tags: libvirt
Arata Notsu (arata776)
tags: added: libvirt
Arata Notsu (arata776)
Changed in nova:
assignee: nobody → Arata Notsu (arata776)
Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/61075
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2593469103aa7d9d2bcb759b78d5f8637911a1e0
Submitter: Jenkins
Branch: master

commit 2593469103aa7d9d2bcb759b78d5f8637911a1e0
Author: Arata Notsu <email address hidden>
Date: Tue Dec 3 18:27:26 2013 +0900

    Fix race conditions between imagebackend and imagecache

    The race may occur in the situation:
    * There is a base file that has gone unused long enough
      to be removed by imagecache.
    * imagebackend is provisioning a virtual disk from the base file.
    * imagecache is removing the base file.

    Then, the base file is removed even though it is about to be used.

    To fix this, these changes are in this patch:

    * A new function imagecache.refresh_timestamp(base_file) updates
      the owner and mtime of the base file with the lock dedicated
      to the base file.
    * imagebackend calls refresh_timestamp(base_file) before provisioning
      a disk from the base file.
    * imagecache.ImageCacheManager._remove_base_file(base_file) uses
      the same lock as used by refresh_timestamp()

    Closes-Bug: #1256838
    Change-Id: I7c897cf6071d87a2c4532fb3a73863d649d02782
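
The three changes listed in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: it uses an in-process `threading.Lock` per file as a stand-in for whatever locking nova actually uses, only touches the timestamps (the owner update is omitted), and the helper names `_lock_for` and `remove_base_file_if_stale`, along with the re-check of the file's age under the lock, are assumptions made for the example.

```python
import os
import threading
import time
from collections import defaultdict

# Hypothetical in-process registry: one lock per base file path.
# (An assumption for this sketch; a real deployment needs a lock that
# also works across processes.)
_locks = defaultdict(threading.Lock)
_registry_guard = threading.Lock()


def _lock_for(base_file):
    """Return the lock dedicated to base_file, creating it if needed."""
    with _registry_guard:
        return _locks[base_file]


def refresh_timestamp(base_file):
    """Mark base_file as recently used while holding its lock."""
    with _lock_for(base_file):
        if os.path.exists(base_file):
            os.utime(base_file, None)  # set atime and mtime to "now"


def remove_base_file_if_stale(base_file, max_age_seconds):
    """Remove base_file only if it is still stale once the lock is held.

    Returns True if the file was removed, False otherwise.
    """
    with _lock_for(base_file):
        if not os.path.exists(base_file):
            return False
        age = time.time() - os.path.getmtime(base_file)
        if age < max_age_seconds:
            # Someone refreshed the timestamp in the meantime: keep the file.
            return False
        os.remove(base_file)
        return True
```

Because both functions take the same per-file lock, a refresh that happens just before provisioning either completes before the removal check (so the file is kept) or waits until the removal has finished (so the caller can notice the file is gone).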

Changed in nova:
status: In Progress → Fix Committed
Arata Notsu (arata776) wrote :
Changed in nova:
status: Fix Committed → New
Solly Ross (sross-7)
Changed in nova:
status: New → Confirmed
importance: Undecided → High
habuka036 (habuka036) wrote :

what's the status on this?

Arata Notsu (arata776)
Changed in nova:
assignee: Arata Notsu (arata776) → nobody
habuka036 (habuka036)
Changed in nova:
assignee: nobody → habuka036 (habuka036)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/139951

Changed in nova:
assignee: habuka036 (habuka036) → Yasuaki Nagata (yasuaki-nagata)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/185549

Changed in nova:
assignee: Yasuaki Nagata (yasuaki-nagata) → Ankit Agrawal (ankitagrawal)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Kevin L. Mitchell (<email address hidden>) on branch: master
Review: https://review.openstack.org/139951
Reason: Idle and in merge conflict. Feel free to re-open if you get time to work on this change.

Changed in nova:
assignee: Ankit Agrawal (ankitagrawal) → Michael Still (mikalstill)
Changed in nova:
assignee: Michael Still (mikalstill) → Ankit Agrawal (ankitagrawal)
Ankit Agrawal (ankitagrawal) wrote :

gate-grenade-dsvm is failing on my latest patch set because I added a new command, 'touch', to etc/nova/rootwrap.d/compute.filters so that the base file's access time can be updated with root privileges.
Can someone please help me understand how to fix the grenade test failures when adding a new command to the compute filters?

Thanks !

Changed in nova:
importance: High → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/185549
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ec9d5e375e208686d33b9259b039cc009bded42e
Submitter: Jenkins
Branch: master

commit ec9d5e375e208686d33b9259b039cc009bded42e
Author: Ankit Agrawal <email address hidden>
Date: Mon Aug 10 16:27:57 2015 +1000

    libvirt: Race condition leads to instance in error

    ImageCacheManager deletes base image while image backend is copying
    image to the instance path leading instance to go in the error state.

    Acquired lock before removing image from cache. If libvirt is copying
    image to the instance path, image cache manager won't be able to remove
    it until libvirt finishes copying image completely.

    Closes-Bug: 1256838
    Closes-Bug: 1470437
    Co-Authored-By: Michael Still <email address hidden>
    Depends-On: I337ce28e2fc516c91bec61ca3639ebff0029ad49
    Change-Id: I376cc951922c338669fdf3f83da83e0d3cea1532
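
The behaviour described in this commit message, where the cache manager cannot remove the image until the copy has finished, can be illustrated with a small sketch. It is not the merged nova code: the single `threading.Lock`, the function names, and the `demo` helper are all hypothetical, chosen only to show removal blocking behind an in-progress copy.

```python
import os
import shutil
import tempfile
import threading

# Hypothetical lock guarding one cached base image (an assumption for
# this sketch, standing in for a per-image lock).
_base_lock = threading.Lock()


def copy_base_to_instance(base_file, instance_path, started=None):
    """Copy the base image to the instance path while holding the lock."""
    with _base_lock:
        if started is not None:
            started.set()  # signal that the copy now owns the lock
        shutil.copyfile(base_file, instance_path)


def remove_from_cache(base_file):
    """Remove the base image; blocks while a copy holds the lock."""
    with _base_lock:
        if os.path.exists(base_file):
            os.remove(base_file)


def demo():
    """Run a copy and a removal concurrently and report the outcome."""
    workdir = tempfile.mkdtemp()
    try:
        base = os.path.join(workdir, "base.img")
        disk = os.path.join(workdir, "disk")
        with open(base, "wb") as f:
            f.write(b"x" * 1024)

        started = threading.Event()
        copier = threading.Thread(
            target=copy_base_to_instance, args=(base, disk, started))
        copier.start()
        started.wait()            # the copy holds the lock from here on
        remove_from_cache(base)   # waits until the copy has finished
        copier.join()
        return os.path.getsize(disk), os.path.exists(base)
    finally:
        shutil.rmtree(workdir)
```

In this sketch `demo()` returns the size of the copied disk and whether the base file still exists: the copy is complete, and the base file is removed only afterwards.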

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/278928

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 13.0.0.0b3

This issue was fixed in the openstack/nova 13.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/liberty)

Change abandoned by Ankit Agrawal (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/278928
