create volume reschedule on image download failure

Bug #1212502 reported by John Griffith
Affects: Cinder
Status: Fix Released
Importance: High
Assigned to: John Griffith
Milestone: (none listed)

Bug Description

The copy_image_to_volume code sometimes hits a timing issue where it attempts to unlink the temp lock file (cinder.image.image_utils.py:L#380) after the file has already been removed, causing the unlink to raise an unhandled exception.

This results in the *unknown* exception being inspected in the task_flow sequence. It doesn't match the list of non-reschedule errors, so the flow tries again, which is no bueno because the volume already exists and we end up creating another one. The result is that 3 volumes are created on the back-end instead of the single one that we requested.

I haven't yet figured out what is or isn't being downloaded to the new volumes, or whether those downloads fail and it's just missed.

The easy fix right now is to wrap the unlink in a try block and pass if the file does not exist (DNE).
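A minimal sketch of that easy fix, assuming the only error worth ignoring is "no such file". The helper name unlink_if_exists is hypothetical and is not the code at image_utils.py; it just illustrates wrapping os.unlink in a try block.

```python
import errno
import os
import tempfile


def unlink_if_exists(path):
    """Remove path, ignoring the error raised when it does not exist.

    Hypothetical helper for illustration only.
    """
    try:
        os.unlink(path)
    except OSError as exc:
        # Swallow only "no such file or directory"; anything else
        # (permissions, path is a directory, ...) still propagates.
        if exc.errno != errno.ENOENT:
            raise


# Usage: simulate the race by removing the same temp file twice.
fd, tmp_path = tempfile.mkstemp()
os.close(fd)
unlink_if_exists(tmp_path)  # removes the file
unlink_if_exists(tmp_path)  # file is already gone; no exception
```

Checking errno rather than catching every OSError keeps real failures visible, so only the already-removed case is treated as harmless.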

Longer term, retrying on everything NOT in the list may prove to be the wrong strategy. Perhaps we should think about changing the flows to retry ONLY on an identified list of exceptions, rather than the other way around, to deal with unforeseen errors.
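The difference between the two strategies can be sketched roughly as follows. Every name here (ImageNotFound, BackendBusy, the two tuples and both functions) is a hypothetical stand-in for illustration, not Cinder's actual exception list or flow code.

```python
class ImageNotFound(Exception):
    """Hypothetical error that should never trigger a reschedule."""


class BackendBusy(Exception):
    """Hypothetical transient error that is safe to retry."""


# Assumed lists, for illustration only.
NO_RESCHEDULE = (ImageNotFound,)
RESCHEDULE_ONLY = (BackendBusy,)


def should_reschedule_denylist(exc):
    # Behavior described in this bug: retry unless explicitly excluded.
    return not isinstance(exc, NO_RESCHEDULE)


def should_reschedule_allowlist(exc):
    # Proposed alternative: retry only on errors known to be retryable.
    return isinstance(exc, RESCHEDULE_ONLY)


# An unforeseen error, like the failed unlink of the temp lock file.
unexpected = OSError(2, "No such file or directory")
print(should_reschedule_denylist(unexpected))   # True: volume gets created again
print(should_reschedule_allowlist(unexpected))  # False: fail without retrying
```

With the second approach an unknown exception fails the request once instead of creating extra volumes, at the cost of having to enumerate every retryable error up front.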

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/42045

Changed in cinder:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/42045
Committed: http://github.com/openstack/cinder/commit/6dc2193813d1e8ee951288c5386296b0a2a5e7b9
Submitter: Jenkins
Branch: master

commit 6dc2193813d1e8ee951288c5386296b0a2a5e7b9
Author: John Griffith <email address hidden>
Date: Wed Aug 14 20:06:05 2013 -0600

    Replace os.unlink with delete_if_exists

    Shouldn't care when doing unlink on our temp files
    if they exist or not. In fact this causes problems
    when you do things like with tempfile/dir and happen
    to try and unlink after it's already been removed.

    This replaces these calls with the safer
    common.fileutils.delete_if_exists which will
    ignore the os exception of the object DNE.

    Fixes bug: 1212502

    Change-Id: Ica86c95f736411da486335aec5512e59247bfbc0

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: havana-3 → 2013.2