Error in finish_migration results in image deletion on source with no copy

Bug #1686703 reported by Matthew Booth
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Undecided
Assigned to: Alexey Stupnikov
Milestone: (none)

Bug Description

ML post describing the issue here:

  http://lists.openstack.org/pipermail/openstack-dev/2017-April/115989.html

A user was resizing an instance whose glance image had been deleted. An ssh failure occurred in finish_migration, which runs on the destination, while it was attempting to copy the image out of the image cache on the source. This left the instance and migration in an error state on the destination, with no copy of the image there. The cache manager later ran on the source and expired the image from its image cache, leaving no remaining copies. At this point the user's instance was unrecoverable.
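
The sequence above can be reproduced in a toy model. This is only an illustrative sketch, not nova code: `Host`, the simplified `finish_migration`, and `run_cache_manager` are hypothetical stand-ins. It assumes that the glance image is already gone, that the copy to the destination fails, that the instance is no longer counted as local to the source by that point, and that the source cache manager expires any cached image no local instance uses.

```python
class Host:
    """Hypothetical stand-in for a compute host with an image cache."""

    def __init__(self, name):
        self.name = name
        self.image_cache = set()   # image ids cached on this host
        self.instances = {}        # instance id -> image id


def finish_migration(instance_id, image_id, source, dest, copy_ok):
    """Toy destination-side step: the instance is handed to dest, then
    the image is copied out of the source's image cache."""
    source.instances.pop(instance_id, None)
    dest.instances[instance_id] = image_id
    if not copy_ok:
        # Models the ssh failure from the report.
        raise RuntimeError("ssh failure copying image from source")
    dest.image_cache.add(image_id)


def run_cache_manager(host):
    """Toy expiry: drop cached images that no local instance refers to."""
    in_use = set(host.instances.values())
    host.image_cache &= in_use


def surviving_copies(image_id, hosts, glance):
    """List every place that still holds a copy of the image."""
    found = [h.name for h in hosts if image_id in h.image_cache]
    if image_id in glance:
        found.append("glance")
    return found


source, dest = Host("source"), Host("dest")
glance = set()                      # image already deleted from glance
source.image_cache.add("img-1")
source.instances["vm-1"] = "img-1"

try:
    finish_migration("vm-1", "img-1", source, dest, copy_ok=False)
except RuntimeError:
    pass                            # instance and migration left in error

run_cache_manager(source)           # later periodic run on the source
copies = surviving_copies("img-1", [source, dest], glance)
print(copies)
```

In this model the final list is empty: the destination never received the image, and the source expired its copy because nothing local was using it any more.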

Tags: resize
Matthew Booth (mbooth-9) wrote :

As mentioned in the above ML post, I don't think the image cache manager should expire the image of an instance while a migration is active. However, as also described in the post, I'm not convinced it's currently possible to reliably identify whether a migration is ongoing.

My current thought is that we could send the image from source to dest during migrate_disk_and_power_off. This way, all data transfer would happen in the same place, and any failure involving user data would happen before the switch, not after.

However, while this would resolve this failure mode, I still think it would be better for the image cache manager to consider instances with active migrations.
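
One way to picture the image cache manager "considering instances with active migrations" is sketched below. It is a hypothetical model, not nova's actual image cache code: the `Migration` record, the status names, and `images_safe_to_expire` are all assumptions made for illustration, and it sidesteps the open question above of whether an ongoing migration can be identified reliably.

```python
from dataclasses import dataclass

# Assumed set of terminal migration states; nova's real status values
# and their semantics may differ.
TERMINAL_STATUSES = {"confirmed", "reverted"}


@dataclass(frozen=True)
class Migration:
    """Hypothetical migration record."""
    instance_id: str
    image_id: str
    source_host: str
    dest_host: str
    status: str


def images_safe_to_expire(host, cached_images, local_instance_images, migrations):
    """Return cached images on `host` that nothing still depends on.

    An image is kept if a local instance uses it, or if any migration
    touching this host has not reached a terminal state. A migration in
    "error" counts as non-terminal here, so its image is kept.
    """
    protected = set(local_instance_images)
    for m in migrations:
        involves_host = host in (m.source_host, m.dest_host)
        if involves_host and m.status not in TERMINAL_STATUSES:
            protected.add(m.image_id)
    return set(cached_images) - protected


migrations = [
    Migration("vm-1", "img-1", "source", "dest", "error"),
    Migration("vm-2", "img-2", "source", "dest", "confirmed"),
]
expirable = images_safe_to_expire(
    "source", {"img-1", "img-2", "img-3"}, {"img-3"}, migrations)
print(sorted(expirable))
```

Here only `img-2` is reported as expirable: `img-3` is used by a local instance, and `img-1` belongs to a migration that ended in error, which is the case from this report.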

Matthew Booth (mbooth-9) wrote :

Hmm, realised I can't do that because I'd need to hold a lock on the destination whilst writing to the image cache.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/462521

Changed in nova:
assignee: nobody → Matthew Booth (mbooth-9)
status: New → In Progress
Matt Riedemann (mriedem)
tags: added: resize
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matthew Booth (<email address hidden>) on branch: master
Review: https://review.openstack.org/462521
Reason: This has been NAK'd unconditionally. Not worth pushing further.

Matt Riedemann (mriedem)
Changed in nova:
status: In Progress → Won't Fix
assignee: Matthew Booth (mbooth-9) → nobody
Matthew Booth (mbooth-9) wrote :

Matt, this is definitely a bug. It's even a bug I hope to work on some time soon. I'm just not going to fix it as originally intended.

I don't seem to be able to reset the status.

Alexey Stupnikov (astupnikov) wrote :

I will try to propose a patch and see how it turns out. Maybe we can fix this after all.

Changed in nova:
assignee: nobody → Alexey Stupnikov (astupnikov)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/877410

Changed in nova:
status: Won't Fix → In Progress