Boot from volumes that fail in initialize_connection are not rescheduled

Bug #1488111 reported by Samuel Matzek
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none)

Bug Description

Version: OpenStack Liberty

Boot-from-volume instance builds that fail in the volume initialize_connection call are not rescheduled. Initialize connection failures can be very host-specific, and in many cases the boot would succeed if the instance build were rescheduled to another host.

The instance is not rescheduled because initialize_connection is called at the bottom of this stack:
nova.compute.manager _build_resources
nova.compute.manager _prep_block_device
nova.virt.block_device attach_block_devices
nova.virt.block_device.DriverVolumeBlockDevice.attach

When this fails, an exception is raised and lands in this block:
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1740
which raises an InvalidBDM exception that is caught by this block:
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2110

This in turn raises a BuildAbortException, which prevents the instance from being rescheduled because the flow lands in this block:
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2004

To fix this, we likely need nova.virt.block_device.DriverVolumeBlockDevice.attach to raise a different exception when the failure is in initialize_connection. We would then work back up the stack to ensure that this different exception does not get converted into a BuildAbortException, so that the reschedule can happen.
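
The idea can be sketched with a simplified model of the exception flow. This is not nova's code: the classes below are local stand-ins named after the exceptions mentioned in this report, VolumeConnectionFailed is a hypothetical name for the proposed new exception, the function names only loosely mirror the stack listed above, and the built-in ConnectionError is an assumed placeholder for a host-specific initialize_connection failure.

```python
# Simplified stand-ins only; none of this is nova's actual code.


class InvalidBDM(Exception):
    """Stand-in for the InvalidBDM exception named in this report."""


class BuildAbortException(Exception):
    """Stand-in: the build fails and is NOT rescheduled."""


class RescheduledException(Exception):
    """Stand-in: the build is retried on another host."""


class VolumeConnectionFailed(Exception):
    """Hypothetical new exception for initialize_connection failures."""


def attach(initialize_connection):
    """Loosely models DriverVolumeBlockDevice.attach."""
    try:
        return initialize_connection()
    except ConnectionError as exc:
        # Assumption: host-specific failures are marked with a distinct type.
        raise VolumeConnectionFailed(str(exc)) from exc


def prep_block_device(initialize_connection):
    """Loosely models _prep_block_device: other failures become InvalidBDM."""
    try:
        return attach(initialize_connection)
    except VolumeConnectionFailed:
        raise  # pass the distinct exception through unchanged
    except Exception as exc:
        raise InvalidBDM(str(exc)) from exc


def build_resources(initialize_connection):
    """Loosely models _build_resources: chooses abort or reschedule."""
    try:
        return prep_block_device(initialize_connection)
    except VolumeConnectionFailed as exc:
        # The proposed change: do not abort, let the build be rescheduled.
        raise RescheduledException(str(exc)) from exc
    except InvalidBDM as exc:
        # Behaviour described in this report: abort without rescheduling.
        raise BuildAbortException(str(exc)) from exc


def build_outcome(initialize_connection):
    """Returns what the caller at the top of the stack would do."""
    try:
        build_resources(initialize_connection)
        return "active"
    except RescheduledException:
        return "reschedule"
    except BuildAbortException:
        return "abort"
```

In this model, a callable that raises ConnectionError leads to "reschedule", while any other failure still follows the InvalidBDM to BuildAbortException path and leads to "abort". Which real failures should count as host-specific is left open here.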

Samuel Matzek (smatzek)
Changed in nova:
assignee: nobody → Samuel Matzek (smatzek)
tags: added: spawn volumes
Matt Riedemann (mriedem)
tags: added: compute
removed: spawn
Changed in nova:
status: New → Triaged
importance: Undecided → Low
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/246505

Changed in nova:
status: Triaged → In Progress
Matt Riedemann (mriedem)
tags: added: liberty-backport-potential
Samuel Matzek (smatzek)
no longer affects: mitaka (Ubuntu)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/246505
Reason: This review is > 6 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Maciej Szankin (mszankin) wrote :

Liberty has reached EOL, so this bug is invalid for that release. Mitaka was removed from the affected releases, so I am closing this one.

Changed in nova:
status: In Progress → Won't Fix
Changed in nova:
assignee: Samuel Matzek (smatzek) → nobody
Matt Riedemann (mriedem) wrote :

I wouldn't say that we won't ever fix this, since I've wondered why we don't reschedule on volume failures like we do with networking failures, but it's not a high priority.

tags: removed: liberty-backport-potential
no longer affects: nova/liberty
Changed in nova:
status: Won't Fix → Opinion
status: Opinion → Confirmed
importance: Low → Wishlist