Evacuation failure results in Neutron port down

Bug #1779860 reported by Margarita Mazepa
This bug affects 2 people
Affects: Mirantis OpenStack
Status: Fix Released
Importance: High
Assigned to: Vladimir Khlyunev

Bug Description

Evacuation failure results in Neutron showing ports as down. Neutron reports the port as being on the target compute node, but because the evacuation failed, the VM is still on the source compute node and its actual ports are still attached to OVS there.

Steps to reproduce:
- Delete the image from Glance.
- Stop the nova-compute service on the source compute node.
- Make sure that ha_policy=ha_offline is set for the target VM.
- Run "nova evacuate <vm-id>" to trigger evacuation.
- Evacuation starts but fails. The target VM is in "error" status.
- As a result, "neutron port-show <vm-port-id>" shows the port in
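The symptom in the last step can be checked by comparing the host Neutron has bound the port to with the host Nova still records for the instance. Below is a minimal sketch under stated assumptions: the port data is taken from "neutron port-show" (attribute "binding:host_id") and the instance host from the admin-only "OS-EXT-SRV-ATTR:host" field of "nova show". The helper name find_stale_port_bindings is hypothetical and not part of any OpenStack client.

```python
from typing import Any, Dict, List


def find_stale_port_bindings(
    ports: List[Dict[str, Any]], instance_host: str
) -> List[Dict[str, Any]]:
    """Hypothetical helper: return ports whose Neutron binding host differs
    from the host where Nova records the instance (e.g. after a failed
    evacuation the port points at the target, the VM stays on the source)."""
    stale = []
    for port in ports:
        bound_host = port.get("binding:host_id")
        # Unbound ports (empty host) are not a mismatch, just unbound.
        if bound_host and bound_host != instance_host:
            stale.append(port)
    return stale


# Illustrative data only: port bound to the target, VM recorded on the source.
ports = [{"id": "p1", "status": "DOWN", "binding:host_id": "compute-2"}]
for port in find_stale_port_bindings(ports, instance_host="compute-1"):
    print(port["id"], port["status"], port["binding:host_id"])
```

With the sample data this prints "p1 DOWN compute-2", i.e. the port Neutron believes lives on the target node while the VM is still on the source.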

tags: added: customer-found
tags: added: sla2
Changed in mos:
milestone: none → 9.2-mu-7
assignee: nobody → MOS Maintenance (mos-maintenance)
importance: Undecided → High
status: New → Confirmed
Eugene Nikanorov (enikanorov) wrote :

The cause of the bug is not Neutron but Nova.

1. If nova-compute on the source node is stopped, there is no way for the original port to be deleted from OVS on the source compute node.

2. Evacuation fails because of issues during the instance spawn.
Ideally, Nova should have deleted all of the VM's artifacts on the target node, but it didn't.
I believe that's what needs to be fixed.

Changed in mos:
assignee: MOS Maintenance (mos-maintenance) → Ilya Bumarskov (ibumarskov)
Changed in mos:
assignee: Ilya Bumarskov (ibumarskov) → Denis Meltsaykin (dmeltsaykin)
Changed in mos:
milestone: 9.2-mu-7 → 9.2-mu-8
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (9.0/mitaka)

Fix proposed to branch: 9.0/mitaka
Change author: paul-carlton2 <email address hidden>
Review: https://review.fuel-infra.org/39012

Changed in mos:
status: Confirmed → In Progress
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (9.0/mitaka)

Reviewed: https://review.fuel-infra.org/39012
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: 4a398e852e8d6746a435e547447931c50cc0c81f
Author: paul-carlton2 <email address hidden>
Date: Thu Aug 9 08:53:05 2018

Clean up instance on target node if evacuate fails

If the libvirt driver's spawn method fails, the instance is not
destroyed and the files are not removed. This is ok for boot and
normal rebuild operations because the instance is recorded as
being on the host in question and can be cleaned up by a delete
or recovered using reboot, start or rebuild.

However, in the case of a rebuild being performed on behalf of
an evacuation operation, the instance is still recorded as being
on the source node. In this case, if the spawn method fails, the
instance may remain defined or even running on the target node
and the instance files will still be present on the target node.

To address this issue, we use the recreate parameter, which is set
for rebuild operations performed as part of an evacuation, to
determine if the target compute manager should destroy the
instance. The on_shared_storage parameter is used to determine
if the instance files should be removed too.

Change-Id: I15cc03320e5dfd898516c91ed915b06802f3c67a
Closes-Bug: 1779860
Partial-bug: PROD-21102
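The cleanup rule described in the commit message can be sketched as follows. This is not the actual Nova patch: spawn_on_rebuild and FakeDriver are hypothetical names, and the driver's spawn/destroy signatures are simplified stand-ins for the real libvirt driver methods. It only illustrates the decision logic: destroy the instance on the target when recreate is set, and remove its files only when they are not on shared storage.

```python
class FakeDriver:
    """Hypothetical stand-in for a virt driver; records calls for inspection."""

    def __init__(self, fail: bool):
        self.fail = fail
        self.calls = []

    def spawn(self, instance):
        self.calls.append(("spawn", instance))
        if self.fail:
            raise RuntimeError("spawn failed")

    def destroy(self, instance, destroy_disks):
        self.calls.append(("destroy", instance, destroy_disks))


def spawn_on_rebuild(driver, instance, recreate, on_shared_storage):
    """Spawn on this host; if it fails during an evacuation, clean up here.

    recreate=True means the rebuild is part of an evacuation, so the
    instance is still recorded on the source node and nothing else will
    clean up what a failed spawn leaves behind on this (target) node.
    """
    try:
        driver.spawn(instance)
    except Exception:
        if recreate:
            # Remove the domain; delete instance files only when they are
            # local, since shared-storage files are also the source's files.
            driver.destroy(instance, destroy_disks=not on_shared_storage)
        raise  # the rebuild still fails; cleanup does not hide the error
```

For a normal rebuild (recreate=False) nothing is destroyed, matching the commit message's point that such instances can still be cleaned up by delete or recovered by reboot, start or rebuild.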

Changed in mos:
status: In Progress → Fix Committed
Alexander Rubtsov (arubtsov) wrote :

The customer has reported that the issue persists even with the patch https://review.fuel-infra.org/#/c/39012/2 applied. Please contact me directly for the details and the log files.

Changed in mos:
status: Fix Committed → New
Changed in mos:
assignee: Denis Meltsaykin (dmeltsaykin) → Vladimir Khlyunev (vkhlyunev)
Eugene Nikanorov (enikanorov) wrote :

The issue that has been fixed with the patch has nothing to do with the migration failure.

The migration failure is the result of a wrong scheduling decision for a NUMA-enabled VM.
I'll file a separate bug once I confirm it's not a misconfiguration.

Denis Meltsaykin (dmeltsaykin) wrote :

Eugene, obviously this patch does not fix migration failures. There were no details on the migration failure, so there was nothing to fix. This fix is for cleanup after such failures.

Alexander Rubtsov (arubtsov) wrote :

Is it planned to merge this commit [1], or does it require the customer's verification?

[1] https://review.fuel-infra.org/#/c/39303/

Denis Meltsaykin (dmeltsaykin) wrote :
Changed in mos:
status: New → Fix Committed
Dmitry (dtsapikov) wrote :

Verified on 9.2+mu8

Changed in mos:
status: Fix Committed → Fix Released