In case of compute reboot the evacuation of an SR-IOV VM can fail

Bug #1732923 reported by Anton Rodionov
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: MOS Maintenance

Bug Description

In case of compute reboot the evacuation of an SR-IOV VM with ha-offline metadata fails if:
- on the destination compute the PCI address used by the SR-IOV VM is already in use.
- on the destination compute the PCI address used by the SR-IOV VM does not exist.

I ran the same test from the CLI using the forcemove and migrate commands. In that case nova is able to write an available PCI address on the destination compute into the XML, so the evacuation/migration completes successfully.

However, when a compute reboot is done, the VM goes into the error state if either of the two conditions above is met. Extract from nova show:
| fault | {"message": "Requested operation is not valid: PCI device 0000:3b:0a.5 is in use by driver QEMU, domain instance-000024c9", "code": 500, "details": " File \"/usr/lib/python2.7/dist-packages/nova/compute/manager.py\", line 375, in decorated_function |

I collected the shell log of these operations in the attached log file. It is a bit long but there is everything there.

From /var/log/libvirt/libvirtd.log, in case the PCI address does not exist on the destination compute:
2017-11-15 11:22:04.121+0000: 23297: error : virPCIDeviceNew:1596 : Device 0000:5e:02.4 not found: could not access /sys/bus/pci/devices/0000:5e:02.4/config: No such file or directory
2017-11-15 11:22:04.121+0000: 23297: error : virPCIDeviceNew:1596 : Device 0000:5e:02.4 not found: could not access /sys/bus/pci/devices/0000:5e:02.4/config: No such file or directory
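
The libvirt error above comes down to a missing sysfs entry for the device. A minimal sketch of the same existence check in Python, assuming the standard Linux sysfs layout shown in the log; the function name `pci_device_exists` and the `sysfs_root` parameter are illustrative and are not part of nova or libvirt:

```python
import os


def pci_device_exists(address, sysfs_root="/sys"):
    """Return True if the PCI device has a config file under sysfs.

    Mirrors the path libvirt could not access in the log excerpt:
    /sys/bus/pci/devices/<address>/config
    (hypothetical helper, for illustration only)
    """
    config_path = os.path.join(
        sysfs_root, "bus", "pci", "devices", address, "config"
    )
    return os.path.exists(config_path)


if __name__ == "__main__":
    # Address taken from the libvirtd.log excerpt above.
    print(pci_device_exists("0000:5e:02.4"))
```

Running this on the destination compute before an evacuation would show whether the address recorded for the VM is present there at all. It says nothing about whether the device is already assigned to another domain.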

Error in nova-compute if the PCI address is already in use:
Requested operation is not valid: PCI device 0000:3b:0a.5 is in use by driver QEMU, domain instance-000024c9
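
The two failure modes can be told apart from the message text alone. A rough sketch, assuming the messages keep the wording quoted in this report; the function name `classify_pci_failure` and the returned labels are hypothetical and not part of nova:

```python
import re

# PCI address in domain:bus:slot.function form, e.g. 0000:3b:0a.5
PCI_ADDR = r"[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]"

# Wording taken from the nova-compute error quoted in this report.
IN_USE_RE = re.compile(
    r"PCI device (%s) is in use by driver (\w+), domain ([\w-]+)" % PCI_ADDR
)
# Wording taken from the libvirtd.log excerpt quoted in this report.
NOT_FOUND_RE = re.compile(r"Device (%s) not found" % PCI_ADDR)


def classify_pci_failure(message):
    """Return (kind, pci_address, domain) or None if no pattern matches.

    kind is "in_use" or "not_found"; domain is None for "not_found".
    (hypothetical helper, for illustration only)
    """
    match = IN_USE_RE.search(message)
    if match:
        return ("in_use", match.group(1), match.group(3))
    match = NOT_FOUND_RE.search(message)
    if match:
        return ("not_found", match.group(1), None)
    return None


if __name__ == "__main__":
    print(classify_pci_failure(
        "Requested operation is not valid: PCI device 0000:3b:0a.5 "
        "is in use by driver QEMU, domain instance-000024c9"
    ))
```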

MOS version: MOS 9.2+ code from stable branch from September 2017
Steps to reproduce: attached as a file - all the executed commands are included.
The issue is reproducible all the time.

Anton Rodionov (arodionov) wrote :
Changed in fuel:
milestone: none → 9.x-updates
assignee: nobody → MOS Maintenance (mos-maintenance)
importance: Undecided → High
Vladyslav Drok (vdrok) wrote :

OK, so what does the reboot have to do with this? Looking at the logs, evacuation fails prior to the reboot, right?

Changed in fuel:
assignee: MOS Maintenance (mos-maintenance) → Anton Rodionov (arodionov)
Anton Rodionov (arodionov) wrote :

Here is an explanation from the customer:
On our nodes a reboot itself means an evacuate with a target host, so execution.log contains both scenarios.
Scenario-1: Evacuate without target host via "nova evacuate <VM name>"
Scenario-2: Reboot, which means evacuate with target host via "nova evacuate <VM name> <Target host>"
This bug is raised to address both scenarios.

Changed in fuel:
assignee: Anton Rodionov (arodionov) → MOS Maintenance (mos-maintenance)
Alexander Rubtsov (arubtsov) wrote :

The customer has confirmed that the patch successfully passed the tests.
Can you merge the corresponding commit: https://review.fuel-infra.org/#/c/37448/?

Changed in fuel:
milestone: 9.x-updates → 9.2-mu-5
status: New → Confirmed
Changed in fuel:
status: Confirmed → Fix Committed
Vladimir Jigulin (vjigulin) wrote :

Closing the bug because we have a confirmation (#4) that the patch works.

Changed in fuel:
status: Fix Committed → Fix Released