In case of compute reboot the evacuation of an SR-IOV VM can fail
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Fuel for OpenStack | Fix Released | High | MOS Maintenance | | |
Bug Description
In case of compute reboot the evacuation of an SR-IOV VM with ha-offline metadata fails if:
- on the destination compute the PCI address used by the SR-IOV VM is already in use.
- on the destination compute the PCI address used by the SR-IOV VM does not exist.
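The two failure conditions above can be sketched as a small pre-check. This is only an illustration under stated assumptions, not nova's actual code: `classify_pci_address` and the `host_devices` mapping are hypothetical names, and the inventory data below is invented for the example (only the address 0000:3b:0a.5 and the domain name come from this report).

```python
import re

# Full PCI address in domain:bus:slot.function form, e.g. 0000:3b:0a.5
PCI_ADDR_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def classify_pci_address(address, host_devices):
    """Classify a PCI address against a destination host inventory.

    host_devices is a hypothetical mapping of PCI address -> name of the
    domain currently using the device, or None if the device is free.
    Returns "missing", "in_use", or "available".
    """
    address = address.lower()
    if not PCI_ADDR_RE.match(address):
        raise ValueError("not a full PCI address: %r" % address)
    if address not in host_devices:
        # Second condition: the address does not exist on the destination.
        return "missing"
    if host_devices[address] is not None:
        # First condition: the address is already claimed by another domain.
        return "in_use"
    return "available"


# Invented inventory for a destination compute node.
destination = {
    "0000:3b:0a.5": "instance-000024c9",  # busy, as in the reported fault
    "0000:3b:0a.6": None,                 # free
}

for addr in ("0000:3b:0a.5", "0000:3b:0a.6", "0000:3b:0b.1"):
    print(addr, classify_pci_address(addr, destination))
```

An evacuation that reuses the source host's PCI address unchanged would hit the "in_use" or "missing" case, whereas picking any "available" address on the destination would avoid both.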
I ran the same test by executing the forcemove and migrate commands via the CLI. In this case nova is able to write an available PCI address on the destination compute into the XML, so the evacuation/
At the same time, if a compute reboot is done, the VM goes into the error state if either of the two conditions above is met. Extract from nova-show:
| fault | {"message": "Requested operation is not valid: PCI device 0000:3b:0a.5 is in use by driver QEMU, domain instance-000024c9", "code": 500, "details": " File \"/usr/
I collected the shell log of these operations in the attached log file. It is a bit long, but everything is there.
from less /var/log/
2017-11-15 11:22:04.121+0000: 23297: error : virPCIDeviceNew
2017-11-15 11:22:04.121+0000: 23297: error : virPCIDeviceNew
Error in nova-compute if the PCI address is already in use:
Requested operation is not valid: PCI device 0000:3b:0a.5 is in use by driver QEMU, domain instance-000024c9
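When triaging many failed instances, the conflicting device and the domain holding it can be pulled out of this message programmatically. The following is a minimal sketch: `parse_pci_in_use` is a hypothetical helper, and the regular expression is based only on the single message shown above, so other libvirt versions may word the error differently.

```python
import re

# Pattern derived from the one error message quoted in this report.
PCI_IN_USE_RE = re.compile(
    r"PCI device (?P<address>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"is in use by driver (?P<driver>\w+), domain (?P<domain>[\w-]+)"
)


def parse_pci_in_use(message):
    """Return a dict with address, driver and domain, or None if no match."""
    match = PCI_IN_USE_RE.search(message)
    return match.groupdict() if match else None


fault = (
    "Requested operation is not valid: PCI device 0000:3b:0a.5 is in use "
    "by driver QEMU, domain instance-000024c9"
)
info = parse_pci_in_use(fault)
print(info)
```

The `domain` field identifies the instance that already owns the device on the destination compute, which is the instance to inspect when confirming the first failure condition.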
MOS version: MOS 9.2+ code from stable branch from September 2017
Steps to reproduce: attached as a file - all the executed commands are included.
The issue is always reproducible.
Changed in fuel: | |
milestone: | none → 9.x-updates |
assignee: | nobody → MOS Maintenance (mos-maintenance) |
importance: | Undecided → High |
Changed in fuel: | |
assignee: | MOS Maintenance (mos-maintenance) → Anton Rodionov (arodionov) |
Changed in fuel: | |
milestone: | 9.x-updates → 9.2-mu-5 |
status: | New → Confirmed |
Changed in fuel: | |
status: | Confirmed → Fix Committed |
OK, so what does the reboot have to do with this? Looking at the logs, evacuation fails prior to the reboot, right?