CPU hot unplug fails after migrating a CPU hotplugged guest from source
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
qemu (Ubuntu) | Fix Released | Medium | Christian Ehrhardt |
Bug Description
== Comment: #0 - Balamuruhan S <email address hidden> - 2017-02-28 03:21:57 ==
---Problem Description---
CPU hot unplug fails after migrating a CPU hotplugged guest from source
Perform CPU hotplug on the source before migration:
# virsh setvcpus avocado-
The CPUs hotplugged on the source appear in the guest XML and are visible from inside the guest.
# virsh -c 'qemu:///system' migrate --live --domain avocado-
Migration succeeds without any issue.
# virsh -c 'qemu+ssh:
error: operation failed: vcpu unplug request timed out
---uname output---
# uname -a
Linux c158f2u09os 4.10.0-9-generic #11-Ubuntu SMP Mon Feb 20 13:45:11 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
# qemu-img --version
qemu-img version 2.8.0(Debian 1:2.8+dfsg-
Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers
# dpkg -l | grep libvirt
ii libvirt-bin 2.5.0-3ubuntu2 ppc64el programs for the libvirt library
ii libvirt-clients 2.5.0-3ubuntu2 ppc64el Programs for the libvirt library
ii libvirt-daemon 2.5.0-3ubuntu2 ppc64el Virtualization daemon
ii libvirt-
ii libvirt-dev:ppc64el 2.5.0-3ubuntu2 ppc64el development files for the libvirt library
ii libvirt-
ii libvirt0:ppc64el 2.5.0-3ubuntu2 ppc64el library for interfacing with different virtualization systems
ii python-libvirt 3.0.0-2 ppc64el libvirt Python bindings
Machine Type = Tuleta
---Steps to Reproduce---
1. Created a guest with shared storage on NFS.
2. Enabled ports 49152:49216 in iptables and set virt_use_nfs to on.
3. Mounted the image location on the destination and started migration.
4. Performed CPU hotplug on the guest at the source before migration.
5. Performed live migration to the other host.
6. CPU hot unplug on the destination fails with "error: operation failed: vcpu unplug request timed out".
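The command sequence behind steps 4 to 6 can be sketched as below. This is only an illustration: the domain name, destination host, vCPU counts, and the exact option set (`--live`, `--desturi`) are assumptions, because the commands in the report are truncated. The helper `repro_commands` is hypothetical and only builds the command lines; it does not run them.

```python
import shlex


def repro_commands(domain, dest_host, plugged, unplugged):
    """Build the virsh command lines for steps 4-6, in order.

    All argument values are placeholders supplied by the caller; nothing
    here is taken from the (truncated) commands in the report.
    """
    dest_uri = f"qemu+ssh://{dest_host}/system"
    return [
        # Step 4: raise the live vCPU count on the source (hotplug).
        ["virsh", "setvcpus", domain, str(plugged), "--live"],
        # Step 5: live-migrate the guest to the destination host.
        ["virsh", "-c", "qemu:///system", "migrate", "--live",
         "--domain", domain, "--desturi", dest_uri],
        # Step 6: lower the live vCPU count on the destination (hot unplug).
        ["virsh", "-c", dest_uri, "setvcpus", domain, str(unplugged), "--live"],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in repro_commands("example-guest", "dest.example.com", 4, 2):
        print(shlex.join(cmd))
```

With the bug present, the last command is the one expected to end in the "vcpu unplug request timed out" error.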
Contact Information = Balamuruhan S / <email address hidden>
Userspace tool common name: virsh (libvirt)
The userspace tool has the following bit modes: ppc64le
Userspace rpm: libvirt-bin, libvirt-daemon
Userspace tool obtained from project website: na
*Additional Instructions for Balamuruhan S / <email address hidden>:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach ltrace and strace of userspace application.
== Comment: #4 - Balamuruhan S <email address hidden> - 2017-03-01 03:17:07 ==
== Comment: #5 - Balamuruhan S <email address hidden> - 2017-03-01 03:17:38 ==
== Comment: #9 - Shivaprasad G. Bhat <email address hidden> - 2017-03-24 05:26:57 ==
On the new Ubuntu kernel 4.10.0.13, which has in-kernel hotplug/unplug code, a core hotplugged after migration can be unplugged. Cores hotplugged before the migration cannot be unplugged after migration.
Discussed with Bharata and he believes the issue belongs to qemu.
== Comment: #11 - BHARATA BHASKER RAO <email address hidden> - 2017-03-27 01:49:59 ==
Usually, when a device is hotplugged at the source and the guest is then migrated, the device that was hot added at the source is appended to the QEMU command line at the target. For example,
At the source:
qemu ... -smp 4,maxcpus=8
(qemu) device_add host-spapr-
At the target, QEMU is started like this before starting the migration:
qemu ... -smp 4,maxcpus=8 -device host-spapr-
Thus the hot added CPU at the source became a cold-plugged CPU at the target. This works.
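A minimal sketch of that convention, assuming the target command line is simply the source command line plus one `-device` option per hot-added device. The function name `target_cmdline` and the device string used in the example are placeholders, not the real device name, which is truncated in this report.

```python
def target_cmdline(source_args, hot_added_devices):
    """Return the target's QEMU arguments: the source arguments plus one
    '-device' option for every device that was hot added at the source."""
    args = list(source_args)  # copy so the caller's list is not modified
    for device in hot_added_devices:
        args += ["-device", device]
    return args


if __name__ == "__main__":
    source = ["qemu", "-smp", "4,maxcpus=8"]
    # "EXAMPLE-CPU-DEVICE" is a hypothetical placeholder device string.
    print(" ".join(target_cmdline(source, ["EXAMPLE-CPU-DEVICE"])))
```

Because the device is present from startup on the target, the target treats it as cold-plugged, which is the case described as working.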
What is done differently here is that libvirt is in fact doing a hotplug at the target via QEMU monitor before the migration. In this situation when the guest is migrated from source to target, the DRC state information for the added CPU at the target will be wrong. The hot added CPU at the target will never undergo the DRC state transitions via RTAS set allocation calls. Hence subsequent hot unplug fails at the target.
This situation can be fixed by migrating the DRC state information and updating the same for the hot added CPU at the target. This is what Jianjun Duan's DRC state migration patchset achieves. I have verified (using his old patchset, v5) that this problem disappears when DRC state migration is done.
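The effect of migrating the DRC state can be illustrated with a deliberately simplified toy model. Everything in it is an assumption made for illustration: `ToyDrc`, its single boolean flag, and the rule that unplug only completes when that flag is set are not QEMU's actual data structures or logic, and the sketch is not Jianjun Duan's patchset. It only mirrors the reasoning in this comment: the CPU hot added at the target never goes through the guest-driven transitions there, so its state is wrong unless it is carried over from the source.

```python
from dataclasses import dataclass


@dataclass
class ToyDrc:
    """Toy stand-in for one CPU connector's state (not QEMU's real structure)."""
    configured_by_guest: bool = False  # True once the guest ran its plug-time transitions

    def guest_configures(self):
        # Models the guest driving the state transitions after a hotplug.
        self.configured_by_guest = True

    def unplug(self):
        # Toy rule (an assumption): unplug completes only if the recorded
        # state shows that the guest configured this connector.
        return "unplugged" if self.configured_by_guest else "timed out"


def migrate(source, transfer_drc_state):
    """Return the target-side connector after migration.

    The target connector starts fresh because the CPU was hot added there
    through the monitor and the guest never ran the transitions on it.
    """
    target = ToyDrc()
    if transfer_drc_state:
        target.configured_by_guest = source.configured_by_guest
    return target


if __name__ == "__main__":
    source = ToyDrc()
    source.guest_configures()  # hotplug completed on the source
    print(migrate(source, transfer_drc_state=False).unplug())
    print(migrate(source, transfer_drc_state=True).unplug())
```

In this toy model the unplug attempt only succeeds on the target when the state recorded at the source is transferred along with the guest.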
So the fix for this is to get Jianjun's DRC migration patchset upstream and then into Ubuntu 17.04 (1704).
Changed in qemu (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → ChristianEhrhardt (paelzer)
tags: added: severity-high removed: severity-critical
tags: added: targetmilestone-inin1710 removed: targetmilestone-inin1704
tags: added: virt-fixed-by-2.10