Ubuntu 17.04: Guest does not reflect all the cpus hotplugged

Bug #1670315 reported by bugproxy
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Tim Gardner
Zesty            Fix Released   Undecided    Tim Gardner

Bug Description

== Comment: #0 - Satheesh Rajendran <email address hidden> - 2017-02-28 06:00:53 ==
---Problem Description---
Guest does not reflect all the hotplugged CPUs.
Hotplugging vcpus with setvcpus from a small initial count (1) to a larger count (~256) returns no error
from libvirt (setvcpus), but the guest does not show all of the CPUs.

Contact Information = <email address hidden>

---uname output---
Linux ltc-test-ci1 4.10.0-9-generic #11-Ubuntu SMP Mon Feb 20 13:45:11 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = power 8 ppc64le

---Debugger---
A debugger is not configured

---Steps to Reproduce---
 1. Start the guest (Ubuntu 17.04) with 1 current vcpu and a maximum of 255 vcpus
...
 <vcpu placement='static' current='1'>255</vcpu>
...
 <cpu>
    <topology sockets='1' cores='255' threads='1'/>
  </cpu>
....

# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Model: 2.1 (pvr 004b 0201)
Model name: POWER8E (raw), altivec supported
Hypervisor vendor: KVM
Virtualization type: para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0
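
The configuration in step 1 can be sanity-checked before booting the guest. The following is a minimal sketch, assuming the two fragments quoted above sit directly under a `<domain>` root element; the surrounding XML and the helper name `vcpu_settings` are hypothetical and only there to make the example self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal domain XML; only the <vcpu> and <topology>
# elements are taken from the report, the wrapper is an assumption.
DOMAIN_XML = """
<domain type='kvm'>
  <vcpu placement='static' current='1'>255</vcpu>
  <cpu>
    <topology sockets='1' cores='255' threads='1'/>
  </cpu>
</domain>
"""


def vcpu_settings(domain_xml):
    """Return (current, maximum, topology_total) from a domain XML string."""
    root = ET.fromstring(domain_xml)
    vcpu = root.find("vcpu")
    maximum = int(vcpu.text)
    # If 'current' is absent, assume every vcpu is enabled at boot.
    current = int(vcpu.get("current", maximum))
    topo = root.find("cpu/topology")
    total = (
        int(topo.get("sockets"))
        * int(topo.get("cores"))
        * int(topo.get("threads"))
    )
    return current, maximum, total


current, maximum, total = vcpu_settings(DOMAIN_XML)
print(current, maximum, total)
```

With the values from step 1, the guest boots with 1 vcpu, and the topology product (1 x 255 x 1) equals the maximum of 255, so up to 254 further vcpus can be hotplugged.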

2. # time virsh setvcpus virt-tests-vm1 255 --live --config

real 0m4.460s
user 0m0.012s
sys 0m0.000s
root@ltc-test-ci1:/var/lib/libvirt/images/workspace/runAvocadoFVTTest/avocado-fvt-wrapper# echo $?
0

3. Check inside the guest after some time (10-15 mins).
dmesg in the guest shows all 255 RTAS events, but the guest shows only about 90 vcpus (consistently around 100; 97 in the run below).

root@ubuntu:~# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 97
On-line CPU(s) list: 0-96
Thread(s) per core: 1
Core(s) per socket: 97
Socket(s): 1
NUMA node(s): 1
Model: 2.1 (pvr 004b 0201)
Model name: POWER8E (raw), altivec supported
Hypervisor vendor: KVM
Virtualization type: para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-96
root@ubuntu:~# tail /proc/cpuinfo

processor : 96
cpu : POWER8E (raw), altivec supported
clock : 3425.000000MHz
revision : 2.1 (pvr 004b 0201)

timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
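
The check in step 3 can be automated by reading the `CPU(s):` line of the lscpu output and comparing it with the requested count. This is only a sketch: the function names are hypothetical, and the sample text is a shortened copy of the output above.

```python
def lscpu_cpu_count(lscpu_text):
    """Return the integer from the 'CPU(s):' line of lscpu output."""
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the exact key so 'On-line CPU(s) list' and
        # 'NUMA node0 CPU(s)' are skipped.
        if sep and key.strip() == "CPU(s)":
            return int(value.strip())
    raise ValueError("no 'CPU(s):' line found")


def missing_vcpus(lscpu_text, expected):
    """Return how many of the expected vcpus the guest does not show."""
    return expected - lscpu_cpu_count(lscpu_text)


# Shortened copy of the guest output from step 3.
SAMPLE = """Architecture: ppc64le
CPU(s): 97
On-line CPU(s) list: 0-96
NUMA node0 CPU(s): 0-96
"""

print(missing_vcpus(SAMPLE, 255))
```

For the run above this reports 158 vcpus that were requested through setvcpus but never appeared in the guest.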

Userspace tool common name: libvirt, qemu

The userspace tool has the following bit modes: both

Userspace rpm: qemu-kvm 1:2.8+dfsg-2ubuntu1 ppc64el,ii libvirt-bin 2.5.0-3ubuntu2 ppc64el

Userspace tool obtained from project website: na

Guest Details:
#cat /etc/os-release |grep VERSION=
VERSION="17.04 (Zesty Zapus)"
# uname -a
Linux ubuntu 4.10.0-8-generic #10-Ubuntu SMP Mon Feb 13 14:00:06 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
root@ubuntu:~# dpkg -l |grep rtas
ii librtas-dev 2.0.0-2 ppc64el userspace RTAS library development files
ii librtas2 2.0.0-2 ppc64el userspace RTAS library
ii librtasevent-dev 2.0.0-2 ppc64el RTAS events library development files
ii librtasevent2 2.0.0-2 ppc64el RTAS events library
ii ppc64-diag 2.7.1-6 ppc64el Platform error log analysis tool and rtas_errd daemon

*Additional Instructions for <email address hidden>:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach ltrace and strace of userspace application.

== Comment: #9 - BHARATA BHASKER RAO <email address hidden> - 2017-03-03 04:32:14 ==
When a large number of hotplug requests are generated too quickly, the guest misses some RTAS events because of a buffer overrun, and so does not see all of the hotplugged CPUs. This was raised earlier in bz 142499 with the following resolution:

- We need in-kernel CPU hotplug feature in the guest for this to work.
- Until in-kernel CPU hotplug is available in the guest kernel, user should be careful not to overload the guest with so many successive hotplug requests.
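
Until the in-kernel mechanism is available, one way to follow the second point is to raise the vcpu count in small batches instead of a single jump from 1 to 255. The sketch below is an illustration only: the batch size, the pause, and the function names are assumptions and were not tested or recommended anywhere in this bug; the only command it uses is `virsh setvcpus <domain> <count> --live`, as in the reproduction steps.

```python
import subprocess
import time


def batched_targets(current, target, step):
    """Yield intermediate vcpu counts from current up to target."""
    if step < 1:
        raise ValueError("step must be at least 1")
    count = current
    while count < target:
        count = min(count + step, target)
        yield count


def hotplug_gradually(domain, current, target, step=8, pause_s=2.0,
                      run=subprocess.run):
    """Issue one 'virsh setvcpus' per batch, pausing between batches.

    step and pause_s are arbitrary illustrative values; a suitable
    pace depends on how fast the guest drains its hotplug events.
    """
    for count in batched_targets(current, target, step):
        run(["virsh", "setvcpus", domain, str(count), "--live"], check=True)
        time.sleep(pause_s)


print(list(batched_targets(1, 255, 64)))
```

A call such as `hotplug_gradually("virt-tests-vm1", 1, 255)` would then issue 32 smaller requests rather than one large one. This only reduces the burst rate; it does not remove the underlying limitation.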

I reproduced the problem with ubuntu-1704 guest (with default kernel) and was able to get over the problem with a self-compiled guest kernel from latest linux git that has in-kernel CPU hotplug.

bharata@ubuntu-1704:~$ uname -a
Linux ubuntu-1704 4.10.0+ #1 SMP Fri Mar 3 15:42:46 IST 2017 ppc64le ppc64le ppc64le GNU/Linux

bharata@ubuntu-1704:~$ lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 255
On-line CPU(s) list: 0-254
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 255
NUMA node(s): 1
Model: 2.0 (pvr 004d 0200)
Model name: POWER8 (raw), altivec supported
Hypervisor vendor: KVM
Virtualization type: para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-254

== Comment: #10 - BHARATA BHASKER RAO <email address hidden> - 2017-03-06 02:28:42 ==

commit 3dbbaf200f532e01e56168b8339f2981f2cb1d67
Author: Michael Roth <email address hidden>
Date: Mon Feb 20 19:12:18 2017 -0600

    powerpc/pseries: Advertise Hot Plug Event support to firmware

    With the inclusion of commit 333f7b76865b ("powerpc/pseries: Implement
    indexed-count hotplug memory add") and commit 753843471cbb
    ("powerpc/pseries: Implement indexed-count hotplug memory remove"), we
    now have complete handling of the RTAS hotplug event format as described
    by PAPR via ACR "PAPR Changes for Hotplug RTAS Events".

    This capability is indicated by byte 6, bit 2 (5 in IBM numbering) of
    architecture option vector 5, and allows for greater control over
    cpu/memory/pci hot plug/unplug operations.

    Existing pseries kernels will utilize this capability based on the
    existence of the /event-sources/hot-plug-events DT property, so we
    only need to advertise it via CAS and do not need a corresponding
    FW_FEATURE_* value to test for.

    Signed-off-by: Michael Roth <email address hidden>
    Signed-off-by: Michael Ellerman <email address hidden>

$ git tag --contains 3dbbaf200
v4.11-rc1
$

Commit 3dbbaf200 is available upstream from kernel v4.11-rc1 onwards and is
therefore missing from the 17.04 kernel. This commit is needed for memory unplug support.
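
The commit message gives the capability bit in two numbering schemes: bit 2 counting from the least significant bit, or bit 5 in IBM numbering, where bit 0 is the most significant. A small sketch of that conversion follows. The helper names are hypothetical, and treating "byte 6" as a zero-based index into a plain byte string is an assumption made for illustration, not a statement about the kernel's data layout.

```python
def ibm_bit_mask(ibm_bit, width=8):
    """Mask for a bit given in IBM numbering (bit 0 = most significant)."""
    if not 0 <= ibm_bit < width:
        raise ValueError("bit out of range")
    return 1 << (width - 1 - ibm_bit)


def hot_plug_events_advertised(vector5, byte_index=6, ibm_bit=5):
    """True if the chosen byte of the vector has the chosen bit set.

    byte_index is assumed to be zero-based into 'vector5'.
    """
    return bool(vector5[byte_index] & ibm_bit_mask(ibm_bit))


# Hypothetical vector contents, for illustration only.
vec = bytearray(8)
vec[6] |= ibm_bit_mask(5)
print(hex(ibm_bit_mask(5)), hot_plug_events_advertised(vec))
```

IBM bit 5 of an 8-bit byte corresponds to `1 << 2`, that is mask 0x04, which matches the "bit 2 (5 in IBM numbering)" wording in the commit message.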

Revision history for this message
bugproxy (bugproxy) wrote : sosreport_host

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-152070 severity-critical targetmilestone-inin1704
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-03-06 07:13 EDT-------
commit 3dbbaf200f532e01e56168b8339f2981f2cb1d67
Author: Michael Roth <email address hidden>
Date: Mon Feb 20 19:12:18 2017 -0600

powerpc/pseries: Advertise Hot Plug Event support to firmware

With the inclusion of commit 333f7b76865b ("powerpc/pseries: Implement
indexed-count hotplug memory add") and commit 753843471cbb
("powerpc/pseries: Implement indexed-count hotplug memory remove"), we
now have complete handling of the RTAS hotplug event format as described
by PAPR via ACR "PAPR Changes for Hotplug RTAS Events".

Hello Canonical,

Please include the above commit in the Ubuntu 17.04 and Ubuntu 16.04 LTS releases.

Ken Sharp (kennybobs)
tags: added: zesty
Revision history for this message
Tim Gardner (timg-tpi) wrote :

Needed some prerequisites:

powerpc/64: Don't try to use radix MMU under a hypervisor
powerpc/pseries: Fixes for the "ibm,architecture-vec-5" options
powerpc/64: Enable use of radix MMU under hypervisor on POWER9
powerpc/pseries: Advertise HPT resizing support via CAS
powerpc/pseries: Advertise Hot Plug Event support to firmware

Changed in linux (Ubuntu Zesty):
assignee: Taco Screen team (taco-screen-team) → Tim Gardner (timg-tpi)
status: New → Fix Committed
Revision history for this message
Michael Hohnbaum (hohnbaum) wrote : Re: [Bug 1670315] [NEW] Ubuntu 17.04: Guest does not reflect all the cpus hotplugged

Leann,

Kernel patch referenced to fix this issue. Please have the Kernel team look.

Thanks.

                     Michael

On 03/06/2017 02:51 AM, Launchpad Bug Tracker wrote:
> bugproxy (bugproxy) has assigned this bug to you for Ubuntu:
> ...

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-03-06 15:21 EDT-------
(In reply to comment #10)
> > Next, I will look at ubuntu-1704 kernel to figure out which commits are
> > required for this to work.
>
> commit 3dbbaf200f532e01e56168b8339f2981f2cb1d67
> Author: Michael Roth <email address hidden>
> Date: Mon Feb 20 19:12:18 2017 -0600
>
> powerpc/pseries: Advertise Hot Plug Event support to firmware
>
> With the inclusion of commit 333f7b76865b ("powerpc/pseries: Implement
> indexed-count hotplug memory add") and commit 753843471cbb
> ("powerpc/pseries: Implement indexed-count hotplug memory remove"), we
> now have complete handling of the RTAS hotplug event format as described
> by PAPR via ACR "PAPR Changes for Hotplug RTAS Events".
>
> This capability is indicated by byte 6, bit 2 (5 in IBM numbering) of
> architecture option vector 5, and allows for greater control over
> cpu/memory/pci hot plug/unplug operations.
>
> Existing pseries kernels will utilize this capability based on the
> existence of the /event-sources/hot-plug-events DT property, so we
> only need to advertise it via CAS and do not need a corresponding
> FW_FEATURE_* value to test for.
>
> Signed-off-by: Michael Roth <email address hidden>
> Signed-off-by: Michael Ellerman <email address hidden>
>
> The above commit is missing from 1704 kernel. This commit is needed to fix
> the issue seen in this bugzilla. More importantly, this commit is needed for
> memory unplug support.
>
> Vipin - Can you follow up with Ubuntu and get this patch included into 1704
> kernel ?

Because QEMU will also use the flag set by the above patch as an indicator for memory unplug support, as well as count+indexed-based memory hotplug support, I think we'd also want to include the patches required for those memory hotplug operations as well. Otherwise memory hotplug/unplug may not function correctly with the patch applied.

The following list of commits should do it (from most recent to oldest commit):

3dbbaf2 powerpc/pseries: Advertise Hot Plug Event support to firmware
943db62 powerpc/pseries: Revert 'Auto-online hotplugged memory'
7538434 powerpc/pseries: Implement indexed-count hotplug memory remove
333f7b7 powerpc/pseries: Implement indexed-count hotplug memory add
673bc43 powerpc/pseries: Report DLPAR capabilities

The "Report DLPAR capabilities" one is optional for KVM, but may be needed in order for PowerVM guests to use the in-kernel mechanisms for cpu/memory.
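
Because the list above runs from most recent to oldest, a backport would normally apply it in reverse. The sketch below only derives that order and prints plain `git cherry-pick` commands; it assumes the abbreviated hashes resolve in the tree being patched and says nothing about conflicts or the additional prerequisites a given kernel may need.

```python
# Commits as listed above, most recent first.
NEWEST_FIRST = [
    ("3dbbaf2", "powerpc/pseries: Advertise Hot Plug Event support to firmware"),
    ("943db62", "powerpc/pseries: Revert 'Auto-online hotplugged memory'"),
    ("7538434", "powerpc/pseries: Implement indexed-count hotplug memory remove"),
    ("333f7b7", "powerpc/pseries: Implement indexed-count hotplug memory add"),
    ("673bc43", "powerpc/pseries: Report DLPAR capabilities"),
]


def cherry_pick_commands(newest_first):
    """Return 'git cherry-pick' commands in oldest-first order."""
    return ["git cherry-pick " + sha for sha, _subject in reversed(newest_first)]


for command in cherry_pick_commands(NEWEST_FIRST):
    print(command)
```

The first command picks 673bc43 and the last picks 3dbbaf2, so the capability is advertised only after the memory hotplug handlers it depends on are in place.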

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-13.15

---------------
linux (4.10.0-13.15) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1671614

  * ehci-platform needed in usb-modules udeb (LP: #1671589)
    - d-i: add ehci-platform to usb-modules

  * irqchip/gic-v3-its: Enable cacheable attribute Read-allocate hints
    (LP: #1671598)
    - irqchip/gic-v3-its: Enable cacheable attribute Read-allocate hints

  * iommu: Fix static checker warning in iommu_insert_device_resv_regions
    (LP: #1671599)
    - iommu: Fix static checker warning in iommu_insert_device_resv_regions

  * QDF2400: Fix panic introduced by erratum 1003 (LP: #1671602)
    - arm64: Avoid clobbering mm in erratum workaround on QDF2400

  * QDF2400 PCI ports require ACS quirk (LP: #1671601)
    - PCI: Add ACS quirk for Qualcomm QDF2400 and QDF2432

  * tty: pl011: Work around QDF2400 E44 stuck BUSY bit (LP: #1671600)
    - tty: pl011: Work around QDF2400 E44 stuck BUSY bit

  * CVE-2017-2636
    - tty: n_hdlc: get rid of racy n_hdlc.tbuf

  * Sync virtualbox to 5.1.16-dfsg-1 in zesty (LP: #1671470)
    - ubuntu: vbox -- Update to 5.1.16-dfsg-1

 -- Tim Gardner <email address hidden> Thu, 09 Mar 2017 06:16:24 -0700

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-04-12 00:42 EDT-------
Tested on guest kernel version 4.10.0-19-generic and the issue is fixed, though it takes a few seconds for all the vcpus to show up in the guest; that would be a separate item.

#uname -a
Linux ubuntu 4.10.0-19-generic #21-Ubuntu SMP Thu Apr 6 17:03:05 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

# virsh vcpucount virt-tests-vm1 --guest
1

# time virsh setvcpus virt-tests-vm1 255 --live

real 0m10.541s
user 0m0.013s
sys 0m0.011s

# virsh vcpucount virt-tests-vm1 --guest
226

# virsh vcpucount virt-tests-vm1 --guest
255   <-- OK
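
Since the guest count rises over a few seconds (1, then 226, then 255 in the run above), a verification script should poll rather than check once. A minimal sketch follows, assuming `virsh vcpucount <domain> --guest` prints a bare integer as it does above; the function names, timeout, and polling interval are hypothetical.

```python
import subprocess
import time


def read_guest_vcpus(domain):
    """Return the count printed by 'virsh vcpucount <domain> --guest'."""
    out = subprocess.run(
        ["virsh", "vcpucount", domain, "--guest"],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out.strip())


def wait_for_vcpus(read_count, target, timeout_s=60.0, interval_s=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll read_count() until it reaches target; return observed counts."""
    deadline = clock() + timeout_s
    seen = []
    while True:
        count = read_count()
        seen.append(count)
        if count >= target:
            return seen
        if clock() >= deadline:
            raise TimeoutError(f"only {count} of {target} vcpus online")
        sleep(interval_s)
```

On the host this could be called as `wait_for_vcpus(lambda: read_guest_vcpus("virt-tests-vm1"), 255)`. The reader is passed in as a callable so the polling logic can be exercised without a running guest.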
