[regression][powerpc] some vcpus are found offline inside guest with different vsmt setting from qemu-cmdline and breaks subsequent vcpu hotplug operation (xive)

Bug #1900241 reported by Satheesh Rajendran
This bug affects 2 people
Affects: QEMU
Status: Fix Released
Importance: Undecided
Assigned to: Gustavo Romero
Milestone: (none)

Bug Description

Env:
Host: Power9 HW ppc64le

# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 24-31,40-159
Thread(s) per core: 4
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Model: 2.3 (pvr 004e 1203)
Model name: POWER9, altivec supported
Frequency boost: enabled
CPU max MHz: 3800.0000
CPU min MHz: 2300.0000
L1d cache: 1 MiB
L1i cache: 1 MiB
L2 cache: 8 MiB
L3 cache: 160 MiB
NUMA node0 CPU(s): 24-31,40-79
NUMA node8 CPU(s): 80-159
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; RFI Flush, L1D private per thread
Vulnerability Mds: Not affected
Vulnerability Meltdown: Mitigation; RFI Flush, L1D private per thread
Vulnerability Spec store bypass: Mitigation; Kernel entry/exit barrier (eieio)
Vulnerability Spectre v1: Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
Vulnerability Spectre v2: Mitigation; Software count cache flush (hardware accelerated), Software link stack flush
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Host Kernel: 5.9.0-0.rc8.28.fc34.ppc64le (Fedora rawhide)
Guest Kernel: Fedora33(5.8.6-301.fc33.ppc64le)

Qemu: e12ce85b2c79d83a340953291912875c30b3af06 (qemu/master)

Steps to reproduce:

Boot the KVM guest below (-M pseries,vsmt=2 -smp 8,cores=8,threads=1):

 /home/sath/qemu/build/qemu-system-ppc64 -name vm1 -M pseries,vsmt=2 -accel kvm -m 4096 -smp 8,cores=8,threads=1 -nographic -nodefaults -serial mon:stdio -vga none -nographic -device virtio-scsi-pci -drive file=/home/sath/tests/data/avocado-vt/images/fdevel-ppc64le.qcow2,if=none,id=hd0,format=qcow2,cache=none -device scsi-hd,drive=hd0
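For readers unfamiliar with vsmt, here is a minimal sketch of how the vCPU IDs end up sparse with this command line. It assumes the pseries machine spaces cores by vsmt when assigning vCPU IDs, which is my understanding of QEMU's behaviour and is not stated in this report. The function name `vcpu_id` is made up for illustration. This only shows the ID layout and is not a statement about the root cause.

```python
def vcpu_id(cpu_index, threads, vsmt):
    """Hypothetical helper: vCPU ID for a given CPU index.

    Assumption (not from the report): each core starts at a multiple of
    vsmt, and its threads take consecutive IDs from there.
    """
    core = cpu_index // threads
    thread = cpu_index % threads
    return core * vsmt + thread


if __name__ == "__main__":
    # -M pseries,vsmt=2 -smp 8,cores=8,threads=1
    print([vcpu_id(i, threads=1, vsmt=2) for i in range(8)])
    # Same topology with vsmt equal to threads: IDs are contiguous.
    print([vcpu_id(i, threads=1, vsmt=1) for i in range(8)])
```

Under that assumption, the eight vCPUs of this guest get the even IDs 0 through 14 rather than 0 through 7.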

lscpu inside guest:
Actual:
[root@atest-guest ~]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0,2,4,6
Off-line CPU(s) list: 1,3,5,7 --------------------------NOK
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Model: 2.3 (pvr 004e 1203)
Model name: POWER9 (architected), altivec supported
Hypervisor vendor: KVM
Virtualization type: para
L1d cache: 128 KiB
L1i cache: 128 KiB
NUMA node0 CPU(s): 0,2,4,6
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; RFI Flush, L1D private per thread
Vulnerability Mds: Not affected
Vulnerability Meltdown: Mitigation; RFI Flush, L1D private per thread
Vulnerability Spec store bypass: Mitigation; Kernel entry/exit barrier (eieio)
Vulnerability Spectre v1: Mitigation; __user pointer sanitization, ori31
                                 speculation barrier enabled
Vulnerability Spectre v2: Mitigation; Software count cache flush (hardwar
                                 e accelerated), Software link stack flush
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Expected:

[root@atest-guest ~]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Model: 2.3 (pvr 004e 1203)
Model name: POWER9 (architected), altivec supported
Hypervisor vendor: KVM
Virtualization type: para
L1d cache: 256 KiB
L1i cache: 256 KiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; RFI Flush, L1D private per thread
Vulnerability Mds: Not affected
Vulnerability Meltdown: Mitigation; RFI Flush, L1D private per thread
Vulnerability Spec store bypass: Mitigation; Kernel entry/exit barrier (eieio)
Vulnerability Spectre v1: Mitigation; __user pointer sanitization, ori31
                                 speculation barrier enabled
Vulnerability Spectre v2: Mitigation; Software count cache flush (hardwar
                                 e accelerated), Software link stack flush
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

As a result, subsequent vCPU hotplug operations fail.
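The Actual versus Expected comparison above can be automated when re-testing. The sketch below expands the "On-line CPU(s) list" string printed by lscpu and reports which CPU numbers are missing. The helper names `parse_cpu_list` and `offline_cpus` are made up for illustration, and the input strings are copied from the lscpu output in this report.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def offline_cpus(online_text, total):
    """Return the CPU numbers in 0..total-1 missing from the on-line list."""
    online = set(parse_cpu_list(online_text))
    return [cpu for cpu in range(total) if cpu not in online]


if __name__ == "__main__":
    print(offline_cpus("0,2,4,6", 8))  # failing guest from this report
    print(offline_cpus("0-7", 8))      # expected guest
```

With the failing guest's list this reports CPUs 1, 3, 5 and 7 as offline. With the expected list it reports none.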

Tags: kvm powerpc xive
Satheesh Rajendran (sathnaga) wrote :

I ran a git bisect; the first bad commit is:

acbdb9956fe93f4669141f103cb543d3025775db is the first bad commit
commit acbdb9956fe93f4669141f103cb543d3025775db
Author: Cédric Le Goater <email address hidden>
Date: Thu Aug 20 15:45:46 2020 +0200

    spapr/xive: Allocate IPIs independently from the other sources

    The vCPU IPIs are now allocated in kvmppc_xive_cpu_connect() when the
    vCPU connects to the KVM device and not when all the sources are reset
    in kvmppc_xive_source_reset()

    This requires extra care for hotplug vCPUs and VM restore.

    Signed-off-by: Cédric Le Goater <email address hidden>
    Message-Id: <email address hidden>
    Signed-off-by: David Gibson <email address hidden>

 hw/intc/spapr_xive_kvm.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 5 deletions(-)

# git bisect log
git bisect start
# good: [d0ed6a69d399ae193959225cdeaa9382746c91cc] Update version for v5.1.0 release
git bisect good d0ed6a69d399ae193959225cdeaa9382746c91cc
# bad: [7daf8f8d011cdd5d3e86930ed2bde969425c790c] Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
git bisect bad 7daf8f8d011cdd5d3e86930ed2bde969425c790c
# skip: [7595a65818ea9b49c36650a8c217a1ef9bd6e62a] hw/riscv: Sort the Kconfig options in alphabetical order
git bisect skip 7595a65818ea9b49c36650a8c217a1ef9bd6e62a
# skip: [3b65b742543bc6c2ad35e3b42401a26b48a87f26] target/hppa: Fix boot with old Linux installation CDs
git bisect skip 3b65b742543bc6c2ad35e3b42401a26b48a87f26
# bad: [f4ef8c9cc10b3bee829b9775879d4ff9f77c2442] Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging
git bisect bad f4ef8c9cc10b3bee829b9775879d4ff9f77c2442
# good: [4ee40a6b98c02b72fc5dd262df9d3ac8680d767b] hw/usb: Add U2F device check to passthru mode
git bisect good 4ee40a6b98c02b72fc5dd262df9d3ac8680d767b
# skip: [fe4b0b5bfa96c38ad1cad0689a86cca9f307e353] tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem
git bisect skip fe4b0b5bfa96c38ad1cad0689a86cca9f307e353
# skip: [287b1defeb44398d02669d97ebdc347d650f274d] target/microblaze: Cache mem_index in DisasContext
git bisect skip 287b1defeb44398d02669d97ebdc347d650f274d
# skip: [7a1fb2ef40df508e90eb756a80d67e6435246cae] block/nvme: Extract nvme_poll_queue()
git bisect skip 7a1fb2ef40df508e90eb756a80d67e6435246cae
# good: [536e340f464d7c2ef55cca47c7535d9409bf03c7] target/microblaze: Convert msrclr, msrset to decodetree
git bisect good 536e340f464d7c2ef55cca47c7535d9409bf03c7
# good: [227de21ed0759e275a469394af72c999d0134bb5] Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20200903' into staging
git bisect good 227de21ed0759e275a469394af72c999d0134bb5
# bad: [b95ba83fc56ebfc4b6869f21db0c757c0c191104] Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.2-20200908' into staging
git bisect bad b95ba83fc56ebfc4b6869f21db0c757c0c191104
# good: [789035f1239054331b335801a06bdbef026f02e1] oss-fuzz: fix rpath
git bisect good 789035f1239054331b335801a06bdbef026f02e1
# good: [00942071a0eabeb3ebc3bd594296859587f8f3c8] Merge remote-tracking branch 'remotes/rth/tags/pull-mb-20200907-2' into staging
git bisect good 00942071a0eabeb3...


tags: added: kvm powerpc xive
summary: - some vcpus are found offline inside guest with different vsmt setting
- from qemu-cmdline and breaks subsequent vcpu hotplug operation (xive)
+ [regression][powerpc] some vcpus are found offline inside guest with
+ different vsmt setting from qemu-cmdline and breaks subsequent vcpu
+ hotplug operation (xive)
Gustavo Romero (gromero)
Changed in qemu:
assignee: nobody → Gustavo Romero (gromero)
Greg Kurz (gkurz) wrote :

Fixed by reverting the series that caused the regression.

https://git.qemu.org/?p=qemu.git;a=commit;h=6d24795ee7e3199d199d3c415312c93382ad1807

The optimization needs to be reworked later.

Changed in qemu:
status: New → Fix Committed
Satheesh Rajendran (sathnaga) wrote :

Tested with the latest upstream and found that the issue is fixed:

# git log -1
commit dd3d2340c4076d1735cd0f7cb61f4d8622b9562d (HEAD -> master, tag: v5.2.0-rc3, origin/master, origin/HEAD)
Author: Peter Maydell <email address hidden>
Date: Tue Nov 24 22:13:30 2020 +0000

    Update version for v5.2.0-rc3 release

    Signed-off-by: Peter Maydell <email address hidden>

/home/sath/qemu/build/qemu-system-ppc64 -name vm1 -M pseries,vsmt=2 -accel kvm -m 4096 -smp 8,cores=8,threads=1 -nographic -nodefaults -serial mon:stdio -vga none -nographic -device virtio-scsi-pci -drive file=/home/sath/tests/data/avocado-vt/images/fdevel-ppc64le.qcow2,if=none,id=hd0,format=qcow2,cache=none -device scsi-hd,drive=hd0

Fedora 33 (Thirty Three Prerelease)
Kernel 5.8.13-300.fc33.ppc64le on an ppc64le (hvc0)

atest-guest login: root
Password:
Login incorrect

atest-guest login: root
Password:
Last login: Wed Nov 18 09:03:24 on hvc0
[root@atest-guest ~]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1

Regards,
-Satheesh

Changed in qemu:
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Thomas Huth (th-huth) wrote :

Released with QEMU v5.2.0.

Changed in qemu:
status: Fix Committed → Fix Released