[regression][powerpc] some vcpus are found offline inside guest with different vsmt setting from qemu-cmdline and breaks subsequent vcpu hotplug operation (xive)

Bug #1900241 reported by Satheesh Rajendran
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Gustavo Romero

Bug Description

Env:
Host: Power9 HW ppc64le

# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 24-31,40-159
Thread(s) per core: 4
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Model: 2.3 (pvr 004e 1203)
Model name: POWER9, altivec supported
Frequency boost: enabled
CPU max MHz: 3800.0000
CPU min MHz: 2300.0000
L1d cache: 1 MiB
L1i cache: 1 MiB
L2 cache: 8 MiB
L3 cache: 160 MiB
NUMA node0 CPU(s): 24-31,40-79
NUMA node8 CPU(s): 80-159
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; RFI Flush, L1D private per thread
Vulnerability Mds: Not affected
Vulnerability Meltdown: Mitigation; RFI Flush, L1D private per thread
Vulnerability Spec store bypass: Mitigation; Kernel entry/exit barrier (eieio)
Vulnerability Spectre v1: Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
Vulnerability Spectre v2: Mitigation; Software count cache flush (hardware accelerated), Software link stack flush
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Host Kernel: 5.9.0-0.rc8.28.fc34.ppc64le (Fedora rawhide)
Guest Kernel: Fedora33(5.8.6-301.fc33.ppc64le)

Qemu: e12ce85b2c79d83a340953291912875c30b3af06 (qemu/master)

Steps to reproduce:

Boot below kvm guest: (-M pseries,vsmt=2 -smp 8,cores=8,threads=1)

 /home/sath/qemu/build/qemu-system-ppc64 -name vm1 -M pseries,vsmt=2 -accel kvm -m 4096 -smp 8,cores=8,threads=1 -nographic -nodefaults -serial mon:stdio -vga none -nographic -device virtio-scsi-pci -drive file=/home/sath/tests/data/avocado-vt/images/fdevel-ppc64le.qcow2,if=none,id=hd0,format=qcow2,cache=none -device scsi-hd,drive=hd0

lscpu inside guest:
Actual:
[root@atest-guest ~]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0,2,4,6
Off-line CPU(s) list: 1,3,5,7 --------------------------NOK
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Model: 2.3 (pvr 004e 1203)
Model name: POWER9 (architected), altivec supported
Hypervisor vendor: KVM
Virtualization type: para
L1d cache: 128 KiB
L1i cache: 128 KiB
NUMA node0 CPU(s): 0,2,4,6
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; RFI Flush, L1D private per thread
Vulnerability Mds: Not affected
Vulnerability Meltdown: Mitigation; RFI Flush, L1D private per thread
Vulnerability Spec store bypass: Mitigation; Kernel entry/exit barrier (eieio)
Vulnerability Spectre v1: Mitigation; __user pointer sanitization, ori31
                                 speculation barrier enabled
Vulnerability Spectre v2: Mitigation; Software count cache flush (hardwar
                                 e accelerated), Software link stack flush
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Expected:

[root@atest-guest ~]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Model: 2.3 (pvr 004e 1203)
Model name: POWER9 (architected), altivec supported
Hypervisor vendor: KVM
Virtualization type: para
L1d cache: 256 KiB
L1i cache: 256 KiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; RFI Flush, L1D private per thread
Vulnerability Mds: Not affected
Vulnerability Meltdown: Mitigation; RFI Flush, L1D private per thread
Vulnerability Spec store bypass: Mitigation; Kernel entry/exit barrier (eieio)
Vulnerability Spectre v1: Mitigation; __user pointer sanitization, ori31
                                 speculation barrier enabled
Vulnerability Spectre v2: Mitigation; Software count cache flush (hardwar
                                 e accelerated), Software link stack flush
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

There by further vcpuhotplug operation fails...

Tags: kvm powerpc xive
Revision history for this message
Satheesh Rajendran (sathnaga) wrote :
Download full text (4.1 KiB)

Did a git bisect and the bad commit is

acbdb9956fe93f4669141f103cb543d3025775db is the first bad commit
commit acbdb9956fe93f4669141f103cb543d3025775db
Author: Cédric Le Goater <email address hidden>
Date: Thu Aug 20 15:45:46 2020 +0200

    spapr/xive: Allocate IPIs independently from the other sources

    The vCPU IPIs are now allocated in kvmppc_xive_cpu_connect() when the
    vCPU connects to the KVM device and not when all the sources are reset
    in kvmppc_xive_source_reset()

    This requires extra care for hotplug vCPUs and VM restore.

    Signed-off-by: Cédric Le Goater <email address hidden>
    Message-Id: <email address hidden>
    Signed-off-by: David Gibson <email address hidden>

 hw/intc/spapr_xive_kvm.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 5 deletions(-)

# git bisect log
git bisect start
# good: [d0ed6a69d399ae193959225cdeaa9382746c91cc] Update version for v5.1.0 release
git bisect good d0ed6a69d399ae193959225cdeaa9382746c91cc
# bad: [7daf8f8d011cdd5d3e86930ed2bde969425c790c] Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
git bisect bad 7daf8f8d011cdd5d3e86930ed2bde969425c790c
# skip: [7595a65818ea9b49c36650a8c217a1ef9bd6e62a] hw/riscv: Sort the Kconfig options in alphabetical order
git bisect skip 7595a65818ea9b49c36650a8c217a1ef9bd6e62a
# skip: [3b65b742543bc6c2ad35e3b42401a26b48a87f26] target/hppa: Fix boot with old Linux installation CDs
git bisect skip 3b65b742543bc6c2ad35e3b42401a26b48a87f26
# bad: [f4ef8c9cc10b3bee829b9775879d4ff9f77c2442] Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging
git bisect bad f4ef8c9cc10b3bee829b9775879d4ff9f77c2442
# good: [4ee40a6b98c02b72fc5dd262df9d3ac8680d767b] hw/usb: Add U2F device check to passthru mode
git bisect good 4ee40a6b98c02b72fc5dd262df9d3ac8680d767b
# skip: [fe4b0b5bfa96c38ad1cad0689a86cca9f307e353] tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem
git bisect skip fe4b0b5bfa96c38ad1cad0689a86cca9f307e353
# skip: [287b1defeb44398d02669d97ebdc347d650f274d] target/microblaze: Cache mem_index in DisasContext
git bisect skip 287b1defeb44398d02669d97ebdc347d650f274d
# skip: [7a1fb2ef40df508e90eb756a80d67e6435246cae] block/nvme: Extract nvme_poll_queue()
git bisect skip 7a1fb2ef40df508e90eb756a80d67e6435246cae
# good: [536e340f464d7c2ef55cca47c7535d9409bf03c7] target/microblaze: Convert msrclr, msrset to decodetree
git bisect good 536e340f464d7c2ef55cca47c7535d9409bf03c7
# good: [227de21ed0759e275a469394af72c999d0134bb5] Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20200903' into staging
git bisect good 227de21ed0759e275a469394af72c999d0134bb5
# bad: [b95ba83fc56ebfc4b6869f21db0c757c0c191104] Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.2-20200908' into staging
git bisect bad b95ba83fc56ebfc4b6869f21db0c757c0c191104
# good: [789035f1239054331b335801a06bdbef026f02e1] oss-fuzz: fix rpath
git bisect good 789035f1239054331b335801a06bdbef026f02e1
# good: [00942071a0eabeb3ebc3bd594296859587f8f3c8] Merge remote-tracking branch 'remotes/rth/tags/pull-mb-20200907-2' into staging
git bisect good 00942071a0eabeb3...

Read more...

tags: added: kvm powerpc xive
summary: - some vcpus are found offline inside guest with different vsmt setting
- from qemu-cmdline and breaks subsequent vcpu hotplug operation (xive)
+ [regression][powerpc] some vcpus are found offline inside guest with
+ different vsmt setting from qemu-cmdline and breaks subsequent vcpu
+ hotplug operation (xive)
Gustavo Romero (gromero)
Changed in qemu:
assignee: nobody → Gustavo Romero (gromero)
Revision history for this message
Greg Kurz (gkurz) wrote :

Fixed by reverting the series that caused the regression.

https://git.qemu.org/?p=qemu.git;a=commit;h=6d24795ee7e3199d199d3c415312c93382ad1807

The optimization needs to be reworked later.

Changed in qemu:
status: New → Fix Committed
Revision history for this message
Satheesh Rajendran (sathnaga) wrote :

Tested with latest upstream and found the issue is fixed,

# git log -1
commit dd3d2340c4076d1735cd0f7cb61f4d8622b9562d (HEAD -> master, tag: v5.2.0-rc3, origin/master, origin/HEAD)
Author: Peter Maydell <email address hidden>
Date: Tue Nov 24 22:13:30 2020 +0000

    Update version for v5.2.0-rc3 release

    Signed-off-by: Peter Maydell <email address hidden>

/home/sath/qemu/build/qemu-system-ppc64 -name vm1 -M pseries,vsmt=2 -accel kvm -m 4096 -smp 8,cores=8,threads=1 -nographic -nodefaults -serial mon:stdio -vga none -nographic -device virtio-scsi-pci -drive file=/home/sath/tests/data/avocado-vt/images/fdevel-ppc64le.qcow2,if=none,id=hd0,format=qcow2,cache=none -device scsi-hd,drive=hd0

Fedora 33 (Thirty Three Prerelease)
Kernel 5.8.13-300.fc33.ppc64le on an ppc64le (hvc0)

atest-guest login: root
Password:
Login incorrect

atest-guest login: root
Password:
Last login: Wed Nov 18 09:03:24 on hvc0
[root@atest-guest ~]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1

Regards,
-Satheesh

Changed in qemu:
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Revision history for this message
Thomas Huth (th-huth) wrote :

Released with QEMU v5.2.0.

Changed in qemu:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.