L2 Guest migration: continuously dumping while running NFS guest migration
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
The Ubuntu-power-systems project | Fix Released | Critical | Ubuntu on IBM Power Systems Bug Triage |
linux (Ubuntu) | Fix Released | High | Unassigned |
Noble | Fix Released | High | Canonical Kernel Team |
Oracular | Fix Released | High | Unassigned |
Bug Description
SRU Justification:
[ Impact ]
* While doing ISST testing it turned out that a 2nd level (KVM)
guest (aka VM) continuously dumped when running an NFS
guest migration.
[ Test Plan ]
* Set up two IBM Power10 systems (with firmware 1060, which offers
support for KVM) with Ubuntu Server 24.04 for ppc64el.
* Set up qemu/KVM on both of these systems to allow guest migration.
* Set up a KVM guest and place its disk on an NFS volume.
* Now initiate a guest migration.
* Without the two patches the initiator system will start to dump.
* Since this setup requires a special firmware level,
the verification will be done by the IBM Power team.
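Before starting the migration it can help to confirm that the guest disk really sits on the NFS volume. The following is only a minimal sketch of such a check, not part of the original test plan: it parses text in `/proc/mounts` format and reports whether a path lives on an NFS mount. The function names, the sample mount table, the server name `nfs-server` and the disk path are assumptions made for illustration.

```python
import posixpath

# Hypothetical mount table in /proc/mounts format, used only for illustration.
SAMPLE_MOUNTS = """\
/dev/nvme0n1p2 / xfs rw,relatime 0 0
nfs-server:/export/kvm_pool /kvm_pool nfs4 rw,relatime 0 0
"""


def mount_fstype(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` holds one mount per line:
    "<device> <mountpoint> <fstype> <options> <dump> <pass>".
    Returns None if no mount matches.
    """
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mountpoint, fstype = fields[1], fields[2]
        # A mount contains the path if it is the path itself or a parent of it.
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            # Prefer the longest (most specific) mount point.
            if len(mountpoint) >= len(best_mount):
                best_mount, best_type = mountpoint, fstype
    return best_type


def is_on_nfs(path, mounts_text):
    """True if `path` is located on an nfs or nfs4 mount."""
    fstype = mount_fstype(path, mounts_text)
    return fstype is not None and fstype.startswith("nfs")


if __name__ == "__main__":
    # Hypothetical disk image path below the shared pool.
    print(is_on_nfs("/kvm_pool/guest.qcow2", SAMPLE_MOUNTS))
```

On a real host the second argument would be the content of `/proc/mounts`, read on both the source and the target system.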
[ Where problems could occur ]
* Although the patch set looks huge,
the patches themselves are relatively small and not very invasive,
and I would consider them mainly fixes.
* kvmppc_
set() for MMCR3.
* And the kvmppc_
the SIAR instead of SDAR - which is quite traceable.
* Then a one-reg interface for the DEXCR register (KVM_REG_PPC_DEXCR)
is introduced. Issues can arise here if the initialization
or the case statement is wrong.
A fix was added to keep nested guest DEXCR in sync.
The guest state element defined for DEXCR was already there,
but not really considered - this is fixed now (DEXCR GSID).
If the initialization or the code in the case statement is wrong,
this can harm the guest state.
Guest state may get out of sync.
* Another one-reg register identifier was introduced
that is used to read and set the virtual HASHKEYR
for the guest during enter/exit with KVM_REG_
Again initialization and the case code are critical.
Code was added to keep nested guest HASHKEYR in sync.
Again the state element defined for HASHKEYR was there,
but not considered, which is fixed now (HASHKEYR GSID).
If the initialization or the code in the case statement is wrong,
this can harm the guest state.
This can harm the L2 guest during enter or exit.
* Again another one-reg identifier was introduced
that is used to read and set the virtual HASHPKEYR
for the guest during enter/exit with KVM_REG_
And again the guest state element defined for HASHPKEYR
was there but ignored which is now fixed (HASHPKEYR GSID).
If the initialization or the code in the case statement is wrong,
this can harm the guest state.
This can harm the L2 guest during enter or exit.
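To illustrate why mistakes in the get/set case code matter for migration, here is a toy Python model of a one-reg style interface. It is a sketch under stated assumptions, not kernel code: the classes `ToyVcpu` and `BuggyVcpu`, the string register ids and the `migrate` helper are all hypothetical. The buggy getter returns SIAR when asked for SDAR, as described above; the buggy setter for MMCR3 simply drops the value, which is assumed here as one example of a broken set case.

```python
# Toy model of a "one-reg" get/set interface. This is NOT kernel code:
# register ids and class names are hypothetical stand-ins.
REG_MMCR3, REG_SIAR, REG_SDAR = "MMCR3", "SIAR", "SDAR"
ALL_REGS = (REG_MMCR3, REG_SIAR, REG_SDAR)


class ToyVcpu:
    """Holds register values and exposes them through get/set calls."""

    def __init__(self, **values):
        self.regs = {reg: 0 for reg in ALL_REGS}
        self.regs.update(values)

    def get_one_reg(self, reg_id):
        return self.regs[reg_id]

    def set_one_reg(self, reg_id, value):
        self.regs[reg_id] = value


class BuggyVcpu(ToyVcpu):
    """Same interface, with two deliberately wrong cases."""

    def get_one_reg(self, reg_id):
        if reg_id == REG_SDAR:
            return self.regs[REG_SIAR]  # wrong register is returned
        return self.regs[reg_id]

    def set_one_reg(self, reg_id, value):
        if reg_id == REG_MMCR3:
            return  # assumed bug: the new value is never stored
        self.regs[reg_id] = value


def migrate(source, target):
    """Copy every register from source to target via the one-reg calls."""
    for reg_id in ALL_REGS:
        target.set_one_reg(reg_id, source.get_one_reg(reg_id))
    return target


def mismatches(source, target):
    """Return the sorted names of registers that differ after migration."""
    return sorted(r for r in ALL_REGS if source.regs[r] != target.regs[r])


if __name__ == "__main__":
    good_src = ToyVcpu(MMCR3=0x11, SIAR=0x22, SDAR=0x33)
    print(mismatches(good_src, migrate(good_src, ToyVcpu())))
    bad_src = BuggyVcpu(MMCR3=0x11, SIAR=0x22, SDAR=0x33)
    print(mismatches(bad_src, migrate(bad_src, BuggyVcpu())))
```

In this model the correct implementation round-trips every register, while the buggy one leaves the target with stale MMCR3 and SDAR values, which is the kind of out-of-sync guest state the section warns about.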
[ Other Info ]
* Since (nested) KVM support is new on P10,
this does not affect older Power generations
(P9 is the only other hw generation that is supported by 24.04,
but it only supports native virtualization).
* Both patches are upstream accepted since v6.11(-rc1),
hence will be in oracular
and are also upstream tagged as stable updates.
* Since the required firmware FW1060 is relatively new,
we can assume that not many users have run into this issue yet.
__________
== Comment: #0 - SEETEENA THOUFEEK <email address hidden> - 2024-08-09 03:50:24 ==
+++ This bug was initially created as a clone of Bug #206737 +++
---Problem Description---
L2 Guest migration: evelp2g4[L2]: while running NFS guest migration continuously dumping smp_call_
---uname output---
NA
Machine Type = NA
Contact Information = NA
[79205.163691] Hardware name: IBM pSeries (emulated by qemu) POWER10 (raw) 0x800200 0xf000006 of:SLOF,HEAD hv:linux,kvm pSeries
[79205.163834] NIP: c0000000002bb7a4 LR: c0000000002bb750 CTR: c0000000000d192c
[79205.163929] REGS: c0000003871cf1b0 TRAP: 0900 Tainted: G L
[79205.165041] MSR: 800000000280b033 <SF,VEC,
[79205.165266] CFAR: 0000000000000000 IRQMASK: 0
[79205.171660] NIP [c0000000002bb7a4] smp_call_
[79205.171752] LR [c0000000002bb750] smp_call_
[79205.171835] Call Trace:
[79205.171869] [c0000003871cf450] [c0000000002bbc58] smp_call_
[79205.171986] [c0000003871cf520] [c0000000000ac4d0] radix__
[79205.173636] [c0000003871cf560] [c00000000052e900] tlb_finish_
[79205.173754] [c0000003871cf590] [c00000000052a280] exit_mmap+
[79205.173848] [c0000003871cf6c0] [c00000000016ec9c] __mmput+0x54/0x1d4
[79205.173939] [c0000003871cf6f0] [c0000000006385c4] begin_new_
[79205.174037] [c0000003871cf780] [c0000000006edea8] load_elf_
[79205.174136] [c0000003871cf880] [c0000000006361c8] bprm_execve+
[79205.174219] [c0000003871cf950] [c000000000637988] do_execveat_
[79205.174316] [c0000003871cf9f0] [c000000000638e38] sys_execve+
[79205.174399] [c0000003871cfa20] [c00000000002fec8] system_
[79205.174497] [c0000003871cfe50] [c00000000000d05c] system_
[79205.176245] --- interrupt: 3000 at 0x7fff95b10b08
[79205.176326] NIP: 00007fff95b10b08 LR: 00007fff95b10b08 CTR: 0000000000000000
[79205.176438] REGS: c0000003871cfe80 TRAP: 3000 Tainted: G L (
[79205.176558] MSR: 800000000280f033 <SF,VEC,
[79205.176686] IRQMASK: 0
[79205.177505] NIP [00007fff95b10b08] 0x7fff95b10b08
[79205.177578] LR [00007fff95b10b08] 0x7fff95b10b08
[79205.177649] --- interrupt: 3000
Steps to reproduce: Install the build on NFS storage guest kernel 6.8.10-300
Start the HTX workload - mdt.less
Start the NFS guest migration between the L2 hosts.
Source L2 host : evelp2
Target L2 host : rinlp1
migration command : virsh migrate --live --domain $vm_name qemu+ssh:
Share the same NFS storage between two hosts [here /kvm_pool]
10.33.4.
Test running : HTX
Guest state : up
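The migration step above can be scripted. The sketch below only assembles the `virsh migrate` argument list using the two options quoted in the report (`--live` and `--domain`). The destination URI is truncated in the report, so the value used here (`qemu+ssh://target-host.example/system`) is a hypothetical placeholder, as is the helper name `build_migrate_command`.

```python
import shlex


def build_migrate_command(vm_name, dest_uri):
    """Return the argv list for a live migration with virsh.

    Only the options quoted in the report (--live, --domain) are used.
    """
    if not vm_name or not dest_uri:
        raise ValueError("vm_name and dest_uri are required")
    return ["virsh", "migrate", "--live", "--domain", vm_name, dest_uri]


if __name__ == "__main__":
    # Hypothetical destination: the report truncates the real URI.
    argv = build_migrate_command(
        "evelp2g4", "qemu+ssh://target-host.example/system"
    )
    # Print the command for review instead of running it.
    print(shlex.join(argv))
```

To actually start the migration, the list could be passed to `subprocess.run(argv, check=True)` on the source host; keeping the command as a list avoids shell quoting problems with the guest name.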
------
L2 guest Config:
(1) Problem on Guest: evelp2g4
(2) PHYP/ Processor Type: KVM/P10/Everest
(3) Rootvg Filesystem: EXT4
(5) Network Bridge: Macvtap
(6) IO Disk Type/Driver: qemu-img/ qcow2
(7) Install Disk Type: Single
------
L1 host details :
MDC mode : off
(1) PHYP/ Processor Type: KVM/P10/Everest
(2) CEC Name: evelp2
(3) Rootvg Filesystem: xfs
(5) Network Interface: Dedicated Network
(6) IO Type: NVME
(8) Multipath Enabled: no
(9) Install Disk Type: Single
(10) MMU: RPT
The kernel patches are at
https://<email address hidden>/T/#t
Qemu patches are at
https:/
powerpc/
[1/8] KVM: PPC: Book3S HV: Fix the set_one_reg for MMCR3
https:/
[2/8] KVM: PPC: Book3S HV: Fix the get_one_reg of SDAR
https:/
[3/8] KVM: PPC: Book3S HV: Add one-reg interface for DEXCR register
https:/
[4/8] KVM: PPC: Book3S HV nestedv2: Keep nested guest DEXCR in sync
https:/
[5/8] KVM: PPC: Book3S HV: Add one-reg interface for HASHKEYR register
https:/
[6/8] KVM: PPC: Book3S HV nestedv2: Keep nested guest HASHKEYR in sync
https:/
[7/8] KVM: PPC: Book3S HV: Add one-reg interface for HASHPKEYR register
https:/
[8/8] KVM: PPC: Book3S HV nestedv2: Keep nested guest HASHPKEYR in sync
https:/
tags: | added: architecture-ppc64le bugnameltc-208511 severity-critical targetmilestone-inin2404 |
Changed in ubuntu: | |
assignee: | nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) |
affects: | ubuntu → kernel-package (Ubuntu) |
affects: | kernel-package (Ubuntu) → linux (Ubuntu) |
Changed in ubuntu-power-systems: | |
assignee: | nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) |
importance: | Undecided → Critical |
Changed in linux (Ubuntu): | |
importance: | Undecided → High |
Changed in linux (Ubuntu Noble): | |
importance: | Undecided → High |
status: | New → Triaged |
Changed in ubuntu-power-systems: | |
status: | New → Triaged |
Changed in linux (Ubuntu Oracular): | |
assignee: | Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → nobody |
Changed in linux (Ubuntu Oracular): | |
status: | New → Fix Committed |
summary: |
- ISST-LTE:KOP:1060FW:evelp2 :L2 Guest migration: evelp2g4[L2]: while
- running NFS guest migration continuously dumping
- smp_call_function_many_cond+0x500/0x738 (unreliable) and watchdog: BUG:
- soft lockup - CPU#14 stuck for 223s! [systemd-homed} (Fedora)
+ L2 Guest migration: continuously dumping while running NFS guest
+ migration
description: | updated |
Changed in linux (Ubuntu Oracular): | |
status: | Fix Committed → Fix Released |
Changed in linux (Ubuntu Noble): | |
status: | In Progress → Fix Committed |
Changed in ubuntu-power-systems: | |
status: | In Progress → Fix Committed |
Changed in ubuntu-power-systems: | |
status: | Fix Committed → Fix Released |
A test kernel was built in this PPA:
https://launchpad.net/~fheimes/+archive/ubuntu/lp2076406