Activity log for bug #1979296

Date Who What changed Old value New value Message
2022-06-21 09:59:32 bugproxy bug added bug
2022-06-21 09:59:34 bugproxy tags architecture-s39064 bugnameltc-198658 severity-high targetmilestone-inin2004
2022-06-21 09:59:35 bugproxy ubuntu: assignee Skipper Bug Screeners (skipper-screen-team)
2022-06-21 09:59:39 bugproxy affects ubuntu linux (Ubuntu)
2022-06-21 10:13:53 Frank Heimes bug task added ubuntu-z-systems
2022-06-21 10:14:11 Frank Heimes ubuntu-z-systems: assignee Skipper Bug Screeners (skipper-screen-team)
2022-06-21 10:14:15 Frank Heimes linux (Ubuntu): assignee Skipper Bug Screeners (skipper-screen-team) Frank Heimes (fheimes)
2022-06-21 10:14:21 Frank Heimes ubuntu-z-systems: importance Undecided High
2022-06-21 10:15:37 Frank Heimes linux (Ubuntu): importance Undecided High
2022-06-21 11:07:28 Frank Heimes description

Old value:

---Problem Description---
rcu_sched self-detected stall with Secure Execution

When the system is busy and additional Secure Execution guests are started, the LPAR crashes.
Christian Borntraeger looked at the stack trace and identified two commits which should fix the issue:
1e2aa46de526a5adafe580bca4c25856bb06f09e
and
f0a1a0615a6ff6d38af2c65a522698fb4bb85df6

Please include these two fixes into 20.04, and 18.04 HWE.

Here the stack trace:
[592792.725078] rcu: INFO: rcu_sched self-detected stall on CPU
[592792.725089] rcu: 4-....: (2099 ticks this GP) idle=7d2/1/0x4000000000000002 softirq=3920041/3920042 fqs=984
[592792.725133] (t=2100 jiffies g=26268505 q=410280)
[592792.725135] Task dump for CPU 4:
[592792.725137] qemu-system-s39 R running task 0 2557923 1644255 0x06000004
[592792.725139] Call Trace:
[592792.725146] ([<000000566e2dcf52>] show_stack+0x7a/0xc0)
[592792.725150] [<000000566dab696c>] sched_show_task.part.0+0xdc/0x100
[592792.725151] [<000000566e2df248>] rcu_dump_cpu_stacks+0xc0/0x100
[592792.725154] [<000000566db0510c>] rcu_sched_clock_irq+0x75c/0x980
[592792.725156] [<000000566db1326c>] update_process_times+0x3c/0x80
[592792.725160] [<000000566db24fea>] tick_sched_handle.isra.0+0x4a/0x70
[592792.725161] [<000000566db2528e>] tick_sched_timer+0x5e/0xc0
[592792.725163] [<000000566db14294>] __hrtimer_run_queues+0x114/0x2f0
[592792.725165] [<000000566db14fdc>] hrtimer_interrupt+0x12c/0x2a0
[592792.725167] [<000000566da14b6a>] do_IRQ+0xaa/0xb0
[592792.725170] [<000000566e2eed08>] ext_int_handler+0x130/0x134
[592792.725174] [<000000566da2bad8>] gmap_make_secure+0x1c8/0x340
[592792.725175] ([<000000566da2b9fe>] gmap_make_secure+0xee/0x340)
[592792.725180] [<000000566da6e796>] kvm_s390_pv_unpack+0xc6/0x2b0
[592792.725183] [<000000566da535c0>] kvm_s390_handle_pv+0x390/0x580
[592792.725184] [<000000566da55b30>] kvm_arch_vm_ioctl+0x250/0x9e0
[592792.725187] [<000000566da44c26>] kvm_vm_ioctl+0x396/0x760
[592792.725191] [<000000566dceb0b6>] do_vfs_ioctl+0x376/0x690
[592792.725193] [<000000566dceb454>] ksys_ioctl+0x84/0xb0
[592792.725194] [<000000566dceb4ea>] __s390x_sys_ioctl+0x2a/0x40
[592792.725195] [<000000566e2ee6b2>] system_call+0x2a6/0x2c8

Contact Information = stefan.amann@de.ibm.com, cborntra@de.ibm.com

---uname output---
5.4.0-90-generic #101-Ubuntu

Machine Type = 8562 A00-GT2

---System Hang---
LPAR crashed and needed to be re-booted

---Debugger---
A debugger is not configured

---Steps to Reproduce---
Cause high load. Then start Secure Execution enabled KVM guest

New value:

SRU Justification:
==================

[Impact]

* On IBM Z secure execution environments under heavy load (means with over-committed resources - KVM guests) rcu_sched self-detected stalls can occur, which lead to LPAR crashes.

[Fix]

* 1e2aa46de526 1e2aa46de526a5adafe580bca4c25856bb06f09e "KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm"
* f0a1a0615a6f f0a1a0615a6ff6d38af2c65a522698fb4bb85df6 "KVM: s390: pv: avoid stalls when making pages secure"

[Test Plan]

* An IBM z15 or LinuxONE III LPAR with FC 115 (secure execution) enabled is required.
* Installation of Ubuntu Server 20.04 LTS (18.04 with hwe-5.4) or 22.04 LTS on top.
* Install a kernel that incl. the above two patches/commits
* Bring the system under high load with KVM guests.
* Monitor dmesg for 'rcu_sched self-detected stalls' and/or look for crashes.
* Due to hardware requirements this test needs to be conducted by IBM.

[Where problems could occur]

* The change in 1e2aa46de526 only uses uv_call_sched instead of just uv_call, which should lead to a snappier system under high load, but may consume overall some more cycles.
* With f0a1a0615a6f the uv_call_sched cannot simply replace uv_call, due to locks being held.
* Instead __uv_call is replacing uv_call, which does not loop.
* But due to these changes of the (uv) calls, - in case erroneous - they may lead to wrong states, and even broken ultravisor calls and with that broken secure execution (SE).
* As a side effect the uv might no longer loop over all pages, and in worst case leaving some unprotected.
* All this is s390x-only functionality, that is only available on IBM z15 / LinuxONE III systems and newer, and only is the optional feature 'FC 115' in place, which is limited to 'secure-execution' workloads.

[Other Info]

* Patches are upstream accepted with kernel 5.16.
* Commit 1e2aa46de526 is already included in jammy but f0a1a0615a6f is missing.
* Focal requires both commits 1e2aa46de526 and f0a1a0615a6f.
* Since impish is very close to it's EOL, it's not covered by this SRU.

__________

---Problem Description---
rcu_sched self-detected stall with Secure Execution

When the system is busy and additional Secure Execution guests are started, the LPAR crashes.
Christian Borntraeger looked at the stack trace and identified two commits which should fix the issue:
1e2aa46de526a5adafe580bca4c25856bb06f09e
and
f0a1a0615a6ff6d38af2c65a522698fb4bb85df6

Please include these two fixes into 20.04, and 18.04 HWE.

Here the stack trace:
[592792.725078] rcu: INFO: rcu_sched self-detected stall on CPU
[592792.725089] rcu: 4-....: (2099 ticks this GP) idle=7d2/1/0x4000000000000002 softirq=3920041/3920042 fqs=984
[592792.725133] (t=2100 jiffies g=26268505 q=410280)
[592792.725135] Task dump for CPU 4:
[592792.725137] qemu-system-s39 R running task 0 2557923 1644255 0x06000004
[592792.725139] Call Trace:
[592792.725146] ([<000000566e2dcf52>] show_stack+0x7a/0xc0)
[592792.725150] [<000000566dab696c>] sched_show_task.part.0+0xdc/0x100
[592792.725151] [<000000566e2df248>] rcu_dump_cpu_stacks+0xc0/0x100
[592792.725154] [<000000566db0510c>] rcu_sched_clock_irq+0x75c/0x980
[592792.725156] [<000000566db1326c>] update_process_times+0x3c/0x80
[592792.725160] [<000000566db24fea>] tick_sched_handle.isra.0+0x4a/0x70
[592792.725161] [<000000566db2528e>] tick_sched_timer+0x5e/0xc0
[592792.725163] [<000000566db14294>] __hrtimer_run_queues+0x114/0x2f0
[592792.725165] [<000000566db14fdc>] hrtimer_interrupt+0x12c/0x2a0
[592792.725167] [<000000566da14b6a>] do_IRQ+0xaa/0xb0
[592792.725170] [<000000566e2eed08>] ext_int_handler+0x130/0x134
[592792.725174] [<000000566da2bad8>] gmap_make_secure+0x1c8/0x340
[592792.725175] ([<000000566da2b9fe>] gmap_make_secure+0xee/0x340)
[592792.725180] [<000000566da6e796>] kvm_s390_pv_unpack+0xc6/0x2b0
[592792.725183] [<000000566da535c0>] kvm_s390_handle_pv+0x390/0x580
[592792.725184] [<000000566da55b30>] kvm_arch_vm_ioctl+0x250/0x9e0
[592792.725187] [<000000566da44c26>] kvm_vm_ioctl+0x396/0x760
[592792.725191] [<000000566dceb0b6>] do_vfs_ioctl+0x376/0x690
[592792.725193] [<000000566dceb454>] ksys_ioctl+0x84/0xb0
[592792.725194] [<000000566dceb4ea>] __s390x_sys_ioctl+0x2a/0x40
[592792.725195] [<000000566e2ee6b2>] system_call+0x2a6/0x2c8

Contact Information = stefan.amann@de.ibm.com, cborntra@de.ibm.com

---uname output---
5.4.0-90-generic #101-Ubuntu

Machine Type = 8562 A00-GT2

---System Hang---
LPAR crashed and needed to be re-booted

---Debugger---
A debugger is not configured

---Steps to Reproduce---
Cause high load. Then start Secure Execution enabled KVM guest
2022-06-21 15:24:46 Frank Heimes ubuntu-z-systems: status New Incomplete
2022-06-21 15:24:48 Frank Heimes linux (Ubuntu): status New Incomplete
2022-06-21 17:50:17 Frank Heimes description

Old value:

SRU Justification:
==================

[Impact]

* On IBM Z secure execution environments under heavy load (means with over-committed resources - KVM guests) rcu_sched self-detected stalls can occur, which lead to LPAR crashes.

[Fix]

* 1e2aa46de526 1e2aa46de526a5adafe580bca4c25856bb06f09e "KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm"
* f0a1a0615a6f f0a1a0615a6ff6d38af2c65a522698fb4bb85df6 "KVM: s390: pv: avoid stalls when making pages secure"

[Test Plan]

* An IBM z15 or LinuxONE III LPAR with FC 115 (secure execution) enabled is required.
* Installation of Ubuntu Server 20.04 LTS (18.04 with hwe-5.4) or 22.04 LTS on top.
* Install a kernel that incl. the above two patches/commits
* Bring the system under high load with KVM guests.
* Monitor dmesg for 'rcu_sched self-detected stalls' and/or look for crashes.
* Due to hardware requirements this test needs to be conducted by IBM.

[Where problems could occur]

* The change in 1e2aa46de526 only uses uv_call_sched instead of just uv_call, which should lead to a snappier system under high load, but may consume overall some more cycles.
* With f0a1a0615a6f the uv_call_sched cannot simply replace uv_call, due to locks being held.
* Instead __uv_call is replacing uv_call, which does not loop.
* But due to these changes of the (uv) calls, - in case erroneous - they may lead to wrong states, and even broken ultravisor calls and with that broken secure execution (SE).
* As a side effect the uv might no longer loop over all pages, and in worst case leaving some unprotected.
* All this is s390x-only functionality, that is only available on IBM z15 / LinuxONE III systems and newer, and only is the optional feature 'FC 115' in place, which is limited to 'secure-execution' workloads.

[Other Info]

* Patches are upstream accepted with kernel 5.16.
* Commit 1e2aa46de526 is already included in jammy but f0a1a0615a6f is missing.
* Focal requires both commits 1e2aa46de526 and f0a1a0615a6f.
* Since impish is very close to it's EOL, it's not covered by this SRU.

__________

---Problem Description---
rcu_sched self-detected stall with Secure Execution

When the system is busy and additional Secure Execution guests are started, the LPAR crashes.
Christian Borntraeger looked at the stack trace and identified two commits which should fix the issue:
1e2aa46de526a5adafe580bca4c25856bb06f09e
and
f0a1a0615a6ff6d38af2c65a522698fb4bb85df6

Please include these two fixes into 20.04, and 18.04 HWE.

Here the stack trace:
[592792.725078] rcu: INFO: rcu_sched self-detected stall on CPU
[592792.725089] rcu: 4-....: (2099 ticks this GP) idle=7d2/1/0x4000000000000002 softirq=3920041/3920042 fqs=984
[592792.725133] (t=2100 jiffies g=26268505 q=410280)
[592792.725135] Task dump for CPU 4:
[592792.725137] qemu-system-s39 R running task 0 2557923 1644255 0x06000004
[592792.725139] Call Trace:
[592792.725146] ([<000000566e2dcf52>] show_stack+0x7a/0xc0)
[592792.725150] [<000000566dab696c>] sched_show_task.part.0+0xdc/0x100
[592792.725151] [<000000566e2df248>] rcu_dump_cpu_stacks+0xc0/0x100
[592792.725154] [<000000566db0510c>] rcu_sched_clock_irq+0x75c/0x980
[592792.725156] [<000000566db1326c>] update_process_times+0x3c/0x80
[592792.725160] [<000000566db24fea>] tick_sched_handle.isra.0+0x4a/0x70
[592792.725161] [<000000566db2528e>] tick_sched_timer+0x5e/0xc0
[592792.725163] [<000000566db14294>] __hrtimer_run_queues+0x114/0x2f0
[592792.725165] [<000000566db14fdc>] hrtimer_interrupt+0x12c/0x2a0
[592792.725167] [<000000566da14b6a>] do_IRQ+0xaa/0xb0
[592792.725170] [<000000566e2eed08>] ext_int_handler+0x130/0x134
[592792.725174] [<000000566da2bad8>] gmap_make_secure+0x1c8/0x340
[592792.725175] ([<000000566da2b9fe>] gmap_make_secure+0xee/0x340)
[592792.725180] [<000000566da6e796>] kvm_s390_pv_unpack+0xc6/0x2b0
[592792.725183] [<000000566da535c0>] kvm_s390_handle_pv+0x390/0x580
[592792.725184] [<000000566da55b30>] kvm_arch_vm_ioctl+0x250/0x9e0
[592792.725187] [<000000566da44c26>] kvm_vm_ioctl+0x396/0x760
[592792.725191] [<000000566dceb0b6>] do_vfs_ioctl+0x376/0x690
[592792.725193] [<000000566dceb454>] ksys_ioctl+0x84/0xb0
[592792.725194] [<000000566dceb4ea>] __s390x_sys_ioctl+0x2a/0x40
[592792.725195] [<000000566e2ee6b2>] system_call+0x2a6/0x2c8

Contact Information = stefan.amann@de.ibm.com, cborntra@de.ibm.com

---uname output---
5.4.0-90-generic #101-Ubuntu

Machine Type = 8562 A00-GT2

---System Hang---
LPAR crashed and needed to be re-booted

---Debugger---
A debugger is not configured

---Steps to Reproduce---
Cause high load. Then start Secure Execution enabled KVM guest

New value:

SRU Justification:
==================

[Impact]

* On IBM Z secure execution environments under heavy load (means with over-committed resources - KVM guests) rcu_sched self-detected stalls can occur, which lead to LPAR crashes.

[Fix]

* 57c5df13eca4 57c5df13eca4017ed28f9375dc1d246ec0f54217 "KVM: s390: pv: add macros for UVC CC values"
* 1e2aa46de526 1e2aa46de526a5adafe580bca4c25856bb06f09e "KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm"
* f0a1a0615a6f f0a1a0615a6ff6d38af2c65a522698fb4bb85df6 "KVM: s390: pv: avoid stalls when making pages secure"

[Test Plan]

* An IBM z15 or LinuxONE III LPAR with FC 115 (secure execution) enabled is required.
* Installation of Ubuntu Server 20.04 LTS (18.04 with hwe-5.4) or 22.04 LTS on top.
* Install a kernel that incl. the above two patches/commits
* Bring the system under high load with KVM guests.
* Monitor dmesg for 'rcu_sched self-detected stalls' and/or look for crashes.
* Due to hardware requirements this test needs to be conducted by IBM.

[Where problems could occur]

* The definition from 57c5df13eca4 are missing in both jammy and focal, but shouldn't harm.
* The change in 1e2aa46de526 only uses uv_call_sched instead of just uv_call, which should lead to a snappier system under high load, but may consume overall some more cycles.
* With f0a1a0615a6f the uv_call_sched cannot simply replace uv_call, due to locks being held.
* Instead __uv_call is replacing uv_call, which does not loop.
* But due to these changes of the (uv) calls, - in case erroneous - they may lead to wrong states, and even broken ultravisor calls and with that broken secure execution (SE).
* As a side effect the uv might no longer loop over all pages, and in worst case leaving some unprotected.
* All this is s390x-only functionality, that is only available on IBM z15 / LinuxONE III systems and newer, and only is the optional feature 'FC 115' in place, which is limited to 'secure-execution' workloads.

[Other Info]

* Patches are upstream accepted with kernel 5.16.
* Commit 1e2aa46de526 is already included in jammy but 57c5df13eca4 and f0a1a0615a6f are missing.
* Focal requires all 3 commits 57c5df13eca4, 1e2aa46de526 and f0a1a0615a6f.
* Since impish is very close to it's EOL, it's not covered by this SRU.

__________

---Problem Description---
rcu_sched self-detected stall with Secure Execution

When the system is busy and additional Secure Execution guests are started, the LPAR crashes.
Christian Borntraeger looked at the stack trace and identified two commits which should fix the issue:
1e2aa46de526a5adafe580bca4c25856bb06f09e
and
f0a1a0615a6ff6d38af2c65a522698fb4bb85df6

Please include these two fixes into 20.04, and 18.04 HWE.

Here the stack trace:
[592792.725078] rcu: INFO: rcu_sched self-detected stall on CPU
[592792.725089] rcu: 4-....: (2099 ticks this GP) idle=7d2/1/0x4000000000000002 softirq=3920041/3920042 fqs=984
[592792.725133] (t=2100 jiffies g=26268505 q=410280)
[592792.725135] Task dump for CPU 4:
[592792.725137] qemu-system-s39 R running task 0 2557923 1644255 0x06000004
[592792.725139] Call Trace:
[592792.725146] ([<000000566e2dcf52>] show_stack+0x7a/0xc0)
[592792.725150] [<000000566dab696c>] sched_show_task.part.0+0xdc/0x100
[592792.725151] [<000000566e2df248>] rcu_dump_cpu_stacks+0xc0/0x100
[592792.725154] [<000000566db0510c>] rcu_sched_clock_irq+0x75c/0x980
[592792.725156] [<000000566db1326c>] update_process_times+0x3c/0x80
[592792.725160] [<000000566db24fea>] tick_sched_handle.isra.0+0x4a/0x70
[592792.725161] [<000000566db2528e>] tick_sched_timer+0x5e/0xc0
[592792.725163] [<000000566db14294>] __hrtimer_run_queues+0x114/0x2f0
[592792.725165] [<000000566db14fdc>] hrtimer_interrupt+0x12c/0x2a0
[592792.725167] [<000000566da14b6a>] do_IRQ+0xaa/0xb0
[592792.725170] [<000000566e2eed08>] ext_int_handler+0x130/0x134
[592792.725174] [<000000566da2bad8>] gmap_make_secure+0x1c8/0x340
[592792.725175] ([<000000566da2b9fe>] gmap_make_secure+0xee/0x340)
[592792.725180] [<000000566da6e796>] kvm_s390_pv_unpack+0xc6/0x2b0
[592792.725183] [<000000566da535c0>] kvm_s390_handle_pv+0x390/0x580
[592792.725184] [<000000566da55b30>] kvm_arch_vm_ioctl+0x250/0x9e0
[592792.725187] [<000000566da44c26>] kvm_vm_ioctl+0x396/0x760
[592792.725191] [<000000566dceb0b6>] do_vfs_ioctl+0x376/0x690
[592792.725193] [<000000566dceb454>] ksys_ioctl+0x84/0xb0
[592792.725194] [<000000566dceb4ea>] __s390x_sys_ioctl+0x2a/0x40
[592792.725195] [<000000566e2ee6b2>] system_call+0x2a6/0x2c8

Contact Information = stefan.amann@de.ibm.com, cborntra@de.ibm.com

---uname output---
5.4.0-90-generic #101-Ubuntu

Machine Type = 8562 A00-GT2

---System Hang---
LPAR crashed and needed to be re-booted

---Debugger---
A debugger is not configured

---Steps to Reproduce---
Cause high load. Then start Secure Execution enabled KVM guest
2022-06-21 19:41:13 Frank Heimes nominated for series Ubuntu Jammy
2022-06-21 19:41:13 Frank Heimes bug task added linux (Ubuntu Jammy)
2022-06-21 19:41:13 Frank Heimes nominated for series Ubuntu Focal
2022-06-21 19:41:13 Frank Heimes bug task added linux (Ubuntu Focal)
2022-06-21 19:41:21 Frank Heimes linux (Ubuntu): status Incomplete Invalid
2022-06-21 19:41:27 Frank Heimes linux (Ubuntu Focal): status New In Progress
2022-06-21 19:41:32 Frank Heimes linux (Ubuntu Jammy): status New In Progress
2022-06-21 19:41:36 Frank Heimes ubuntu-z-systems: status Incomplete In Progress
2022-06-21 19:41:47 Frank Heimes linux (Ubuntu Jammy): assignee Canonical Kernel Team (canonical-kernel-team)
2022-06-21 19:41:56 Frank Heimes linux (Ubuntu Focal): assignee Canonical Kernel Team (canonical-kernel-team)
2022-06-21 19:42:00 Frank Heimes linux (Ubuntu): assignee Frank Heimes (fheimes)
2022-06-21 19:42:05 Frank Heimes linux (Ubuntu): importance High Undecided
2022-06-21 19:42:08 Frank Heimes linux (Ubuntu Focal): importance Undecided High
2022-06-21 19:42:11 Frank Heimes linux (Ubuntu Jammy): importance Undecided High
2022-07-08 14:48:39 Stefan Bader linux (Ubuntu Jammy): status In Progress Fix Committed
2022-07-08 14:48:43 Stefan Bader linux (Ubuntu Focal): status In Progress Fix Committed
2022-07-08 15:00:24 Frank Heimes ubuntu-z-systems: status In Progress Fix Committed
2022-07-13 13:18:06 Ubuntu Kernel Bot tags architecture-s39064 bugnameltc-198658 severity-high targetmilestone-inin2004 architecture-s39064 bugnameltc-198658 severity-high targetmilestone-inin2004 verification-needed-focal
2022-07-13 15:59:31 bugproxy tags architecture-s39064 bugnameltc-198658 severity-high targetmilestone-inin2004 verification-needed-focal architecture-s39064 bugnameltc-198658 severity-high targetmilestone-inin2004 verification-done-focal
2022-07-15 13:26:36 Ubuntu Kernel Bot tags architecture-s39064 bugnameltc-198658 severity-high targetmilestone-inin2004 verification-done-focal architecture-s39064 bugnameltc-198658 severity-high targetmilestone-inin2004 verification-done-focal verification-needed-jammy
2022-07-15 14:09:42 bugproxy tags architecture-s39064 bugnameltc-198658 severity-high targetmilestone-inin2004 verification-done-focal verification-needed-jammy architecture-s39064 bugnameltc-198658 severity-high targetmilestone-inin2004 verification-done-focal verification-done-jammy
2022-07-28 10:25:44 Launchpad Janitor linux (Ubuntu Jammy): status Fix Committed Fix Released
2022-07-28 10:25:44 Launchpad Janitor cve linked 2022-1652
2022-07-28 10:25:44 Launchpad Janitor cve linked 2022-1679
2022-07-28 10:25:44 Launchpad Janitor cve linked 2022-28893
2022-07-28 10:25:44 Launchpad Janitor cve linked 2022-34918
2022-08-09 20:50:06 Launchpad Janitor linux (Ubuntu Focal): status Fix Committed Fix Released
2022-08-09 20:50:06 Launchpad Janitor cve linked 2022-1734
2022-08-09 20:50:06 Launchpad Janitor cve linked 2022-2586
2022-08-09 20:50:06 Launchpad Janitor cve linked 2022-2588
2022-08-10 11:43:31 Frank Heimes ubuntu-z-systems: status Fix Committed Fix Released
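
Note on the "[Where problems could occur]" section of the SRU justification: the difference it draws between uv_call, uv_call_sched and __uv_call can be illustrated with a small userspace sketch. This is not the kernel code from the referenced commits. Every function below is a hypothetical stand-in (fake_uv_call_once plays the role of a single non-looping call, fake_yield the role of a reschedule point), the condition-code values are assumed to correspond to the UVC CC macros named in commit 57c5df13eca4, and the "try again" return code is invented for illustration.

```c
/* Userspace sketch only: none of this is kernel code. */

/* Condition codes; values are an assumption about the UVC CC macros. */
enum { CC_OK = 0, CC_ERROR = 1, CC_BUSY = 2, CC_PARTIAL = 3 };

#define SKETCH_EAGAIN 11  /* hypothetical "try again later" result */
#define SKETCH_EINVAL 22  /* hypothetical "hard failure" result */

static int busy_left;  /* how often the fake call still reports "busy" */
static int yields;     /* how often the caller offered to give up the CPU */

/* Hypothetical stand-in for one non-looping call (the role of __uv_call). */
static int fake_uv_call_once(void)
{
    if (busy_left > 0) {
        busy_left--;
        return CC_BUSY;
    }
    return CC_OK;
}

/* Hypothetical stand-in for a reschedule point: it only counts the yields. */
static void fake_yield(void)
{
    yields++;
}

/* Role of uv_call: retry until finished, never yielding in between. */
static int sketch_uv_call(void)
{
    int cc;

    do {
        cc = fake_uv_call_once();
    } while (cc > CC_ERROR);
    return cc;
}

/* Role of uv_call_sched: same loop, but yield on every pass. */
static int sketch_uv_call_sched(void)
{
    int cc;

    do {
        cc = fake_uv_call_once();
        fake_yield();
    } while (cc > CC_ERROR);
    return cc;
}

/* With a lock held: make one attempt and report "try again" instead of looping. */
static int sketch_locked_attempt(void)
{
    int cc = fake_uv_call_once();  /* imagine a lock held around this call */

    if (cc == CC_OK)
        return 0;
    if (cc == CC_BUSY || cc == CC_PARTIAL)
        return -SKETCH_EAGAIN;
    return -SKETCH_EINVAL;
}

/* The caller retries once the lock is dropped, yielding between attempts. */
static int sketch_make_secure(void)
{
    int rc;

    while ((rc = sketch_locked_attempt()) == -SKETCH_EAGAIN)
        fake_yield();
    return rc;
}
```

With busy_left set to 3 before each call, sketch_uv_call finishes without a single yield, sketch_uv_call_sched yields on each of its four passes, and sketch_make_secure yields three times, always outside the imagined lock. The first case is the pattern that can keep a CPU busy long enough to trigger a stall warning under load.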