[UBUNTU 20.04] Include patches to avoid self-detected stall with Secure Execution
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu on IBM z Systems | Fix Released | High | Skipper Bug Screeners |
linux (Ubuntu) | Invalid | Undecided | Unassigned |
Focal | Fix Released | High | Canonical Kernel Team |
Jammy | Fix Released | High | Canonical Kernel Team |
Bug Description
SRU Justification:
==================
[Impact]
* On IBM Z secure execution environments under heavy load
(that is, with over-committed resources and KVM guests),
rcu_sched self-detected stalls can occur,
which lead to LPAR crashes.
[Fix]
* 57c5df13eca4 57c5df13eca4017
* 1e2aa46de526 1e2aa46de526a5a
* f0a1a0615a6f f0a1a0615a6ff6d
[Test Plan]
* An IBM z15 or LinuxONE III LPAR with FC 115 (secure execution)
enabled is required.
* Installation of Ubuntu Server 20.04 LTS (18.04 with hwe-5.4)
or 22.04 LTS on top.
* Install a kernel that includes the above patches/commits.
* Bring the system under high load with KVM guests.
* Monitor dmesg for 'rcu_sched self-detected stalls'
and/or look for crashes.
* Due to hardware requirements this test needs to be conducted by IBM.
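The monitoring step above can be partly automated. The following is a minimal sketch, assuming the dmesg output has been saved into a NUL-terminated buffer; the function name count_rcu_stalls is hypothetical, and the marker string is taken from the stack trace in this report.

```c
#include <stddef.h>
#include <string.h>

/* Marker string as it appears in the stack trace of this report. */
static const char STALL_MARKER[] = "rcu_sched self-detected stall";

/*
 * Hypothetical helper: count the occurrences of the stall marker
 * in a NUL-terminated log buffer (for example saved dmesg output).
 */
static size_t count_rcu_stalls(const char *log)
{
	size_t hits = 0;
	const size_t len = strlen(STALL_MARKER);
	const char *p = log;

	/* strstr() returns NULL once no further occurrence exists. */
	while ((p = strstr(p, STALL_MARKER)) != NULL) {
		hits++;
		p += len; /* continue searching after this match */
	}
	return hits;
}
```

A result greater than zero after the load test would indicate that the stall still occurs with the tested kernel.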
[Where problems could occur]
* The definitions from 57c5df13eca4 are missing in both jammy
and focal, but adding them should do no harm.
* The change in 1e2aa46de526 only uses uv_call_sched instead
of plain uv_call, which should lead to a snappier system
under high load, but may consume somewhat more cycles overall.
* With f0a1a0615a6f the uv_call_sched cannot simply replace
uv_call, due to locks being held.
* Instead __uv_call is replacing uv_call, which does not loop.
* If these changes to the (uv) calls are erroneous,
they may lead to wrong states,
or even to broken ultravisor calls
and with that to broken secure execution (SE).
* As a side effect, the uv might no longer loop over all pages,
in the worst case leaving some unprotected.
* All this is s390x-only functionality
that is only available on IBM z15 / LinuxONE III systems and newer,
and only if the optional feature 'FC 115' is in place,
which is limited to 'secure-execution' workloads.
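The difference between the three call variants named above can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the kernel code: the fake_uv structure and the sim_* functions are hypothetical, the condition code values are assumed, and sched_yield() merely stands in for the kernel giving up the CPU between retries.

```c
#include <sched.h> /* sched_yield(): user-space stand-in for rescheduling */

/* Assumed condition codes: values above 1 mean "busy or partial, retry". */
enum { CC_OK = 0, CC_ERROR = 1, CC_BUSY = 2, CC_PARTIAL = 3 };

/* Hypothetical fake ultravisor that reports busy a fixed number of times. */
struct fake_uv {
	int busy_left; /* calls that will still report busy */
	int calls;     /* single-shot calls issued so far */
	int yields;    /* times the caller gave up the CPU */
};

/* Models __uv_call: one attempt, no loop; the caller handles busy. */
static int sim_uv_call_once(struct fake_uv *uv)
{
	uv->calls++;
	if (uv->busy_left > 0) {
		uv->busy_left--;
		return CC_BUSY;
	}
	return CC_OK;
}

/* Models uv_call: retries in a tight loop and never yields the CPU. */
static int sim_uv_call(struct fake_uv *uv)
{
	int cc;

	do {
		cc = sim_uv_call_once(uv);
	} while (cc > CC_ERROR);
	return cc;
}

/* Models uv_call_sched: same loop, but yields after every attempt. */
static int sim_uv_call_sched(struct fake_uv *uv)
{
	int cc;

	do {
		cc = sim_uv_call_once(uv);
		sched_yield();
		uv->yields++;
	} while (cc > CC_ERROR);
	return cc;
}
```

In this model the looping variant that never yields is the one that can monopolize a CPU while the ultravisor stays busy. The single-shot variant returns the busy code to its caller, which fits code paths where locks are held and yielding inside the call is not an option.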
[Other Info]
* Patches were accepted upstream with kernel 5.16.
* Commit 1e2aa46de526 is already included in jammy
but 57c5df13eca4 and f0a1a0615a6f are missing.
* Focal requires all 3 commits 57c5df13eca4, 1e2aa46de526 and f0a1a0615a6f.
* Since impish is very close to its EOL, it's not covered by this SRU.
__________
---Problem Description---
rcu_sched self-detected stall with Secure Execution
When the system is busy and additional Secure Execution guests are started, the LPAR crashes.
Christian Borntraeger looked at the stack trace and identified two commits which should fix the issue:
1e2aa46de526a5a
and
f0a1a0615a6ff6d
Please include these two fixes into 20.04, and 18.04 HWE.
Here is the stack trace:
[592792.725078] rcu: INFO: rcu_sched self-detected stall on CPU
[592792.725089] rcu: 4-....: (2099 ticks this GP) idle=7d2/
[592792.725133] (t=2100 jiffies g=26268505 q=410280)
[592792.725135] Task dump for CPU 4:
[592792.725137] qemu-system-s39 R running task 0 2557923 1644255 0x06000004
[592792.725139] Call Trace:
[592792.725146] ([<000000566e2d
[592792.725150] [<000000566dab6
[592792.725151] [<000000566e2df
[592792.725154] [<000000566db05
[592792.725156] [<000000566db13
[592792.725160] [<000000566db24
[592792.725161] [<000000566db25
[592792.725163] [<000000566db14
[592792.725165] [<000000566db14
[592792.725167] [<000000566da14
[592792.725170] [<000000566e2ee
[592792.725174] [<000000566da2b
[592792.725175] ([<000000566da2
[592792.725180] [<000000566da6e
[592792.725183] [<000000566da53
[592792.725184] [<000000566da55
[592792.725187] [<000000566da44
[592792.725191] [<000000566dceb
[592792.725193] [<000000566dceb
[592792.725194] [<000000566dceb
[592792.725195] [<000000566e2ee
Contact Information = <email address hidden>, <email address hidden>
---uname output---
5.4.0-90-generic #101-Ubuntu
Machine Type = 8562 A00-GT2
---System Hang---
LPAR crashed and needed to be re-booted
---Debugger---
A debugger is not configured
---Steps to Reproduce---
Cause high load, then start a Secure Execution enabled KVM guest.
tags: added: architecture-s39064 bugnameltc-198658 severity-high targetmilestone-inin2004
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
importance: Undecided → High
Changed in linux (Ubuntu):
importance: Undecided → High
description: updated
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
Even if the two commits applied cleanly (well for jammy only one of them is needed, see bug description for more details), I get compile errors, like:
On 22.04:
"
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c: In function ‘make_secure_pte’:
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:198:19: error: ‘UVC_CC_OK’ undeclared (first use in this function)
198 | if (cc == UVC_CC_OK)
| ^~~~~~~~~
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:198:19: note: each undeclared identifier is reported only once for each function it appears in
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:200:24: error: ‘UVC_CC_BUSY’ undeclared (first use in this function); did you mean ‘SIGP_CC_BUSY’?
200 | else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
| ^~~~~~~~~~~
| SIGP_CC_BUSY
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:200:45: error: ‘UVC_CC_PARTIAL’ undeclared (first use in this function)
200 | else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
| ^~~~~~~~~~~~~~
make[4]: *** [/<<PKGBUILDDIR>>/scripts/Makefile.build:285: arch/s390/kernel/uv.o] Error 1
make[3]: *** [/<<PKGBUILDDIR>>/scripts/Makefile.build:548: arch/s390/kernel] Error 2
make[2]: *** [/<<PKGBUILDDIR>>/Makefile:1875: arch/s390] Error 2
make[2]: *** Waiting for unfinished jobs....
"
Similar on 20.04:
"
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c: In function ‘make_secure_pte’:
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:195:12: error: ‘UVC_CC_OK’ undeclared (first use in this function)
195 | if (cc == UVC_CC_OK)
| ^~~~~~~~~
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:195:12: note: each undeclared identifier is reported only once for each function it appears in
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:197:17: error: ‘UVC_CC_BUSY’ undeclared (first use in this function); did you mean ‘SIGP_CC_BUSY’?
197 | else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
| ^~~~~~~~~~~
| SIGP_CC_BUSY
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:197:38: error: ‘UVC_CC_PARTIAL’ undeclared (first use in this function)
197 | else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
| ^~~~~~~~~~~~~~
make[4]: *** [/<<PKGBUILDDIR>>/scripts/Makefile.build:270: arch/s390/kernel/uv.o] Error 1
make[3]: *** [/<<PKGBUILDDIR>>/scripts/Makefile.build:519: arch/s390/kernel] Error 2
make[2]: *** [/<<PKGBUILDDIR>>/Makefile:1762: arch/s390] Error 2
make[2]: *** Waiting for unfinished jobs....
"
I assume that a prerequisite commit (or more than one) is missing?
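The three undeclared identifiers are condition-code macros; the bug description notes that the definitions from 57c5df13eca4 are missing in both series. A minimal user-space sketch of what such definitions and the failing check might look like follows. The numeric values, the UVC_CC_ERROR name, the helper name cc_to_errno and the chosen return codes are all assumptions for illustration and are not taken from this report.

```c
#include <errno.h>

/* Assumed values; only the three names used below appear in the errors. */
#define UVC_CC_OK      0
#define UVC_CC_ERROR   1 /* assumed name and value */
#define UVC_CC_BUSY    2
#define UVC_CC_PARTIAL 3

/*
 * Hypothetical helper mirroring the two conditions quoted in the
 * compiler output: success, retry later, or failure.
 */
static int cc_to_errno(int cc)
{
	if (cc == UVC_CC_OK)
		return 0;
	else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
		return -EAGAIN; /* assumed: ask the caller to retry */
	return -EINVAL;         /* assumed: any other code is an error */
}
```

With definitions of this kind in scope, the conditions on the reported lines would compile, which is consistent with a missing prerequisite commit rather than a bad backport.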