Calling rmdir() on a resctrl monitor group results in segmentation fault and hangs the system
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux-signed-hwe (Ubuntu) |
New
|
Undecided
|
Unassigned |
Bug Description
On Intel Xeon processors newer than the E5 v4 family, calling rmdir() on a resctrl monitor-only group causes a segmentation fault in kernel. After the segfault many operation will hang including the bug report command `ubuntu-bug linux`. Even the `reboot` command hangs and a hardware reset is required to restore the normal state.
Reproduction steps:
1. Confirm that we're on the latest hwe kernel for 16.04 (4.15.0-96-generic for now)
```
$ uname -a
Linux <hostname> 4.15.0-96-generic #97~16.04.1-Ubuntu SMP Wed Apr 1 03:03:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
```
2. Confirm that we have a Intel RDT Memory Bandwidth Monitoring capable CPU (mine is E5-2690 v4)
```
$ lscpu
...
Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
...
```
3. Execute the following command as root to create a resctrl monitor group
```
# mount -t resctrl resctrl /sys/fs/resctrl
# mkdir /sys/fs/
# ls /sys/fs/
cpus cpus_list mon_data tasks
```
We can see that the monitor group is created normally.
4. Remove the newly-created monitor group, and segfault happens
```
# rmdir /sys/fs/
Segmentation fault
```
Guesses:
I believe that there is a bug in Bionic kernel's upstream stable patchset 2020-02-26 (https:/
The commit above fixes a race condition when removing a resctrl control group. Commit message says `Fix it by moving free_all_
Since I'm using the latest HWE kernel for 16.04 which backports Bionic's kernel patches, I encountered this issue in 16.04.
Fixes and test results:
I moved `free_all_
I created a patch based on Bionic kernel's master branch. I have no knowledge about x86 architecture so I'm not sure that whether it is the correct way to fix the issue. Hopefully someone can have it reviewed and I will try to sumbit a kernel patch (I have no experience about this before... sorry about that). Thanks!
ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-
ProcVersionSign
Uname: Linux 4.15.0-96-generic x86_64
NonfreeKernelMo
ApportVersion: 2.20.1-0ubuntu2.23
Architecture: amd64
Date: Thu Apr 16 11:01:58 2020
InstallationDate: Installed on 2018-10-30 (533 days ago)
InstallationMedia: Ubuntu 16.04.4 LTS "Xenial Xerus" - Release amd64 (20180228)
SourcePackage: linux-signed-hwe
UpgradeStatus: No upgrade log present (probably fresh install)
dmesg containing the call trace is attached below.