Calling rmdir() on a resctrl monitor group results in segmentation fault and hangs the system

Bug #1873126 reported by Mutong Xie
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux-signed-hwe (Ubuntu) | New | Undecided | Unassigned | |

Bug Description

On Intel Xeon processors from the E5 v4 family onward, calling rmdir() on a resctrl monitor-only group causes a segmentation fault in the kernel. After the segfault, many operations hang, including the bug report command `ubuntu-bug linux`. Even the `reboot` command hangs, and a hardware reset is required to restore normal operation.

Reproduction steps:

1. Confirm that we're on the latest HWE kernel for 16.04 (4.15.0-96-generic at the time of writing)

```
$ uname -a
Linux <hostname> 4.15.0-96-generic #97~16.04.1-Ubuntu SMP Wed Apr 1 03:03:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
```

2. Confirm that we have an Intel RDT Memory Bandwidth Monitoring capable CPU (mine is an E5-2690 v4)

```
$ lscpu
...
Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
...
```
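The model name alone does not show whether the kernel detected Memory Bandwidth Monitoring. As a rough sketch, the CPU flags can be checked instead. The function name `mbm_flags` and its optional file argument are made up for illustration; `cqm_mbm_total` and `cqm_mbm_local` are the flag names the kernel prints in `/proc/cpuinfo` for MBM.

```shell
#!/bin/sh
# Print the Memory Bandwidth Monitoring flags found in a cpuinfo-style file.
# $1 is optional and defaults to /proc/cpuinfo; it exists only so the
# function can be pointed at a saved copy of the file.
mbm_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Take the first "flags" line, put one flag per line,
    # and keep only the two MBM flags.
    grep -m1 '^flags' "$cpuinfo" \
        | tr ' \t' '\n\n' \
        | grep -E '^(cqm_mbm_total|cqm_mbm_local)$' \
        | sort
}
```

Calling `mbm_flags` with no argument reads the running system. Empty output suggests that the kernel does not report MBM support on this CPU.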

3. Execute the following commands as root to create a resctrl monitor group

```
# mount -t resctrl resctrl /sys/fs/resctrl
# mkdir /sys/fs/resctrl/mon_groups/test
# ls /sys/fs/resctrl/mon_groups/test
cpus cpus_list mon_data tasks
```

We can see that the monitor group is created normally.

4. Remove the newly created monitor group, and a segfault occurs

```
# rmdir /sys/fs/resctrl/mon_groups/test
Segmentation fault
```
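For repeated testing, steps 3 and 4 can be wrapped in a small function. This is only a sketch: the function name `repro_mon_group_rmdir` and its arguments are hypothetical, and the root directory is a parameter so that the function can first be rehearsed against a scratch directory. Against a mounted `/sys/fs/resctrl` on an affected kernel it is expected to reproduce the hang described above, so it should only be run on a machine that can be hard reset.

```shell
#!/bin/sh
# Create and then remove a monitor group under a resctrl-style root.
# $1: root directory (the report uses /sys/fs/resctrl, already mounted)
# $2: group name (optional, defaults to "test" as in the steps above)
repro_mon_group_rmdir() {
    if [ -z "$1" ]; then
        echo "usage: repro_mon_group_rmdir ROOT [NAME]" >&2
        return 2
    fi
    group="$1/mon_groups/${2:-test}"

    mkdir "$group" || return 1
    # Flush pending writes first: per the report, the machine may need
    # a hardware reset after the rmdir on an affected kernel.
    sync
    rmdir "$group" || return 1
    echo "removed $group without error"
}
```

On a fixed kernel, `repro_mon_group_rmdir /sys/fs/resctrl` run as root should print the final message instead of `Segmentation fault`.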

Guesses:

I believe there is a bug in the Bionic kernel's upstream stable patchset 2020-02-26 (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1864904). The corresponding commit is `c9c5f0ce9900a99433bb44e88ccc89665be15a07` ("x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup"), which puts the call to the free function `free_all_child_rdtgrp()` in the wrong place.

The commit above fixes a race condition when removing a resctrl control group. The commit message says `Fix it by moving free_all_child_rdtgrp() to after kernfs_remove() in rdtgroup_rmdir_ctrl() to ensure it has the accurate refcount of rdtgrp`, but the commit actually moves `free_all_child_rdtgrp()` into a different function, `rdtgroup_rmdir_mon()`. The "backporting notes" section in the commit message is also confusing. It points out that the function modified in upstream commit `fa7d949337cc` ("x86/resctrl: Rename and move rdt files to a separate directory") relates to control groups, yet for the stable trees it names a function that relates to monitor groups.

Since I'm using the latest HWE kernel for 16.04 which backports Bionic's kernel patches, I encountered this issue in 16.04.

Fixes and test results:

I moved `free_all_child_rdtgrp()` back to the original function `rdtgroup_rmdir_ctrl()`, right after `kernfs_remove()` according to the original commit message, compiled it and booted into the modified kernel. It turns out that the segfault no longer happens.

I created a patch based on the Bionic kernel's master branch. I have no knowledge of the x86 architecture, so I'm not sure whether this is the correct way to fix the issue. Hopefully someone can review it, and I will try to submit a kernel patch (I have no prior experience with this... sorry about that). Thanks!

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.15.0-96-generic 4.15.0-96.97~16.04.1
ProcVersionSignature: Ubuntu 4.15.0-96.97~16.04.1-generic 4.15.18
Uname: Linux 4.15.0-96-generic x86_64
NonfreeKernelModules: nvidia_uvm nvidia_drm nvidia_modeset nvidia
ApportVersion: 2.20.1-0ubuntu2.23
Architecture: amd64
Date: Thu Apr 16 11:01:58 2020
InstallationDate: Installed on 2018-10-30 (533 days ago)
InstallationMedia: Ubuntu 16.04.4 LTS "Xenial Xerus" - Release amd64 (20180228)
SourcePackage: linux-signed-hwe
UpgradeStatus: No upgrade log present (probably fresh install)

Mutong Xie (mutxts) wrote :
tags: added: patch
Mutong Xie (mutxts)
tags: added: resctrl
Mutong Xie (mutxts) wrote :

dmesg containing the call trace is attached below.
