Azure: Jammy fio test hangs, swiotlb buffers exhausted
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux-azure (Ubuntu) | New | Undecided | Unassigned | |
| Jammy | Fix Released | Critical | Tim Gardner | |
| Kinetic | Fix Released | High | Tim Gardner | |
Bug Description
SRU Justification
[Impact]
Hello Canonical Team,
This issue was found while validating CPC's Jammy CVM image. We are up against a tight timeline to deliver this to a partner on 10/5, so we would appreciate prioritizing it.
While running fio, the command fails to exit after 2 minutes. Watching `top` as the command hung, I saw kworkers getting blocked.
`sudo fio --ioengine=libaio --bs=4K --filename=`
Example system logs:
-------
[ 1096.297641] INFO: task kworker/u192:0:8 blocked for more than 120 seconds.
[ 1096.302785] Tainted: G W 5.15.0-1024-azure #30-Ubuntu
[ 1096.306312] "echo 0 > /proc/sys/
[ 1096.310489] INFO: task jbd2/sda1-8:1113 blocked for more than 120 seconds.
[ 1096.313900] Tainted: G W 5.15.0-1024-azure #30-Ubuntu
[ 1096.317481] "echo 0 > /proc/sys/
[ 1096.324117] INFO: task systemd-
[ 1096.331219] Tainted: G W 5.15.0-1024-azure #30-Ubuntu
[ 1096.335332] "echo 0 > /proc/sys/
-------
-------
[ 3241.013230] systemd-
[ 3261.492691] systemd-
-------
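For triage, hung-task warnings like the ones above can be filtered out of the kernel log with a simple grep. This is a generic sketch, not part of the original report; on a live system you would pipe `dmesg` into it, while the snippet below feeds itself two of the sample lines so it is self-contained:

```shell
# On a live system:  dmesg | grep -E 'blocked for more than [0-9]+ seconds'
# Self-contained illustration using sample lines from this report:
hung=$(printf '%s\n' \
  '[ 1096.297641] INFO: task kworker/u192:0:8 blocked for more than 120 seconds.' \
  '[ 1096.302785] Tainted: G W 5.15.0-1024-azure #30-Ubuntu' \
  | grep -E 'blocked for more than [0-9]+ seconds')
echo "$hung"
```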
`top` report:
-------
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
417 root 20 0 0 0 0 R 66.2 0.0 0:34.61 ksoftirqd/59
435 root 20 0 0 0 0 I 24.5 0.0 0:09.03 kworker/
416 root rt 0 0 0 0 S 23.5 0.0 0:01.86 migration/59
366 root 0 -20 0 0 0 I 19.2 0.0 0:16.64 kworker/
378 root 0 -20 0 0 0 I 17.9 0.0 0:15.71 kworker/
455 root 0 -20 0 0 0 I 17.9 0.0 0:14.76 kworker/
135 root 0 -20 0 0 0 I 17.5 0.0 0:13.08 kworker/
420 root 0 -20 0 0 0 I 16.9 0.0 0:14.63 kworker/
...
-------
LISAv3 Testcase: perf_premium_
Image : "canonical-test 0001-com-
VMSize : "Standard_
Regarding reproducibility: I see this every time I run the storage perf tests, and it always seems to happen on iteration 9 or 10. When running the command manually, I had to run it three or four times to reproduce the issue.
[Test Case]
Tested by Microsoft; reproducing requires many cores (96) and disks (16).
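The fio command in the original report is truncated after `--filename=`, so a full repro line cannot be recovered from it. The sketch below is a hypothetical reconstruction: every flag other than `--ioengine=libaio` and `--bs=4K` is an assumed placeholder, and `FIO_TARGET` is an invented variable standing in for one of the 16 data disks. It prints the command rather than running it, so it is safe to inspect first:

```shell
# Hypothetical repro sketch; only --ioengine and --bs come from the report.
FIO_TARGET="${FIO_TARGET:-/dev/sdc}"   # assumption: one of the 16 data disks
CMD="sudo fio --ioengine=libaio --bs=4K --filename=${FIO_TARGET} \
--direct=1 --rw=randwrite --numjobs=96 --iodepth=64 \
--runtime=120 --time_based --group_reporting --name=swiotlb-repro"
# Print rather than execute, so nothing is written to the target disk here.
echo "$CMD"
```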
[Where problems could occur]
swiotlb buffers could be double freed.
[Other Info]
SF: #00349781
Changed in linux-azure (Ubuntu Jammy):
status: In Progress → Fix Committed
Changed in linux-azure (Ubuntu Kinetic):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → High
status: New → Fix Committed
tags: added: verification-done-kinetic; removed: verification-needed-kinetic
The patch at https://<email address hidden>/ is considered the root-cause fix.