[focal] disk I/O performance regression

Bug #1880943 reported by Frode Nordahl
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Freshly deployed identical machines with a flat disk layout as configured by MAAS; the CPU frequency governor changed from ondemand to performance.

Bionic (4.15.0-101-generic) fio ext4 on spinning rust:
Run status group 0 (all jobs):
  WRITE: bw=1441MiB/s (1511MB/s), 1441MiB/s-1441MiB/s (1511MB/s-1511MB/s), io=1267GiB (1360GB), run=900001-900001msec

Disk stats (read/write):
  sda: ios=2/332092, merge=0/567, ticks=196/123698912, in_queue=123760992, util=95.91%

Bionic (4.15.0-101-generic) fio ext4 on nvme:
Run status group 0 (all jobs):
  WRITE: bw=2040MiB/s (2139MB/s), 2040MiB/s-2040MiB/s (2139MB/s-2139MB/s), io=1793GiB (1925GB), run=900001-900001msec

Disk stats (read/write):
  nvme0n1: ios=0/2617321, merge=0/465, ticks=0/233900784, in_queue=232549460, util=78.97%

Focal (5.4.0-31-generic) fio ext4 on spinning rust:
Run status group 0 (all jobs):
  WRITE: bw=108MiB/s (113MB/s), 108MiB/s-108MiB/s (113MB/s-113MB/s), io=100GiB (107GB), run=947255-947255msec

Disk stats (read/write):
  sda: ios=65/430942, merge=0/980, ticks=1655/5837146, in_queue=4898628, util=48.75%

Focal (5.4.0-31-generic) fio ext4 on nvme:
Run status group 0 (all jobs):
  WRITE: bw=361MiB/s (378MB/s), 361MiB/s-361MiB/s (378MB/s-378MB/s), io=320GiB (344GB), run=907842-907842msec

Disk stats (read/write):
  nvme0n1: ios=0/2847497, merge=0/382, ticks=0/236641266, in_queue=230690420, util=78.95%

Freshly deployed identical machines with bcache as configured by MAAS; the CPU frequency governor changed from ondemand to performance.

Bionic (4.15.0-101-generic):
Run status group 0 (all jobs):
  WRITE: bw=2080MiB/s (2181MB/s), 2080MiB/s-2080MiB/s (2181MB/s-2181MB/s), io=1828GiB (1963GB), run=900052-900052msec

Disk stats (read/write):
    bcache3: ios=0/53036, merge=0/0, ticks=0/15519188, in_queue=15522076, util=91.81%, aggrios=0/212383, aggrmerge=0/402, aggrticks=0/59247094, aggrin_queue=59256646, aggrutil=91.82%
  nvme0n1: ios=0/7169, merge=0/397, ticks=0/0, in_queue=0, util=0.00%
  sda: ios=0/417598, merge=0/407, ticks=0/118494188, in_queue=118513292, util=91.82%

Bionic (5.3.0-53-generic):
Run status group 0 (all jobs):
  WRITE: bw=2725MiB/s (2858MB/s), 2725MiB/s-2725MiB/s (2858MB/s-2858MB/s), io=2395GiB (2572GB), run=900001-900001msec

Disk stats (read/write):
    bcache3: ios=96/3955, merge=0/0, ticks=4/895876, in_queue=895880, util=2.63%, aggrios=48/222087, aggrmerge=0/391, aggrticks=3/2730760, aggrin_queue=2272248, aggrutil=90.56%
  nvme0n1: ios=96/2755, merge=0/373, ticks=6/78, in_queue=8, util=1.12%
  sda: ios=0/441420, merge=0/409, ticks=0/5461443, in_queue=4544488, util=90.56%

Focal (5.4.0-31-generic):
Run status group 0 (all jobs):
  WRITE: bw=117MiB/s (123MB/s), 117MiB/s-117MiB/s (123MB/s-123MB/s), io=110GiB (118GB), run=959924-959924msec

Disk stats (read/write):
    bcache3: ios=0/4061, merge=0/0, ticks=0/1571168, in_queue=1571168, util=1.40%, aggrios=0/226807, aggrmerge=0/183, aggrticks=0/2816798, aggrin_queue=2331594, aggrutil=52.79%
  nvme0n1: ios=0/1474, merge=0/46, ticks=0/50, in_queue=0, util=0.53%
  sda: ios=0/452140, merge=0/321, ticks=0/5633547, in_queue=4663188, util=52.79%

; fio-seq-write.job for fiotest

[global]
name=fio-seq-write
filename=fio-seq-write
rw=write
bs=256K
direct=0
numjobs=1
time_based=1
runtime=900

[file1]
size=10G
ioengine=libaio
iodepth=16
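
For reference, a job file like the above would be invoked along these lines; the exact invocation is not quoted in this part of the report (a later comment runs the packaged example at /usr/share/doc/fio/examples/fio-seq-write.fio), and the working directory is a placeholder:

# Run the sequential buffered-write job from a directory on the filesystem under test
$ cd /srv/fiotest        # placeholder path on the filesystem being measured
$ fio fio-seq-write.job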

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-31-generic 5.4.0-31.35
ProcVersionSignature: Ubuntu 5.4.0-31.35-generic 5.4.34
Uname: Linux 5.4.0-31-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 May 27 11:52 seq
 crw-rw---- 1 root audio 116, 33 May 27 11:52 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu27
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: skip
Date: Wed May 27 12:44:10 2020
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 002 Device 002: ID 8087:8002 Intel Corp.
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 413c:a001 Dell Computer Corp. Hub
 Bus 001 Device 002: ID 8087:800a Intel Corp.
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Lsusb-t:
 /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
     |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/8p, 480M
 /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
     |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/6p, 480M
         |__ Port 6: Dev 3, If 0, Class=Hub, Driver=hub/6p, 480M
MachineType: Dell Inc. PowerEdge R630
PciMultimedia:

ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-31-generic root=UUID=8dd6086f-e616-466d-a424-e5556fd75045 ro intel_iommu=on iommu=pt probe_vf=0
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-31-generic N/A
 linux-backports-modules-5.4.0-31-generic N/A
 linux-firmware 1.187
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 11/08/2016
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.3.4
dmi.board.name: 02C2CP
dmi.board.vendor: Dell Inc.
dmi.board.version: A03
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.3.4:bd11/08/2016:svnDellInc.:pnPowerEdgeR630:pvr:rvnDellInc.:rn02C2CP:rvrA03:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R630
dmi.product.sku: SKU=NotProvided;ModelName=PowerEdge R630
dmi.sys.vendor: Dell Inc.

Revision history for this message
Frode Nordahl (fnordahl) wrote :
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: bionic
Ryan Beisner (1chb1n)
tags: added: uosci
Revision history for this message
Frode Nordahl (fnordahl) wrote :

Note that, to rule out any hardware configuration or malfunction issue between the two hosts, I have re-run the tests on the exact same machine used for the non-performant Focal tests and got good performance with Bionic.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Tried 4.15.0-102, 5.4.0-33 and the mainline kernel; all three show similar performance (~2100MB/s).

Please test latest mainline kernel:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.7-rc7/

Then we can decide on the next step based on the performance of the mainline kernel.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Note that I didn't add any kernel parameters, so my first guess would be that "iommu=pt" doesn't get passthrough right.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Frode Nordahl (fnordahl) wrote :

Thank you for the suggestion to test mainline.

Regarding the kernel parameters: the test is run on the host and not in a virtual machine. For completeness I have removed the iommu and VF related kernel parameters, and they do not affect the outcome of the test.

Unfortunately using the mainline kernel does not appear to help on the machines in question:

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.7.0-050700rc7-generic root=UUID=99527048-4021-4a56-a7d3-21c6cb761879 ro

# uname -a
Linux node-mees 5.7.0-050700rc7-generic #202005242331 SMP Sun May 24 23:33:19 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

# cpufreq-info -o
...
CPU 47 1200000 kHz ( 41 %) - 2900000 kHz (100 %) - performance

# fio /usr/share/doc/fio/examples/fio-seq-write.fio
...
Run status group 0 (all jobs):
  WRITE: bw=63.1MiB/s (66.2MB/s), 63.1MiB/s-63.1MiB/s (66.2MB/s-66.2MB/s), io=60.0GiB (64.4GB), run=973234-973234msec

Changed in linux (Ubuntu):
status: Incomplete → New
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Would it be possible for you to do a kernel bisection?

First, find the last -rc kernel that works and the first -rc kernel that doesn't work from http://kernel.ubuntu.com/~kernel-ppa/mainline/

Then,
$ sudo apt build-dep linux
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ git bisect start
$ git bisect good $(the working version you found)
$ git bisect bad $(the non-working version found)
$ make localmodconfig
$ make -j`nproc` deb-pkg
Install the newly built kernel, then reboot with it.
If it still has the same issue,
$ git bisect bad
Otherwise,
$ git bisect good
Repeat to "make -j`nproc` deb-pkg" until you find the offending commit.

Revision history for this message
Frode Nordahl (fnordahl) wrote :

That is an excellent idea.

Before embarking on that endeavor I did one last control test, which involved deploying Bionic and then installing the Focal 5.4 kernel packages, and lo and behold the system is still performant.

Redeploying Focal (with the Focal kernel obviously) makes it non-performant again.

While still baffled by the root of the issue, I will mark this bug Invalid for the kernel packages for now, as there is obviously some mischief elsewhere involved in the end result. Depending on what I find, the bug may or may not be reopened.

Thank you for your support so far.

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
Frode Nordahl (fnordahl) wrote :

For any future travelers: this issue was caused by the much-debated ext4lazyinit "feature".

# iotop

Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 7.38 M/s
    TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
    947 be/4 root 0.00 B/s 0.00 B/s 0.00 % 2.96 % [ext4lazyinit]

While on paper it accounts for only a small percentage of the I/O, it incurs a noticeable performance hit on the system until it is done.
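
If someone wants to rule ext4lazyinit out entirely, the background initialization can be avoided at filesystem creation or mount time. This is only a sketch; the device name is a placeholder, and the trade-off is a slower mkfs/mount in exchange for no background writer:

# Initialize the inode tables and journal up front instead of lazily
# (slower mkfs, but no ext4lazyinit thread afterwards; /dev/sdX1 is a placeholder)
$ sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/sdX1

# Alternatively, suppress the background initialization at mount time
$ sudo mount -o noinit_itable /dev/sdX1 /mnt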

Revision history for this message
Frode Nordahl (fnordahl) wrote :

We have been using ext4lazyinit for quite some time, so I guess this must be a combination of multiple factors.

I see that the 5.4 kernel brings a change of the default I/O scheduler to mq-deadline.

Could the combination of ext4lazyinit+mq-deadline+rotational drives be a problem?
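
For checking which scheduler is active and trying the alternatives, the sysfs interface can be used; a sketch, using sda as in the tests above:

# Show the available schedulers; the active one is shown in brackets
$ cat /sys/block/sda/queue/scheduler

# Switch the scheduler at runtime, e.g. to none or bfq for comparison
$ echo none | sudo tee /sys/block/sda/queue/scheduler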

Revision history for this message
Frode Nordahl (fnordahl) wrote :

Playing around with the now-available I/O schedulers does not appear to help much, and looking at iotop while the test runs shows ext4lazyinit consuming much of the I/O most of the time.

I wonder if this is a more ominous change of behavior after all.

Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 22.22 M/s
    TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
    948 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [ext4lazyinit]
   5651 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % fio /usr~write.fio
     22 be/4 root 0.00 B/s 0.00 B/s 0.00 % 94.61 % [kworker~-252:384]

Reopening to get some input from the kernel team on the change of I/O schedulers, whether this is an anticipated side effect, and/or whether there is any known regression surrounding the ext4lazyinit process.

Changed in linux (Ubuntu):
status: Invalid → New
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

If ext4lazyinit isn't in use, do we still see the performance regression?
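
A quick way to confirm that the background initialization has finished before re-running the benchmark is to look for the kernel thread; this is an assumption about how to spot it, based on the iotop output above:

# The ext4lazyinit kernel thread disappears from the process list once initialization is complete
$ ps -e -o comm= | grep ext4lazyinit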
