Disk IO very slow on kernel 4.15.0-142-generic

Bug #1928744 reported by Mariusz
This bug affects 7 people
Affects: linux (Ubuntu) | Status: Confirmed | Importance: Undecided | Assigned to: Unassigned
Affects: linux (Ubuntu Bionic) | Status: Confirmed | Importance: Undecided | Assigned to: Unassigned

Bug Description

After upgrading the Ubuntu 18.04 kernel from 4.15.0-136-generic to 4.15.0-142-generic, disk I/O (writes) suffers from terrible delays.

Time required to write 1GB of data to the disk is ~12 minutes.

1GB in 12 minutes:
strace -ttvvff -o mel.strace dd bs=1024k count=1024 if=/dev/zero of=/data/disk_test/dd_test
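For reference, variants of the same test that separate a page-cache/writeback stall from raw device slowness could look like this (same test path; conv=fdatasync makes dd wait for the data to reach the disk before reporting, oflag=direct bypasses the page cache entirely):

dd if=/dev/zero of=/data/disk_test/dd_test bs=1M count=1024 conv=fdatasync
dd if=/dev/zero of=/data/disk_test/dd_test bs=1M count=1024 oflag=direct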

Before the upgrade, it took seconds to complete.
The only change made to the system was a security upgrade performed with the unattended-upgrades utility, using the main repositories for bionic and bionic-security.

This is a physical Lenovo System x3650 M5 server with a Symbios Logic MegaRAID SAS-3 3108 disk controller.

The only difference I can see is that the latest kernel loads more modules.

I would expect the performance to be the same. There were no changes to the physical server, and the disks are reported as online and healthy. Not only dd is affected; all disk-based operations by barman are too (we are using PostgreSQL streaming backup along with rsync/ssh archiving).

I think this could be a kernel bug. I would be happy to provide any other information that would help in fixing this problem.

=====================================================
Details:

lsb_release -rd
Description: Ubuntu 18.04.1 LTS
Release: 18.04

lsmod
Module Size Used by
ip6_tables 28672 0
iptable_filter 16384 0
ip_tables 28672 1 iptable_filter
x_tables 40960 3 iptable_filter,ip6_tables,ip_tables
tcp_diag 16384 0
udp_diag 16384 0
raw_diag 16384 0
inet_diag 24576 3 tcp_diag,raw_diag,udp_diag
unix_diag 16384 0
binfmt_misc 20480 1
ipmi_ssif 32768 0
kvm_intel 217088 0
kvm 614400 1 kvm_intel
irqbypass 16384 1 kvm
crct10dif_pclmul 16384 0
crc32_pclmul 16384 0
ghash_clmulni_intel 16384 0
pcbc 16384 0
aesni_intel 188416 0
aes_x86_64 20480 1 aesni_intel
crypto_simd 16384 1 aesni_intel
glue_helper 16384 1 aesni_intel
cryptd 24576 3 crypto_simd,ghash_clmulni_intel,aesni_intel
ipmi_si 57344 0
ipmi_devintf 20480 0
ipmi_msghandler 53248 3 ipmi_devintf,ipmi_si,ipmi_ssif
acpi_pad 180224 0
bonding 163840 0
lp 20480 0
parport 49152 1 lp
autofs4 40960 2
mgag200 45056 1
i2c_algo_bit 16384 1 mgag200
ttm 106496 1 mgag200
drm_kms_helper 172032 1 mgag200
syscopyarea 16384 1 drm_kms_helper
tg3 167936 0
sysfillrect 16384 1 drm_kms_helper
sysimgblt 16384 1 drm_kms_helper
fb_sys_fops 16384 1 drm_kms_helper
ptp 20480 1 tg3
drm 401408 4 drm_kms_helper,mgag200,ttm
pps_core 20480 1 ptp
megaraid_sas 143360 12

lspci -vnn -s 0b:00.0
0b:00.0 RAID bus controller [0104]: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] [1000:005d] (rev 02)
 Subsystem: IBM MegaRAID SAS-3 3108 [Invader] [1014:0454]
 Physical Slot: 9
 Flags: bus master, fast devsel, latency 0, IRQ 31, NUMA node 0
 I/O ports at 2e00 [size=256]
 Memory at 903f0000 (64-bit, non-prefetchable) [size=64K]
 Memory at 90400000 (64-bit, non-prefetchable) [size=1M]
 Expansion ROM at <ignored> [disabled]
 Capabilities: [50] Power Management version 3
 Capabilities: [68] Express Endpoint, MSI 00
 Capabilities: [d0] Vital Product Data
 Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
 Capabilities: [c0] MSI-X: Enable+ Count=97 Masked-
 Capabilities: [100] Advanced Error Reporting
 Capabilities: [1e0] #19
 Capabilities: [1c0] Power Budgeting <?>
 Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
 Kernel driver in use: megaraid_sas
 Kernel modules: megaraid_sas

Strace example for 10 MB (note the ~6.7 s gap between the first write() at 14:59:19.29 and the following read() at 14:59:26.02, which accounts for almost all of the elapsed time):
strace -ttvvff -o mel.strace dd bs=1024k count=10 if=/dev/zero of=/data/disk_test/dd_test
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 6.73972 s, 1.6 MB/s

14:59:19.145037 execve("/bin/dd", ["dd", "bs=1024k", "count=10", "if=/dev/zero", "of=/data/disk_test/dd_test"], ["LS_COLORS=rs=0:di=01;34:ln=01;36"..., "LESSCLOSE=/usr/bin/lesspipe %s %"..., "LANG=en_AU.UTF-8", "SUDO_GID=4118", "USERNAME=root", "SUDO_COMMAND=/bin/bash", "S_COLORS=auto", "USER=root", "PWD=/data/disk_test", "HOME=/root", "SUDO_USER=mariusz.boruc", "SUDO_UID=4118", "MAIL=/var/mail/root", "SHELL=/bin/bash", "TERM=xterm-256color", "SHLVL=1", "LANGUAGE=en_AU:", "LOGNAME=root", "PATH=/usr/local/sbin:/usr/local/"..., "LESSOPEN=| /usr/bin/lesspipe %s", "_=/usr/bin/strace", "OLDPWD=/data"]) = 0
14:59:19.145685 brk(NULL) = 0x563734ec4000
14:59:19.145766 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
14:59:19.145846 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
14:59:19.145909 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
14:59:19.145970 fstat(3, {st_dev=makedev(8, 3), st_ino=845, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=64, st_size=28885, st_atime=1621301366 /* 2021-05-18T10:59:26.840044503+0930 */, st_atime_nsec=840044503, st_mtime=1620610160 /* 2021-05-10T10:59:20.912915629+0930 */, st_mtime_nsec=912915629, st_ctime=1620610160 /* 2021-05-10T10:59:20.912915629+0930 */, st_ctime_nsec=912915629}) = 0
14:59:19.146055 mmap(NULL, 28885, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f49d79db000
14:59:19.146108 close(3) = 0
14:59:19.146159 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
14:59:19.146215 openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
14:59:19.146270 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., 832) = 832
14:59:19.146327 fstat(3, {st_dev=makedev(8, 3), st_ino=1413, st_mode=S_IFREG|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=3968, st_size=2030544, st_atime=1621312135 /* 2021-05-18T13:58:55.539717705+0930 */, st_atime_nsec=539717705, st_mtime=1591291526 /* 2020-06-05T02:55:26+0930 */, st_mtime_nsec=0, st_ctime=1604896088 /* 2020-11-09T14:58:08.844042263+1030 */, st_ctime_nsec=844042263}) = 0
14:59:19.146403 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f49d79d9000
14:59:19.146460 mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f49d73cb000
14:59:19.146510 mprotect(0x7f49d75b2000, 2097152, PROT_NONE) = 0
14:59:19.146567 mmap(0x7f49d77b2000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f49d77b2000
14:59:19.146629 mmap(0x7f49d77b8000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f49d77b8000
14:59:19.146689 close(3) = 0
14:59:19.146758 arch_prctl(ARCH_SET_FS, 0x7f49d79da540) = 0
14:59:19.146895 mprotect(0x7f49d77b2000, 16384, PROT_READ) = 0
14:59:19.146990 mprotect(0x563733ac0000, 4096, PROT_READ) = 0
14:59:19.147044 mprotect(0x7f49d79e3000, 4096, PROT_READ) = 0
14:59:19.147093 munmap(0x7f49d79db000, 28885) = 0
14:59:19.147190 rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
14:59:19.147248 rt_sigaction(SIGUSR1, {sa_handler=0x5637338b30e0, sa_mask=[INT USR1], sa_flags=SA_RESTORER, sa_restorer=0x7f49d7409fd0}, NULL, 8) = 0
14:59:19.147301 rt_sigaction(SIGINT, {sa_handler=0x5637338b30d0, sa_mask=[INT USR1], sa_flags=SA_RESTORER|SA_NODEFER|SA_RESETHAND, sa_restorer=0x7f49d7409fd0}, NULL, 8) = 0
14:59:19.147430 brk(NULL) = 0x563734ec4000
14:59:19.147475 brk(0x563734ee5000) = 0x563734ee5000
14:59:19.147535 openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
14:59:19.147602 fstat(3, {st_dev=makedev(8, 7), st_ino=48819, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=3168, st_size=1687536, st_atime=1621312728 /* 2021-05-18T14:08:48.519742590+0930 */, st_atime_nsec=519742590, st_mtime=1604896695 /* 2020-11-09T15:08:15.489844866+1030 */, st_mtime_nsec=489844866, st_ctime=1604896695 /* 2020-11-09T15:08:15.489844866+1030 */, st_ctime_nsec=489844866}) = 0
14:59:19.147687 mmap(NULL, 1687536, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f49d783d000
14:59:19.147746 close(3) = 0
14:59:19.147853 openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3
14:59:19.147912 dup2(3, 0) = 0
14:59:19.147960 close(3) = 0
14:59:19.148005 lseek(0, 0, SEEK_CUR) = 0
14:59:19.148051 openat(AT_FDCWD, "/data/disk_test/dd_test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
14:59:19.293893 dup2(3, 1) = 1
14:59:19.293961 close(3) = 0
14:59:19.294012 mmap(NULL, 1060864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f49d72c8000
14:59:19.294069 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:19.294495 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.025616 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.025759 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.026304 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.026416 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.026966 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.027067 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.027605 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.027714 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.028242 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.028339 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.028841 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.028927 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.029459 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.029558 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.030123 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.030235 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.030790 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.030902 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
14:59:26.031457 close(0) = 0
14:59:26.031503 close(1) = 0
14:59:26.032941 openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 0
14:59:26.033098 fstat(0, {st_dev=makedev(8, 3), st_ino=992, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2995, st_atime=1621312819 /* 2021-05-18T14:10:19.531746410+0930 */, st_atime_nsec=531746410, st_mtime=1591291526 /* 2020-06-05T02:55:26+0930 */, st_mtime_nsec=0, st_ctime=1604896693 /* 2020-11-09T15:08:13.373844883+1030 */, st_ctime_nsec=373844883}) = 0
14:59:26.033168 read(0, "# Locale name alias data base.\n#"..., 4096) = 2995
14:59:26.033323 read(0, "", 4096) = 0
14:59:26.033358 close(0) = 0
14:59:26.033400 openat(AT_FDCWD, "/usr/share/locale/en_AU/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
14:59:26.033447 openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
14:59:26.033487 openat(AT_FDCWD, "/usr/share/locale-langpack/en_AU/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
14:59:26.033616 openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
14:59:26.033678 write(2, "10+0 records in\n10+0 records out"..., 33) = 33
14:59:26.033747 write(2, "10485760 bytes (10 MB, 10 MiB) c"..., 58) = 58
14:59:26.033785 write(2, "\n", 1) = 1
14:59:26.033824 close(2) = 0
14:59:26.033863 exit_group(0) = ?
14:59:26.034041 +++ exited with 0 +++

Tags: bionic
Mariusz (mariuszbrc)
affects: ubuntu → linux-signed-hwe (Ubuntu)
Revision history for this message
Chris Guiver (guiverc) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. Please execute the following command in a terminal, only once, as it will automatically gather debugging information:

apport-collect 1928744

When reporting bugs in the future please use apport by using 'ubuntu-bug' and the name of the package affected. You can learn more about this functionality at https://wiki.ubuntu.com/ReportingBugs.

Note: you may have issues with this, as you've changed the package to the HWE kernel (kernel 5.4 for 18.04), but your details imply you're using 18.04's GA kernel, 4.15 (Ubuntu 16.04 using HWE is on the 4.15 kernel).

If you have issues, please change the package to 'linux' and then try apport-collect again.
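A quick way to confirm which kernel is actually running and which kernel image packages are installed (standard commands, shown here only as a sketch):

uname -r
dpkg -l 'linux-image*' | grep '^ii'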

Revision history for this message
Mariusz (mariuszbrc) wrote :

Thanks a lot for getting back to me. I couldn't run apport-collect, as we don't have python-launchpadlib installed. I did check (apt install python3-launchpadlib), and in my case it would install quite a lot of dependent packages; we would like to avoid that due to the nature of our server.

But I did collect a bug report with "ubuntu-bug linux-image-generic" and attached it to this case. I hope this will be helpful.

Revision history for this message
Chris Guiver (guiverc) wrote :

Changed HWE to linux, as I see no reference to an HWE (5.4) kernel.

affects: linux-signed-hwe (Ubuntu) → linux (Ubuntu)
tags: added: bionic
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1928744

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Mariusz (mariuszbrc) wrote :

Unable to run the apport-collect 1928744 command.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Mariusz (mariuszbrc) wrote :

Hi,

I have done another test comparing the same server running 4.15.0-136-generic and 4.15.0-142-generic, and collected a Zabbix CPU utilization graph which shows the problem from a performance point of view. On 4.15.0-136-generic the server works normally and no processes are stuck at 99% iowait. On 4.15.0-142-generic it is not working at all, and rsync, pg_receivewal and dd processes hang at 99% iowait. I have attached the Zabbix graph to this bug.
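While the problem is occurring, the stuck processes and device-level latency can also be captured with standard tools (a rough sketch; the sampling interval is arbitrary):

# tasks in uninterruptible (D) sleep, typically blocked on I/O
ps -eo state,pid,wchan:32,cmd | awk '$1 == "D"'
# per-device utilisation and latency, sampled every second
iostat -x 1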

Revision history for this message
Khaled El Mously (kmously) wrote :

Hi @Mariusz - thank you for the bug report. That certainly sounds like a kernel bug based on your description.

Doing a kernel bisect is probably the fastest approach to identifying the offending change. There are 392 changes between -136 and -142 so it could take up to 9 tests for a full bisect.

@Mariusz, is it possible to find out which (if any) of the intermediate kernels between -136 and -142 are affected?

If you don't have any of the intermediate kernels installed already, you can install them with a command like this one:

sudo apt-get install linux-image-4.15.0-137-generic linux-headers-4.15.0-137-generic linux-modules-4.15.0-137-generic linux-modules-extra-4.15.0-137-generic
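Once an intermediate kernel is installed, it can be booted once without changing the default (assuming GRUB with Ubuntu's usual "Advanced options" submenu; the menu entry name below is only an example and should match what grub.cfg actually contains):

sudo grub-reboot "Advanced options for Ubuntu>Ubuntu, with Linux 4.15.0-137-generic"
sudo reboot
# after the reboot, confirm which kernel is running
uname -r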

Revision history for this message
Mariusz (mariuszbrc) wrote :

Cool cool, I will try to do what you have suggested - that makes sense. Will get back once I have some details.

Revision history for this message
Mariusz (mariuszbrc) wrote :

Hi,

Over the last 3 days I have been testing the other kernels. The steps I took:

1. Boot to kernel-4.15.0-136.
2. Install kernel X, where X = 136,137,139,140:

 sudo apt-get install linux-image-4.15.0-X-generic linux-headers-4.15.0-X-generic linux-modules-4.15.0-X-generic linux-modules-extra-4.15.0-X-generic

3. Boot to kernel X.

4. Manually run:
 sudo depmod

5. Reboot.

6. Run the server's normal workload and see how it performs.

I haven't seen disk performance issues with those kernels.
But there is one difference, which I don't know how to explain.

Before I raised this bug, we had the same disk performance issues on kernel-4.15.0-140.
I collected the list of kernel modules running on that kernel at that time:

Module Size Used by
joydev 24576 0
input_leds 16384 0
hid_generic 16384 0
usbhid 49152 0
hid 122880 2 usbhid,hid_generic
ip6_tables 28672 0
iptable_filter 16384 0
ip_tables 28672 1 iptable_filter
x_tables 40960 3 iptable_filter,ip6_tables,ip_tables
tcp_diag 16384 0
udp_diag 16384 0
inet_diag 24576 2 tcp_diag,udp_diag
cpuid 16384 0
btrfs 1155072 0
zstd_compress 163840 1 btrfs
xor 24576 1 btrfs
raid6_pq 114688 1 btrfs
ufs 77824 0
qnx4 16384 0
minix 32768 0
ntfs 102400 0
msdos 20480 0
jfs 188416 0
xfs 1204224 0
libcrc32c 16384 1 xfs
intel_rapl 20480 0
sb_edac 24576 0
x86_pkg_temp_thermal 16384 0
intel_powerclamp 16384 0
coretemp 16384 0
kvm_intel 217088 0
kvm 614400 1 kvm_intel
ipmi_ssif 32768 0
irqbypass 16384 1 kvm
crct10dif_pclmul 16384 0
crc32_pclmul 16384 0
ghash_clmulni_intel 16384 0
pcbc 16384 0
aesni_intel 188416 0
aes_x86_64 20480 1 aesni_intel
cdc_ether 16384 0
usbnet 45056 1 cdc_ether
mii 16384 1 usbnet
crypto_simd 16384 1 aesni_intel
glue_helper 16384 1 aesni_intel
cryptd 24576 3 crypto_simd,ghash_clmulni_intel,aesni_intel
intel_cstate 20480 0
intel_rapl_perf 16384 0
ipmi_si 57344 0
ipmi_devintf 20480 0
ipmi_msghandler 53248 3 ipmi_devintf,ipmi_si,ipmi_ssif
shpchp 36864 0
mei_me 40960 0
mei 94208 1 mei_me
lpc_ich 24576 0
acpi_power_meter 20480 0
acpi_pad 180224 0
mac_hid 16384 0
bonding 163840 0
lp 20480 0
parport 49152 1 lp
autofs4 40960 2
mgag200 45056 1
i2c_algo_bit 16384 1 mgag200
ttm 106496 1 mgag200
drm_kms_helper 172032 1 mgag200
syscopyarea ...
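To make this kind of module-list comparison easier to reproduce, the list could be captured per kernel and diffed afterwards (file names are arbitrary):

lsmod | sort > /tmp/lsmod-$(uname -r).txt
# after booting the other kernel and capturing again:
diff /tmp/lsmod-4.15.0-136-generic.txt /tmp/lsmod-4.15.0-140-generic.txt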


Revision history for this message
Mariusz (mariuszbrc) wrote :

Hi,

I have done longer testing of kernel -140 and it has the performance degradation issue as well. It took a couple of days to fully max out disk I/O on that kernel under the same load. Therefore, I have gone back to testing the other kernels over a longer time window.

At the moment, I'm running kernel 4.15.0-137-generic. Will leave it running for the next few days to see if there is any difference.

Revision history for this message
Khaled El Mously (kmously) wrote :

Thanks for the extensive testing, Mariusz. (I assume -137 doesn't have the performance problem?)

So, for comment #9 - the difference in loaded modules comes down to 20 additional modules that were loaded when performance was bad:

-ip6_tables 28672 0
-iptable_filter 16384 0
-ip_tables 28672 1 iptable_filter
-x_tables 40960 3 iptable_filter,ip6_tables,ip_tables
-tcp_diag 16384 0
-udp_diag 16384 0
-inet_diag 24576 2 tcp_diag,udp_diag
-cpuid 16384 0
-btrfs 1155072 0
-zstd_compress 163840 1 btrfs
-xor 24576 1 btrfs
-raid6_pq 114688 1 btrfs
-ufs 77824 0
-qnx4 16384 0
-minix 32768 0
-ntfs 102400 0
-msdos 20480 0
-jfs 188416 0
-xfs 1204224 0
-libcrc32c 16384 1 xfs

They seem to be mostly filesystem modules and iptables modules. It is possible that this could affect performance, though it is unlikely to cause performance issues simply by them being loaded. If they are causing the performance hit, it would probably mean that something else is running, not just that additional modules are loaded in memory.

So, if I understand correctly, -140 seemed initially fine, but after some time performance degrades. That makes things a bit more interesting. Do you see those additional modules loaded when performance is degraded? Note that all those additional modules don't seem to have any external dependencies other than themselves; they depend only on each other. So it should be easily possible to unload them and see if that "fixes" the performance issue (a sketch follows below).
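A sketch of that unload test (only valid if none of the modules are actually in use; modprobe -r refuses to remove a module with a non-zero use count, so the filesystem modules go first, then their helpers):

sudo modprobe -r btrfs ufs qnx4 minix ntfs msdos jfs xfs
sudo modprobe -r zstd_compress xor raid6_pq libcrc32c
sudo modprobe -r iptable_filter ip6_tables ip_tables x_tables
sudo modprobe -r tcp_diag udp_diag inet_diag cpuid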

By the way, kernels -143 and -144 have been released since this bug was created. Maybe it is worthwhile to also try those kernels?

Revision history for this message
Mariusz (mariuszbrc) wrote :

Hi Khaled,

I have finished testing the kernels - it took more time than I expected but the results are:

4.15.0-136-generic: no issues
4.15.0-137-generic: disk performance issue
4.15.0-139-generic: disk performance issue
4.15.0-140-generic: disk performance issue
4.15.0-142-generic: disk performance issue

Each kernel ran for a minimum of 7 days. Those with the disk performance issue start to break after a few days of normal workload.

Here is an example of "iotop -a" output when our disk operations are throttled:

Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
  TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
10908 be/4 barman 0.00 B 144.00 K 0.00 % 99.99 % pg_receivewal --dbname=dbname=replication host=XXXXXX replication=~o-password --directory=/data/barman/XXXXX/streaming --slot=barman
 9494 be/4 barman 0.00 B 144.00 K 0.00 % 99.99 % rsync --server -logDtpre.iLsfxC . /data/barman/XXXXXX/incoming/0000000100003183000000ED
10907 be/4 barman 0.00 B 144.00 K 0.00 % 99.99 % pg_receivewal --dbname=dbname=replication host=XXXXXX replication~-password --directory=/data/barman/XXXXXXX/streaming --slot=barman
 7148 be/4 barman 0.00 B 144.00 K 0.00 % 99.99 % rsync --server -logDtpre.iLsfxC . /data/barman/XXXXXX/incoming/000000010000D2E900000043
11144 be/4 barman 0.00 B 144.00 K 0.00 % 99.99 % pg_receivewal --dbname=dbname=replication host=XXXXX replication=~o-password --directory=/data/barman/XXXXX/streaming --slot=barman
17724 be/3 root 0.00 B 8.00 K 0.00 % 99.99 % auditd
11217 be/4 root 0.00 B 8.00 K 0.00 % 99.99 % syslog-ng -F
10003 be/4 barman 0.00 B 96.00 K 0.00 % 99.99 % gpg -e -r BarmanBackup
10219 be/4 barman 0.00 B 96.00 K 0.00 % 99.99 % gpg -e -r BarmanBackup
 9247 be/4 barman 0.00 B 140.00 K 0.00 % 98.62 % rsync --server -logDtpre.iLsfxC . /data/barman/mel-attn-hostdb/incoming/0000000100004CE000000083
11391 be/4 barman 0.00 B 140.00 K 0.00 % 98.60 % pg_receivewal --dbname=dbname=replication host=XXXXXXX replication=~o-password --directory=/data/barmanXXXXXXX/streaming --slot=barman
 9556 be/4 barman 0.00 B 92.00 K 0.00 % 98.53 % gpg -e -r BarmanBackup
 4381 be/4 root 0.00 B 8.00 K 0.00 % 57.53 % sadc -F -L -S DISK 60 /var/log/sysstat
10959 be/4 root 0.00 B 4.00 K 0.00 % 26.29 % sadc -F -L -S DISK 60 /var/log/sysstat
15626 be/4 root 0.00 B 4.00 K 0.00 % 8.22 % sadc -F -L -S DISK 60 /var/log/sysstat

So I think it is fair to say that the problem was introduced with 4.15.0-137-generic.

We will try the latest kernel in our next patching window. However, given that kernels -137, -139, -140 and -142 are not working correctly, could this be related to some tuning change? In our case, we haven't made any changes to e.g. sysctl after patching.
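For completeness, the writeback-related tunables can be compared between a good and a bad kernel with something like:

sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs vm.dirty_writeback_centisecs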

Many thanks for your help so far!

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu Bionic):
status: New → Confirmed
Revision history for this message
Leon van Heerden (leonvanheerden) wrote :

Hi,
We think we are affected by the same bug.
We were on version 4.15.0-139 and upgraded to 144 before finding this post.
Is there any confirmation that this issue also occurs in 4.15.0-144?

I have installed version 4.15.0-136 in the meantime to allow us to reboot and select that kernel, in case the issue is present.

Hope to see a fix for this issue soon.

Revision history for this message
Mariusz (mariuszbrc) wrote :

Hey,

We experienced the performance issue on different hardware servers:

LENOVO System x3650 M5
Lenovo ThinkSystem SR650

with the following kernels:

4.15.0-137-generic: disk performance issue
4.15.0-139-generic: disk performance issue
4.15.0-140-generic: disk performance issue
4.15.0-142-generic: disk performance issue

I haven't seen this issue on virtual machines (VMware VMs Ubuntu 18/20).
At the moment, I'm testing 4.15.0-144-generic. Too early to say it is free of the bug; I will come back in a week.

Thanks
Mariusz

Revision history for this message
Peter Jönsson (peter-joensson) wrote :

Hi!

We are running some large (100+) Kubernetes clusters on bare-metal machines, also with an LSI MegaRAID controller, and we have been experiencing these exact problems for about a week (the clusters have been in production since early June).

We run with this controller:

# lspci | grep -i mega
59:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID Tri-Mode SAS3516 (rev 01)

After being in operation and taking production traffic (we are very I/O intensive) for less than a month, both clusters started behaving weirdly, with timeouts between processes on different nodes. After much debugging we found that I/O performance had basically ground to a halt. Here is a quick summary of what we found together with our external partner (the developer of our software-defined storage solution):

We analyzed a small subset of nodes encountering the issue. Based on our initial diagnosis, here are the findings:

- The CPU and IO utilization were low but all systems reported high load averages.
- All disks looked fine - root and data disks. The IO load was very small with no visible spikes in latencies. None of the devices were unresponsive/stuck.
- All user space processes looked fine, none were blocked.
- The systems were slow in general, i.e. installing packages and writing even to the root disk was very slow.
- A kernel thread dump showed a lot of them in D state, all stuck on writing to the page cache.
- We used dd to write a small 1M file on the host (NOT on the px device), requiring no disk I/O, and it showed the same symptoms.
- System dirty threshold settings do not show anything out of the ordinary.
- On all the affected nodes, another common symptom is that the page cache drain appeared to be slow or stuck: the number of pages on the writeback list is high, but no writeback appeared to be happening (see the sketch after this list).
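A rough way to observe that state from userspace (standard interfaces; the sysrq dump goes to the kernel log and requires sysrq to be enabled):

# dirty and writeback page counters
grep -E '^(nr_dirty|nr_writeback) ' /proc/vmstat
grep -E '^(Dirty|Writeback):' /proc/meminfo
# dump blocked (D-state) task stacks to the kernel log
echo w | sudo tee /proc/sysrq-trigger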

These clusters all run 18.04 LTS but have different kernels depending on their patch level (node counts per kernel version):

Cluster A:

 228 4.15.0-140-generic
  74 4.15.0-142-generic
   7 4.15.0-143-generic

Cluster B:

 418 4.15.0-136-generic
  22 4.15.0-144-generic
  12 4.15.0-143-generic
   9 4.15.0-142-generic

Cluster A is the one with the worst performance. Cluster B has had similar problems, but since the majority of its nodes are on -136 it has not been hit as badly.

Our only remedies for now have been to downgrade to -136 and/or reboot the machines. Rebooting the machines but staying on the problematic kernel works for now, but most likely we will see the same behavior after a week or so of production traffic.

Revision history for this message
Paul Krizak (paul-krizak) wrote :

We were directed to this bug by our Portworx support team, and believe we're seeing the same thing.

We are running Ubuntu 16.04 with the hardware enablement kernel. Most of our nodes are running 4.15.0-140 and, as Peter notes above, the I/O performance problem does not show up right away but takes a few days to appear. When it gets bad, we see exactly the same behavior in `iotop`, where any process doing more than trivial I/O shows 99.9% IOWAIT time. The system performs sluggishly (notably the container runtime) and a reboot "fixes" the problem.

Our strategy at the moment is to roll back to the -136 kernel on all of our bare-metal Kubernetes nodes, and hopefully to upgrade to Ubuntu 20.04 by the end of the year.

If any further testing is requested please let us know and we can try things out on our dev systems.

Revision history for this message
John McDonnell (jmcdon01) wrote :

I note it is possible this is a duplicate of
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1926081

Revision history for this message
Mariusz (mariuszbrc) wrote :

John - thanks for finding ==> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1926081
There is great testing work done there.

For this thread I have been running kernel 4.15.0-144-generic for the last 3 weeks and haven't seen the issue.

Revision history for this message
Leon van Heerden (leonvanheerden) wrote :

I have also been running 4.15.0-144-generic for three weeks so far without the issue occurring.

Revision history for this message
Zahid Bukhari (cimmerian) wrote :

This bug is also affecting us. We're on Xenial but use this kernel via HWE. In our monitoring, the leading indicator of all this shows up as dirty memory and/or filesystem writeback.
