Slow disk writes after some uptime, only on 32bit/16+RAM/4+ kernels

Bug #1698118 reported by Alkis Georgopoulos
This bug report is a duplicate of: Bug #1333294: 32-bit kernel HDD slow write speed.
This bug affects 5 people
Affects             Status     Importance  Assigned to  Milestone
linux (Debian)      Unknown    Unknown
linux-hwe (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

This happens on xenial with 4.4 and 4.8 kernels.
It does not happen on precise with 3.2, nor on trusty with 3.19.

The problem is that disk writes start at 200 MB/sec, but after some disk usage (e.g. after 20-50 GB of writes) they become extremely slow, around 2 MB/sec, and never get fast again.
The slowdown really is 100x; it's not related to RAM caching, and it makes the system permanently unusable once it appears.

Test case [edit: see comment #7 below for my updated test case]:
# This copies with 200 MB/sec:
dd if=/dev/zero of=/mnt/test bs=1M count=1000 conv=fdatasync
# This just does some disk writes, because the problem happens gradually
cp -a /mnt/a-20gb-folder /mnt/dest
# Now testing again, it writes with 2 MB/sec:
dd if=/dev/zero of=/mnt/test bs=1M count=1000 conv=fdatasync

After those 3 commands, the system is unusable even if left for hours.
I've only seen it in 2 out of 100 installations so far, so it appears to be rare...
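
To make the before/after comparison easier to read, the dd step of the test case above can be wrapped in a small bash function that prints only dd's final throughput figure. This is a sketch: the name `dd_write_rate` and its optional block-count argument are hypothetical, it assumes GNU coreutils dd (whose summary line ends in a rate such as `200 MB/s`), and unlike the original commands it deletes the test file afterwards.

```bash
#!/bin/bash
# Hypothetical helper: run the same dd write test as above and print only
# the throughput dd reports on its last line (e.g. "200 MB/s").
# Assumes GNU coreutils dd; LC_ALL=C keeps its summary in the English format.
dd_write_rate() {
    local target=$1          # file to write, e.g. /mnt/test
    local count=${2:-1000}   # number of 1 MiB blocks (1000 as in the test case)
    LC_ALL=C dd if=/dev/zero of="$target" bs=1M count="$count" conv=fdatasync 2>&1 \
        | tail -n 1 \
        | awk -F', ' '{ print $NF }'
    rm -f -- "$target"       # unlike the original test, remove the test file
}
```

Usage: run `dd_write_rate /mnt/test` once, then `cp -a /mnt/a-20gb-folder /mnt/dest`, then `dd_write_rate /mnt/test` again and compare the two figures.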

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.8.0-54-generic 4.8.0-54.57~16.04.1
ProcVersionSignature: Ubuntu 4.8.0-54.57~16.04.1-generic 4.8.17
Uname: Linux 4.8.0-54-generic i686
ApportVersion: 2.20.1-0ubuntu2.6
Architecture: i386
CurrentDesktop: MATE
Date: Thu Jun 15 13:28:43 2017
InstallationDate: Installed on 2017-06-07 (7 days ago)
InstallationMedia: Ubuntu-MATE 16.04.2 LTS "Xenial Xerus" - Release i386 (20170215)
SourcePackage: linux-hwe
UpgradeStatus: No upgrade log present (probably fresh install)

Alkis Georgopoulos (alkisg) wrote :

After long hours of testing, it appears that the problem goes away with
mem=12G
or lower in the kernel command line, and appears with
mem=13G
or any other value up to 16G that the system has.
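
For anyone repeating the mem= experiment, a persistent way to set it on Ubuntu (instead of editing the entry at the GRUB menu on each boot) is the kernel command line in /etc/default/grub. A sketch assuming the stock Ubuntu GRUB setup; "quiet splash" stands for whatever options the line already contains:

```bash
# /etc/default/grub (excerpt): limit the kernel to 12 GiB of RAM for testing.
# Keep the options already present on this line and append mem=12G.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mem=12G"
```

After saving, run `sudo update-grub` and reboot; `cat /proc/cmdline` should then include `mem=12G`. Removing the option and running `sudo update-grub` again undoes it.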

Tomorrow I'll test sysctl vm.dirty_background_ratio=5;
feel free to suggest more tests.

Alkis Georgopoulos (alkisg) wrote :

vm.dirty_background_ratio=5 didn't help. But echo 3 > /proc/sys/vm/drop_caches "clears" the issue for a few minutes, until the cache is full again.

So far I've seen the issue in 3 installations; all of them were 32bit with 16+ GB RAM; maybe that is the key point here. And none of them had issues in older, 4.2- kernels.
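
While reproducing this, it may help to watch the kernel's dirty-page counters next to the low/high memory split, for example right before and after `echo 3 > /proc/sys/vm/drop_caches`. The function below is a hypothetical helper, not something used in this report; it only filters /proc/meminfo (or a saved copy of it). LowFree and HighFree only appear on 32-bit kernels built with high-memory support, which is the configuration affected here.

```bash
#!/bin/bash
# Hypothetical helper: print the /proc/meminfo counters relevant to this bug.
# Dirty/Writeback: page cache waiting to be, or being, written to disk.
# LowFree/HighFree: only present on 32-bit kernels with CONFIG_HIGHMEM.
meminfo_dirty() {
    local file=${1:-/proc/meminfo}   # optionally read a saved snapshot instead
    awk '/^(Dirty|Writeback|LowFree|HighFree):/ { print $1, $2, $3 }' "$file"
}
```

Usage (as root): `meminfo_dirty; sync; echo 3 > /proc/sys/vm/drop_caches; meminfo_dirty`.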

Here is my new test case, which should reproduce the issue without any special setup.
It basically copies /lib around repeatedly:

# echo 3 > /proc/sys/vm/drop_caches
# s=/lib; d=1; rm -rf "$d"; echo -n "Copying $s to $d: "; while /usr/bin/time -f %e cp -a "$s" "$d"; do s=$d; d=$((($d+1)%10)); rm -rf "$d"; echo -n "Copying $s to $d: "; done
Copying /lib to 1: 8.61
Copying 1 to 2: 9.59
Copying 2 to 3: 10.01
Copying 3 to 4: 10.09
Copying 4 to 5: 10.39
Copying 5 to 6: 10.31
Copying 6 to 7: 11.79
Copying 7 to 8: 12.70
Copying 8 to 9: 14.26
Copying 9 to 0: 14.95
Copying 0 to 1: 21.04
Copying 1 to 2: 57.64
Copying 2 to 3: 383.68
Copying 3 to 4: 420.04
Copying 4 to 5: 377.91
Copying 5 to 6: 434.98
...
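
To put a number on "extremely slow", the timings printed by the loop above can be summarized. A minimal sketch (the function name is hypothetical): it reads lines of the form `Copying X to Y: SECONDS`, skips unfinished rounds such as `...`, and reports the fastest and slowest rounds and their ratio.

```bash
#!/bin/bash
# Hypothetical helper: summarize "Copying X to Y: SECONDS" lines read from a
# file (or stdin): number of timed rounds, fastest, slowest, slowest/fastest.
summarize_copy_log() {
    awk -F': ' '
        /^Copying / && $NF ~ /^[0-9]+(\.[0-9]+)?$/ {
            t = $NF + 0
            if (n == 0 || t < min) min = t
            if (n == 0 || t > max) max = t
            n++
        }
        END {
            if (n == 0) exit 1                 # no timed rounds found
            printf "rounds=%d min=%.2f max=%.2f", n, min, max
            if (min > 0) printf " ratio=%.1f", max / min
            printf "\n"
        }' "${1:--}"
}
```

For the run above (8.61 s for the first copy, 434.98 s for the slowest), that is a slowdown of roughly 50x.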

v4169sgr (andrew-d-scott-uk) wrote :

This bug affects me too.

$ uname -a
Linux aammscott 4.8.0-54-generic #57~16.04.1-Ubuntu SMP Wed May 24 16:20:44 UTC 2017 i686 i686 i686 GNU/Linux

-Version-
Kernel : Linux 4.8.0-54-generic (i686)
Distribution : Ubuntu 16.04.2 LTS

-Board-
Name : P8H77-V LE
Vendor : ASUSTeK COMPUTER INC. (SEAGATE, www.seagate.com)

-Computer-
Processor : 4x Intel(R) Core(TM) i5-3450 CPU @ 3.10GHz
Memory : 16552MB (1441MB used)

-SCSI Disks-
ATA Crucial_CT2050MX
Optiarc DVD RW AD-7280S

Originally reported on the Ubuntu Forums:

https://ubuntuforums.org/showthread.php?t=2363087&highlight=

I used the same tests as above, with the following modifications:
- Not run as root (NEVER RUN rm -rf AS ROOT!)
- cwd set to ~/tmp (mkdir ~/tmp if you need to)

My results on my 16 GB RAM 32-bit install of 16.04.2 (updated) are:

Copying /lib to 1: 8.95
Copying 1 to 2: 18.01
Copying 2 to 3: 10.79
Copying 3 to 4: 10.23
Copying 4 to 5: 8.94
Copying 5 to 6: 8.70
Copying 6 to 7: 8.50
Copying 7 to 8: 8.00
Copying 8 to 9: 9.25
Copying 9 to 0: 9.55
Copying 0 to 1: 9.69
Copying 1 to 2: 12.30
Copying 2 to 3: 16.05
Copying 3 to 4: 36.38
Copying 4 to 5: 212.44
Copying 5 to 6: 323.97

To me that is a clear reproduction.

I then booted into the 64-bit 16.04.2 live CD and mounted all my file systems under /mnt, just as though I were going to reinstall GRUB, including my separate /home partition. I chrooted into /mnt, ran sudo -su <username> to limit myself to normal user privileges, then ran the same test. Results are:

Copying /lib to 1: 8.07
Copying 1 to 2: 1.00
Copying 2 to 3: 1.63
Copying 3 to 4: 1.63
Copying 4 to 5: 1.89
Copying 5 to 6: 2.77
Copying 6 to 7: 2.08
Copying 7 to 8: 1.81
Copying 8 to 9: 1.79
Copying 9 to 0: 1.95
Copying 0 to 1: 1.68
Copying 1 to 2: 1.45
Copying 2 to 3: 1.55
Copying 3 to 4: 1.75
Copying 4 to 5: 1.38
Copying 5 to 6: 1.55
Copying 6 to 7: 1.60
Copying 7 to 8: 1.78
Copying 8 to 9: 1.24
Copying 9 to 0: 1.44
Copying 0 to 1: 1.58
Copying 1 to 2: 1.76
Copying 2 to 3: 1.49
Copying 3 to 4: 1.47
Copying 4 to 5: 1.50
Copying 5 to 6: 1.55
Copying 6 to 7: 1.25
Copying 7 to 8: 1.51
Copying 8 to 9: 1.54
Copying 9 to 0: 1.84
Copying 0 to 1: 2.20
Copying 1 to 2: 2.54
Copying 2 to 3: 2.74
Copying 3 to 4: 2.78
Copying 4 to 5: 2.78
Copying 5 to 6: 2.52
Copying 6 to 7: 2.47
Copying 7 to 8: 2.77
Copying 8 to 9: 1.90
Copying 9 to 0: 2.68
Copying 0 to 1: 2.44
Copying 1 to 2: 2.66
Copying 2 to 3: 2.38
Copying 3 to 4: 2.70
Copying 4 to 5: 2.68
Copying 5 to 6: 2.67
Copying 6 to 7: 2.21
Copying 7 to 8: 2.55
Copying 8 to 9: 2.73
Copying 9 to 0: 2.70
Copying 0 to 1: 2.17
Copying 1 to 2: 2.49
Copying 2 to 3: 2.66
Copying 3 to 4: 2.83
Copying 4 to 5: 2.43
Copying 5 to 6: 2.63
Copying 6 to 7: 2.47
Copying 7 to 8: 2.81
Copying 8 to 9: 2.58

Conclusions:

- Bug only affects 32 bit installs with 16 GB RAM (or more?)
- IO performance under 64 bit on my hardware is 4-5 times better in any case.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-hwe (Ubuntu):
status: New → Confirmed
Kostas Gidarakos (kostasgidarakos) wrote :

This bug affects me too.
With 16 GB RAM ltsp-update-image needs 40 minutes and with 8 GB it needs only 3 minutes.

Results with 16GB RAM:
real 40m58.672s
user 14m15.420s
sys 0m20.212s

Results with 8GB RAM:
3:12.53 elapsed
667.48 user
12.29 system

summary: - Slow disk writes after some uptime, only on certain hw and 4.4+ kernels
+ Slow disk writes after some uptime, only on 32bit/16+RAM/4+ kernels
Alkis Georgopoulos (alkisg) wrote :

I updated my test case to include the "sync" call inside the "time", because otherwise recent 64-bit installations report misleadingly short times (cp can return while the data is still only in the page cache). @v4169sgr, you might want to test again with 64-bit using the updated commands:
1) . /etc/os-release; echo -n "$VERSION, $(uname -r), $(dpkg --print-architecture), RAM="; awk '/MemTotal:/ { print $2 }' /proc/meminfo
2) mount /dev/sdb2 /mnt && rm -rf /mnt/tmp/lib && mkdir -p /mnt/tmp/lib && sync && echo 3 > /proc/sys/vm/drop_caches && chroot /mnt
3) mkdir -p /tmp/lib; cd /tmp/lib; s=/lib; d=1; echo -n "Copying $s to $d: "; while /usr/bin/time -f %e sh -c "cp -a '$s' '$d'; sync"; do s=$d; d=$((($d+1)%100)); echo -n "Copying $s to $d: "; done
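
The step 3 loop runs until interrupted. A bounded, commented variant may be handier for scripted comparisons between kernels. This is a sketch under stated assumptions: the name `copy_bench` and its round-count argument are hypothetical, it uses bash's built-in `time` with `TIMEFORMAT=%R` instead of `/usr/bin/time -f %e` (both print elapsed seconds), it stops if `cp` fails, and it deletes each destination before reusing it. Run it from an empty scratch directory as a normal user.

```bash
#!/bin/bash
# Hypothetical bounded variant of the step 3 loop: copy SRC to 1, then 1 to 2,
# and so on (names wrap at 100 as in the original), timing each "cp -a"
# together with the following "sync" so that writeback is included.
# Run it from an empty scratch directory; it creates and deletes ./0 .. ./99.
copy_bench() {
    local src=$1 rounds=$2
    local d=1 i secs
    local TIMEFORMAT=%R                 # bash `time` prints elapsed seconds only
    for ((i = 0; i < rounds; i++)); do
        rm -rf -- "$d"                  # so cp creates "$d" instead of copying into it
        # The outer 2>&1 captures the timing line that `time` writes to stderr
        secs=$( { time { cp -a -- "$src" "$d" && sync; }; } 2>&1 ) || return 1
        echo "Copying $src to $d: $secs"
        src=$d
        d=$(( (d + 1) % 100 ))
    done
}
```

Usage: `mkdir -p ~/tmp/lib && cd ~/tmp/lib && copy_bench /lib 20`.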

I managed to find 16 GB RAM and test locally. All 3.x kernels are unaffected, and all 32 bit 4.x kernels have issues.

14.04, Trusty Tahr, 3.13.0-24-generic, i386, RAM=16076400 [Live CD]
8-13 secs

15.04 (Vivid Vervet), 3.19.0-15-generic, i386, RAM=16083080 [Live CD]
5-7 secs

15.10 (Wily Werewolf), 4.2.0-16-generic, i386, RAM=16082536 [Live CD]
4-350 secs

16.04.2 LTS (Xenial Xerus), 3.19.0-80-generic, i386, RAM=16294832 [HD install]
10-25 secs

16.04.2 LTS (Xenial Xerus), 4.2.0-42-generic, i386, RAM=16294392 [HD install]
14-89 secs

16.04.2 LTS (Xenial Xerus), 4.4.0-79-generic, i386, RAM=16293556 [HD install]
15-605 secs

16.04.2 LTS (Xenial Xerus), 4.8.0-54-generic, i386, RAM=16292708 [HD install]
6-160 secs

16.04.2 LTS (Xenial Xerus), 4.8.0-36-generic, amd64, RAM=16131028 [Live CD]
4-11 secs

description: updated
Alkis Georgopoulos (alkisg) wrote :

And these are the results of the latest 4.12 mainline kernel. It keeps getting worse: `cp -a /lib /elsewhere` should take about 5 seconds, but it takes 800+.

16.04.2 LTS (Xenial Xerus), 4.12.0-041200rc5-generic, i386, RAM=16292588 [HD install]
Copying /lib to 1: 65.18
Copying 1 to 2: 46.17
Copying 2 to 3: 96.98
Copying 3 to 4: 842.58
Copying 4 to 5: 718.65
Copying 5 to 6: 807.03
Copying 6 to 7: ...

v4169sgr (andrew-d-scott-uk) wrote :

Results with the modified method (including sync) on 32-bit. I will repeat with the 64-bit live CD tomorrow evening.

16.04.2 LTS (Xenial Xerus), 4.8.0-54-generic, i386, RAM=16552604

Copying 1 to 2: 8.97
Copying 2 to 3: 7.36
Copying 3 to 4: 7.08
Copying 4 to 5: 6.51
Copying 5 to 6: 6.92
Copying 6 to 7: 7.33
Copying 7 to 8: 7.86
Copying 8 to 9: 8.00
Copying 9 to 10: 11.07
Copying 10 to 11: 10.39
Copying 11 to 12: 16.46
Copying 12 to 13: 56.50
Copying 13 to 14: 233.98

Dimitris (dkirgr) wrote :

Yes, it affects me too.

Running ltsp-update-image takes almost 40 minutes with 16 GB RAM, but only 3 minutes with 8 GB.

results with 16GB RAM:
real 38m32.201s
user 14m16.988s
sys 0m24.548s

results with 8GB RAM:
real 3m11.648s
user 11m5.340s
sys 0m10.580s

v4169sgr (andrew-d-scott-uk) wrote :

As promised, here are the results from the amd64 live DVD. The architecture below is reported as 'i386', but this run used the default desktop install media, which identifies as 'amd64'.

16.04.2 LTS (Xenial Xerus), 4.8.0-36-generic, i386, RAM=16387048

Copying /lib to 1: 14.20
Copying 1 to 2: 4.24
Copying 2 to 3: 4.28
Copying 3 to 4: 4.15
Copying 4 to 5: 4.35
Copying 5 to 6: 4.24
Copying 6 to 7: 4.31
Copying 7 to 8: 4.25
Copying 8 to 9: 4.12
Copying 9 to 10: 4.16
Copying 10 to 11: 3.92
Copying 11 to 12: 4.38
Copying 12 to 13: 4.17
Copying 13 to 14: 4.35
Copying 14 to 15: 4.26
Copying 15 to 16: 4.57
Copying 16 to 17: 5.64
Copying 17 to 18: 5.78
Copying 18 to 19: 5.87
Copying 19 to 20: 5.02
Copying 20 to 21: 4.30
Copying 21 to 22: 4.11

Again, this shows no sign of the issue, and even apart from the issue, 64-bit is up to twice as fast as 32-bit.

Alkis Georgopoulos (alkisg) wrote :

I reported the bug upstream in the Linux kernel Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=196157

They suggested a workaround, "echo 1 > /proc/sys/vm/highmem_is_dirtyable", which eliminates the issue, although "it can lead to a premature OOM killer invocations".
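
Before applying the workaround it is worth checking that the knob exists at all: as far as I know it is only registered on kernels built with CONFIG_HIGHMEM (e.g. 32-bit x86), which matches this bug. The helper name below is hypothetical; it only reads the value and changes nothing.

```bash
#!/bin/bash
# Hypothetical helper: report the current vm.highmem_is_dirtyable setting.
# The file is expected only on kernels with CONFIG_HIGHMEM (e.g. 32-bit x86).
highmem_dirtyable_status() {
    local knob=${1:-/proc/sys/vm/highmem_is_dirtyable}
    if [[ -r $knob ]]; then
        echo "highmem_is_dirtyable=$(<"$knob")"
    else
        echo "highmem_is_dirtyable: not available on this kernel"
        return 1
    fi
}
```

Writing to the /proc file does not survive a reboot; the usual way to make a sysctl persistent is a line such as `vm.highmem_is_dirtyable = 1` in a file under /etc/sysctl.d/ (the file name is up to you), keeping the OOM caveat quoted above in mind.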

v4169sgr (andrew-d-scott-uk) wrote :

Thanks. Running

sync && echo "1" | sudo tee /proc/sys/vm/highmem_is_dirtyable

on my system improves the situation for intensive read / write, but does not solve the problem. Here are my test results after running the above commands.

I should be seeing 4s and 5s all the way down, but instead I see inconsistent behaviour and lagginess in apps while running the test.

The solution does seem to be to run a 64-bit kernel.

Copying /lib to 1: 14.47
Copying 1 to 2: 4.17
Copying 2 to 3: 4.15
Copying 3 to 4: 4.15
Copying 4 to 5: 4.40
Copying 5 to 6: 6.28
Copying 6 to 7: 13.83
Copying 7 to 8: 12.96
Copying 8 to 9: 12.81
Copying 9 to 10: 10.34
Copying 10 to 11: 10.22
Copying 11 to 12: 5.42
Copying 12 to 13: 4.76
Copying 13 to 14: 4.51
Copying 14 to 15: 8.95
Copying 15 to 16: 9.11
Copying 16 to 17: 8.03

Alkis Georgopoulos (alkisg) wrote :

Hi Andrew, no one is watching this bug report; please comment on the upstream bug if you want the kernel developers to listen to your feedback.

https://bugzilla.kernel.org/show_bug.cgi?id=196157

Alkis Georgopoulos (alkisg) wrote :

I marked this as a duplicate of bug #1333294.
In particular, the link mentioned there, http://flaterco.com/kb/PAE_slowdown.html, has a very nice overview of the problem AND of the possible workarounds.

I wonder if it would be possible to enable the VMSPLIT_2G setting on Ubuntu...
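
To see which address-space split and high-memory options a given kernel was built with, its config can be grepped. A sketch assuming the config is installed as /boot/config-$(uname -r), as Ubuntu kernel packages do; the helper name is hypothetical, and it simply prints whatever is set rather than assuming a particular split.

```bash
#!/bin/bash
# Hypothetical helper: list the address-space split and high-memory options
# a kernel was built with. Defaults to the running kernel's installed config.
kernel_split_options() {
    local cfg=${1:-/boot/config-$(uname -r)}
    grep -E '^CONFIG_(VMSPLIT_[0-9A-Z_]+|HIGHMEM(4G|64G)|X86_PAE|PAGE_OFFSET)=' "$cfg"
}
```

VMSPLIT_* is a build-time choice, so enabling VMSPLIT_2G means building or obtaining a different kernel; it cannot be toggled on a running system.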
