Frequent swapping causes system to hang

Bug #595047 reported by Pete Goodall on 2010-06-16
This bug affects 6 people
Affects          Status        Importance  Assigned to  Milestone
Linux            Fix Released  High
linux (Ubuntu)                 Medium      Unassigned

Bug Description

Periodically I notice my system slows to a near standstill, and the hard drive light is constantly on. This seems to be a massive amount of disk I/O and it lasts for a long time (let's say 30 minutes to put a number on it). I installed and ran iotop (`iotop -a`) and it seems to point to jbd2. From what I can see jbd2 is related to ext4 journaling, but I cannot figure out how to kill this operation. It might even be a red herring because I have also stopped the disk activity by killing either Chromium or Firefox. I need to understand what else I can do to troubleshoot this.

$ lsb_release -rd
Description: Ubuntu maverick (development branch)
Release: 10.10

Up-to-date as of 16th June 2010.
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC272X Analog [ALC272X Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: i386
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC272X Analog [ALC272X Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: pgoodall 1372 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0x56440000 irq 44'
   Mixer name : 'Realtek ALC272X'
   Components : 'HDA:10ec0272,1025022c,00100001'
   Controls : 14
   Simple ctrls : 8
DistroRelease: Ubuntu 10.10
Frequency: Once a day.
HibernationDevice: RESUME=UUID=145f27a9-859a-4987-8132-ac878c832747
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Alpha i386 (20100602.2)
MachineType: Acer AO531h
Package: linux (not installed)
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.35-6-generic root=UUID=11f96f8b-5e04-4e20-a201-0fa5d0fc07fa ro quiet splash
ProcEnviron:
 PATH=(custom, user)
 LANG=en_GB.utf8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.35-6.9-generic 2.6.35-rc3
Regression: Yes
RelatedPackageVersions: linux-firmware 1.37
Reproducible: No
Tags: maverick ubuntu-une kconfig regression-potential needs-upstream-testing
Uname: Linux 2.6.35-6-generic i686
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
dmi.bios.date: 12/22/2009
dmi.bios.vendor: Acer
dmi.bios.version: v0.3304
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.vendor: Acer
dmi.board.version: Base Board Version
dmi.chassis.type: 1
dmi.chassis.vendor: Chassis Manufacturer
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAcer:bvrv0.3304:bd12/22/2009:svnAcer:pnAO531h:pvr1:rvnAcer:rn:rvrBaseBoardVersion:cvnChassisManufacturer:ct1:cvrChassisVersion:
dmi.product.name: AO531h
dmi.product.version: 1
dmi.sys.vendor: Acer

This is an attempt at bringing sanity to bug #7372. Please only comment here if you are experiencing high I/O wait times and degraded interactivity on reasonable workloads.

Latest working kernel version: 2.6.18?

Problem Description:
I/O operations on large files tend to produce extremely high iowait times and poor system I/O performance (degraded interactivity). This behavior can be seen to varying degrees in tasks such as,
 - Backing up /home (40GB with numerous large files) with diffbackup to external USB hard drive
 - Moving messages between large maildirs
 - updatedb
 - Upgrading large numbers of packages with rpm

Steps to reproduce:
The best synthetic reproduction case I have found is,
$ dd if=/dev/zero of=/tmp/test bs=1M count=1M
During this copy, IO wait times are very high (70-80%) with extremely degraded interactivity although throughput averages about 29MB/s (about the disk's capacity I think). Even starting a new shell takes minutes, especially after letting the machine copy for a while without being actively used. Could this mean it's a caching issue?
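
To make the stall easier to quantify while the copy runs, here is a minimal monitoring sketch (assuming iotop and vmstat are installed; the file name and size are only examples):

$ dd if=/dev/zero of=/tmp/test bs=1M count=1024   # terminal 1: sustained buffered writes
$ vmstat 1                                        # terminal 2: the "wa" column shows iowait per second
$ sudo iotop -a -o                                # terminal 3: accumulated I/O per process, active ones only

The accumulated totals in iotop should show whether the writer itself or the journalling thread (jbd2) is responsible for most of the traffic.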

For the record, this is even reproducible with Linus's master.

I'm also having this problem.

Latest working kernel version: 2.6.18.8 with config:
http://svn.pardus.org.tr/pardus/2007/kernel/kernel/files/pardus-kernel-config.patch

Currently working on 2.6.25.20 with config:
http://svn.pardus.org.tr/pardus/2008/kernel/kernel/files/pardus-kernel-config.patch

Tested also with 2.6.28 and felt no significant performance improvement.

--

During heavy disk I/O, such as running 'svn up', the system is hogged, preventing me from starting a new shell, browsing the internet, doing some text editing with vim, etc.

For example, after managing to open a text buffer in vim, delays of 4-5 seconds occur between consecutive search attempts.


Hello Ben,

I don't know where to post it exactly. Why Linux Memory Management? Or why -mm and not mainline? Can you do it for me please?

I have added a second test case, which uses threads with pthread_mutex and pthread_cond instead of processes with pipes for communication, to check whether it is a CPU scheduler issue.

I have repeated the tests with some vanilla kernels again, as there is a remark in the bug report about tainted or distro kernels. As I got a segmentation fault with the 2.6.28 kernel, I added the result of the Ubuntu 9.04 kernel (see attachment). The results are not comparable to the results posted before, as I have changed the time handling (doubles instead of int32_t, since some echo messages take more than one second).
The first three results are 2*100, 2*50 and 2*20 processes exchanging 100k, 200k and 1M messages over a pipe. The last three results are 2*100, 2*50 and 2*20 threads exchanging 100k, 200k and 1M messages with pthread_mutex and pthread_cond. I have added a 10 second pause at the beginning of every thread/process to ensure the 2*100 processes or threads are all created and start exchanging messages at nearly the same time. This was not the case in the old test case with 2*100 processes, as the first thread was already destroyed before the last was created.

With the second test case using threads, I hit the problems (threads: 2*100 / msg: 1M) immediately with kernel 2.6.22.19, whereas kernel 2.6.20.21 was fine with both test cases.

The meaning of the results:
- min message time
- average message time (80% of the messages)
- message time at median
- maximal message time
- test duration

Here are the results.
Linux balrog704 2.6.20.21 #1 SMP Wed Jan 14 10:11:34 CET 2009 x86_64 GNU/Linux
min:0.000ms|avg:0.241-0.249ms|mid:0.244ms|max:18.367ms|duration:25.304s
min:0.002ms|avg:0.088-0.094ms|mid:0.093ms|max:17.845ms|duration:19.694s
min:0.002ms|avg:0.030-0.038ms|mid:0.038ms|max:564.062ms|duration:38.370s
min:0.002ms|avg:0.004-0.007ms|mid:0.004ms|max:1212.746ms|duration:33.137s
min:0.002ms|avg:0.004-0.005ms|mid:0.004ms|max:1092.045ms|duration:31.686s
min:0.002ms|avg:0.004-0.007ms|mid:0.004ms|max:4532.159ms|duration:59.773s

Linux balrog704 2.6.22.19 #1 SMP Wed Jan 14 10:16:43 CET 2009 x86_64 GNU/Linux
min:0.003ms|avg:0.394-0.413ms|mid:0.403ms|max:19.673ms|duration:42.422s
min:0.003ms|avg:0.083-0.188ms|mid:0.182ms|max:13.405ms|duration:37.038s
min:0.003ms|avg:0.056-0.075ms|mid:0.070ms|max:656.112ms|duration:72.943s
min:0.003ms|avg:0.005-0.010ms|mid:0.007ms|max:1756.113ms|duration:49.163s
min:0.003ms|avg:0.005-0.010ms|mid:0.007ms|max:11560.976ms|duration:52.836s
min:0.003ms|avg:0.008-0.010ms|mid:0.010ms|max:5316.424ms|duration:111.323s

Linux balrog704 2.6.24.7 #1 SMP Wed Jan 14 10:21:04 CET 2009 x86_64 GNU/Linux
min:0.003ms|avg:0.223-0.450ms|mid:0.428ms|max:8.494ms|duration:46.123s
min:0.003ms|avg:0.140-0.209ms|mid:0.200ms|max:12.514ms|duration:39.100s
min:0.003ms|avg:0.068-0.084ms|mid:0.076ms|max:38.778ms|duration:78.157s
min:0.003ms|avg:0.454-0.784ms|mid:0.625ms|max:11.063ms|duration:65.619s
min:0.004ms|avg:0.244-0.399ms|mid:0.319ms|max:21.018ms|duration:64.741s
min:0.003ms|avg:0.061-0.138ms|mid:0.111ms|max:23.861ms|durati...


Created attachment 19795
test case with processes and pipes

Created attachment 19796
test case with threads and mutexes

Pete Goodall (pgoodall) wrote :

I think this is related to swap, actually. I have decreased swappiness to 10, and will see if I encounter this again. Not sure what the next steps are if that works.
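
(For reference, a minimal sketch of how swappiness is usually lowered; the value 10 is simply the one mentioned above:)

$ cat /proc/sys/vm/swappiness                             # current value, 60 by default
$ sudo sysctl vm.swappiness=10                            # apply immediately
$ echo "vm.swappiness=10" | sudo tee -a /etc/sysctl.conf  # keep the setting across reboots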

Pete Goodall (pgoodall) wrote :

Apparently reducing the swappiness didn't help. My system is currently hopelessly locked (and I'm writing this on another device on my desk). I'm ssh'ed into my netbook and running `iotop -a`. Here are the top two items:

    1 be/4 root 10.34 M 0.00 B ?unavailable? init
   33 be/4 root 16.00 K 340.00 K ?unavailable? [kswapd0]

The init read numbers keep going up (currently 10.86 MB) and the kswapd write numbers are going up as well.

Pete Goodall (pgoodall) wrote :

To add to the last comment, kswapd has just leaped up to over 7 MB written and is rising fast! This is over a span of a couple of minutes.

affects: ubuntu → linux (Ubuntu)
Pete Goodall (pgoodall) wrote :

Ok, from what I can tell the system is swapping way too easily. If I'm running Chromium + XChat I'm fine. As soon as I open OpenOffice or Evolution or Rhythmbox, that seems to push the memory over the edge and the system starts swapping. If I can manage to get to a terminal in time and kill the last application I started, my system will return to normal. If I don't, I might as well just hard power off. I have set the swappiness to 0 in /etc/sysctl.conf.

Afaict there is no one program that is causing the system to swap. It just seems that Ubuntu is oversensitive. There was no problem with this workload on Lucid, so I don't think I'm overstressing the system.

Jeremy Foshee (jeremyfoshee) wrote :

Hi Pete,

Please be sure to confirm this issue exists with the latest development release of Ubuntu. ISO CD images are available from http://cdimage.ubuntu.com/daily/current/ . If the issue remains, please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 595047

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Pete Goodall (pgoodall) wrote :

Per some advice I received on #ubuntu-kernel I'm attaching some debugging output. Here is a list of the files:

vmstat-log-7-July-2010.txt: Output of `vmstat 1 60` once the system started swapping
free.txt: Output of `free -m` a minute later
free-2.txt: Output of `free -m` about 10 seconds after I ran free the first time
top-ouput.txt: Output of `top -b -n 1`

In case it isn't reflected in the top output, I'm running Chromium with four (?) tabs open (no Flash site or anything like that), Xchat connected to two servers and with about seven channels open, Evince viewing a simple pdf document, a Nautilus window, one gnome-terminal window and Banshee playing an mp3. Again, this is a normal load for me that didn't have a problem with previous versions of Ubuntu on the same device.

Pete Goodall (pgoodall) wrote :

I made the last comment before I read Jeremy's comments (should have refreshed the bug report). Anyway, I'll attach the output he is looking for and be sure to test the upstream kernel as well.

apport information

tags: added: apport-collected
description: updated

I have reproduced the bug using the mainline kernel. I installed linux-image-2.6.35-999-generic from http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/, and rebooted running that kernel. I was running Chromium, Xchat and Banshee w/ no problem, so I decided to try to stress the system a bit more. I started Tomboy and opened one of my notes with no problem. I opened Nautilus and the system started swapping. I tried to get a vmstat by ssh'ing into the system, but by the time I was able to login the swapping had subsided. It started again and I ran vmstat to collect some stats. Don't know if this is useful, but will attach it.

tags: removed: needs-upstream-testing
Pete Goodall (pgoodall) wrote :

I noticed there is a 'needs kernel logs' tag. Is there something else you need attached? If so, do you need it attached running both the current maverick kernel and the mainline kernel?

Pete Goodall (pgoodall) wrote :

Marking as 'New' since I think I have supplied all the required information.

Changed in linux (Ubuntu):
status: Incomplete → New
Pete Goodall (pgoodall) wrote :

In desperation I tried re-installing to see if I just had something installed with a serious memory leak. Unfortunately, this has not improved the situation. My device is next to useless until this is resolved.

Pete Goodall (pgoodall) on 2010-08-09
summary: - massive i/o renders the system unusable
+ Frequent swapping causes system to hang
Changed in linux (Ubuntu):
assignee: nobody → Jeremy Foshee (jeremyfoshee)
tags: added: kernel-core kernel-needs-review
removed: needs-kernel-logs
Pete Goodall (pgoodall) wrote :

Apparently there are some patches for what appears to be this issue. Linked from the upstream kernel bug report: http://www.phoronix.com/scan.php?page=news_item&px=ODQ3Mw

Changed in linux (Ubuntu):
assignee: Jeremy Foshee (jeremyfoshee) → nobody
Brad Figg (brad-figg) on 2010-12-03
tags: added: acpi-namespace-lookup
tags: added: acpi-parse-exec-fail
Changed in linux:
status: Unknown → Confirmed
Changed in linux:
importance: Unknown → High
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux:
status: Confirmed → Fix Released

Pete Goodall, thank you for reporting this and helping make Ubuntu better. Maverick reached EOL on April 10, 2012.
Please see this document for currently supported Ubuntu releases:
https://wiki.ubuntu.com/Releases

We were wondering if this is still an issue in a supported release? If so, could you please test for this with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the kernel in the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested and remove the tag:
needs-upstream-testing

This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the text:
needs-upstream-testing

If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested.

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested.

If you are unable to test the mainline kernel, please comment as to why specifically you were unable to test it and add the following tags:
kernel-unable-to-test-upstream
kernel-unable-to-test-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested.

Please let us know your results. Thank you for your understanding.

Helpful Bug Reporting Tips:
https://help.ubuntu.com/community/ReportingBugs

tags: added: maverick
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete

On 3.9-rc5 I get a pretty good result copying from a USB flash drive to HDD: high speed (30+ MB/s) and 2-20% iowait, very good. HDD -> flash gives 50-60% iowait, but without any noticeable performance problems; copying speed is 12-16 MB/s (the maximum for my USB flash drive).

dd if=/dev/zero of=~/ololo bs=1M count=1024 produces high iowait and some performance problems (3D apps show small freezes, the FPS becomes unstable, and the DE slows down; i.e., tasks are starving?).

One very important problem: swap. Even with a small amount of swap in use there are problems with smooth operation. When the kernel starts using swap very heavily, freezes can last for minutes; everything freezes. It feels like the memory manager works with swap in blocking mode %) Is multitasking locked while pages are moved between RAM and swap? Any ideas why that happens?

Oh, I forgot: my SATA controller is an MCP67 (the most buggy chip?).

All who believe that this problem has been fixed, please open this link in Google Chrome: http://ec2-54-229-117-209.eu-west-1.compute.amazonaws.com/party.html

Mikhail, that is not related to this bug; there is no large I/O on that page, only canvas allocation that eats up RAM:
  function partyHard( drunkenness ) {

    var mapCanvas = [];
    var mapCanvasCtx = [];
    for (var i = 0; i < drunkenness * 1200; i++) {
        mapCanvas[i] = document.createElement('canvas');
        mapCanvas[i].width = 2500;
        mapCanvas[i].height = 2500;
        mapCanvasCtx[i] = mapCanvas[i].getContext('2d');
        mapCanvasCtx[i].fillStyle = 'rgb(0, 0, 0)';
        mapCanvasCtx[i].fillRect( 0, 0, 1700, 1700 );
    }
    console.log(window);
  }

In this example, the large I/O will come from the swap file. Try increasing the size of swap to 64 GB and repeat the experiment. On my system with 16 GB of RAM and no swap there are no freezes. If you increase the size of swap to 64 GB, then the system dies 100% of the time. :(

Swap in Linux is something fantastic. It feels like the scheduler is locked while a RAM page is being written to swap. We expect lag in one program, but the lag is global! That's awesome! :)

On my i7 + 8 GB RAM + SATA 750 GB HDD: if HDD swap is working -> the system freezes and lags -> even the mouse!!! lags!

kernel 3.10 (and many other versions)

To work around it I use zram swap + HDD swap. The lags are reduced, but have not gone away.

(In reply to 3draven from comment #628)
> On my i7 + 8 GB RAM + SATA 750 GB HDD: if HDD swap is working -> the system
> freezes and lags -> even the mouse!!! lags!
>
> kernel 3.10 (and many other versions)
>
> To work around it I use zram swap + HDD swap. The lags are reduced, but have
> not gone away.

Yep, experiencing the same, currently on 3.10.15. Letting memory usage reach the point of swapping on Linux is madness for the user. It means 8 GB of RAM is the minimum for any above-average workload.

Nobody cares about problems with swap I/O :(

Heya,

after some years I have resolved MY problem with the responsiveness of the computer with regard to HIGH I/O. For all the others I can only say: try it. If it works for you, be happy; if not...

For starters I wouldn't call this a bug. It's a DEFECT. Because if I have 20 servers with different Linux flavours and distributions, many of them compiled from scratch, and if I have 200 Ubuntu desktops that all behave the same when I use the command dd if=/dev/zero of=test.img bs=1M count=xxx (above 1 GB file size), meaning this command grinds the system to a halt, and this problem has stuck around for so many years and so many kernels, then for me it is a DEFECT.

For the past week I've been trying the BFQ patch for kernel 3.9 on several machines. One machine I have been testing heavily. I have had this machine for some years now, a Core i7 with 12 GB and 6 HDs in RAID 10. On it I also had the problem, and it was somewhat better with the BFS patch, but it was still happening.
With the BFQ patch it's working perfectly. At one point I had two dd's running (dd if=/dev/zero of=test.img bs=1M count=100k), was creating a VirtualBox VDI of 60 GB, opening 10 ODS documents, watching YouTube, watching an HD movie in VLC and some other stuff, and the desktop / system was as responsive as if nothing was using it. Just like I remember Linux being some years ago. And now I have a sustained throughput of 470 MB/s to the HD without my computer going to /dev/null.

So BFQ solved this problem for me. Maybe it's not stable yet, but for me it's more stable than using CFS !!!

Just my two cents. And this bug is closed for me, but only NOW !!!
For all others out there I wish you luck

Just a general FYI: BFQ just did a fresh release where they claim another batch of significant improvements for whatever they're doing.

> So BFQ solved this problem for me. Maybe it's not stable yet,
> but for me it's more stable than using CFS !!!

BFQ and CFS are not congruent. May be you meant BFS?

(In reply to devsk from comment #633)
> > So BFQ solved this problem for me. Maybe it's not stable yet,
> > but for me it's more stable than using CFS !!!
>
> BFQ and CFS are not congruent. May be you meant BFS?

Sorry, my mistake. I meant CFQ. But on the other hand BFS, too. BFS did give me some improvements, I could listen to music while I created a big file, but that was all. So CFQ without BFS was a no-go, CFQ with BFS helped a little, but BFQ alone solved the problems I have had for the past 4-5 years, in which I had to bend and improvise to create a VDI of 60 GB and hope that my computer stayed alive until it finished the job, and mind you, on a computer that has resources in abundance. :)

I have successfully reproduced this bug on my HP Z200 under Ubuntu 12.04 LTS. After some investigation I found that the main reason for this bug is a very ugly bottleneck in the block device layer: the cores of my Z200 spend almost all of their time spinning on a spinlock while IRQs are disabled on ALL cores.

I'm still seeing this.

Setup: Debian 7 Wheezy, amd64 backports kernel (3.11-0.bpo.2-amd64), ~45MB/s write of a low number of large files by rsync (fed through a GBit ethernet link) on an ext3 FS (rw,noatime,data=ordered) in a LVM2 partition on a hardware RAID5.

Observation: The machine (32-core Xeon E5-4650, 192 GB RAM), primarily servicing multiple interactive users via SSH, x2go and SunRay sessions, gets completely unusable during and quite some time after the rsync transfer. TCP connections to SunRay clients time out, IRC connections are dropped, even simple tools like "htop" don't do anything but clear the screen after being started. "iotop" shows a [jbd2/dm-1-8] process on top, reportedly doing "99.99%" I/O (but not reading or writing a single byte, maybe because it's a kernel thread?).

Once I switch from the default CFQ I/O scheduler to "deadline" (echo deadline > /sys/block/sdb/queue/scheduler), the symptoms disappear completely.
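
For anyone who wants to try the same workaround, a sketch (sdb is just the device from the comment above, and the udev rule path is only an example):

$ cat /sys/block/sdb/queue/scheduler        # the scheduler shown in brackets is the active one
$ echo deadline | sudo tee /sys/block/sdb/queue/scheduler

One way to make it persistent is a udev rule, e.g. /etc/udev/rules.d/60-io-scheduler.rules:
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="deadline"
(older kernels also accept the elevator=deadline boot parameter)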

Still facing this bug. Kernel 3.16.

Is it possible to reserve 5% of I/O for user/other processes' needs? Any fast download or copy eats 99.99% of the I/O and the system becomes hard to use.

I'm curious: this bug was ostensibly fixed years ago; however, I dare everyone who owns an Android smartphone to run a simple test. Invoke any terminal emulator and execute this command:

$ cat < /dev/zero > /sdcard/EMPTY

What's terribly unpleasant is that _all_ CPU cores become busy (more than 75% load), and the CPU jumps into the highest performance state, i.e. frequency, i.e. power consumption. Obviously this is wrong, bad and shouldn't happen. This test is kinda artificial as no Android app can create such a high IO load, but then there are multiple phones out there with either 5GHz MIMO 802.11n or 802.11ac chips which allow up to 80MB/sec throughput which can easily saturate most if not all internal MMC cards and have the same effect as the above command.

Perhaps vanilla kernel bugzilla is not a place to discuss bugs in Android, but latest Android releases usually feature kernels 3.10.x and 4.1.x without that many patches, so this bug is still there. Both these kernels are currently maintained and supported. Android by default never uses SWAP (one of the reasons for this bug).

Go figure.

P.S. Sample apps from Google Play:

* CPU Stats by takke
* Terminal Emulator by Jack Palevich

I've just experienced this issue with 3.19.0-32-generic on Ubuntu.
KTorrent was downloading files to an NTFS filesystem on a SATA3 drive (via FUSE, download speed was about 100 Mbit/s), and simultaneously I copied files from that filesystem to a USB 3.0 flash drive with an NTFS filesystem. That resulted in poor interactive performance: mouse and window lag. The workaround was to suspend the torrent download until the files were copied.

Hardware: One AMD FX(tm)-8320 Eight-Core Processor, 8 GB RAM.

This bug is definitely not fixed. A simple cp from one drive to another makes a huge impact on my desktop. Trying to do an rsync is even worse. It seems to mainly be a problem with large files. My system is old (Athlon II 250) but even an old P3 running Win98 doesn't lag this bad from just copying files.

I'm trying to copy 50 GB from one tower to another via USB 3.0 and it is really no fun. If I copy all the files at once, the speed decreases constantly; after 30 minutes it copies at 1.0 MB/s. If I copy a bunch of directories it is a little bit better, but the speed also decreases. For 2 GB my Linux system needs more than an hour. This bug is definitely not fixed. On Windows this USB stick works without that speed loss.

OS: Fedora 24
Kernel: 4.8.15

I've noticed that this does not happen every time, even with exactly the same USB stick. For my 2 GB of files (Eclipse with a workspace and a project) I needed one hour to copy. It started at 60 MB/s and decreased to 500 KB/s. Now I am copying 16 GB (Android Studio and some other projects) and it only needs about 15 minutes. The copy speed started at 70 MB/s and at the end it was 22 MB/s. So it also decreases, but not as fast as in my 2 GB copy.

It seems kernel developers do not look at this topic here; it is much better to write to the mailing list.

Does Jens' buffered writeback throttling patchset solve your issue?

(In reply to bes1002t from comment #642)
> I'm trying to copy 50 GB from one tower to another via USB 3.0 and it is
> really no fun. If I copy all the files at once, the speed decreases
> constantly; after 30 minutes it copies at 1.0 MB/s. If I copy a bunch of
> directories it is a little bit better, but the speed also decreases. For 2 GB
> my Linux system needs more than an hour. This bug is definitely not fixed. On
> Windows this USB stick works without that speed loss.
>
> OS: Fedora 24
> Kernel: 4.8.15

This bug report has nothing to do with the speed of copying data to USB flash drive. It's about substantially degraded interactivity which manifests in slowness and it's hard to believe you can perceive it via an SSH session.

I'm inclined to believe your bug is related to other subsystems like USB.

> It seems kernel developers do not look at this topic here; it is much better
> to write to the mailing list.

Kernel bugzilla has always been neglected. Thousands of bug reports which have zero comments from prospective developers. LKML is a hit and miss too. Your developer skipped your e-mail because he/she was busy? Bad luck.

@bes1002t: I think throughput is a different issue than this, although it might well be related.

But the most important thing would be for someone to create an I/O concurrency/latency benchmark. Maybe the Phoronix Test Suite is an adequate tool for that? It can also be used for automatic bisecting.

I clearly remember pre-2.6.18 times, when I had a much inferior machine, and while Gentoo's emerge was compiling stuff in the background with multiple threads, I could browse the web, switch between programs and play an HD stream without any hiccup or stalling.

@bes1002t: Copying to a USB device always starts at the speed of the hard drive, as everything is cached until the write cache is full, and ends at the speed of the USB drive. The write process has to wait until all the data is written.

@Artem S. Tashkinov: The stall problems on an SSH session exist, or existed. I migrated an old server with CentOS 6 and copied some VM images. The SSH responsiveness was very bad; I had to wait up to 20 seconds for tab completion.

In many cases it was a swap problem, as the buffers were full and the caches needed a long time to be written to the slow USB device. The server starts to swap out process data. It's only a very small amount of data. I could increase the overall desktop performance with a RAM upgrade.

Try Kernel 4.10.

>Improved writeback management
>
>Since the dawn of time, the way Linux synchronizes to disk the data written to
>memory by processes (aka. background writeback) has sucked. When Linux writes
>all that data in the background, it should have little impact on foreground
>activity. That's the definition of background activity... But for as long as it
>can be remembered, heavy buffered writers have not behaved like that. For
>instance, if you do something like $ dd if=/dev/zero of=foo bs=1M count=10k,
>or try to copy files to USB storage, and then try and start a browser or any
>other large app, it basically won't start before the buffered writeback is
>done, and your desktop, or command shell, feels unresponsive. These problems
>happen because heavy writes -the kind of write activity caused by the
>background writeback- fill up the block layer, and other IO requests have to
>wait a lot to be attended (for more details, see the LWN article).
>
>This release adds a mechanism that throttles back buffered writeback, which
>makes more difficult for heavy writers to monopolize the IO requests queue,
>and thus provides a smoother experience in Linux desktops and shells than what
>people was used to. The algorithm for when to throttle can monitor the
>latencies of requests, and shrinks or grows the request queue depth
>accordingly, which means that it's auto-tunable, and generally, a user would
>not have to touch the settings. This feature needs to be enabled explicitly in
>the configuration (and, as it should be expected, there can be regressions)


> Try Kernel 4.10.
It does not help with my workload :(
The mouse pointer and keyboard input still freeze.

Make sure your kernel has that option enabled.

>This feature needs to be enabled explicitly in
>the configuration (and, as it should be expected, there can be regressions)

Created attachment 255491
$ cat /boot/config-`uname -r`

(In reply to Mikhail from comment #651)

First, I'd recommend trying to disable SWAP completely - it might help:

$ sudo swapoff -a

If you compile your own kernel or your distro hasn't enabled them for you, here's the list of the options you need to enable:

BLK_WBT, enable support for block device writeback throttling
BLK_WBT_MQ, multiqueue writeback throttling
BLK_WBT_SQ, single queue writeback throttling

They are all under "Enable the block layer".

If disabling swap and enabling these options have no effect, please ***create a new bug report*** and provide the following information:

CPU
Motherboard and BIOS version
RAM type and volume
Storage and its type
Kernel version and its .config

And also the complete output of these utilities:

dmesg
lspci -vvv
lshw
free
vmstat (when the bug is exposed)

cat /proc/interrupts
cat /proc/iomem
cat /proc/meminfo
cat /proc/mtrr

>CONFIG_BLK_WBT=y
># CONFIG_BLK_WBT_SQ is not set
>CONFIG_BLK_WBT_MQ=y

So writeback throttling is enabled only for multi-queue devices in your case. I suppose you need to use blk-mq for your sd* devices to activate writeback throttling (the scsi_mod.use_blk_mq=1 boot flag), or to recompile the kernel with CONFIG_BLK_WBT_SQ enabled.
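
A sketch of one way to set that boot flag permanently (the GRUB file and update command below are the usual Debian/Ubuntu ones; adjust for other distributions):

In /etc/default/grub, extend the kernel command line, for example:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash scsi_mod.use_blk_mq=1"

$ sudo update-grub && sudo reboot
$ cat /proc/cmdline                                # the flag should be present after reboot
$ cat /sys/module/scsi_mod/parameters/use_blk_mq   # should report Y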

Created attachment 255501
all required files in one archive

After setting the boot flag "scsi_mod.use_blk_mq=1", the freezes became much shorter. I'm not sure now that they are at the kernel level. It looks more like the window manager (GNOME Mutter) is written in such a way that it freezes the mouse while loading the list of applications. To finally defeat the freezes, it seems the window manager needs to not be paged out to the swap file.

I also captured vmstat output when a freeze occurred:
# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 2 6 15947052 205136 112592 4087608 32 41 93 119 7 23 43 19 37 1 0

Twice I asked you to try disabling SWAP altogether and you still haven't.

I'm unsubscribing from this bug report.

Created attachment 274511
Per device dirty ratio configuration support

Per device dirty bytes configuration

Per-device dirty bytes configuration. The patch is not ideal; I made it for smooth flash drive writing by setting a smaller dirty_bytes value per removable device.

>> Path
# ls /sys/block/sdc/bdi/
dirty_background_bytes dirty_background_ratio dirty_bytes dirty_ratio max_ratio min_pages_to_flush min_ratio power read_ahead_kb stable_pages_required subsystem uevent

>> udev rule for removable devices
# cat /etc/udev/rules.d/90-dirty-flash.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{removable}=="1", ATTR{bdi/dirty_bytes}="4194304"

Hello Ben,

I was trying to figure out the issue, but I'm not really sure what exactly the error is here. Is the bug fixed already? Or do we have some sources that might help us? Thank you

Carlo B.
https://bondereduction.ci/

Displaying the first 40 and last 40 comments.