USB file transfer causes system freezes; ops take hours instead of minutes

Bug #500069 reported by sbec67
This bug affects 143 people
Affects          Status        Importance  Assigned to  Milestone
Linux            Fix Released  High
linux (Fedora)   Won't Fix     High
linux (Ubuntu)   Fix Released  High        Unassigned

Nominated for Lucid by Mudstone

Bug Description

The USB drive is a 2GB MP3 player.

sbec@Diamant:~$ lsusb
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 002: ID 046d:c50e Logitech, Inc. MX-1000 Cordless Mouse Receiver
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 004: ID 0402:5661 ALi Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
sbec@Diamant:~$

Linux Diamant 2.6.31-15-generic #50-Ubuntu SMP Tue Nov 10 14:54:29 UTC 2009 i686 GNU/Linux
Ubuntu 2.6.31-15.50-generic

To test, I issued a dd command:
dd if=/dev/zero of=/media/usb-disk/test-file bs=32

While dd was running I ran dstat; its output is in the attached log file.

Other logs are also in the attached tar.gz file.

There is a huge USB performance bug report, #1972262; this report is something similar.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

This is an attempt at bringing sanity to bug #7372. Please only comment here if you are experiencing high I/O wait times and poor interactivity on reasonable workloads.

Latest working kernel version: 2.6.18?

Problem Description:
I/O operations on large files tend to produce extremely high iowait times and poor system I/O performance (degraded interactivity). This behavior can be seen to varying degrees in tasks such as:
 - Backing up /home (40GB with numerous large files) with diffbackup to external USB hard drive
 - Moving messages between large maildirs
 - updatedb
 - Upgrading large numbers of packages with rpm

Steps to reproduce:
The best synthetic reproduction case I have found is,
$ dd if=/dev/zero of=/tmp/test bs=1M count=1M
During this copy, IO wait times are very high (70-80%) with extremely degraded interactivity although throughput averages about 29MB/s (about the disk's capacity I think). Even starting a new shell takes minutes, especially after letting the machine copy for a while without being actively used. Could this mean it's a caching issue?
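
A minimal way to watch the iowait climb while such a dd runs (a sketch; vmstat ships with procps and iostat with the sysstat package) is to sample once per second in another shell:

vmstat 1        # the "wa" column is the percentage of CPU time spent waiting on I/O
iostat -x 1     # per-device utilisation and average wait times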

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

For the record, this is even reproducible with Linus's master.

Revision history for this message
In , ozan (ozan-linux-kernel-bugs) wrote :

I'm also having this problem.

Latest working kernel version: 2.6.18.8 with config:
http://svn.pardus.org.tr/pardus/2007/kernel/kernel/files/pardus-kernel-config.patch

Currently working on 2.6.25.20 with config:
http://svn.pardus.org.tr/pardus/2008/kernel/kernel/files/pardus-kernel-config.patch

Tested also with 2.6.28 and felt no significant performance improvement.

--

Heavy disk I/O, like running 'svn up', hogs the system, preventing me from starting a new shell, browsing the internet, doing some text editing with vim, etc.

For example, after finally being able to open a text buffer with vim, 4-5 second delays occur between consecutive search attempts.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Hello Ben,

I don't know where to post it exactly. Why Linux Memory Management? Or why -mm and not mainline? Can you do it for me, please?

I have added a second test case, which uses threads with pthread_mutex and pthread_cond instead of processes communicating over pipes, to confirm it is a CPU scheduler issue.

I have repeated the tests with some vanilla kernels again, as there is a remark in the bug report about tainted or distro kernels. As I got a segmentation fault with the 2.6.28 kernel, I added the result of the Ubuntu 9.04 kernel (see attachment). The results are not comparable to the results posted before, as I have changed the time handling (doubles instead of int32_t, as some echo messages take more than one second).
The first three results are 2*100, 2*50 and 2*20 processes exchanging 100k, 200k and 1M messages over a pipe. The last three results are 2*100, 2*50, and 2*20 threads exchanging 100k, 200k and 1M messages with pthread_mutex and pthread_cond. I have added a 10 second pause at the beginning of every thread/process to ensure the 2*100 processes or threads are all created and start to exchange messages at nearly the same time. This was not the case in the old test case with 2*100 processes, as the first thread was already destroyed before the last was created.

With the second test case with threads, I got the problems (threads:2*100/msg:1M) immediately with kernel 2.6.22.19. Kernel 2.6.20.21 was fine with both test cases.

The meaning of the results:
- min message time
- average message time (80% of the messages)
- message time at median
- maximal message time
- test duration

Here are the results.
Linux balrog704 2.6.20.21 #1 SMP Wed Jan 14 10:11:34 CET 2009 x86_64 GNU/Linux
min:0.000ms|avg:0.241-0.249ms|mid:0.244ms|max:18.367ms|duration:25.304s
min:0.002ms|avg:0.088-0.094ms|mid:0.093ms|max:17.845ms|duration:19.694s
min:0.002ms|avg:0.030-0.038ms|mid:0.038ms|max:564.062ms|duration:38.370s
min:0.002ms|avg:0.004-0.007ms|mid:0.004ms|max:1212.746ms|duration:33.137s
min:0.002ms|avg:0.004-0.005ms|mid:0.004ms|max:1092.045ms|duration:31.686s
min:0.002ms|avg:0.004-0.007ms|mid:0.004ms|max:4532.159ms|duration:59.773s

Linux balrog704 2.6.22.19 #1 SMP Wed Jan 14 10:16:43 CET 2009 x86_64 GNU/Linux
min:0.003ms|avg:0.394-0.413ms|mid:0.403ms|max:19.673ms|duration:42.422s
min:0.003ms|avg:0.083-0.188ms|mid:0.182ms|max:13.405ms|duration:37.038s
min:0.003ms|avg:0.056-0.075ms|mid:0.070ms|max:656.112ms|duration:72.943s
min:0.003ms|avg:0.005-0.010ms|mid:0.007ms|max:1756.113ms|duration:49.163s
min:0.003ms|avg:0.005-0.010ms|mid:0.007ms|max:11560.976ms|duration:52.836s
min:0.003ms|avg:0.008-0.010ms|mid:0.010ms|max:5316.424ms|duration:111.323s

Linux balrog704 2.6.24.7 #1 SMP Wed Jan 14 10:21:04 CET 2009 x86_64 GNU/Linux
min:0.003ms|avg:0.223-0.450ms|mid:0.428ms|max:8.494ms|duration:46.123s
min:0.003ms|avg:0.140-0.209ms|mid:0.200ms|max:12.514ms|duration:39.100s
min:0.003ms|avg:0.068-0.084ms|mid:0.076ms|max:38.778ms|duration:78.157s
min:0.003ms|avg:0.454-0.784ms|mid:0.625ms|max:11.063ms|duration:65.619s
min:0.004ms|avg:0.244-0.399ms|mid:0.319ms|max:21.018ms|duration:64.741s
min:0.003ms|avg:0.061-0.138ms|mid:0.111ms|max:23.861ms|durati...


Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19795
test case with processes and pipes

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19796
test case with threads and mutexes

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19797
All testresult on Core2 T7700 @ 2.40GHz / 4GB RAM

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I guess the high I/O wait time and the poor responsiveness are the same problem, caused by the CPU scheduler, as I can produce the same symptoms without disk I/O.
Since 2.6.26/27 everyone should be affected by this issue.

What I do not understand is:
Why does the test with threads and mutexes take twice as long as the test with processes and pipes, yet stress the system much more? The mouse freezes almost immediately, while the test with processes and pipes still allows moving windows.

Revision history for this message
In , l.wandrebeck (l.wandrebeck-linux-kernel-bugs) wrote :

I've met the high I/O wait problem with 3ware cards on Centos 5.x.
This is related to pci_try_set_mwi. More information here:
https://bugzilla.redhat.com/show_bug.cgi?id=444759
Now Thomas seems to have found another source for the problem. Maybe mwi is adding on top of that (not every controller driver sets MWI - BIOS is supposed to do so, but I've met a couple of boards that do not).
HTH.

Revision history for this message
In , rick.richardson (rick.richardson-linux-kernel-bugs) wrote :

If I run "google desktop indexer", then I get the long waits. E.G. vim goes away for up to 5-30 seconds, repeatably!

So, I don't run "google desktop indexer". No problem since 12/15/08!

Revision history for this message
In , humufr (humufr-linux-kernel-bugs) wrote :

You can also add the task:

- Copy a file from a CompactFlash card through a USB adapter or PCMCIA card. The computer is not usable until the copy of the file (3 to 5 MB) is finished. It doesn't matter whether it copies the whole card or only one file. It seems to be similar to the description of the bug here.

Revision history for this message
In , unixg33k (unixg33k-linux-kernel-bugs) wrote :

I have found that this may be an issue with the Completely Fair Queuing (CFQ) I/O scheduler that became the default in 2.6.18 (when most started observing this performance issue). Reverting to the old AS scheduler seems to have resolved the problem for me.

To use the AS scheduler and test for yourself, just specify "elevator=as" as a boot option.
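
A runtime alternative, for anyone who wants to compare schedulers without rebooting, is the sysfs elevator switch (a sketch; /dev/sda is an assumed device name, run as root):

cat /sys/block/sda/queue/scheduler        # the active scheduler is shown in brackets
echo deadline > /sys/block/sda/queue/scheduler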

Revision history for this message
In , brice+lklm (brice+lklm-linux-kernel-bugs) wrote :

(In reply to comment #2)
> I'm also having this problem.
>
> Latest working kernel version: 2.6.18.8 with config:
>
> http://svn.pardus.org.tr/pardus/2007/kernel/kernel/files/pardus-kernel-config.patch
>
> Currently working on 2.6.25.20 with config:
>
> http://svn.pardus.org.tr/pardus/2008/kernel/kernel/files/pardus-kernel-config.patch
>
> Tested also with 2.6.28 and felt no significant performance improvement.
>
> --
>
> During heavy disk IO's like running 'svn up' hogs the system avoiding the
> start
> a new shell, browse on the internet, do some text editing using vim, etc.
>
> For example, after being able to open a text buffer with vim, 4-5 seconds
> delays happens between consecutive search attempts.

You seem to be able to reproduce the bug easily, and you have found a non-affected kernel version.
Can you git bisect between those kernels to at least isolate the culprit commit?
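
For reference, the bisect being asked for looks roughly like this (a sketch against Linus's tree, using the good/bad versions mentioned above):

git bisect start
git bisect bad v2.6.25        # a kernel that shows the stalls
git bisect good v2.6.18       # the last known good kernel
# build, boot and test the commit git checks out, then tell git the verdict:
git bisect good               # or: git bisect bad
# repeat until git names the first bad commit, then clean up:
git bisect reset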

Revision history for this message
In , brice+lklm (brice+lklm-linux-kernel-bugs) wrote :

(In reply to comment #3)
>
> With the second test-case with threads, I got the problems
> (threads:2*100/msg:1M) immediately with the kernel 2.6.22.19. There kernel
> 2.6.20.21 was fine with both test-cases.

I'm not sure that's the same issue I had when I posted bug 7372, but since you seem to be a programmer you should git bisect between those kernels to isolate the culprit commit.

Revision history for this message
In , pvz (pvz-linux-kernel-bugs) wrote :

I'm not sure if this is related or not, but I'm getting similar behaviour on my own system, but *only* when copying files *from* my USB memory stick (a 4 GB Corsair Flash Voyager) *to* the internal SSD on my Asus Eee PC 900 running Ubuntu 8.10 with a custom build of Linux 2.6.27 (probably slightly patched) provided by array.org.

I.e. reading a file from the USB stick to /dev/null, no slowdown.
Writing /dev/zero to USB stick, no slowdown.
Reading a file from the internal SSD to /dev/null, no slowdown.
Writing /dev/zero to internal SSD, no slowdown.
Copying a file from internal SSD to USB stick, no slowdown.
Copying a file from USB stick to internal SSD, I get massive slowdowns on interactive performance. Launching a terminal, which usually takes a few seconds, suddenly takes the better part of a minute.

Linux used is 2.6.27-8-eeepc on i686 SMP, as prebuilt by http://www.array.org/ubuntu/

The filesystem on the internal SSD is ext3, running on LVM, running on LUKS (encrypted filesystem). As set up by the Ubuntu 8.10 installer. Swap is also on the same encrypted LVM.

The filesystem on the USB stick is vfat. Nothing fancy at all.

I should also add that the read performance of my USB stick is faster (about 25 MB/s) than the write performance on the built-in SSD (about 10 MB/s).

If you feel that it is useful, I can provide dumps of lspci/lsusb/lsmod or any other information. As for the exact build options and patches, that should be determinable by checking the web site specified above.

Hope more data makes it possible to determine a pattern to this bug.

Revision history for this message
In , humufr (humufr-linux-kernel-bugs) wrote :

I tried the solution from Mike in comment http://bugzilla.kernel.org/show_bug.cgi?id=12309#c11 and indeed that solved my issue. So it seems he is right, at least for my problem.

Revision history for this message
In , pvz (pvz-linux-kernel-bugs) wrote :

I tried elevator=as on my system, and it did not change the behaviour. Copying files from external USB to internal encrypted SSD still totally smashes interactive performance. So this issue might be unrelated.

Revision history for this message
In , unixg33k (unixg33k-linux-kernel-bugs) wrote :

(In reply to comment #16)
> I tried elevator=as on my system, and it did not change the behaviour.
> Copying
> files from external USB to internal encrypted SSD still totally smashes
> interactive performance. So this issue might be unrelated.
>

This may be an unrelated issue having to do with USB I/O - since USB seems to be more CPU intensive anyway.

When I experienced this bug (prior to switching from CFQ), it would happen whenever I copied a large file on ATA or SCSI devices and I noticed extremely high I/O wait times - with very low CPU usage. Not only during copying - but during any disk-intensive operation. Everything on my affected machines would come to a grinding halt until the operation was complete. Using AS for me so far has seemed to resolve the issue - as my machines are now responsive as they should be during heavy disk I/O.

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

I have had a very similar problem to this. I still have it often, but not as much since I changed from ext3 to ReiserFS. For the scheduler, I've been using BFQ or V(R), which are included in the Zen patchset. I have tried the stock kernel, and the same problem exists; however, I can't remember which scheduler I used at that point, I believe deadline.
Most of the I/O wait I get comes either when I'm copying files to the local drives or when using multiple VMs (generally Windows, as that's what is needed for work). I'm willing to try about anything to get this fixed. It's a little better since I switched filesystems on my VM drive, but it still isn't totally fixed.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #11)
> I have found that this may be an issue with the Complete Fair Queuing I/O
> scheduler that was introduced as default in 2.6.18 (when most started
> observing
> this performance issue). Reverting back to the old AS scheduler for me seems
> to have resolved the problem.
>
> To use the AS scheduler and test for yourself, just specify "elevator=as" as
> a
> boot option.
>

Fwiw, I've never used the CFQ scheduler. I'm on the deadline scheduler with my 3ware 9560SE and still see this problem crop up from time to time, usually when doing a file copy large enough to fill the page cache.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

I too have found that the choice of I/O scheduler makes little difference. Using AS generally yields no noticeable improvement.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

>
> Fwiw, I've never used the CFQ scheduler. I'm on the deadline scheduler with
> my
> 3ware 9560SE and still see this problem crop up from time to time, usually
> when
> doing a file copy large enough to fill the page cache.
>

Another deadline user here, and the same thing. There are two clear-cut triggers for me:

1. The test case thomas posted.
2. Large copies which fill up the page cache.

I think it's a process scheduling bug, because page cache fill-up might be triggering the pdflush processes (which are, btw, normal priority; why?) into hyperdrive and causing all other processes to wait. We do see various processes going into 'D' state and pdflush at the top of the CPU usage list when the symptoms occur.

If CFQ is used, and process priority determines I/O priority, aren't the pdflush processes going to compete with processes doing their own I/O when dirty_ratio is reached and the process has a priority equal to or better than 0 (-1 and higher)? That may explain some of the stories with CFQ here.

Revision history for this message
In , andi-bz (andi-bz-linux-kernel-bugs) wrote :

Re: blaming the scheduler in 2.6.26

The problem was observed long before that. There might be additional scheduler problems (this bug in general suffers from the "lots of different problems" disease), but that is unlikely to be the old, well-known disk starvation with different devices issue.

Re comment #9 vim stalls while disk is pounded:

You're running ext3 or reiser, right? That's a known problem: vim regularly does fsync on its auto-save file, and that causes a synchronous JBD transaction, and since all transactions are strictly ordered, if there are enough of them in front and the disk is busy it takes quite a long time.

At least at the higher level that is supposed to be mostly solved by ext4 or by XFS.

Of course it's another problem that the disk schedulers allow that long a starvation in the first place.

Revision history for this message
In , theparanoidone (theparanoidone-linux-kernel-bugs) wrote :

Hi Thomas~

Can you elaborate on your test?

You wrote:
"The first three results are 2*100, 2*50 and 2*20 processes exchanging 100k,
200k and 1M messages over a pipe. The last three results are 2*100, 2*50, and
2*20 threads exchanging 100k, 200k and 1M messages with pthread_mutex and
pthread_cond."

So, I'm guessing you want the test to be run like this:
./processtest 200 100000
./processtest 100 200000
./processtest 40 1000000
./threadtest 200 100000
./threadtest 100 200000
./threadtest 40 1000000

Is that correct? I just want to be sure I'm running the same tests. (Also, the code limits the number of processes to a max of 100, so I edited this to allow a max limit of 200.)

Here's our results:

2.6.15.7-ubuntu1-custom-1000HZ_CLK #1 SMP Thu Jan 15 19:06:30 PST 2009 x86_64 GNU/Linux (ubuntu 6.06.2 server LTS with clk_hz set to 1000HZ)
min:0.004ms|avg:0.004-0.271ms|mid:0.005ms|max:42.049ms|duration:34.029s
min:0.004ms|avg:0.004-0.138ms|mid:0.035ms|max:884.865ms|duration:33.105s
min:0.004ms|avg:0.004-0.042ms|mid:0.004ms|max:2319.621ms|duration:62.438s
min:0.005ms|avg:0.010-0.026ms|mid:0.012ms|max:1407.923ms|duration:92.132s
min:0.005ms|avg:0.011-0.029ms|mid:0.013ms|max:1539.929ms|duration:97.034s
min:0.005ms|avg:0.010-0.031ms|mid:0.013ms|max:18669.095ms|duration:176.555s

2.6.24-23-server #1 SMP Thu Nov 27 18:45:02 UTC 2008 x86_64 GNU/Linux (default ubuntu 64 8.04 server LTS at default 100HZ clock)
min:0.004ms|avg:0.034-0.357ms|mid:0.324ms|max:39.789ms|duration:43.390s
min:0.004ms|avg:0.006-0.149ms|mid:0.131ms|max:79.430ms|duration:39.288s
min:0.004ms|avg:0.046-0.057ms|mid:0.052ms|max:52.427ms|duration:64.481s
min:0.005ms|avg:0.006-0.650ms|mid:0.330ms|max:22.120ms|duration:60.142s
min:0.005ms|avg:0.053-0.309ms|mid:0.276ms|max:21.560ms|duration:62.353s
min:0.004ms|avg:0.033-0.123ms|mid:0.112ms|max:22.007ms|duration:131.029s

Linux la 2.6.24.6-custom #1 SMP Thu Jan 15 23:34:10 UTC 2009 x86_64 GNU/Linux (ubuntu 8.04 server LTS with clk_hz custom set to 1000HZ)
min:0.004ms|avg:0.054-0.364ms|mid:0.332ms|max:24.524ms|duration:42.522s
min:0.004ms|avg:0.125-0.156ms|mid:0.144ms|max:13.171ms|duration:33.573s
min:0.004ms|avg:0.046-0.058ms|mid:0.052ms|max:13.005ms|duration:64.388s
min:0.005ms|avg:0.006-0.594ms|mid:0.302ms|max:13.481ms|duration:61.105s
min:0.005ms|avg:0.109-0.336ms|mid:0.307ms|max:13.345ms|duration:65.000s
min:0.002ms|avg:0.070-0.130ms|mid:0.120ms|max:13.137ms|duration:133.786s

Side note: we have been experiencing problems with MySQL, specifically with sync-binlog=1 and log-bin enabled while performing a high volume of concurrent transactions. Although we run RAID-1 with the battery-backed cache on, our throughput is horrible. For some reason, we have found that by increasing CONFIG_HZ in the kernel from 100 to 1000, we get much higher throughput. Otherwise our benchmarks just sit around and have trouble context switching.

#CONFIG_HZ_100=y
#CONFIG_HZ=100
#change to:
CONFIG_HZ_1000=y
CONFIG_HZ=1000
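
A quick way to confirm which HZ a running kernel was actually built with (a sketch, assuming the distro installs its config under /boot as Ubuntu does):

grep 'CONFIG_HZ' /boot/config-$(uname -r)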

I do not know if the problems we are experiencing with the clock are related to the bug listed here. However, I did want to submit our feedback showing the difference in kernels where our bottleneck runs better.

We use sysbench for our test (wi...


Revision history for this message
In , pvz (pvz-linux-kernel-bugs) wrote :

Did some more testing. My father has an Eee PC 900 exactly the same as mine, also running Ubuntu 8.10 with the same kernel as mentioned before. The only difference I can think of: he doesn't use LUKS and LVM like me; he instead has his / directly on /dev/sdb1 (the internal SSD).

In addition to trying to launch a Terminal via GNOME (as I did previously), I tried the vim "stuttering" test by creating a file, saving it, and holding down a key to see when it stutters.

The results of these tests:

- On both my own (encrypted) and the other (unencrypted) computer, vim occasionally freezes for a few seconds while I cp a file from USB memory to the internal SSD.

- On my computer (encrypted), launching a gnome-terminal takes much longer while copying a file from SSD than on the other computer. While there is a noticeable slowdown on the unencrypted machine, on the encrypted machine sometimes the gnome-terminal won't even launch until *after* the copy is complete.

In conclusion: the effect exists on both machines, but the encryption of the SSD very significantly increases the problem. While some slowdown due to encryption should be expected, it should not make the machine almost completely unusable while copying a file from a USB stick to the internal SSD.

Revision history for this message
In , larppaxyz (larppaxyz-linux-kernel-bugs) wrote :

Different scheduler (#11) doesn't seem to do much. I did some quick and dirty testing with my laptop :

Linux lupaus 2.6.28-customlupaus #4 SMP PREEMPT Thu Dec 25 15:05:35 EET 2008 x86_64 GNU/Linux
Vanilla 2.6.28 kernel, config from Ubuntu 8.10, with some modifications to suit my laptop

with io scheduler cfq
./threadtest 100 200000
min:0.004ms|avg:0.007-0.008ms|mid:0.008ms|max:894.480ms|duration:187.588s

with elevator=as (eg. io scheduler anticipatory)
./threadtest 100 200000
min:0.004ms|avg:0.007-0.008ms|mid:0.008ms|max:884.016ms|duration:188.248s

---

with io scheduler cfq
./proctest 50 100000
min:0.005ms|avg:0.005-0.006ms|mid:0.006ms|max:460.631ms|duration:35.773s

with elevator=as (eg. io scheduler anticipatory)
./proctest 50 100000
min:0.005ms|avg:0.006-0.006ms|mid:0.006ms|max:479.695ms|duration:36.645s

Revision history for this message
In , pvz (pvz-linux-kernel-bugs) wrote :

One more observation from another experiment I did:

I have swap on the same encrypted LVM as my root partition. Disabling swap makes the terminal launch much faster while copying -- still slower than when not copying files, but within a few seconds of clicking instead of within minutes.

However! Now, instead, individual running processes (like Firefox and vim) hang much more aggressively and frequently during copying. I'm not sure what to make of this, but I hope somebody who actually knows something about the Linux kernel will find this useful. :-)

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

I'm not sure any developer will be able to pinpoint the problem in all this mess! ;-) There are likely several bugs here.

For a start, I think it could be nice to separate people whose problem is fixed by elevator=as. And then separate people using encrypted disks. And then problems occurring only with USB disks. Please open new reports. What do developers think?

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19828
Bisect results

I have done the bisect and isolated the patch. In the attachment you can find the bisect result. I have done the sysbench test too.

Tests:
 100 Process / 1k messages

Linux balrog704 2.6.20 #13 SMP Fri Jan 16 10:13:21 CET 2009 x86_64 GNU/Linux
min:0.003ms|avg:0.243-0.253ms|mid:0.246ms|max:29.503ms|duration:25.080s
min:0.002ms|avg:0.022-0.038ms|mid:0.037ms|max:756.082ms|duration:37.894s
min:0.002ms|avg:0.004-0.007ms|mid:0.004ms|max:929.790ms|duration:34.608s

Linux balrog704 2.6.20bad #14 SMP Fri Jan 16 10:52:17 CET 2009 x86_64 GNU/Linux
min:0.003ms|avg:0.411-0.434ms|mid:0.424ms|max:18.328ms|duration:43.549s
min:0.003ms|avg:0.063-0.075ms|mid:0.071ms|max:404.088ms|duration:72.860s
min:0.003ms|avg:0.005-0.010ms|mid:0.009ms|max:712.033ms|duration:51.654s

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19829
sysbench results

As I am using Firefox 3 with the bad kernel, my post was submitted by accident. With the good kernel there are (nearly) no problems with Firefox 3 any more.

The tests were run with the following parameters:
- 2*100 processes / 100k messages
- 2*20 processes / 1M messages
- 2*200 threads / 100k messages

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19830
Bisect results

wrong file

Revision history for this message
In , andi-bz (andi-bz-linux-kernel-bugs) wrote :

Re #26

There's some performance problem in general with encrypted swap. I've seen that too. But it's probably a different issue than the primary one which should be discussed here.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

> Is that correct? Just want to be sure i'm running the same tests (Also, the
> code limits number of processes to max 100... so I just edited this allowing
> the max limit to be 200)
I have used 100/50/20, as one echo process uses 2 threads or processes. But it is not important, as these tests should only compare different kernel versions on the same computer.

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

(In reply to comment #18)
> I have had a very similar problem to this. I still have it often, but not as
> much from when I changed from EXT3 to ReiserFS. For the Scheduler, I've been
> using BFQ or V(R) thats included in the Zen Patchset. I have tried the stock
> kernel, and same problem exists, however I can't remember which scheduler I
> used at that point, I believe Deadline.
> Most of the IOWait I get comes when either I'm copying files to the local
> drives, or using multiple VM's (generally Windows as thats what is needed for
> work). I'm willing to try about anything to get this fixed. It's a little
> better since I switched FS's on my VM Drive, but still isn't totally fixed.
>

I did try the AS scheduler, as that was the only thing I changed in my kernel, and it didn't change anything interactively; I still get a high I/O wait.

The other thing I noticed: at least when using AS, I start using swap. It's not a lot (within about 2 minutes I was using 10MB), but it was still climbing.

One other thing: I'm wondering if this is 64-bit related. All of my personal boxes are 64-bit, and from the reports posted here, along with other threads I've read (over on the Gentoo forums), it seems this hits 64-bit users more than 32-bit users. Any truth to this, or am I trying to relate things that aren't related?

My work box (most heavily used):
Linux PC010233L 2.6.28-zen1-2 #2 SMP PREEMPT Thu Jan 15 16:06:37 EST 2009 x86_64 Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz GenuineIntel GNU/Linux

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #30)
> Created an attachment (id=19830) [details]
> Bisect results
>
If that bisection is to be believed, the assertion that the issue is caused by a scheduling issue seems quite plausible.

(In reply to comment #33)
> One other thing, I'm wondering if this is 64bit related. All of my personal
> boxes are 64bit, and it seems of ones posted here, along with other threads
> I've read (over on Gentoo forums) that it seems this hits the 64bit users
> more
> then the 32bit users. Any truth to this, or am I trying to relate things that
> aren't related?
>
There is evidence that x86-64 is a factor here.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

It does strike me as quite odd how large a factor the size of the transfer seems to be. When I first start evolution (I have very large folders), the system will exhibit poor interactivity for upwards of 5 to 10 minutes. However, when transferring lots of small files (i.e. module_install'ing), the kernel behaves fine (although modpost also seems to produce poor interactivity).

I think it might help if we had a kernel developer here to list the kernel block/memory manager/scheduler statistics that might indicate where this I/O wait time is going. If sufficient statistics don't exist, it might be worthwhile to instrument the kernel specifically for this bug. It does seem clear that the bug I intended this ticket to describe is invariant on I/O scheduler, so that's one factor that needn't be accounted for.

Revision history for this message
In , larppaxyz (larppaxyz-linux-kernel-bugs) wrote :

I just recompiled my kernel without any SMP support and tested again. My laptop went from usable to totally unusable. Network traffic stops and it's even hard to type anything while the process/thread test is running. I have only a single CPU in my laptop. I also tried changing the scheduler with this setup, and that didn't make any difference.

Good luck :)

Revision history for this message
In , Adriaan.van.Kessel (adriaan.van.kessel-linux-kernel-bugs) wrote :

Could this be a jiffies wraparound bug?

I've seen different formulas for doing interval arithmetic,
and (not) handling wraparound.

For instance, in as_antic_expired()
::
long delta_jif;

        delta_jif = jiffies - ad->antic_start;
        if (unlikely(delta_jif < 0))
                delta_jif = -delta_jif;
::
, which seems incorrect to me. (It could alter the predictive powers of the scheduler in mysterious ways ;-)
(A different calculation is performed at other places.)
Jiffies wrap around depending on the HZ value (but still, intervals above INT_MAX should be relatively rare), and the jiffies start value
will cause the first wrap @ 5 min after booting, so that would show.

My 2 cents,
AvK

Revision history for this message
In , vapier (vapier-linux-kernel-bugs) wrote :

Adriaan: drivers shouldn't be manually doing comparisons on jiffies values. There are helpers in linux/jiffies.h for doing the comparison (time_before() / time_after()), and those should handle wraparounds. If you do see a driver that is doing the wrong thing, I'd open another bug specifically about that (or post a patch yourself :D).

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

With the following code I got negative time differences of about -127ms. The tv_sec values were equal and the second tv_usec was smaller than the first. I cannot say which kernel it was, as I am no longer able to reproduce it. A few days before, it occurred on nearly every test. As this behaviour is connected with the TSC synchronisation patch, I have posted it here. I will try to figure out the kernel version.

> gettimeofday(&tv_s, &tz);
> write(a2b[1], &c, 1);
> read(b2a[0], &c, 1);
> gettimeofday(&tv_e, &tz);
> timersub(&tv_e, &tv_s, &tv_r);

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I get the negative time difference on 2.6.17.14 kernel.org, 2.6.18.8 kernel.org and 2.6.18-92.el5 CentOS.

My system is unusable with these three kernels when I use ide_generic: disk throughput is ~3MB/s with I/O wait time at 100%.

No problems with ahci and libata on 2.6.18-92.el5.

I was not able to provoke a negative time difference with kernels 2.6.20, 2.6.21, 2.6.24, 2.6.27 and 2.6.8.

Revision history for this message
In , theparanoidone (theparanoidone-linux-kernel-bugs) wrote :

Created attachment 19839
32v64test

32 Bit Test vs 64-Bit

This test is slightly apples and oranges... however, because someone inquired whether this was a 32-bit or a 64-bit problem, I ran these tests.

I'm inclined to think it applies to both 32-bit and 64-bit, for 2 reasons:
- The 32-bit test didn't perform that great.
- The git bisect comment states "the biggest change is the removal of the 'fix up TSCs' code on x86_64 and i386".

Revision history for this message
In , theparanoidone (theparanoidone-linux-kernel-bugs) wrote :

Created attachment 19840
32v64testCleanNewLines.txt

formatting fix

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Please ignore my comments #39 and #40, as these are other problems.

Revision history for this message
In , cyrusm (cyrusm-linux-kernel-bugs) wrote :

Are you guys aware of the Latencytop utility? http://www.latencytop.org/
You have to add CONFIG_LATENCYTOP=y to your config.

Then run the tests that bring the system down while Latencytop is running. It might give additional information.
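
A minimal sketch of using it (assuming a kernel built with the option and the latencytop package installed):

# kernel config:
#   CONFIG_LATENCYTOP=y
# then, as root, while the I/O workload is running:
latencytop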

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

I've reproduced this problem with LTTng (http://ltt.polymtl.ca). It looks like the block layer is backmerging the large "dd if=/dev/zero ...." requests at a rate which leaves the request on the top of the request queue.

I've started a more thorough discussion on lkml here :

http://lkml.org/lkml/2009/1/16/487

Revision history for this message
In , unixg33k (unixg33k-linux-kernel-bugs) wrote :

Re: the 32-bit vs 64-bit idea: I've experienced this issue on both 32- and 64-bit platforms; however, all of the platforms were on x64-capable CPUs (not sure if that would matter).

Revision history for this message
In , seanj (seanj-linux-kernel-bugs) wrote :

I hit this bug on Ubuntu 8.10 (updated to 2.6.27-9-generic) running VMware Workstation 6.5.126130 with Ubuntu 8.04.1 LTS as a guest. It was especially pronounced when resuming a suspended VM.

I tried the different elevator io schedulers. Nothing helped.

Independent of VMware, if I ran bonnie in one shell and launched Firefox, the whole system behaved in a very chunky manner.

Renicing pdflush to -10 had some great improvement on basic responsiveness. The weird part was that after re-creating a new VM and not seeing the iowait problems, I then tried resuming a VM with VMware at the same time I was compressing a tar file with pbzip2 (parallel bzip2). All 4 cores were pegged and my load average was normal; system responsiveness was good. As **soon** as I tried resuming the VM with VMware Workstation, the CPU load dropped to 1-5% across all CPUs and iowait times shot way up. I have now killed VMware and iowait times have dropped, but my maximum read speed hovers around 1MB/s (as measured with iostat). This is another symptom of the iowait problem.

 iostat -c -d -m -x sda 1

rMB/s is usually never over 2MB/s
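
For anyone who wants to try the pdflush renice mentioned above, a minimal sketch (pdflush shows up as kernel threads, so the PIDs come from pgrep; run as root):

for pid in $(pgrep pdflush); do renice -10 -p "$pid"; done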

Revision history for this message
In , thomas (thomas-linux-kernel-bugs) wrote :

(In reply to comment #46)
> re: the 32bit vs 64bit idea - I've experienced this issue on both 32 and 64
> bit
> platforms, however - all of the platforms were on x64-capable CPUs (not sure
> if
> that would matter).
>

Using an IBM X40 with an old Pentium M (32-bit), Thomas.pi's test cases made my machine totally unusable. So I don't think this has anything to do with x64-capable CPUs.

Revision history for this message
In , Adriaan.van.Kessel (adriaan.van.kessel-linux-kernel-bugs) wrote :

(In reply to comment #38)
> Adriaan: drivers shouldnt be manually doing comparison on jiffies values.
> there are helps in linux/jiffies.h for doing the comparison (time_before() /
> time_after()) and those should handle wrap arounds. if you do see a driver
> that is doing the wrong thing, i'd open another bug specifically about that
> (or
> post a patch yourself :D).

Well, it was not in a driver's code but in block/as-iosched.c:as_fifo_expired().

The observed behavior indicates that something is wrong with the scheduling of disk I/O, and that most time is spent by all threads competing for one or more (spin)locks; you might call it a convoy or a thundering herd syndrome.
But it might be unrelated.
AvK

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Hi all,

More tests

 Linux ws-esp16 2.6.27-11-generic #1 SMP Thu Jan 8 08:38:33 UTC 2009 i686 GNU/Linux
 $ ./processtest 100 200000
 min:0.006ms|avg:0.278-0.520ms|mid:0.475ms|max:141.058ms|duration:107.646s
 $ ./threadtest 100 200000
 min:0.006ms|avg:0.690-0.768ms|mid:0.715ms|max:235.106ms|duration:159.355s

But if this is an I/O problem, why do the monitors not show a big I/O wait percentage? They show a high system usage percentage.
So I suppose it is not an I/O problem; it seems to be related to process handling inside the kernel. Might it be related to the preemption model?

I did some additional tests:

   1.-Change clock timing -> (no improvement)
   2.-Change preemption model (tested all of them) -> (no improvement)
   3.-Change IO scheduler -> (no improvement)

Is there any way to profile the kernel to see which functions get the most attention?

Hope you find something...

I attach a screenshot also...

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 19858
Top output while running test

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

Created attachment 19859
RFC patch to put a maximum to the number of cached bio merge done in a row

Can you try this patch, which applies to 2.6.28, to see if it helps? I have not been able to reproduce the problem with the patch applied.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Hi Mathieu,

I tried this patch against 2.6.27 because it applied cleanly there. But the results are not good. It took even more time to complete the test.

Can anyone confirm this?

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

This patch will probably diminish the overall throughput, because it is making sure that we do not merge more than 128 requests together. I am more interested in the I/O _latency_ (delay) you get when you run the system under a heavy I/O load.

Mathieu

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

Created attachment 19866
Port Attachment #19859 to Linus's master

(In reply to comment #53)
> Hi Mathieu,
>
> I tried this patch against 2.6.27 because it patched right. But the results
> are
> not good. It took even more time to complete the test.
>
> Can anyone confirm this?
>
I can. Unfortunately, not only did the patch fail to reduce latency, it also reduced throughput. Even opening the file selection dialog to attach this patch took over 30 seconds while building a kernel.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

Also, a patch set providing an ftrace interface to blktrace was recently submitted to the LKML (http://marc.info/?t=123212992300002&r=1&w=2). This could come in handy in further debugging.

Revision history for this message
In , henkjans_bagger (henkjansbagger-linux-kernel-bugs) wrote :

Just a comment that might have gone unnoticed, but to me appears relevant, as this bug again appears to be becoming a collection of multiple issues, as happened with #7372, which made the kernel devs start to ignore it.

The bisect done by thomas.pi yields a first bad commit dating from February 2007, while these symptoms first surfaced in 2.6.18, which dates from the end of 2006.

Bug #7372 basically predates this first bad commit; the bisect I did in that bug, for example, pointed towards a problem with NCQ under the CFQ scheduler from November 2006 that was clearly only present on 64-bit. See http://bugzilla.kernel.org/show_bug.cgi?id=7372#c112 as a reminder of this proof. I'm not sure that issue got resolved in the end; there were no clear pointers on what I could do to help further.

Seeing reports in this bug of improvements when switching I/O scheduler, and reports of differences between 32/64 bit, makes me think those might be more related to that commit. The bottom line is to be sceptical of reports on whether or not a patch fully helps, as to me it still appears to be multiple issues that have very similar, but difficult to reliably trigger, symptoms.

However, the test case of Thomas does bring my system to its knees as well, so it is definitely a good way to tackle at least part of the problem. But I don't think it is the only problem.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

No, the patch does not fix the problem, but I think it is now better than before.

I think it is a CPU scheduler problem, as one process with many threads and much thread switching can nearly stop the execution of other processes. This problem exists in every kernel version, even 2.6.15. You can test it by executing the thread-based test with 2*100 threads.
My system starts to become unusable with kernel 2.6.27 (Fedora 10) when executing the thread-based test with 2*40-50 threads. I don't know how many interrupts occur while copying some data, but perhaps that is the commonality between copying files and the thread-based test.

The provided bisect points to a CPU scheduler performance regression, which makes the problem more noticeable. The biggest CPU scheduler performance regression was in 2.6.24 - 2.6.27. There was another CPU scheduler performance regression between 2.6.22 and 2.6.24.

Revision history for this message
In , larppaxyz (larppaxyz-linux-kernel-bugs) wrote :

(In reply to comment #57)

> Seeing reports in this bug reporting improvements when switching IO-scheduler
> and reports on differences between 32/64 bit makes me think those might be
> more
> related to that commit.

Can nobody confirm that changing the I/O scheduler or switching between 32 and 64 bit improves the system much?

People are also testing different things; some test disk I/O and others are running process/thread tests. It's very confusing, and someone should run a couple of identical tests (including disk I/O AND the process/thread test) with different kernel options. On my setup, just disabling or enabling SMP support made a HUGE difference.

I'm happy to do testing, but only if someone really needs the information I can provide.

Again, my worthless 5 cents.. :)

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

I just created a fio job file which acts like a "ls" executed while doing a large dd. It looks like the anticipatory I/O scheduler was causing those delays for me.

The results for the ls-like jobs are interesting :

I/O scheduler   runt-min (msec)   runt-max (msec)
noop                   41             10563
anticipatory           63              8185
deadline               52             33387
cfq                    43              1420

Is it me, or do all I/O schedulers except cfq generate unexpectedly high latency?

Details here (including fio job file) :

http://lkml.org/lkml/2009/1/18/198

Mathieu

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Actually, in this bug as well as in the other (7372), there is no clear direction. None of the kernel devs have taken a leadership role and directed the reporters in a direction where we can start to get a handle on things. What we see here is a lot of speculation on the part of the users, and hence the enormous variety of things being tried. It's like everybody is shooting in the dark.

Unless someone on the kernel team takes ownership of this bug, sorts the quarters from the pennies, and directs users with a clear set of instructions to get well-defined data, I don't see this bug going anywhere.

The question is: who has the know-how and willingness to do that? We see the process scheduler as well as the I/O scheduler being involved, we see the VM having an effect, we see some libata effects. With so many components in the line of fire, and the kernel being as vast as it is, I don't see the above (one savior coming along and putting 2 and 2 together) happening.

IOW, take a beer and head away from the computer and into the sun....;-)

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

Created attachment 19894
Test job description for fio

Attaching the test case written by Mathieu Desnoyers and included in his earlier email

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

(In reply to comment #33)
>
> I did try the AS Scheduler, as that was the only thing I changed in my
> kernel,
> and it didn't change anything interactively, still get a high IO Wait.
>
> The other thing I noticed, at least when in AS, I start using Swap, it's not
> a
> lot (within about 2 minutes I was using 10MB), but it was still climbing.
>
> My work box (most heavily used):
> Linux PC010233L 2.6.28-zen1-2 #2 SMP PREEMPT Thu Jan 15 16:06:37 EST 2009
> x86_64 Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz GenuineIntel GNU/Linux
>

OK, I tried playing a little bit more, and switching to the deadline scheduler really helped things. I have topped out around 73% iowait, but it never bogged the whole box down. I still need to do definitive testing (via tests already in the bug report), but this seems to have helped. Not sure which problem this relates to in this bug, though; I'm guessing the scheduler one.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19906
fio test results of kernel 2.6.15 - 2.6.24

I have executed the test case of Mathieu Desnoyers on several different kernel versions. I took the bad and good kernels from my bisection. The results do not confirm my theory. If someone can identify a problem in them, I can run some more tests.

The only regression I can see is the regression with the noop scheduler. The value shown is the average of the average latencies.

./test.2.6.15-53-amd64-genericresult.noop 700,62ms
./test.2.6.20-17-genericresult.noop 3520,24ms
./test.2.6.20result.noop 3005,24ms
./test.2.6.20badresult.noop 3698,64ms
./test.2.6.22.19result.noop 1393,67ms
./test.2.6.24.7result.noop 589,66ms

I will check whether the 2.6.24.7 kernel test build has improved desktop responsiveness.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

There is no performance improvement in 2.6.24.7. The list below shows the average times of the 41 small jobs with the cfq scheduler. I have the best desktop responsiveness on 2.6.20: Gimp starts under heavy I/O in 10 seconds instead of 30 seconds. Application freezes exist on 2.6.20, but they are much shorter, mostly under one second, while in kernels >= 2.6.22 there are freezes of up to one minute.

                    min     max      avg    stdev
2.6.20-17-generic   9.9    126.00   49.97   59.89
2.6.20              8.66   115.05   39.68   50.41
2.6.22.19          10.34   195.29   66.88   96.07
2.6.24.7            9.93   185.02   64.38   89.95

The high I/O wait is at 75% at the start and climbs to 99-100% after ~5 seconds.

I have noticed that the freezes occur in all applications more often when Firefox is running. Currently I create a RAM disk on startup, extract the .mozilla folder to it, and save it again on shutdown. It makes my system more responsive, especially Firefox 3.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19912
fio results for kernel 2.6.28

And finally the results for 2.6.28. I have removed all the tracing stuff I could find, but the system is still sluggish under heavy I/O.

              min      max       avg      stdev
2.6.28 noop   97,61   1799,06    654,84   861,90
2.6.28 cfq     9,32    169,32     55,59    79,50

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19920
ext3 and ext4 comparison with patched and unpatched kernel

Here are some more results. I could gain or lose some latency with different kernel settings. In 2.6.20 I could reproducibly lose 10 ms, which amounts to a 25% decrease in average latency. But it makes no difference in desktop responsiveness.

I have tested the 2.6.28 kernel, patched ( http://bugzilla.kernel.org/attachment.cgi?id=19866 ) and unpatched, with ext3 and ext4 and exactly the same kernel settings. My test system is installed on an ext3 partition; the tests are executed on an extra ext3 or ext4 partition (the slower one) on the same hard drive. The write performance on ext4 is now 45MB/s instead of 35MB/s (ext3).

The desktop responsiveness in the ext4 test with the patched version decreases extremely. While copying a 10 GB file from ext4 to ext4, there are nearly no problems with the unpatched kernel; with the patched kernel, the system becomes unusable. With ext3 there is a small responsiveness improvement with the patched kernel, but it could be coincidence, as I have no exact test for desktop responsiveness.

But while copying the 10 GB file on ext4 and compiling the kernel, my system becomes unusable with the unpatched kernel too. There are freezes of >20 seconds while accessing the menu in applications for the first time.
You can easily simulate this behaviour by executing the following compression for every core (see the loop sketch after the command).

bzip2 -9 -c /dev/urandom >/dev/null &
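
For example, one instance per core can be started with a loop like this (a sketch; nproc is assumed to be available, otherwise substitute the core count by hand):

for i in $(seq "$(nproc)"); do
    bzip2 -9 -c /dev/urandom >/dev/null &
done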

And the average latencies of the last four tests.

                        min      max       avg     stdev
2.6.28 unpatched ext3   11.24   181.20     62.35    86.15
2.6.28 patched ext3     10.82   175.93     62.18    83.89
2.6.28 unpatched ext4    6.90   396.17    132.52   213.18
2.6.28 patched ext4      6.85  2078.93    707.26  1006.74

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Forget the back merge patch.

Have you tried running latencytop to spot big sleep offenders?

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19924
Latencytop results

(In reply to comment #68)
> Have you tried running latencytop to spot big sleep offenders?

I am not sure what I should look at. You can find most of the results in the file latencytop-ext4-2*bzip2.txt.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Most of them look as expected, up to about 1 second of latency for a single I/O under load. latencytop-ext4-2*bzip2.txt looks pretty bad, though. It has a 10 second wait on a single lock_page(); that's pretty slow.

Again, this whole thread confuses me. The I/O latencies from the fio jobs posted look OK, in the sense that they haven't regressed and that you can't expect zero latency when you are fully loading a disk with writes. So while we could do better there, it's not a catastrophe.

The bisect you originally did pointed to something interesting, I think. If we have clock problems, the CPU scheduler could easily delay a single process for large amounts of time if other processes are repeatedly ready to run.

Revision history for this message
In , andi-bz (andi-bz-linux-kernel-bugs) wrote :

The scheduler normally has special code to handle bad clocks (like ones going backwards). Of course it has its limits, but it should handle the typical cases.
Of course it could confuse other subsystems. For testing you could force another clock, like clock=pmtmr or clock=hpet (if you have HPET).

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

It may be something as simple as a wrapped variable. IIRC, someone recently found something like that in the scheduler, though I can't find the posting just now. It was in kernel/sched_fair.c:update_curr() I think.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

My default clock source is hpet. It is faster, but I have long freezes. With acpi_pm the system is sluggish, but the freezes were always below 5 seconds.

Test: copy 10gig file and execute "bzip2 -9 -c /dev/urandom >/dev/null" twice on core2duo.

hpet
1299.7 / 1651.3 / 39790.7 / 4580.1 / 943.9 / 2069.3 / 145.7 / 1739.2 / 691.4 / 2060.2 / 172.3 / 492.4 / 2286.4 / 3064.9 / 696.9 / 716.9 / 14096.2 / 3131.2 / 1640.2 /
min:145.7 ms|max:39790.7 ms|avg:4277.31

acpi_pm
1969 / 1276.8 / 658.8 / 16303.8 / 1604.3 / 3885.8 / 823.6 / 3659.1 / 2719.6 / 2064.2 / 672.9 / 1327.9 / 1783.9 / 604.3 / 1289 / 9535.1 / 1271.5 / 280.9 / 2621.8 / 759.1 /
min:280.9 ms|max:16303.8 ms|avg:2755.57

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

I'm not sure what my default clock source is (where does one look to determine this?), however I just booted with clock=hpet and things don't seem to be particularly better (50% IO wait time while evolution is starting, a process which takes over 5 minutes; this is with Jens' patch). These numbers are common with Jens' patch (which is a bit of an improvement, without the patch evolution pegs IO wait times at 70%+ and is very sluggish even after starting).

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

I just tried clock=acpi_pm and evolution startup performance seems no better. Tonight I'm going to try some quantitative benchmarks on these configurations so that legitimate comparisons can be made.

One thing that I have neglected to mention is that Jens' patch does seem to help overall system interactivity---an application with a high IO load doesn't degrade the latency of the entire system nearly as much---although I have no numbers to support this claim.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

On my computer, jiffies was the default clock source on the 2.6.20 kernel. Since 2.6.22, hpet is. On my old notebook it is now acpi_pm; I don't know what it was before. With jiffies under 2.6.28, my system seems much better, although there are still some short freezes. It does not solve the problem, but makes it much better. Please try clocksource=jiffies.

You can check your current clock source with:
cat /sys/devices/system/clocksource/clocksource0/*
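
The same sysfs directory also lists the alternatives and accepts a new one at runtime (a sketch, run as root; later comments in this thread do exactly this with hpet):

cat /sys/devices/system/clocksource/clocksource0/available_clocksource
echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource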

jiffies
645 / 598.3 / 462.5 / 1496.2 / 213.2 / 1353.1 / 6470.6 / 337.6 / 3406.9 / 2057.5 / 155.3 / 309 / 2332 / 463.1 / 1804.4 / 3258.6 / 261.7 / 8124.3 / 2373.2 / 2471.1
min:116.1 ms|max:8124.3 ms|avg:1843.32

The long values are freezes of firefox.
hpet 39790.7
acpi_pm 16303.8
jiffies 8124.3

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #76)
> The long values are freezes of firefox.

Do you mean startup time? Or do you click on a tab and it takes that long for it to switch?

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

Using the jiffies clocksource on Linus's master causes the machine to wedge up when attempting to start Xorg. I'll have to look into it later.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #77)
> Do you mean startup time? or you click on a tab and it takes that long for it
> to switch?

It is the longest time for switching or opening tabs during heavy I/O plus the 2*bzip2 of urandom.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

> (WW) intel(0): No outputs definitely connected, trying again...
> (WW) intel(0): Unable to find initial modes
> (EE) intel(0): No valid modes.

No Xorg comes up with the jiffies clocksource, and it takes the console with it. I have darkness on the screen... :) I can ssh into it, though.

Some weird interaction between i915 and the clocksource there.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

echo hpet > current_clocksource

and things are back to normal.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #81)
> echo hpet > current_clocksource
>
> and things are back to normal.

I got a crash when I tried to set the jiffies clocksource while Linux was running.

There is now an improvement in the process and thread tests with clock source jiffies. Here are the results. The performance is nearly as good as in 2.6.20.

Linux bugs-laptop 2.6.28t61p4 #5 SMP Wed Jan 21 14:30:24 CET 2009 x86_64 GNU/Linux
min:0.000ms|avg:0.000-0.000ms|mid:0.000ms|max:945.000ms|duration:24.354s
min:0.000ms|avg:0.000-0.000ms|mid:0.000ms|max:466.000ms|duration:24.206s
min:0.000ms|avg:0.000-0.000ms|mid:0.000ms|max:220.000ms|duration:47.452s
min:0.000ms|avg:0.000-0.000ms|mid:0.000ms|max:870.000ms|duration:34.105s
min:0.000ms|avg:0.000-0.000ms|mid:0.000ms|max:479.000ms|duration:36.610s
min:0.000ms|avg:0.000-0.000ms|mid:0.000ms|max:212.000ms|duration:77.449s

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

I booted up with clocksource=jiffies and lost Xorg and console. So, it wasn't set while running.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #83)
> I booted up with clocksource=jiffies and lost Xorg and console. So, it wasn't
> set while running.

Try blacklisting the thermal and processor kernel modules.
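
A rough sketch of how that can be done (the blacklist file name is a common convention and may differ per distribution; as discussed below, this may cost cpufreq/ACPI functionality on laptops):

# keep the modules from loading at boot
echo "blacklist thermal"   >> /etc/modprobe.d/blacklist.conf
echo "blacklist processor" >> /etc/modprobe.d/blacklist.conf
# or remove them from the running system, as root
rmmod thermal processor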

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Hi

I have currently the following running.

2 x "bzip2 -9 -c /dev/urandom >/dev/null" since I have 2 cores
and one "dd if=/dev/zero of=test.10g bs=1M count=10000"

Only small lockups happened during that time, which was about 9 minutes.
By small lockups I mean a couple of seconds.

After the dd command had finished, the lockups were still occurring, but they
were generally much shorter.

For me it is definitely a fix.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Seems like it is more complex. Running only the dd command halts my system in the same way as described earlier in this bug: ~100% iowait etc. Adding a single bzip2 command results in an iowait of around 40% and improved desktop response, and finally adding the second bzip2 command results in 5% iowait and even better desktop response.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #84)
> (In reply to comment #83)
> > I booted up with clocksource=jiffies and lost Xorg and console. So, it
> wasn't
> > set while running.
>
> Try to blacklist the thermal and the processor kernel module.
>

Wouldn't that throw everything cpufreq-related into a tizzy? It's a laptop, so losing cpufreq and other ACPI functionality is a big loss. Let me know if I am wrong about this.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

blacklisting processor and thermal didn't work either. I give up on jiffies...:-)

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

Well, looks like there's a good reason why machines hang with clock=jiffies. http://lkml.org/lkml/2009/1/21/402

Any ideas why those users whose machines didn't crash saw improvement? Does this suggest a scheduler issue?

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

> Well, looks like there's a good reason why machines hang with clock=jiffies.
> http://lkml.org/lkml/2009/1/21/402
>

This means I need to recompile the kernel without high resolution timers and then pass clocksource=jiffies?

Do we have an explanation for why the freezing period was reduced to half with acpi_pm and to a quarter with jiffies for Thomas? I would have thought faster timers would result in better behavior and that this was a step in the right direction. But we seem to be going backwards.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #90)
> This means I need to recompile kernel without high resolution timer and then
> pass clocksource=jiffies?
No, it shouldn't be possible to run the kernel using jiffies as a clocksource. The system's time source needs to have a sufficiently high resolution. Using a low resolution time source (like jiffies) can cause the kernel to hang.

>
> Do we have an explanation for why the freezing period reduced to half with
> acpi_pm and to a quarter with jiffies for Thomas? I would have thought faster
> timers will result in better behavior and it was a step in the future
> direction. But we seem to be going backwards.
It's far more complicated than that. If we have a timer wrapping around, it is entirely possible that a slower clock source would give you expected behavior whereas a higher resolution time source would fail. It completely depends upon the source of the freezes.

Jens, what do you think in light of this growing body of evidence pointing towards timer issues?

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #91)
Hmm, I think I was a little tired last night. To clarify, I guess you probably could recompile without CONFIG_HIGH_RES_TIMERS, however I'm not sure you'd want to. If I'm not mistaken, the no-tick kernel option is dependent on high-res timers, so you'd have to give that up.

Also, correction:
s/towards timer issues/towards timer-triggered-issues/

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Has anyone run latency top yet?

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

These are the total values of latency top.

http://bugzilla.kernel.org/show_bug.cgi?id=12309#c73
http://bugzilla.kernel.org/show_bug.cgi?id=12309#c76

Currently my system crashes while I am executing the copy and 2*bzip2 operation with jiffies. I will take some new measurements as soon as my test system is running again.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19954
latencytop captures with clocksource jiffies and hpet

I was not able to execute the 2*bzip2 test with jiffies any more. The system freezes forever while copying a file and bzipping urandom. It happens in runlevels 1, 3 and 5 during CPU-intensive tasks.

I have made a test with less CPU consumption. The test uses a script so that the same workload is executed with different clocksources. It copies a file, extracts the kernel source, builds the kernel and finally deletes the kernel tree. Concurrently the script starts gimp, oowriter, firefox and htop, and opens some web pages and a document.
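
A minimal sketch of such a load script (reconstructed for illustration, not the original; the file names, kernel version and document are placeholders):

#!/bin/sh
# background file copy plus desktop applications
cp bigfile bigfile.copy &
gimp & htop & firefox http://example.org & oowriter document.odt &
# extract, build and remove a kernel tree
tar xjf linux-2.6.28.tar.bz2
(cd linux-2.6.28 && make defconfig && make -j2)
rm -rf linux-2.6.28
wait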

Here are the "Total:" times from the captures.

jiffies
min:0.1 ms|max:5442.1 ms|avg:213.2

hpet
min:0.0 ms|max:14777.7 ms|avg:403.71

The full captures without the escape sequences are added in the attachment. The escape sequences are not completely removed, but it's enough to see what's necessary. I can provide the captures with the escape sequences too, if someone wants them.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Filtering out the upper and lower 10% of the times results in an average latency of 1737.18 ms for jiffies and 3164.72 ms for hpet.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Basically all of them show waiting for an async page write to finish, and that can take quite a bit of time with heavy writing going on. First thing next week I'll try and provide a 'this async write now went sync' helper for the io scheduler, so that they can make sure it gets expedited as soon as the sync io is. This should drastically reduce latencies for this situation.

It'll probably be less than straightforward, but a test patch should be quite doable.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

That sounds good.

I have to correct the last values, as I was using the filter for capture logs with escape sequences.

jiffies:
min:0.1 ms|max:5442.1 ms|avg:834.12|avg80:2248.28

hpet:
min:0 ms|max:14777.7 ms|avg:1474.09|avg80:3638.15

Why is there such a big difference in the average latency between jiffies and hpet? The total latency for 80% of the recording is 2.2 s with jiffies and 3.6 s with hpet.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Created attachment 19996
Test patch for async page promotion

First attempt at doing sync promotion of async page waiting. It actually booted, however I haven't done any sort of testing with it yet.

Note that this will only work with CFQ currently.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19997
latencytop captures with clocksource hpet and patched kernel

Same test, with patched kernel and hpet as clocksource.

hpet:
min:10.1 ms|max:11733 ms|avg:3096.22|avg80:4082.79

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

One observation is that ext4 seems quite latency prone in waiting for write access to the journal. IIRC, that matches earlier results where ext3 was much quicker in that area. No idea what causes this, as I'm not familiar with the ext4 internals.

Another observation is that I neglected to include the buffer waiting in the async promotion, it only worked for page locking. I'll add an updated patch below after this posting.

And finally, lots of time is spent waiting for a new write request in the block layer. So you are maxing out all 128 requests in this test case. You can try increasing that to 512 for testing purposes, like so:

# echo 512 > /sys/block/sda/queue/nr_requests

That will get your async wait numbers down, but it may not reduce your latencies. Fact is that 128 writes is already a lot, and with more requests in the queue, you will have higher completion times for each individual request.
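
For completeness, the current value can be read back, and the stock default restored after testing (sda is a placeholder for the device under test):

cat /sys/block/sda/queue/nr_requests        # show the current limit
echo 128 > /sys/block/sda/queue/nr_requests # restore the usual default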

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Created attachment 19998
Test patch for async page promotion v2

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :
Download full text (3.5 KiB)

Attachement

http://bugzilla.kernel.org/attachment.cgi?id=19998&action=view

Causes the following OOPS as soon as stress-testing starts. Is it possible that bdi->unplug_io_data can be NULL in blk_backing_dev_wop ? Should we simply discard those ?

[ 138.345195] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 138.346301] IP: [<ffffffff803f997d>] elv_wait_on_page+0xd/0x20
[ 138.346301] PGD 434c05067 PUD 434c06067 PMD 0
[ 138.346301] Oops: 0000 [#1] PREEMPT SMP
[ 138.346301] LTT NESTING LEVEL : 0
[ 138.346301] last sysfs file: /sys/block/md1/md/raid_disks
[ 138.346301] Dumping ftrace buffer:
[ 138.346301] (ftrace buffer empty)
[ 138.346301] CPU 3
[ 138.346301] Modules linked in: e1000e loop ltt_tracer ltt_trace_control ltt_type_serializer ltte
[ 138.346301] Pid: 1272, comm: kjournald Not tainted 2.6.28.1 #69
[ 138.346301] RIP: 0010:[<ffffffff803f997d>] [<ffffffff803f997d>] elv_wait_on_page+0xd/0x20
[ 138.346301] RSP: 0018:ffff88043cc19cd0 EFLAGS: 00010286
[ 138.346301] RAX: 0000000000000000 RBX: ffff88043f460938 RCX: 0000000000000000
[ 138.346301] RDX: ffff880438490000 RSI: ffffe200193f0bc0 RDI: ffff88043e580a40
[ 138.346301] RBP: ffff88043cc19cd0 R08: ffff88043d09de78 R09: 0000000000000001
[ 138.346301] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88043cc19d50
[ 138.346301] R13: ffff88043cc19d60 R14: 0000000000000002 R15: ffff8800280590c8
[ 138.346301] FS: 0000000000000000(0000) GS:ffff88043f804d00(0000) knlGS:0000000000000000
[ 138.346301] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 138.346301] CR2: 0000000000000000 CR3: 0000000434817000 CR4: 00000000000006e0
[ 138.346301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 138.346301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 138.346301] Process kjournald (pid: 1272, threadinfo ffff88043cc18000, task ffff88043d09d8c0)
[ 138.346301] Stack:
[ 138.346301] ffff88043cc19ce0 ffffffff803fd2a2 ffff88043cc19d00 ffffffff802f6762
[ 138.346301] ffff88043cc19d60 0000000000000000 ffff88043cc19d40 ffffffff8067ace2
[ 138.346301] ffffffff802f6710 ffff880438490000 0000000000000002 0000000000000002
[ 138.346301] Call Trace:
[ 138.346301] [<ffffffff803fd2a2>] blk_backing_dev_wop+0x12/0x20
[ 138.346301] [<ffffffff802f6762>] sync_buffer+0x52/0x80
[ 138.346301] [<ffffffff8067ace2>] __wait_on_bit+0x62/0x90
[ 138.346301] [<ffffffff802f6710>] ? sync_buffer+0x0/0x80
[ 138.346301] [<ffffffff802f6710>] ? sync_buffer+0x0/0x80
[ 138.346301] [<ffffffff8067ad89>] out_of_line_wait_on_bit+0x79/0x90
[ 138.346301] [<ffffffff802566f0>] ? wake_bit_function+0x0/0x50
[ 138.346301] [<ffffffff802f6649>] __wait_on_buffer+0xf9/0x130
[ 138.346301] [<ffffffff8036c0c5>] journal_commit_transaction+0x7d5/0x1540
[ 138.346301] [<ffffffff80265991>] ? trace_hardirqs_on_caller+0x1b1/0x210
[ 138.346301] [<ffffffff8067d457>] ? _spin_unlock_irqrestore+0x47/0x80
[ 138.346301] [<ffffffff80249cef>] ? try_to_del_timer_sync+0x5f/0x70
[ 138.346301] [<ffffffff803708c8>] kjournald+0xe8/0x250
[ 138.346301] [<ffffffff802566b0>] ? autoremove_wake_function+0x0/0x40
[ 138.346301] [<ffffffff803707e0>] ? kjourna...

Read more...

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Yes that's expected, I didn't fixup the non-request_fn based drivers. It's trickier to do for dm/md, since you need to know where that page went. Or you can just cycle all the bottom backing_dev_info's like it's done for unplug. I'll be back at the machine in an hour or two, I'll update the patch for dm/md.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Created attachment 20001
Test patch for async page promotion v2

Adds support for raid0/1/10/5 and should not oops on dm (just not work as intended, it'll do nothing).

There's still the debug printk in there that notifies you of when something has happened, ala:

$ dmesg | tail
cfq: moving e4a348d4 to dispatch
cfq: moving e49dede4 to dispatch
cfq: moving f687d8d4 to dispatch

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Another question - are people using CONFIG_NO_HZ or not?
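
For anyone unsure how their kernel was built, a quick way to check (assuming either /proc/config.gz or a distro config file in /boot is available):

zgrep 'CONFIG_NO_HZ\|CONFIG_HIGH_RES_TIMERS' /proc/config.gz
# or, on most distro kernels:
grep -E 'CONFIG_NO_HZ|CONFIG_HIGH_RES_TIMERS' /boot/config-$(uname -r)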

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #106)
> Another question - are people using CONFIG_NO_HZ or not?

Yes, I am.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #106)
> Another question - are people using CONFIG_NO_HZ or not?
>
As am I

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #106)
Me currently too.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

So my next question would be if disabling that option makes any difference?

Revision history for this message
In , toby (toby-linux-kernel-bugs) wrote :

We are not using CONFIG_NO_HZ and get high latency (subjective) while running:

dd if=/dev/zero of=file bs=1M count=2048

Additionally, all 8 CPU cores go to at least 50% iowait, and several peg at ~95%.

We see similar results with:

2.6.18, 2.6.24, deadline, cfq.

Revision history for this message
In , ozan (ozan-linux-kernel-bugs) wrote :

Created attachment 20024
2.6.25.20 fio test with NOHZ disabled

Revision history for this message
In , ozan (ozan-linux-kernel-bugs) wrote :

Created attachment 20025
2.6.25.20 fio test with NOHZ enabled

Revision history for this message
In , ozan (ozan-linux-kernel-bugs) wrote :

What is the preferred way of testing different kernels against this bug?

I've done Mathieu's fio test, but I'm not sure whether it gives a detailed clue about the problem. I've attached the results.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 20026
latencytop captures with clocksource hpet with nohz and no high resolution timer

hpet - no hz - no high resolution timer
min:0 ms|max:10888.7 ms|avg:1311.17
hpet - no hz
min:2 ms|max:16980.9 ms|avg:1513.26

Same settings as in
http://bugzilla.kernel.org/attachment.cgi?id=19954&action=view
hpet
min:0 ms|max:14777.7 ms|avg:1474.09

jiffies
min:0.1 ms|max:5442.1 ms|avg:834.12

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 20027
latencytop captures + fio results amd64

I have run the fio job on a different machine on two different disks. While running the fio job, I captured the latency with latencytop. Each test was executed twice: once with 2*bzip2 of urandom and once without CPU load.
You can find the test results for every io scheduler in the archive.

100MB/s disk + 2bzip (2009-01-27.0847-2.6.28.2-acpi_pm)
100MB/s disk (2009-01-27.0908-2.6.28.2-acpi_pm)
40MB/s disk + 2bzip (2009-01-27.0934-2.6.28.2-acpi_pm)
40MB/s disk (2009-01-27.1029-2.6.28.2-acpi_pm)

Total latency - cfq
min: 133.3 ms | max 18555.8 ms | avg 5978.08
min: 25.5 ms | max 5057.2 ms | avg 1660.21
min: 369.0 ms | max 11872.0 ms | avg 3764.57
min: 557.0 ms | max 12215.6 ms | avg 3002.81

fio results - cfq
mint 25msec | maxt 1669msec
mint 23msec | maxt 1596msec
mint 77msec | maxt 2370msec
mint 106msec | maxt 738msec

Revision history for this message
In , ozan (ozan-linux-kernel-bugs) wrote :

// Adding myself to CC

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #101)
> One observation is that ext4 seems quite latency prone in waiting for write
> access to the journal. IIRC, that matches earlier results where ext3 was much
> quicker in that area. No idea what causes this, as I'm not familiar with the
> ext4 internals.

It is possible that the reduced latency on ext4 is a result of the increased write speed, which is nearly doubled. You can see in the results posted before (comment #116) a reduction on ext3 partitions with different hard drives.

Revision history for this message
In , bart (bart-linux-kernel-bugs) wrote :

I have really noticed this lately.

I replaced an old server running an older kernel. The replacement hardware was orders of magnitude more powerful. The I/O system in the old machine was a 4-disk hardware RAID 5 on 64-bit PCI with the very first SATA 10,000 RPM WD Raptors (WD740-00FLA). The new machine has an 8-disk hardware RAID 5 using the new 300 GB 10,000 RPM VelociRaptor SATA drives on PCI Express. The old machine had a Pentium 4 HT CPU. The new machine has a 4-core Core 2 CPU. All high-end gear.

The new machine does get far better disk throughput; however, on these workloads the latencies seem far higher, the interactivity of the machine is poor and all CPU cores show high I/O waits.

This machine serves an application that runs from Samba shares to 15 or so Windows workstations. This involves lots of file activity on large flat-file database files. Some of the files are up to 4 GB in size.

The old server was very busy, yet no huge amount of I/O wait was seen. On the new server, using a 2.6.18 kernel on an enterprise distro, the I/O waits are far higher, especially noticeable at backup times. Users have noticed the extra latencies when the system is busy, and at these times the I/O waits are high.

The server feels slower than the old machine, and this should not be so.

Just thought I would let you know this info, as it seems hard to quantify this bug against real-world workloads.

Revision history for this message
In , simon+kernelbugzilla (simon+kernelbugzilla-linux-kernel-bugs) wrote :

Just wanted to add a couple of links to places where some additional real world experience is related, for whatever they might be worth.

http://forums.storagereview.net/index.php?s=121e3f0d26cbd551c84271019f82f6d3&showtopic=25923&st=0

http://community.novacaster.com/showarticle.pl?id=7395&n=8001

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #105)

I have tried the patch with 2.6.28.2 and 2.6.29-rc3 and always get a crash when I/O starts, sometimes even after the X server has started.

kernel 2.6.29-rc3 at

cfq_remove_request 0xe3/0x251

0xffffffff811ca8fc is in cfq_remove_request (block/cfq-iosched.c:650).
645 {
646 struct cfq_queue *cfqq = RQ_CFQQ(rq);
647 struct cfq_data *cfqd = cfqq->cfqd;
648 const int sync = rq_is_sync(rq);
649
650 BUG_ON(!cfqq->queued[sync]);
651 cfqq->queued[sync]--;
652
653 elv_rb_del(&cfqq->sort_list, rq);
654

kernel 2.6.28.2
elv_rb_del+0x21/0x4b

394 }
395 EXPORT_SYMBOL(elv_rb_add);
396
397 void elv_rb_del(struct rb_root *root, struct request *rq)
398 {
399 BUG_ON(RB_EMPTY_NODE(&rq->rb_node));
400 rb_erase(&rq->rb_node, root);
401 RB_CLEAR_NODE(&rq->rb_node);
402 }
403 EXPORT_SYMBOL(elv_rb_del);

Revision history for this message
In , michiel (michiel-linux-kernel-bugs) wrote :

Could this be the same bug as: http://lkml.org/lkml/2008/6/15/163 ?

Because on the same system on which I have the same symptoms as this bug describes, the following also happens:
http://beheer.eduwijs.nl/kernellog-brikama.log

I need to say that changing the IO scheduler from CFQ to AS seems to
help a bit. It will not solve the problem, but the system will be much
more responsive.
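
For anyone wanting to try the same comparison without rebooting, a minimal sketch of checking and switching the scheduler per device (sda is a placeholder; elevator=as on the kernel command line is the boot-time equivalent):

cat /sys/block/sda/queue/scheduler                   # active one shown in brackets
echo anticipatory > /sys/block/sda/queue/scheduler   # switch from CFQ to AS, as root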

System information:
IO Scheduler: AS (default is CFQ, using elevator=as)
Timer: hpet
CONFIG_NO_HZ=y
Kernel: Linux brikama 2.6.27-9-generic #1 SMP x86_64 GNU/Linux
Distro: Ubuntu 8.10 Intrepid amd64
CPU: Intel(R) Core(TM)2 CPU E8400 @ 3.00GHz (2 cores)
Memory: 4GB
Using LVM: yes
Using LVM encryption: no
LVM version:
        LVM version: 2.02.39 (2008-06-27)
        Library version: 1.02.27 (2008-06-25)
        Driver version: 4.14.0
Using DM: yes

HDD:

/dev/sda:

 Model=WDC WD5000AACS-00G8B1 , FwRev=05.04C05,
SerialNo= WD-WCAUF0869014
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
 BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=?0?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=976773168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5,6,7

Revision history for this message
In , tchiwam (tchiwam-linux-kernel-bugs) wrote :

Has anyone here managed to reproduce this problem on an AMD platform? I can't seem to reproduce it there. But both my 965GM and 945GM chipset motherboards have the problem with the T7600 and T9500 CPUs. My old Celeron has the same problem, but it doesn't feel like it's freezing as much.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #123)
> Anyone here managed to reproduce this problem on an AMD platform ? Because I
> can't seem to be able to reproduce it. But both of 965GM and 945GM chipset
> motherboard have the problem with the T7600 and T9500 cpu. My old celeron has
> the same problem but it doesn't feel like freezing so much.
>

AMD on nForce4 here running x86_64. Look over at the Gentoo forums; there is a long thread, and almost all of the people experiencing the problem there are on AMD.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

The problem exists on an AMD platform too, but not as badly as on an Intel platform. By changing the clocksource to acpi_pm, you can reduce the problem a bit on an Intel platform, but the system feels a little slower.

Using ext4 reduces the problem enormously. Even firefox is usable while eclipse is indexing the kernel build tree. The problem still exists under heavy I/O.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Sounds like the infamous ext3 fsync() issue is also a factor. Can you try mounting ext3 with -o data=writeback and see if that makes ext3 behave better?
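
A sketch of how that could be tested (the device and mount point are placeholders; data=writeback weakens ordering guarantees, so treat it as a test setting only):

mount -t ext3 -o data=writeback /dev/sdb1 /mnt/test
# or remount an existing filesystem; this may be refused for the root
# filesystem, in which case set the option in /etc/fstab and reboot
mount -o remount,data=writeback /mnt/test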

Revision history for this message
In , harrisonmetz (harrisonmetz-linux-kernel-bugs) wrote :

On my machine (nForce 5, AMD Phenom II 940) I also experience huge slowdowns when performing I/O. For example, using Ben's:
dd if=/dev/zero of=/tmp/test bs=1M count=1M
test, it takes me about 40 secs to spawn a shell (15 secs for konsole to open a new tab, and about 25 secs for the shell to actually spawn). This was conducted on an HD with ext4. Turning off swap helps a lot with launching a shell.

On a more substantial note, I use Unison to sync files between various places, and when it is running my system is hardly responsive. This happens to me on ext4, ext3, and ReiserFS.

Changing the scheduler to noop, the dd-and-open-shell test is very responsive (with swap both on and off), but any substantial usage, such as firefox, is still slow, just as it is above with cfq.

If I can free up some space on one of my partitions, I'm going to install some pre-2.6.18 distro and "feel" what the performance is like.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I think the appearance of this bug depends on CPU speed and drive speed.

I have made some more tests. Currently I am using the following command:

for i in 1 2 ; do \
 dd if=/dev/zero of=test-$i bs=1M count=4K oflag=direct & echo test-$i; \
done

Once with oflag=direct and once without.

With ext3 the problem occurs immediately in both cases. With ext4 the problem occurs immediately without oflag=direct; with oflag=direct I can even use firefox, but sometimes the desktop is unusable.

In direct mode new applications do not start and disk-intensive operations take a long time, but I can move windows and switch desktops without problems, with iowait at 60%. With dd in non-direct mode, I can start new applications (it still takes a lot of time), but everything freezes from time to time and iowait goes immediately to 100%.

I have captured some statistics by adding a printk with the duration time in the function __make_request (blk-core.c). The time is taken directly before and after the spin_lock_irq(q->queue_lock); and finally before the unlock.

There is a dramatic difference in the requests per second between direct and non-direct mode.
W: wait time before entering lock state
D: duration time of the make_request
T: total time = W + D

ext3 - direct
requests: 209.694080/s
total: W: 0.000645 / D: 0.014584 / T: 0.015229
W: avg: 0.000000307 / min: 0.000000000 / max: 0.000007606
D: avg: 0.000006948 / min: 0.000000255 / max: 0.000085018
T: avg: 0.000007255 / min: 0.000000365 / max: 0.000085018
4294967296 Bytes (4,3 GB) kopiert, 203,66 s, 21,1 MB/s
4294967296 Bytes (4,3 GB) kopiert, 203,582 s, 21,1 MB/s

ext3
requests: 4662.272968/s
total: W: 0.013624 / D: 15.256149 / T: 15.269773
W: avg: 0.000000291 / min: 0.000000000 / max: 0.000275893
D: avg: 0.000325819 / min: 0.000000000 / max: 1.092940760
T: avg: 0.000326110 / min: 0.000000000 / max: 1.092940920
4294967296 Bytes (4,3 GB) kopiert, 203,559 s, 21,1 MB/s
4294967296 Bytes (4,3 GB) kopiert, 214,995 s, 20,0 MB/s

ext4 - direct
requests: 114.510132/s
total: W: 0.000356 / D: 0.017658 / T: 0.018014
W: avg: 0.000000311 / min: 0.000000110 / max: 0.000000630
D: avg: 0.000015408 / min: 0.000000220 / max: 0.000127249
T: avg: 0.000015719 / min: 0.000000330 / max: 0.000127689
4294967296 Bytes (4,3 GB) kopiert, 154,491 s, 27,8 MB/s
4294967296 Bytes (4,3 GB) kopiert, 157,853 s, 27,2 MB/s

ext4
requests: 7009.744726/s
total: W: 0.018928 / D: 6.110891 / T: 6.129819
W: avg: 0.000000270 / min: 0.000000000 / max: 0.000032916
D: avg: 0.000087046 / min: 0.000000000 / max: 0.603327176
T: avg: 0.000087316 / min: 0.000000000 / max: 0.603327516
4294967296 Bytes (4,3 GB) kopiert, 146,303 s, 29,4 MB/s
4294967296 Bytes (4,3 GB) kopiert, 149,361 s, 28,8 MB/s

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

And some test results with clocksource=jiffies instead of hpet (non-direct only), which runs much better on my machine. The total times are accumulated over an interval of 10 s. The 15 s with ext3 above should come from the two cores.

ext3
total: W: 0.018617 / D: total: 3.714917 / T: total: 3.733534
requests: 4050.191168/s
W: avg: 0.000000459 / min: 0.000000000 / max: 0.000048408
D: avg: 0.000091496 / min: 0.000000000 / max: 0.615268038
T: avg: 0.000091954 / min: 0.000000000 / max: 0.615268379
4294967296 Bytes (4,3 GB) kopiert, 213,215 s, 20,1 MB/s
4294967296 Bytes (4,3 GB) kopiert, 222,198 s, 19,3 MB/s

ext4
total: W: 0.026263 / D: 3.681891 / T: 3.708154
requests: 6006.413044/s
W: avg: 0.000000431 / min: 0.000000000 / max: 0.001003075
D: avg: 0.000060427 / min: 0.000000000 / max: 0.344179020
T: avg: 0.000060858 / min: 0.000000000 / max: 0.344179370
4294967296 Bytes (4,3 GB) kopiert, 147,343 s, 29,1 MB/s
4294967296 Bytes (4,3 GB) kopiert, 146,386 s, 29,3 MB/s

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Can you try with this simple patch applied?

diff --git a/block/blk.h b/block/blk.h
index 6e1ed40..a145c3a 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -5,7 +5,7 @@
 #define BLK_BATCH_TIME (HZ/50UL)

 /* Number of requests a "batching" process may submit */
-#define BLK_BATCH_REQ 32
+#define BLK_BATCH_REQ 1

 extern struct kmem_cache *blk_requestq_cachep;
 extern struct kobj_type blk_queue_ktype;

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I would say there are no changes; perhaps it is a little bit worse.
There are still freezes with non-direct write access, e.g. while painting circles in gimp.
No freezes with direct I/O, but high latency with concurrent disk access (as before).

ext3 - direct
requests: 205.795295/s
total: W: 0.000616 / D: 0.011195 / T: 0.011811
W: avg: 0.000000299 / min: 0.000000000 / max: 0.000007085
D: avg: 0.000005434 / min: 0.000000000 / max: 0.000100447
T: avg: 0.000005733 / min: 0.000000000 / max: 0.000100958
4294967296 Bytes (4,3 GB) kopiert, 210,281 s, 20,4 MB/s
4294967296 Bytes (4,3 GB) kopiert, 210,525 s, 20,4 MB/s

ext3
requests: 4960.868922/s
total: W: 0.032503 / D: 21.032077 / T: 21.064580
W: avg: 0.000000655 / min: 0.000000000 / max: 0.000069624
D: avg: 0.000423863 / min: 0.000000000 / max: 0.415194973
T: avg: 0.000424518 / min: 0.000000000 / max: 0.415195303

requests: 3588.105593/s
total: W: 0.014912 / D: 10.578434 / T: 10.593346
W: avg: 0.000000415 / min: 0.000000000 / max: 0.000077581
D: avg: 0.000294754 / min: 0.000000000 / max: 0.447073476
T: avg: 0.000295170 / min: 0.000000000 / max: 0.447073806

4294967296 Bytes (4,3 GB) kopiert, 218,708 s, 19,6 MB/s
4294967296 Bytes (4,3 GB) kopiert, 228,355 s, 18,8 MB/s

ext4 - direct
requests: 115.981745/s
total: W: 0.000344 / D: 0.016716 / T: 0.017061
W: avg: 0.000000297 / min: 0.000000110 / max: 0.000025846
D: avg: 0.000014398 / min: 0.000000650 / max: 0.000075554
T: avg: 0.000014695 / min: 0.000000990 / max: 0.000076195
4294967296 Bytes (4,3 GB) kopiert, 156,476 s, 27,4 MB/s
4294967296 Bytes (4,3 GB) kopiert, 157,78 s, 27,2 MB/s

ext4
requests: 7556.114616/s
total: W: 0.029942 / D: 9.424271 / T: 9.454213
W: avg: 0.000000396 / min: 0.000000000 / max: 0.000127857
D: avg: 0.000124722 / min: 0.000000000 / max: 0.046151790
T: avg: 0.000125119 / min: 0.000000000 / max: 0.046152130
4294967296 Bytes (4,3 GB) kopiert, 147,553 s, 29,1 MB/s
4294967296 Bytes (4,3 GB) kopiert, 151,226 s, 28,4 MB/s

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :
Download full text (11.5 KiB)

(In reply to comment #130)
> Can you try with this simple patch applied?
>
> diff --git a/block/blk.h b/block/blk.h
> index 6e1ed40..a145c3a 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -5,7 +5,7 @@
> #define BLK_BATCH_TIME (HZ/50UL)
>
> /* Number of requests a "batching" process may submit */
> -#define BLK_BATCH_REQ 32
> +#define BLK_BATCH_REQ 1
>
> extern struct kmem_cache *blk_requestq_cachep;
> extern struct kobj_type blk_queue_ktype;
>

Hi Jens,

I tried it on a 2.6.29-rc3 kernel. It made things worse for "default" config, but did help with config1.
(fio "ssh" test bench)
(config1 : quantum=1, slice_async_rq=1, queue_depth=1)

max runt 2.6.29-rc3 default no patch 14247msec
max runt 2.6.29-rc3 default patch 30833msec

max runt 2.6.29-rc3 config1 no patch 7574msec
max runt 2.6.29-rc3 config1 patch 6585msec
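
For reference, config1-style settings can be applied at runtime roughly as follows (sda is a placeholder; the iosched directory exists only while CFQ is the active scheduler, and queue_depth only for devices that expose it):

echo 1 > /sys/block/sda/queue/iosched/quantum          # quantum=1
echo 1 > /sys/block/sda/queue/iosched/slice_async_rq   # slice_async_rq=1
echo 1 > /sys/block/sda/device/queue_depth             # queue_depth=1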

Note that the results seem to indicate that the larger run times occur near the "write" job. The listings below show the runtime of the jobs (1 large write and many 2 MB reads executed at regular intervals for most of the load, ending with more randomly delayed jobs) in the order they were run. Note that all the read jobs are started at a 4 s interval, except the last 2 jobs, which are started after 50 s for the first one and after another 10 s for the last one.

Here is the listing of the 2.6.29-rc3 default no patch

  write: io=10240MiB, bw=56062KiB/s, iops=53, runt=191526msec
  read : io=2052KiB, bw=3411KiB/s, iops=141, runt= 616msec
  read : io=2084KiB, bw=409KiB/s, iops=16, runt= 5215msec
  read : io=2060KiB, bw=349KiB/s, iops=15, runt= 6031msec
  read : io=2060KiB, bw=445KiB/s, iops=17, runt= 4731msec
  read : io=2068KiB, bw=377KiB/s, iops=14, runt= 5606msec
  read : io=2084KiB, bw=558KiB/s, iops=23, runt= 3824msec
  read : io=2056KiB, bw=398KiB/s, iops=15, runt= 5279msec
  read : io=2048KiB, bw=328KiB/s, iops=13, runt= 6393msec
  read : io=2056KiB, bw=337KiB/s, iops=12, runt= 6236msec
  read : io=2072KiB, bw=596KiB/s, iops=23, runt= 3558msec
  read : io=2068KiB, bw=448KiB/s, iops=17, runt= 4723msec
  read : io=2052KiB, bw=342KiB/s, iops=14, runt= 6143msec
  read : io=2056KiB, bw=448KiB/s, iops=19, runt= 4695msec
  read : io=2060KiB, bw=362KiB/s, iops=14, runt= 5814msec
  read : io=2072KiB, bw=1202KiB/s, iops=44, runt= 1765msec
  read : io=2048KiB, bw=395KiB/s, iops=17, runt= 5308msec
  read : io=2056KiB, bw=434KiB/s, iops=17, runt= 4851msec
  read : io=2064KiB, bw=382KiB/s, iops=14, runt= 5521msec
  read : io=2072KiB, bw=412KiB/s, iops=16, runt= 5144msec
  read : io=2052KiB, bw=439KiB/s, iops=17, runt= 4784msec
  read : io=2076KiB, bw=408KiB/s, iops=15, runt= 5209msec
  read : io=2084KiB, bw=405KiB/s, iops=15, runt= 5263msec
  read : io=2052KiB, bw=379KiB/s, iops=14, runt= 5543msec
  read : io=2076KiB, bw=438KiB/s, iops=18, runt= 4852msec
  read : io=2052KiB, bw=1016KiB/s, iops=38, runt= 2068msec
  read : io=2056KiB, bw=227KiB/s, iops=9, runt= 9271msec
  read : io=2072KiB, bw=1256KiB/s, iops=48, runt= 1689msec
  read : io=2048KiB, bw=347KiB/s, iops=13, runt= 6036msec
  read : io=2068KiB, bw=594KiB/s, iops=24, runt= 3562msec
  read : io=2052KiB, bw=415KiB/s, iops=16,...

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

(edit)
Note that the results seem to indicate that the larger run times occur near
the *end* of the "write" job.

Revision history for this message
In , petrinic (petrinic-linux-kernel-bugs) wrote :

Hi.

On my laptop (Core2Duo 1.6 GHz) I run my Gentoo kernel, 2.6.28-gentoo.
I didn't have any problems with latency.

If I run "dd if=/dev/zero of=file bs=1M count=2048" or "dd if=/dev/zero of=/tmp/test bs=1M count=1M" (I tried to run it as user and also as root), my system works well and I can start firefox, another shell, or open dolphin (I'm under kde4-svn), and everything stays fast.

I have an XFS filesystem on my home and reiserfs on root.

Since I configured my kernel manually, maybe it would be useful for someone to have my .config, so I'll post it.

Revision history for this message
In , petrinic (petrinic-linux-kernel-bugs) wrote :

Created attachment 20105
With this .config I don't have latency bug.

My 2.6.28 .config. Everything is OK with this .config; I didn't have any slowdowns running "dd if=/dev/zero of=/tmp/test bs=1M count=1M" on my Core2Duo laptop (1.6 GHz).

Revision history for this message
In , harrisonmetz (harrisonmetz-linux-kernel-bugs) wrote :

After looking through Alexsandar's kernel I decided to try a new config. Changing my kernel from 250HZ and Voluntary Kernel Preemption to 1000HZ and Preemptible Kernel (Low-Latency Desktop), I can actually open tabs in firefox, new terminals, or SSH into my computer (from itself) without waiting 10-30 seconds. Perhaps there is no bug but this is just expected behavior.

I wonder if it was more of the clock change or the preemption change which made the difference, or both.

For those of you who have this problem, what is your HZ and preemption model?
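
If you are not sure what your kernel was built with, both can be read from the kernel config (assuming a distro config in /boot or /proc/config.gz):

grep -E '^CONFIG_HZ=|^CONFIG_PREEMPT' /boot/config-$(uname -r)
# e.g. CONFIG_HZ=250 together with CONFIG_PREEMPT_VOLUNTARY=y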

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Enabling the 1000 Hz timer frequency and Low-Latency Desktop as the preemption model does not solve the problem for me.

The mouse still freezes; I cannot move windows or switch between desktops under heavy I/O. The duration of these freezes is now reduced to less than 3 s and the freeze interval is 2-10 s, but the desktop is still unusable for me.

Revision history for this message
In , petrinic (petrinic-linux-kernel-bugs) wrote :

Maybe it's not only the preemption and the frequency. I think one of these things could be:

General setup:
- Control Group support DISABLED
- Group CPU Scheduler DISABLED
- Enable full-sized data structures for core ENABLED
- Enable futex support ENABLED
- Use full shmem filesystem ENABLED
- Enable AIO support ENABLED
- SLAB Allocator: SLUB

Processor type and features (ENABLED):
- Tickless System (NO_HZ)
- High Resolution Timer Support
- HPET Timer Support
- Multi-core scheduler support
- Preemptible RCU
- 64 bit Memory and IO resources
- Add LRU list to track non-evictable pages

Good luck...

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

I think it would be great if someone from the kernel team could take a look at this.

Linux is starting to lose its advantage in performance tests because of this problem.

Is there any kernel developer who can address this issue?

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #136)
> For those of you who have this problem what is your HZ and preemption model?
>
I'm currently using Voluntary Preemption and HZ=1000. However, I think we're probably losing focus here. Just randomly changing configurations seems like grasping at straws to me. There are far too many potentially relevant configuration options to realistically test them all. If we are going to make progress, we are going to have to use more targeted investigation.

(In reply to comment #139)
> Is there any kernel developer who can address this issue?
>
Jens Axboe has sent us a few patches although he doesn't seem to have a lot of time to dedicate to the issue. Honestly, I think we might need to find a distribution with a block layer developer on payroll who could focus on this issue until it is solved. In my discussions on #fedora-kernel, it doesn't look like Redhat has such a person. I haven't received any responses one way or another on #ubuntu-kernel with respect to Canonical.

Does anyone know of a company who might have someone with the requisite skill set to debug this issue? Jens, do you think you'll be able to sustainably work on this bug? (Thanks for your work so far, by the way)

I think it would be amazing if we could give 2.6.29 proper I/O performance. I know it's getting late considering we're at -rc3, but this bug has been with us for far too long.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

Well, I'm fairly certain at least part of the issue is a scheduler bug. Just now I was running make modules_install for a few kernels and after some time found that specific processes had stopped responding. This pattern continued, with more and more processes blocking. Eventually the entire X session stopped responding. For a while I could maintain an SSH session and found that IO wait time was 40%, with the rest of the CPU time going idle. After some time, however, even the ssh session stopped responding. This is the third time I have seen behavior like this, with the previous instances involving copying 15GB of data between external hard drives.

Also, Jens, what do you think is the most useful benchmark we've seen here? Testers have used several benchmarks including dd, various fio jobs. Would it help if we standardized on a single benchmark?

Revision history for this message
In , Adriaan.van.Kessel (adriaan.van.kessel-linux-kernel-bugs) wrote :

The best illustration of this behavior seems to be comments #128, #129 and #131.
IMHO this illustrates that most CPU is burned on a spinlock.
If the time spent inside the critical section also increases (which it does), there is IMHO a strong indication that there must be another (spin)lock inside this code path.
Currently I'm looking into mm/filemap.c.

My own testing consists of a toy search engine I am developing. It uses the maximum number of mmap()ed files (32K or 64K); the program maintains its own LRU.
In the first stage of its indexer, it just reads mmap()ed pages, maybe dirtying them. When it is done, it unmaps them (causing the buffers to be written back to backing store).

The frozen-cursor and non-responsive system only occurs during the first phase.
During the writing phase, things are back to normal again.

IMHO, this could mean two things:
1) There is a funneling lock in the read() pathway
2) The mm runs into the mud

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Sorry, I wish I could spend more time on it. I'll be on vacation the next 9 days, so no response until the week beginning on Feb 16th. I'll try and set aside a few days to work on it then.

With complete freezing of the mouse, it does look like some sort of spinning issue. To that extent, the most valuable information would be profiling from those 5 seconds surrounding the freeze. Hard to do, but would be very valuable.

People seem to be certain that this is a block layer issue; I'm far from convinced that is the case.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I have limited the usage of generic_file_aio_write in filemap.c for every process. First I limited the throughput of every process: when the overall throughput was below disk capacity, there were no more freezes of the mouse; when the overall throughput was above disk capacity, the problem appeared immediately.

Then I limited the usage to at most 20% of the interval time for every process and suspended the thread when it needed more. The problem was present as before, as every 20th request __generic_file_aio_write_nolock needed more than 2 s to finish.

I tried the same for the cfq scheduler in cfq_choose_req and added penalties for processes with heavy I/O, but the pid is not correctly set for all cfq_queues and I got a kernel panic after a while. Before the kernel panic there was no improvement.

Revision history for this message
In , bart (bart-linux-kernel-bugs) wrote :

Created attachment 20148
Graph of I/O waits on CPU Core 0

Running dd if=/dev/zero of=/storage/hwraid0/test1 bs=1M count=1M

On my AMD Phenom 9950 Quad-Core Processor running a distro kernel (2.6.27.12-170.2.5.fc10.x86_64). This test was run against an XFS file system on an 8-disk PCI-Express hardware RAID card. I get the same if I run against ext4 on the same hardware. I also get similar results on this machine with a single 10,000 RPM drive connected to the motherboard's SATA with ext4.

When this test was running the system was very unresponsive. In a different test run I launched evolution and it took around 60 seconds to load.

[root@bajor hwraid0]# dd if=/dev/zero of=/storage/hwraid0/test1 bs=1M count=1M
436560+0 records in
436560+0 records out
457766338560 bytes (458 GB) copied, 2535.92 s, 181 MB/s

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

<stupidmetoopost />
Now at least I know what's going on. It seems like it's somehow coupled with mm, because when this happens a) I can see invocations of the oom_killer in the logs after reboot, and b) the SysRq sync & unmount actions do not end the furious HDD LED flashing, so I presume the kernel is misusing swap space.
By the way, this is very indeterminate, and simply doing the same thing again will not reproduce the problem... so my vote is for a race condition or spinlock recursion, too.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

P5K, CPU - Core 2 Duo E8400, connected to the motherboards (ICH9) SATA - ST31000340AS, openSUSE 11.1, kernel - 2.6.28.3

yura@suse:~> dd if=/dev/zero of=test1 bs=1M count=1M
^C
128443+0 records in
128443+0 records out
134682247168 байт (135 GB), 1872,43 c, 71,9 MB/c

vmstat 1 (fragment)
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 1 7 0 46780 0 7750564 0 0 4880 12808 838 1491 2 3 0 95
 0 9 0 45480 0 7751004 0 0 1876 36888 1268 2506 2 5 0 93
 1 9 0 45420 0 7752056 0 0 7120 12296 705 1790 1 3 0 96
 0 7 0 43924 0 7751636 0 0 1416 36888 979 2178 3 4 0 93
 0 8 0 44148 0 7751480 0 0 900 28176 672 1444 2 3 0 95
 0 2 0 54144 0 7753008 0 0 2468 24680 649 1191 2 3 0 95
 0 11 0 46420 0 7757720 0 0 1508 72240 994 1696 2 7 2 88
 4 10 0 43348 0 7749244 0 0 5212 51212 1247 2436 6 6 0 87
 1 10 0 46256 0 7749752 0 0 1268 42504 799 1963 2 5 0 93
 0 1 0 45468 0 7757836 0 0 0 81959 1126 2249 1 9 6 84
 0 1 0 43880 0 7758912 0 0 0 71736 830 1818 1 8 31 60
 0 10 0 43280 0 7756472 0 0 0 59473 998 1879 1 5 8 85
 1 9 0 46832 0 7748176 0 0 0 81996 1114 2332 1 8 0 91
 0 10 0 46652 0 7747356 0 0 0 79920 867 1748 1 8 0 91
 0 10 0 45836 0 7747508 0 0 0 76848 1021 1947 1 8 0 91
 0 10 0 46724 0 7751964 0 0 0 52272 821 1775 1 6 0 93
 0 10 0 44388 0 7754660 0 0 0 77896 1054 2230 1 7 0 92
 0 6 0 45672 0 7755792 0 0 0 71736 1343 2886 1 7 0 91
 1 8 0 44624 0 7756444 0 0 0 77863 826 1736 0 7 0 92
 1 6 0 43132 0 7757664 0 0 0 63560 1036 1911 1 7 0 91
 0 3 0 43200 0 7757936 0 0 0 77896 721 1539 1 6 0 92
 1 11 0 46716 0 7760684 0 0 428 63544 1538 2789 12 8 0 79
 0 10 0 44808 0 7756940 0 0 6876 31248 1241 2857 4 4 0 91

The system dies. Even calling up the KDE main menu is extremely painful. About the rest, I will stay silent.

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

Ah !!

I think that could be the problem. I ran the dd test with a large file (20 GB) on my machine with 16 GB of RAM. Looking at top while it runs shows me that the available memory steadily shrinks, all of it being incrementally reserved for cache.

It actually shrinks down to 80 kB. Starting from that point, I experience lags when I type "ls". So I think this could be the problem. Is there any reason why the memory used for cache is allowed to grow out of proportion like this?

Mathieu

(In reply to comment #146)
> <stupidmetoopost />
> Now at least i know what's going on.. it seems like its somehow coupled with
> mm
> because when this happens a) i can see invocations of the oom_killer in the
> logs after reboot and b) SYSRQ + sync & unmount action do not end the furious
> HDD LED flashing so i presume the kernel is misusing swapspace..
> btw this is a very indeterminate and simply doing the same thing again will
> not
> reproduce the problem... so my vote is for uuhm race condition or spinlock
> recursion, too.
>

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Well, actually it is worse than that. If you have not tuned vm.swappiness to something much lower than the default of 60 (1 or something), the kernel will also start swapping stuff out to free memory. I don't know of a way to limit the cache memory's size.

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

There seems to be some information about how to tune this here. Trying out
parameter variations would be interesting :

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

Mathieu
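
A small sketch of the pdflush-related knobs discussed there (the values are illustrative only; full paths are given so the commands work from any directory):

echo 5   > /proc/sys/vm/dirty_background_ratio    # % of RAM dirty before background writeback starts
echo 10  > /proc/sys/vm/dirty_ratio               # % of RAM dirty before writers are throttled
echo 500 > /proc/sys/vm/dirty_writeback_centisecs # how often pdflush wakes up
grep -i dirty /proc/meminfo                       # watch the effect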

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

echo "1" > dirty_background_ratio
echo "1" > dirty_ratio
echo "3" > drop_caches

and vmstat says

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 2 355844 427256 3508 67544 10 21 315 180 459 781 5 3 80 12

then after doing a 10gig dd-operation vmstat says

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 1 0 355872 24532 8656 457200 10 21 338 497 456 763 5 3 79 13

So if I read the numbers correctly, around 400 MB of memory has now been used for caches. Hmm, that doesn't match setting dirty_background_ratio and dirty_ratio to 1: since I have 1 GB of memory, only 1% (10 MB) should be allowed to be dirty before applications are forced to wait. But this is apparently not the case here.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

In __block_write_full_page (buffer.c) nearly all submits to the block device are caused by pdflush.
At the beginning there are submits of 300 MB on a VM with 384 MB. After that the dd processes submit the data directly. As soon as there is available memory, it is filled and submitted immediately by pdflush. The 300 MB are submitted at once, or nearly at once.

On the VM there is the following scheme, caused by the double buffering (VM/Host).
At 67.506825 300MB (pdflush)
               100MB (dd processes)
At 72.750497 300MB (pdflush)
               100MB (dd processes)
At 74.215577 50MB (pdflush) // Host cache filled
...

My guess is that the dirty page count is not increased correctly by create_empty_buffers in __block_write_full_page. I currently don't know how to check it, as I have just started to read and understand the kernel code.

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

The following solution works for me. I use cgroups to limit the amount of memory dd can use. This shows that there is a problem with the kernel otherwise allowing the page cache to take _all_ the available memory.

mkdir -p /cgroups
mount -t cgroup none /cgroups -o memory
mkdir /cgroups/0
echo $$ > /cgroups/0/tasks
echo 4M > /cgroups/0/memory.limit_in_bytes
dd if=/dev/zero of=/tmp/bigfile bs=1024k count=20480

The same works with the fio "ssh" test case when run under the cgroups limitations :

  write: io=10240MiB, bw=34349KiB/s, iops=32, runt=312595msec
  read : io=2068KiB, bw=404KiB/s, iops=16, runt= 5239msec
  read : io=2048KiB, bw=598KiB/s, iops=25, runt= 3505msec
  read : io=2056KiB, bw=283KiB/s, iops=12, runt= 7437msec
  read : io=2056KiB, bw=542KiB/s, iops=21, runt= 3879msec
  read : io=2060KiB, bw=388KiB/s, iops=16, runt= 5431msec
  read : io=2052KiB, bw=591KiB/s, iops=25, runt= 3554msec
  read : io=2076KiB, bw=375KiB/s, iops=15, runt= 5658msec
  read : io=2048KiB, bw=522KiB/s, iops=19, runt= 4011msec
  read : io=2080KiB, bw=468KiB/s, iops=19, runt= 4548msec
  read : io=2068KiB, bw=406KiB/s, iops=16, runt= 5206msec
  read : io=2080KiB, bw=412KiB/s, iops=17, runt= 5161msec
  read : io=2068KiB, bw=410KiB/s, iops=18, runt= 5159msec
  read : io=2064KiB, bw=320KiB/s, iops=13, runt= 6603msec
  read : io=2064KiB, bw=356KiB/s, iops=13, runt= 5924msec
  read : io=2052KiB, bw=565KiB/s, iops=22, runt= 3716msec
  read : io=2060KiB, bw=396KiB/s, iops=18, runt= 5321msec
  read : io=2048KiB, bw=507KiB/s, iops=19, runt= 4129msec
  read : io=2048KiB, bw=302KiB/s, iops=12, runt= 6924msec
  read : io=2060KiB, bw=497KiB/s, iops=20, runt= 4243msec
  read : io=2072KiB, bw=3138KiB/s, iops=130, runt= 676msec
  read : io=2048KiB, bw=3472KiB/s, iops=130, runt= 604msec
  read : io=2060KiB, bw=4080KiB/s, iops=172, runt= 517msec
  read : io=2052KiB, bw=4227KiB/s, iops=171, runt= 497msec
  read : io=2048KiB, bw=3744KiB/s, iops=166, runt= 560msec
  read : io=2076KiB, bw=4201KiB/s, iops=169, runt= 506msec
  read : io=2052KiB, bw=3531KiB/s, iops=159, runt= 595msec

See Documentation/cgroups/memory.txt for more details.

Mathieu

Revision history for this message
In , marco.gatti (marco.gatti-linux-kernel-bugs) wrote :

How can we limit this with pre-2.6.29 kernels? I'm using 2.6.28.4 but there's no memory.limit_in_bytes, and the documentation doesn't help much here...
Should we completely remove cgroups support from the kernel until upgrading, or wait for a fix?

(In reply to comment #153)
[...]
> echo 4M > /cgroups/0/memory.limit_in_bytes
[...]

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

Is CONFIG_CGROUPS (and sub-options) enabled in your 2.6.28.x kernel ?

I cannot guarantee that memory limits will be available, but I can see the CONFIG_CGROUPS option in my old 2.6.28.x .config.

Mathieu

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Does not work for me. I succeed in keeping the memory usage from growing without bound, but I still get 98% iowait and a bad loss of responsiveness. I'm running 2.6.28.7.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Well, it is a little more nuanced. The 4M limit ended up killing my dd operation. A limit of 16M is better for me and seems way better than the default without any limits.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

CGROUPS are available in 2.6.28.3, but there is no memory limit.

(In reply to comment #156)
Søren, can you test it with clocksource=jiffies too? I still think that the reduced scheduler performance (#3) makes the problem worse. You can see the differences in comments #128 and #129 on my machine.

The number of dirty pages and writeback pages (/proc/meminfo) is always below 20% of memory on my systems, even under heavy I/O. But there is a lot of "traffic" caused by pdflush when the dirty page count reaches the limit. All dirty pages are passed to the blk/elevator nearly at once. Sorting the rb-tree, or perhaps taking locks, then takes more time for every request, as there are a lot of requests.

On ext3 it takes up to 1 second, and 0.3 ms on average, to insert a new request, and there are up to 7000 requests submitted on my notebook (see comments #128 and #129). I think this is one reason for the high I/O wait.

The high memory usage is also caused by pdflush, which is called via generic_perform_write (filemap.c) -> balance_dirty_pages_ratelimited. clear_page_dirty_for_io is called directly before the page is submitted to the blk/elevator in write_cache_pages. As a result the page buffers are still in the elevator queue while global_page_state(NR_FILE_DIRTY) has too small a value.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

It does not matter whether I use jiffies in these cases where memory is limited:

memory.limit_in_bytes = 4M
Responsiveness : Very good
Disk speed : 40% of disk capability
iowait : Generally around 50%

Responsiveness : Good
Disk speed : 50% of disk capability
iowait : Generally around 50%

Interestingly, I can't get the disk speed above 50% of the disk capability reported by hdparm, not even with oflag=direct.

Earlier I reported that jiffies performed better, but that was without memory limitations.

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

Created attachment 20172
mm fix page writeback accounting to fix oom condition under heavy I/O

Makes sure the page cache accounting behaves correctly with I/O elevator, thus fixing OOM condition.

Does not seem to fix the latency problem though. See changelog.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Hi Søren,

It's possible that the memory limits do not help with the problem; as you say, the HD will run below its capacity because of lack of data (due to the memory limits). So it will trigger the problem later, or not trigger it at all.

But it's good to have a way to limit the problem anyway.

So I have a question. What is the right way to behave? I mean, under a heavy IO load, what is the right behavior for a sane kernel?

I propose some cases:

A) We have 2 processes, one that performs high-load IO operations (to the HD in this case) and one that only does so occasionally.

   1.- Process 1 (high IO) starts to do IO ops. So it will switch between being blocked by IO ops and being active as it reads and sends data to the controller.
   2.- Process 2 tries to access the disk, so it has to wait for a chance to read.

In this case the IO wait of Process 1 should be almost 0, as it only waits microseconds for the last IO op to finish. But Process 2 should have high IO waits because Process 1 takes all the IO bandwidth.

B) Same case but with a round-robin style queue. CFQ?

IO wait should be nearly 0 for Process 2, as it gets a chance to write to disk, but Process 1 must wait for each operation to finish...

Which is the correct way? Is there any other?

What is clear is that it is not normal for one process to block all the other processes because it is waiting to write. Only when every process wants to write should the IO wait rise, as all processes are waiting for a chance. In that case... should we only see IO wait times? Is this our case?

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 20176
Screenshot of current status of the bug while letting a program hang the system

Here you can see that IO wait is 72.2%, with Xorg going crazy on CPU usage, while the rest of the system is completely unusable.

That was just because Transmission was verifying my torrents. So again, it's not acceptable that the system is rendered unusable because a background operation is in progress...

How can I help more?

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Could the concept from CPU scheduling be reused? Instead of talking about time-slices we could talk about IO-slices, and favour the processes that use the fewest IO-slices; this would prevent an evil dd from starving other light readers/writers. I'm not kernel-skilled at all, so maybe this sounds a lot like your RR queue, but these are just some thoughts.

Revision history for this message
In , alevkovich (alevkovich-linux-kernel-bugs) wrote :

Maybe someone can explain to me why a simple copy eats ~50% CPU? Maybe it is part of this problem? The same copy in Windows eats 5-10% CPU. UDMA 100 is enabled on my PATA controller. I have JFS partitions.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

With the last patch, the problem is permanent on my notebook on both ext4 and ext3 partitions. The I/O wait time is at 100% under heavy I/O. Mouse clicks are frequently not recognised, or keyboard input is delayed by up to 10 seconds (all under Xorg).

I got a deadlock with the patch on kernel 2.6.28.2, but only once. The I/O wait time was at 100%, but there was no disk I/O any more. I could not start any programs or save any data, but I was able to use the running programs. I am not sure whether this is a problem caused by the patch or whether it is our problem. I also got a complete freeze with clocksource=jiffies on an unpatched kernel under heavy I/O and heavy CPU usage.

I have checked some timings in the block and elevator functions
(__make_request, get_request, get_request_wait, blk_complete_request, cfq_service_tree_add and cfq_add_rq_rb).
All the timings were below 5µs, at some points climbing to 80µs, which looks fine to me. In get_request_wait the writing dd processes wait up to one second for a new free request; it was only the dd processes or sometimes the pdflush process. That should be OK.

Can prepare_to_wait_exclusive(&rl->wait[rw], &wait, TASK_UNINTERRUPTIBLE) in get_request_wait (blk-core.c) cause such a problem?

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

The patch from #160 to stop the kernel from just taking all available memory almost works for me. Thanks, Mathieu. I don't get crazy swapout as I used to, but the cache still occupies 400 MB of my 1 GB of memory, which is also wrong.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Hmm... I assume that the cache is both read and write cache. In that case everything is all right. I can confirm the almost 100% iowait.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I have limited the number of requests from a single process to 200 per second by adding an msleep_interruptible(5) just before spin_lock_irq(q->queue_lock) in __make_request whenever that process is using the queue intensively. The requests are counted in a ring buffer covering four seconds, updated every 100ms. The throughput of the two dd processes is really bad at 3MB/s (as expected). Processes with a priority higher than 0, kjournald(2), or requests where (bio_data_dir(bio) == READ || bio_sync(bio)) is true, are passed through without delay.
The wait time is at 100% of one core at the beginning and 100% of both cores after ~5-10s. Only the two dd processes and pdflush are delayed.
The problem is permanent. I cannot change between the windows of two consoles or switch desktops; there are always long delays. It is exactly the freezing known from heavy I/O, with the difference that the mouse cursor still moves. I am not able to use gedit to write text, as every 5-15 seconds the keys are recognised with a delay of at least 5 seconds, even when the dd processes are killed and there is only a maximum write speed of 3MB/s (pdflush and perhaps kjournald) in the background (0% I/O wait time). Gimp starts in 10 seconds without preloading. The cache usage is at less than 20% of memory (~800MB).

I am using kernel 2.6.28.2 with the patch from Mathieu (thanks a lot; I think it stops the mouse cursor from freezing) plus my delay in __make_request.
Removing only the delay restores the previous state.

I think this is the main problem, as I can simulate it! The high I/O wait is caused by the sleeping threads. In __make_request only 100-200 out of 7000 requests during heavy I/O end up calling get_request_wait, and only about 10 requests enter the while loop in get_request_wait and really wait more than 20ms, up to 1 second on my machine (prepare_to_wait_exclusive(&rl->wait[rw], &wait, TASK_UNINTERRUPTIBLE);...; io_schedule(); in get_request_wait).
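(As an aside, the request pool that get_request_wait blocks on is sized per queue and can be inspected or resized from userspace, which may help when experimenting with the behaviour described above; the default of 128 is an assumption about typical kernels of this era:)

cat /sys/block/sda/queue/nr_requests            # per-direction request pool, usually 128 by default
echo 512 > /sys/block/sda/queue/nr_requests     # larger pool: writers block later, at the cost of a deeper queue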

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I have just replaced prepare_to_wait_exclusive(&rl->wait[rw], &wait, TASK_UNINTERRUPTIBLE); and io_schedule(); in get_request_wait with msleep_interruptible(500). The throughput of the two dd processes is at 57MB/s (27/30). The desktop freezes for up to 100 seconds.

Revision history for this message
In , mathias.buren (mathias.buren-linux-kernel-bugs) wrote :

Is there any way I can help debugging this?

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #138)
> Maybe it's not only the preemption and the frequency. I think one of these
> things could be:
>
> General setup:
> - Control Group support DISABLED
> - Group CPU Scheduler DISABLED
> - Enable full-sized data structures for core ENABLED
> - Enable futex support ENABLED
> - Use full shmem filesystem ENABLED
> - Enable AIO support ENABLED
> - SLAB Allocator: SLUB
>
> Processor type and features (ENABLED):
> - Tickless System (NO_HZ)
> - High Resolution Timer Support
> - HPET Timer Support
> - Multi-core scheduler support
> - Preemptible RCU
> - 64 bit Memory and IO resources
> - Add LRU list to track non-evictable pages
>
> Good luck...
>

Many of these seem to be 32-bit settings. The funny thing is that if I boot into 32-bit x86, I don't see any of the slowdowns, or they are so small that effectively I don't feel them. It's only x86-64 that freezes on me during I/O.

Revision history for this message
In , bart (bart-linux-kernel-bugs) wrote :

Must admit all machines I have noticed this on are x86_64.

Revision history for this message
In , michiel (michiel-linux-kernel-bugs) wrote :

The systems I have noticed it on are also x86_64.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I have noticed this bug on a Pentium-M (32-Bit only) processor.

Revision history for this message
In , simon+kernelbugzilla (simon+kernelbugzilla-linux-kernel-bugs) wrote :

I have seen this bug on an Opteron 250 system with a 32-bit OS (CentOS 4.4 through CentOS 5) installed.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Mine is

gad@ws-esp16:~$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
stepping : 10
cpu MHz : 800.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm ida
bogomips : 4388.98
clflush size : 64
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
stepping : 10
cpu MHz : 800.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm ida
bogomips : 4389.07
clflush size : 64
power management:

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

My cpu model is :AMD Turion(tm) 64 X2 Mobile Technology TL-50
The kernel is compiled for i686, and I see large slowdowns.

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

I see this on my Intel T8100 notebook on both kernel-2.6.29-0.33.rc5.fc10.x86_64 and kernel-2.6.27.15-170.2.24.fc10.x86_64 (default Fedora config options). Just the simple dd /dev/zero test can provoke it; the desktop feels less responsive. latencytop shows things like Evolution waiting almost 10 seconds for an fsync to complete.

Hardware has an ICH8 chipset, DMA etc. seems configured properly.

vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU T8100 @ 2.10GHz
stepping : 6
cpu MHz : 800.000
cache size : 3072 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

In certain cases (2.6.28.5), with the patch "mm fix page writeback accounting to fix oom condition under heavy I/O" applied, the system gets out of control: iowait rises to ~100% and a complete system stall is observed. I cannot provide any data, as only the reset button on the box still works. Probably there is a set of influencing factors that demands a more detailed investigation.

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

(In reply to comment #179)
> In certain cases (2.6.28.5), with the patch "mm fix page writeback accounting
> to fix oom condition under heavy I/O" applied, the system gets out of control:
> iowait rises to ~100% and a complete system stall is observed. I cannot provide
> any data, as only the reset button on the box still works. Probably there is a
> set of influencing factors that demands a more detailed investigation.
>

My patch "mm fix page writeback accounting to fix oom condition under heavy I/O" is probably not the right solution, but rather a step in the right direction. It pinpoints that the elevator fails to increment counters that are tested by the code which decides whether the memory pressure from dirty and writeback pages is high enough to make the process fall into "sync write" mode.

Therefore, I think a cleaner solution to this particular problem could be to create a new page type counter (like dirty pages, write buffers, ...) to let the vm know how many pages are held by the elevator. The fs/buffer.c code should then check this value too when deciding whether the pressure on memory is high enough to make the process do a "sync write". However, this problem is harder than it appears, because the buffer.c code would probably put such a process into sync write mode independently of the elevator, and I really wonder what the interaction of such a solution with CFQ would be. I am not sure the CFQ I/O scheduler would behave correctly in that situation, but Jens can speak to that better than I can.

Hope this helps,

Mathieu

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #179)
> I cannot provide any data, as only the reset button on the box still works.
> Probably there is a set of influencing factors that demands a more detailed
> investigation.

I have noticed this issue with an unpatched kernel too. The "mm fix page writeback accounting to fix oom condition under heavy I/O" patch makes the problem reproducible. Sometimes the I/O wait time is at 100%, sometimes there is no I/O wait time at all; there is no problem with read access, but no write access gets executed. I can reproduce the problem with xfs. With ext4 the problem does not appear very often on either the patched or the unpatched kernel.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

(In reply to comment #181)

Then I will add a clarification: I use only xfs. Probably the patch interacts badly with it, and probably it works well with other file systems. I am sorry, I simply have no other way to check this.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

I have consistently had this problem with every kernel above 2.6.17 that I have tried, so I have stuck with 2.6.17 up until now.

There are some supposed resolutions to the problem at http://linux-ata.org/faq.html, but none of them work for me, and I don't have the mentioned BIOS setting in my BIOS.

lspci reports...
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA IDE Controller (rev 01)

I do not appear to have the problem on my Macbook 2,1, although the disk performance is about 21M/s, which is lousy. But what I'm seeing on my other machine is 1M-3M/s.

I also tried passing "pci=routeirq" and "acpi=off" (grasping at straws), but that did not change anything. I did however notice that my HD is /dev/sda in 2.6.17, and /dev/hda in 2.6.25 and 2.6.27.

On 2.6.17, dmesg tells me...
ata_piix 0000:00:1f.2: version 1.05
ata_piix 0000:00:1f.2: MAP [ P0 P2 IDE IDE ]
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 17 (level, low) -> IRQ 18
ata: 0x170 IDE port busy
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0x1F0 ctl 0x3F6 bmdma 0xBFA0 irq 14
ata1: dev 0 cfg 49:2f00 82:346b 83:7d09 84:6123 85:3469 86:bc09 87:6123 88:207f
ata1: dev 0 ATA-8, max UDMA/133, 625142448 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi2 : ata_piix
  Vendor: ATA Model: ST9320421ASG Rev: SD13
  Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3
sd 2:0:0:0: Attached scsi disk sda
sd 2:0:0:0: Attached scsi generic sg0 type 0

But on 2.6.27, I get nothing of the sort - nothing about SATA or anything. I did notice that with 2.6.27, libata was enabled, while with 2.6.17 it didn't even appear to be an option. Ever since libata, nothing seems to work right, and my computer is relatively new. I have a Dell D820 Core 2 Duo.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

I noticed the same - would it be possible to revert the libata integration?

Revision history for this message
In , akpm (akpm-linux-kernel-bugs) wrote :

Trenton, it's unclear to me what you're describing here.

> I have consistently had this problem

which problem?

Anyway, it sounds like what you're reporting is a straightforward
regression in ATA throughput?

If so, please raise a separate, new bug report against SATA for that,
thanks.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Oops, mid-air collision. I'll answer Andrew's question first.

I'm having two problems.
1. on my Dell D820 I see degraded throughput AND high io wait times as everyone else here has described
2. on my Macbook, I do not see degraded performance, but I see the extremely high io wait times.

Both of these systems have the IDENTICAL IDE chipsets. Read on with my original reply, before collision, for more information.

Quick question: is anyone else who has this problem also using the Intel 82801GBM/GHM IDE chipset???

I have a Dell D820 (64-bit) notebook, and a Macbook from late 2007 (the 64-bit ones). I noticed that they both have Intel 82801GBM/GHM IDE chipsets, and they both exhibit the problem. If I run 32-bit Gentoo Linux on the D820 with one of these bad kernels, my hard drive (which was renamed to hda) gets about 3M/sec, and the high wait times are also present.

With the Macbook, the high I/O wait times are there, but I get good throughput with 32-bit Gentoo. I am not sure what the difference is between the D820 and the Macbook, seeing that they have very similar (almost identical) hardware. I suppose it is possible that Apple made the BIOS change that the linux-ata page suggests.

This truly is debilitating. I have now tried two distributions with the latest 2.6.x kernels (Gentoo and OpenSUSE 11.1), and all of them exhibit these symptoms on my hardware. I am almost certain that if this does not get fixed, I will be unable to continue using Linux at work, unless I get a new computer (slim chance but possible). After all, eventually, Gentoo will move towards some new features that require a newer kernel, and I will be left in the dust. I will then be forced to run Linux in vmware under Windows. Please, someone save me from this awful DEATH. muhahahaha.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

(In reply to comment #172)
> Must admit all machines I have noticed this on are x86_64.
>

I am seeing both x86_64 and i686 machines exhibit this. Before my Dell D820 died on me, it was a Core Duo 32-bit machine. Then it got replaced with a newer D820, which is a Core 2 Duo 64-bit machine. This issue happened on both of those. And, as mentioned in my last comment, it also happens on my Core 2 Duo Macbook.

Revision history for this message
In , ozan (ozan-linux-kernel-bugs) wrote :

I once had a similar *traumatic* throughput regression with an Intel processor + p4_clockmod. So the issues may have completely different causes.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

(In reply to comment #134)
> Hi.
>
> On my laptop(Core2Duo 1.6 ghz) I run my gentoo kernel 2.6.28-gentoo.
> I didn't have any problems with latency.
>
> If I run "dd if=/dev/zero of=file bs=1M count=2048" or "dd if=/dev/zero
> of=/tmp/test bs=1M count=1M" (I tried to run it as user and also as root), my
> system works well and I can start firefox, another shell, open dolphin (i'm
> under kde4-svn) and everything is faster.
>
> I have XFS filesystem on my home and reiserfs on root.
>
> Since I configured my kernel manually, maybe it could be usefull for someone
> to
> have my .config so I'll post it.
>

I have just unmasked, and tried 2.6.28 on Gentoo Linux as well, and the problem appears to be gone. This is on my D820, which is the one with really bad throughput as well. As I am in the process of converting to 64bit on my D820, I am unable to try GUI stuff out. But, before, during heavy load, I was unable to switch between terminals very well either. Now, the system is EXTREMELY responsive, during these heavy load times, which is what I expect. And, I'm getting 82M/sec once the caching limit has been reached, and 256M/sec with caching. This is equivalent to what I was getting with 2.6.17.

Now, I don't know if the gentoo guys applied someone's patch from here, as comment #52 mentioned patching 2.6.28, but it's working for me now. I'm VERY happy about that. :D Based on his description, it very much sounds like the Gentoo guys must have applied the patch. I was doing a while loop, with dd, increasing the amount of data by 1M at a time. The first few, up to about 60M, were getting 256M/sec. Then, I noticed in my other terminal, running vmstat, the iowait times got pinned to nearly 100%. So, I'm thinking that all those dd's that got cached, were finally catching up to the NO LIMIT on cached items, and causing thrashing in the IO system. That caused a COMPLETE freezeup of the while loop. Also, during this time, my HD light was going crazy. Then, when the io wait times dropped to 0 again (cached items flushed), the loop did a few more iterations (and my HD light was off), and it started all over again. Then, again the loop froze, etc, etc, etc.
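(The exact loop was not posted; a rough sketch of what the growing dd test described above might have looked like, with an arbitrary file name and a 1 MB step size:)

i=1
while true; do
    # write i megabytes and print only the throughput summary line
    dd if=/dev/zero of=/tmp/grow-test bs=1M count=$i 2>&1 | tail -n1
    i=$((i + 1))
done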

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Also, I feel kind of stupid because I should have reported this back in 2007 when I saw it. But, I figured someone else would find it before too long, so I just hung back with my kernel version. SORRY!!! :(

I guess I shouldn't do that next time. Especially considering it is way easier to find bugs when a new release just came out, and there is a new bug due to the changes in that release.

Revision history for this message
In , michiel (michiel-linux-kernel-bugs) wrote :

<email address hidden> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=12309
>
> ------- Comment #189 from <email address hidden> 2009-03-01 01:26 -------
> [comment #189, quoted in full in the original mail; see the previous comment]
>
OK, so if that Gentoo version is working for you, can we compare it with the vanilla kernel?

Can you send us some system info to compare your kernel config with the
vanilla one?

Can we have a tarball with the following structure? (to make it easy to
diff over it)
--------------------------------------------------
systeminfo.txt
vanilla
    \- config (original config of the vanilla kernel, not yours)
    |- kernel-info.txt
    |- dmesg.txt
    |- lsmod-output.txt
    |- test-report.txt
gentoo-youredition
    \- config (the config file of your kernel version)
    |- dmesg.txt
    |- lsmod-output.txt
    |- test-report.txt
    |- gentoo.patch
--------------------------------------------------

If...


Revision history for this message
In , wprins (wprins-linux-kernel-bugs) wrote :

(In reply to comment #16)
> I tried elevator=as on my system, and it did not change the behaviour.
> Copying
> files from external USB to internal encrypted SSD still totally smashes
> interactive performance. So this issue might be unrelated.
>

Note that some SSDs have very poor random-write performance; this can cause stuttering and all sorts of side effects. Anandtech investigated this issue when comparing/reviewing Intel's SSDs vs. parts from OCZ that use a certain JMicron controller. See here: http://www.anandtech.com/showdoc.aspx?i=3403&p=7
You should probably just read the entire review.

It is therefore possible that your issue has more to do with the behaviour of your SSD during writes than with the kernel scheduler or anything else.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Working on it now Michiel. I'll try and get that info for 2.6.27, 2.6.28, and vanilla 2.6.28.

ttyl

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Hmmm, apparently I forgot to try vmstat. The high io wait times are still there, but I haven't been noticing it. I wonder what could have caused me to not notice it now. The performance is way better, even with the high io wait though. I'm not seeing 30 second delays on stuff. Every now and then there's a second or two delay, perhaps five tops. I'll get the info anyhow, and see what the differences are. FYI: This is still on my D820.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

(In reply to comment #192)
> (In reply to comment #16)
> > I tried elevator=as on my system, and it did not change the behaviour.
> Copying
> > files from external USB to internal encrypted SSD still totally smashes
> > interactive performance. So this issue might be unrelated.
> >
>
> Note, some SSD's have very poor random-write performance, this can cause
> stuttering and all sorts of side effects. Anandtech investigated this issue
> when comparing/reviewing Intel's SSD's vs. parts from OCZ which uses a
> certain
> JMicron controller. See here:
> http://www.anandtech.com/showdoc.aspx?i=3403&p=7
> You should probably just read the entire review.
>
> It is therefore possible that your issue has more to do with the behaviour of
> your SSD during writes than the kernel scheduler or anything else.
>

Well, if that is true, it would have to be a combination of the kernel and my system. Mainly because my system was SUPER fast before I tried upgrading my kernel past 2.6.17. As for my Mac, I don't recall having performance issues while running Mac OS X. Nothing like the article describes anyhow.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #189) - #195
> Well, if that is true, it would have to be a combination of the kernel and my
> system. Mainly because my system was SUPER fast before I tried upgrading my
> kernel past 2.6.17. As for my Mac, I don't recall having performance issues
> while running Mac OS X. Nothing like the article describes anyhow.

There is another bug in 2.6.17/18-??, which gives poor disk performance when the SATA controller on an ICH8M (or similar?) platform runs in compatibility mode; it also produces high I/O wait times and lets this bug appear.

There are dependencies between CPU power, disk throughput, task-switching time (e.g. clocksource) and this bug.

Has someone tried to identify the source of the problem, with the info provided in Comment #168 and Comment #169 ?

There is a comment in the code (blk-core.c @ ~1300)
 /*
  * After dropping the lock and possibly sleeping here, our request
  * may now be mergeable after it had proven unmergeable (above).
  * We don't worry about that case for efficiency. It won't happen
  * often, and the elevators are able to handle it.
  */
But it happens up to 20 times every second during heavy I/O, causing high I/O wait times for the writing process (or pdflush) and making desktop responsiveness poor. My proof is the really poor desktop responsiveness when prepare_to_wait_exclusive is replaced by msleep_interruptible (see Comment #169). I will be able to spend some more time on this bug in April.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Created attachment 20405
info request by Michiel in comment 191

Here's the info you wanted Michiel.

Doing a diff on the config of the bad kernel and the new one reveals this interesting tidbit...

diff -u 2.6.27-gentoo-r8-kernel-config.txt 2.6.28-gentoo-r2-kernel-config.txt
-CONFIG_BLK_DEV_IDEDISK=y
-CONFIG_IDEDISK_MULTI_MODE=y
+CONFIG_IDE_GD=y
+CONFIG_IDE_GD_ATA=y

That must have been what switched me back to using sda. Anyhow, that was obviously a separate issue.

So, my system performance, and io wait times are totally fine during normal system operation. When I do REALLY heavy io, the wait times go up, but the responsiveness is still relatively good. I can start kwrite in about 2-3 seconds. It seems like it is fixed to me. But, I'll still try that patched 2.6.28 and get back to you, to see if it is even better.

Perhaps Andrew Morton was right. Maybe my issue was entirely to do with my SATA issues.

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

(In reply to comment #196)
> There is another bug in 2.6.17/18-??, which gives a poor disc performance,
> while running the SATA controller on a ICH8M (or equal?) platform in
> compatibility mode, which gives a high i/o wait time too and lets this bug
> appear.
>
> There are dependencies between cpu-power, disc throughput, task switching
> time
> (eg. clocksource) and this bug.

This is interesting, since my notebook has an ICH8M stuck in compatibility mode (no BIOS option). I'll see how it compares to my other notebook with an ATI-IXP chipset.

Revision history for this message
In , heine.andersen (heine.andersen-linux-kernel-bugs) wrote :

Has anyone seen this on a non-SATA drive?

If I do a "dd if=/dev/zero of=outfile bs=1M count=50000" on 2.6.28, the load rises to around 8; on 2.6.29-rc5 it never gets past 4.

I'm testing on 64-bit, ICH9 + SATA, btw. I also tried installing CentOS 4.7, with a 2.6.9+ kernel, and it's just as bad as 2.6.28.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I have just tested 2.6.29-rc6. Desktop responsiveness has improved enormously; Firefox in particular is now usable. The problem still exists for me, but it is not as noticeable as before.

Revision history for this message
In , wprins (wprins-linux-kernel-bugs) wrote :

(In reply to comment #195)
> (In reply to comment #192)
> > It is therefore possible that your issue has more to do with the behaviour
> of
> > your SSD during writes than the kernel scheduler or anything else.
> >
>
> Well, if that is true, it would have to be a combination of the kernel and my
> system. Mainly because my system was SUPER fast before I tried upgrading my
> kernel past 2.6.17. As for my Mac, I don't recall having performance issues
> while running Mac OS X. Nothing like the article describes anyhow.

OK, well in that case I absolutely agree that it's obviously a software-only problem in your case, and probably this scheduler/kernel issue. (I just wanted to point out for the record, so everyone's aware, that there are some SSD hardware combinations with inherent limitations that may very well cause similar sluggishness regardless of the kernel/software itself.)

As an aside, high I/O wait percentages are, as far as I understand it, not in and of themselves problematic, since high I/O wait only means that a process is waiting for I/O. This measure will therefore predictably be high when a process is doing substantial I/O against a comparatively slow device. Normally, however, one would expect such I/O not to negatively affect other processes or general system responsiveness, *except* when the other processes are also somehow I/O hungry and there is some sort of I/O resource contention going on, or, as appears in this thread, there is actually a scheduling problem which causes runnable processes not to receive the CPU when they should, resulting in perceived sluggishness.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I must correct my last post (Comment #200). I was working with VMs the whole day and it is still as awful as before.

But there is a big improvement when using Firefox.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

I would agree that -rc6 has for some reason greatly improved system responsiveness under I/O load but there are most certainly still great issues in the block I/O world.

Just now I once again managed to completely wedge my machine by doing nothing more than copying a few gigabytes of files between drives. Furthermore, Firefox still freezes for several seconds when I first start typing in the location bar, as it looks in its history database. Lastly, Evolution still takes several minutes to start and become usable, while its I/O rate is less than 1 MB/s. All in all, things are pretty unusable.

Jens, are you around? I've been asking various distributions and vendors whether they could spare some qualified man-hours to get this problem finally worked out but it seems like you're our best hope. I know you'll be getting at least one case of beer when this is fixed ;)

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Hi Guys,

My brother has apparently been having the same problem on his computer; I hadn't realized it when I submitted my bug. He has an ICH8-family chipset.

The following works for him, and the problem goes away.
echo anticipatory > /sys/block/sda/queue/scheduler

Looks like this may be a tough one to nail down, because everyone's symptoms are slightly different. I'm wondering if perhaps there are multiple issues going on here.
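(For anyone who wants to try the same workaround, a small usage sketch: the scheduler can be inspected and switched per block device at runtime, and whether "anticipatory" is listed depends on what was compiled into the kernel:)

cat /sys/block/sda/queue/scheduler              # active scheduler shown in brackets, e.g. noop anticipatory deadline [cfq]
echo anticipatory > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler              # should now show [anticipatory]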

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Oh, crap, I forgot the details. Before the details, I also wanted to say that I am going to get him to try changing the BIOS option mentioned on the libata page I gave earlier, to see what happens.

[03:05 root@zipper ~]# lspci
00:00.0 Host bridge: Intel Corporation 82P965/G965 Memory Controller Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82G965 Integrated Graphics Controller (rev 02)
00:03.0 Communication controller: Intel Corporation 82P965/G965 HECI Controller (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Contoller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 02)
00:1c.2 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 3 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 4 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HB/HR (ICH8/R) LPC Interface Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801H (ICH8 Family) 4 port SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
00:1f.5 IDE interface: Intel Corporation 82801H (ICH8 Family) 2 port SATA IDE Controller (rev 02)
02:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6101 single-port PATA133 interface (rev b1)
06:00.0 RAID bus controller: Silicon Image, Inc. SiI 3112 [SATALink/SATARaid] Serial ATA Controller (rev 02)
06:01.0 Mass storage controller: Promise Technology, Inc. 20269 (rev 02)
06:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)

[03:05 root@zipper ~]# uname -a
Linux zipper 2.6.18-53.el5xen #1 SMP Mon Nov 12 02:46:57 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

[03:09 root@zipper ~]# cat /etc/issue
CentOS release 5 (Final)
Kernel \r on an \m

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I have noticed that while working with VMs my system starts swapping after a while. I tried -rc7 with Mathieu's patch (Comment #160) and my system seems to be usable. There is still the unfair I/O scheduling between processes, but that's another problem. I am using a kernel without "Group CPU Scheduler" and "Control Group Support", and I am writing this text in Firefox at a load average of 12.

To reach such high load avg, I have to run eight concurrent dd write operations.

for i in 1 2 3 4 5 6 7 8; do \
  dd if=/dev/zero of=test-$i bs=1M count=4K oflag=direct & echo test-$i; \
done

Copying big files with Nautilus makes my system unusable from time to time, with the known symptoms such as being unable to switch desktops and mouse freezes.

And finally, I have not seen the complete I/O freeze with the -rc7 kernel on xfs, ext3 or ext4.

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Trenton, I too set my kernel to the anticipatory scheduler, and for a while I thought all was well when I ran dd if=/dev/zero of=~/test bs=1M count=1500 as a test. Then I realized that it's not a reliable testing method, since the *anticipatory* scheduler can anticipate the coming zeroes that will be written. I ran a second dd if=/dev/zero of=~/test bs=1M count=1500 simultaneously with the one already writing from /dev/zero, and realized that part of the syndrome is fixed with AS, but the problem persists...

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Argh, I forgot to give details too...
I am running the 2.6.28-8-generic kernel (64-bit) on Ubuntu Jaunty, and I had this problem with 32-bit kernels before as well.

khaal@Xeraphim:~$ sudo lspci
[sudo] password for khaal:
00:00.0 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
00:00.1 RAM memory: nVidia Corporation C51 Memory Controller 0 (rev a2)
00:00.2 RAM memory: nVidia Corporation C51 Memory Controller 1 (rev a2)
00:00.3 RAM memory: nVidia Corporation C51 Memory Controller 5 (rev a2)
00:00.4 RAM memory: nVidia Corporation C51 Memory Controller 4 (rev a2)
00:00.5 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
00:00.6 RAM memory: nVidia Corporation C51 Memory Controller 3 (rev a2)
00:00.7 RAM memory: nVidia Corporation C51 Memory Controller 2 (rev a2)
00:02.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1)
00:04.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1)
00:09.0 RAM memory: nVidia Corporation MCP51 Host Bridge (rev a2)
00:0a.0 ISA bridge: nVidia Corporation MCP51 LPC Bridge (rev a3)
00:0a.1 SMBus: nVidia Corporation MCP51 SMBus (rev a3)
00:0a.2 RAM memory: nVidia Corporation MCP51 Memory Controller 0 (rev a3)
00:0b.0 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3)
00:0b.1 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3)
00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:10.0 PCI bridge: nVidia Corporation MCP51 PCI Bridge (rev a2)
00:10.1 Audio device: nVidia Corporation MCP51 High Definition Audio (rev a2)
00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
02:00.0 VGA compatible controller: nVidia Corporation G80 [GeForce 8800 GTS] (rev a2)
03:05.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 70)
03:06.0 Multimedia controller: Philips Semiconductors SAA7131/SAA7133/SAA7135 Video Broadcast Decoder (rev d1)
03:07.0 Multimedia audio controller: Creative Labs SB X-Fi
03:09.0 Ethernet controller: Atheros Communications Inc. AR5413 802.11abg NIC (rev 01)

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

I had done some initial testing on my x86_64 box, of 2.6.17 vanilla (downloaded from kernel.org), and it seems to me that it has the problem too. I don't understand why my problem started with 2.6.18 if the vanilla 2.6.17 has the problem. Note that I tested the first 2.6.17, and the last version of 2.6.17. I'm thoroughly confused. I think I'll switch to 2.6.17, and run that for awhile to see if there's better performance overall. Perhaps loading it is not the best way to see if there's latency issues, as there will be some.

Then, if I do see some improvement, I'll increment to 2.6.18. Hopefully, slowly but surely, I can figure out exactly which kernel has the problem, and then a kernel dev can fix it. That's the plan anyhow. :P

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Hi, I tried the new 2.6.28.7 kernel, and things seem to have got worse... Even BitTorrent checking downloaded files is able to lock up the computer...

I will upload a new screenshot showing 91.4% of processor time spent waiting for the HD to read data... This is nonsense... I will try to do the same check for every new kernel that comes out, to look for improvements.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 20464
IWait problem 91,4% 2.6.28.7

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

I wanted to add even more testing results from my side. I tried the suggestions from this source: http://stackoverflow.com/questions/392198/how-to-make-linux-gui-usable-when-lots-of-disk-activity-is-happening
by changing some vm.dirty_ variables; no improvement could be seen. Changing to the deadline scheduler didn't improve the situation either. I also changed /sys/block/sda/queue/nr_requests to 64, with the same unresponsiveness.

I'm still on the same kernel (2.6.28-8), and my fstab mounts the partition with the relatime,noatime,nodiratime flags.
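(For reference, the vm.dirty_ variables mentioned here are the writeback thresholds; a sketch of lowering them, with values that are examples rather than recommendations:)

sysctl vm.dirty_ratio vm.dirty_background_ratio    # show current thresholds (percent of RAM)
sysctl -w vm.dirty_ratio=10                        # writers are throttled once 10% of RAM is dirty
sysctl -w vm.dirty_background_ratio=5              # background writeback (pdflush) starts at 5%
sysctl -w vm.dirty_expire_centisecs=1500           # write out dirty data older than 15 seconds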

Revision history for this message
In , michiel (michiel-linux-kernel-bugs) wrote :

I am currently installing 2.6.29-rc7, hoping that it will solve some of the issues in this bug.

Could changing the SLAB allocator be an option to test for the problem? We can choose between SLAB/SLUB/SLOB. Maybe that could be helpful.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

There are still some confusing comments on I/O wait in here, so let's clear that up at least. 91% iowait does not mean the system is using 91% of its CPU power for doing the I/O; it merely means that some process is BLOCKED waiting for I/O 91% of the time. It says nothing about CPU cycles consumed. The same goes for the observed load: having a load of 2.0 due to I/O wait does not mean that you have a doubly loaded system, it just means that, on average, two processes are blocked waiting for I/O. When you start a bittorrent client and it checks the file data, you would expect iowait to be nearly 100%. It does do some CPU processing, so that's why it's not completely at 100%.

So forget IO wait, it doesn't tell you ANYTHING about whether a system is supposed to be slow or not.
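(A simple way to watch what is being described here is vmstat, which reports blocked processes and CPU consumption side by side:)

vmstat 1
# the "b" column counts processes blocked in uninterruptible sleep (usually I/O),
# "wa" is the iowait percentage, and "us"/"sy" show the CPU time actually consumed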

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

And to make a more general comment... This bug is impossible to solve, since it (once again) has degraded into somewhere for everybody to tunnel everything that relates to a system feeling sluggish. There could be at least 10 separate issues described in here, or more. And while some of these are surely things we could do better, some are also certainly expected behaviour. We are at least touching several file systems, mm issues, and io scheduler issues. I'm quite sure that some of the mentioned behaviour is completely due to ext3 sucking at fsync.

I'd LOVE to be able to look into this, but honestly I have no idea where to start. What I would also love is for someone to post a test case that actually works. This includes observed behaviour and a description of what you would EXPECT to see happen. Then we/I should be able to at least judge whether there's something we can do about it. Expecting a fully fluid system while having 100 threads writing data to the device is not reasonable, for instance. But if it behaves significantly worse than previous kernels, then there's still something to look into.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

I totally agree with you, Jens. I have been having a hard time localizing the problem myself. I went back to the 2.6.17 kernel, and it seems to be worse than my 2.6.28 kernel. But keep in mind that I was running i686 when I originally discovered the problem, and now I'm running x86_64. I think the only way I will be able to localize the issue is to restore my system to i686 Gentoo and then try 2.6.28; then I may start getting somewhere.

I also agree that it is nearly impossible to solve this one without some more concrete data. I wish I had chosen a different time to upgrade to 64bit, because then I could be fiddling with this issue on my i686 still.

I'll post again if I find something more concrete.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

I will admit that many of my issues seem to be caused by fsync() (I'm on ext4). One of the largest issues I'm currently having is Liferea blocking in fsync() for several seconds every time a new item is selected. During this time kjournald2 is writing, although iotop only shows a total write rate of ~500kB/s. This seems extremely slow and far below the disk's (a 7200 RPM SATA drive) capacity. This low I/O rate is common for all sluggish I/O cases. Does this sound like expected behavior? Perhaps my problems have been caused by just generally slow I/O?
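(One crude way to put a number on fsync latency in isolation is to time a tiny synchronous write while the heavy I/O runs in the background; a sketch assuming GNU dd, with an arbitrary probe file name:)

time dd if=/dev/zero of=fsync-probe bs=4k count=1 conv=fsync   # conv=fsync forces an fsync before dd exits
rm -f fsync-probe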

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

The last kernel without the problem was 2.6.16 (confirmation of that: SLES 10 SP2 does not give high iowait on an ASUS P5K). So let's look at what super-mega-feature appeared in 2.6.17 that was absent in 2.6.16. This feature clearly cannot belong to any single file system (all file systems are affected by the error). I have not found any changes in the schedulers between 2.6.16 and 2.6.17. The introduction of libata is the one difference. Whether the high iowait comes from libata itself or from the infrastructure for embedding it in the kernel hardly matters; only one thing matters - the kernel is crippled. And that is the sad fact.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I do not mean the fsync problem, which is no longer a problem for me in the .29 kernel. I mean the sluggish behaviour of all GUI applications, especially while working with VMware Workstation: suspend and resume times rise from less than two minutes to up to ten minutes. It started for me when I upgraded from Feisty (2.6.20) to Gutsy (2.6.22) on a 32-bit Pentium-M.

It is hard to pin this problem down, as it does not appear all the time, and there are a lot of other problems and many solved problems, which makes a comparison very difficult. And my assumption is that it depends on the CPU, hard drive and user.

The best hint for me was the duration of the process test. I did not submit this test so that the kernel would be tuned to this special test case, as I have seen happen on LKML; it should help to localize the problem. The results of these tests seem to fit with the regression of the sluggish behaviour.

See
http://bugzilla.kernel.org/attachment.cgi?id=19797&action=view
CentOS 2.6.18-92.el5 - 29.995s - good
Feisty 2.6.20.21 - 25.304s - good
Gusty 2.6.22-16 - 40.405s - bad
Hardy 2.6.24-23 - 37.604s - bad
Intrepid 2.6.27-9 - 96.922s - unusable

I have seen with powertop that the number of interrupts for keyboard input doubled from 200 to 400 when heavy I/O was running in the background.

And I know there is nothing wrong with a high I/O wait time as such, but as soon as the I/O wait time reaches 100% the desktop becomes sluggish and unusable. You can try this on an installation on a slow disk with ext3, or even on a fully encrypted disk. The slow SSDs could be related to this bug, as many SSDs show really poor write performance under Linux. I have measured transfer rates down to 2MB/s with non-direct writes (4KB cache splitting), while direct writing reaches up to 90MB/s on my SSD. My system on my SSD is completely unusable.

I will run some tests in a virtual machine, as it seems to me that an application running in a virtual machine is more affected by this sluggish behaviour than an application executed on the host. I will run exactly the same VM and test on different host kernels. But I am not able to spend more time on this before April. Perhaps someone else can start earlier?

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Jens,

I'm trying to nail this down on my computer, so I'm creating a VM of my i686 Gentoo system to see if I get the same results as before.

I used the following command, inside the vm, to extract my system tarball backup of my previous system.

ssh root@192.168.8.4 'gunzip -c /media/backup/system.tar.gz' | tar -xv --exclude './usr/portage/packages/*' --exclude './userportage/distfiles/*' --exclude './var/log/apache2/*' --exclude ./Bonnie.10218 >extract-list.txt

Now, on the host system (192.168.8.4) I am seeing the following...
trenta@tdamac ~/Desktop $ uptime
 01:39:37 up 1:21, 6 users, load average: 20.49, 14.92, 9.35

Obviously I'm getting REALLY sick performance. Normally something linear like a tar extraction does not produce these kinds of performance issues. Granted, the disk may have to seek around a little, but is it that bad?

Is there something I can do to analyze why this is happening, e.g. something like strace? I ran strace -c on kwrite during heavy load like this, and it claims that it finished everything in a tenth of a second, even though it took more like 30 seconds.

So, is there a lower level mechanism I can use to get a fix on what is making processes wait? For example, something that will tell me "kernel function X" is blocking?

Thanks.
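(Not an answer from the thread, but one low-level option for this is the magic SysRq 'w' trigger, which dumps the kernel stacks of all tasks in uninterruptible sleep and so shows which kernel function they are blocked in. A sketch, assuming SysRq is available and the commands are run as root:)

echo 1 > /proc/sys/kernel/sysrq     # enable all SysRq functions
echo w > /proc/sysrq-trigger        # dump blocked (D state) tasks with stack traces
dmesg | tail -n 50                  # the traces appear in the kernel log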

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Hi Ben,

Thank you for the clarification. I think I was really lost on this. I expected the process doing I/O to wait, but the rest of the system is supposed to get the remaining processor power in the meantime; instead, the system seems to hang until the I/O stops.

So I think best way to proceed is to start to discard problems.

I propose to start with:

    I will run a CPU-intensive task with no I/O while another process writes a file with no CPU-intensive work, to check whether the first process takes the same time to execute under high I/O or not.
            Process 1: CPU / No IO
            Process 2: High IO / No CPU
    And measure times...

    Should this test trigger the problem? Since Process 1 does no I/O, it should finish in almost the same time as under no load at all. Right?

    Can we rule out an ext3-related problem? Test case: write files from one thread, over ext3, ext4, reiser, etc., and observe responsiveness.

    Can we track whether this is an fsync problem? How (commands, test case)?

    How can we test this without the filesystem taking part in the tests?

    Can we show differences between kernel 2.6.16 and >= 2.6.28? (I will do this today)

    How do we measure responsiveness? Can we put a numeric value on this? (See the sketch after this list for one possible probe.)

Thank you all.
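(One possible numeric proxy for responsiveness, a rough sketch rather than an agreed method: repeatedly time how long starting a trivial process takes while the I/O load runs; latency spikes correspond to the freezes people describe. Assumes GNU time at /usr/bin/time:)

# append one line per second with the wall-clock seconds needed to fork/exec a trivial command
while true; do
    /usr/bin/time -f "%e" sh -c true 2>> startup-latency.log
    sleep 1
done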

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Gonzalo, I think you're asking great questions that will help us establish the cause of the problem. Even though I can't answer most of them (I'm no guru), I think we should all agree on a unified way to test and measure responsiveness. Regarding filesystems, I tried ReiserFS, ext3 and ext4 with two terminals running dd if=/dev/zero of=/test1 bs=1M count=1400 and dd if=/dev/urandom of=/tst2 bs=1M count=700 as a test, and they all gave the same sluggish feeling to the system.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

I agree that those tests let us know that there's a problem, because we see the sluggish behaviour. However, if a kernel dev is not seeing the performance issues on their machines, it won't be very convincing for them. If however, we provide some concrete tests, showing which kernels didn't have the problem, which did, and the test results, then they may be able to get somewhere. That's why I'm hoping someone can chime in and tell us what sorts of tests would be useful, such as I suggested in comment #220.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Ok. Here are my firsts tests with 2.6.28.7:

I used a modified version of ThreadSchedulerTest.cpp that removes the initial timeout, plus a dd run to simulate high I/O load.

The first hypothesis seems to be broken: high I/O load does not seem to affect processing much.

------------------------------------------------------------------
./kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 3362
min:0.008ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:19.791s
Break!
We have Burning CPU with 4855
min:0.006ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:18.754s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 6211
We have Burning CPU with 6212
min:0.007ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:20.265s
DD Finished
 --- Finish ---
Kernel tested: 2.6.28.7-level2crm i686

-----------------------------------------------------------------------

The results say that it takes 2 seconds more to complete (is this relevant for a process that takes ~18-19s to complete?).

A curious thing is that I observed no I/O wait while the processing in test 2 was running - only system processor time.

This also seems strange, as it should be 100% USER time. System time (correct me if I'm wrong) means that the OS is spending a lot of time scheduling the threads...

Anyway, I will try to reproduce high iowait times before starting the CPU intensive program to see if we are right.

I will post the test suite in bash. Feel free to add more tests.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 20489
Initial effort to build an automatic test suite for this bug

Please feel free to add tests or correct what's wrong

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Hello Gonzalo, I just ran your test suite and here are the results:

---------------------------------
khaal@Xeraphim:~/Desktop/test-suite-bug-12309$ sh kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 17986
min:0.006ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:21.873s
We have Burning CPU with 19909
min:0.004ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:17.708s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 21084
We have Burning CPU with 21085
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 12.5488 s, 16.7 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 16.0014 s, 13.1 MB/s
DD Finished
Killing 21085 process
 --- Finish ---
Kernel tested: 2.6.28-8-generic x86_64
khaal@Xeraphim:~/Desktop/test-suite-bug-12309$ 200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 18.6493 s, 11.2 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 18.9091 s, 11.1 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 20.0353 s, 10.5 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 20.1651 s, 10.4 MB/s
-------------------------------------

I'm not really familiar with what it is saying, but it did affect desktop responsiveness. I made a Google spreadsheet that is open for anyone to access, to organise test results and spot common traits among our systems: http://spreadsheets.google.com/ccc?key=p3aerC-xkjEqvo7BvMHaxXg - one thing is still missing: a place to upload the output of these test runs. Does anyone know of a service like photobucket but for text/console output?

The document is open for everyone to edit. Please pick a specific color for yourself so we keep it readable :-)

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 20491
Initial effort to build an automatic test suite for this bug V2

This fixes the killing of the process (I hope)

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

I will try to explain:

TEST 1: The first test takes two measurements of a CPU-intensive program:
 We have Burning CPU with 17986
 min:0.006ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:21.873s
 We have Burning CPU with 19909
 min:0.004ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:17.708s

It takes between 17s - 22s to complete.

Lines like:
209715200 bytes (210 MB) copied, 18.6493 s, 11.2 MB/s

tell you the throughput of your HD. This throughput is shared between the 6 processes that are writing at the same time.

TEST 2: This then tries to do the same thing, but under high IO.

Unfortunately the program was killed before finishing, because the high IO finished before the CPU-intensive program did, so it seems the load is affecting you badly.

On my computer the CPU program finished earlier.

Can you run it with the new version, please?

NOTE: It writes several 200 MB files to your hard disk. Please remove them after the tests... they take up 200 MB x 6 = 1200 MB of your disk.
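
For reference, this is roughly what the two tests boil down to (a simplified sketch, not the actual kernel-test.sh; the real script also measures scheduling latency, and the ThreadSchedulerTest binary name is assumed here):

# Test 1: time the CPU-intensive task on an otherwise idle system
time ./ThreadSchedulerTest

# Test 2: repeat it while six dd writers generate heavy IO in parallel
for i in 1 2 3 4 5 6; do
    dd if=/dev/zero of=./io-test-$i bs=1M count=200 &
done
time ./ThreadSchedulerTest
wait
rm -f ./io-test-*   # the temp files add up to 6 x 200 MB = 1200 MB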

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

For me throughput is horrible:
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 14987
min:0.005ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:21.527s
We have Burning CPU with 16371
min:0.005ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:21.833s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 17768
We have Burning CPU with 17769
min:0.007ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:22.777s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 64,2187 s, 3,3 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 75,1226 s, 2,8 MB/s
DD Finished
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.28.7-level2crm i686
gad@ws-esp16:~$ 200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 76,8811 s, 2,7 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 79,4772 s, 2,6 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 82,0248 s, 2,6 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 82,9147 s, 2,5 MB/s
---------------------------

I forgot to say ext3 filesystem here...

I will try with different kernels from now on.

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

Results from my notebook:

[james@rhapsody tsb]$ ./kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 3772
min:0.009ms|avg:0.013-0.013ms|mid:0.000ms|max:0.000ms|duration:37.528s
We have Burning CPU with 6762
min:0.011ms|avg:0.013-0.013ms|mid:0.000ms|max:0.000ms|duration:37.351s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 9489
We have Burning CPU with 9490
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 21.1718 s, 9.9 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 38.183 s, 5.5 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 41.1141 s, 5.1 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 45.3742 s, 4.6 MB/s
min:0.007ms|avg:0.012-0.013ms|mid:0.000ms|max:0.000ms|duration:38.801s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 49.0724 s, 4.3 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 50.0517 s, 4.2 MB/s
DD Finished
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.29-0.54.rc7.git3.fc10.x86_64 x86_64

Revision history for this message
In , igor.lautar (igor.lautar-linux-kernel-bugs) wrote :
Download full text (7.1 KiB)

Output on kubuntu 8.10 running on EliteBook 8530w.

While running, it felt 'sluggish' but not by much. When copying/unzipping big files, I can get 10+ seconds of Firefox inactivity.

Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 24021
min:0.004ms|avg:0.018-0.022ms|mid:0.000ms|max:0.000ms|duration:15.861s
We have Burning CPU with 25229
min:0.004ms|avg:0.008-0.009ms|mid:0.000ms|max:0.000ms|duration:15.678s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 27067
We have Burning CPU with 27068
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 15.0066 s, 14.0 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 19.0474 s, 11.0 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 21.9454 s, 9.6 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 22.6718 s, 9.3 MB/s
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 22.9066 s, 9.2 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 23.667 s, 8.9 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Fi...

Read more...

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

gad@ws-esp16:~$ ./kernel-test.sh /mnt/data/gad/
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 8103
min:0.006ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:21.766s
We have Burning CPU with 10098
min:0.007ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:21.275s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 12105
We have Burning CPU with 12106
min:0.007ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:20.630s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 34,4896 s, 6,1 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 35,157 s, 6,0 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 37,4852 s, 5,6 MB/s
DD Finished
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.28-8-generic i686
gad@ws-esp16:~$ 200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 40,6583 s, 5,2 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 49,9392 s, 4,2 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 51,9306 s, 4,0 MB/s

-----

Filesystem ext4

Revision history for this message
In , igor.lautar (igor.lautar-linux-kernel-bugs) wrote :
Download full text (13.2 KiB)

Seems the last comment has a double copy/paste, making it hard to read. Here is another result (for some reason I get a bunch of "DD Finished" lines; I didn't want to cut them as I don't know if they're relevant to the test - probably not):

Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 7139
min:0.005ms|avg:0.015-0.031ms|mid:0.000ms|max:0.000ms|duration:22.600s
We have Burning CPU with 8947
min:0.004ms|avg:0.014-0.031ms|mid:0.000ms|max:0.000ms|duration:22.342s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 10772
We have Burning CPU with 10773
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 14.7651 s, 14.2 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 16.8547 s, 12.4 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 18.5809 s, 11.3 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 19.6679 s, 10.7 MB/s
DD Finished
DD Finished ...

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

What are you braking your disks with - a finger?

yura@suse:~/Desktop> sh kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 14170
min:0.003ms|avg:0.006-0.007ms|mid:0.000ms|max:0.000ms|duration:4.725s
We have Burning CPU with 14815
min:0.004ms|avg:0.006-0.007ms|mid:0.000ms|max:0.000ms|duration:4.752s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 15470
We have Burning CPU with 15471
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 2,45896 c, 85,3 MB/c
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 4,33352 c, 48,4 MB/c
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 4,51529 c, 46,4 MB/c
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 5,22602 c, 40,1 MB/c
DD Finished
DD Finished
DD Finished
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 5,97021 c, 35,1 MB/c
DD Finished
DD Finished
DD Finished
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 6,38097 c, 32,9 MB/c
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
min:0.003ms|avg:0.006-0.007ms|mid:0.000ms|max:0.000ms|duration:6.047s
DD Finished
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.28.5-default x86_64

Revision history for this message
In , mathias.buren (mathias.buren-linux-kernel-bugs) wrote :

$ ./kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 4215
min:0.005ms|avg:0.008-0.009ms|mid:0.000ms|max:0.000ms|duration:14.822s
We have Burning CPU with 5656
min:0.007ms|avg:0.008-0.009ms|mid:0.000ms|max:0.000ms|duration:15.624s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 7403
We have Burning CPU with 7404
200+0 poster in
200+0 poster ut
209715200 byte (210 MB) kopierade, 12,7466 s, 16,5 MB/s
200+0 poster in
200+0 poster ut
209715200 byte (210 MB) kopierade, 15,3423 s, 13,7 MB/s
200+0 poster in
200+0 poster ut
209715200 byte (210 MB) kopierade, 17,363 s, 12,1 MB/s
200+0 poster in
200+0 poster ut
209715200 byte (210 MB) kopierade, 18,3437 s, 11,4 MB/s
200+0 poster in
200+0 poster ut
209715200 byte (210 MB) kopierade, 18,9163 s, 11,1 MB/s
200+0 poster in
200+0 poster ut
209715200 byte (210 MB) kopierade, 19,3732 s, 10,8 MB/s

min:0.005ms|avg:0.008-0.009ms|mid:0.000ms|max:0.000ms|duration:18.564s

IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.29-rc7-zen2-ARCH-20090309 x86_64

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I have noticed that CPU clock scaling responds sluggishly during heavy IO. From time to time it stays at the lowest clock rate even though there is CPU-intensive, but discontinuous, work in other processes. I just had a 20-second freeze during such a state.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I could move the mouse, but the cursor did not change. All panels were working, but I could not move or switch windows.

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Gonzalo, is it possible to include the motherboard chipset in the test? It would be interesting to see whether everybody who is affected has the same or similar chipsets... Here's another test result, with 2.6.29-rc7. Still affected by the bug, on ext4.

khaal@Xeraphim:~/Desktop/test-suite-bug-12309-v2$ sh kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 9080
min:0.007ms|avg:0.008-0.009ms|mid:0.000ms|max:0.000ms|duration:23.801s
We have Burning CPU with 14728
min:0.007ms|avg:0.008-0.009ms|mid:0.000ms|max:0.000ms|duration:22.593s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 19811
We have Burning CPU with 19812
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 13.901 s, 15.1 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 15.2808 s, 13.7 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 15.4188 s, 13.6 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 16.1941 s, 13.0 MB/s
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 16.6363 s, 12.6 MB/s
DD Finished
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 17.1937 s, 12.2 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
min:0.004ms|avg:0.008-0.009ms|mid:0.000ms|max:0.000ms|duration:18.957s
DD Finished
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.29-020629rc7-generic x86_64

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 20503
Results in ODF for spreadsheet

This shows the information recovered by each of the tests performed.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 20504
Results in ODF for spreadsheet

This shows the information recovered by each of the tests performed.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

I uploaded a spreadsheet to show results...

To me it looks like high IO is affecting the scheduler or the processor. Not by much in these tests, but it may matter when long processing is involved.

It is quite significant that the increase is always around 2 seconds for everyone, even in Yuriy Lalym's tests, where the run normally takes only 4.7 s and the processing time goes up by 1.3 s. Why always around 2 seconds?

We can also see that ext4 does not really seem to be affected. Maybe because of throughput? It would be interesting to know which filesystem Khalid Rashid tested on, because his run takes less time to complete under high IO, as it does for me on ext4.

And the "DD Finished" lines mean that the last IO transfer finished before the CPU-intensive task did. Maybe this also affected the result.

Ok. I will fix the format of the test-suite output and include other tests. The temp files will also be deleted after the tests.

What other tests should be included?

I will try to search for the fsync problem to include it in the tests.

I will also try to report the motherboard chipset, as requested...

Any ideas on what to test?

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

I have one question for the kernel developers...

How much processor time is normal for a dd process using DMA?
 I have two hypotheses:
   1. The kernel is taking too much time switching the process in and out, even when it is blocked on IO.
   2. There is a lock somewhere that prevents the scheduler from running freely...

How can I track down the processor time of a program (say dd)?
   I want to see whether the times for each kind of process are normal. Current computers are fast, and sometimes we do not realise that a process is taking too much time to complete.

Any good ways to profile the kernel looking at only one PID?
   I want to profile specific parts of the kernel. Any good docs?

Thank you all!

I forgot to say: for now, don't use the test suite any more until the new tests are here.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

Server on Xeon based, internal HDD SATA2 (no RAID), SLES 10 SP2

Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 31607
min:0.004ms|avg:0.013-0.049ms|mid:0.000ms|max:0.000ms|duration:19.071s
We have Burning CPU with 7637
min:0.004ms|avg:0.015-0.057ms|mid:0.000ms|max:0.000ms|duration:21.218s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 15831
We have Burning CPU with 15832
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 1,0195 секунд, 206 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 1,04578 секунд, 201 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 1,26246 секунд, 166 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 1,90053 секунд, 110 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 2,19354 секунд, 95,6 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 2,22529 секунд, 94,2 MB/s
min:0.003ms|avg:0.014-0.060ms|mid:0.000ms|max:0.000ms|duration:20.705s
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.16.60-0.21-smp x86_64

Server on Xeon based, 3-Ware RAID-1 (2 pieces SAS), SLES 10 SP2

Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 22420
min:0.004ms|avg:0.015-0.071ms|mid:0.000ms|max:0.000ms|duration:25.210s
We have Burning CPU with 28763
min:0.004ms|avg:0.018-0.083ms|mid:0.000ms|max:0.000ms|duration:33.232s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 1628
We have Burning CPU with 1629
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 0,335776 секунд, 625 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 0,367063 секунд, 571 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 0,363934 секунд, 576 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 0,430686 секунд, 487 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 0,520617 секунд, 403 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 0,531063 секунд, 395 MB/s
min:0.004ms|avg:0.014-0.065ms|mid:0.000ms|max:0.000ms|duration:22.025s
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.16.60-0.21-smp x86_64

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

bpenglas@PC010233L ~/Desktop/bug $ ./kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 10638
min:0.004ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:14.790s
We have Burning CPU with 13523
min:0.004ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:13.953s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 14793
We have Burning CPU with 14794
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 14.7986 s, 14.2 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 17.6264 s, 11.9 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 19.4253 s, 10.8 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 19.9593 s, 10.5 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 21.898 s, 9.6 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 21.9509 s, 9.6 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
min:0.004ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:14.694s
DD Finished
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.29-rc3-zen1-1-07438-g2953ca1 x86_64

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Gonzalo, as I stated before, I am on ext4 mounted with the noatime and nodiratime flags. However, even though my throughput is fast according to the test, my performance still takes a big hit during the tests. I'm considering reformatting my partitions to ext3 so I can get an older kernel running and test how it fares. Also, it would be great to collect the results in one place; I've put a sheet up at http://tinyurl.com/au4fda - feel free to rearrange it to fit your needs.

Well done with the test suite, and good bug hunting everyone :-)

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

(In reply to comment #234)
(In reply to comment #243)

File system - xfs

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

(In reply to comment #244)

Forgot to mention: on this system all filesystems are ext3. That run was also without my VMs running, and it's my work machine. I'll try it with the VMs running, and also on my home box, tomorrow (3/13/09).

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

My Work machine:

bpenglas@PC010233L ~/kernel $ ./kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 16034
min:0.004ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:19.169s
We have Burning CPU with 18771
min:0.005ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:17.182s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 21066
We have Burning CPU with 21067
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 21.8451 s, 9.6 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 21.7598 s, 9.6 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 21.9914 s, 9.5 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 24.8323 s, 8.4 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 24.9565 s, 8.4 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 25.6149 s, 8.2 MB/s
DD Finished
DD Finished
DD Finished
min:0.004ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:15.944s
DD Finished
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.29-rc3-zen1-1-07438-g2953ca1 x86_64

This is while Firefox is open, Audacious is playing music, and two VMware Workstation VMs are running (Windows Vista and Windows XP).
All filesystems are ext3; the main system drive is an 80 GB WD at 10k RPM, the other drive is 250 GB at 7.2k RPM. All Intel chipset, with a Core 2 Duo E8200. It's a Dell GX755.

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

Simple test case:

dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
sleep 10
time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync

You'd expect the small file to be written fairly quickly - as in a couple seconds at most. But on every system with a recent kernel I've tried this on, it takes 6-45 seconds.

Why the huge range? I'm not sure, but available memory seems to have something to do with it: the more memory in the machine, the longer the small-file write seems to take.
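
If memory really is the variable, one way to watch it (just a suggestion, not part of the test case itself) is to print the Dirty and Writeback counters from /proc/meminfo once a second while the big write runs:

dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
while kill -0 $! 2>/dev/null; do
    grep -E '^(Dirty|Writeback):' /proc/meminfo   # dirty data waiting vs. being written back
    sleep 1
done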

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #249)
> dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
> sleep 10
> time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync

real 0m1.808s
user 0m0.001s
sys 0m0.001s

I don't think this gets to the issue.

Revision history for this message
In , igor.lautar (igor.lautar-linux-kernel-bugs) wrote :

Well, for me it does:

dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
sleep 10
time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync

1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 15.8284 s, 0.3 kB/s

real 0m16.024s
user 0m0.004s
sys 0m0.020s

Revision history for this message
In , andre (andre-linux-kernel-bugs) wrote :

(In reply to comment #249)
> dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
> sleep 10
> time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync

2.6.28-gentoo-r2 (/tmp on reiser3.6, rootfs-drive):
>4096 bytes (4,1 kB) copied, 10.618 s, 0.4 kB/s
>real 0m10.620s
>user 0m0.000s
>sys 0m0.077s

2.6.28-gentoo-r2 (/tmp on ext4, other drive):
>4096 bytes (4,1 kB) copied, 5,34679 s, 0,8 kB/s
>real 0m5.349s
>user 0m0.000s
>sys 0m0.003s

2.6.27.19-3.2-default (opensuse 11.1) (/tmp on ext3, rootfs):
>4096 bytes (4,1 kB) copied, 60.5764 s, 0.1 kB/s
>real 1m2.827s
>user 0m0.004s
>sys 0m0.036s

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #250)
> real 0m1.808s
> user 0m0.001s
> sys 0m0.001s

My 1.808s was on 2.6.27-gentoo-r8 with XFS on a 3ware 8-drive SATA RAID.

Revision history for this message
In , vaiski (vaiski-linux-kernel-bugs) wrote :

2.6.28.7 w/reiserFS

4096 bytes (4.1 kB) copied, 6.96955 s, 0.6 kB/s

real 0m6.972s
user 0m0.001s
sys 0m0.026s

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

André did you mean to take ownership of this bug away from Jens?

It looks like the test case I posted earlier is very effective at demonstrating at least one of the issues affecting people in this thread (namely those using ext3 or reiserfs).

It appears that xfs and ext4 are better at avoiding these huge latencies - I'm also assuming that the IO scheduler interacts with these filesystems differently.

Matt - I don't think this test case works for you as much because you have such a fast disk array. I imagine that you can write 10GB pretty quickly with an 8-drive array. Try increasing the 10GB to 100GB and increasing the sleep to 20-30 seconds so that you get more data waiting to be flushed to disk.

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

David, I want to stress that while my earlier test results looked good on my ext4 filesystem, I was still affected by the slow performance. I think we need a (different?) way to measure desktop responsiveness in order to get actual values from there too.

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

Where are your performance numbers from the test case, Khalid, and what is your hardware setup like? André posted numbers in comment #252 on ext4 which are better than his ext3/reiserfs numbers, but are still very poor, IMO.

It seems fairly likely that there are multiple bugs causing similar symptoms, and that they have all been jumbled into this bug report.

Jens has asked for a simple test case illustrating at least one issue discussed in this thread. I have presented one extremely simple test case which duplicates the problems I (and others) am seeing. Feel free to create another.

Revision history for this message
In , andre (andre-linux-kernel-bugs) wrote :

sorry, didn't mean to reassign the bug in the first place

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

I've been doing some testing - two tunables I've found (briefly mentioned earlier) that help immensely: setting /proc/sys/vm/dirty_background_ratio to 1 and /proc/sys/vm/dirty_ratio to 2.

On some of the systems I've run the test on it reduces latency to a fraction of a second; on others it reduces it from 20+ seconds to less than 10.
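
For anyone who wants to try the same thing, the exact commands are just the following (run as root; the values revert on reboot unless you also add them to /etc/sysctl.conf):

# start background writeback almost immediately, and block writers
# once dirty data exceeds 2% of RAM
echo 1 > /proc/sys/vm/dirty_background_ratio
echo 2 > /proc/sys/vm/dirty_ratio

# equivalent, using sysctl:
sysctl -w vm.dirty_background_ratio=1
sysctl -w vm.dirty_ratio=2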

Anyone else see similar behaviour with my simple test?

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #259)
> I've been doing some testing - two tunables I've found (briefly mentioned
> earlier) that helps immensely is setting /proc/sys/vm/dirty_background_ratio
> to
> 1 and /proc/sys/vm/dirty_ratio to 2.
>
> On some of my systems that I've run the test on it reduces latency down to a
> fraction of a second - on other systems it reduces it from 20+ seconds to
> less
> than 10.
>
> Anyone else see similar behaviour with my simple test?
>

This is right. Although it doesn't eliminate the stutter (mouse freezing for 1-2 seconds) during heavy IO, it does make the stutter tolerable. It basically turns your IO into nearly synchronous, inline writes instead of leaving the work for pdflush to pick up later and choke the IO subsystem. I have no idea why, on larger memory configurations, those defaults are set as high as 40 and 20 (IIRC). On a 4 GB RAM system we may not see any IO land until the expiry timers fire in pdflush or until 40% of 4 GB = 1.6 GB is waiting to be written.

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

PC010233L vmware # dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
[1] 10528
PC010233L vmware # sleep 10
PC010233L vmware # time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00333981 s, 1.2 MB/s

real 0m0.054s
user 0m0.000s
sys 0m0.000s
PC010233L vmware #
PC010233L vmware # time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.604249 s, 6.8 kB/s

real 0m3.219s
user 0m0.000s
sys 0m0.000s

I ran the second small dd about 2 minutes later. My / (or /tmp) is located on a WD 10k RPM SATA II drive.

And after fixing the dirty ratios....

PC010233L vmware # dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
[1] 10548
PC010233L vmware # sleep 10
PC010233L vmware # time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 1.41179 s, 2.9 kB/s

real 0m2.044s
user 0m0.000s
sys 0m0.002s
PC010233L vmware # time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000649804 s, 6.3 MB/s

real 0m6.366s
user 0m0.000s
sys 0m0.002s
PC010233L vmware #

Again, second one was about 2 minutes afterwards.

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

(In reply to comment #261)
Brandon, this test case doesn't seem to reproduce any significant latency issues for you. I suspect that 10k RPM disk is able to write fast enough to keep a significant amount of data from being buffered in memory. 1.5 seconds isn't great, but all my systems are at least 5 times worse than that and often 10-40 times worse.

Do you notice a large latency hit on the system when the large write is running?

Why are you running that second small write afterwards? Was the big write done at that point or not? The latency of your small writes does seem to vary by quite a bit.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #255)
> Matt - I don't think this test case works for you as much because you have
> such
> a fast disk array. I imagine that you can write 10GB pretty quickly with an
> 8-drive array. Try increasing the 10GB to 100GB and increasing the sleep to
> 20-30 seconds so that you get more data waiting to be flushed to disk.
>

Setting dirty_background_ratio=1 and dirty_ratio=2 had a HUGE effect on my system.

$ dd if=/dev/zero of=/var/tmp/bigfile bs=1M count=100000 conv=fdatasync & sleep 30 ; time dd if=/dev/zero of=/var/tmp/smallfile bs=4k count=1 conv=fdatasync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 6.96642 s, 0.6 kB/s

real 0m8.590s
user 0m0.000s
sys 0m0.004s

100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 1354.9 s, 77.4 MB/s

# echo 1 > dirty_background_ratio ; echo 2 > dirty_ratio

$ dd if=/dev/zero of=/var/tmp/bigfile bs=1M count=100000 conv=fdatasync & sleep 30 ; time dd if=/dev/zero of=/var/tmp/smallfile bs=4k count=1 conv=fdatasync
[1] 22718
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.72366 s, 5.7 kB/s

real 0m0.725s
user 0m0.000s
sys 0m0.001s

100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 359.02 s, 292 MB/s

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

(In reply to comment #262)
> (In reply to comment #261)
> Brandon, this test case doesn't seem to reproduce any significant latency
> issues for you. I suspect that 10k RPM disk is able to write fast enough to
> keep a significant amount of data from being buffered in memory. 1.5 seconds
> isn't great, but all my systems are at least 5 times worse than that and
> often
> 10-40 times worse.
>
> Do you notice a large latency hit on the system when the large write is
> running?
>
> Why are you running that second small write afterwards? Was the big write
> done
> at that point or not? The latency of your small writes does seem to vary by
> quite a bit.
>

The large write took a while (about 10 minutes, and it had only got to 5.3 GB before I killed it), and yes, VERY degraded performance... it took me a while to ssh in and kill it, as the local session was almost unusable.

The first small write wasn't done when the system started lagging out on me; it was done while RAM usage and CPU usage were still climbing, so I decided to run it again later just to see.

I can try doing the writes on my 7.2k RPM disk tomorrow when I'm back at work; I just need to point the output to a different partition.

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

David Rees, all my test results are presented here: https://spreadsheets.google.com/ccc?key=p3aerC-xkjEqvo7BvMHaxXg&hl=en and my computer components can be seen here: http://h10025.www1.hp.com/ewfrf/wc/prodinfoCategory?lc=en&cc=se&dlc=sv&product=3387690&lang=sv&

I also tried this on a WD Raptor drive, just to make sure faulty hard drives weren't the cause, and the symptoms were still present.

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

PC010233L ~ # dd if=/dev/zero of=/home/bigfile bs=1M count=10000 conv=fdatasync &
[1] 22333
PC010233L ~ # sleep 10
PC010233L ~ # time dd if=/dev/zero of=/home/smallfile bs=4k count=1 conv=fdatasync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 6.27386 s, 0.7 kB/s

real 0m6.275s
user 0m0.000s
sys 0m0.000s
PC010233L ~ # time dd if=/dev/zero of=/home/smallfile bs=4k count=1 conv=fdatasync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 2.4702 s, 1.7 kB/s

real 0m2.482s
user 0m0.000s
sys 0m0.000s

This was going to /home, which is on a 250 GB 7200 RPM SATA II drive. Also, even though the second one (run about a minute or two later) completed quickly, it was about another 10 seconds before I got the prompt back.

Revision history for this message
In , vaiski (vaiski-linux-kernel-bugs) wrote :

(In reply to comment #249)

the same system as in #254, but I changed the kernel to the latest rc of .29

2.6.29-rc8 w/reiserFS

4096 bytes (4.1 kB) copied, 1.2374 s, 3.3 kB/s

real 0m2.843s
user 0m0.001s
sys 0m0.003s

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

Just a quick note: I've been having considerable trouble with kernels since 2.6.17 as well, but recently ran across this article http://kerneltrap.org/node/3000, citing: "Kernel maintainer Andrew Morton has said that he runs his desktop machines with a swappiness of 100"... which made me wonder whether my swappiness of 1 might not be such a good idea. An example of the misbehaviour I had been crediting to this bug can be seen here:
http://hfopi.org/files/temp/time-trouble.jpg (look at the three different clock times). That problem was the result of physical memory running full, which was happening a lot (VLC memory leak...), stalling the system sometimes for hours.
Setting swappiness... ah, let me quote Andrew: "I'm gonna stick my fingers in my ears and sing 'la la la' until people tell me 'I set swappiness to zero and it didn't do what I wanted it to do'." Well, here I am. To all of you: setting swappiness to extremely low values is a bad idea and won't achieve what you expect. So that might actually be your problem if you have done so; echo 100 > /proc/sys/vm/swappiness and run the test.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #268)

I see the problem, and I've never touched 'swappiness'.

$ cat /proc/sys/vm/swappiness
60

Actually, I have no swap at all.

# swapon -s
swapon: /proc/swaps: No such file or directory

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Well, Mathieu Desnoyers did a fix for the write-cache accounting which stops the kernel write cache from eating up all available memory plus swap. Without the fix, the slowness can be worked around by setting swappiness to 0 or disabling swap. The fix is, AFAIK, not in 2.6.29.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

So, we have one guy (#268) saying high swappiness will solve the problem and the other guy (#270) saying setting swappiness to 0 will solve the problem. I have a feeling neither is going to work, because I have run my system with both and this bug appears under high IO load in both cases. But I would like to see what others find.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

The swappiness setting is irrelevant to this bug, as it is a disk io problem no matter which way you look at it. Yes, if you are swapping, this bug will cause the system to be even slower.

p.s.
I'm convinced that swap is evil. I just disable my swap, and my system works much better, especially when I get a run-away memory hog process.

Revision history for this message
In , awebers (awebers-linux-kernel-bugs) wrote :

Hi:

A) If you have multiple hard drives:
- they are not equally affected
- if you copy a file (e.g. 7 GB) from drive A to drive B, a job running on drive C does not slow down, except, perhaps, if a swap file is used.

A job, in my case, is a VMware virtual machine.
I was spreading machines over different hard drives to reduce the trouble.

B) Isn't this slowdown a planned action of the system?

About /proc/sys/vm/dirty_ratio
> Note that all processes are blocked for writes when this happens
(see below, original text)
This is what slows everything down.

IMHO, it should be:
If "dirty_ratio" is reached, slow down the job that is creating
so much "dirt" and leave the other ones alone.

cut out from http://www.westnet.com/~gsmith/content/linux-pdflush.htm

8< -------------------

Process page writes
There is another parameter involved though that can spill over into management of user processes:

/proc/sys/vm/dirty_ratio (default 40): Maximum percentage of total memory that can be filled with dirty pages before processes are forced to write dirty buffers themselves during their time slice instead of being allowed to do more writes.

Note that all processes are blocked for writes when this happens, not just the one that filled the write buffers. This can cause what is perceived as an unfair behavior where one "write-hog" process can block all I/O on the system. The classic way to trigger this behavior is to execute a script that does "dd if=/dev/zero of=hog" and watch what happens. See Kernel Korner: I/O Schedulers for examples showing this behavior.

8< -------------------

Reference:
http://www.westnet.com/~gsmith/content/linux-pdflush.htm

Does someone have an idea how to slow down the IO-heavy job (automatically)?
If the throughput of dd, rsync or whatever were reduced the moment a trigger value is reached, the problem would affect only dd, rsync, ... and not the rest of the system.
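
One thing that might be worth trying (a suggestion only; I have not verified that it helps with this particular problem) is to put the heavy writer into the idle IO class with ionice, so the IO scheduler only services it when the disk is otherwise unused:

# start the hog with idle IO priority (only honoured by the CFQ IO scheduler)
ionice -c3 dd if=/dev/zero of=hog bs=1M count=10000

# or demote a copy that is already running (the PID is just an example)
ionice -c2 -n7 -p 12345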

Revision history for this message
In , awebers (awebers-linux-kernel-bugs) wrote :

Hi again:

My test is to throttle the bandwidth using "rsync --bwlimit=<throughput>".

I am testing using VMware on /images3.
VMware runs fluently until I copy a lot (a 7 GB vmdk file) to /images3, which is a separate hard drive holding the .vmdk files of 5 VMware systems.
Copying this 7 GB file freezes the VMware systems for > 30 seconds.

And now with limited bandwith ...

all jobs run fine, no hanging or anything:
rsync --bwlimit=10000 /images5/vmware/vlab03/STD_XP_Prof.vmdk /images3/test
rsync --bwlimit=20000 /images5/vmware/vlab03/STD_XP_Prof.vmdk /images3/test

some jobs start to become slow and hang:
rsync --bwlimit=30000 /images5/vmware/vlab03/STD_XP_Prof.vmdk /images3/test

a lot of jobs hang and are very slow, some freeze:
rsync --bwlimit=40000 /images5/vmware/vlab03/STD_XP_Prof.vmdk /images3/test

This is my estimate:
rsync is creating more dirty data than the kernel can get rid of, and the system is put into the "processes are blocked for writes" mode (see my previous posting).

I hope that my input can help.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Created attachment 20656
vmstat with high # of uninterruptible processes

I just had a hang for about 10-15 minutes. My system started to freeze, so I immediately switched to a console, and ran "vmstat 1" (see attachment).

I sat there and watched it, as I wanted to catch it immediately after it became usable again, so that I could check the load average.

uptime
 23:38:18 up 6 days, 4:49, 8 users, load average: 23.30, 26.12, 16.21

A 1-minute load average of 23, with a 5-minute load average of 26. Ouch.

I have no swap, and I think the problem happened when one of my processes did something to lock up the machine. But, take note how many processes are blocked in UNINTERRUPTIBLE sleep at various times...

I think I also realized something very interesting about this bug. It does not occur as readily when you have a fast disk. As I had mentioned in previous comments, my macbook and my D820 have the same hardware. Well I'm rarely experiencing this on my D820 now. The only difference I can see, related to IO, is that the D820 just had a 320G 80M/s drive put into it. My Macbook runs at approximately 20-25M/s.

Also, given that I am pretty sure one of my processes hung the machine, it seems (though I am not a kernel hacker) like this bug may be related to a wait on a mutex or semaphore somewhere it should not be, hence the high number of uninterruptible processes? Could that be?

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

There has been more discussion on LKML related to this issue attached to the 2.6.29 kernel release thread. I'll direct interested parties to this post from Ted Tso:

http://lkml.org/lkml/2009/3/24/227

Attached to that post is Ted's fsync latency measuring tool. If people have a workload which generates high latency, this tool may be useful for measuring it and then posting that workload to Ted/LKML.

His testing tool doesn't do anything much different than my earlier dd test, except that he writes 1MB of data which may show higher latencies.

For those interested, I picked up a couple other workarounds for people this is affecting:

1. Mount ext3 in writeback mode instead of ordered. This has the drawback of leaving your data a bit more vulnerable than the default, but data writes will no longer be forced to complete in order with metadata (see the example mount line after workaround 2).

2. Increase IO priority of kjournald:
for i in `pidof kjournald` ; do ionice -c1 -p $i ; done
One theory is that by default kjournald is fighting for IO priority with normal processes. By making the IO priority of kjournald higher, the "important" data (i.e., data that is being synced to disk) should get written out faster, reducing user-visible latency. See this post/thread for more detail: http://lkml.org/lkml/2008/10/2/205
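
For workaround 1, the mount option is data=writeback; a sketch for a separate ext3 data partition (the device and mount point are just examples - ext3 normally refuses to change the data mode on a live remount, so set it at mount time or in fstab):

# fresh mount of an ext3 partition with writeback journaling
mount -o data=writeback /dev/sdb1 /data

# or permanently, via the options column in /etc/fstab:
# /dev/sdb1  /data  ext3  defaults,data=writeback  0  2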

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

I've tested the second workaround posted by David above (high IO priority for kjournald), and it definitely improves things in my case. My test is very simple: doing normal upgrades under Ubuntu (esp. kernel packages) always makes Firefox, and even Evolution or the whole desktop, freeze for several seconds, up to about 20 sec in some cases. With that workaround, the freezes don't last more than ~1 sec; the desktop experience is not really smooth, but I can work during upgrades.

So I guess we can track down at least one specific issue here, which may be the major one affecting desktop boxes, and which seems to have appeared (maybe in different ways) between 2.6.17 and 2.6.28. I'm using a fairly basic Toshiba Satellite laptop with 512 MB of RAM and a 4200 rpm HD.

Can anybody confirm that too?

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Ok. I'm also testing the kjournald option to see if it improves. I will post after some testing...

I want to include the fsync tests you pointed out. I tested it and it gave me:
fsync time: 0.0145
fsync time: 0.0205
fsync time: 0.0221
fsync time: 0.0195
fsync time: 0.0177
fsync time: 0.0702
fsync time: 0.0456
What's the correct way to do reliable tests? I will include it in the test suite.

Revision history for this message
In , jonathan.bower (jonathan.bower-linux-kernel-bugs) wrote :

The kjournald option makes my system much more responsive.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Hi Guys,

After reading those LKML messages from Theodore regarding his sync patches, I got an idea: why not just mount my filesystem with the "sync" mount option?

I run the following command on one console...
dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000

And Theodore's fsync-test on another. On the standard test, WITHOUT mounting with sync, I get these results out of Theodore's test...

fsync time: 1.5693
fsync time: 18.8047
fsync time: 21.2672
fsync time: 18.6747
fsync time: 2.3821
fsync time: 2.0494
fsync time: 2.8781
fsync time: 21.6300

Here's a "vmstat 1" snipette. All the lines while the dd is running are roughly the same.
 2 9 380388 16716 33412 1409988 0 0 0 15340 806 1188 3 4 0 93
 0 8 380388 15748 33428 1411080 0 0 0 16284 1165 2350 7 8 0 85
 0 9 380388 16620 33432 1409752 0 0 0 18240 878 1108 5 3 0 92
 1 8 380388 16776 33452 1410108 0 0 0 11888 1046 1140 10 8 0 82

When I do the following...
mount -o remount,rw,sync /dev/s/sys /

I get the following benches while running the same dd command...
fsync time: 0.0067
fsync time: 0.0369
fsync time: 0.0208
fsync time: 0.0099
fsync time: 0.1175
fsync time: 0.0337
fsync time: 0.0003
fsync time: 0.0219
fsync time: 0.0110
fsync time: 0.0142
fsync time: 0.0076
fsync time: 0.0146
fsync time: 0.0153
fsync time: 0.1104
fsync time: 0.0061
fsync time: 0.0003

With "vmstat 1" snippet of ...
 1 0 380624 1112236 93104 297252 0 0 0 13056 920 1167 5 3 49 43
 0 1 380624 1098212 93252 311044 0 0 0 15876 925 1165 5 4 52 38
 1 2 380624 1085796 93408 323296 0 0 0 13800 996 1239 10 4 47 38

Did something in the kernel change a couple years ago, in regard to syncing?

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Just an FYI, there were some mm/msync.c fsync-related changes between 2.6.16.62 and 2.6.17 vanilla. I didn't see the problem until after 2.6.17, but perhaps Gentoo had patched the kernel heavily, I don't know. I'll try to do some more diffs between the kernel versions around the time I started having the problem, in case it helps you guys figure it out.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

From the first 2.6.17 release to the first 2.6.18 release (I haven't narrowed it down to exact versions), 3 PF_SYNCWRITE-related lines were removed from mm/msync.c.

And some PF_SYNCWRITE related stuff in block/cfq-iosched.c was added in 2.6.17 (diff between 2.6.16.62 and 2.6.17), and then removed in 2.6.18.

There are also fs/ sync-related changes between 2.6.16.62 and 2.6.17.

I hope I'm not spamming. :P

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #280)
> Hi Guys,
>
> After reading those LKML messages from Theoodre, regarding his sync patches,
> it
> gave me an idea. Why not just mount my filesystem with "sync" mount option.

What are the disadvantages of the sync mount option? Reduced bandwidth? Higher latency? The data you posted doesn't show any disadvantages - or maybe I just don't know what to conclude from that data?

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

(In reply to comment #284)
> (In reply to comment #280)
> > Hi Guys,
> >
> > After reading those LKML messages from Theoodre, regarding his sync
> patches, it
> > gave me an idea. Why not just mount my filesystem with "sync" mount
> option.
>
> what are the disadvantages of sync mount option? reduced b/w? higher latency?
> data you posted doesn't show any disadvantages or may be I don't know what to
> conclude from that data?

It appears that the overall transfer rate has decreased a tiny bit. But, the
big advantage of not doing "sync" on mount, is that the system can queue the
writes. So, for anything that fits into kernel queues, the writes appear way
faster to the user. That's my understanding of the difference between sync and
not using sync.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Oh, I should have given an example. Normally, when doing a dd of say 10M, your write would be several hundred MEGABYTES per second, because it's writing to memory, not disk. In my case, I only get disk speeds, even with 10M. So yeah, the memory queueing is WAAAAY faster until you reach the limit.

One last thing, for the kernel devs, as this may be important...
The comment in 2.6.28's version of msync.c is as follows...

/*
 * MS_SYNC syncs the entire file - including mappings.
 *
 * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
 * Nor does it marks the relevant pages dirty (it used to up to 2.6.17).
 * Now it doesn't do anything, since dirty pages are properly tracked.
 *
 * The application may now run fsync() to
 * write out the dirty pages and wait on the writeout and check the result.
 * Or the application may run fadvise(FADV_DONTNEED) against the fd to start
 * async writeout immediately.
 * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to
 * applications.
 */

This is an interesting comment. Mainly because there was some logic based on MS_SYNC that was removed from msync.c in 2.6.18 (as I mentioned at the TOP of comment #282). That code would set the PF_SYNCWRITE flag. The code exists in 2.6.17 but not 2.6.18. I haven't checked whether it was the 2.6.18 change that did it, or a previous 2.6.17.x change.

Is this a problem kernel devs???????

Revision history for this message
In , amaury.deganseman (amaury.deganseman-linux-kernel-bugs) wrote :

I have the same result here when mounting with the "sync" option.

I also tried async and ionice -c1 'pidof kjournald', and it doesn't seem to improve the latency measured by fsync-tester.

(In reply to comment #280)

Revision history for this message
In , Adriaan.van.Kessel (adriaan.van.kessel-linux-kernel-bugs) wrote :

@ #286: no, the msync.c MS_[A]SYNC behaviour is IMHO _not_ related.
With the introduction of the Unified (disk) Buffer Cache, msync(MS_ASYNC) became basically a no-op. Every process will see the same contents for a block, whether it uses read() or mmap() to access it. Other unices (without UBC) may behave differently. For MS_SYNC, the situation is more complicated. (IIUC: it is hard to wait for all pages to have been written if other processes may re-dirty them simultaneously)

This bug / issue is not about throughput, it is about latency and (lack of) responsiveness (of other, unrelated processes).

BTW, to me it seems there are actually two symptoms:
1) initially, the mouse cursor is stuck ("stuck/jerky mouse syndrome")
2) later on, the cursor gets quicker, but the actions (pop-ups, window focus, ...)
   are still slow.

(1) can be associated with CPU scheduling, unix-domain socket-I/O, maybe even pagefaulting of X's code segments.
(2) can be associated with CPU scheduling, pagefaulting of code, or memory shortage ( -->> pagefaulting + induced writing of dirty pages)

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

File system: xfs (mounted with default options)
dd if=/dev/zero of=test.img bs=572041216 count=1

Kernel 2.6.28.8

# time (cp test.img test1.img && sync)
real 0m7.372s
user 0m0.021s
sys 0m1.152s

Kernel 2.6.29

# time (cp test.img test2.img && sync)
real 0m13.704s
user 0m0.016s
sys 0m1.060s

Revision history for this message
In , valentyn+_= (valentyn+-linux-kernel-bugs) wrote :

This bug has been present since at least 2.6.15, so it's older than the 2.6.18 (with question mark) reported in this bug.

@breezer:~$ { sleep 5; dd if=/dev/zero of=/tmp/bigfile bs=1M count=5000 conv=fdatasync ; } & /tmp/fsync-tester
[1] 4946
fsync time: 0.0188
fsync time: 0.0142
fsync time: 0.0142
fsync time: 0.0142
fsync time: 0.0143
fsync time: 9.2283
fsync time: 12.0892
fsync time: 11.9867
fsync time: 17.6123
fsync time: 13.5469

I've seen sync times up to 20 seconds.

This is Ubuntu 6.06LTS, 2.6.15-53-686 kernel. I am seeing this behaviour on various machines with different hardware. It is a real problem for NFS servers in combination with clients that run Firefox 3.

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

Anyone willing to do some before and after tests? It looks like the huge filesystem thread has produced some results, and latency during large writes should be much better now with 2.6.30-rc1 + Theodore Ts'o's ext3 latency fixes.

http://lkml.org/lkml/2009/4/8/760

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

Look at the difference in disk throughput when running with dirty_background_ratio=0 and dirty_ratio=0:
http://img9.imageshack.us/img9/811/fsyncgraph00.png

versus with dirty_background_ratio=40 and dirty_ratio=80:
http://img154.imageshack.us/img154/9427/fsyncgraph4080.png

Both images are graphs of vmstat output during this command:
dd if=/dev/zero of=bigfile bs=1M count=20k conv=fdatasync

I collected this data in single-user mode so no other processes were touching the disk.

Do note the fairly steady throughput in the first case, in stark contrast with the huge bursts at the beginning and end of the second case, and the slowness throughout it.

In case anyone missed the point, it took 55 seconds to write 20 GB with dbr=0,dr=0 and 593 seconds to write 20 GB with dbr=40,dr=80. For some reason, the page cache appears to be really gumming up the works.
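
For reference, a minimal sketch of how the two configurations above can be set for a test run (the values are the ones compared here; the settings take effect immediately and revert on reboot):

# sysctl -w vm.dirty_background_ratio=0
# sysctl -w vm.dirty_ratio=0
# dd if=/dev/zero of=bigfile bs=1M count=20k conv=fdatasync

and then the same dd again after:

# sysctl -w vm.dirty_background_ratio=40
# sysctl -w vm.dirty_ratio=80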

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Hi David,

I would be willing to do some before/after testing. But it may be a couple of days, at least, before I can. When is 2.6.30 going to be released?

Also, I have a local Linus git tree. How do I update it with the latest git, or do I have to re-clone the entire thing again?
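
(For what it's worth, an existing clone does not need to be re-cloned; assuming the remote is named origin, something like

$ cd linux && git pull origin master

will bring it up to date with Linus's latest tree.)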

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Before you run the comparison tests, you should ensure that you use the same journal mode with ext3. The default ext3 journal mode was changed to writeback.
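
A quick way to confirm the mode before and after a kernel switch (the mount point is just an example) is:

$ grep ' / ' /proc/mounts

which lists data=ordered or data=writeback among the ext3 mount options.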

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Is there a reliable testcase for the latency issue?

Revision history for this message
In , dixlor (dixlor-linux-kernel-bugs) wrote :

How about XFS, JFS, ReiserFS?

Revision history for this message
In , jgardiazabal (jgardiazabal-linux-kernel-bugs) wrote :

I'm using XFS, and I have the same latency problems.
I've been checking this thread, and testing the proposed ideas, without success.
If you want me to test something, I'll happily do it.

Cheers,

Jose

(In reply to comment #296)
> How about XFS, JFS, Reiserfs ???

Revision history for this message
In , bart (bart-linux-kernel-bugs) wrote :

Same here using XFS on a multi disk (8) volume and seeing high IO waits.

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

For anyone who wants to test, here's what to do:

1. Document latencies with current setup which is performing poorly.
2. Document latencies with 2.6.30-rc1 (which should be much better for most people - if you are using ext3, make sure you mount your filesystem with the same journalling mode, as the default has changed)

To document latencies, start a large streaming write:

# dd if=/dev/zero of=/tmp/bigfile bs=1M count=5000

And run Ted Ts'o's latency testing tool in parallel (grab/compile it from here: http://lkml.org/lkml/2009/3/24/227)

If you still have questions, read the last 50 or so comments to this bug for more information.
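
One way to run the two together from a single shell (the same form used in the follow-up comments below) is:

# dd if=/dev/zero of=/tmp/bigfile bs=1M count=5000 & ./fsync-tester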

Revision history for this message
In , dixlor (dixlor-linux-kernel-bugs) wrote :

(In reply to comment #299)
> For anyone who wants to test, here's what to do:

# uname -a
Linux amd64 2.6.29.1 #4 SMP PREEMPT Fri Apr 3 07:27:52 MSD 2009 x86_64 x86_64 x86_64 GNU/Linux

# cat /proc/meminfo | grep MemTotal
MemTotal: 4127376 kB

#cat /proc/cpuinfo | grep -i "Model name" | uniq

model name : Dual Core AMD Opteron(tm) Processor 265

# cat /proc/mounts | grep ' / '

/dev/sda2 / xfs rw,noatime,nodiratime,relatime,noquota 0 0

# hdparm -i /dev/sda | grep Model

Model=WDC WD1500AHFD-00RAR5, FwRev=21.07QR5, SerialNo=WD-WMAP43732535

/* Western Digital Raptor */

# dd if=/dev/zero of=./bigfile bs=1M count=5000 && ./fsync-tester
5000+0 records in
5000+0 records out
 5242880000 bytes (5,2 GB) copied, 69,7789 s, 75,1 MB/s

fsync time: 0.0076
fsync time: 0.0091
fsync time: 0.0436
fsync time: 0.0359
fsync time: 0.0359
fsync time: 0.0359
fsync time: 0.0358
fsync time: 0.0359
fsync time: 0.0359
fsync time: 0.0359
fsync time: 0.0359
fsync time: 0.0358
fsync time: 0.0359
fsync time: 0.0358
fsync time: 0.0359
fsync time: 0.0359
fsync time: 0.0359
fsync time: 0.0359

^C

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #300)
> # dd if=/dev/zero of=./bigfile bs=1M count=5000 && ./fsync-tester

That's supposed to be a single ampersand, which causes the dd process to start in the background so the fsync-tester process can run simultaneously with it.

Revision history for this message
In , dixlor (dixlor-linux-kernel-bugs) wrote :

(In reply to comment #301)
> ...to start in the background ...

dd if=/dev/zero of=./bigfile bs=1M count=5000 & ./fsync-tester;
[1] 5298
fsync time: 0.0266
fsync time: 0.7677
fsync time: 0.6938
fsync time: 0.5879
fsync time: 1.1956
fsync time: 0.9582
fsync time: 0.9866
fsync time: 1.1833
fsync time: 0.6964
fsync time: 0.9986
fsync time: 0.9624
fsync time: 0.9093
fsync time: 0.9999
fsync time: 0.4423
fsync time: 0.8406
fsync time: 1.0880
fsync time: 0.1754
fsync time: 0.9039
fsync time: 0.8727
fsync time: 0.1261
fsync time: 0.2749
fsync time: 0.8547
fsync time: 0.5241
fsync time: 0.8164
fsync time: 0.4006
fsync time: 0.6532
fsync time: 0.8521
fsync time: 0.4151
fsync time: 0.3384
fsync time: 0.3326
fsync time: 0.4330
fsync time: 0.5800
fsync time: 0.8854
fsync time: 0.5953
fsync time: 0.3899
fsync time: 0.6722
fsync time: 0.1056
fsync time: 0.5554
^C

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 20972
fsync tester kernel 17 - 30

I have tested the kernels 17, 18, 20, 28, 29, 29 (patched with http://bugzilla.kernel.org/attachment.cgi?id=20172) and 30 (f4efdd65b754ebbf41484d3a2255c59282720650), which should include the patches.

I got great results with the patched 29 kernel at the beginning, but bad results when executing the test again. Either this test case is not reliable, or my installation is changing parameters while switching kernels.

I have executed the two commands concurrently (comment #299).
dd if=/dev/zero of=./bigfile bs=1M count=5000 & ./fsync-tester

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

ASUS P5K
linux suse 2.6.29-53-default x86_64

# cat /proc/meminfo | grep MemTotal
MemTotal: 8196428 kB

# cat /proc/cpuinfo | grep -i "Model name" | uniq
model name : Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz

# cat /proc/mounts | grep ' /home '
/dev/sda3 /home xfs rw,attr2,noquota 0 0

# hdparm -i /dev/sda
Model=ST31000340AS /* Seagate SATA2 */
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6

~> dd if=/dev/zero of=./bigfile bs=1M count=5000 & ./fsync-tester
[1] 5346
setting up random write file
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 90.9677 s, 57.6 MB/s
done setting up random write file
starting fsync run
starting random io!
fsync time: 1.0965s
fsync time: 0.4574s
fsync time: 0.7729s
fsync time: 0.3746s
fsync time: 0.5232s
fsync time: 0.1928s
fsync time: 0.9374s
fsync time: 0.6353s
fsync time: 0.3625s
fsync time: 0.4970s
fsync time: 0.3150s
run done 11 fsyncs total, killing random writer
[1]+ Done dd if=/dev/zero of=./bigfile bs=1M count=5000

~> vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 1 13 0 38868 164 7778824 0 0 12 23407 959 1940 1 3 0 95
 1 13 0 47144 164 7770084 0 0 0 26260 1435 2732 2 3 0 95
 0 13 0 39740 164 7774280 0 0 60 30724 1534 2860 2 4 0 94
 0 13 0 41124 164 7776080 0 0 0 13888 1103 2038 2 3 0 95
 0 13 0 42460 164 7768056 0 0 0 52248 1320 2334 2 3 0 95
 1 13 0 40456 164 7776908 0 0 0 3028 1058 1934 2 3 0 95

While the test is running, working with the KDE graphical interface is impossible.

Revision history for this message
In , todorovic.s (todorovic.s-linux-kernel-bugs) wrote :

Just tried dd if=/dev/zero of=bigfile bs=1M count=20k conv=fdatasync on 2.6.30-rc2 and top still shows iowait of 70% to 90%, on ext3 filesystem.

Motherboard: Gigabyte M57SLI-S4
Distro: Slamd64 12.2

$ cat /proc/meminfo | grep MemTotal
MemTotal: 3089672 kB

$ cat /proc/cpuinfo | grep -i "Model name" | uniq
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+

sda:

 Model=WDC WD5000AAKS-00TMA0, FwRev=12.01C01, SerialNo=WD-WCAPW4009869
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6

I believe the ext3 partition was mounted with the data=writeback option, but I can reboot and confirm if it is important enough.

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

(In reply to comment #304)
> ASUS P5K
> linux suse 2.6.29-53-default x86_64

You're running a kernel that is known to have high write latencies, and it doesn't appear that your fsync latency test is running in parallel with the dd. With 8GB of RAM, you likely need to change your dd to write out at least 10GB of data instead of 5GB.

(In reply to comment #305)
> Just tried dd if=/dev/zero of=bigfile bs=1M count=20k conv=fdatasync on
> 2.6.30-rc2 and top still shows iowait of 70% to 90%, on ext3 filesystem.

Your system *should* show high iowait when you're stress testing it like that. If it doesn't, you're not writing to disk as fast as it can handle.

High iowait is normal and expected. It is not an indication of a problem.

What is not expected is high latency during those stress tests.

Ideally you should see sync latencies of less than a second - if latencies get higher than that you are likely using ext3 data=ordered or a broken kernel.

2.6.30-rc2 was just released - that should be used for future tests.

Revision history for this message
In , todorovic.s (todorovic.s-linux-kernel-bugs) wrote :

2.6.30-rc2

fsync-tester shows mostly < 1 second, except a few times when it goes just above 1 sec.

fsync time: 0.1964
fsync time: 0.2317
fsync time: 0.2923
fsync time: 0.0565
fsync time: 1.1033
fsync time: 0.2297
fsync time: 0.0124

fsync time: 0.0848
fsync time: 0.1049
fsync time: 0.6525
fsync time: 11.1130 <--- not sure what that was
fsync time: 2.2619
fsync time: 0.3535
fsync time: 0.1543
fsync time: 0.2699

Unfortunately, the load average shoots up, peaking at about 8 before I run out of space on the disk. System responsiveness is also affected, but I don't have a meaningful measurable quantity.

top - 21:41:06 up 16 min, 6 users, load average: 7.23, 5.93, 3.98
top - 21:42:19 up 17 min, 7 users, load average: 8.12, 6.53, 4.34

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 9 0 19428 10752 2681252 0 0 180 13957 1344 497 2 9 30 59
 1 8 0 20100 10780 2680416 0 0 0 47644 2883 1290 2 12 0 86
 0 9 0 18908 10816 2681888 0 0 0 22528 2819 858 2 11 0 88
 0 10 0 20116 10828 2680952 0 0 4 25080 2865 781 2 7 0 92
 0 9 0 18900 10844 2682280 0 0 4 32696 3496 835 0 11 0 90
 0 9 0 19040 10876 2681736 0 0 0 29936 3060 1064 1 10 0 89
 2 8 0 18880 10892 2680868 0 0 4 47736 2954 731 0 7 0 92
 0 9 0 18180 10920 2681448 0 0 0 44160 2723 971 0 13 0 87

/dev/sda4 /home ext3 rw,relatime,errors=continue,data=writeback 0 0

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Hi all!

I just ran the tests and obtained this:

######################################################
gad@ws-esp16:~$ ./kernel-test2.sh
Using current dir to do IO tests
####################
## System info
System: 2.6.28-11-generic i686
Tag: 2.6.28-11-generic
Memory MemTotal: 2060636 kB
CPU Model: model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
Running in .
Mounts:
---------------------
rootfs / rootfs rw 0 0
/dev/disk/by-uuid/ee364958-34b6-474e-8e54-9a9eaff56d12 / ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
---------------------
Sda info:
 Model=ST91608220AS , FwRev=3.ALE , SerialNo= 5MA4TF4V
####################
First Test: FsyncProblem

Starting
./test-2.6.28-11-generic-1
We have High IO PID 8949 running
We have fsync-tester with 8950 running...
fsync time: 0.1504
fsync time: 0.5174
fsync time: 0.3664
fsync time: 0.1727
fsync time: 0.2163
fsync time: 0.3080
fsync time: 0.3914
fsync time: 0.1766
fsync time: 0.4800
fsync time: 0.2304
fsync time: 0.4018
fsync time: 0.1159
fsync time: 0.4537
fsync time: 0.1837
fsync time: 0.3032
fsync time: 0.5013
fsync time: 2.0128
fsync time: 0.9343
fsync time: 0.3027
fsync time: 1.2761
fsync time: 0.7145
fsync time: 0.4678
fsync time: 2.0326
fsync time: 0.2019
fsync time: 0.5484
fsync time: 0.3867
fsync time: 0.0912
fsync time: 0.2040
fsync time: 0.3893
fsync time: 0.2703
fsync time: 0.3794
fsync time: 0.5449
fsync time: 0.7379
fsync time: 0.5957
fsync time: 0.6034
fsync time: 0.7915
fsync time: 1.0564
fsync time: 0.5795
fsync time: 0.4501
fsync time: 2.2850
fsync time: 8.1411
fsync time: 1.4754
fsync time: 1.3487
fsync time: 0.9896
fsync time: 0.6221
fsync time: 1.1703
fsync time: 0.2775
fsync time: 0.1842
fsync time: 0.3994
fsync time: 0.5275
fsync time: 0.3382
fsync time: 0.3295
fsync time: 0.6451
fsync time: 0.6803
fsync time: 1.2621
fsync time: 1.3397
fsync time: 0.3250
fsync time: 0.3182
fsync time: 0.3491
fsync time: 0.2745
fsync time: 0.3489
fsync time: 0.5478
fsync time: 0.6009
fsync time: 0.4482
fsync time: 0.3772
fsync time: 0.1414
fsync time: 0.2948
fsync time: 0.2228
fsync time: 0.3758
fsync time: 0.3091
fsync time: 0.2624
fsync time: 0.3526
fsync time: 0.0771
fsync time: 0.2078
fsync time: 0.1613
fsync time: 0.2265
fsync time: 0.2759
fsync time: 0.3231
fsync time: 0.3532
fsync time: 0.1200
fsync time: 0.2788
fsync time: 0.4866
fsync time: 0.2710
fsync time: 0.4107
fsync time: 0.4903
fsync time: 0.5680
fsync time: 0.1199
fsync time: 0.3397
fsync time: 0.3929
fsync time: 0.3373
fsync time: 0.4407
fsync time: 0.2629
fsync time: 0.2998
fsync time: 0.2175
fsync time: 0.3119
fsync time: 0.0971
fsync time: 0.1899
fsync time: 0.4977
fsync time: 0.4127
fsync time: 0.2498
fsync time: 0.8439
fsync time: 0.1513
fsync time: 0.1109
fsync time: 0.2506
fsync time: 0.3414
fsync time: 0.1470
fsync time: 0.0558
./kernel-test2.sh: line 84: 8949 Terminated dd if=/dev/zero of="$io_test_path/test-$info_tag-$i" bs=1M count=5000 oflag=direct
./kernel-test2.sh: line 86: 8950 Terminated ./fsync-tester "$io_test_path/test-$info_tag-$i.fsynctest"
./test-2.6.28-11-generic-1 deleted!
./test-2.6.28-11-generic-1.fsynctest dele...


Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 21007
Automatic test suite for this bug V3

This includes the fsync test

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

(In reply to comment #306)
> You're running a kernel that is known to have high write latencies, and it
> doesn't appear that your fsync latency test is running in parallel with the
> dd.

????????????????????????????
Most likely nobody knows about it. The bug status is 'NEW'.

> With 8GB of RAM, you likely need to change your dd to write out at least
> 10GB of data instead of 5GB.

OK (adding to the results in comment #304)

dd if=/dev/zero of=./bigfile bs=1M count=15000 & ./fsync-tester
...
fsync time: 2.3800
fsync time: 2.4295
fsync time: 2.4099
fsync time: 2.1599
fsync time: 2.0760
fsync time: 2.6152
fsync time: 2.1427
fsync time: 2.4893
fsync time: 2.3252
fsync time: 2.3208
fsync time: 2.4223
...
fsync time: 2.3710
fsync time: 1.3094
fsync time: 1.4473
fsync time: 2.7260
fsync time: 2.2739
fsync time: 2.2078
fsync time: 0.5446
15000+0 records in
15000+0 records out
fsync time: 1.5607
15728640000 bytes (16 GB) copied, 201,724 s, 78,0 MB/s

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 5 0 3930476 6108 3873852 0 0 0 74384 883 1632 1 4 0 94
 0 5 0 3864644 6108 3941216 0 0 0 64512 667 1088 1 5 0 93
 0 4 0 3788956 6108 4015088 0 0 0 73728 943 1738 2 5 0 93
 0 5 0 3735848 6108 4070376 0 0 0 53268 666 1181 1 5 0 94
 2 5 0 3671468 6108 4135384 0 0 0 65024 735 1277 1 4 0 94
 0 4 0 3590356 6108 4213988 0 0 0 77824 860 1590 2 5 0 93
 1 5 0 3524484 6108 4280384 0 0 0 64392 749 1495 1 4 0 94

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 21054
test case: Takes the time of mouse click events

All my results show a high probability of high latencies when there is high system time. Most posts were about high latencies during heavy IO over an SSH connection or with the X server. Both use a network/socket connection. The bug may be in the network stack and not in the IO scheduler or block layer.

Here my first test.
The "Example Network Job" test (Flexible IO Tester) shows a regression since 2.6.22.
(see the last test on http://global.phoronix-test-suite.com/?k=profile&u=ebird-3722-22013-9288 )

And here is the mouse click test. This test case shows exactly the same regression on all kernels and the same behaviour I have seen in a real environment.

It's !!not!! caused by the fsync bug.

The test case just clicks on a label and measures the time until the event arrives. It uses the platform's native input queue (see java.awt.Robot).

The test case is only a quick solution and has no error handling. It expects a factor as a parameter. A high factor like 40.0 means high sensitivity and produces a high probability of catching high latencies, but it also increases the probability of a missing precondition (no high CPU usage and no high system time) on current kernels. A value below 5.0 means poor sensitivity, which reduces the system time and the probability of capturing a high-latency event. These values may differ on other machines, as it has not been tested elsewhere.

To generate the high IO, I used the following commands, but copying a big folder (larger than memory) works too.
# for i in 1 2 3 4 5 6; do dd if=/dev/zero of=t-$i bs=1M count=1K & done

The error occurs with kernels 2.6.17, 2.6.18 and 2.6.20 only while the cache is filling up within the first five seconds.

kernel no IO high IO
2.6.17 max 160ms max 35ms (max 2.859s within the first 5 seconds)
2.6.18 max 152ms max 101ms (max 2.430s within the first 5 seconds)
2.6.20 max 164ms max 100ms (max 1.049s within the first 5 seconds)

2.6.27 max 46ms max 6.988s (during IO)
2.6.28 max 51ms max 3.778s (during IO)
2.6.29 max 99ms max 3.632s (during IO)
2.6.30-rc2 max 50ms max 4.993s (during IO)

Unable to run test on this kernel, because of missing preconditions.
2.6.22
2.6.30-rc2 (smp) max 3.624s (during IO)

An output like this, or no CPU usage, means the preconditions for the test are missing; reduce the factor.
> High total latency of last 19 events at 138.783s - total latency : 646ms

A factor below 5.0 means the test is not able to be run on this kernel.

P.S.
All tests were done on a kernel without SMP support to reduce multi-core scheduler differences, with a 250Hz timer and without CPU scaling.
On multi-core systems you should busy n-1 cores with a job like this.
# bzip2 -c /dev/zero >/dev/null &
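On a quad core, for example, a sketch of the same idea (3 here being n-1 for four cores) would be:
# for i in 1 2 3; do bzip2 -c /dev/zero >/dev/null & done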

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 21055
Complete test log

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Hi guys,

I have run my test script, which I ran with previous kernels. There is a pretty big increase in performance on 2.6.30-rc3. The biggest difference I noticed in my test output was that vmstat used to report large numbers (10) of "uninterruptible sleep" processes. Now it's down to about 1-4.

I saw some 9 and 10 second fsync latencies, but most were around 0.3 seconds, with some around 1-2 seconds.

However, I don't think the kernel is back to what it used to be yet. I never used to have problems with ext3 fsync latencies at all. It used to be that a simple file copy would not cause many latency issues in the responsiveness of my regular apps. In fact, generally speaking, I never noticed any problems when copying huge files. Now, when copying large files, I still get some choppiness, even with Ted's patches.

I'm wondering whether the real problem lies in the block IO layer, and not the filesystem layer.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

The reliability of the mouse click test case (Comment #311) can be improved by adding a random reading process.

# for i in 1 2 3 4 5 6; do dd if=/dev/zero of=t-$i bs=1M count=1K & done
# find / 2>%1 >/dev/null
# java MouseClickTester 40

I am able to catch latencies of up to 12 seconds with the 2.6.27 kernel (no SMP support). Is there a way to trace such a mouse click event in the kernel? It should be suspend/wait and resume.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

Kernel 2.6.30-rc2
Other info see comment #304

TEST 1
----------------------------------------------------------------------------
yura@suse:~> dd if=/dev/zero of=./bigfile bs=1M count=15000 & ./fsync-tester
[1] 4561
fsync time: 0.0401
fsync time: 2.4475
fsync time: 1.7808
fsync time: 1.1141
fsync time: 1.6912
fsync time: 1.0753
fsync time: 1.2931
fsync time: 0.3260
fsync time: 0.3653
fsync time: 0.5603
.....
fsync time: 1.3651
fsync time: 1.0479
fsync time: 1.0806
fsync time: 0.6021
fsync time: 0.4708
fsync time: 1.3952
fsync time: 0.6665
fsync time: 1.4431
fsync time: 1.0893
fsync time: 1.7844
fsync time: 0.6520
fsync time: 0.3665
fsync time: 0.8171
fsync time: 0.7537
fsync time: 1.2100
fsync time: 0.9319
fsync time: 1.1578
fsync time: 1.1377
fsync time: 1.4913
fsync time: 1.0317
fsync time: 0.5870
fsync time: 1.8464
fsync time: 1.4770
fsync time: 1.3934
fsync time: 1.3794
fsync time: 0.7868
15000+0 records in
15000+0 records out
15728640000 bytes (16 GB) copied, 172.839 s, 91.0 MB/s
^C

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 4 0 6189644 808 1572324 0 0 4 116748 1585 1548 2 26 6 67
 1 3 0 6098828 808 1663460 0 0 0 84472 973 1538 2 7 0 91
 0 4 0 6011692 808 1749652 0 0 0 88416 722 1248 2 6 0 92
 0 3 0 5915592 808 1844204 0 0 0 95232 996 1668 1 7 0 92
 1 4 0 5834692 808 1925564 0 0 0 77832 672 838 1 6 0 93
 0 4 0 5755452 808 2005900 0 0 0 79872 940 1472 1 5 0 93
 1 2 0 5664856 808 2096760 0 0 0 88744 746 1316 1 6 0 92
 0 4 0 5574556 808 2185520 0 0 0 86368 802 1286 1 6 0 93
 0 3 0 5492072 808 2268036 0 0 0 81408 785 1112 1 6 0 93
 0 4 0 5412744 808 2347624 0 0 0 78344 926 1400 1 5 0 93
 0 3 0 5333768 808 2428624 0 0 0 78848 659 1046 1 5 50 43
 0 4 0 5245744 808 2516336 0 0 0 86536 992 1526 1 6 50 42
 0 4 0 5153952 808 2605988 0 0 0 89088 947 4596 4 7 48 41
 0 3 0 5074720 808 2686532 0 0 0 78336 958 1768 1 6 49 43
 0 4 0 4974280 808 2787192 0 0 0 92198 706 1028 1 7 20 72
 0 3 0 4897224 808 2862716 0 0 0 80905 1046 1650 1 5 49 45
 0 4 0 4819832 808 2940944 0 0 0 77348 1193 2076 1 6 0 93
 1 2 0 4730172 808 3031732 0 0 0 82104 733 1020 1 6 1 91
 0 3 0 4648668 808 3112676 0 0 0 86864 994 1674 1 6 50 42
 1 3 0 4556864 808 3203828 0 0 0 87232 708 1136 2 6 49 43

TEST 2
----------------------------------------------------------------------------
yura@suse:~> dd if=/dev/zero of=./bigfile2 bs=1M count=15000
15000+0 records in
15000+0 records out
15728640000 bytes (16 GB) copied, 174.036 s, 90.4 MB/s

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 1 3 ...


Revision history for this message
In , bob+kernel (bob+kernel-linux-kernel-bugs) wrote :

This absolutely cannot be an ext3 bug. I'm using reiserfs for my root, and it happens here too. The system totally locks up with a swap storm when memory pressure starts forcing things into swap. Firefox using > 2GB of memory and a Wine memory bug that causes it to report ~4GB VIRT are what trigger it for me. Killing either one fixes the storm (which is often not possible because the keyboard/mouse are unresponsive). The machine has 4GB RAM, 4GB swap.

It must be in the block layer, or elsewhere.

It also seems to happen with swap *off*.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Bob, (In reply to comment #316)
> This absolutely cannot be an ext3 bug. I'm using reiserfs for my root, and
> it happens here too. System totally locks up with a swap storm when memory
> pressure starts forcing things into swap. Firefox using > 2GB memory, and a
> wine memory bug which causes it to report ~4GB VIRT are what triggers it for
> me. Killing either one fixes the storm. (which is often not possible because
> keyboard/mouse are unresponsive) Machine has 4GB RAM, 4GB swap.
>
> It must be in the block layer, or elsewhere.
>
> It also seems to happen with swap *off*.

Bob, what exact symptoms are you seeing? There is another issue in the kernel, which I have been unable to reproduce for the kernel devs. I have seen numerous occasions where the kernel has "futex" deadlocks. It is possible that yours could be related to that.

That's because the performance problem in this bug does not cause a complete lockup. It may seem that way for a bit, but if you leave the machine, it will eventually recover. The futex one appears to be a complete deadlock; no matter how long I leave it, it never recovers.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

I recently experienced a new (for me) condition wherein this bug reared its ugly head, and it actually did not involve high disk throughput. I was running mencoder, which was pegging three of my CPU cores and using a fair share of the fourth. It was reading from a file on my RAID and writing to a file on a tmpfs, not particularly quickly on either end since it was doing a lot of number crunching in between. The bug cropped up when I started an rsync at the same time, sending some files from my RAID to a remote system, again not particularly quickly (my upstream network bandwidth is only about 80 KB/s). So I wasn't stressing the disk at all, yet my system came to a crawl. I could literally watch windows repainting themselves on expose events. Pressing Ctrl+Alt+Delete to bring up the KRunner process list took at least a minute, if not more. My disks were churning an awful lot, which was odd given the quite low demands I should have been placing on them. I thought maybe the input file to mencoder might have been heavily fragmented, but I ran xfs_fsr on it, and it said it only had 4 extents. Something is seriously FUBAR here.

A possible theory: forcing the disks to seek back and forth to read from the two files "simultaneously" meant that the majority of the time was spent waiting for disk seeks. If the kernel was holding a big lock while waiting for those seeks, it could have seriously degraded the performance of the rest of the system.

Revision history for this message
In , bob+kernel (bob+kernel-linux-kernel-bugs) wrote :

The bug I'm seeing is extremely reproducible. (I just wait for about a day with Firefox running and lots of tabs open, and it will happen.) As I mentioned, it occurs when memory pressure starts forcing things into swap. This is not a hard lockup, and the system will eventually recover (where "eventually" can be > 30 minutes).

updatedb and trackerd also make my system unusable, as reported above. I have disabled them as a consequence...

Given that I can trigger it, I can run jobs in the background that could log something useful...locks? fsync? What do you suggest?

(This system has a quad core intel and raid5 root as well -- don't know if that's related)

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Matt, (In reply to comment #318)
> I recently experienced a new (for me) condition wherein this bug reared its
> ugly head, and it actually did not involve high disk throughput.

Yes, that is one of the reasons that I believe there is more to it than just ext3 fsync improvements; it doesn't always take a lot to make it happen.

Matt, do these things happen on 2.6.30-rc3? I've seen my issues almost disappear with this release. It's still not completely gone, which indicates to me that they haven't quite hit the nail on the head. But it certainly is WAY better.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #320)
> Matt, do these things happen on 2.6.30-rc3?

I'm not willing to run a pre-release kernel. In fact, the kernel is the only package on my Gentoo system that I intentionally maintain at the "Gentoo stable" level, rather than at the leading edge. This is mostly because I don't want to have to reboot every time a new patch set comes out. Right now I'm running 2.6.28-gentoo-r5, which is based on 2.6.28.9.

If this bug is indeed improved upon in 2.6.30, then I look forward to the release of 2.6.31! :)

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

(In reply to comment #317)

>
> Bob, what exact symptoms are you seeing? There is another issue in the
> kernel, to which I have been unable to reproduce for the kernel devs. I have
> seen it numerous times where the kernel has "futex" deadlocks. It is
> potentially possible that yours could be related to that.

Trenton, could you please point me to the bug report for this issue you are speaking of?

Revision history for this message
In , unggnu (unggnu-linux-kernel-bugs) wrote :

I am using Ubuntu 9.04 with a 2.6.30-rc3 x86_64 kernel and I can confirm the whole behaviour.
The irony is that it feels like Windows 95 while a floppy was being formatted. You know, the whole pseudo multitasking on top of DOS - everything was really choppy.
An easy test case is to set up two LUKS-encrypted partitions and copy from one to the other. Even if no core is under heavy load, everything is slow. The same happens with USB transfers too.
But as Matt Whitlock pointed out, it is not always a disk IO problem. It can also happen under higher CPU usage. If I encode a DVD with ogmrip/mencoder h264 and 16 threads (16 threads get the highest CPU usage out of my quad core, which is still under 80% per core), Gnome feels like a formatting Win 95.
The latter problem has become less severe with 2.6.30-rc3, but it is still noticeably slow, which makes no sense since no core is at 100% load.
For comparison with how it could work: if I fire up Prime95 with 100% load on every core in Windows Vista, I can still play modern 3D games without lagging. Windows of course also has flaws with IO and so on, but the CPU multitasking works really well. Way to go, imho.

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

FWIW, I've tried the test proposed by Thomas in comment 314:
# for i in 1 2 3 4 5 6; do dd if=/dev/zero of=t-$i bs=1M count=1K & done
# find / 2>%1 >/dev/null
(the Java part did not start for some reason)

I ended up force-rebooting my laptop, since it was impossible to control *after a few seconds*. I could only switch to a VT and back to X, but very slowly, and I couldn't even type a character there or in X. I have 500MB of RAM with a swap of the same size, Pentium M 1500 MHz: not a very powerful configuration, but it should be sufficient to work, shouldn't it? :-) This was with 2.6.28; I'll try with 2.6.30rc2.

Revision history for this message
In , rockorequin (rockorequin-linux-kernel-bugs) wrote :

My system also locks up when it tries to access swap. This is on Ubuntu Jaunty with both the Ubuntu 2.6.28 kernel and Ubuntu's vanilla 2.6.30.rc3 kernel. This machine has 4GB of RAM and 4GB of swap and is running on a root ext4 partition.

My test case is to run multiple VirtualBox VMs (eg Jaunty installations) with say 1.4GB of RAM assigned to each. When I run the third one, as soon as the kernel starts to hit swap, it thrashes the hard drive, X rapidly becomes unresponsive and I have to hard reset the machine. I am able to move the mouse (slowly), but clicking on individual windows doesn't work and the keyboard doesn't respond. atop -d manages to update itself as far as about 300MB of swap use and then stops updating.

I've left it as long as 15 minutes to see if it will recover, but it doesn't.

Revision history for this message
In , unggnu (unggnu-linux-kernel-bugs) wrote :

(In reply to comment #325)
> My system also locks up when it tries to access swap.
> My test case is to run multiple VirtualBox VMs (eg Jaunty installations) with
> say 1.4GB of RAM assigned to each. When I run the third one, as soon as the
> kernel starts to hit swap, it thrashes the hard drive, X rapidly becomes
> unresponsive and I have to hard reset the the machine.
There are definitely some huge issues with the kernel, but I think this is not one of them. If your applications try to use more RAM than is available and keep trying to access/reserve that memory, which is likely with VirtualBox, no other OS would operate fine either. Of course it should be possible to switch to a console and run some commands, but that has nothing to do with this report, I think.

Btw, I forgot to mention that I don't use swap.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #324)
> I ended force-rebooting my laptop, since it was impossible to control
> *after a few seconds*.

It's an extreme test case, as it generates a very high load. You can try with only two concurrent write processes, since your machine is PATA, only 1.5 GHz and single core. And start the Java test case at the beginning; the order was switched before (a long day).

# java MouseClickTester 40

# for i in 1 2; do dd if=/dev/zero of=t-$i bs=1M count=1K & done
# find / 2>%1 >/dev/null

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Little correction.

# java MouseClickTester 40

# for i in 1 2; do dd if=/dev/zero of=t-$i bs=1M count=1K & done
# find >/dev/null 2>&1

Revision history for this message
In , rockorequin (rockorequin-linux-kernel-bugs) wrote :

(In reply to comment #326)
>There are definitive some huge issues with the Kernel but I think this is not
>one of them. If your applications try to use more ram than it is available and
>always trying to access/reserve this mem which is likely with Virtualbox every
>other OS wouldn't operate fine anymore. Of course it should be possible to
>switch to console and run some commands but this has nothing to do with this
>report I think.
>Btw. I forgot to mention that I don't use a swap.

@unggnu: this is not a kernel issue?! If multiple apps are trying to reserve more RAM than is available and thus causing continuous access to swap, the kernel should NOT become completely unresponsive and require a hard reset, risking data loss or, in the case of a remote server that you can't hard reset, denial of service. Surely the memory management system should be able to recognise this condition and take appropriate action, eg freeze one or more processes with high RAM requirements.

At the VERY least it should allow an operator to kill off offending processes, but this is impossible because you can't even login via ssh or access a console. This is where the test case is relevant to this bug - if the system didn't become completely unresponsive, the operator could fix the problem without a hard reset.

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

IMO, this bug has long passed the point where it is useful.

There are far too many people posting with different issues.

There is too much noise to filter through to find a single bug.

There aren't any interested kernel developers following the bug.

The bug needs to be closed and reopened with separate bugs for each issue. Each issue should be reproducible with the latest 2.6.30-rc kernel with a simple test case.

Anything else will just result in another huge bug with 300+ comments and no kernel developer interest.

(In reply to comment #329)
> @unggnu: this is not a kernel issue?!!! If multiple apps are trying to
> reserve more RAM than is available and thus causing continuous access to swap

It is not a kernel issue. It is a system configuration issue. If you have half a dozen large-memory processes fighting for more memory than is available in the system, causing each of those processes to be continuously swapped in and out as they fight to run, you're going to get horrible performance.

You either need more memory, less swap (so that the OOM killer can kill a process), or to avoid running so many large-memory processes in parallel.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #330)
> IMO, this bug has long past the point where it is useful.

Even I (the reporter) have more or less stopped tracking this bug. I absolutely agree.

> There are far too many people posting with different issues.
>
> There is too much noise to filter through to find a single bug.
>
> There aren't any interested kernel developers following the bug.

I would definitely agree; the bug has long outlived its usefulness. Closing with INSUFFICIENT_DATA.

> The bug needs to be closed and reopened with separate bugs for each issue.
> Each issue should be reproducible with the latest 2.6.30-rc kernel with a
> simple test case.

Absolutely, all of you who have commented on this bug thus far should open new bugs. While I can't stop anyone from opening bug reports, it is likely that any report without a definite test case reproducing the issue will turn into yet another grab-bag like this one.

Revision history for this message
In , simon+kernelbugzilla (simon+kernelbugzilla-linux-kernel-bugs) wrote :

Having tracked bugs 7372 and 12309 on the primary issue (performance hitting a brick wall with heavy IO) since October 2007, and now facing the prospect of needing to track yet another one, can I make a plea that whoever opens the new one(s) posts a reference to the new bug ID(s) in this thread?

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

Thomas: thanks for that update, and indeed the second and more reasonable testcase does not completely kill the system. I'm seeing a possibly interesting phenomenon: the testcase does not trigger any hang when run alone, but when Firefox is started, I can see swap usage rise, and then the mouse won't move for about a second from time to time.

So my guess is that when the system needs to swap, even for only a few MB, it's not able to do that smoothly for the user. Maybe there's a scheduling problem when the kernel needs to choose whether to give priority to swap or to the root partition. Or it's simply because writing to quite remote places on the disk leads to high latencies. Would that be worth a new bug? I think a few of us here are experiencing this problem.

I generally agree that this bug is not leading anywhere, but at the same time we don't even know how many different issues there are, so opening new reports is problematic too. Maybe we could concentrate on the few cases we're best able to describe precisely, and hope we all suffer from those...

Revision history for this message
In , Lukasz.Kurylo (lukasz.kurylo-linux-kernel-bugs) wrote :

I found this report after I had another "freeze". Just before the freeze, free memory was running out, swap was barely used, buffers were a few hundred kB, BUT the cache was over 2.7GB out of 3GB of total memory. After about 20 minutes I managed to switch to VT1, and there was now about 500MB of free memory, less cache and increased swap usage. The last output of top showed the kswapd process kicking in.

Googling gave me this thread:
http://lkml.indiana.edu/hypermail/linux/kernel/0311.3/0406.html

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Lets summarize the bugs.

- High cache usage during write process enforce swapping of processes
  Patch in comment #160 works, but is not included in the linux tree.

- Fsync Bug in Ext3
  (There is a test case and a activity)

- Too high prioritization of heavy writing processes
  (Copying a big file can delay the start of a program until the copy
   operation finishes)

- Missing read and write based scheduler

And finally the annoying bugs
- Low gui responsiveness during heavy IO
  A reliable test case is still missing.
  - The test case in comment #311 shows high click latencies 2-12s
    during heavy io on non smp kernels
    (on smp kernels too, but it's not easy to catch such an event)
  - I have a socket ping-pong test (not submitted), which shows latencies of
    ~2s after the writing processes are finished

- Low gui responsiveness in virtual machines
  no test case
  maybe the same bug as the "Low gui responsiveness during heavy IO" bug

The GUI responsiveness issues are not deterministic; there may be a day with nearly no latencies and then an hour with continuous latencies of up to 60 seconds.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Does anybody know why the caches are not dropped after I echo 3 to drop_caches? Ideally I would expect that number to come down to 0, or at least to a few megs in practice. What I see is that after some usage of the system, the caches keep increasing and never go down with drop_caches. The graph is ever increasing, almost like a cache leak. Has anybody debugged this aspect? I think this is one of the primary reasons for the slowdown, because memory is locked in the caches and new memory requests swap the crap out of the system.

Revision history for this message
In , Lukasz.Kurylo (lukasz.kurylo-linux-kernel-bugs) wrote :

In the case of GUI responsiveness, iotop showed relatively high IO (reads) on the X process after the freeze. Maybe X's poor responsiveness is caused by waiting for IO as well.

Revision history for this message
In , Lukasz.Kurylo (lukasz.kurylo-linux-kernel-bugs) wrote :

The interesting thing is the cache usage and the inability to drop most of it. From my understanding, memory cache can be dropped if it's not dirty (has been written back to disk); this brought me to this thread about lack of writeback:
http://marc.info/?l=linux-kernel&m=113919849421679&w=2

On the other hand, /proc/meminfo shows only ~160kB of dirty memory. Cache shows 880868 KB. echo 3 > /proc/sys/drop_caches doesn't do anything. So why can't the cache be freed? Is it possible to have a cache leak?

Revision history for this message
In , Lukasz.Kurylo (lukasz.kurylo-linux-kernel-bugs) wrote :

Looks like drop_caches stopped working as expected somewhere around 2.6.18:
look at the first comment:
http://jons-thoughts.blogspot.com/2007/09/tip-of-day-dropcaches.html

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

Be careful using drop_caches. I actually managed to cause a kernel crash by using it in combination with a removable medium. I think it was a double-free bug, but I don't remember for certain.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

It has been mentioned time and again that none of the kernel devs have gotten a concise description of the problem and hence none of them seems to have any answers. Well, does anybody know why my caches show 700MB in a 2GB machine and why I can't get rid of any of it? I don't think the question can get any more precise. This is the heart of the problem, folks.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #341)
> It has been mentioned time and again that none of the kernel devs have gotten
> a concise description of the problem and hence none of them seems to have any
> answers. Well, does anybody know why my caches show 700MB in a 2GB machine and
> why can't I get rid of any of it? I don't think the question can get any more
> precise. This is the heart of the problem, folks.

I don't understand why you'd assume that cache is a problem. The kernel uses available RAM as cache as it's the most productive use for it. To assume that this is buggy behavior is extremely misled logic.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #342)
> (In reply to comment #341)
> > It has been mentioned time and again that none of the kernel devs have
> > gotten a concise description of the problem and hence none of them seems to
> > have any answers. Well, does anybody know why my caches show 700MB in a 2GB
> > machine and why can't I get rid of any of it? I don't think the question can
> > get any more precise. This is the heart of the problem, folks.
>
> I don't understand why you'd assume that cache is a problem. The kernel uses
> available RAM as cache as it's the most productive use for it. To assume that
> this is buggy behavior is extremely misled logic.

What's buggy is that it's not ready to relinquish it when asked to drop it or when it's needed. echo 3 to drop_caches should drop the damn thing. If I configure swappiness=1, the cache should be dropped first before the swap disk is used. I don't like it locking 700MB out of my 2GB of RAM and then swapping heavily. If this behavior is by design, someone needs to change that design.
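
For reference, a minimal sketch of the sequence being discussed (the initial sync matters: only clean page-cache pages can be dropped, so dirty pages have to be written back before they become reclaimable):

# sync
# echo 3 > /proc/sys/vm/drop_caches
# sysctl -w vm.swappiness=1
# grep -E 'Cached|Dirty|Writeback' /proc/meminfo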

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

Kernel 2.6.30-rc3
If a task is using the processor at ~30 percent and filesystem work (cp, mv, rm) is running at the same time, the computer dies; at that point starting anything else is simply not realistic.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

The kernel 2.6.30-rc4 is no better.
This bug, "Large I/O operations result in slow performance and high iowait times", has passed from status NEW into some unclear state, but iowait is as high as it ever was. Stop these frauds. What data is still necessary?

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #346)
> The kernel 2.6.30-rc4 is not better.
> This bug "Large I/O operations result in slow performance and high iowait
> times" has passed from the status NEW in not clear state but iowait both was
> high and remained. Stop these frauds. What data is still necessary?

No. Kernel folks will not "stop these frauds". Is your system using the latest DDR3 memory running at 2000MHz? Is it using a Core i7-based processor, overclocked to 4.5GHz? Does it have SSD drives with at least 150MB/s writes? Are you using ext4 yet? If all of these are true, and your system still hangs, only then will the kernel devs "stop these frauds" and fix this bug. Until then, just use Vista (make sure to upgrade to SP1)... :D
...

...
In case you couldn't tell, I was just kidding with you! Please file a separate bug report with specific details about what you are experiencing on your system.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

Terminal 1 (no other active task)
:~/x1> time cp -r qt-x11-opensource-src-4.5.1 qt-x11-opensource-src-4.5.1-1

real 5m51.075s
user 0m0.147s
sys 0m2.192s

302.6 MB / 351 s = 0.9 MB/s

Terminal 2
:~/x1> vmstat 1

<- only cp
0 0 0 4916172 808 2774512 0 0 24 16248 794 1469 2 3 95 0
2 0 0 4915228 808 2776340 0 0 24 2180 959 1385 2 1 97 0
1 0 0 4913492 808 2778140 0 0 24 3144 841 1251 1 1 97 0
0 0 0 4912500 808 2779104 0 0 24 2636 679 936 2 1 97 0
1 0 0 4910516 808 2781112 0 0 32 2804 862 1258 2 1 96 0
0 1 0 4908872 808 2781812 0 0 36 27160 749 913 5 2 91 2
<- entering a folder in Dolphin (100 files in this folder)
2 0 0 4907012 808 2783712 0 0 48 2615 1108 1563 3 2 82 12
0 1 0 4906020 808 2784728 0 0 80 3248 890 1274 2 1 64 33
0 1 0 4905648 808 2785164 0 0 56 2933 705 920 3 1 67 28
0 1 0 4904828 808 2786028 0 0 84 2600 884 1240 2 1 49 47
0 1 0 4903456 808 2787400 0 0 44 3148 723 873 3 1 62 35
0 1 0 4902084 808 2788572 0 0 64 2681 1177 2604 3 1 49 47
0 1 0 4901464 808 2789284 0 0 48 2328 952 1407 2 1 63 34
0 1 0 4900556 808 2790416 0 0 36 2624 951 2373 4 1 59 35
1 1 0 4898040 808 2792868 0 0 60 2672 1224 4018 8 3 46 43
0 1 0 4897032 808 2793868 0 0 80 2760 693 1004 2 1 49 47
0 1 0 4895552 808 2795304 0 0 28 2459 1029 1495 2 1 81 15
0 1 0 4894552 808 2796408 0 0 84 2744 877 1279 2 1 49 47
0 1 0 4892700 808 2798272 0 0 76 2204 773 1078 4 1 48 47

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #348)

You need to open a new bug report with a thorough explanation of your test case, expected and observed results, and any pertinent data you may have collected. Leave a note here referencing your newly created bug but posting any data here is not going to help anyone. This bug is closed due to a lack of focus.

Revision history for this message
In , Lukasz.Kurylo (lukasz.kurylo-linux-kernel-bugs) wrote :

I need to get back to 2.6.17, I can't work like this! I have 3GB of RAM, of which >2GB is used by cache that won't drop even when memory is running out.

Revision history for this message
In , akpm (akpm-linux-kernel-bugs) wrote :

(In reply to comment #349)
> (In reply to comment #348)
>
> You need to open a new bug report with a thorough explanation of your test
> case, expected and observed results, and any pertinent data you may have
> collected. Leave a note here referencing your newly created bug but posting
> any data here is not going to help anyone. This bug is closed due to a lack
> of focus.

yup.

Guys, problems like this aren't solved very effectively via bugzilla.

Please prefer to report these issues via email to linux-kernel and
myself and any developers who you think might be relevant. It's confusing,
and clarity is important. Being able to provide a means by which others
can demonstrate the problem is a huge benefit.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Why am I still being CC'd on this bug, even though I'm not on the CC list?

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #352)
> Why am I still being CC'd on this bug, even though I'm not on the CC list?

Maybe you're watching Jens Axboe (the assignee), Ben Gamari (the reporter), or another user who is still in the CC list.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

kernel 2.6.30-rc6
yura@suse:~> export LANG=en
yura@suse:~> dd if=/dev/zero of=test1 bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 129.928 s, 80.7 MB/s

yura@suse:~> vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 3 8 0 44708 0 7628596 0 0 249 12283 362 821 4 3 66 28
 0 8 0 49180 0 7627532 0 0 0 40517 1061 1581 7 4 0 89
 0 7 0 47188 0 7627180 0 0 0 59694 1156 1505 5 6 0 89
 1 6 0 46692 0 7628276 0 0 0 55553 1144 1476 6 5 0 90
 0 8 0 46428 0 7628160 0 0 20 51573 900 1096 5 4 0 90
 0 7 0 46568 0 7627860 0 0 0 64024 1127 1480 5 5 0 90
 0 7 0 45796 0 7629100 0 0 12 44597 889 987 6 4 0 90
 0 8 0 46904 0 7627808 0 0 332 40500 1100 1485 6 4 0 90
 0 7 0 47772 0 7626884 0 0 168 45300 1158 1628 6 4 0 90
 0 8 0 47216 0 7624456 0 0 72 67116 958 1151 5 5 0 90
 0 7 0 47032 0 7626480 0 0 280 29244 1177 1667 5 4 0 91
 0 7 0 45936 0 7626640 0 0 248 58872 922 1060 6 5 0 89
 0 9 0 44988 0 7626640 0 0 216 62492 945 1359 2 6 0 92
 0 8 0 47548 0 7625932 0 0 152 47164 926 1425 1 4 0 95
 1 6 0 45276 0 7627256 0 0 36 54721 605 1089 2 4 0 94
 0 7 0 48208 0 7626388 0 0 44 43612 834 1198 1 4 0 95
 0 8 0 47096 0 7625644 0 0 132 53789 655 1156 1 4 0 94
 0 7 0 46344 0 7624828 0 0 468 50292 981 2089 2 4 0 94
 0 8 0 46576 0 7625416 0 0 116 44056 1155 2119 1 3 0 96
 0 8 0 47476 0 7624800 0 0 636 38936 734 1125 2 4 0 94
 0 8 0 47348 0 7626676 0 0 32 58410 885 1613 1 5 0 93
 1 6 0 48508 0 7626280 0 0 0 67256 623 969 1 4 0 94
 0 7 0 47984 0 7625328 0 0 0 64888 694 1335 2 6 0 92
 0 7 0 45800 0 7626692 0 0 0 62496 1002 1698 1 4 0 95
 0 7 0 48220 0 7625052 0 0 0 61952 614 1222 2 5 0 93
 0 7 0 48508 0 7623300 0 0 0 69632 890 1586 1 5 0 94

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

Continuing from #354, with the same bigfile:
yura@suse:~> time cp bigfile bigfile.cp

real 5m52.457s
user 0m0.343s
sys 0m21.356s

calculated speed => 10485760000 bytes / 352.457 s = 29.75 MB/s

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 1 0 0 46688 0 7686820 0 0 0 12 564 862 1 0 98 0
 0 0 0 46688 0 7686820 0 0 20 0 387 730 2 0 96 1
 0 0 0 46688 0 7686840 0 0 0 0 559 879 1 0 98 0
 0 0 0 46688 0 7686840 0 0 0 0 598 937 1 1 97 0
 0 0 0 46688 0 7686840 0 0 0 0 315 517 2 1 98 0
 0 0 0 46704 0 7686840 0 0 0 16 600 1058 2 1 97 0
 0 0 0 46704 0 7686840 0 0 0 0 328 473 2 0 98 0
 0 0 0 46704 0 7686840 0 0 0 0 610 1122 2 0 98 0
 0 0 0 46704 0 7686840 0 0 0 0 582 1013 2 0 98 0
 0 0 0 46876 0 7686840 0 0 0 1 341 475 1 0 98 0
 0 0 0 46876 0 7686840 0 0 0 0 577 988 2 0 98 0
 0 0 0 46876 0 7686840 0 0 0 0 339 543 2 1 97 0
start cp
 3 0 0 46500 0 7686704 0 0 17500 0 857 2379 2 2 91 5
 3 0 0 43840 0 7689132 0 0 90624 0 2119 5710 4 11 61 24
 0 1 0 46532 0 7686180 0 0 83968 0 2008 5246 8 11 57 24
 1 1 0 43884 0 7689020 0 0 81024 46 2159 8097 6 10 59 25
 0 1 0 45020 0 7687772 0 0 81920 1 1759 3732 4 10 60 26
 0 1 0 44948 0 7687472 0 0 91264 0 2154 4449 4 10 60 25
 0 1 0 43924 0 7688888 0 0 88064 0 2040 4500 3 11 60 26
 0 1 0 46180 0 7686288 0 0 89984 0 1919 4107 3 11 63 22
 0 2 0 44692 0 7680932 0 0 86784 39184 2156 4820 4 12 47 38
 0 2 0 44568 0 7681376 0 0 64384 22436 1569 4127 3 7 35 54
 0 2 0 44092 0 7682832 0 0 35584 37396 1331 2886 3 5 37 55
 4 2 0 46920 0 7678572 0 0 42624 43336 1544 3311 3 6 28 63
 0 2 0 45724 0 7679280 0 0 49792 31240 1301 3076 2 6 27 64
 0 2 0 45328 0 7681288 0 0 41856 31648 1473 3322 3 5 27 65
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 2 0 46328 0 7679272 0 0 52480 29232 1425 3345 3 8 33 56
 1 2 0 46276 0 7679748 0 0 48768 24844 1564 3539 3 6 32 60
 1 2 0 47196 0 7678088 0 0 63360 28688 1830 4781 3 9 14 74
 5 3 0 44052 0 7681744 0 0 58112 23612 1493 3905 3 8 5 83
 1 2 0 44988 0 7679956 0 0 18560 53021 1107 2129 2 4 0 94
 0 4 0 46872 0 7677272 0 0 55808 22541 1478 4117 3 7 1 89
 0 4 0 43824 0 7681360 0 0 52608 33800 1627 4628 3 7 0 89
 0 4 0 45536 0 7679600 0 0 41856 31720 1491 4026 3 6 2 89
 0 4 0 45688 ...

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

Bug 12309 - Large I/O operations result in slow performance and high iowait times

Where is the low iowait?
Where are the small I/O operations that also suffer?
And where does Status: RESOLVED INSUFFICIENT_DATA come from?

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :
Revision history for this message
In , rafal (rafal-linux-kernel-bugs) wrote :

There is ongoing discussion about similar issue:
http://lkml.org/lkml/2009/5/15/320
and
http://lkml.org/lkml/2009/5/16/23

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :
Download full text (3.9 KiB)

Confirm bug

My OS is Fedora Core release 6
Kernel: 2.6.22.14-72.fc6
2 CPUs: Intel® Xeon® CPU 5130 @ 2.00GHz
HDDs: SAS 3.0 Gb/s, FUJITSU
RAID: Adaptec 4800SAS
RAID10

How to test:
# dd if=/dev/zero of=testfile.1gb bs=1M count=1000

In other terminal during a copying you should run:
# vmstat 1

I see for example:
r b swpd free buff cache si so bi bo in cs us sy id wa st
14 8 460 120716 280236 1509844 0 0 9 14 0 0 9 3 66 22 0
 0 13 468 121936 279216 1550936 0 0 1368 47776 1927 4153 24 8 8 60 0
 0 15 468 121516 280200 1551200 0 0 1408 3744 1726 2846 1 2 3 94 0
 0 8 468 129804 280520 1545940 0 0 1612 4280 1854 4060 3 2 1 95 0
 0 6 468 131388 281868 1546628 0 0 2140 3620 2020 4650 12 3 13 71 0
 0 17 468 114220 282792 1571864 0 0 1208 3212 1647 2715 4 3 6 87 0
 1 12 468 115356 283164 1570704 0 0 1420 18964 1718 2397 2 2 2 94 0
 0 9 468 114320 283628 1570868 0 0 768 1204 1753 2831 3 1 0 96

iowait -> 80-90% during 'dd'
All other CPU's task work very very slow ...

AND (!!!), the output of 'dd' is:
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 112.086 seconds, 9.4 MB/s
                                                   ^^^^^^^^^

For several years I have seen the following behaviour: whenever the server does heavy disk I/O (the 'dd' examples here are enough to trigger it), iowait stays at 50-90% and many tasks freeze for several seconds (10-20 seconds, sometimes more in my case). It is easy to reproduce with 'dd'. I cannot work around it with ionice - iowait stays high even if I run the I/O tasks with ionice -c3 or ionice -c2 -n7. So, judging by the many threads I have read, every server running kernel 2.6.18 or later has this bug. People on forums write that 2.6.30-rc2 has it too, and that FreeBSD stays responsive (mouse movement, video playback and other CPU tasks) during the same 'dd' test, unlike Linux.

I don't know what information you need to track this bug down - it has existed since 2007.

Please help! Here is an example from my loaded server at various times (no 'dd' running - only the usual MySQL and Apache workload):

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r b swpd free buff cache si so bi bo in cs us sy id wa st
13 14 120 68460 574784 1286748 0 0 13 1 0 0 9 3 66 22 0
 1 11 120 74564 576080 1286976 0 0 1560 0 1632 3641 34 10 0 57 0
 0 12 120 69988 577572 1287352 0 0 1904 0 1969 3696 5 2 0 93 0
 0 11 120 66916 578984 1287860 0 0 1900 0 1809 3615 6 2 0 92 0
 0 11 120 64960 580424 1288028 0 0 1668 0 1642 2188 1 1 0 97 0
 0 11 120 72764 576508 1286788 0 0 1668 0 1681 2198 3 2 0 96 0
 1 11 120 71424 577940 1287300 0 0 1604 332 1575 2152 2 1 0 97 0
 3 11 120 58852 579528 1289100 0 0 2000 0 1984 3286 44 7 0 49 0
 1 11 120 75104 581012 1287472 0 0 1608 0 2119 2839 39 7 0 55 0
 0 13 120 72160 582572 1287672 0 0 1908 120 1645 2366 7 1 0 92 0

[root@63 logs]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r b swpd free buff cache si so bi ...

Read more...

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :
Download full text (3.5 KiB)

Created attachment 21774
Test patch against heavy io bug

I bisected and found these two patches. Reverting them improves desktop responsiveness on my notebook enormously. I tested this on a 2.6.28 non-SMP kernel (my heavy-I/O test installation) with four concurrent read and write operations running while working in two VMs. It is only a Core2 @ 2.4 GHz system, yet I can even start new applications during heavy I/O.

I have attached the patch that I applied to my test installation. Use it with care: I am not a kernel developer and do not know the dependencies inside the CFQ scheduler.

I reverted these two patches:

07db59bd6b0f279c31044cba6787344f63be87ea is first bad commit
commit 07db59bd6b0f279c31044cba6787344f63be87ea
Author: Linus Torvalds <email address hidden>
Date: Fri Apr 27 09:10:47 2007 -0700

    Change default dirty-writeback limits

    Do this really early in the 2.6.22-rc series, so that we'll get
    feedback. And don't change by half measures. Just cut the default
    dirty limit to a quarter of what it was, and see if anybody even
    notices.

    Signed-off-by: Linus Torvalds <email address hidden>

:040000 040000 b63eb9faf5b9a42a1cdad901a5f18d6cceb7fdf6 2b8b4117ca34077cb0b817c77595aa6c9e34253a M mm

a993800655ee516b6f6a6fc4c2ee13fedfb0590b is first bad commit
commit a993800655ee516b6f6a6fc4c2ee13fedfb0590b
Author: Jens Axboe <email address hidden>
Date: Fri Apr 20 08:55:52 2007 +0200

    cfq-iosched: fix sequential write regression

    We have a 10-15% performance regression for sequential writes on TCQ/NCQ
    enabled drives in 2.6.21-rcX after the CFQ update went in. It has been
    reported by Valerie Clement <email address hidden> and the Intel
    testing folks. The regression is because of CFQ's now more aggressive
    queue control, limiting the depth available to the device.

    This patches fixes that regression by allowing a greater depth when only
    one queue is busy. It has been tested to not impact sync-vs-async
    workloads too much - we still do a lot better than 2.6.20.

    Signed-off-by: Jens Axboe <email address hidden>
    Signed-off-by: Linus Torvalds <email address hidden>

:040000 040000 07c48a6930ce62d36540b6650e3ea0563bd7ec59 95fc11105fe3339c90c4e7bebb66a820f7084601 M block
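
For readers unfamiliar with how such a result is obtained: a bisection of this kind is typically driven with git roughly as follows. This is only a sketch; the v2.6.20/v2.6.22-rc1 endpoints are assumptions chosen to bracket the April 2007 commits listed above, not necessarily the exact range Thomas used.

git bisect start
git bisect bad v2.6.22-rc1      # assumed: first version showing the stalls
git bisect good v2.6.20         # assumed: last version known to behave well
# build and boot the commit git checks out, run the I/O test, then mark it:
git bisect good                 # or: git bisect bad
# repeat until git prints "<sha1> is the first bad commit", then clean up:
git bisect reset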

Here the fsync result on my machine:

**************************************************************************
Without patch
Linux balrog 2.6.28 #2 Mon Mar 23 11:19:13 CET 2009 x86_64 GNU/Linux

fsync time: 7.8282
fsync time: 17.3598
fsync time: 24.0352
fsync time: 19.7307
fsync time: 21.9559
fsync time: 21.0571
5000+0 Datensätze ein
5000+0 Datensätze aus
5242880000 Bytes (5,2 GB) kopiert, 129,286 s, 40,6 MB/s
fsync time: 21.8491
fsync time: 0.0430
fsync time: 0.0448
fsync time: 0.0451
fsync time: 0.0451
fsync time: 0.0451
fsync time: 0.0452

**************************************************************************
With patch
Linux balrog 2.6.28 #5 Fri Jun 5 22:23:54 CEST 2009 x86_64 GNU/Linux

fsync time: 2.8409
fsync time: 2.3345
fsync time: 2.8423
fsync time: 0.0851
fsync time: 1.2497
fsync time: 0.9981
fsync time...

Read more...
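
The fsync-tester program producing these numbers is not included in this excerpt; a minimal sketch of an equivalent latency tester is shown below (the 1 MB write size, the scratch-file name and the one-sample-per-second pacing are assumptions rather than details of the original tool):

/* fsync-latency sketch: overwrite 1 MB, time the fsync(), once per second */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>

int main(void)
{
    static char buf[1024 * 1024];
    int fd = open("fsync-tester.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    memset(buf, 'a', sizeof(buf));

    for (;;) {
        struct timeval start, end;

        lseek(fd, 0, SEEK_SET);                     /* overwrite the same 1 MB */
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
            perror("write");
            return 1;
        }
        gettimeofday(&start, NULL);
        fsync(fd);                                  /* the latency being compared above */
        gettimeofday(&end, NULL);

        printf("fsync time: %.4f\n",
               (end.tv_sec - start.tv_sec) +
               (end.tv_usec - start.tv_usec) / 1e6);
        sleep(1);                                   /* one sample per second */
    }
}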

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

Fantastic! Did you bisect the whole kernel tree between 2.6.17 and 2.6.20? It's really great that you found those patches.

The first one doesn't seem very important to me, and some of its changes were reverted in 2.6.30. But the second one dramatically changes my system's responsiveness. I'm now running with it reverted, and there's no comparison with the old behavior: my pointer no longer freezes when performing updates, and almost everything is smooth!

For those who would like to try the patch on 2.6.30, I've updated it as best I could and I'm attaching it. It's quite dirty and I was doubtful it would work, but it seems to be enough.

Would a kernel dev look at the patches Thomas identified and tell us what he thinks?

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

Created attachment 21816
Patch to revert second commit, updated to apply against 2.6.30rc8

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #360)
Thank you very much for you work. I can't imagine how long that bisection must have taken and it is very exciting to have finally found a potential culprit. It would be best for everyone if you opened a new bug report with this information. Developers would be far more likely to look at it if we had a clean slate on which to start.

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

Are there patches for 2.6.29 available that I can test?

Revision history for this message
In , bob+kernel (bob+kernel-linux-kernel-bugs) wrote :

Isn't the second patch just adjusting things which can be adjusted in proc?

echo 10 > /proc/sys/vm/dirty_background_ratio
echo 40 > /proc/sys/vm/dirty_ratio

Someone want to do some tests after adjusting those two?

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Created attachment 21822
Backport of the reverted CFQ commit

This is a proper backport of the commit that was identified by Thomas as the problematic one.

Thomas, can you please verify that this makes 2.6.30-rc8 behave better? And if it does, it would be interesting to narrow it down to one single change. The first always makes sure that we drain the queue before servicing a queue that has idling enabled, and the second is just a tweak for idle/async immediate expiration. I think the first one is likely the interesting bit, but it would be good to have confirmation on that.

And Thomas, thanks for all your work on this!

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #365)
> Isn't the second patch just adjusting things which can be adjusted in proc?
>
> echo 10 > /proc/sys/vm/dirty_background_ratio
> echo 40 > /proc/sys/vm/dirty_ratio
>
> Someone want to do some tests after adjusting those two?

We already determined months ago that tuning those knobs way down was a way to minimize the problem. (See comment #263 and comment #292 for test results.) It's not a solution, though; it just skirts around the real issue.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #366)
> I think the first one is likely the interesting bit, but it would
> be good to have confirmation on that.

Yes, it is the first one. I could only run my long-running test, which can only show that a kernel is bad and cannot confirm that a kernel is good, but I ran it for a long time and there were no long lame-encoding times.

It took 40s without any i/o on all kernels and 48-55s with the following lines during heavy i/o.

+ if (cfqd->rq_in_driver && cfq_cfqq_idle_window(cfqq))
+ return 0;

It took 55-80s without any patch or with the second patch during heavy i/o.

This may be related too: when the second core is enabled and the first patch is not applied, the lame encoding process gets shifted between the cores and takes up to 130 seconds. I could see it happening because the maximum clock frequency switched between the cores.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

This question has probably been answered before, but this bug is huge so I'll just ask again... Thomas, what kind of drive are you using? Does it have NCQ enabled? If so, does disabling NCQ make any difference?

You can disable NCQ on sda by doing:

# echo 1 > /sys/block/sda/device/queue_depth

(or use sdX for others, naturally).

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I did the last tests on a SATA drive with queue depth 31. Reducing the queue depth nearly halves the overall throughput of the two/four concurrent copy operations, with and without the patch. I tried to run some more tests but got some really strange results; I will try again on my test installation at home.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

(In reply to comment #366)

cd /usr/src/linux-2.6.30-rc8+
suse:/usr/src/linux-2.6.30-rc8+ # patch -p1 < cfq.dif (#360)
patching file block/cfq-iosched.c
Hunk #1 FAILED at 1073.
Hunk #2 FAILED at 1119.
Hunk #3 FAILED at 1129.
3 out of 3 hunks FAILED -- saving rejects to file block/cfq-iosched.c.rej
patching file mm/page-writeback.c
Reversed (or previously applied) patch detected! Assume -R? [n] y
Hunk #1 succeeded at 66 with fuzz 1.
Hunk #2 FAILED at 77.
1 out of 2 hunks FAILED -- saving rejects to file mm/page-writeback.c.rej

suse:/usr/src/linux-2.6.30-rc8+ # patch -p1 < cfq.dif (#360 + #366)
patching file block/cfq-iosched.c
Hunk #3 FAILED at 1119.
Hunk #4 FAILED at 1129.
2 out of 4 hunks FAILED -- saving rejects to file block/cfq-iosched.c.rej
patching file mm/page-writeback.c
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 66.
Hunk #2 FAILED at 77.
2 out of 2 hunks FAILED -- saving rejects to file mm/page-writeback.c.rej

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #371)
You should only try the patch in comment #366

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

Ok, 2.6.30-rc8 + patch in comment #366, xfs

dd if=/dev/zero of=./bigfile bs=1M count=15000 & ./fsync-tester
fsync time: 1.7085
fsync time: 1.6639
fsync time: 0.4616
fsync time: 1.3800
fsync time: 1.3603
fsync time: 1.5529
fsync time: 1.8435
fsync time: 0.2561
fsync time: 0.9318
fsync time: 0.1965
fsync time: 1.2233
fsync time: 1.3920
fsync time: 0.4677
fsync time: 0.4560
fsync time: 1.8206
fsync time: 1.8135
fsync time: 1.8342
fsync time: 0.8565
fsync time: 0.9477
fsync time: 2.8569
fsync time: 0.4323
15000+0 записей считано
15000+0 записей написано
 скопировано 15728640000 байт (16 GB), 181,923 c, 86,5 MB/c
fsync time: 1.3716
fsync time: 0.0168
fsync time: 1.5381
fsync time: 1.5649
fsync time: 0.0349
fsync time: 0.0636
fsync time: 0.0657
fsync time: 0.3337
fsync time: 0.0393

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 2 2 0 4230432 808 3417716 0 0 0 87568 1102 1850 1 7 13 79
 0 4 0 4149632 808 3499392 0 0 0 83960 722 1037 1 5 36 57
 0 4 0 4069892 808 3578140 0 0 0 76840 701 1178 1 5 0 93
 1 3 0 3988784 808 3659444 0 0 0 78848 727 1151 1 5 14 79
 0 4 0 3889380 808 3757188 0 0 0 97310 804 1200 2 6 33 59
 0 3 0 3807540 808 3838720 0 0 0 79888 614 1010 2 5 19 74
 0 4 0 3729056 808 3918092 0 0 0 76866 840 1367 0 5 29 65
0 3 0 3002860 808 4645932 0 0 0 90672 597 817 2 6 0 93
 0 4 0 2921840 808 4728132 0 0 0 80416 865 1377 1 6 0 93
 0 3 0 2841564 808 4810132 0 0 0 80384 627 933 1 5 0 93
 1 4 0 2743820 808 4906136 0 0 0 94216 892 1398 1 7 0 92
 0 3 0 2666100 808 4984280 0 0 0 77824 770 1217 1 5 0 93
 1 2 0 2590248 808 5063188 0 0 0 82496 795 1283 2 6 0 92

While copying /usr/src/linux-2.6.30-rc8 -> /usr/src/linux-2.6.30-rc8+ (in Konsole, without Dolphin or any other GUI tools), launching
:~> kdesu /usr/bin/kwrite
is impossible; it remains impossible after the copy has finished, and only logging the user out or rebooting the computer helps.

The copy speed of /usr/src/linux-2.6.30-rc8 -> /usr/src/linux-2.6.30-rc8+ stayed close to zero, as before.

time cp -r /usr/src/linux-2.6.30-rc8 /usr/src/linux-2.6.30-rc8+
real 6m14.566s
user 0m0.158s
sys 0m2.838s

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

A correction to my previous comment: sometimes kdesu /usr/bin/kwrite does launch successfully after the copy has finished, but never while the copy is running.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #369)
> This question has probably been answered before, but this bug is huge so I'll
> just ask again... Thomas, what kind of drive are you using? Does it have NCQ
> enabled? If so, does disabling NCQ make any difference?

This bug is really annoying. I was not able to reproduce the mouse freezes any more, with or without the patch and with or without NCQ. I will try again later.

Is there a way to simulate a disk in RAM with a parametrized speed and latency?

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Created attachment 21849
The corrected patch from #360 post (for 2.6.29 and may be more kernels)

I tried to apply the patch from post #360 to kernel 2.6.29 and got some rejects.
I resolved the rejects by hand and am attaching the corrected variant here.
I saw the patch from #366, but I don't think it contains the same corrections as #360.
So I would like to suggest testing this patch (only the cfq-iosched.c file).

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

I also looked at the patch from post #366, and I didn't understand why the author calls it a "proper backport": there is no code using the 'prev_cfqq' variable. I think the patch from #366 may not be a valid patch.

Please try this patch. As far as I can tell it is for 2.6.29 and later kernels.
I didn't test it because I don't have a test machine for experiments; I only have a Linux server under heavy load...

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

It IS the proper backport. I'm the author and maintainer of CFQ, I should know... I would generally advise against using patches from people who don't know what they are doing, especially for data integrity important code like the IO scheduler. There could be data loss from bad patches.

The reason the 2.6.30 and 2.6.29 patches are different is that the CFQ request dispatch mechanism is different in 2.6.30. As such there's no prev_cfqq to take into account, since we never dispatch from more than one cfqq in one round. You would need to take the prev_cfqq out of local function scope for it to have any meaning.

So, not to be rude, but the last thing this bug needs are more cooks or chefs asking people to test things. It's a huge mess already. For now the focus is making Thomas happy, since he's spent much time on this and has a reproducible (sort of) way of testing it. Once that is done, we can proceed to any other potential issues. Any comments not related to that exact issue will be ignored.

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :

Created attachment 21852
test results

Two hard drive SAMSUNG HD753LJ + NCQ + mdadm raid1 + ext3 + 2GB RAM + Core2Duo E6750 2.66 @ 3.44 GHz

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :
Download full text (45.0 KiB)

Comment on attachment 21852
test results

>==============================2.6.30==============================
>ff@home-desktop:~$ dd if=/dev/zero of=./bigfile bs=1M count=15000 &
>./fsync-tester
>[1] 6958
>fsync time: 0.1025
>fsync time: 0.8720
>fsync time: 5.5800
>fsync time: 5.6179
>fsync time: 3.7413
>fsync time: 4.2393
>fsync time: 5.2596
>fsync time: 0.0985
>fsync time: 1.7070
>fsync time: 4.1414
>fsync time: 0.1577
>fsync time: 4.8191
>fsync time: 0.6993
>fsync time: 3.6732
>fsync time: 3.6963
>fsync time: 4.7696
>fsync time: 6.0947
>fsync time: 3.4383
>fsync time: 0.7583
>fsync time: 4.0760
>fsync time: 4.1786
>fsync time: 3.9886
>fsync time: 0.3802
>fsync time: 3.4182
>fsync time: 1.1262
>fsync time: 2.8425
>fsync time: 3.9217
>fsync time: 1.4758
>fsync time: 3.7798
>fsync time: 3.9234
>fsync time: 0.3557
>fsync time: 4.1882
>fsync time: 4.4526
>15000+0 records in
>15000+0 records out
>15728640000 bytes (16 GB) copied, 231.473 s, 68.0 MB/s
>fsync time: 2.1747
>fsync time: 0.0820
>fsync time: 0.0774
>fsync time: 0.0299
>fsync time: 0.0268
>fsync time: 0.0282
>fsync time: 0.0277
>fsync time: 0.0270
>^C
>[1]+ Done dd if=/dev/zero of=./bigfile bs=1M count=15000
>
>ff@home-desktop:~$ vmstat 1
>procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 1 0 214308 1592368 6344 66260 1 5 253 653 611 592 15 5 73 7
> 0 0 214308 1592400 6344 66264 0 0 0 0 309 525 7 3 90 0
> 2 0 214308 1592448 6344 66264 0 0 0 0 365 686 5 3 91 0
> 0 0 214308 1592400 6344 66264 0 0 0 0 291 543 5 3 92 0
> 0 2 214308 1126216 6756 464876 0 0 24 398976 980 1265 7 36 37 20
> 0 4 214308 1107468 6780 489032 0 0 0 20524 671 551 9 5 35 51
> 0 6 214308 1118544 6780 489032 0 0 0 4 658 575 7 3 32 58
> 0 5 214308 1129752 6784 489032 0 0 0 4 646 578 6 5 36 53
> 0 4 214308 1142036 6784 489032 0 0 0 8 656 576 6 4 36 54
> 2 3 214308 1151708 6784 489032 0 0 0 0 590 501 8 3 16 72
> 0 1 214308 1156616 6792 491124 0 0 0 1572 587 485 7 3 29 60
> 0 2 214308 704504 7188 876836 0 0 0 392152 885 716 8 38 21 32
> 0 4 214308 637132 7252 942604 0 0 0 65728 666 494 7 10 0 83
> 0 4 214308 561368 7324 1016556 0 0 0 73984 686 499 7 12 0 81
> 0 4 214308 490020 7392 1086476 0 0 0...

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

I have read Russian forums about this problem.
I have to go now and cannot write more at the moment.
But some people there tried changing the I/O scheduler from cfq to something else and the iowait bug remained. If you think the bug is in the scheduler, should we try switching schedulers (via /sys/block/<dev>/queue/scheduler)?

Revision history for this message
In , akatopaz (akatopaz-linux-kernel-bugs) wrote :

I am hit by the same bug (I suspect), and I can reproduce it with both the anticipatory and cfq schedulers. So is this bug really tied to cfq?

I'm running my kernel with: elevator=as

Revision history for this message
In , rockorequin (rockorequin-linux-kernel-bugs) wrote :

@Jens Axboe: I tried your patch in comment 366 on the 2.6.30 kernel, and it did improve responsiveness in my initial testing. I used to have the problem that the kernel became highly unresponsive on large file copies to the same partition or as soon as it tried to use swap (in 2.6.30-rc3 and earlier), but the unpatched 2.6.30 performs quite reasonably and the patch improved responsiveness further (my unscientific test results are that moving the mouse resulted in much less 'stuttering' after the patch - note that with earlier kernels the mouse would just freeze).

I did though just find a problem where an overnight memory leak caused X to become so unresponsive it couldn't even draw the screen background until I killed the culprit (firefox). This might be unrelated to the patch, ie a problem with swap management, but it does show that the kernel can still become bogged down under high disk I/O.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Did anybody here resolve this bug?
The only workaround I can see is installing FreeBSD instead of a Linux kernel >= 2.6.18.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

I think I have found a way to cope with this bug!
I changed a few kernel settings and my server has now been running stably, with no freezes from high iowait, for 10-12 hours!

Detailed info:
My kernel now is 2.6.22.14-72.fc6
Fedora Core 6

This suggestion does not fix the bug (I think the bug is in the kernel and it is still there), but it is a workaround. I read many threads and forums and settled on these commands:

# echo 50 > /proc/sys/vm/vfs_cache_pressure
# echo deadline > /sys/block/DEVICE/queue/scheduler
# # echo 1 > /sys/block/DEVICE/device/queue_depth
# echo 1024 > /sys/block/DEVICE/queue/nr_requests

DEVICE is 'hda' or 'sda', depending on the disk. I didn't test queue_depth because for my disks (SAS SCSI + RAID10) that file is read-only (no NCQ support, I think), but that command may still help in your case.

I suggest that anybody who sees freezes with high iowait try this tuning.

I am very glad! Please try this workaround. I didn't rerun the 'dd' test, but heavy disk activity used to freeze the server and now it no longer does.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Can you try the three settings separately, to see which one makes the large difference?
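
A simple way to answer this, sketched below, is to change one knob at a time, measure, and restore the default before moving on. The device name sda, the reuse of the 1 GB dd test from earlier in the thread, and the conv=fdatasync flag (added so the reported rate reflects the device rather than the page cache) are assumptions:

# baseline (all defaults)
dd if=/dev/zero of=testfile.1gb bs=1M count=1000 conv=fdatasync; rm testfile.1gb

# 1) vfs_cache_pressure only
echo 50 > /proc/sys/vm/vfs_cache_pressure
dd if=/dev/zero of=testfile.1gb bs=1M count=1000 conv=fdatasync; rm testfile.1gb
echo 100 > /proc/sys/vm/vfs_cache_pressure      # restore default

# 2) I/O scheduler only
echo deadline > /sys/block/sda/queue/scheduler
dd if=/dev/zero of=testfile.1gb bs=1M count=1000 conv=fdatasync; rm testfile.1gb
echo cfq > /sys/block/sda/queue/scheduler       # restore default

# 3) nr_requests only
echo 1024 > /sys/block/sda/queue/nr_requests
dd if=/dev/zero of=testfile.1gb bs=1M count=1000 conv=fdatasync; rm testfile.1gb
echo 128 > /sys/block/sda/queue/nr_requests     # restore default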

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

I will try, but this is a production server under heavy load and I am afraid to touch anything on it now :-/
I will try to identify the decisive option of this tuning soon. More than 24 hours have passed and there have been no freezes. I can hardly believe it...

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Here is a test on the same server as in my post #359,
but after applying the tuning from post #385.

# dd if=/dev/zero of=testfile.1gb bs=1M count=1000

And while 'dd' was running I ran vmstat 1:

 0 2 116 103632 507240 2016112 0 0 1324 16 1024 963 1 1 50 48 0
 1 2 116 101512 507484 2015736 0 0 1436 0 1314 1253 21 5 25 48 0
 0 2 116 103632 507240 2016112 0 0 1324 16 1024 963 1 1 50 48 0
 0 7 116 25208 496944 2105464 0 0 4 26272 2892 239 0 4 23 73 0
 0 9 116 21636 496972 2109568 0 0 32 21904 2150 339 0 2 8 90 0
 0 10 116 39888 481904 2105552 0 0 4 23544 1964 368 0 4 1 96 0
 0 9 116 49036 472984 2105016 0 0 8 18252 1730 728 0 3 0 97 0
 0 7 116 61700 459736 2105412 0 0 16 74176 2167 317 0 5 13 82 0
 0 7 116 71416 450576 2104272 0 0 24 8680 1322 237 0 4 16 80 0
 1 5 116 82772 439000 2106280 0 0 24 58616 1457 3332 0 7 5 88 0
 1 5 116 97224 424752 2105804 0 0 20 60164 848 286 0 6 24 70 0
 0 7 116 110700 409384 2107036 0 0 56 105584 884 397 0 9 15 76 0
 2 5 116 116444 392304 2118776 0 0 288 95624 1096 424 1 11 10 78

As you can see, iowait no longer sits at 90-99% constantly, only occasionally...

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Here are some tests.

First I restored the default settings (before tuning):
# echo 100 > /proc/sys/vm/vfs_cache_pressure
#
# echo cfq > /sys/block/sda/queue/scheduler
#
# echo 128 > /sys/block/sda/queue/nr_requests

# dd if=/dev/zero of=testfile.1gb bs=1M count=1000
^C
116+0 records in
116+0 records out
121634816 bytes (122 MB) copied, 20.5609 seconds, 5.9 MB/s
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^ (!!!)

While 'dd' was running I ran vmstat 1:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 0 10 116 760168 503488 1329836 0 0 4 4 0 0 9 3 65 23 0
 0 11 116 756132 502488 1330536 0 0 1332 5648 1744 4909 5 2 2 91 0
 0 12 116 760208 503128 1330856 0 0 1136 4388 1875 3053 4 2 0 94 0
 0 11 116 759832 502668 1331608 0 0 1004 7488 2379 4032 1 2 0 97 0
 0 12 116 758740 503288 1331832 0 0 1280 3252 1818 2402 1 1 0 98 0
 0 10 116 733976 502936 1356780 0 0 1232 4476 1753 4143 1 3 0 96 0
 1 8 116 733596 502368 1357324 0 0 804 5792 1831 2980 20 2 0 79 0
 1 7 116 738388 502920 1357788 0 0 928 6652 1875 2349 17 2 4 77 0

**************************

After that I applied:

# echo 50 > /proc/sys/vm/vfs_cache_pressure
#
# echo deadline > /sys/block/sda/queue/scheduler
#
# echo 1024 > /sys/block/sda/queue/nr_requests

# dd if=/dev/zero of=testfile.1gb bs=1M count=1000

^C
638+0 records in
638+0 records out
668991488 bytes (669 MB) copied, 10.463 seconds, 63.9 MB/s
                                     ^^^^^^^^^^^^^^^^ (!!! :-))) )

During 'dd' I ran in another terminal:
# vmstat 1

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 1 7 116 718764 502884 1371484 0 0 4 4 0 0 9 3 65 23 0
 0 9 116 687208 502924 1405624 0 0 8 26664 2708 746 6 4 3 87 0
 1 8 116 668924 502976 1422116 0 0 16 21404 2246 8462 1 4 9 87 0
 0 8 116 654804 501632 1434492 0 0 24 30804 2072 9249 10 4 0 86 0
 0 10 116 613152 501692 1475220 0 0 20 42880 2021 4408 15 5 7 73 0
 2 10 116 559860 499464 1524600 0 0 32 58504 2108 10612 5 6 15 74 0
 0 11 116 510132 499528 1578340 0 0 36 59400 984 1748 17 5 2 77 0
 0 10 116 399420 499672 1689316 0 0 108 111332 910 957 4 11 2 84 0
 1 7 116 331556 499756 1750580 0 0 104 62268 1501 5255 11 6 10 74 0

*********************

One more observation:

I have other servers with different hardware, and I cannot reproduce the iowait problem there, with or without this tuning (they run Fedora release 7 (Moonshine), kernel 2.6.23.17-88.fc7). So I now think this problem does not affect all disks; it may be hardware dependent.

I am still investigating which option resolves the iowait problem.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

I have identified the decisive option.

Only this one helped me:

# echo deadline > /sys/block/sda/queue/scheduler

I don't understand why. I have read in many Russian threads that changing the scheduler doesn't help, and I didn't expect that changing the scheduler alone would help me either. But I changed only the scheduler, from cfq to deadline, and the 'dd' test now gives this:

# dd if=/dev/zero of=testfile.1gb bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 13.7121 seconds, 76.5 MB/s

iowait only occasionally reached 80-90%.

Here are my current settings:

# cat /proc/sys/vm/vfs_cache_pressure
100
# cat /sys/block/sda/queue/scheduler
noop anticipatory [deadline] cfq
# cat /sys/block/sda/queue/nr_requests
128

I will now keep these settings and watch whether the freezes come back.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

I ran some more experiments,
and I think I have found the main trigger of the high iowait with the cfq scheduler.

I made some tests:

I switched between the cfq and deadline schedulers on my two servers with the same hardware and OS (FC6, kernel 2.6.22.14-72.fc6): the same CPUs, motherboards, SAS/RAID controllers and disks. But only one of the two servers showed high iowait with cfq during the 'dd' command.

I think the main factor is a VERY LARGE NUMBER OF USED INODES on the partition.

For example:

The 'OK' server, where I could not reproduce the bug:
# df -i

/dev/sda1 524288 8543 515745 2% /
tmpfs 219756 1 219755 1% /dev/shm
/dev/sda6 787200 34068 753132 5% /usr
/dev/sda5 787200 25582 761618 4% /usr/local
/dev/sda7 524288 1993 522295 1% /var
/dev/sda8 30900224 1719787 29180437 6% /wwws
/dev/sda3 1048576 49655 998921 5% /wwws/accel-proxy

I wrote the test file testfile.1gb to the /wwws partition. There was no extreme iowait with either the deadline or the cfq scheduler.

The second, 'BAD' server has the same hardware and software, but its df -i shows:

Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 524288 7444 516844 2% /
tmpfs 219756 1 219755 1% /dev/shm
/dev/sda7 787200 35307 751893 5% /usr
/dev/sda6 787200 27520 759680 4% /usr/local
/dev/sda8 524288 2334 521954 1% /var
/dev/sda3 30900224 5332794 25567430 18% /wwws
/dev/sda5 524288 4128 520160 1% /wwws/accel-proxy

I ran the 'dd' tests against the /wwws partition there as well (that is where I usually write big files). With the cfq scheduler and (importantly) some active processes (Apache, MySQL - not an idle server), iowait during 'dd' reaches 90-99% and the write speed is very low (9-10 MB/s). If I switch to the deadline scheduler and write to the same /wwws partition, I get 60-80 MB/s and no extreme iowait. But if I write testfile.1gb to a different partition (for example /var), there is no iowait problem even with cfq. So cfq plus a very large number of used inodes seems to be the bad combination; deadline plus many used inodes is not.

So I think a high number of used inodes on a partition and the cfq scheduler interact badly in some way.

Maybe if my other servers (the FC7 ones where I could not reproduce the iowait problem) had as many used inodes, I could reproduce this high iowait bug there too.

Please try creating very many small files on a partition (5-6 million, for example) and then test 'dd' with the cfq scheduler.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

It would be ideal, if you could try 2.6.30 on the problematic server. I realize that this may not be easy, however there's not much I can do about a problem on an ancient kernel.

If you do try 2.6.30 and it also has the same problem, then I want you to capture some blktrace data of both deadline and cfq. Basically, right after you start the dd test, in another terminal do:

# cd /dev/shm; blktrace /dev/sda

and ctrl-c that blktrace after ~5 seconds or so. Then stop the dd as well. Save the blktrace files on the harddrive.

Now switch to deadline and repeat the exact same thing. Then tar up the two sets of files and attach them to this bug report.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Jens Axboe, I am happy to help, but I cannot try 2.6.30 :(

I never install kernels myself, and I am afraid that something might go wrong after installing one and I would lose access to the server. This server is under heavy load and is located on another continent. I cannot take the risk, sorry ;-(

Maybe someone else can create many small files on a disk (many inodes, around 5-6 million for example) and compare the cfq and deadline schedulers?

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :

Created attachment 22019
test results 2.6.30: cfq, deadline

None of the variants gives any improvement in responsiveness, or perhaps only a very slight one.

Revision history for this message
In , erbrochendes (erbrochendes-linux-kernel-bugs) wrote :

Hi, I am using the 2.6.30 kernel with the patch from #366.
Before the patch I had real trouble when downloading large files over BitTorrent at high speed (over 5 MB/s).
Now it just works great. Thanks for this patch.

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :

Created attachment 22167
test result 2.6.30 without AHCI

I turned off AHCI in the BIOS on the laptop. The system has become much more responsive; it is now possible to launch new applications while dd is running.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Created attachment 22180
Drain async IO on the hw side

This patch makes sure that async IO has completely drained from the device queue before sync IO is started. Hopefully that should make things as good as disabling NCQ, and it should even improve the situation without NCQ.

I'd like for people to test this patch and see if it makes a difference. It's against 2.6.31-rc (ish), but I _think_ it will apply against 2.6.30 as well. If not, holler, and I'll do a backport too.

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :

Created attachment 22184
test result 2.6.30 with patch from #397

(2.6.30 + NCQ + patch from #397) behaves the same as (2.6.30 + NCQ): new applications still start very slowly.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

I used to come across Toshiba notebooks and saw how Linux behaved on them. With ACPI switched off you could listen to music (though, of course, not see how much charge was left in the battery); with ACPI switched on there was no sound at all, but you could at least watch the battery state. One could blame the scheduler for that too (in our case cfq) and go looking for why it cannot schedule two processes at once - playing music and checking the battery.
In our case it turns out that all the schedulers are "broken" (I have tried them all, and none of them works well). Probability theory does not rule out such an event; but when all the schedulers are broken for one person and all of them work for another, that is already the work of supernatural forces, and fighting those is useless.
So why does everything work for one person while another's system barely crawls? What is the difference? Only the computers (or, more precisely, their exact configurations).

I may be mistaken, but perhaps someone can explain why on one set of hardware everything simply flies and on another it barely crawls (even though the second machine has a faster processor, faster disks and a faster bus).

Revision history for this message
In , kebjoern (kebjoern-linux-kernel-bugs) wrote :

I had big trouble on an ASUS PN5e motherboard with a WD 320 GB drive. I compiled 2.6.31-rc3 with your patch and it works great - thank you very much! I'd like to backport it to 2.6.29 to try it together with the realtime patch. Is there a chance of getting that to work?

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :

2.6.31-rc3-git3 + NCQ + patch from #397: new applications still start very slowly.
Without NCQ, new applications start quickly.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

There is one more interesting question.
KSYSGUARD shows "Used Memory" = 0.66Gb.
> top
top - 21:43:57 up 7:00, 3 users, load average: 0.74, 0.39, 0.29
Tasks: 149 total, 3 running, 146 sleeping, 0 stopped, 0 zombie
Cpu (s): 2.8%us, 1.3%sy, 0.0%ni, 93.1%id, 2.5%wa, 0.2%hi, 0.2%si, 0.0%st
Mem: 8035628k total, 7998716k used, 36912k free, 0k buffers
Swap: 2104472k total, 6564k used, 2097908k free, 7402836k cached

When Mem:used approaches Mem:total, the graphical interface becomes much slower (even without any disk activity).

Am I the only one seeing this problem?

Revision history for this message
In , benjfitz (benjfitz-linux-kernel-bugs) wrote :

I applied the patch in 397 to a vanilla 2.6.30.4 and the difference was dramatic (with the patch is _much_ better, ie the complete freezing for 15+ seconds when running multiple IO intensive jobs are gone). I'll work on getting some hard numbers (with iobench, etc) to see if they agree.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #397)
> Created an attachment (id=22180) [details]
> Drain async IO on the hw side
>
> This patch makes sure that async IO has completed drained from the device
> queue
> before starting sync IO. Hopefully that should make things as good as
> disabling
> NCQ, and it should even improve the situation without NCQ.
>
> I'd like for people to test this patch and see if it makes a difference. It's
> against 2.6.31-rc (ish), but I _think_ it will apply against 2.6.30 as well.
> If
> not, holler, and I'll do a backport too.

Is this in the vanilla 2.6.31-rc5 already?

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

No, the patch is queued up for 2.6.32 since it was a rather risky change for 2.6.31. But I'm glad it makes a difference, that means that the starvation experienced is largely on the device side. By draining the queue, we prevent that from happening (or, at least we lessen the effect dramatically).

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

2.6.31-rc7 + patch in 397 - There are no improvements

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

No improvements seen here with 2.6.30.5 and the patch, either. Pretty much *any* write to swap causes major latency (disruption to audio, graphics etc.).

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

There is an improvement in desktop responsiveness with kernel 2.6.31 and the anticipatory (as) scheduler compared to the cfq scheduler. It does not solve the problem, but it makes it more bearable. I am using a fully encrypted LVM drive with ext3 partitions, mounted with noatime and data=ordered.

Revision history for this message
In , rockorequin (rockorequin-linux-kernel-bugs) wrote :

I've observed something that might be relevant to this bug (using the 2.6.31.5 kernel): when I do large I/O operations from one external device (say /dev/sdb) to another slow USB flash key (say /dev/sdc), I can hear my *internal* hard drive (/dev/sda) thrashing away constantly even though its light indicates that no read/write activity is going on. During this time anything that requires access to /dev/sda is slowed right down and hence running new programs slows down disk access.

When I start copying, e.g. using Nautilus, there is usually a ~400 MB buffering delay before writing to the USB drive starts (i.e. before its light starts flashing). During this time there is NO /dev/sda thrashing; /dev/sda starts thrashing as soon as the USB key's light starts flashing.

So there appears to be a bug that makes /dev/sda constantly seek during the /dev/sdc USB write operation, and this is affecting system responsiveness.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Please try 2.6.32-rc5. Make sure you are using CFQ as your io scheduler.

Revision history for this message
In , rockorequin (rockorequin-linux-kernel-bugs) wrote :

I opened http://bugzilla.kernel.org/show_bug.cgi?id=14491 to track this bug separately - I've put comments in there about 2.6.32-rc5, which I don't think exhibits the problem.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :
Download full text (4.1 KiB)

Created attachment 23618
Simple sleeper test case

Since this bug shows up more persistently while working in a virtual machine or while using Java, I still think this is a process scheduler bug (or something related to it). Here is another test case that shows the suspected behaviour. Because a virtual machine issues a great many system calls, I tried to find an equivalent test: the test case simply sleeps for 1 µs and measures how long each usleep call actually takes. I use very many usleep calls because the problem does not occur deterministically and I wanted to catch as many occurrences as possible.
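
The attachment itself is not reproduced in this excerpt; a minimal sketch of such a sleeper test could look like the following (the 10 ms reporting threshold and the way the running total is reset are assumptions, not details taken from the attachment):

/* sleeper sketch: time each usleep(1) and report badly delayed wakeups */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

int main(void)
{
    double total = 0.0;

    for (unsigned long i = 0; ; i++) {
        double start = now_ms();
        usleep(1);                      /* nominally sleeps for 1 microsecond */
        double diff = now_ms() - start;

        if (diff > 10.0) {              /* report only badly delayed wakeups */
            total += diff;
            printf("Timediff %lu: %.2fms Total: %.2fms\n", i, diff, total);
        } else {
            total = 0.0;                /* reset the running total after a normal wakeup */
        }
    }
}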

I have run this test case on three machines. The first was a Core2 Duo with a first-generation SSD (OCZ Core Series) with poor write performance, running the Ubuntu kernel 2.6.31-14-generic; the partitions are block-aligned. I ran the test while my wife was using Firefox. Every time she submitted something and Firefox wrote its history via SQLite, the sleep test showed high latency.

Timediff 7629094: 16.80ms Total: 61.12ms
Timediff 7629100: 18.82ms Total: 93.68ms
Timediff 7629101: 19.96ms Total: 113.54ms
Timediff 7629102: 19.98ms Total: 133.43ms
Timediff 7629103: 19.97ms Total: 153.31ms
Timediff 7629104: 20.00ms Total: 173.24ms
Timediff 7629105: 19.96ms Total: 193.09ms
Timediff 7629106: 20.02ms Total: 213.02ms
Timediff 7629107: 19.94ms Total: 232.86ms
Timediff 7636162: 16.40ms Total: 34.44ms
Timediff 7636164: 19.90ms Total: 64.00ms

While 100 usleep(1) calls should take somewhere between 10 ms and 20 ms in total, here 10 usleep(1) calls take more than 200 ms. This behaviour is reproducible.

On my own machine, a Core2 Duo with an ordinary 2.5" hard drive and a vanilla 2.6.31.5 kernel, the behaviour is similar. While making a backup from one hard drive to another, the latency of a single usleep(1) jumps above 30 ms roughly every second, and some single usleep(1) latencies are greater than 150 ms.

Timediff 11054523: 38.23ms Total: 53.19ms
Timediff 11212737: 21.64ms Total: 31.46ms
Timediff 11213557: 35.59ms Total: 44.62ms
Timediff 11213939: 59.88ms Total: 65.76ms
Timediff 11264190: 40.83ms Total: 49.72ms
Timediff 11264709: 53.77ms Total: 63.09ms
Timediff 11265629: 145.74ms Total: 155.96ms
Timediff 11327458: 16.94ms Total: 25.23ms
Timediff 11376430: 18.91ms Total: 27.67ms
Timediff 11408941: 17.67ms Total: 26.36ms
Timediff 11424964: 19.26ms Total: 28.01ms
Timediff 11509722: 19.84ms Total: 28.30ms
Timediff 11627259: 27.01ms Total: 34.51ms
Timediff 11645718: 18.26ms Total: 29.80ms

On my server, an Athlon X2 with a fully encrypted RAID-5 with LVM running a 2.6.18-128.2.1.el5.028stab064.7 kernel (CentOS with OpenVZ), the behaviour was even worse. While copying a 4 GB ISO, single usleep(1) latencies reached up to one second.

Timediff 40397: 24.16ms Total: 122.93ms
Timediff 40417: 859.04ms Total: 981.78ms
Total 40417: 981.78ms
Timediff 45928: 22.62ms Total: 220.16ms
Timediff 50471: 25.02ms Total: 135.80ms
Timediff 51085: 19.23ms Total: 163.03ms
Timediff 51097: 205.12ms...

Read more...

Revision history for this message
In , vshader (vshader-linux-kernel-bugs) wrote :

I also had a problem with system latency under high I/O load. After applying the patch from #397 to kernel 2.6.31.5, the problem became much smaller: before patching, the machine sometimes froze for more than 5 minutes; now the maximum latency is less than half a second.

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

I have the same issue on a machine with i845e chipset, P4-1.5 Northwood, 2GB DDR RAM, GF6800 video and Audigy2 sound card. My main HDD is 160GB IDE Seagate.

When there is disk activity the system becomes virtually unusable.

For example, when I am burning a DVD on the drive attached to SII 3512 SATA controller, the CPU load goes from 40% at 7-8x to 98% at 16x.

Downloading Fedora12 ISO last night at 500 kb/s kept system busy at 90%!

If I start kernel compile, CPU load is stable 100%, which is Okay, but switching tabs in Firefox takes 10 seconds and starting any application like JUK, Dolphin, Konsole takes up to 1 minute.

Running Fedora11 with 2.6.30.9.96 FC11 i686 PAE kernel.

The system has become a bit more responsive (by about 10-20%) since I noticed p4-clockmod was being loaded and shut it down.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

There have been no enthusiastic comments since 2.6.32 came out. As I read it, nothing has really changed ("and yet the cart is still there").

Revision history for this message
sbec67 (sbec) wrote :
Revision history for this message
Dexter (pogany-tamas+bug) wrote :

I have this problem too, check my logs. My USB 2.0 pendrive's speed is about 3-5 MB/s. Soooo slow.

Revision history for this message
sbec67 (sbec) wrote :

Did someone take a look at this?
This bug is really annoying, as it takes ages to get data onto a USB device ;-(

Regards

Changed in linux (Ubuntu):
status: New → In Progress
assignee: nobody → Colin King (colin-king)
Revision history for this message
Colin Ian King (colin-king) wrote :

@sbec67, can you run the following command to collect all the system specific information about your computer to help us diagnose this bug:

apport-collect 500069

please can you supply the model number of the pendrive and if you are using it via a USB hub or not.

Thanks!

Changed in linux (Ubuntu):
importance: Undecided → Medium
importance: Medium → High
Andy Whitcroft (apw)
tags: added: kernel-series-unknown
Revision history for this message
sbec67 (sbec) wrote :

@Colin: I ran "sudo apport-collect 500069".
The device I am seeing the problem with is a Tech Line DFS-1002 MP3 player,
but the problem seems to happen with almost any USB flash memory device.

@Andy: GRUB (from /boot/grub/menu.lst) boots with this kernel line:

kernel /boot/vmlinuz-2.6.31-17-generic root=UUID=96f8feb3-e65b-4696-b8f5-c32638cde861 ro quiet splash

Kind regards
Simon

Revision history for this message
Cklein (pablo-cascon) wrote : apport-collect data

AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC272 Analog [ALC272 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: i386
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC272 Analog [ALC272 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: pablo 1645 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf0340000 irq 22'
   Mixer name : 'Realtek ALC272'
   Components : 'HDA:10ec0272,144dca00,00100001'
   Controls : 19
   Simple ctrls : 12
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=c1938014-8689-4db3-add3-99b0c514b946
MachineType: SAMSUNG ELECTRONICS CO., LTD. NC10
Package: linux (not installed)
ProcCmdLine: root=UUID=5f964d43-b8cd-4886-a6e5-80376a5eb7b5 ro quiet splash
ProcEnviron:
 SHELL=/bin/bash
 LANG=es_ES.UTF-8
ProcVersionSignature: Ubuntu 2.6.31-17.54-generic
RelatedPackageVersions:
 linux-backports-modules-2.6.31-17-generic N/A
 linux-firmware 1.25
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
Tags: ubuntu-unr
Uname: Linux 2.6.31-17-generic i686
UserGroups: adm admin cdrom dialout fuse lpadmin plugdev sambashare
WpaSupplicantLog:

dmi.bios.date: 09/08/2009
dmi.bios.vendor: Phoenix Technologies Ltd.
dmi.bios.version: 11CA.M015.20090908.RHU
dmi.board.name: NC10
dmi.board.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.board.version: Not Applicable
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLtd.:bvr11CA.M015.20090908.RHU:bd09/08/2009:svnSAMSUNGELECTRONICSCO.,LTD.:pnNC10:pvrNotApplicable:rvnSAMSUNGELECTRONICSCO.,LTD.:rnNC10:rvrNotApplicable:cvnSAMSUNGELECTRONICSCO.,LTD.:ct10:cvrN/A:
dmi.product.name: NC10
dmi.product.version: Not Applicable
dmi.sys.vendor: SAMSUNG ELECTRONICS CO., LTD.

tags: added: apport-collected
Revision history for this message
David Turner (dkturner) wrote : Re: since Ubuntu karmic Filetransfer to some USB Drives got realy slow

I've also experienced very slow USB transfers, with CPU waiters piling up. The odd thing is that USB-to-USB transfers go at about four times the speed of HDD-to-USB transfers. On average I get 2MB/s from hard drive to USB, and about 8MB/s copying between flash drives.

Revision history for this message
sbec67 (sbec) wrote :

It's correct - this report is a duplicate of #197762.
But #197762 has status "Incomplete", importance Medium, and is almost 1.5 years old!

I think this bug should be High; it is really a pain for many users.
Regards
Sbec

Revision history for this message
Simon Holm (odie-cs) wrote :

sbec67:
Have you tried your device with another OS and seen different performance?

From your lsusb output it does not look like your device is listed - was it plugged in when you ran lsusb?

If that doesn't turn up anything obvious you should probably see https://help.ubuntu.com/community/DiskPerformance and test whether the raw device performance is better than when formatted. Backup your data on the device first.

dd numbers with bs=1M in addition to the bs=32 would also be interesting.
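
For example (a suggestion, not part of the original report apart from the mount point), the conv=fdatasync flag makes dd flush before reporting the rate, so the number reflects the device rather than the page cache:

dd if=/dev/zero of=/media/usb-disk/test-file bs=1M count=256 conv=fdatasync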

Revision history for this message
MMS-Prodeia (mms-prodeia) wrote :

I do second this report!

Revision history for this message
sbec67 (sbec) wrote :

Simon Holm Thøgersen:
it was connected, and it does show up in lsusb:
Bus 001 Device 004: ID 0402:5661 ALi Corp

It is a bug which hits many users - just search the web; here is one thread:
http://ubuntuforums.org/showthread.php?t=1306333

I passed all the data with the apport-collect command; the unit was connected while I ran it.

I did another test with a Sony 4 GB USB stick, and there I don't see this problem, so it seems the problem happens only with certain flash drives.
Regards

Mudstone (agovernali)
description: updated
Revision history for this message
Mathias Zubek (mathias-zubek) wrote :

Hello,

I have this problem too.
Two installations: my PC has an Intel DG965WH motherboard; my laptop is a Thinkpad T60.
Both run Karmic 9.10 with all updates applied. After transferring more than 150 MB, the write speed drops to a maximum of 3.5 MB/s.
Adding ehci_hcd to /etc/initramfs-tools/modules, as recommended in some posts, did not help.
Thanks in advance

Revision history for this message
In , Hedayat (hedayat-redhat-bugs) wrote :

Description of problem:
I was trying to copy a very large file (the 3.3 GB Fedora x86_64 ISO) to a flash drive, and after copying a few hundred megabytes (e.g. 300 MB) the copy speed dropped to around 500 KB/s and kept falling. It was really unacceptable: the remaining-time estimate climbed to over 1 hour and 30 minutes while copying the file with Nautilus. I tried different flash drives and different USB ports on my laptop, with no significant difference. I had run into this slow copying before, but usually ignored it because the files were not that big. This time it was disappointing enough that I decided to find out why it is so slow.

iotop showed slow disk writes (it was often 0, jumping to some low values (usually less than 500KB/s) and lowering to 0 again) and lots of IO waiting time for pdflush. After searching a little in the Internet, I found that playing with dirty_ratio and dirty_background_ratio kernel parameters in /proc/sys/vm/ might help.

First, I lowered the dirty_background_ratio to 1 (default value is 10), and the disk write speed jumped to around 2.5 MB/s for awhile (around 1 or a few minutes) and then the speed dropped again to around 0.

Finally, I lowered the dirty_ratio parameter to 10 (default value is 20), and it resulted in a constant 2.5MB/s to 3MB/s disk read (from hard disk) and write (to usb disk) speed to the end of the copy operation (which took a few minutes rather than more than an hour!).
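
The exact commands corresponding to the description above (the values 1 and 10 are the ones used here; run as root, and the settings last only until reboot) would be roughly:

echo 1 > /proc/sys/vm/dirty_background_ratio
echo 10 > /proc/sys/vm/dirty_ratio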

The problem is so weird and unacceptable.

As it might help, I have 1.5GBs of RAM and at the time of copy around 44% of it was used by running programs.

Version-Release number of selected component (if applicable):
kernel-2.6.31.12-174.2.3.fc12.x86_64
but the problem was observed in previous F12 kernels too.

How reproducible:
100% in my few tests.

Steps to Reproduce:
1. Start copying a large file (e.g. over 2GB) to a regular USB flash drive. Use a single big file rather than many small files.
2. Observe the copying speed

Actual results:
Very slow copying operation (around 450KB/s in my case)

Expected results:
Regular copy speed (2.5MB/s seems to be possible with my hardware and the mentioned settings)

Additional info:
I don't know if it helps, but the source file was on a mounted NTFS partition and both partitions were mounted by GNOME.
The problem might be related to the mount options GNOME uses, but it does not seem to be related to Nautilus, since I get slow speeds even when using dd to copy the file.

Revision history for this message
sbec67 (sbec) wrote :

@Colin: any news on this, since it is assigned to you?

Regards
Sbec

Revision history for this message
In , spawels13 (spawels13-linux-kernel-bugs) wrote :

Created attachment 25281
perf chart high io latency

I am using the 2.6.33 kernel and this problem is still present. When I copy a big file (a few GB) the system becomes unresponsive. I ran perf timechart and generated an SVG image; you can see that plasma-desktop (part of KDE) is blocked by I/O for a long time. I copied the file from an NTFS partition, but the same thing happens when I copy big files within my Linux partition or from the hard drive to a pendrive.

Revision history for this message
In , cmertes (cmertes-linux-kernel-bugs) wrote :

(In reply to comment #416)
> I am using 2.6.33 kernel and this problem is still present.

Yep, this definitely earns the Most Embarrassing Linux Bug Award 2009 and is a Nominee for Most Annoying Linux Bug 2009 although the ATI binary driver wins in this category. Call me unfair for allowing binary blobs.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

I will agree that something still isn't right with the VM. In my uninformed opinion, it seems far too eager to swap out executable pages in favor of streaming pages. Unfortunately, it seems that very few people know the VM well enough to fix it.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I am currently using kernel 2.6.33 and the desktop responsiveness on my machine is awful compared to the 2.6.32.x kernels - worse than I have ever seen it before. The load average rises above 7 very quickly while writing many small files to the filesystem. I can run some tests with my configuration, but a kernel developer should tell me which ones.

Revision history for this message
In , akpm (akpm-linux-kernel-bugs) wrote :

(In reply to comment #419)
> I am currently using the linux kernel 2.6.33 and the the desktop
> responsiveness
> is awful on my machine compared to the 2.6.32.x kernel. It's even worse than
> I
> have even seen it before. The load avg is rising to >7 very quickly, while
> writing many small file to the filesystem. I can make some tests with my
> configuration, but a kernel developer should tell me which tests.

This isn't really the best place to bring this up. Please send a full description to <email address hidden>. cc myself, Ingo Molnar <email address hidden>, Peter Zijlstra <email address hidden>, Jens Axboe <email address hidden>. In that email, please identify what the system is doing at the time. Is it disk-related? CPU scheduler related? etc.

Thanks.

Revision history for this message
In , frankrq2009 (frankrq2009-linux-kernel-bugs) wrote :
Download full text (8.4 KiB)

Gentlemen,
I have suffered from the high iowait problem for almost 4 years, and I have been watching the bug report (Bug 12309) on bugzilla.kernel.org for 1 year. Yesterday I finally managed to get out of this trouble by switching from CentOS 5.4 (with kernel 2.6.18) to Zenwalk 6.2 (with a snapshot kernel, 2.6.32.2).
The computer is used to collect signal data from 4 gas turbines in a power plant. The project started in 2004, and we used Mandrake 9 and Zenwalk, both with 2.4.x kernels, and there were no high iowait problems. In 2006 we switched to Fedora 6 (kernel 2.6.18) and then CentOS 5, and the iowait began to make trouble: the system's response to mouse and keyboard became very slow, and new applications took a long time to launch. During these years I always thought the main reason for this was that the computer's hardware was not good enough. But early this month the plant upgraded the computer to a new Lenovo server with two Xeon E5504 CPUs (8 cores total) and 4GB of memory, but the iowait is still very, very high. The following is the output of the "top" command on that machine:

Tasks: 215 total, 1 running, 213 sleeping, 0 stopped, 1 zombie
Cpu0 : 1.0%us, 0.3%sy, 0.0%ni, 65.9%id, 32.8%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 1.0%us, 3.6%sy, 0.0%ni, 45.0%id, 50.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 1.0%us, 4.0%sy, 0.0%ni, 94.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu3 : 1.3%us, 3.3%sy, 0.0%ni, 56.3%id, 38.3%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu4 : 1.3%us, 6.7%sy, 0.0%ni, 0.0%id, 89.7%wa, 0.7%hi, 1.7%si, 0.0%st
Cpu5 : 0.3%us, 3.3%sy, 0.0%ni, 91.7%id, 0.0%wa, 0.7%hi, 4.0%si, 0.0%st
Cpu6 : 10.3%us, 30.2%sy, 0.0%ni, 50.2%id, 2.3%wa, 1.0%hi, 6.0%si, 0.0%st
Cpu7 : 1.3%us, 8.6%sy, 0.0%ni, 83.1%id, 4.0%wa, 1.0%hi, 2.0%si, 0.0%st
Mem: 4078540k total, 3872720k used, 205820k free, 182344k buffers
Swap: 4192956k total, 0k used, 4192956k free, 2815596k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 3841 markv 15 0 72172 12m 8380 S 42.2 0.3 1984:24 lvinf
 8573 markv 15 0 60232 12m 8876 S 11.6 0.3 0:17.22 mark
 4067 markv 15 0 19056 3224 2336 S 10.6 0.1 759:52.00 dms
 3548 mysql 21 0 656m 617m 9292 S 9.0 15.5 764:42.05 mysqld
27042 markv 15 0 69404 12m 8756 S 4.3 0.3 290:36.14 walin
 3810 root 15 0 39772 15m 8224 S 1.3 0.4 3:59.76 Xorg
    1 root 15 0 2068 620 532 S 0.0 0.0 0:01.19 init
    2 root RT -5 0 0 0 S 0.0 0.0 0:00.04 migration/0
    3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
    4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
    5 root RT -5 0 0 0 S 0.0 0.0 0:00.02 migration/1
    6 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1
    7 root ...

Read more...

Revision history for this message
Badcam (kiwicameron+launchpad) wrote :

I believe that I have this very same issue on my Mint 8.0 distro. I have two 5-port USB 2.0 hubs and only 1 USB port in either device is showing as USB 2.0, with all the rest showing 1.1. I've checked the hardware and every port is supposed to be 2.0.

Running "sudo dmesg | grep usb" seems to show that all the devices are recognised as 2.0 but are somehow being limited to 1.1:

[156037.796028] usb 10-2: new full speed USB device using uhci_hcd and address 5
[156037.941105] usb 10-2: not running at top speed; connect to a high speed hub
[156037.979263] usb 10-2: configuration #1 chosen from 1 choice

I have not attached a patch, just my Terminal info.

Revision history for this message
aleth (aleth) wrote :

I also have this issue when writing to a 4GB USB stick. Transfer rates start high, then slow to a crawl or even stall completely. As they slow down, the whole computer becomes unresponsive for seconds at a time.
On the same machine, the same USB stick is working fine under Windows XP; read access is also no problem.

Revision history for this message
aleth (aleth) wrote :

Just in case, it might be worth pointing out there are many more reports of this problem at http://ubuntuforums.org/showthread.php?t=1306333&page=12

Revision history for this message
André Desgualdo Pereira (desgua) wrote :

Here is one workaround until it gets fixed: update the kernel.

Go to http://kernel.ubuntu.com/~kernel-ppa/mainline and choose a recent linux-image file. Download and install.
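A sketch of that workaround; the directory and .deb file names below are placeholders only (browse the mainline page and substitute whatever current build matches your release and architecture):

wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.33/linux-image-2.6.33-020633-generic_2.6.33-020633_i386.deb
sudo dpkg -i linux-image-2.6.33-020633-generic_2.6.33-020633_i386.deb
sudo reboot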

Changed in linux (Ubuntu):
status: In Progress → Confirmed
Revision history for this message
In , dik_again (dikagain-linux-kernel-bugs) wrote :

I'm using Mandriva 2010 with kernel 2.6.33-rc5.

The freeze is huge. The system becomes unusable with every small bit of disk activity (for example sudo urpmi blackbox).

The problem is there with kernels 2.6.31 and 2.6.32 too. Other kernels were not tested.

Please, reopen the bug. It is a huge problem for many people.

Revision history for this message
In , cmertes (cmertes-linux-kernel-bugs) wrote :

It *is* a huge problem indeed. I kinda got used to it, but it feels like the 80s. I still have a Windows install in a 10GB corner of my HDD which I use very rarely, but every time I do it feels like a miracle to see what these modern computers are able to do when they don't run a f*cked up kernel :/

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

One angle to tackle this would be to ask those who don't suffer from this bug what kind of kernel (and with what parameters) and hardware they're running. Since this seems to affect a wide range of people and setups, it could be interesting... but also a huge undertaking.

Revision history for this message
In , dik_again (dikagain-linux-kernel-bugs) wrote :

Could this bug perhaps be triggered by the GCC compiler? Has anyone tried to compile a 2.6.30-2.6.33 kernel with an earlier GCC version?

Revision history for this message
In , akatopaz (akatopaz-linux-kernel-bugs) wrote :

Could be interesting, but I've read some comments whose writers had tried to isolate two consecutive kernel versions surrounding the bug.

In the end it might be quicker, though quite boring for the operator, to try a laggy scenario with many different kernel versions, catching the bug by dichotomy (bisection)?

We might also distribute the effort between ourselves. I propose something like this: everybody starts from the same kernel version which exhibits the bug and tries the same laggy scenario across a set of kernel versions. Let's take 4 versions each to cover the 2.6.x revisions; it should not take so long. Who volunteers?

Have a nice day.
Topaz.
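For reference, the "dichotomy" approach described above amounts to a git bisect over the mainline tree; a minimal sketch, assuming a clone of Linus's repository and an agreed-upon laggy scenario as the test:

git bisect start
git bisect bad v2.6.30     # a version known to exhibit the lag
git bisect good v2.6.18    # the last version reported as good
# build and boot the kernel git checks out, run the laggy scenario, then report:
git bisect good            # or: git bisect bad
# repeat until git names the first bad commit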

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Topaz, you'll have to explain the meaning of catching the bug by dichotomy. I wonder if running with a "barebones" kernel can trigger this bug?

Revision history for this message
In , akatopaz (akatopaz-linux-kernel-bugs) wrote :

I'm currently running Ubuntu Lucid, and I've noticed the bug since the Jaunty release (Intel x86 Centrino platform with a Core 2 Duo, on two different machines, both affected).
When I first had some poor performance problems I tried compiling the vanilla kernel myself, and it didn't help: the vanilla kernel 2.6.30 was also affected by this bug.
My plan is to establish a laggy scenario, compile every version of the 2.6.x kernel, and test them all against my laggy scenario. It should not take that long, but the more the merrier :)

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

One clear angle that has not been investigated by kernel developers is that this issue is highlighted by 64-bit code. I don't see this lag and high IO wait with a 32-bit kernel. I have a laptop with 2GB RAM and I got so sick of the lag that I have gone back to a 32-bit kernel and userspace.

And the speed difference is amazing, to say the least. No more stuck mouse and no more waiting to see that Konsole window pop up. Everything is much faster. It feels like a new laptop. And this is a 2GHz Core 2 Duo based T61, not slow hardware by any means!

And I get an extra 300-400MB of RAM back (YES! that's what you are reading!) just by switching to a 32-bit system. 64-bit C++ apps like Firefox and KDE eat almost twice the RAM. Firefox is running at 250MB with the same number of tabs and windows as on the 64-bit system, where it was consuming about 450MB (RSS). Go figure!

I am running a Virtualbox copy of XP on the laptop and I still don't see swap kick in. With 64-bit, running firefox and XP in VB at the same time would lead to heavy swapping and things would be crawling!

So much for advancement to 64-bit! I have been running 64-bit systems for 4 years now and switching to 32-bit feels like I was living under a rock!

I know all this sounds backwards. But give it a try.

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

Frank Ren: If you are sure the bug doesn't happen with 2.6.32.2, but with all other releases you could test, then you should try to find what has changed in it. Were you always running vanilla upstream kernels? Or always kernels from your distribution? Built on the same machine with the same compiler? If so, then have a look at the changelog from 2.6.32.2 to 2.6.32.8, looking for the culprit. I'd suggest you try 2.6.32.3 and check if the bug is there; and if not, increase the minor version until you get it: that will make the changelog really small. Then, send a mail to LKML with your findings.

You seem to be the reporter with the most precise information out there; you may catch something interesting!

devsk: Beware not to be misled by the swapping behavior of your system. If you're often completely filling your RAM when on 64bits, then swapping may hurt responsiveness badly. When moving to 32bits, if you gain 300MB, you may not suffer from this because there's free RAM, but that's not really linked with a 64bit-only bug.

Revision history for this message
In , cmertes (cmertes-linux-kernel-bugs) wrote :

(In reply to comment #430)
> One clear angle that has not been investigated by kernel developers is that
> this issue is highlighted by 64-bit code. I don't see this lag and high IO
> wait
> in 32-bit kernel.

I do. I share your opinion on RAM use though but it surely doesn't belong here. The bug itself is definitely not restricted to 64-bit systems.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Anybody, please read this comment:
https://bugzilla.kernel.org/show_bug.cgi?id=13347#c59
I think there is a worthwhile suggestion there.

Revision history for this message
In , l.wandrebeck (l.wandrebeck-linux-kernel-bugs) wrote :

I'm really unsure CFQ is the (only ?) culprit.
I've met the same behaviour using deadline and a 3ware 9650, and the fix was something completely different (pci_set_mwi).
See https://bugzilla.redhat.com/show_bug.cgi?id=444759 for more details.

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

I'm going to chip in with my experiences:
I've had this bug with both 32-bit and 64-bit kernels.
Setting different schedulers didn't make a difference.
I've tried different versions of the kernel with no luck (though I haven't specifically tried 2.6.32.2).

Revision history for this message
In , dik_again (dikagain-linux-kernel-bugs) wrote :

I have a 32-bit system. The bug is there as well.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

This bug depends on CPU, memory and, first of all, disk and filesystem, LVM and encryption. It's a mix of transactions/s and throughput; if both are in a system-dependent range, the problem starts.
There is no per-process throughput/transaction statistic in the scheduler that could penalize processes which are causing a high load. A single process can grab all available dirty pages and block the other processes.

Revision history for this message
In , ivan1986 (ivan1986-linux-kernel-bugs) wrote :

I updated from 2.6.32 to 2.6.34 and the bug is fixed on two computers.

In vmstat, wa takes up all the free time, but the interface does not freeze.

I can give all the needed info and try building any version from git for testing.

Revision history for this message
In , ruslan (ruslan-linux-kernel-bugs) wrote :

To topaz (#429):
Ready to join. It would be nice to determine the testing methods; please advise which methods to use. With the current Lucid kernel, 2.6.32, the bug is reproduced.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

That's the problem. There is no reliable method for testing.

Revision history for this message
In , ruslan (ruslan-linux-kernel-bugs) wrote :

I know about using dd. In addition I can move really big files (4-7GB in size). My database on the server is really tiny, and there I can easily reproduce the bug (the system is Hardy 32-bit, 2.6.24-19-server) by copying archives of sites and virtual machines.

Further, in the office I use Lucid (2.6.32-22-386, 32-bit), and at home Fedora 12 (32-bit).

But this is all subjective:

Revision history for this message
In , ivan1986 (ivan1986-linux-kernel-bugs) wrote :

After updating to 2.6.34:

ivan1986@ivan1986:~/$ dd if=/dev/zero of=testfile.1gb bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 36.1762 s, 29.0 MB/s
ivan1986@ivan1986:~/$ dd if=/dev/zero of=testfile.1gb bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 26.7475 s, 39.2 MB/s
ivan1986@ivan1986:~/$ dd if=/dev/zero of=testfile.1gb bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 32.8729 s, 31.9 MB/s

 1 3 0 20940 19664 315188 0 0 128 7860 571 1108 6 10 3 81
 2 2 0 15744 19668 320272 0 0 68 65332 893 1593 3 32 7 58
 0 3 0 11932 19668 323260 0 0 96 49384 579 1142 3 11 0 85
 0 3 0 17252 19704 318232 0 0 0 6832 516 1131 2 3 0 94
 0 4 0 12732 19704 323204 0 0 128 6520 940 1145 4 22 4 69
 2 4 0 11808 19980 323796 0 0 88 30492 1093 1393 7 20 2 70
 0 4 0 32860 19980 302340 0 0 148 70892 1117 2026 2 10 4 84
 0 4 0 11856 19980 323400 0 0 176 6652 553 1217 3 9 33 54
 1 4 0 12340 19980 323156 0 0 12 12396 604 1269 2 8 4 85
 0 4 0 12228 19980 323768 0 0 0 13520 816 1612 2 6 0 91
 0 4 0 13136 19980 322244 0 0 0 21924 937 1504 7 8 0 85
 0 3 0 11820 19980 324064 0 0 112 42740 857 1404 1 33 14 52
 0 3 0 11896 19980 323468 0 0 48 9668 600 1161 4 6 1 88
 0 4 0 12608 19980 322604 0 0 128 55032 746 1342 10 11 19 61
 0 3 0 11328 19980 323508 0 0 76 27868 498 1087 4 3 6 86
 0 4 0 11952 20020 322996 0 0 36 1196 502 1268 5 3 0 92
 0 4 0 11952 20020 323512 0 0 0 4036 540 1064 3 8 0 89
 0 4 0 11868 20304 323064 0 0 112 64560 893 1190 5 28 3 64
 0 5 0 21888 20304 312760 0 0 336 35284 639 1520 4 15 0 82
 0 5 0 21764 20304 313068 0 0 0 20936 572 1490 6 3 0 90
 0 4 0 11844 20316 323896 0 0 248 364 610 1165 5 12 0 83
 1 3 0 12336 20360 323368 0 0 0 31160 1113 1188 3 18 0 78

Max 30% CPU in htop.

The interface does NOT freeze, music plays normally, and everything else works fine.

Revision history for this message
In , cat (cat-linux-kernel-bugs) wrote :

The simplest way to reproduce this bug on most hardware is (a command-level sketch follows at the end of this comment):

1. create a cryptsetup partition (on LVM or without LVM, both variants are OK). Preferably, all partitions used in the test case should be encrypted;
2. install VirtualBox and try to create a preallocated hard disk image; the size must be 4GB or more.

That's it! If you try to use other applications at the same time, you will see 5-10 second freezes.

I've reproduced the bug on many hardware configurations with 2.6.34 and older kernels, such as:
C2Q Q9650 / 8GB RAM / Seagate HDD / x86_64
i7 920 / 6GB / WD HDD / x86_64
C2D U7600 / 2GB / Samsung SSD / i686
C2D T7200 / 3GB / Seagate HDD / i686

So it's not a hardware problem - the hardware ranges in age from 4 years to 1 year and the results are the same.
Also, on *BSD and Windows there are no problems with that hardware.
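A minimal command-level sketch of that reproduction, assuming a spare partition /dev/sdb1 and mount point /mnt/crypt (both hypothetical), and VBoxManage syntax of that VirtualBox generation (check VBoxManage createhd --help on your version):

# encrypt and mount a scratch partition (this DESTROYS any data on /dev/sdb1)
sudo cryptsetup luksFormat /dev/sdb1
sudo cryptsetup luksOpen /dev/sdb1 crypttest
sudo mkfs.ext4 /dev/mapper/crypttest
sudo mount /dev/mapper/crypttest /mnt/crypt

# create a fixed-size (preallocated) 4GB disk image on the encrypted filesystem
VBoxManage createhd --filename /mnt/crypt/test.vdi --size 4096 --variant Fixed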

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

I'm wondering: isn't bad responsiveness equal to starvation of processes in the CPU scheduler? In that case it would be better to measure the amount of CPU cycles it is possible to burn during pekmop1024's procedure.

I have tried to just dd an 8 GB file, and it gives me stalls in the GUI, but that is because of stat64 calls in the application. Under normal circumstances the file that is stat'ed is cached, but during high throughput the cache is filled up with other data, so the stat64 call has to read from the disk, which then competes with my dd. Running glxgears alongside the dd shows a constant fps during the whole dd.

I have followed this thread for a long time and I do not remember anyone mentioning that a single high-throughput application renders the cache useless to other applications.

Is it possible to avoid filling the cache with data that is written?

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

I'm wondering: isn't bad responsiveness equal to starvation of processes in the CPU scheduler? In that case it would be better to measure the amount of CPU cycles it is possible to burn during pekmop1024's procedure.

I have tried to just dd an 8 GB file, and it gives me stalls in the GUI, but that is because of stat64 calls in the application. Under normal circumstances the file that is stat'ed is cached, but during high throughput the cache is filled up with other data, so the stat64 call has to read from the disk, which then competes with my dd. Running glxgears alongside the dd shows a constant fps during the whole dd.

I have followed this thread for a long time and I do not remember anyone mentioning that a single high-throughput application renders the cache useless to other applications. I'm guessing that a simple application that once per second reads the first byte from a memory-mapped file will stall, even if it is only a single byte that needs to be cached.

I'm sorry if my thoughts have been mentioned before in this thread :)
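One way to probe that question, as a sketch (GNU dd's oflag=direct bypasses the page cache, so the streamed data never displaces cached metadata; the file name and size are arbitrary examples):

# normal buffered write: fills the page cache and evicts other cached data
dd if=/dev/zero of=/tmp/bigfile bs=1M count=8192

# O_DIRECT write: same amount of data without going through the page cache
dd if=/dev/zero of=/tmp/bigfile bs=1M count=8192 oflag=direct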

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

I've tested my assumption about the 1-byte mmap'ed file. It turned out that it runs fine during my dd. Probably 1 byte is not enough.

Revision history for this message
In , ivan1986 (ivan1986-linux-kernel-bugs) wrote :

It still repeats itself - compiling psi freezes the interface.

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :
Download full text (24.0 KiB)

(In reply to comment #421)
> I have suffered the high iowait problem for almost 4 years
Then let's finally kill it!

> I got information from this bugzilla report that kernel 2.6.32 has fixed
> this high iowait problem, and I tested the snapshot kernel 2.6.32.2 of
> zenwalk
> on my notebook, and found the high iowait is gone

> I found in kernel 2.6.32.8 the high iowait is back. How do I know
> that? When I copy a 700MB avi file from my notebook disk to a 3.5" usb
> mobile disk, I found the reading side disk LED start to flash quickly and
> immediately, but the writing side disk LED will keep still for a long
> time(like
> 25-30 seconds), and then start to flash slowly,and the course is abnormally
> long and low responsive.

> The kernel 2.6.32.2 is the only 2.6 kernel (since 2.6.18) on which I found
> both of the reading and writing side disk LED will start to flash
> quickly and immediately. There must be something wrong with the write
> cache behavior which will cause the high iowait, and it has been fixed in
> 2.6.32.2 and brought back in 2.6.32.8.

This is the complete git log 2.6.32.2..2.6.32.8:
b0e4370 Linux 2.6.32.8
6117db7 NET: fix oops at bootime in sysctl code
e4a6a35 powerpc: TIF_ABI_PENDING bit removal
a420e9f ath9k: fix beacon slot/buffer leak
1c97637 ath9k: fix eeprom INI values override for 2GHz-only cards
2c7f87e pktcdvd: removing device does not remove its sysfs dir
b31aa5c uartlite: fix crash when using as console
e06fbe9 kernel/cred.c: use kmem_cache_free
35cfb03 starfire: clean up properly if firmware loading fails
906f68d mx3fb: some debug and initialisation fixes
682efb8 imxfb: correct location of callbacks in suspend and resume
b260729 mac80211: fix NULL pointer dereference when ftrace is enabled
3a9353f mm: flush dcache before writing into page to avoid alias
78da404 be2net: Fix memset() arg ordering.
e38d76e be2net: Bug fix to support newer generation of BE ASIC
43d7ff2 connector: Delete buggy notification code.
f06f00e usb: r8a66597-hdc disable interrupts fix
0ae2b7d block: fix bugs in bio-integrity mempool usage
9648148 random: Remove unused inode variable
8857a1a random: drop weird m_time/a_time manipulation
94af44b Fix 'flush_old_exec()/setup_new_exec()' split
cb723ba block: fix bio_add_page for non trivial merge_bvec_fn case
e52299d mm: purge fragmented percpu vmap blocks
56d4b77 mm: percpu-vmap fix RCU list walking
dce6a09 libata: retry link resume if necessary
42f7e23 oprofile/x86: fix crash when profiling more than 28 events
9c66557 oprofile/x86: add Xeon 7500 series support
4f7d666 KVM: allow userspace to adjust kvmclock offset
a74e62c ax25: netrom: rose: Fix timer oopses
3125258 af_packet: Don't use skb after dev_queue_xmit()
ecb7287 net: restore ip source validation
1681333 sky2: Fix oops in sky2_xmit_frame() after TX timeout
16b8efa tcp: update the netstamp_needed counter when cloning sockets
359e2f2 clocksource: fix compilation if no GENERIC_TIME
253f887 x86/amd-iommu: Fix possible integer overflow
d1a3103 x86: Add quirk for Intel DG45FC board to avoid low memory corruption
8159070 x86: Add Dell OptiPlex 760 reboot quirk
00362b9 regulator: Specify REGULATOR_CHANGE_STATUS for WM835x LED constraints
6db6ace ...

Revision history for this message
In , frankrq2009 (frankrq2009-linux-kernel-bugs) wrote :

(In reply to comment #448 and comment #431)
Sorry, I really want to help, but I am not a kernel developer; hacking the kernel source is too difficult for me. Besides, the gas turbine historian is a live production system, so it cannot be used as a debug system. I will keep watching for the final resolution; for now, we will stick with 2.6.32.2.

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

Wait, wait, what is this :O
Updating to yesterday's git kernel (from 2.6.34-git12) gave me a huge perceived speed boost? I haven't specifically compared iowait times, but all processes seem to be using less CPU time? My BOINC likes that very much ;)
There is a lot of concurrent IO here, and the system, apart from minor application stalling (despite 8GiB RAM and no swap), hasn't been this un-sluggish for a loooong time (2.6.18? ;)
Feels like someone finally released the brakes - hope you guys can confirm this!

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

wrt comment #450, 2.6.35-rc1 is out! I hope that has something for all of us sufferers. I will try it later today. Can other folks also try and report here?

Revision history for this message
In , andre (andre-linux-kernel-bugs) wrote :

(In reply to comment #450)
Maybe this is related to the observations in Phoronix's kernel tracker [1]. An in-depth article was also posted [2].

1: http://www.phoromatic.com/kernel-tracker.php?sys_1=yes&sys_3=yes&sys_4=yes&sub_type_System=yes&sub_type_Processor=yes&sub_type_Disk=yes&sub_type_Graphics=yes&sub_type_Memory=yes&sub_type_Network=yes&date_range=15&regression_threshold=0.15&only_show_regressions=yes&submit=Update+Results
2: http://www.phoronix.com/scan.php?page=article&item=linux_2635_fail&num=1

Note: Link 1 is valid for the next few days, thereafter you have to raise the displayed days to get the regression back into view

Revision history for this message
In , akpm (akpm-linux-kernel-bugs) wrote :

lol, this bug was marked "resolved". I wish.

(Hi, everyone).

I suspect we have about 25 different bugs here. Really the only way we'll make progress here is if people can come up with specific test cases which developers can run on their own machines, and reproduce the bug.

So if any of you guys have time to try that and are successful then please attach that testcase here, or send it out via email to the relevant culprits.

It's really that important. There's practically a 1:1 ratio between reproduction-test-cases and bugfixes.

Revision history for this message
In , desasterman (desasterman-linux-kernel-bugs) wrote :

Let me point out a potential pitfall: For a long while I thought my machine was suffering from this bug. However, the real reason for my high IO wait and extremely poor performance was this:

http://www.osnews.com/story/22872/Linux_Not_Fully_Prepared_for_4096-Byte_Sector_Hard_Drives

So everyone should rule out that one first... for me, a repartitioning of my drive helped a lot :).

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Just want to report that I've had great success with the kernel 2.6.35-020635rc4-generic on Ubuntu 32-bit. Apps can still grey out when allocating space for big files, but the interface stays responsive in other apps. I'll try it out on more setups and report back here if I notice it appearing in other places.

Finally I can say that my linux machines are usable again. Cheers!

Revision history for this message
blahde (daisy-ice) wrote :

@André Desgualdo Pereira:

Which kernel exactly do you recommend? Linux 2.6.35-020635rc1-generic does not bring any changes for me.

Revision history for this message
André Desgualdo Pereira (desgua) wrote :

@blahde

The 2.6.33 kernel has worked with Karmic Koala, but I haven't tested it with Lucid Lynx.

Regards.

Revision history for this message
In , psypher246 (psypher246-linux-kernel-bugs) wrote :

(In reply to comment #453)
> lol, this bug was marked "resolved". I wish.
>
> (Hi, everyone).
>
> I suspect we have about 25 different bugs here. Really the only way we'll
> make
> progress here is if people can come up with specific test cases which
> developers can run on their own machines, and reproduce the bug.
>
> So if any of you guys have time to try that and are successful then please
> attach that testcase here, or send it out via email to the relevant culprits.
>
> It's really that important. There's practically a 1:1 ratio between
> reproduction-test-cases and bugfixes.

Hi Andrew,

Very simple testing procedure:

Launch Firefox

Run 'stress -d 1'

Try to open some websites

Machine hangs

Thanks

Revision history for this message
In , psypher246 (psypher246-linux-kernel-bugs) wrote :

(In reply to comment #455)
> Just want to report that I've had great success with the kernel
> 2.6.35-020635rc4-generic on ubuntu 32 bit. Apps can still grey out when
> allocating space for big files, but the interface is still responsive on
> other
> apps. I'll try it out on more setups and report back here if i notice it
> appearing on other places.
>
> Finally I can say that my linux machines are usable again. Cheers!

I will try that, but I have no issues in XP and my hard drive is at least 2 1/2 years old and this issue has been around for even longer than that.

Doubt it's the reason for my issues.

I have also tried playing around with other schedulers and disk mounting options. I have tried writeback and journal mode. Writeback provides a very minimal improvement, not enough to make it worth my while to run it always. Changing between ATA and AHCI mode makes no difference, nor does changing the scheduler from cfq to anticipatory or deadline.

I am testing this on a Dell Precision M6300 laptop with a SATA drive, but I have experienced this issue on all my various types of PCs since at least Ubuntu Gutsy or Intrepid.

Revision history for this message
In , akpm (akpm-linux-kernel-bugs) wrote :

(In reply to comment #456)
>
> Very simple testing procedure:
>
> Launch Firefox
>
> Run 'stress -d 1'
>

From where does one obtain a copy of `stress'?

Thanks.

Revision history for this message
In , benjfitz (benjfitz-linux-kernel-bugs) wrote :

I believe this is the website (according to Gentoo Portage).
http://weather.ou.edu/~apw/projects/stress/
Benj
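On Debian/Ubuntu systems of that era the tool is also packaged (assuming the package is simply named 'stress'), so building from the website is not required:

sudo apt-get install stress
stress -d 1    # one hdd worker spinning on write()/unlink(), as used in this thread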

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

I've tried stress also.
I have 2 GB of memory and 1.5 GB of swap.

With swap activated, stress -d 1 hangs my machine.

The same happens with stress -d 1 while swappiness is set to 0.

With swap deactivated, things run pretty fine. Of course, apps using synchronous disk IO fight stress for priority.

There must be a reasonable explanation for why everything stops when swap is activated. Even a simple app like "dstat" stalls.

Revision history for this message
In , nels.nielson (nels.nielson-linux-kernel-bugs) wrote :

I can also confirm this. Disabling swap with swapoff -a solves the problem.
I have 8GB of RAM and 8GB of swap with a fakeraid mirror.

Before this I couldn't do backups without the whole system grinding to a halt. Right now I am doing a backup from the drives, watching a movie from the same drives, and more. No more huge iowait times and programs freezing because they are starved of access to the drives.

Revision history for this message
In , andrew (andrew-linux-kernel-bugs) wrote :

Perhaps you could capture some vmstat 1 output from just before/when the stall occurs?
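A minimal way to do that, assuming the stress workload used earlier in the thread (the log file name is just an example):

# log memory/swap/IO counters once per second while reproducing the stall
vmstat 1 > vmstat-stall.log &
stress -d 1
# reproduce the stall, then stop both (Ctrl-C / kill %1) and attach vmstat-stall.log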

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Created attachment 27230
vmstat for my system running "stress -d 1" without hanging.

My system had just logged into KDE, with around 650 MB of memory used by applications prior to starting "stress -d 1".

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Created attachment 27231
vmstat for my system running "stress -d 1". System hangs.

My system had just logged into KDE, with around 860 MB of memory used by applications prior to starting "stress -d 1". The applications using the extra memory are digiKam and Kontact - both sitting there doing nothing.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Created attachment 27232
vmstat for my system (without swap) running "stress -d 1" without hanging.

Same setup as stress_swap_hang.vmstat except that swap is turned off using
"swapoff -a" in this run.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

The strange thing about every high-throughput IO operation is that *every* byte of memory is used up, until a certain limit. That use of memory will even swap stuff out.

Also, looking especially at stress_noswap_nohang.vmstat, the behavior mimics this:

1. Place data to be written into memory
2. Write some data to the disk
3. Go to 1 if not all allowed memory is used.

It is interesting that "stress -d 1" places data into memory a lot faster than a normal hard disk can handle, so the memory will be filled up eventually (the limit will be reached eventually).

So for me, I only have a hanging system when the "stress -d 1" writes compete with swap-out - which is itself caused by "stress -d 1" filling the memory.

So the big question: why does the kernel allow large data writes to fill up the memory, and even swap stuff out, just to hold data that is waiting to be written?

Revision history for this message
In , cmertes (cmertes-linux-kernel-bugs) wrote :

(In reply to comment #466)
> So the big question: Why do the kernel allow large data writes to fill up the
> memory and even swap out stuff just to get data to be written into memory?

A good question, but not the real source of this problem, I guess. Judging by the previous posts and my own experience, this problem seems to occur with any concurrent I/O, possibly promoted by encryption. Provided that it is only one bug we are talking about.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

I've noticed that earlier in the long list of comments. But could it be that others are confusing the real issue with swap-out slowing things down during heavy disk writes?

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

(In reply to comment #468)
> I've notices that earlier in the long list of comments. But could it be that
> others confuse the real issue with swapout slowing things down during high
> disk
> write?

This squares somewhat with my own experience:

1. The file cache is *very* aggressive, even pushing out to swap stuff I think I might be using.

2. Large writes to swap trounce interactivity (and little gets scheduled).

Small writes seem not to have an adverse effect. OK, I understand pushing out pages that haven't been used in a while in favour of more current caches; however, doing something that can result in 1.5 GiB going to the page cache on a 2 GiB system (large copy, kernel compile) seems to provoke these large writes which make everything go slow.

Revision history for this message
blahde (daisy-ice) wrote :

@André Desgualdo Pereira:

Thank you. Unfortunately 2.6.33 doesn't make any difference for me either - on Lucid Lynx. Whereas I can confirm that the transfer rate from USB to USB seems to be not affected by this.

Revision history for this message
André Desgualdo Pereira (desgua) wrote :

Sorry I can't help any further.
If I found something I will post here.
Regards.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

(In reply to comment #469)
>
> 1. The file cache is *very* aggressive, even pushing out to swap stuff I
> think
> I might be using.
>

Now, I'm not a kernel hacker, but I am a programmer after all, and to me it seems an easier job to fix the aggressive file cache than to fix this "large I/O operations..." thing - which is not at all concrete and varies across platforms, machine specs, etc.

Maybe fixing the aggressive file cache would fix a lot of people's problems - I'm guessing that the file cache behaves 100% the same on all systems. Is that a correct assumption?

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #470)
> (In reply to comment #469)
> >
> > 1. The file cache is *very* aggressive, even pushing out to swap stuff I
> think
> > I might be using.
> >
>
> Now, I'm not a kernel hacker, but a programmer afterall, and to me it seems
> to
> be a an easier job to fix the aggressive file cache than to fix this "large
> I/O
> operations ......"-thing - which is not at all that concrete and varies over
> platforms, machine specs etc.

Isn't there already a knob for controlling the kernel's preference for swapping anonymous pages out to disk versus retaining cached/buffered block-device pages?

/proc/sys/vm/swappiness — http://kerneltrap.org/node/3000

Our apps are appearing to hang because their GUI threads have stalled while waiting on pages (containing either executable code or auxiliary data like pixmaps) to come back into RAM from the disk. Reading those pages back in is taking forever because the disk queue is full of writes. The situation is worsened because reading the pages is not pipelined since the requests are being submitted from the page fault handler, so a program executing while huge disk activity is in progress will submit a request to load one page from disk and stall; then when that request is fulfilled, the program will execute a few hundred instructions more until its instruction pointer crosses into another page that isn't loaded from disk, whereupon the page fault handler will be invoked again, a new request will be submitted to the disk queue, and the application will hang again. Repeat ad infinitum. Meanwhile, while the program is stalled waiting for the page it needs to be loaded in from disk, all the rest of its pages are being evicted from RAM to make room for the huge disk buffers, thus perpetuating the problem.

I would think the easiest and most reliable solution to this problem would be for the kernel to prefer fulfilling page-in requests ahead of dirtying blocks. If there are any requests to read pages in from disk to satisfy page faults, those requests should be fulfilled and a process's request to dirty a new page should be blocked. In other words, as dirty blocks are flushed to disk, thus freeing up RAM, the process performing the huge write shouldn't be allowed to dirty another block (thus consuming that freed RAM) if there are page-ins waiting to be fulfilled.
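For reference, the swappiness knob mentioned above can be read and changed at runtime; this is only a generic illustration of the tunable, not a fix proposed in this comment (lower values make the VM prefer dropping file cache over swapping anonymous pages out):

cat /proc/sys/vm/swappiness                 # distribution default is commonly 60
echo 10 | sudo tee /proc/sys/vm/swappiness  # prefer keeping anonymous pages in RAM
# add "vm.swappiness = 10" to /etc/sysctl.conf to keep the setting across reboots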

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Created attachment 27243
vmstat for my system running "stress -d 1" without hanging.

My system had just logged into KDE, with around 650 MB of memory used by applications prior to starting "stress -d 1".

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

(In reply to comment #471)
>
> I would think the easiest and most reliable solution to this problem would be
> for the kernel to prefer fulfilling page-in requests ahead of dirtying
> blocks.
> If there are any requests to read pages in from disk to satisfy page faults,
> those requests should be fulfilled and a process's request to dirty a new
> page
> should be blocked. In other words, as dirty blocks are flushed to disk, thus
> freeing up RAM, the process performing the huge write shouldn't be allowed to
> dirty another block (thus consuming that freed RAM) if there are page-ins
> waiting to be fulfilled.

I agree with you on the preference part. It will fix the race-like situation. But as I understand it, it will not keep the file cache from swapping out a single page?

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #473)
> I agree with you on the preference-part. It will fix the race-like situation.
> But as I understand, it will not keep the file cache from swapping out a
> single
> page?

Implementing my suggestion wouldn't prevent mmap'd pages from being evicted from RAM to make room for file cache. It would only mean (1) that the file cache wouldn't be allowed to consume pages that are needed to satisfy page faults, and (2) that requests to read pages in from disk (whether from swap (anonymous pages) or from mmap'd files such as executables) would be serviced ahead of any other reads or writes in the disk queue.

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

(In reply to comment #471)

> Isn't there already a knob for controlling the kernel's preference for
> swapping
> anonymous pages out to disk versus retaining cached/buffered block-device
> pages?
>
> /proc/sys/vm/swappiness — http://kerneltrap.org/node/3000

(For some reason playing with this doesn't seem to do anything, but perhaps that's another bug report.)

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

> I would think the easiest and most reliable solution to this problem would be
> for the kernel to prefer fulfilling page-in requests ahead of dirtying
> blocks.
> If there are any requests to read pages in from disk to satisfy page faults,
> those requests should be fulfilled and a process's request to dirty a new
> page
> should be blocked. In other words, as dirty blocks are flushed to disk, thus
> freeing up RAM, the process performing the huge write shouldn't be allowed to
> dirty another block (thus consuming that freed RAM) if there are page-ins
> waiting to be fulfilled.

Matt: Wouldn't setting dirty_bytes to a low value make sure that processes never dirty more than a fixed number of pages, and hence never get to consume more RAM until their existing dirty pages are flushed? Or maybe that's not how dirty_*bytes is designed to work. Maybe (I am guessing here) it just controls when the flush of dirty pages begins to happen, and the application can still continue to dirty more pages. But if dirty_bytes controls when the process itself has to flush its dirty buffers, then it would be busy flushing and waiting on IO to complete and couldn't be dirtying more memory, right? So it does look like setting dirty_bytes to a low value like 4096 will produce an extreme case where the process's writes are almost completely synchronous and the page cache is not pounded at all.

Can someone try this extreme test? Set dirty_bytes to 4096 and rerun your scenario. The sequential bandwidth seen by the disk stresser will go down the drain, but your system should survive.

Revision history for this message
In , andrew (andrew-linux-kernel-bugs) wrote :

According to http://www.kernel.org/doc/Documentation/sysctl/vm.txt

"Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
value lower than this limit will be ignored and the old configuration will be
retained."

Better make that 8192.

Also you could try lowering /proc/sys/vm/dirty_ratio
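A concrete version of that experiment, as a sketch (dirty_bytes and dirty_ratio are alternatives - writing one disables the other; the stress workload is the one used throughout this thread):

# throttle writers once 8192 bytes (two pages) of dirty data accumulate
echo 8192 | sudo tee /proc/sys/vm/dirty_bytes

# or, less extreme: cap dirty memory at 10% of RAM via the ratio knob instead
echo 10 | sudo tee /proc/sys/vm/dirty_ratio

# then rerun the workload and compare responsiveness
stress -d 1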

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

(In reply to comment #477)
> According to http://www.kernel.org/doc/Documentation/sysctl/vm.txt
>
> "Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
> value lower than this limit will be ignored and the old configuration will be
> retained."
>
> Better make that 8192
>
> Also you could try lowering /proc/sys/vm/dirty_ratio

Setting dirty_bytes to 8192 solves the slowdown for me. Of course it ends up with a throughput from "stress -d 1" which is considerably lower than when dirty_bytes was set to 0 (i.e.

<quote-from-doc>
If dirty_bytes is written, dirty_ratio becomes a function of its value
(dirty_bytes / the amount of dirtyable system memory).
</quote-from-doc>

Now, dirty_ratio is 60 by default, so 60% of my system memory can be used for dirty pages. On my system that is 1.2GB. So if I do not have 1.2GB free and I am doing some high-throughput write to disk, my system will hang. I think that is a bit of an overkill, especially seen from the perspective that a standard hard disk can write no more than 100MB/s.

The kernel should be reasonable enough to behave and not just hog the majority of system memory during high-throughput operations. Just think of a system with 8GB of memory where 6GB is used by running applications. Running "stress -d 1" on such a setup would kill it: the writing application would be allowed to use 60% of the 8GB for dirty pages. It seems massive, so please correct me if I'm wrong, since I have not tested such a system.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Søren: These parameters exist to tune the system behavior. There are other parameters which control the behavior of pdflush and the FS journal threads, but getting all of these in harmony so the system performs well in every scenario is not an easy task. I think the hope is that pages will be reclaimed fast enough by pdflush if its parameters are tuned as well.

But I agree that letting one process dirty 60% of physical RAM by default before it blocks itself on an IO flush is a bad thing. Particularly when filling RAM is many orders of magnitude faster than emptying it to disk. A couple of rogue user processes can bring the system down in a hurry.

Linux needs to account for the disparity between RAM and disk, and how that disparity has increased many-fold in recent times. A 2GB system is considered the minimum these days. Filling 60% of it will take a few microseconds even on the slowest of RAM, but emptying it to disk will take many seconds if not minutes on the fastest drives.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Søren: These parameters exist to tune the system behavior. There are other parameters which control the behavior of pdflush and the FS journal threads, but getting all of these in harmony so the system performs well in every scenario is not an easy task. I think the hope is that pages will be reclaimed fast enough by pdflush if its parameters are tuned as well.

But I agree that letting one process dirty 60% of physical RAM by default before it blocks itself on an IO flush is a bad thing. Particularly when filling RAM is many orders of magnitude faster than emptying it to disk. A couple of rogue user processes can bring the system down in a hurry.

Linux needs to account for the disparity between RAM and disk, and how that disparity has increased many-fold in recent times. A 2GB system is considered the minimum these days. Filling 60% of it will take a few microseconds even on the slowest of RAM, but emptying it to disk will take many seconds if not minutes on normal drives.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Apologies for the double post. The first one timed out on me. While reposting, I realized fastest drives on market today (the SSDs) will likely be able to do stuff in seconds, so, I changed the word fastest to normal...:-)

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

devsk: Yeah, but shouldn't those knobs be for squeezing the most out of your system? The defaults should be set in a way that is not destructive.

E.g.:

swappiness = 0 - 10
 or
dirty_ratio = 10

or a combination of both, or some other settings.

People will experience trouble with the default settings anyway, and reports like "high-throughput disk writes are slow" are certainly a lot better than "a high-throughput disk write locks up my machine".

What are the best first steps to solving this:
1. Changing the defaults of the existing knobs?
2. Changing the kernel code?

Revision history for this message
In , andrew (andrew-linux-kernel-bugs) wrote :

There are currently various patches dealing with various aspects of writeback. Some or all of these _may_ be ready for inclusion in 2.6.36

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Nice... where are those? If they apply to 2.6.35-something I will be happy to try them out.

Revision history for this message
In , andrew (andrew-linux-kernel-bugs) wrote :

Here are a couple of things being worked on.

http://lwn.net/Articles/397003/
http://lwn.net/Articles/396512/

You'll need to dig around for the patches.

Revision history for this message
In , andrew (andrew-linux-kernel-bugs) wrote :

Wu Fengguang of Intel has started looking through this bug report. He has some patches that he'd like people to try.

http://lkml.org/lkml/2010/8/1/40
http://lkml.org/lkml/2010/8/1/45

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

Created attachment 27313
screenshot of extreme iowait at ridiculously low throughput

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

Created attachment 27314
Wu Fengguang's anti-io-stall patch rebased for vanilla 2.6.35

@#486
The posted patches didn't apply to recent kernels, so I just rebased them onto the latest kernel release and compiled. Will restart the machine now and party wildly if this small change FINALLY fixes this issue.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

(In reply to comment #487)
> Created an attachment (id=27313) [details]
> screenshot of extreme iowait at ridiculously low throughput

I have found that even when dstat shows 0B throughput, the disk can be very much active. So dstat does not seem to measure the number of bytes actually going to the disk.

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :

2.6.35 + patch from #488

The mouse froze four times for 1-1.5 seconds while dd was writing.

When sweep opened the file and swap grew from 0 to 1.3 GiB, the mouse froze. After the file was opened, Kopete lost its connection to the Jabber account and KWin disabled desktop effects.

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :

Created attachment 27324
test results

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

(In reply to comment #490)
> 2.6.35 + patch from #488
>
> Mouse froze four times at 1 - 1.5 seconds, while dd wrote.
>
> When the sweep opens the file and swap grew from 0 to 1.3 GiB, mouse frozen.
> After opening the file Kopete loses connection to the Jabber account and KWin
> disables desktop effects.

Did you make sure memory usage was at least 50% before starting the test, just to make sure pageout is triggered?

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

I'm just copying a few files from an NFS folder to USB on my computer.

I found that the IO wait times are huge but the network is not in use. This is strange, as the folder is an NFS one attached over gigabit Ethernet.

The problem is that the iowait times are making my desktop unusable. The window manager takes a lot of time to move a window around, the desktop does not respond well, the mouse hangs sometimes... This is a mess.

This is the kernel:

Linux azul1 2.6.35-10-generic #15-Ubuntu SMP Thu Jul 22 11:10:38 UTC 2010 x86_64 GNU/Linux

Some kernel maintainer should triage this bug: separate it into a few different bugs (because I'm sure there is more than one involved here) and try to resolve them. Divide and conquer!

Thank you guys!

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

The patch from #488 does not solve the problem on my machine. My machine starts to stall even if there are still 2GiB of the 8GiB of RAM free. The menu stalls if the icons are not loaded and there is heavy IO.

It starts to stall sooner while executing
dd if=/dev/zero of=t1 bs=1M count=8K (throughput ~48.2MiB/s)
than with
dd if=/dev/zero of=t1 bs=4K count=2M (throughput ~52.7MiB/s)

The test data is written on the inner part of the disk, while the OS is on the outer part. All partitions are ext4.

High fragmentation caused by LVM snapshots increases this problem.

Revision history for this message
In , pedrib (pedrib-linux-kernel-bugs) wrote :

Hi,

I did some tests with the patch from #488.

Test procedure:
- filled up memory to 70/80% (4GB physical memory total)
- executed "stress -d 1"
- played around switching windows, changing tabs in Chromium, accessing menus, etc.

-----------------------------------------
2.6.35 vanilla, 10GB swap partition on:
Complete hang, no response at all from mouse or keyboard, had to reboot manually

2.6.35 vanilla, 10GB swap partition off:
A few hiccups, but system was still usable, although slow.

2.6.35 + patch from #488, swap partition on:
A few hiccups, but system was still usable, although slow.

2.6.35 + patch from #488, swap partition off:
A few hiccups, but system was still usable, although slow.
-----------------------------------------

So the patch from #488 seems to solve the problem for me. The hiccups and slowness can be attributed to my relatively slow magnetic disk and the fact that my partition is encrypted under LUKS.

This is a very important bug for Linux on the desktop. I'm glad there is a patch out for it and I'll continue to use the patch for my kernels, but it should definitely be fixed in mainline!

Revision history for this message
Adam Kulagowski (fidor-fidor) wrote :

I have a SanDisk Backup U3 (32GB). I've tested it on 4 different computers. I always got a writing speed of around 3MB/s, which is slow, because this pendrive is capable of 16-17MB/s writing speed. The only way I'm able to copy files faster is to use dd with bs=64 AND oflag=direct. Using these options I get the full writing speed.

dd if=ubuntu-10.04-server-amd64.iso of=/media/FE35-228F/file.bin bs=64k oflag=direct
10840+1 records in
10840+1 records out
710412288 bytes (710 MB) copied, 43,8788 s, 16,2 MB/s

What is also important: I can interrupt the copying process at any time. Without oflag=direct, dd ignores Ctrl-C or even kill -9.

This was tested on 10.04, with similar results on the "Recovery Is Possible" distro (kernel 2.6.34-git16).

uname -a
Linux fidor 2.6.32-24-generic #38-Ubuntu SMP Mon Jul 5 09:20:59 UTC 2010 x86_64 GNU/Linux

Maybe it will help.

Revision history for this message
In , psypher246 (psypher246-linux-kernel-bugs) wrote :

Hi all, has anyone seen this article?

http://www.phoronix.com/scan.php?page=news_item&px=ODQ3Mw

Are they talking about the same patches? Sounds like the same issue.

Revision history for this message
Adam Kulagowski (fidor-fidor) wrote :

I've made a typo in my previous comment: you have to specify (at least in my case) bs=64k, not bs=64. The command line example was correct. Any other value, bigger or smaller (32k, 128k, 256k), brings the speed back down to 3MB/s.

One more thing. I've found a second pen drive which works correctly (full 5MB/s writing speed). There are some small differences in lsusb between the two. I'm attaching the lsusb -v output.

On the working pen drive (Adata) bs doesn't really matter. Of course, with a bigger block size you get a bigger writing speed, up to 128k. A bigger bs than 128k doesn't change anything. I've tested from 256k up to 2048k, still achieving full writing speed.

I'll try to test more USB sticks.

Revision history for this message
In , coornail (coornail-linux-kernel-bugs) wrote :

I tried the patch from #488 on 2.6.35.
When running dd if=/dev/zero of=/tmp/test bs=1M count=1M the system was almost flawless, windows switched quickly, opened programs reacted instantly.

It might be that I'm mistaken, but I'm under the impression that my programs takes more time to launch. I wonder if anyone else have that.

Revision history for this message
In , uzytkownik2 (uzytkownik2-linux-kernel-bugs) wrote :

*** Bug 15463 has been marked as a duplicate of this bug. ***

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

#496:
Yes, the patch mentioned on Phoronix IS the one from #488, and as reported by several people it seems to improve IO latency (at the cost of throughput?) but falls short of completely preventing stalls. The strange thing for me is that the problems seemingly increase with uptime... Besides, I noticed some rogue flush-btrfs-1 threads causing 1MiB/s average disk writing (uptime > 2 days, even after bringing down the services causing heavy IO). I posted a blktrace of that to the linux-btrfs mailing list but no answer yet ^^

Wow, this one's tricky.
One thing I noticed a few kernel revisions back that might be relevant: there were a lot of processes in iowait state (the result of compiling packages, BOINC, munin-graph, ntop... and then some) and I wanted to prioritize a single process, so I issued an ionice -p xxx -c1 -n0 (realtime: prio 0). What I expected was that that process would instantly get its IO through and pick up work - alas, it took SEVERAL MINUTES before it did. That really wtfed me. Is this broken by design? Shouldn't IO-renicing take effect immediately?

Revision history for this message
In , gatekeeper.mail (gatekeeper.mail-linux-kernel-bugs) wrote :

#496 doesn't solve the problem IMHO.

Tested on Ubuntu Karmic (10.04) with vanilla 2.6.35.

A simple 'dd if=/dev/zero of=/some/file bs=1M' caused 100% load (dual-head Core2 Duo E8500) and high latency even when ^C'ing the dd process itself. Need more info? Please ask.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

I tried the patch rebased for 2.6.35
https://bugzilla.kernel.org/attachment.cgi?id=27314

It is probably OK, but my first test is to fill my memory with all the apps I can find and then run "stress -d 1". And as expected, it started paging stuff out. You other guys must have the exact same problem, at least you, Pedro. For me the responsiveness drops because of paging out.

Revision history for this message
In , sprutos (sprutos-linux-kernel-bugs) wrote :

echo 10 > /proc/sys/vm/vfs_cache_pressure
echo 4096 > /sys/block/sda/queue/nr_requests
echo 4096 > /sys/block/sda/queue/read_ahead_kb
echo 100 > /proc/sys/vm/swappiness
echo 0 > /proc/sys/vm/dirty_ratio
echo 0 > /proc/sys/vm/dirty_background_ratio

This solution works for me.
Or use the "sync" fs mount option.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

(In reply to comment #502)
> echo 10 > /proc/sys/vm/vfs_cache_pressure
> echo 4096 > /sys/block/sda/queue/nr_requests
> echo 4096 > /sys/block/sda/queue/read_ahead_kb
> echo 100 > /proc/sys/vm/swappiness
> echo 0 > /proc/sys/vm/dirty_ratio
> echo 0 > /proc/sys/vm/dirty_background_ratio
>
> this solution work for me.
> or use "sync" fs-mount option.

Yeah, but testing a kernel patch with those settings is not good for seeing its effects.

Revision history for this message
In , pedrib (pedrib-linux-kernel-bugs) wrote :

(In reply to comment #501)
> I tried the patch rebased for 2.6.35
> https://bugzilla.kernel.org/attachment.cgi?id=27314
>
> It is problably ok, byt my first test is to fill my memory with all apps I
> can
> find and then run "stress -d 1". And as expected it started paging stuff out.
> You other guys must have the exact same problem, at least you Pedro. To me
> the
> responsiveness drop because of paging out.

Hi Søren,

as said in my comment, I do have the responsiveness drop, but I don't think that is a bug. If you are swapping to a slow disk, that is kind of expected. However, what is not expected is a complete loss of responsiveness, with the UI hanging, even if only for a few seconds.

I find that the mentioned patch improves this situation a lot versus the vanilla kernel. Of course, the best option yet is to disable swap, but for me 4GB of RAM is not enough...

Revision history for this message
In , alpha_one_x86 (alphaonex86-linux-kernel-bugs) wrote :

I too have responsiveness problems on Linux when I do large file copies.
Other OSes stay very responsive when doing multiple file copies, but not Linux.
Windows does all its IO asynchronously (no sync possible, as noted in the Qt docs); why not have the same option in the Linux kernel?

Revision history for this message
In , pedrib (pedrib-linux-kernel-bugs) wrote :

After testing the patches intensively, I have to say that although they do improve the situation, they do it only slightly. I guess the best solution is still disabling swap.

Also, what's the idea of having a swappiness tunable if it doesn't work? I can set it to 0, and even though I have only 70% of physical memory in use the system starts swapping to disk.

Revision history for this message
In , rsarraf (rsarraf-linux-kernel-bugs) wrote :

(In reply to comment #506)
> After testing the patches intensively, I have to say that although they do
> improve the situation, they do it only slightly. I guess the best solution is
> still disabling swap.
>

It does help initially but not always. Under memory crunch, I found my laptop completely unresponsive even though swap was off (RAM is 3GiB)

> Also, what's the idea of having a swappiness tunable if it doesn't work? I can set it to 0,
> and even though I have only 70% of physical memory in use the system starts swapping to disk.

That's weird. On my box, it does work the way it is designed. I have overall concluded that the default value of 60 is correct. If there is a buggy application, that should be fixed. I wouldn't be interested in OOMs on my box.

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

Memory count actually drops when the system becomes unresponsive during copying of a large file, if a bunch of small files was copied immediately before.

Revision history for this message
In , peterhoeg (peterhoeg-linux-kernel-bugs) wrote :

I've added some information on the Ubuntu bug page, but will add it here for completeness' sake:

1) I'm seeing this problem extremely frequently due to an unrelated bug that makes X leak memory.

2) On a machine with 4GB memory and no swap, the disk starts thrashing like crazy when 60-70% of the memory is used. It's so bad that I can't even log in on a console as getty times out before I get a chance to enter the password.

3) If swap is enabled on the same machine, it will start swapping out. Doing a "swapoff -a" will force the swapped pages back in as planned, but it happens at approximately 500 KB/s.

Revision history for this message
In , frankrq2009 (frankrq2009-linux-kernel-bugs) wrote :

I compiled the new 2.6.36 kernel today and found this bug is REALLY fixed on my notebook! Copying a 700 MB movie to a USB disk became very smooth and quick, GUIs are very responsive, much better than 2.6.35.4 (the last kernel of Zenwalk). Just like someone said, the angels are singing again! Congratulations! Great work! Long live Linux!

Revision history for this message
In , mihel (mihel-linux-kernel-bugs) wrote :

I'm not seeing this issue on 2.6.36 amd64, 4 GB RAM, 3 GB swap, swappiness 20.
Running 'stress -d 1' and browsing websites for 15 minutes with no issues.

Revision history for this message
In , vi0oss (vi0oss-linux-kernel-bugs) wrote :

2.6.36-zen0-00214-g665fe96

Still seeing page faults of about 1 second when running "stress -d 1" or "pv /dev/zero > qqq".

Swap is off.

This:

> echo 10 > /proc/sys/vm/vfs_cache_pressure
> echo 4096 > /sys/block/sda/queue/nr_requests
> echo 4096 > /sys/block/sda/queue/read_ahead_kb
> echo 100 > /proc/sys/vm/swappiness
> echo 0 > /proc/sys/vm/dirty_ratio
> echo 0 > /proc/sys/vm/dirty_background_ratio

does not help.

Uniprocessor system, i386. 1.5G of RAM. 1G of it was in use by applications when testing.

Revision history for this message
In , loki (loki-linux-kernel-bugs) wrote :
Download full text (5.3 KiB)

Just wanted to add my two cents, since I've been experiencing this problem for a very long time now on various machines. I just adapted by doing nothing on the OS when I have large file copies running. But somehow I may have stumbled upon a solution for this. I had these problems, the one you are talking about in this bug and some others, after I started using MD-RAID. First I thought it was something with the IO scheduler. Tried all the schedulers there are: noop, CFQ, Deadline, Anticipatory... Some helped a little bit, some didn't. Then I thought it was something with the FS; tried ext2, ext3, XFS and now ext4. The same problem prevailed. When I started copying large files I had OS "hiccups". Everything that had to do some disk work stopped. Music and OpenGL were still functioning normally; only the responsiveness of the system was gone for 1 or 2 seconds. No browsing, no changing terminal windows. Then I thought that it had something to do with swap, too.

A few days ago I got myself a new machine, i7/950, 2 x SATA3 WD HDs, 12 GB of RAM, and I installed a new OS, pure 64-bit, kernel 2.6.36. The thing I had to do was copy my old data to the new disks and reuse the old disks. Now, the way I did it is very important. I took a 1 TB WD SATA3 HD, made some partitions (6 to be exact) and compiled a new OS. Then I copied the old data from the old RAID. The old RAID was 4 partitions on each disk with MD RAID 1 on two partitions each. While I copied the data I had these hiccups also, with the new system.

I had this idea, since it is now possible to make a partitioned RAID with MD and you can take whole disks for an array, to make a RAID 10 out of these four disks, 2 new ones and 2 old ones. So it was like "mdadm --create /dev/md0 ... --raid-devices=4 /dev/sda /dev/sdb..."
Worked like a charm. Then I partitioned the array with "fdisk /dev/md0". No problem there. Then I copied the old stuff from the single hard disk, with 6 partitions, to the new array. Now here is the interesting bit. No hiccups!!! Throughput was around 120 MB/s and the OS was working as smoothly as a baby's bottom. And it was the same OS, no changes at all regarding the kernel build or anything else. Read throughput was 270 MB/s (dd test). But since rootfs won't work on a partitioned MD array (some kernel racing problem, but that's another story) I had to change my setup on the new HDs. So again I created 4 normal partitions on each disk: one from each HD for bootfs RAID 1, another 4 for swap, another 4 for the rootfs, also RAID 1, and the last four for RAID 10, which I partitioned into two separate partitions (srv and home). And the hiccups came back. So this isn't hardware related, because I can reproduce this problem on a lot of hardware. A list will follow. It's not the file system either, because I tried them all. It's not swap, because on this new machine it didn't start to swap while I was copying. But this problem always comes up when I make more partitions (normal ones) for MD-RAID.

The list of Hardware:

Quad-Core 6600, I think it was an ICH7 chipset, 8 GB RAM, 2 x WD10EARS. I think the kernel was 2.6.20-something, 32-bit system, LinuxFromScratch 6.1 or 6.2. Can't remember. The system worked for three yrs to...

Read more...

Revision history for this message
In , michiel (michiel-linux-kernel-bugs) wrote :
Download full text (3.6 KiB)

To tackle this bug, there needs to be deep digging by the people who have it, or good debug data has to be generated. And good information has to be given about the system.

There can be several bugs out there with the same symptoms as this one. To solve this bug, the best you can do is file individual bug reports with complete information. If you cannot give complete information, don't post that report, because then you can be sure it cannot be solved. The more relevant info we get, the easier it becomes to detect the problems.

First, install the newest kernel, because it has the newest code and that reduces the chance that you'll run into an old, already-fixed bug. At the time of writing that's 2.6.36. Then test again; if it still happens, file a bug report.

First give correct system information:
Kernel: uname -a and cat /proc/version
Architecture: also from uname -a
Distro: name and version (could be handy for distro specific patches)
CPU info: cat /proc/cpuinfo | grep -e '\(model name\|bogomips\|MHz\|flags\)'
Mem info: cat /proc/meminfo | grep MemTotal
IO scheduler used: cat /sys/block/sdX/queue/scheduler

Hard disk configuration: whether RAID is used, type of disks, speed of the disks, partitions used and filesystems used

Hard disk speed via hdparm:
hdparm -tT --direct /dev/sdX
hdparm -tT /dev/sdX

give dumps of the following commands:
lshw
dmesg
lsmod
cat /proc/swaps
cat /proc/meminfo
cat /proc/cmdline
cat /proc/config.gz | gunzip -

and give dumps of the following files:
for every disk:
 /sys/block/<disk>/queue/*
 /sys/block/<disk>/queue/iosched/*
/proc/sys/vm/*

This is for information, so the developers can see what configuration the system has. And if there are known configurations or drivers which are bad and may give the same symptoms, they will be noticed earlier.

If you want a script to help you collect the information, you can use the one located at http://github.com/meghuizen/systeminfo, which will build a tar.bz2 that you can give as an attachment, so you'll have complete information.
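If that script is not available, a rough sketch of collecting roughly the same information by hand (DISK is a placeholder for your device; run as root):

#!/bin/sh
DISK=sda
OUT=sysinfo-$(uname -r)
mkdir -p "$OUT"
uname -a          > "$OUT/uname.txt"
cat /proc/version > "$OUT/version.txt"
cat /proc/cpuinfo > "$OUT/cpuinfo.txt"
cat /proc/meminfo > "$OUT/meminfo.txt"
cat /proc/swaps   > "$OUT/swaps.txt"
cat /proc/cmdline > "$OUT/cmdline.txt"
dmesg             > "$OUT/dmesg.txt"
lsmod             > "$OUT/lsmod.txt"
# one "file:value" line per sysfs/procfs entry; unreadable files are skipped
grep . /sys/block/$DISK/queue/* /sys/block/$DISK/queue/iosched/* > "$OUT/queue.txt" 2>/dev/null
grep . /proc/sys/vm/* > "$OUT/vm.txt" 2>/dev/null
tar cjf "$OUT.tar.bz2" "$OUT"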

After that, learn a bit about the I/O scheduler, to make it easier for yourself to debug and understand the situation:
  - http://www.linuxjournal.com/article/6931 (info on I/O schedulers)
  - http://www.devshed.com/c/a/BrainDump/Linux-IO-Schedulers/
  - http://kerneltrap.org/node/7637
  - kernel-source/Documentation/block/iosched-description.txt (see: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=tree;f=Documentation/block;hb=HEAD)
  - http://www.westnet.com/~gsmith/content/linux-pdflush.htm
  - http://www.docunext.com/blog/2009/10/debugging-and-reducing-io-wait.html

There are some tools which are very handy to use. The Linux perf tool, for example, is very handy for debugging slowness and latencies in your system.

For some documentation on perf see:
  - https://perf.wiki.kernel.org/index.php/Main_Page
  - http://anton.ozlabs.org/blog/2010/01/10/using-perf-the-linux-performance-analysis-tool-on-ubuntu-karmic/
  - http://blog.fenrus.org/?p=5

perf --help also gives you a lot of information.

And other profiling tools:
  - http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/basic_...

Read more...

Revision history for this message
In , vi0oss (vi0oss-linux-kernel-bugs) wrote :

Tried with recent official master: 18cb657ca1bafe635f368346a1676fb04c512edf

http://vi-server.org/vi/12309_report/linux-2.6.36-09212-g18cb657_i686-sysinfo.tar.bz2

While running "pv /dev/zero > qqq" (http://vi-server.org/vi/12309_report/fill.txt), after about 2 GB I get pagefaults: http://vi-server.org/vi/12309_report/pagefault.txt http://vi-server.org/vi/12309_report/pagefault2.txt

If I try deadline or noop scheduler, I still get pagefaults, but after about 5 GB of copied data (and probably not that often)

With cfq the speed jumps between 10 MB/s and 200 MB/s.

With deadline or noop it is more stable, around 40 MB/s.

Trying
> echo 10 > /proc/sys/vm/vfs_cache_pressure
> echo 4096 > /sys/block/sda/queue/nr_requests
> echo 4096 > /sys/block/sda/queue/read_ahead_kb
> echo 100 > /proc/sys/vm/swappiness
> echo 0 > /proc/sys/vm/dirty_ratio
> echo 0 > /proc/sys/vm/dirty_background_ratio
on this kernel leads to a low filling speed (lower than 10 MB/s, measured with pv).
Also, after applying those settings, applications (starting with gpg2) begin to hang in uninterruptible sleep. I cannot stop the filling process (probably it hangs too).

P.S. With this kernel I also cannot start the X server.

If somebody wants, I can try other settings, other kernel revisions, patches, or other configs.

Revision history for this message
In , vi0oss (vi0oss-linux-kernel-bugs) wrote :

Checked a bit more with CONFIG_HZ_100 and CONFIG_PREEMPT_NONE: the same.

Filling rate with vm.dirty_ratio=0 is 1 MB/s (with periodic stalls of everything).

If I set vm.dirty_ratio to 1, it rises to 40 MB/s (stable). Long page faults when loading programs are present as well.

Was testing with only 200 MB (of 1.5G) of memory filled.

Revision history for this message
In , Bug (bug-redhat-bugs) wrote :

This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 12 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

While it feels like a general improvement with 2.6.36 (no audio stutter with swap, and building a kernel no longer drags the system down (and fills up cache) like it did with 2.6.35), I still see cursor jerkiness when I first log in and start loading Firefox, Evolution and Pidgin (all at the same time).

Revision history for this message
In , Bug (bug-redhat-bugs) wrote :

Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Revision history for this message
In , snigurmd (snigurmd-linux-kernel-bugs) wrote :

I've come to face this problem when using the new cgroup-scheduler patch.
PC: Samsung NC10 netbook, kernel 2.6.36 vanilla, Zenwalk snapshot.
When trying to upgrade some packages in an X session and browsing the net at the same time, the latency increases badly, but not constantly, just in hitches. If I stop surfing the net and return to my package manager, the system keeps working; otherwise it may hang so badly that I have to reboot with a SysRq key.
If I turn off the cgroup scheduler in /sys, everything works fine.
The kernel is compiled with full preemption and a 1000 Hz timer.

Revision history for this message
In , vi0oss (vi0oss-linux-kernel-bugs) wrote :

Trying 162253844be6caa9ad8bd84562cb3271690ceca9 from zenstable/io-less-dirty-throttling-2.6.37 - the same.

Page faults of random processes (including Xorg) jump over 1 second while "pv /dev/zero > qqq".

The speed measurements by "pv" fluctuate (from 64 kB/s to 120 MB/s; avg 40 MB/s), just like on the usual 2.6.35-zen2.

Revision history for this message
In , anonymous (anonymous-linux-kernel-bugs) wrote :

Reply-To: <email address hidden>

I'm currently Out Of Office. I'll be responding to emails, but expect some delay in replies.

For any urgent issues, please contact my manager, Kugesh Veeraraghavan <email address hidden>

Changed in linux (Ubuntu):
assignee: Colin King (colin-king) → nobody
Revision history for this message
In , dik_again (dikagain-linux-kernel-bugs) wrote :

I have a reproducible test sequence for bug 12309. It's easy:

Take a _SCRATCHED_ DVD. Put it into the drive and copy all files on it to a HDD. The bug comes early :)

The system freezes COMPLETELY at the moment the drive reads the scratched sectors.

Distro: Arch

Linux linuxhost 2.6.36-ARCH #1 SMP PREEMPT Fri Dec 10 20:01:53 UTC 2010 i686 AMD Athlon(TM) XP AuthenticAMD GNU/Linux

Drive (dmesg |grep TSS)

Feb 14 20:11:45 linuxhost kernel: scsi 2:0:0:0: CD-ROM TSSTcorp CDDVDW SH-S203B SB00 PQ: 0 ANSI: 5
Feb 10 12:05:36 linuxhost kernel: ata1.00: ATAPI: TSSTcorp CDDVDW SH-S203B, SB00, max UDMA/100

SATA-Controller (on the PCI-bus, drive connected to it):

00:0a.0 RAID bus controller: VIA Technologies, Inc. VT6421 IDE RAID Controller (rev 50)

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #521)
> I have a reproducible test sequence for a 12309. It's easy:
>
> Take a _SCRATCHED_ DVD. Put it into the drive and copy all files on it to a
> HDD. The bug comes early :)
>
> The system freezes COMPLETELY at the moment the drive reads the scratched sectors.

I suspect this has more to do with the IDE bus than with the interaction between the kernel's block layer and the VM.

Try this:
dd if=/dev/dvd of=/dev/null bs=2048

I bet you get the same freezes when it reaches the scratches.

Revision history for this message
In , dik_again (dikagain-linux-kernel-bugs) wrote :

I checked the same DVD with another DVD drive (this drive is on the IDE bus, not on the SATA bus). All was OK. No freezes at all. Any ideas? Is this another bug?

Revision history for this message
In , dik_again (dikagain-linux-kernel-bugs) wrote :

>Try this:
>dd if=/dev/dvd of=/dev/null bs=2048

>I bet you get the same freezes when it reaches the scratches.

You're right.

But this is still the 12309 bug, isn't it?

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #524)
> But this is still the 12309 bug, isn't it?

No.

However, this bug report has turned into a dumping ground for anyone experiencing any lagginess, regardless of cause. The actual bug here is related to the kernel preferring to evict memory-mapped executable pages when a process dirties blocks faster than they can be flushed to disk. The apparent hangs in responsiveness are due to threads (particularly GUI threads) triggering page faults and being unable to make progress until their code is re-fetched from disk. The fix should be to block the writing process from dirtying any more blocks well before the kernel starts evicting mapped executable pages from memory, but so far no one has been able to make it work correctly in all cases (afaik).

Revision history for this message
In , dik_again (dikagain-linux-kernel-bugs) wrote :

Also, should I make a new bug report for my bug?

Revision history for this message
In , vi0oss (vi0oss-linux-kernel-bugs) wrote :

Trying kernel from writeback/dirty-throttling-v6

Nothing seems to be changed, as usual. Still lengthy "Page Faults" (and others) for firefox-bin while "pv /dev/zero > qqq".

Should I provide more info about dirty-throttling-v6? (And how should I collect it?)

Revision history for this message
In , jaroslaw.fedewicz (jaroslaw.fedewicz-linux-kernel-bugs) wrote :

> The actual bug here is
> related to the kernel preferring to evict memory-mapped executable pages when
> a
> process dirties blocks faster than they can be flushed to disk.

Okay.

Let it be so. However, the subject line for this bug is

> Large I/O operations result in poor interactive performance and high iowait
> times

and that's what I'm experiencing now, rsync'ing 100 GB worth of data with almost everything already there on the receiving side (thus making the receiving rsync read files heavily for the checksums). And I am dead sure this has nothing to do with virtual memory, as swap is completely off (I would probably need to compile a different kernel with no support for swapping to reconfirm). iowait rises to 90%, LA shows disturbingly large numbers of up to 20, and unrelated processes like Xorg freeze, taking around 15 seconds to redraw the screen or move the mouse cursor or whatever.

What I thought this bug was about is that while one process does overwhelmingly large volumes of I/O, it should by no means impact other, unrelated processes which might not even use the disc subsystem, or not use the same disc. At least this is what Mac OS X does: for example, Transmission preallocates space for 40 GB worth of torrent data, naturally freezing in the process and ceasing to respond to any events, but then again, I can minimise its window, type code in Eclipse or anything — barely noticing the disc thrashing. I think I'm reiterating this example for the umpteenth time here, sorry if that's the case.

If I'm wrong and bug #12309 has been reduced to its VM part, I just ask which bug is about the above problem — high iowait affecting unrelated processes, with no swapping involved. Is that #13347? I cannot follow it because the submitter uses a dialect of English I'm not quite capable of parsing. If there's no specific bug, I'll take the time to report it, because it bugs me a great deal, though I'm afraid I'll have to repeat most of the tests already conducted here.

Please don't take it as if I'm trying to offend anyone, because I'm not. I just want to know where the specific symptom described above belongs.

Thank you all for every effort to have it resolved.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :
Download full text (3.1 KiB)

@Yaroslav: Your misconception is that having swap disabled means that memory pages are never backed by disk blocks. That is simply not true. All it means is that *anonymous* pages cannot be backed by disk.

All Linux kernels launch processes from disk (via execve(2)) by memory-mapping the executable image on disk and then jumping to the entry point address in the mapped image. Since the entry point address is in a non-resident page, the CPU's attempt to fetch an instruction from it triggers a page fault, which the kernel then handles by loading the needed page (and usually several more) from disk.

When physical memory becomes scarce, the kernel has several tricks it may employ to attempt to free up memory. One of the first of these tricks is dropping cached blocks from the block layer and cached directory entries from the file system layer, which means that those blocks and dentries will have to be fetched from disk the next time they are accessed. One of the last tricks the kernel has is the OOM killer, which selects the "most offending" process and KILLs it in order to reclaim the memory it was using.

Somewhere in between those two tricks, the kernel has another trick it attempts for freeing up physical memory. It can force memory pages out to disk. If the system has swap enabled, the kernel may force anonymous pages (e.g., process heaps and stacks) out to disk. In all cases, however, the kernel may also choose to force memory-mapped pages out to disk. If those memory-mapped pages are read-only (such as is the case with executable images), then "forcing them out to disk" really just means dropping them from physical memory, since they can always be fetched back in later.

So, what does this mean in the context of this bug? The process that's hitting the disk a lot (usually it's dirtying blocks, but maybe it's possible that this happens even if it's just reading blocks) causes RAM to fill up with disk blocks. The kernel starts attempting its tricks to free up physical memory. One of those tricks is dropping memory-mapped pages from RAM, since they can always be fetched back into RAM from disk later. Then you the user switch applications or click on a button in the GUI or try to log into an SSH session, and what happens? Page fault! The code for repainting the X11 window or handling the button click or spawning a login session is not resident in memory because it was forced out by the kernel. That code now must be refetched from disk to satisfy the page fault, but uh oh, the disk is VERY busy and has very long queue depths, so it will be a while before the needed pages can be fetched. And at the same time as those pages are being fetched, the kernel is evicting other memory-mapped pages from RAM, so the responsiveness problem is just going to persist until the pressure on RAM subsides.

Ideally, the kernel should not allow so many blocks to be dirtied that it has to resort to dropping memory-mapped pages from RAM. The dirty_ratio knob is supposed to control how much of RAM a process is allowed to fill with dirty blocks before it's forced to write them to disk itself (synchronously), but that does not appear to be working p...

Read more...
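One way to watch this eviction/re-fetch cycle while it is happening (a sketch; <pid> is whichever interactive process is stalling, e.g. the X server or a shell):

watch -n1 'grep -E "pgmajfault|nr_dirty|nr_writeback" /proc/vmstat'
ps -o pid,maj_flt,min_flt,comm -p <pid>    # major faults climbing during the stalls means code is being re-read from disk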

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

Incidentally, one reason this bug seems to manifest a lot more on 64-bit systems than on 32-bit systems is that 64-bit systems use Position-Independent Code (PIC) in their shared libraries universally, whereas 32-bit systems usually don't. Not using PIC means that 32-bit systems usually have to perform relocations throughout their shared libraries upon memory-mapping them, and those relocations cause private (anonymous) copies of those pages to be created, and those anonymous pages cannot be forced out to disk on systems without swap, so accessing those pages can never cause page faults. On 64-bit systems, PIC virtually eliminates the need to perform relocations in shared libraries, meaning most mappings of shared-library code are directly backed by the images on the disk and thus *may* be forced out of RAM and *may* cause page faults. In principle, using PIC (on 64-bit systems, which have new addressing modes to make it efficient) is a good idea because it means only one copy of a library needs to be in RAM, regardless of how many processes map it, rather than one relocated, private copy for each process, but because of this bug, *not* making private copies of the library code is what's killing us, as the only copy we have in memory is evictable. Please note, I am not arguing that the kernel should be making private copies of all executable pages; that would be the wrong solution. A better solution would be to prevent processes from dirtying so much RAM that the kernel has to start evicting pages that were memory-mapped by execve or dlopen (but not by plain old mmap!).

Revision history for this message
In , jaroslaw.fedewicz (jaroslaw.fedewicz-linux-kernel-bugs) wrote :

Thanks for the prompt reply and the patience to explain these things,

but then there's one more misconception on my side in desperate need of debunking. And it's about the I/O queues.

This misconception starts from a suggestion that not all data are equal. For example, non-resident executable pages are tier-0. I/O buffers for application usage like those for read(), write() and friends are tier-1. If there are no priorities on the queue, we cannot tell the origins of I/O requests apart and thus get what we have: swapping a process in has to wait until the queue is emptied by a disk-hungry application beast which just happened to fill it up.

If we prioritize the queue and find a way to tell swap-in reads from application reads (say), on the other hand, it might improve interactive responsiveness. And the expense of having a tiered queue might be mitigated by employing it only on media which have at least one mmap'ed process. I say "it might improve things" because the solution is so obvious, in fact, that I have little doubt it has been thoroughly thought through and ultimately rejected.

And I have no doubt that everyone who gets a single line of code accepted and committed into mainline is smarter than me in this respect[1], so this must have popped up a while ago.

[1] I'm no kernel hacker at all, just your average applications developer.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

@Yaroslav: I agree. I've had the same thoughts regarding priority in the I/O queues. The biggest problem with this approach is that much of the queues actually sit inside the hardware nowadays. SCSI TCQ (tagged command queuing) and SATA NCQ (native command queuing) have exacerbated this. The Linux kernel can't do anything to prioritize queues inside the hardware, but it can limit how much of the hardware queue it will use, thus effectively keeping the queue in software only. Some proposed workarounds to this bug 12309 involve reducing the depth of the hardware queue that Linux is allowed to use, and that does seem to improve the worst case, although it severely degrades the common case.

Another workaround might be to prevent the kernel from evicting executable memory-mapped pages in the first place. This would be only a partial solution, though, as applications often memory map resources that are not executable (for example, fonts, pixmaps, databases), so their responsiveness could hang on page faults for those resources just as readily as on page faults for code.
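For the record, the knob for that queue-depth workaround is per device (a sketch; sdX is a placeholder, and the attribute is only meaningful for devices doing NCQ/TCQ):

cat /sys/block/sdX/device/queue_depth        # current depth, e.g. 31
echo 1 > /sys/block/sdX/device/queue_depth   # effectively disables NCQ for that device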

Revision history for this message
In , jaroslaw.fedewicz (jaroslaw.fedewicz-linux-kernel-bugs) wrote :

You are right about the workaround, but having a queue prioritised would be of help when, despite all workarounds, pages were actually evicted.

I actually imagine it as a 4-tier queue: tier 0 for realtime processes, 1 for swap-ins we are talking about now, 2 for every other virtual memory operations, and 3 for everything else (or count 2 and 3 as everything else, maybe).

My question then will be as follows:

yes, we cannot control how commands are queued once they enter the hardware. But if we happen to know the hardware command queue size (which we do) and if we are able to tell how full it currently is (which I'm not quite sure about, but I think it can be figured out), we could split it so that every tier is permitted to fill no more than some percentage of the hardware queue. It would of course hit average-case performance, but it would still guarantee some bandwidth for higher-tier I/O, which is a good thing IMHO.

Sorry for bugging you, and probably for my ignorance, but I really want this nailed.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

To everyone interested in this bug:

An easy and reliable way to demonstrate the issues surrounding this bug (on a system without anonymous swap) is to mount a tmpfs that is sized as large as your physical RAM. Then start writing to it (slowly!!!). The kernel will be unable to flush those blocks to disk, as they are not backed by disk. As you continue writing to the tmpfs, the kernel will gradually evict everything else in your block cache and file system cache.

At some point, the kernel will have run out of caches to evict and will start evicting memory-mapped pages. You'll know this has happened when the system responsiveness comes to a crawl and your disk starts thrashing. Yes, your disk will thrash, even though you're only writing to a tmpfs. The thrashing is due to all the page-ins of executable pages that are being accessed as various processes on your system struggle to keep executing their background threads and event processing loops.

If your writer process continues writing to the tmpfs, your system will become completely unusable. If you're lucky, eventually the kernel's OOM killer will be invoked. The OOM killer probably won't choose your tmpfs writer as its victim, though, so you'll have only a short time to kill the writer yourself before your system grinds to a halt again. If you do manage to get it killed, you can simply unmount the tmpfs, and everything will return to normal in short order. You will notice a bit of lag the first time you switch back to other applications that were running, as they will trigger page faults to get their code loaded back into RAM, but once that's done, everything will be as usual.
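A minimal sketch of that demonstration (the size and write rate are arbitrary examples; pv -L just slows the writer down so the cache eviction can be watched from another terminal):

mkdir -p /mnt/ramtest
mount -t tmpfs -o size=90% tmpfs /mnt/ramtest
pv -L 10m /dev/zero > /mnt/ramtest/fill &
watch -n1 'grep -E "MemFree|^Cached|Dirty|Mapped" /proc/meminfo'
# when responsiveness collapses, kill the writer and clean up:
kill %1; rm /mnt/ramtest/fill; umount /mnt/ramtest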

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

It would have made sense if only starting new processes were slow. Copying large volumes of data slows down even the mouse cursor, where the Xorg HID driver already sits in memory. If what you've described affects a driver already in memory, the entire architecture has to be abandoned. So to say, a definition of the problem, not an excuse.

Revision history for this message
In , jaroslaw.fedewicz (jaroslaw.fedewicz-linux-kernel-bugs) wrote :

Hm, I then have another wild suggestion.

It is in fact a very rare event that a process needs to hang around in memory but wake up only once in a blue moon, so that it can be harmlessly paged out without bringing the system to a halt. From my desktop experience I can only remember LibreOffice sitting on my long-running machine and actually being used once in two weeks or so.

If the problem is really so grave that an often-running process (like Xorg!) is selected by the kernel to be paged out, why not work around this by disabling the eviction of processes' pages altogether? I think it must be somewhat easier than designing an over-engineered strategy for choosing what pages to throw away, testing it over a couple of years, finding bugs in the very design, throwing it away, designing another one, and so on.

I would love to see a flag which I could set per control group. If the flag is set, pages owned by processes in that cgroup are never swapped out. Combined with pessimistic overcommit policy, it could help at least a bit.

Or at least worth a try.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #535)
> It would have made sense if only starting new processes were slow. Copying large
> volumes of data slows down even the mouse cursor, where the Xorg HID driver already
> sits in memory. If what you've described affects a driver already in memory,
> the entire architecture has to be abandoned. So to say, a definition of the
> problem, not an excuse.

If you're seeing the mouse cursor lag/skip while copying large volumes of data, an alternative explanation could be that you're using PIO mode for your data transfers rather than DMA. However, as you identify, it's possible that the X.org driver that handles the mouse input is indeed being paged out, and that would result in mouse interrupts triggering page faults, and the mouse cursor would not update on screen until the code for doing so had been paged back in.

To say the entire architecture must be abandoned is too extreme. Memory-mapping executable images is a very efficient mechanism that ordinarily works beautifully. This bug is creating pathological conditions that should never occur.

(In reply to comment #536)
> If the problem is really so grave that an often-running process (like Xorg!)
> is
> selected by the kernel to be paged out, why not work this around by disabling
> evicting processes' pages altogether?

You can't do that. Consider a process that maps a 1 TB file into memory and then starts randomly reading from it, thus causing more and more of the file to be loaded from disk into physical memory. You *must* allow pages to be evicted, or you will run out of RAM.

Don't try to solve a problem that doesn't exist. The actual problem here is that the block layer is using too much RAM for dirty (or possibly even clean) blocks. To demonstrate to yourself that this is so, you may try another of the proposed workarounds, which is to mount your file system in "sync" mode, which causes all file writes to be performed synchronously rather than being buffered and written back later. Under that constraint, you will never run into this bug, because the block layer is never allowed to use so much RAM that the kernel starts paging out "hot" memory-mapped pages. (By "hot," I mean pages that are regularly being accessed, such that you would notice if they had to be paged back in from disk.)

Revision history for this message
In , jaroslaw.fedewicz (jaroslaw.fedewicz-linux-kernel-bugs) wrote :

Okay, sync might work, but it also would make filesystems slow as hell and contribute to media wear on another front. If what you say is the case, and I have no reason for disbelief, then there must be a way to limit the number of dirty blocks (and total blocks) which may exist before buffers are flushed. E.g., there's X seconds of commit interval or Y dirty blocks, whichever comes first, and a maximum of Z buffered blocks in total per device or per system. This would be 'almost sync', I think, and it would also solve one more problem with USB flash media.

The problem is that overly large write buffers tend to be flushed at a sub-optimal speed, thus increasing the total time needed to copy and sync the data. Again, this occurs neither with Windows nor with OS X. And they don't mount 'sync'; they buffer writes (which is a good thing with any device with expensive, wear-prone writes), it's just that their buffers are considerably smaller than those of Linux.

I'd be happy to know that a solution for limiting buffer sizes exists; this at least would enable us to fine-tune the system so that in 90% of use cases the problem wouldn't appear, and it would appear only in the cases where it's tough anyway.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

@Yaroslav: There is already a knob for tuning the maximum amount of RAM that may be used for holding dirty blocks.

From Documentation/sysctl/vm.txt:
> dirty_ratio
>
> Contains, as a percentage of total system memory, the number of pages at which
> a process which is generating disk writes will itself start writing out dirty
> data.

The intent is as you describe: asynchronous writing until dirty_ratio is reached, and then synchronous writing only. "dirty_ratio" is 10% by default. You can test if it's working by starting a large write to disk (`dd if=/dev/zero of=/bigfile bs=1M`) and monitoring the "Dirty" counter in /proc/meminfo (`watch grep Dirty /proc/meminfo`).

For what it's worth, it does work for me (and I haven't seen this bug manifest on my system in quite a while). I'm running Linux 2.6.36-gentoo-r5. I can still get the unresponsiveness and disk thrashing to happen using the tmpfs test case I described in comment #534, but that's not a failing of the kernel; that's a failing of the user (filling a tmpfs too much).

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

(In reply to comment #537)
> an alternative explanation could be that you're using PIO mode for your data
> transfers rather than DMA. However, as you identify, it's possible that the

Excuse me, I am using PIO, you say? That would be like, specifically configuring the kernel to use PIO? Why would anyone do that?

[ 1.101092] ata2.00: ATA-7: WDC WD3200KS-00PFB0, 21.00M21, max UDMA/133
[ 1.101205] ata2.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 1), AA
[ 1.102146] ata2.00: configured for UDMA/133

[ 2.191312] ata13.00: ATA-7: ST3160215A, 3.AAD, max UDMA/100
[ 2.191343] ata13.00: 312581808 sectors, multi 16: LBA48
[ 2.266143] ata13.00: configured for UDMA/100

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #540)
> That would be like, specifically configuring
> the kernel to use PIO? Why would anyone do that?

The kernel can fall back to PIO mode if DMA mode is encountering problems (which can happen with faulty hardware). It happens with CD/DVD drives more often than with hard drives.

The next time you encounter system sluggishness and the mouse cursor starts skipping, see if you can get a readout of /proc/meminfo (while the sluggishness is happening). If your "MemFree" is very low *and* your "Cached" or "Dirty" is very high, then you might be suffering from this bug.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

dirty_ratio is not really a good measure of when to start flushing to disk. On a 24 GB system, even 1% may be more than your disks can handle. It's better to configure dirty_bytes and dirty_background_bytes. dirty_bytes applies to the process which is doing the IO, and dirty_background_bytes applies to the kernel flush threads. When these thresholds are hit, if the sum total of IO happening in the system is at a rate higher than your disks can take, you will start seeing the initial symptoms of this bug. The overall flow has been described well by Matt. I think this is precisely what's happening.

One way to avoid the issue would be to set dirty_bytes and dirty_background_bytes in such a way that their sum total stays within a reasonable ratio of your disk's sequential bandwidth. When a Linux system is in steady state with a reasonable uptime, it will likely use all RAM for read-side caches. It will free those up on demand when it comes under memory pressure (which may be created by large IO). By keeping (dirty_bytes + dirty_background_bytes) a multiple of your disk's raw speed, you can put a bound on the overall latency of the system. For example, I don't let dirty memory go beyond 200 MB on my laptop. It makes all my sequential operations bound by the sequential speed of the disk, but lets small random IO be buffered (so it's better than the "sync" mode of the FS in that sense).
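A sketch of that tuning (the values are only examples, sized for a disk that writes on the order of 100 MB/s; note that setting the *_bytes knobs makes the corresponding *_ratio knobs ignored):

sysctl -w vm.dirty_background_bytes=67108864   # ~64 MB: background flusher threads start writing back
sysctl -w vm.dirty_bytes=201326592             # ~192 MB: the dirtying process itself must write synchronously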

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

And can we find a solution that would apply in the case where the system is running out of free RAM and starts swapping out everything? I have often experienced total unresponsiveness of both X and the consoles when a program tries to use more RAM than is available, and I wasn't even able to kill the process manually (forced reboot). Maybe that should be considered a pathological case requiring just the OOM killer to be more aggressive - I don't know.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #543)
> And can we find a solution that would apply in the case where the system is
> running out of free RAM and starts swapping out everything? I have often experienced
> total unresponsiveness of both X and the consoles when a program tries to use
> more RAM than is available, and I wasn't even able to kill the process
> manually (forced reboot). Maybe that should be considered a pathological
> case requiring just the OOM killer to be more aggressive - I don't know.

If you have the Magic SysRq key enabled in your kernel, you could do AltGr+SysRq+F to invoke the OOM killer manually.

I do agree in principle, though, that the offending process should be denied the allocation of any additional memory before any frequently used memory-mapped pages start getting evicted from RAM.

One possible solution might be to set a threshold for the minimum number of memory-mapped pages that the kernel must allow to remain in RAM. As an example, setting such a knob to 100000 would mean that the kernel would not evict any memory-mapped pages if fewer than 100000 memory-mapped pages were resident in RAM. Assuming that the kernel uses a least-recently-used eviction policy, this would prevent the debilitating thrashing scenario that occurs when essentially all memory-mapped pages have been and continue to be evicted.
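For reference, a sketch of triggering that by hand (the first line is only needed if the distribution ships with SysRq disabled or restricted):

echo 1 > /proc/sys/kernel/sysrq    # enable all Magic SysRq functions
echo f > /proc/sysrq-trigger       # same as Alt+SysRq+F: invoke the OOM killer once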

Revision history for this message
In , jaroslaw.fedewicz (jaroslaw.fedewicz-linux-kernel-bugs) wrote :

(In reply to comment #544)
> Assuming that the kernel uses a least-recently-used eviction
> policy, this would prevent the debilitating thrashing scenario that occurs when
> essentially all memory-mapped pages have been and continue to be evicted.

Given the fact that Xorg all too often falls victim to that, and it is active most of the time, I cannot help but assume something is wrong with the kernel's definition of "least recently used."

By the way, setting vm.overcommit_memory to 2 and overcommit_ratio to 80 seems to at least somewhat reduce the problem; the same rsync command which has triggered this bug (or similar bug if you prefer) now behaves a lot better, letting me type these words.

Revision history for this message
In , Andrej (andrej-redhat-bugs) wrote :

This bug is still present (in version F14):
2.6.35.6-48.fc14.x86_64

Copying big files is fast at the beginning but gradually becomes slower and then stops near the end of the file; after a few moments (minutes or so) it continues and finally finishes.

What should I do to gather more details?

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

I find that the amount of slowness strongly depends on the writing driver.
Today I had to evacuate a Win7 machine onto Fedora 14, and copying from NTFS to ext3 was painful. Now I am returning the files back onto NTFS and there is no slowdown at all. Dig into the ext3 filesystem; it should be in the writing code.

Revision history for this message
In , vesok (vesok-linux-kernel-bugs) wrote :

This seems to be a hardware related issue, at least in some cases.
Can the other people experiencing it confirm whether they have a WD Green hard disk?
Google search for "wd15eads firmware" reveals quite a few people having similar problems.
I have one of these hard disks and I was using it on a fanless VIA Samuel 2 (pre-686) CPU and I was seeing the high IOWait problem and associated poor performance. When I put the same hard disk in a dual AMD opteron it had the same problem.
Then I did a full backup and restore on a different hard disk. It is the same debian system on the same VIA cpu but now the high IOWait times are gone and the performance is adequate for the CPU.
I should point out that the kernel should not suffer poor overall performance during disk I/O even on flakey hardware, especially with swap disabled.
The offending hard disk is now blanked. I can run a few tests with it if somebody is interested.

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

Blaming hardware is the lamest practice in the IT world, and it surely earns those who practice it a great deal of disrespect.

Revision history for this message
In , powerman-asdf (powerman-asdf-linux-kernel-bugs) wrote :

(In reply to comment #548)
> Blaming hardware is the lamest practice in the IT world, and it surely earns those
> who practice it a great deal of disrespect.

Vesselin Kostadinov doesn't blame hardware; he says this bug (or one of the bugs discussed here) is hardware-dependent. I can confirm this too: initially I used a Barracuda 7200.10 320GB ST3320620AS, then I tried to replace it with a Seagate Barracuda LP 2TB without success (nothing changed), then I replaced it with a Samsung HD103UJ 1TB and this helped a lot - the bug is still noticeable, but very rarely, and it has much less impact on overall system performance. You can find more details about this in my comments on bug 13347.

Revision history for this message
In , loki (loki-linux-kernel-bugs) wrote :

Regarding the WD Green (EADS) disks: it has something to do with disk geometry. We had some problems with them as well; we have some 30 of them. But actually it's not a problem, it's more an RTFM thingy. I think there's something on the WD site, not sure. To partition these disks under Linux / Windows XP (Win 7 does it automagically) you have to use fdisk -H 224 -S 56 /dev/sd...

You can read my comment at https://bugzilla.kernel.org/show_bug.cgi?id=12309#c513
Two of the disks are green WDs partitioned with this fdisk method. Until then I also had problems with speed, where the HDs only had a throughput of 2-5 MB/s. After the fdisk trick I had a throughput of up to 100 MB/s. But again, the problem with this bug is not throughput; it's that if you start a big file copy, like dd if=/dev/zero of=test.img bs=1M count=5000, your desktop comes almost to a halt. But after some time I think this isn't even a bug, it's more a new kernel queueing methodology. After entering this:

vm.swappiness=1
vm.dirty_background_ratio=1
vm.dirty_ratio=1

into sysctl.conf, I almost don't have this problem anymore. I read a lot about this problem, and as far as I can understand, the new way the kernel works is that, depending on the above configuration, it first puts data into RAM and then writes it to disk (very simplified). So if you have a lot of RAM (in my case 12 GB) and the above configuration is at the default of 40%, then the kernel is putting almost 5 GB as cache into RAM and then writes it to disk. And yes, I have a very fast RAID system, but even with 400 MB/s I have to wait 10 seconds or more while it writes that out to disk. I forgot with which kernel version this started, but I know that I checked it and that my problems with responsiveness started after changing to this new kernel (methodology). So you can say that this is not a bug but merely a kernel configuration matter, because with this new methodology a default vm configuration doesn't work for everyone, especially those with a lot of RAM.
And yes, I would like the old methodology to be integrated again into the new kernels, but until then I'll try to circumvent this problem by understanding and configuring the kernel. The above sysctl configuration is working for me with the setup that I have in my comment #513 in this bug. There are slight hiccups but nothing as severe as earlier, when I couldn't do anything until the file writing finished.
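A quick way to check whether the fdisk -H 224 -S 56 trick (or any other partitioning) actually left a partition 4-KiB-aligned on these Advanced Format drives; 224 x 56 = 12544 sectors per cylinder is divisible by 8, which is why cylinder-aligned partitions end up aligned (a sketch; sdX/sdX1 are placeholders):

cat /sys/block/sdX/sdX1/start    # start sector of the partition; divisible by 8 means it begins on a 4 KiB boundary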

Revision history for this message
In , mihel (mihel-linux-kernel-bugs) wrote :

Sorry for interrupting your research with my naive question, but does this bug have clear steps to reproduce it?

The initial comment says 'starting a new shell takes minutes' after the system is left with dd running for significant time.

But for me shells/browsers etc. take just maybe 1 or 2 seconds longer to start after I have had 'stress -d 1' or 'dd if=/dev/zero of=bigfile bs=1M' running for ~10 minutes (bigfile is 30 GB after my tests; dirty blocks quickly reach ~670 MB (3.67 GB RAM total) and stay there).

The small file test that I accidentally ran with TWO simultaneous bigfile dd processes in the background finished in 0.073s (or is this bad?):

$ dd if=/dev/zero of=/tmp/bigfile bs=1M count=30000 conv=fdatasync & sleep 30 ; time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync
[2] 27953
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.0718053 s, 57.0 kB/s

real 0m0.073s
user 0m0.001s
sys 0m0.001s

dd: writing `/tmp/bigfile': No space left on device
dd: writing `/var/tmp/bigfile': No space left on device
22891+0 records in
22890+0 records out
24002064384 bytes (24 GB) copied, 1211.53 s, 19.8 MB/s
21957+0 records in
21956+0 records out
23022534656 bytes (23 GB) copied, 1189.07 s, 19.4 MB/s

[1]- Exit 1 dd if=/dev/zero of=/var/tmp/bigfile bs=1M count=100000 conv=fdatasync
[2]+ Exit 1 dd if=/dev/zero of=/tmp/bigfile bs=1M count=30000 conv=fdatasync

I'm noticing a loss of interactivity when my RAM gets filled up and swap grows >500 MB, but this bug is not about such a case, is it?

Could it be my HW on latest stable vanilla 2.6.38.2 amd64 (swappiness 20, the rest being defaults)? Or could I have just configured my kernel in some genius way?

[ 2.051391] ata1.00: ATA-8: HITACHI HTS545025B9A300, PB2ZC61H, max UDMA/100
[ 2.054162] ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 2.065605] ata1.00: configured for UDMA/100
[ 2.087958] scsi 0:0:0:0: Direct-Access ATA HITACHI HTS54502 PB2Z PQ: 0 ANSI: 5

$ sudo hdparm -i /dev/sda

/dev/sda:

 Model=HITACHI HTS545025B9A300, FwRev=PB2ZC61H, SerialNo=100408PBNXXXXXXXXXX
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=DualPortCache, BuffSize=7208kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=488397168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=yes: mode=0x80 (128) WriteCache=enabled
 Drive conforms to: unknown: ATA/ATAPI-2,3,4,5,6,7

PS: I'm on ext3

Revision history for this message
In , mihel (mihel-linux-kernel-bugs) wrote :

controller in the previous comment was
        *-storage
             description: SATA controller
             product: Ibex Peak 6 port SATA AHCI Controller
             vendor: Intel Corporation
             physical id: 1f.2
             bus info: pci@0000:00:1f.2
             logical name: scsi0
             version: 06
             width: 32 bits
             clock: 66MHz
             capabilities: storage msi pm ahci_1.0 bus_master cap_list emulated
             configuration: driver=ahci latency=0
             resources: irq:41 ioport:1860(size=8) ioport:1814(size=4) ioport:1818(size=8) ioport:1810(size=4) ioport:1840(size=32) memory:f2727000-f27277ff

Revision history for this message
In , vesok (vesok-linux-kernel-bugs) wrote :
Download full text (3.8 KiB)

OK, the fun continues.

Installed the offending hard disk in another system, booted Fedora 14 live and the drive worked OK:
[root@localhost ~]# dd if=/dev/zero of=/dev/sd_ bs=1M count=4000 conv=fdatasync
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 50.0265 s, 83.8 MB/s

(Replaced /dev/sda with /dev/sd_ in case someone decides to copy/paste the command).

Then I booted Knoppix 5.1.1 (from 2007) and saw the fault. CPU usage was 49.7%wa (dual cpu) and had to interrupt dd because it was taking way too long. Then I tried again with a smaller file:

root@Knoppix:~# uname -a
Linux Knoppix 2.6.19 #7 SMP PREEMPT Sun Dec 17 22:01:07 CET 2006 i686 GNU/Linux
root@Knoppix:~# dd if=/dev/zero of=/dev/sd_ bs=1M count=40 conv=fdatasync
40+0 records in
40+0 records out
41943040 bytes (42 MB) copied, 20.8245 seconds, 2.0 MB/s

Then I booted Fedora again and saw the fault again:
[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.35.6-45.fc14.i686 #1 SMP Mon Oct 18 23:56:17 UTC 2010 i686 i686 i386 GNU/Linux
[root@localhost ~]# dd if=/dev/zero of=/dev/sd_ bs=1M count=40 conv=fdatasync
40+0 records in
40+0 records out
41943040 bytes (42 MB) copied, 20.3055 s, 2.1 MB/s

@ #548 From Zenith88:
Ignoring the possibility of a hardware fault when the evidence points that way surely brings those who practice it a great deal of fruitless debugging and frustration.

@ #550 From D.M.
I don't think it is the "partition starts at the wrong sector" issue. In the dd commands listed above I was writing to the drive as a whole, without messing with partitions at all.
For the sake of it I decided to create a new partition and see what will happen:
[root@localhost ~]# fdisk -H 224 -S 56 /dev/sd_
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x9b81ad16.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
   e extended
   p primary partition (1-4)
p
Partition number (1-4, default 1): 1
First sector (2048-2930275054, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-2930275054, default 2930275054): +10G

Command (m for help): p

Disk /dev/sda: 1500.3 GB, 1500300828160 bytes
224 heads, 56 sectors/track, 233599 cylinders, total 2930275055 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x9b81ad16

   Device Boot Start End Blocks Id System
/dev/sda1 2048 20973567 10485760 83 Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@localhost ~]# mkfs.ext2 -q /dev/sda_
[root@localhost ~]# mount /dev/sda1 /mnt
[root@localhost ~]# dd if=/dev/zero of=/mnt/bigfile bs=1M count=100 conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 77.3839 s, 1.4 MB/s

I guess the perfor...

Read more...

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

You can continue debating hard drives, or look into a comparison of the NTFS vs ext3 code, taking the cue from post #546, which is a reproducible test case. Your call.

Revision history for this message
In , loki (loki-linux-kernel-bugs) wrote :

Oleg: Unfortunately, no clear steps. If you read my comment #513 you'll see that I didn't have any trouble with whole-disk software RAID 10. After that I thought it was something filesystem-related, but I tested ext2 through ext4 and XFS, and this also answers your question, Zenith. Same thing, no matter what.

   And regarding the hardware, it may be that this particular HD is broken, and in Kostadinov's case I even think it is a broken-hardware problem, because on one system (Fedora) it worked and then, after using Knoppix and getting back to Fedora, it didn't. I'm just mentioning the troubles we had with the green WDs, and not only under Linux, until I read about this fdisk thing. Now I have two of them and they didn't give me any trouble when I had them in the whole-disk RAID 10, or when I had an older kernel, or now with the new kernel settings.

   But to get back to the substance. Yes, if you dd several times your RAM's worth onto the HD, the system comes to a halt. With the old kernels it was: "I'm doing dd and the system automagically knows that Firefox or mail or whatever is of higher priority to me than dd, so it slows dd down a bit so Firefox can get some time reading from the HD. Or maybe the queueing was fairer, so all processes got some time hammering the HD; I don't know, I'm not a kernel developer. I'm just a user, and as a user I'm describing the differences between the old and the new kernels." With the new kernel it's not like that; whoever is writing has all the power over the HD. But again, that is more a perception than a fact.

   The difference from earlier, before I tuned the vm settings, is that wa (iowait) was up to 98 and now it's at most 45-50.

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

You can deny reality however much you see fit, it won't change the fact that writing onto ext3 partition causes freeze, while writing to ntfs does not on the same system. And this is not a VM but physical machine. Denial of reality and passing the blame is what's causing this project to sit on its hands for 3 years.

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

There are probably different bugs at stake here, and investigating one doesn't mean denying the other. Please be more respectful of people that try to improve our understanding of the problem instead of ranting.

Just a guess: the ntfs-3g driver uses FUSE, while the ext3 driver is in kernel space. *Maybe* this can explain the difference (ntfs-3g isn't treated as in-kernel with regard to I/O scheduling).

Revision history for this message
In , loki (loki-linux-kernel-bugs) wrote :

I'm sorry if I offended you in any way. Again, I'm not in denial, and I'm not blaming anyone; I'm merely pointing out that it's not only an ext3 problem, because I had the same problem on xfs, and that, as you pointed out, the kernels from 3 years ago didn't have this kind of problem. And by vm I didn't mean Virtual Machine but virtual memory, because I was referring to sysctl.conf (i.e.
...
vm.dirty_background_ratio = 1
vm.dirty_background_bytes = 0
vm.dirty_ratio = 1
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
...
and so on.)

Again, I'm sorry if I have offended you in any way.
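
(For reference, a minimal sketch of how settings like those would typically be applied, assuming they live in /etc/sysctl.conf as described; the values themselves are the ones quoted above, not a recommendation.)

    # reload /etc/sysctl.conf without rebooting, then check what the kernel is actually using
    sudo sysctl -p
    sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_writeback_centisecs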

Revision history for this message
In , mihel (mihel-linux-kernel-bugs) wrote :

Another attempt to narrow down the use case for the issue.
You are not going to get anywhere if you continue reporting issues against all of the different breeds of Linuxes. You never know how Fedora or Knoppix patched the kernel, and you should report issues with their kernels to them instead of posting your observations here.

As I see it, the only way to track down the issue is to use the same version of the VANILLA kernel (preferably the latest) with different build and runtime configs.
I personally have ext3 compiled into the kernel - could that be the reason why I can't reproduce the issue?

Zenith88: would it take you a lot of effort to build the latest unpatched vanilla kernel with ext3 compiled into it and to see if it makes things any better for you?

Revision history for this message
fgr (f-gritsch) wrote :

Whoa, this bug has a very long history....
Are you sure that this is the same problem that I have reported? Because with 10.10 I did not have the problem; the read performance was as good as it is now in Windows 7 on the same PC. It slowed down just after the update to 11.04!

What information do you need to investigate the problem? Can anybody give a hint?

Revision history for this message
Badcam (kiwicameron+launchpad) wrote :

I haven't had this issue since 10.10
Mint 10 is awesome.

Revision history for this message
Maxime Ritter (airmax) wrote : you have got new "show interests" from ladie

I am Nastya, 22 y.o,
I am looking for man to have a strong family.
Please let me know if you are ready :)))
I am on-line now,
my profile is here:

http://sonya201010.com.ua/?message_from=Nastya

Note!
New free services! check info at the site!
( to unsubscribe - please, click link and enter e-mail address .)

Revision history for this message
FriedChicken (domlyons) wrote : Re: since Ubuntu karmic Filetransfer to some USB Drives got realy slow

OT: Is there a way to report spam as in the message above?

Revision history for this message
nomnex (nomnex) wrote :

Yes, send a message to Nastya... Joke aside, tell Maxime to change his password!

Revision history for this message
Brad Figg (brad-figg) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Confirmed → Won't Fix
Revision history for this message
Janusz (yorashtan2) wrote : Re: since Ubuntu karmic Filetransfer to some USB Drives got realy slow

Ubuntu team won't fix this bug as it affects all distributions.

Take a look at this, probably might help:

http://mailman.archlinux.org/pipermail/arch-general/2010-June/014470.html

Revision history for this message
In , Hedayat (hedayat-redhat-bugs) wrote :

This problem still persists in Fedora 16. Again, lowering dirty_ratio and dirty_background_ratio to 2 and 1 respectively (instead of 20 and 10) resulted in a constant 4.5 MB/s copy speed, while with the default settings the speed kept going down... (I stopped it when it was around 1.5MB/s).
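
(A minimal sketch of that workaround, in case anyone wants to try it: these are the standard procfs knobs for the ratios mentioned above, and the values are the ones quoted, not a general recommendation.)

    # lower the dirty-page thresholds for the running system (defaults here were 20 and 10)
    echo 2 | sudo tee /proc/sys/vm/dirty_ratio
    echo 1 | sudo tee /proc/sys/vm/dirty_background_ratio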

Revision history for this message
In , Dave (dave-redhat-bugs) wrote :

This is probably going to get fixed for real in 3.3, but there's a hack that might make things at least slightly better until then. I'll throw it into the next Fedora 16 build.

Revision history for this message
In , Dave (dave-redhat-bugs) wrote :

oh, actually we have that hack in f16 since 3.1.2-0.rc1.1

Revision history for this message
In , Hedayat (hedayat-redhat-bugs) wrote :

IIRC, I experienced the problem on kernel-3.1.2-1.fc16.x86_64 :(
I hope that it'll be at least really fixed in 3.3.

Revision history for this message
In , oleksandr (oleksandr-linux-kernel-bugs) wrote :

It seems to be fixed in 3.2.

Revision history for this message
In , wolfram (wolfram-linux-kernel-bugs) wrote :

> It seems to be fixed in 3.2.

Somewhere in parallel universe I think.

Nothing changed for me on

> Intel Corporation 5 Series/3400 Series Chipset SMBus Controller

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

(In reply to comment #561)
> Nothing changed for me on
>
> > Intel Corporation 5 Series/3400 Series Chipset SMBus Controller

Nor here on my ICH8-based notebook, with 2GiB RAM. If anything, 3.2 seems worse than 3.1 when it comes to the ability of one process to binge out on dirtying pages, and then bring the rest of the system down to a snail's pace.

One consistent example case is unpacking to the local SATA drive an ISO image (using Nautilus, for example) stored on another drive. Compute-heavy processes with little disc access suffer (and even those without any I/O do --- CPU usage shoots right down).

Another one is a kernel build. The file cache goes bananas, and even with no other desktop applications loaded, everything gets paged out and it takes around a minute (in the worst case) for the unlock screen prompt to appear.

Revision history for this message
In , fedora (fedora-linux-kernel-bugs) wrote :

(In reply to comment #561)
> > It seems to be fixed in 3.2.
> Somewhere in parallel universe I think.

There are multiple issues that can lead to a behaviour like the one that is discussed in this bug.

A few patches that went into 3.2 make some situation better. But some problems were still known back then; see http://lwn.net/Articles/467328/

Fixes for those went into 3.3-rc1. Quoting from this week's LWN.net kernel page (I'm quite sure Jonathan won't mind):

"""
There have been some significant changes made to the memory compaction code to avoid the lengthy stalls experienced by some users when writing data to slow devices (USB keys, for example). This problem was described in this article (http://lwn.net/Articles/467328/), but the solution has evolved considerably. By making a number of changes to how compaction works, the memory management hackers (and Mel Gorman in particular) were able to avoid disabling synchronous compaction, which had the unfortunate effect of reducing huge page usage. See this commit (
http://git.kernel.org/linus/a77ebd333cd810d7b680d544be88c875131c2bd3 ) for a lot of information on how this problem was addressed.
"""

IOW: Best to test 3.3-rc and report bugs if there are still issues.

While at it (and with a view from someone that is not very active in this bug tracker): I'd say opening a new bug and mentioning it here in this report might be the best way forward for any remaining issues, as the long history might be misleading/confusing when it comes to solving today's bugs. Just my 2 cent.

Revision history for this message
Adam Porter (alphapapa) wrote :

This article explains that the problem is the Transparent Huge Pages feature of the kernel: http://lwn.net/Articles/467328/

According to this, some of the fixes are in 3.2, and some in 3.3: https://bugzilla.kernel.org/show_bug.cgi?id=12309#c563

This is a horrible bug for desktop use, and for some server use as well. This should be a top priority bug. Ubuntu needs to backport the fixes or consider disabling Transparent Huge Pages in desktop kernels.

Having the entire system freeze for minutes at a time and file copy operations take hours instead of minutes is entirely unacceptable behavior. As Corbet said, it's enough to make users consider the benefits of proprietary operating systems (i.e. Bug #1).

P.S. Automated scripts marking critically important bugs as WONTFIX is also unacceptable behavior.

summary: - since Ubuntu karmic Filetransfer to some USB Drives got realy slow
+ USB file transfer causes system freezes; ops take hours instead of
+ minutes
Brad Figg (brad-figg)
tags: added: precise
Changed in linux (Ubuntu):
status: Won't Fix → Confirmed
Revision history for this message
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.2.0-12.20)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-12.20
Revision history for this message
Adam Porter (alphapapa) wrote :

As I noted, the kernel bugzilla says that some of the fixes for this bug will be in 3.3. Since the request was to test 3.2, I think it's reasonable to assume that 3.2 will not solve the bug, and that fixes will need to be backported to 3.2. And even if 3.2 were to completely address it, the fixes would still need to be backported to earlier, supported kernels, because this is a very serious bug.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: bot-stop-nagging.
Revision history for this message
Brad Figg (brad-figg) wrote :

The upstream patchset has been applied to a version of the Precise kernel. For those wishing to give it a spin to see if it addresses the issue for them, you will find built versions at:

http://people.canonical.com/~bradf/lp500069/

Changed in linux:
importance: Unknown → High
status: Unknown → Confirmed
Revision history for this message
Rui Barreiros (rbarreiros) wrote :

I've been having this problem for more than a year, and it's terribly annoying.

Only lately have I managed to focus on getting rid of it, since the amount of data in my weekly backups is getting huge, and I have to leave the desktop doing its backups at night, only to sometimes arrive in the morning with it still running and be forced to cancel in order to work.

Indeed, as Adam said, this is unacceptable, and this is coming from someone with 12+ years of using Linux as my only OS and being a huge Linux advocate.

I'm going to try this 3.2 kernel today and see how it goes; more news later.

Best regards,

P.S.
My disappointment is not towards Ubuntu, as I believe Ubuntu actually brought Linux to a wider range of users and innovated more in Linux than any other distro, but mainly towards kernel development (which I gave up contributing to at all due to most of their elitist behaviour).

Revision history for this message
Rui Barreiros (rbarreiros) wrote :

Hi there,

As promised, I'm right now using 3.2, but on Ocelot, and apparently the bug is fixed. As I'm writing this I'm copying/deleting about 5 GB on an external USB 2.0 HDD and have had no system lock-ups yet, and the speed is acceptable.

I couldn't install linux-tools and linux-headers due to dependency issues, obviously (although I think linux-headers-3.2.0-13-generic_3.2.0-13.21~lp500069_amd64.deb has a circular dependency; maybe it's a bug?).

I'll try to build all these packages here on Ocelot and start using them to test this better.

Best regards,

Revision history for this message
bth73 (bth1969) wrote :

Wow, this sucks. I too am having the same problem with mint 9x64. How is it that such basic, basic functions can be handled so badly? Ubuntu now sucks and it seems that Mint is no better. Taking over 2 hours to transfer 7.3gigs to a 8gig stick (fat32). USB2 not USB1. What is up? Will we ever have a OS that just works? Sh%^ I'm about to go buy Win 7 or start beta testing Win8 to find a system that can do BASIC FUNCTIONS, IE: open files, manipulate them and move and transfer to different hard drives.
TRANSFERRING AT THE BLAZING SPEED OF 930 KB/SEC.
1TB TRANSFER TO NEW WD 1TB 2.5" USB DRIVE TOOK OVER 19HRS!
WAY TO GO PROGRAMMERS! REALLY YOU SHOULD BE PROUD.

Revision history for this message
bth73 (bth1969) wrote :

It is like designing a car that the wheels and tires fall off every 2 miles. The radio works and the motor runs fine, BUT don't try to go anywhere cause there is no tires or wheels - just axles. Or better yet an airplane with no wings.

Revision history for this message
bth73 (bth1969) wrote :

740 KB/sec.
 sudo apt-get upgrade only hangs with no progress. Probably would only BORK the system anyway.

Revision history for this message
FriedChicken (domlyons) wrote :

Linux 3.2 contains some fixes and Linux 3.3 is said to finally fix it.

@bth73:
Spamming is no solution.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

The problem is really fixed in 3.3rc4. I installed two guest systems on a first-generation SSD. That SSD was used only for the virtualization guests; my system is on a >40000 IOPS SSD.
The first installation was done with kernel 3.2.6, in which the long stalls of up to 10 seconds reappeared, even as bad as in kernels 2.6.2[4-9].
The second installation was done with kernel 3.3rc4. I could even work in another running virtualization guest. It's really great. Thanks to all the people involved in solving this bug.

Revision history for this message
In , oleksandr (oleksandr-linux-kernel-bugs) wrote :

Could someone else confirm it?

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :
Download full text (3.7 KiB)

(In reply to comment #564)
> Thanks to all people involved in solving this bug.

Does anyone have a link to a discussion list post or a technical article detailing the theory behind the solution to this bug? Since this "bug" encompasses so many scenarios, I have doubts about whether all of them have indeed been resolved. I'm glad one person's problem went away, but until a kernel hacker can stand up and explain exactly what was wrong and how they fixed it, I'm going to assume there are still lurking problems in Linux's I/O subsystem.

One problem we've seen and discussed in this thread is that large numbers of dirty blocks waiting to be flushed to disk can cause eviction of "hot" pages of code that are needed by interactive user processes, thus bringing the system to a state of thrashing in which processes continually trigger page faults because their actively executing code keeps being forced out of RAM by the large buffered write to disk. Even if this problem has been solved (presumably by fixing a bug in the code that is supposed to force a process to flush its own dirty pages to disk once dirty_ratio has been reached), there would still be the problem of the kernel's evicting hot pages from RAM so aggressively in low-memory conditions that interactivity of the system is compromised to the point where it's impossible for the user to resolve the memory shortage.

It's pretty easy to reproduce the thrashing scenario: just mount a tmpfs whose max size is close to the amount of physical memory in the system and start writing data to it. Eventually you may find that you are no longer able to do anything, even to give input focus to your terminal emulator so you can interrupt your writing process (or in some setups, even to move your mouse cursor on the screen), because your entire desktop environment and even the X server have been evicted from RAM and are continually paging back in from disk (and being immediately evicted again), hindering your ability to do anything. I've encountered this scenario while compiling Chromium in a tmpfs. I'd expect the OOM killer to activate, but instead I find that all of my running applications are responding at a snail's pace because they have to keep paging in bits of their program code from disk. I should mention that I run without swap.
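
A minimal sketch of that reproduction, for the record (the mount point, size and file name are placeholders; the size should be close to the machine's physical RAM):

    # mount a tmpfs nearly as large as RAM and fill it, forcing the page cache and hot pages out
    sudo mkdir -p /mnt/bigtmp
    sudo mount -t tmpfs -o size=90% tmpfs /mnt/bigtmp
    dd if=/dev/zero of=/mnt/bigtmp/fill bs=1M
    sudo umount /mnt/bigtmp   # clean up afterwards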

I would think one way to solve the thrashing problem would be to introduce a kernel knob that would set how much time must elapse between a page being fetched from disk into RAM due to a page fault and that page becoming eligible for eviction from RAM. If set to, say, 30 seconds, then the user's interactive processes could retain a usable degree of interactivity, even under extremely low memory conditions. This would, of course, mean that the OOM killer would activate sooner than it does now, since pages that the kernel would presently choose to evict in order to free up RAM would be ineligible under this new time limit. Setting the knob to zero would yield the behavior we have now, in which the kernel is free to evict all unlocked pages.

I'll reiterate once more, as a refresher, that this was formerly not such a problem on 32-bit x86 systems because most library code ther...

Read more...

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

Maybe this:
http://lwn.net/Articles/467328/

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #567)
> Maybe this:
> http://lwn.net/Articles/467328/

Interesting. Thanks for the link. However, this article doesn't explain why we see thrashing and extremely degraded interactivity on systems that don't have HugeTLB support enabled in the kernel (such as mine). This reinforces the point that there are many scenarios that exhibit poor interactive responsiveness under heavy disk writing load.

Regarding this debate about the transparent huge pages, I have to wonder why the kernel would bother trying to create a huge page in a location where there are dirty pages waiting to be written to disk. Shouldn't it just choose some other area in RAM that doesn't intersect any dirty buffers? This isn't really the place for a discussion of page compaction, though, so I'll discourage anyone from responding to my idle musing here.

Revision history for this message
In , Josh (josh-redhat-bugs) wrote :

There were further fixes for this issue in 3.2. Is this problem still there on 3.2.7 or newer?

If anyone is willing to test 3.3-rc5, this build should also contain the fixes Dave mentioned:

http://koji.fedoraproject.org/koji/buildinfo?buildID=301620

Revision history for this message
Janusz (yorashtan2) wrote :

I discovered that using the noop scheduler helps. However, they seem to have finally fixed this bug.

I'm on Ubuntu 11.10 with 3.3.0-rc5 and these are my results (copying from an external usb drive):

rsync:
   733507584 100% 30.89MB/s 0:00:22 (xfer#1, to-check=0/1)

Somebody should verify with 3.2 as I guess this will be the kernel that will ship with Precise.

Revision history for this message
Janusz (yorashtan2) wrote :

I did that test with cfq.

Revision history for this message
Damir Butmir (d4m1r2) wrote :

This is still an issue in Ubuntu 11.10 (32 bit) with the most up to date kernel provided through update manager:

Linux Damir-Ubuntu 3.0.0-16-generic-pae #28-Ubuntu SMP Fri Jan 27 19:24:01 UTC 2012 i686 athlon i386 GNU/Linux

I do not get transfer speeds faster than 7-8 MB/s to an 8GB external USB stick... I cannot believe this issue is so old and still hasn't been addressed; this is a critical bug!!!

tags: added: kernel-fixed-upstream
Revision history for this message
adri58 (adri58) wrote : Re: [Bug 500069] Re: USB file transfer causes system freezes; ops take hours instead of minutes

I filed several bugs almost a year ago and today I still have the same
problem,
even with the latest 3.2.5 kernel.
Therefore, it seems that reporting problems is useless. That's my point of
view.
I think that slow USB transfer must be a highly critical bug, and most of
the effort should be put on it.
I have another bug pending to be solved (not critical), and there is no
solution yet.

Sorry for my English.
Bye!
 On 09/03/2012 18:12, "Joseph Salisbury" <email address hidden>
wrote:

> ** Tags added: kernel-fixed-upstream

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
In , Dave (dave-redhat-bugs) wrote :

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Revision history for this message
In , gatekeeper.mail (gatekeeper.mail-linux-kernel-bugs) wrote :

Large iowait on writing/reading to/from any hard disk drive still occurs. Wasting huge amounts of ticks on any disk IO while _waiting_ is nonsense.

Revision history for this message
Ming Lei (tom-leiming) wrote :

Anyway, if you still have this kind of slow USB problem, please
post the usbmon trace (see the guide at the link below); otherwise it is
difficult to say what is wrong.

[1], http://www.mjmwired.net/kernel/Documentation/usb/usbmon.txt

Thanks,

On Sat, Mar 10, 2012 at 3:08 AM, adri58 <email address hidden> wrote:
> I filled several bugs almost 1 year ago and, today I still have the same
> problem.
> Even with the latest 3.2.5 kernel
> Therefore, it seems that reporting problems is useless. That's my point of
> view.
> I think that slow usb transfer must be a highly critical bug, and most of
> the effort should be put on it.
> I have another bug pending to be solved (not critical), and there is no
> solution yet.
>
> Sorry for my English.

Revision history for this message
adri58 (adri58) wrote :
Download full text (6.9 KiB)

USBMON trace:

T: Bus=08 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1d6b ProdID=0001 Rev= 3.02
S: Manufacturer=Linux 3.2.0-2-amd64 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.2
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=07 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1d6b ProdID=0001 Rev= 3.02
S: Manufacturer=Linux 3.2.0-2-amd64 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.1
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=06 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1d6b ProdID=0001 Rev= 3.02
S: Manufacturer=Linux 3.2.0-2-amd64 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.0
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=05 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 41/900 us ( 5%), #Int= 3, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1d6b ProdID=0001 Rev= 3.02
S: Manufacturer=Linux 3.2.0-2-amd64 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1a.2
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=05 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=1.5 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=046d ProdID=c050 Rev=27.20
S: Manufacturer=Logitech
S: Product=USB-PS/2 Optical Mouse
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr= 98mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=usbhid
E: Ad=81(I) Atr=03(Int.) MxPS= 5 Ivl=10ms

T: Bus=05 Lev=01 Prnt=01 Port=01 Cnt=02 Dev#= 3 Spd=1.5 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=045e ProdID=00dd Rev= 1.73
S: Manufacturer=Microsoft
S: Product=Comfort Curve Keyboard 2000
C:* #Ifs= 2 Cfg#= 1 Atr=a0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=01 Driver=usbhid
E: Ad=81(I) Atr=03(Int.) MxPS= 8 Ivl=10ms
I:* If#= 1 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver=usbhid
E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=10ms

T: Bus=04 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1d6b ProdID=0001 Rev= 3.02
S: Manufacturer=Linux 3.2.0-2-amd64 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1a.1
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00...

Read more...

Revision history for this message
Ming Lei (tom-leiming) wrote :

On Sat, Mar 31, 2012 at 9:32 PM, adri58 <email address hidden> wrote:
> USBMON trace:

The below is not a usbmon trace at all; please read the doc at the link below

        http://www.mjmwired.net/kernel/Documentation/usb/usbmon.txt

and then post your usbmon trace.

Also you can refer to LP624510 about how to do it.

       https://bugs.launchpad.net/bugs/624510

Thanks,


Revision history for this message
elatllat (elatllat) wrote :

maybe Ming is trying to say, do this:

1) start a capture using this command:
 cat /sys/kernel/debug/usb/usbmon/1u > /tmp/1u.mon.out

2) connect the external drive

3) copy a file to the external drive

4) kill the capture with CTRL-C

5) zip and attach 1u.mon.out.zip here.
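
(The same steps as a single rough script, for convenience; bus number 1, the file to copy and the mount point are assumptions, adjust them to match the bus your drive shows up on in lsusb.)

 #!/bin/sh
 # capture usbmon traffic on bus 1 while copying a file to the USB drive
 sudo sh -c 'cat /sys/kernel/debug/usb/usbmon/1u > /tmp/1u.mon.out' &
 CAPTURE=$!
 cp big-test-file /media/usb-disk/    # hypothetical file and mount point
 sync
 sudo kill $CAPTURE
 gzip /tmp/1u.mon.out                 # attach /tmp/1u.mon.out.gz to the bug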

Revision history for this message
Ming Lei (tom-leiming) wrote :

On Sat, Mar 31, 2012 at 11:55 PM, Dmole <email address hidden> wrote:
> maybe Ming is trying to say, do this:
>
> 1) started a capture using this command:
>  cat /sys/kernel/debug/usb/usbmon/1u > /tmp/1u.mon.out
>
> 2) connect the external drive
>
> 3) copy a file to the external drive
>
> 4) kill the capture with CTRL-C
>
> 5) zip and add attach 1u.mon.out.zip here.

Exactly, that is just what I wanted.

Thanks,

Revision history for this message
adri58 (adri58) wrote :
  • 1u.mon.out (4.2 MiB, application/octet-stream; name="1u.mon.out")

Here's the file

2012/4/1 Ming Lei <email address hidden>

> cat /sys/kernel/debug/usb/usbmon/1u > /tmp/1u.mon.out
>

Revision history for this message
Ming Lei (tom-leiming) wrote :

On Sun, Apr 1, 2012 at 1:15 PM, adri58 <email address hidden> wrote:
https://bugs.launchpad.net/bugs/500069/+attachment/2980204/+files/1u.mon.out

adri58, thanks for your post.

From your usbmon trace, I found that it takes about ~22ms on average
to complete writing 120KB[1] to your usb mass storage, so the max write
performance is about 5.3MB/sec, for example:

/*send WRITE cmd from host to usb mass storage device*/
ffff880037674d40 905709519 S Bo:2:007:2 -115 31 = 55534243 9f080000
00e00100 00000a2a 00000457 c60000f0 00000000 000000
ffff880037674d40 905709611 C Bo:2:007:2 0 31 >

/*write 120KB data to usb mass storage device*/
ffff8801139ac080 905709619 S Bo:2:007:2 -115 122880 = 831683e5
c00e55d7 83e9c00e 95c1f2bf fb0300c6 81908c93 c98144a9 5980441a
ffff8801139ac080 905731863 C Bo:2:007:2 0 122880 >

/*read the status of writing operation*/
ffff880037674d40 905731871 S Bi:2:007:1 -115 13 <
ffff880037674d40 905733112 C Bi:2:007:1 0 13 = 55534253 9f080000 00000000 00

The above 3 steps are one complete procedure for writing 120KB to the usb
mass storage device.

Also, no error information appears in your trace, so your problem
is probably that the usb mass storage is a slow device, especially wrt. write
performance.

I suggest you do some tests on Windows to see whether you get the same
performance as with Ubuntu.

[1] 120KB is the max transfer unit per SCSI command; it is also the most
frequent transfer unit in Linux usb mass storage reads/writes.

Thanks
--
Ming Lei
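
(For anyone wanting to redo that estimate on their own capture: a small sketch, assuming the text usbmon format shown above, that pairs each bulk-out submission (S) with its completion (C) and reports the average rate; the file name is just an example. Since usbmon timestamps are in microseconds, bytes per microsecond comes out numerically as MB/s.)

 awk '$4 ~ /^Bo:/ && $3 == "S" { start[$1] = $2; len[$1] = $6 }
      $4 ~ /^Bo:/ && $3 == "C" && ($1 in start) {
          bytes += len[$1]; us += $2 - start[$1]; delete start[$1] }
      END { if (us) printf "bulk-out write rate: %.1f MB/s\n", bytes / us }' 1u.mon.out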

Revision history for this message
adri58 (adri58) wrote :

In Windows I have no problem at all. So, there must be something wrong with
the Linux kernel


Revision history for this message
Ming Lei (tom-leiming) wrote :

On Mon, Apr 2, 2012 at 10:28 PM, adri58 <email address hidden> wrote:
> In Windows I have no problem at all.

OK, so what is your problem on Linux? Just that you feel the writing
is very slow?
Or that USB file transfer may cause system freezes?

> So, there must be something wrong with
> the Linux kernel

Could you post the output of the below commands on your affected machine?

        uname -a
        lsusb -vv #plug your usb mass storage into machine
        lspci -vv -n

Thanks
--
Ming Lei

Revision history for this message
elatllat (elatllat) wrote :
Download full text (3.2 KiB)

USB 1 is 001.5 MB/s
USB 2 is 060.0 MB/s
USB 3 is 625.0 MB/s

adri58 is getting 5.3 MB/s, I would bet that is the max speed of his device.
But if not please post the output of a disk speed testing tool from some other OS.

Speed tests showing how slow flash drives are:

---------------------------------------------------------------------------------------------------------------------------
--flash fat32 drive

Darwin imac 11.3.0 Darwin Kernel Version 11.3.0: Thu Jan 12 18:47:41 PST 2012; root:xnu-1699.24.23~1/RELEASE_X86_64 x86_64

writing:
1024+0 records in
1024+0 records out
102400000 bytes transferred in 32.315127 secs (3168795 bytes/sec)

reading:
1024+0 records in
1024+0 records out
102400000 bytes transferred in 3.597781 secs (28461989 bytes/sec)

---------------------------------------------------------------------------------------------------------------------------
--flash fat32 drive

Linux ubuntu 3.2.0-20-generic-pae #33-Ubuntu SMP Tue Mar 27 17:05:18 UTC 2012 i686 i686 i386 GNU/Linux

writing:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 47.9621 s, 2.1 MB/s

reading:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 3.9149 s, 26.2 MB/s

---------------------------------------------------------------------------------------------------------------------------
--sata ext4 drive

Linux ubuntu 2.6.32-39-generic #86-Ubuntu SMP Mon Feb 13 21:47:32 UTC 2012 i686 GNU/Linux

writing:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 0.759231 s, 135 MB/s

reading:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 1.74171 s, 58.8 MB/s

---------------------------------------------------------------------------------------------------------------------------
--usb ntfs drive

Linux ubuntu 2.6.32-39-generic #86-Ubuntu SMP Mon Feb 13 21:47:32 UTC 2012 i686 GNU/Linux

writing:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 3.08867 s, 33.2 MB/s

reading:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 3.27496 s, 31.3 MB/s

---------------------------------------------------------------------------------------------------------------------------
--3 usb lvm dm_crypt ext4 drives (dm_crypt is messing with the write speed here)

Linux ubuntu 2.6.32-39-generic #86-Ubuntu SMP Mon Feb 13 21:47:32 UTC 2012 i686 GNU/Linux

writing:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 0.463585 s, 221 MB/s

reading:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 3.04551 s, 33.6 MB/s

---------------------------------------------------------------------------------------------------------------------------
--the test used:

#!/bin/bash

#
# test_drive_speed.sh
#

OUT=./file1G.tmp
uname -a
echo "spin you right round" >$OUT;
sleep 1
echo -e "\nwriting:"
if [ "$1" == "-u" ]; then
 dd if=/dev/urandom of=/dev/shm/$OUT bs=100000 count=1024 >/dev/null 2>&1
 dd if=/dev/shm/$OUT of=$OUT bs=100000 count=1024
 rm /dev/shm/$OUT;
else
 dd if=/dev/zero of=$OUT bs=100000 count=1024
fi
sync
W=$(which purge);
if [ "$W" == "" ] ; then
 sudo echo 3 > /proc/sys/vm/drop_caches;
else
 purge;
fi
sleep 1
echo...

Read more...
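
(One aside on the script above, offered as a sketch: "sudo echo 3 > /proc/sys/vm/drop_caches" elevates only the echo, while the redirection is still done by the unprivileged shell, so that line typically fails with "Permission denied" unless the whole script is run as root. A form that keeps the write itself privileged would be:)

 # flush dirty data, then drop the page cache, dentries and inodes
 sync
 echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null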

Revision history for this message
adri58 (adri58) wrote :
  • lspci (13.1 KiB, application/octet-stream; name=lspci)
  • lsusb (19.3 KiB, application/octet-stream; name=lsusb)

First of all, thanks everybody for helping me with this problem.
When trying to transfer big files (>1GB) the system freezes until it
finishes. Anyway, the speed is really slow compared to Windows:

uname -a

Linux adrian-PC 3.2.0-2-amd64 #1 SMP Tue Mar 20 18:36:37 UTC 2012 x86_64
GNU/Linux

I also attach the two outputs you asked for.


Revision history for this message
adri58 (adri58) wrote :

And the output of the speed test

Linux adrian-PC 3.2.0-2-amd64 #1 SMP Tue Mar 20 18:36:37 UTC 2012 x86_64
GNU/Linux

writing:
1024+0 registros leídos
1024+0 registros escritos
102400000 bytes (102 MB) copiados, 0,315309 s, 325 MB/s
./script: línea 22: /proc/sys/vm/drop_caches: Permiso denegado

reading:
1024+0 registros leídos
1024+0 registros escritos
102400000 bytes (102 MB) copiados, 0,0303001 s, 3,4 GB/s

Anyway, I'll test it under Windows


Revision history for this message
elatllat (elatllat) wrote :

Hi Adrian,

You need to run that script as root for it to work
(3,4 GB/s is your RAM speed, not your disk speed);
also cd to your USB drive before running it.

Revision history for this message
Marius B. Kotsbak (mariusko) wrote :

Are you sure that this is not caused by the USB media being mounted with the "sync" option? (Check with the "mount" command.) I had the problem that Ubuntu mounted with sync, and I experienced this behavior.

Revision history for this message
adri58 (adri58) wrote :

No no, I checked that several times

2012/4/2 Marius Kotsbak <email address hidden>

> Are you sure that this is not caused by the USB media mounted with
> "sync" option? (check with the "mount" command). I had the problem that
> Ubuntu mounted with sync and experienced this behavior.

Revision history for this message
adri58 (adri58) wrote :

Script run as root from the USB drive

Linux adrian-PC 3.2.0-2-amd64 #1 SMP Tue Mar 20 18:36:37 UTC 2012 x86_64
GNU/Linux
-e
writing:
/home/adrian/script: 12: [: unexpected operator
1024+0 registros leídos
1024+0 registros escritos
102400000 bytes (102 MB) copiados, 14,9117 s, 6,9 MB/s
/home/adrian/script: 21: [: unexpected operator
/home/adrian/script: 24: /home/adrian/script: purge: not found
-e
reading:
1024+0 registros leídos
1024+0 registros escritos
102400000 bytes (102 MB) copiados, 0,0297004 s, 3,4 GB/s


Revision history for this message
Ming Lei (tom-leiming) wrote :

On Tue, Apr 3, 2012 at 2:47 AM, adri58 <email address hidden> wrote:
> First of all, thanks everybody for helping me with this problem.
> When trying to transfer big files (>1GB) the system freezes until it
> finishes. Anyway, the speed is really slow compared to Windows:

From the usbmon trace you posted, we can see that the writing is slow:
it completes writing more than 700MB to the usb mass storage device in
about 260sec.

So firstly, it is just very slow, and has nothing to do with system freezes
(you can do other things while the transfer is ongoing).

Also, could you check whether you get better writing performance
on Windows on the same machine with the same usb mass storage
device?

Thanks
--
Ming Lei

Revision history for this message
philinux (philcb) wrote :

I've just tested this in Precise. Copying and pasting a 734MB iso from an internal drive to a second internal drive took less than ten seconds.

Copying the same iso to a 4gig usb stick started off well at about 15MB/sec and has now slowed to 2.8MB/sec.

The copy is still not finished as I write this. It's taken about 3 mins to copy the iso to usb.

System does not freeze though.

Revision history for this message
elatllat (elatllat) wrote :

philinux and adri58, this bug will never get fixed if you guys don't do a proper test.

You can't compare the speed of a disk drive to a usb flash stick.

1) You have to use the same disk internally, then over USB, or the same external disk on 2 OSes.
2) You have to provide quantitative numbers using a standard method.

My tests show that there is no problem on the current or the next LTS,
you guys have not run any "scientific" tests,
and AFAIK this bug report should be closed.

Revision history for this message
philinux (philcb) wrote :

@Dmole.

In my case the device is a little 4 gig USB stick. You cannot test that internally.

This bug has been a bane for years. The copy starts well, then progressively slows down to a crawl, as has been reported many times.

In Windows this does not happen; therefore the kernel may be the reason.

If you have a test for me to do on my USB stick I'm more than willing to participate.

I even tried the mainline kernel once, as suggested a while back in this bug.

Revision history for this message
elatllat (elatllat) wrote :

@philinux,

You can't buy a 15mb/sec usb stick.
For all we know the slowness you are experiencing could be due to it being formatted as NTFS or just a slow flash drive.
3MB/s is the expected speed of a flash drive; anything more than that can likely be attributed to compression / caching / delayed-write settings.
(That's why the SDHC classes were introduced http://www.sakoman.com/OMAP/microsd-card-perfomance-test-results.html)

You said that windows was "about" 7MB/s and Linux was "about" 3MB/s
You need to post the results of a disk benchmarking tool on the same drive from both OSes.
(I have no recommendations for what windows benchmarking tool to use.)

Revision history for this message
elatllat (elatllat) wrote :

@philinux, this might work for you though: http://www.iozone.org/src/current/IozoneSetup.zip

Revision history for this message
elatllat (elatllat) wrote :

At this time:

There are UHS-1 "SDHC compatible" cards that claim read speeds of 90MB/s but you need custom hardware to get speeds above normal SDHC.

25MB/s is the max write speed for SDHC.

Class 4 is the max for micro/mini SDHC.

There are some 15MB/s usb sticks http://usbspeed.nirsoft.net/usb_drive_speed_summary.html?o=11

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

I confirm this. The system becomes unresponsive when it begins swapping memory to disk.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Good day, anybody!

I found фт optimal options against this and like this bug
Anybody please try a following options.

I found that my kernels 2.6* and 3.2.* and 3.3.* versions of my server has periodical freezings 4-15 secs. I found that this occur in writeback time (flushing to disk) in time when 30sec expires for expired dirty pages occur. I tried many variants of dirty_* options and found optimal these:

I can suggest two veriants
Here 1st and 2nd variants
The second variant commented
Only uncommment second lines and nothing

#######################################
# Check dirty status every 3 seconds
# (this smooths out writeback; maybe 100 would be even better)
echo 300 > /proc/sys/vm/dirty_writeback_centisecs

# Start background writeback after only 100 KB of dirty pages
# This is a very important option :)
echo 102400 > /proc/sys/vm/dirty_background_bytes

# Second variant - uncomment it - you will still get freezes, but rarely
# echo 225280000 > /proc/sys/vm/dirty_background_bytes

# My freezes happen when dirty pages expire (default 30 sec),
# so I increased it (it doesn't matter for the 1st variant - it will never be reached there)
echo 864000 > /proc/sys/vm/dirty_expire_centisecs

# I increased the limit for non-background (blocking) writeback (I think it is never reached)
echo 10 > /proc/sys/vm/dirty_ratio

#######################################

I prefer the 1st variant - my system now works smoothly.
I found that the freezes occur while dirty pages are being written to disk.
You can watch it with:

watch -n1 grep -A 1 dirty /proc/vmstat

The new kernel features from 3.2.* (writeback throttling) did not help me. I have now tested kernel 3.3.2-6 on FC16 and it has the same trouble. But these settings work for me!

I don't have time for a detailed description,
but if you test it and it helps, I'm ready to discuss it.
Sorry for my English :)

Bye!
Perlover
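
For anyone who wants to keep settings like the ones above across reboots, one way (a sketch only, assuming the standard sysctl mechanism; the values are just perlover's first variant and should be tuned per machine) is to put the equivalent keys into /etc/sysctl.conf:

vm.dirty_writeback_centisecs = 300
vm.dirty_background_bytes = 102400
vm.dirty_expire_centisecs = 864000
vm.dirty_ratio = 10

and load them with "sysctl -p" (as root).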

Revision history for this message
Torsten Bronger (bronger) wrote :

I observe a drop to 1/10th in writing speed if I switch from NTFS to FAT32 on an external USB disk. Is this related to this issue?

Revision history for this message
Adam Porter (alphapapa) wrote :

I think a proper test methodology on both Windows and Linux should basically be:

1. Start copy operation and start stopwatch.
2. As soon as the copy is finished on the screen, umount/"Safely
Remove" the drive.
3. Wait for activity light on USB drive to go out and stop stopwatch.

I'm not sure if this bug is truly fixed on Linux, either, but because
of write caching and buffering at different levels and differences in
OSes, any benchmark must take into account unmounting/removing the
drive, otherwise it's apples and oranges. The progress bars, they
lie!

On Fri, May 18, 2012 at 4:21 PM, Torsten Bronger
<email address hidden> wrote:
> I observe a drop to 1/10th in writing speed if I switch from NTFS to
> FAT32 on an external USB disk.  Is this related to this issue?

Revision history for this message
Torsten Bronger (bronger) wrote :

I copied very large files with rsync. rsync prints the transfer rate on the screen, and in the case of NTFS it was the expected 10 MByte/s (I copied through 100 MBit Ethernet onto the USB disk), and in the case of FAT32 it was 900 kByte/s. So it was even more than a factor of 10, because for NTFS the transfer was limited by the Ethernet.

Thus, FAT32 is definitely responsible for ridiculously slow writing on the USB disk on my Lubuntu 12.04. This surely is a bug. My question is whether this is covered by this bug report here.

Revision history for this message
Ming Lei (tom-leiming) wrote :

On Sun, May 20, 2012 at 2:07 AM, Torsten Bronger
<email address hidden> wrote:
> I copied very large files with rsync.  rsync prints the transfer rate on
> the screen, and in the case of NTFS it was the expected 10 MByte/s (I
> copied through 100 MBit Ethernet onto the USB disk), and in the case of
> FAT32 it was 900 kByte/s.  So it was even more than a factor of 10,
> because for NTFS the transfer was limited by the Ethernet.
>
> Thus, FAT32 is definitely responsible for ridiculously slow writing on
> the USB disk on my Lubuntu 12.04.  This surely is a bug.  My question is
> whether this is covered by this bug report here.

Of course, the fat32 bug is not covered by this bug report because the bug
title is 'USB file transfer causes system freezes; ops take hours
instead of minutes'.

So I suggest submitting a new bug entry for the FAT32 bug.

Thanks,

Revision history for this message
Torsten Bronger (bronger) wrote : Re: [Bug 500069] USB file transfer causes system freezes; ops take hours instead of minutes

Hi there!

Ming Lei writes:

> [...]
>
> Of course, the fat32 bug is not covered by this bug report because
> the bug title is 'USB file transfer causes system freezes; ops
> take hours instead of minutes'.

Well, bug #392089 "Slow USB transfer for FAT32" has been marked as a
duplicate of this one. However, the intriguing NTFS/FAT32
difference doesn't seem to play any role in this discussion.

Has anybody affected by this bug tried with a different filesystem
(if possible)?

Cheers,
Torsten.

--
Torsten Bronger Jabber ID: <email address hidden>
                                  or http://bronger-jmp.appspot.com

Revision history for this message
In , datacompboy (datacompboy-linux-kernel-bugs) wrote :

After upgrading from 3.0.x to 3.2.0, this bug completely eats my brain :(
I have tried the solution from #571 -- now it's not the whole system that hangs, but just some applications (browser, terminal, ooffice, etc.).
Disk is SSD:
  Read : 1145044992 bytes (1,1 GB), 1,56616 s, 731 MB/s
  Write: 1145044992 bytes (1,1 GB), 14,30301 s, 80 MB/s
RAM:
  MemTotal: 3969340 kB
  MemFree: 112720 kB
  Buffers: 721196 kB
  Cached: 1246456 kB
  SwapCached: 656 kB
  Active: 918656 kB
  Inactive: 1666252 kB
  Active(anon): 507868 kB
  Inactive(anon): 158192 kB
  Active(file): 410788 kB
  Inactive(file): 1508060 kB
  SwapTotal: 6290428 kB
  SwapFree: 6288604 kB

But it still freezes sometimes, on simple actions like just Alt+Tabbing to another app, and that app hangs for 3-6 seconds.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

The in-kernel process scheduler is generally crap. Ok, make that majorly crap. Move away from it. Use BFS (search Con Kolivas) if you want sanity. Someone recently posted a simple test case where heavy kernel space starves the user space processes to death. The person switched to BFS and all his troubles went away. Nobody replied to him on the list. I don't think even Ingo knows what's wrong with CFS. So, don't have your hopes of ever seeing this fixed.

Here is the user space starvation thread I am talking about:

https://lkml.org/lkml/2012/6/7/448

Revision history for this message
In , linux-kernel-bugs (linux-kernel-bugs-linux-kernel-bugs) wrote :

Wow!! That's a pretty bold statement to make. Given that the code is all open, why don't you instrument the kernel and pinpoint where exactly the crap is.

Those of you who suffer from the stall problem might want to give Daniel Poelzleithner's ulatency a try.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

@Ritesh: you are assuming I am capable of debugging kernel. None of the users who have reported on this thread are. The only person capable of debugging this issue is Ingo. How many comments have you seen from him? Go ahead and count them! I will tell you the answer: Zilch!

Process scheduling in stock Linux kernel is a REAL problem. Nobody wants to debug it, that is a different story. That does not mean the problem goes away. After seeing that thread I linked above, I am convinced it is some manifestation of CFS issues at fault here.

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

Anton, can you file yours as a separate bug - it's clear the main problem has been fixed and the scenario you described seems different.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

> it's clear the main problem has been fixed

Alan: Can you describe how it is clear to you, when the general public keeps suffering and reporting the issue or, worse, just gives up?

What is the code change that "fixed" the issue? Just because someone mentioned BFS in a message somewhere, and someone is pointing out a potential problem with the in-kernel scheduler, doesn't give you the right to close this bug randomly. That's arrogant behavior and does a disservice to all the reporters here.

Revision history for this message
In , datacompboy (datacompboy-linux-kernel-bugs) wrote :

Installed a BFQ + BFS patched 3.4 kernel ( http://pf.natalenko.name/ ) -- there are no hangs so far.

Changed in linux:
status: Confirmed → Fix Released
Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Hey Anton! Big Alan says this problem does not exist. How dare you claim otherwise...:)

I am just kidding....I am moving to BFS myself. So...

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

No I said that Anton's case appears to be different and asked him to open a new bug for it, given the other cases seem fixed. If BFS fixes your case that's also interesting and wants putting in the bug too.

Revision history for this message
In , akpm (akpm-linux-kernel-bugs) wrote :

Comment #571 at least indicates that the problems remain, and are unrelated to the CPU scheduler. So describing all this as "fixed" seems a tad optimistic.

That being said, this bugzilla report clearly isn't getting the job done. I suggest that people who are still seeing writeback-related problems should report them via email. Suitable recipients are

<email address hidden>
<email address hidden>
Wu Fengguang <email address hidden>
Andrew Morton <email address hidden>

And please, the thing to spend time on is to work out how to enable kernel developers to locally reproduce the problem. If we can do this, we'll fix it.

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

571 indicates someone has a possible problem of the same type. It's separate from all the other debugging - hence I asked for it to be filed as a new bug, otherwise nothing useful is going to occur.

(eg I can get 3 second freezes on alt-tab out of gnome 3 but it doesn't appear to be anything to do with the kernel)

Alan

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Andrew: if it is not CPU scheduler, then how come JUST replacing the CPU scheduler fixes the issue? This does not make basic CS101 sense!

Revision history for this message
In , datacompboy (datacompboy-linux-kernel-bugs) wrote :

Hmm... Are you sure that only the CPU scheduler was replaced? In my case I replaced both -- the CPU and disk schedulers, with BFS and BFQ.
Jun 10 12:05:54 nuuzerpogodible kernel: [ 1.611737] io scheduler bfq registered (default)
Jun 10 12:05:54 nuuzerpogodible kernel: [ 1.826589] BFS CPU scheduler v0.422 by Con Kolivas.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

I like stability and typically use a very minimalistic approach. I only changed just the CPU scheduler. And I haven't noticed any hangs or stuck mouse so far.

Maybe you can change one variable at a time as well and tell us which one (or both) helped.

I will update back if I have any new findings. For now, I am happy that I can use my system without getting annoyed with it.

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

devsk: it makes basic systems 101 sense, however. All the bits interact.

The fact that replacing just the CPU scheduler makes a difference is valuable info though.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

I propose an interesting experiment.

1. Install Opera from this location: http://snapshot.opera.com/unix/rc4_12.00-1456/
2. Switch on hardware acceleration opera:config#UserPrefs|EnableHardwareAcceleration
set to 1
3. Open the test http://ie.microsoft.com/testdrive/Performance/LoveIsInTheAir/ or http://ie.microsoft.com/testdrive/Performance/ParticleAcceleration/

Try switching between tty, also use your GUI.

I consider that no program should affect the responsiveness of the system as a whole, should it?

Revision history for this message
In , alpha_one_x86 (alphaonex86-linux-kernel-bugs) wrote :

(In reply to comment #587)
> I propose an interesting experiment.
>
> 1. Install Opera from this location:
> http://snapshot.opera.com/unix/rc4_12.00-1456/
> 2. Switch on hardware acceleration
> opera:config#UserPrefs|EnableHardwareAcceleration
> set to 1
> 3. Open the test
> http://ie.microsoft.com/testdrive/Performance/LoveIsInTheAir/
> or http://ie.microsoft.com/testdrive/Performance/ParticleAcceleration/
>
> Try switching between tty, also use your GUI.
>
> I consider that no program should not affect the responsiveness of the system
> as a whole, is not it?

My case is very similar. I play a movie with VLC on one screen, have some Konsole windows open with transparency, and KWin with desktop effects. Whenever the graphics card is at 100% (very often with a low-end card like mine), the whole system has a general slowdown (the same on a tty too).

Revision history for this message
In , powerman-asdf (powerman-asdf-linux-kernel-bugs) wrote :

(In reply to comment #578)
> Installed a BFQ + BFS patched 3.4 kernel ( http://pf.natalenko.name/ ) --
> there are no hangs so far.

BFS + BFQ really helps… but only until you run a couple of VMware virtual machines. :(
With BFS and BFQ it results in incredible freezes, both in the host OS and the guest OSes, especially when one of the OSes does intensive I/O like installing updates.
Without BFS and BFQ the freezes still happen, but they are much less noticeable!
Probably only one of BFS and BFQ is responsible for such bad behavior, but I haven't tested them separately.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Continuing from post #571.

Sorry, my English is not as good as I would like :)

Now I have Fedora with the 3.3.2-6.fc16.x86_64 kernel. My server has 48 GB of memory and a hardware RAID1 array.

Now I use my server with these settings (good settings for me):

echo 1000 > /proc/sys/vm/dirty_writeback_centisecs
echo 20 > /proc/sys/vm/dirty_background_ratio
echo 9000000 > /proc/sys/vm/dirty_expire_centisecs
echo 30 > /proc/sys/vm/dirty_ratio

Before these settings, as I wrote in post #571, I had regular freezes of up to 10-20 seconds every 2-5 minutes. I found that the reason for this is the writeback phase of dirty pages (you can watch it with "watch -n1 grep -A 1 dirty /proc/vmstat"; nr_writeback is the number of dirty pages currently being written to disk). The writeback phase can be started by the 'sync' command, for example, or when dirty pages in memory expire (default setting: 30 seconds). If at the next writeback there are many dirty pages (even just 2000-3000 of them), my server would freeze during this phase.

Now I have the settings above, and once a day I run 'sync' from crontab (when load is at a minimum). During that run my server's load average climbs from 1-2 up to 80-90, and it takes ~1-2 minutes. My system is frozen for those 1-2 minutes! The rest of the time (24 hours * 60 minutes - 3 minutes) I now have a load average of 1-2 and no I/O freezes. Before these settings I had a load average of 8-9. I know that if the server loses power I will have stale data on disk (up to 24 hours old).
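
For what it's worth, a daily 'sync' like the one described above can be scheduled with a single crontab line (the 04:00 time is only an example of a low-load hour):

0 4 * * * /bin/sync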

I think the system stops all I/O until every dirty page marked for writeback has actually been written to disk. I think a sane system should not block all I/O and should instead spread the writing of dirty pages out over time.

And I noticed that I don't have this problem on my second server, with the same OS, the same kernel version, and the same amount of RAM. It uses software RAID1 (/dev/md*). During writeback that server works smoothly. I think software RAID may have a different buffering mechanism for writing to disk. So maybe somebody here could test this problem with software RAID?

And I think these articles will be useful and are related to this:

http://lwn.net/Articles/405076/
https://lwn.net/Articles/456904/

As I understand it, this feature is partly implemented in kernel 3.3, but I didn't see any improvement with the new kernel. As I understand it, this is still under development.

Sorry for my English

Bye! :)

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

And nowadays (for maybe the last 1-2 years) I don't see the high iowait values reported at the top of this thread. But the problem of freezes during large I/O operations remains. So maybe the iowait problem no longer exists, but all other I/O is still being blocked during high-volume writes.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

About I/O schedulers: I have read a lot that devices with NCQ support don't need a scheduler. Is that so?
$ dmesg | grep NCQ
[2.145261] ata1.00: 175836528 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[3.109745] ata5.00: 2930277168 sectors, multi 16: LBA48 NCQ (depth 31/32)

It seems all my devices support NCQ. I manually set the noop scheduler, and the system was apparently much more responsive. I hope this is not a placebo. If it's true, why doesn't the kernel automatically switch off the scheduler for devices that have NCQ support?
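
For anyone who wants to repeat this experiment, the active I/O scheduler and the NCQ queue depth can be inspected and changed per device through sysfs (sda is a placeholder; run as root, and the change does not survive a reboot):

cat /sys/block/sda/queue/scheduler          # active scheduler is shown in [brackets]
cat /sys/block/sda/device/queue_depth       # NCQ depth negotiated with the drive
echo noop > /sys/block/sda/queue/scheduler  # switch this disk to the noop scheduler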

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #592)
> [...] devices with NCQ support don't need a scheduler [...]
>
> It seems all my devices support NCQ. I manually set the noop scheduler, and
> the system was apparently much more responsive. [...]

While it is true that the hard drive will reorder I/O requests within its native command queue to optimize armature movements, the on-device queue is really very shallow (only 32 requests maximum on your hardware). By circumventing the kernel's I/O scheduler (by selecting "noop"), you are losing the benefits of merging adjacent I/O requests and of distributing I/O throughput fairly across multiple processes.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

I think in my case the system hangs for two reasons. I don't know what Opera does there, but it looks like the video card's throughput is also limited, and when an application tries to push too much data to it, the GUI starts feeling less responsive and hangs, despite the fact that htop does not show any CPU utilization or I/O wait. The second case is more traditional: it hangs when memory runs out (2 GB of RAM); htop shows that 2 GB of swap is allocated as well. Then the freezes occur because of waiting on the hard disk. That is expected, but it is bad that it affects all applications, even those for which the available memory would be enough. The worst thing is that it affects the responsiveness of the whole system and the GUI. I want to help fix these problems; tell me what I can do.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

And I have been thinking about why the noop scheduler could be better... A silly idea, but what if the scheduler's queue itself gets swapped out? Is that theoretically possible? If so, it would be understandable why noop is better.

Revision history for this message
In , alex (alex-linux-kernel-bugs) wrote :

I wonder if someone has tried oprofile while forcing a machine to fall into #12309? It may be stalling somewhere waiting for locks or hardware action.

Alas, I myself have no hardware at hand to reproduce #12309 on.

Revision history for this message
In , Hedayat (hedayat-redhat-bugs) wrote :

I have experienced very slow copying with 3.4.4-5.fc17.x86_64 when the number of files / the volume of the files to copy becomes large. Unfortunately, it didn't improve with the trick I mentioned in the report.

Revision history for this message
AO (aofrl10n) wrote :

I'm afraid this is not fixed in Ubuntu Quantal AMD64 with the latest kernel 3.5.0-6. I'm transferring data to a Sansa Clip+'s internal storage, over 2 GB in this case, starting at over 20 MB/s and ending up below 1 MB/s. So whatever fix was released did not get rid of the problem at all.

Revision history for this message
elatllat (elatllat) wrote :

Alain-OIivier Breysse: read the previous posts; provide proof.

Revision history for this message
Torsten Bronger (bronger) wrote :

What kind of proof?

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

Created attachment 78231
htop screenshot

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

Please look at my htop screenshot https://bugzilla.kernel.org/attachment.cgi?id=78231

I just copied a file from HDD to HDD. Is it normal to have such high I/O wait on the CPU?

I think the bug is not fixed. What other information should I provide?

$ uname -a
Linux u3s3 3.5.2-1.fc17.i686.PAE #1 SMP Wed Aug 15 16:30:14 UTC 2012 i686 i686 i386 GNU/Linux

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

Looks fairly normal to me - I'd expect a lot of waiting for I/O during a big copy because rotating disks are incredibly slow relative to processor performance. The CPU is also generally having to work harder on a 32bit machine with > 1GB of RAM doing MMU management due to the lack of address space.

The scheduler btw is kernel side so doesn't get paged/swapped out.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

Please correct me if I'm wrong, but I do believe that "I/O Wait" time is the amount of time that processes are blocked on disk I/O operations. What I don't understand is why I/O Wait appears to consume CPU time. Is the kernel spinning in a busy wait loop while an I/O operation is pending on a disk? If so, why? The kernel should be allowing some other task to use the CPU during the I/O wait.

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

I/O wait isn't consuming CPU time but the process of reading/writing disks does consume CPU time because the process is doing work in the kernel managing the I/O and the things that go with it.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

(In reply to comment #601)
> I/O wait isn't consuming CPU time but the process of reading/writing disks
> does
> consume CPU time because the process is doing work in the kernel managing the
> I/O and the things that go with it.

Alan, if I understand you correctly, why doesn't the kernel switch to another process while the current process is waiting for I/O?

For example, why does the GUI (i.e. GNOME Shell) stall while another application is swapping or doing a lot of writes to disk?

Revision history for this message
In , gatekeeper.mail (gatekeeper.mail-linux-kernel-bugs) wrote :

Alan, isn't what you just described called PIO? Isn't DMA the solution that resolved high CPU load on storage I/O? Isn't high CPU load on VM I/O (iowait) very similar to the PIO storage operation mode? And to repeat my earlier question: is a polling technique suitable for VM I/O, as it was some years ago for network I/O?

> I/O wait isn't consuming CPU time but the process of reading/writing disks
> does
> consume CPU time because the process is doing work in the kernel managing the
> I/O and the things that go with it.
>

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

The data transfers are done by DMA where possible, but you still have to do all the housekeeping, controller management, I/O queue handling and the like. On a 32bit box there can also be a lot of memory management work involved.

Old (pre AHCI) controllers need PIO for some parts of a transfer. That is a hardware limit.

And the kernel does switch to other processes and back and forth between them when one is waiting for I/O. The gnome shell is a very large program so on any system without vast quantities of memory the shell tends to be waiting for stuff to come from disk when there is any memory pressure. Last time I looked the compositor was single threaded with all of that so Gnome 3 stalled horribly under paging. That I'm afraid is mostly a problem in Gnome 3.

Rotating disks are in relative terms very very slow. They've not materially improved in the past ten years yet memory sizes have grown vastly, processor speeds have grown likewise. They are also very bad at trying to do two things at once so writing a large file to disk tends to really slow down reading.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

(In reply to comment #604)
> And the kernel does switch to other processes and back and forth between them
> when one is waiting for I/O. The gnome shell is a very large program so on
> any
> system without vast quantities of memory the shell tends to be waiting for
> stuff to come from disk when there is any memory pressure. Last time I looked
> the compositor was single threaded with all of that so Gnome 3 stalled
> horribly
> under paging. That I'm afraid is mostly a problem in Gnome 3.

OK, but why is mouse movement also choppy? And why is switching to a virtual terminal slow?

How can I verify that the locks do not occur in the kernel? And how can I find where the locks occur? I really want to help find and fix them. Apologies for the many questions.

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

Because the GNOME compositor is going to end up stalling while waiting to get data back. Ditto, switching to/from X will be pulling lots of data off disk if your machine has been paging stuff out. To actually get detailed data you need to start profiling the system and generating detailed information to analyse - that's way beyond a bugzilla discussion (but the linux-mm list might be a starting point if you want to get involved in understanding what is a very complicated area - because so many things interact).

Ultimately though I suspect that unless someone does something drastic about its memory footprint the "fix" is not to run huge bloated inefficient desktops on a box with 1GB of RAM.

Revision history for this message
In , linux-kernel-bugs (linux-kernel-bugs-linux-kernel-bugs) wrote :

Alan: What would be your example of a huge bloated inefficient desktop? I guess KDE/GNOME. And the efficient one might be icewm/fvwm etc. Not common unfortunately.

The I/O wait problem is still valid. It is just that you need different patterns to hit it. A lot has improved with the latest writeback work but still, when hit, this is a terrible problem.

If you want to reproduce it, take your laptop/desktop with 4 GiB of memory and a regular SATA disk. Pump (buffered) I/O into it with dd: write zeroes with a block size of 1 MiB. Since it is buffered, you'll start off fine until you have consumed all 4 GiB of memory. After that is when you will start seeing the problem.

At that moment (i.e. after you have consumed all of your RAM), every write() will contend for page availability. And given that you also have a slow rotating disk (this also includes remote storage - both block and file), try to execute a task following the I/O. A simple sync command is a good start. The CPU stays blocked until the pages are scanned for best fits and the buffers are synced. You can run dstat and observe the CPU wait time there.
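
A concrete version of that recipe might look like the following (a sketch only; /tmp/bigfile is a placeholder on the rotating disk, and count is chosen so the file is roughly twice the 4 GiB of RAM):

dd if=/dev/zero of=/tmp/bigfile bs=1M count=8192   # terminal 1: buffered writes, ~2x RAM
dstat 1                                            # terminal 2: watch the cpu "wai" column climb
time sync                                          # terminal 3: once RAM is full, this is where the stall shows up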

(In our tests) Linux is good at pumping I/O. This doesn't always fit in the regular OS model where the user could also be doing other random stuff while I/O is in progress. They expect the machine to be responsive.

MS Windows, while not the best, is still better than Linux desktop in this use case.

Over the years, my workaround has been to have only 1 process doing I/O. Never let 2 or more processes do I/O at the same time. Like, don't run 2 cp commands. Don't do 2 copy operations in your GUI file browser. If you follow this policy, you have a higher chance of avoiding this ugly bug.

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

Ritesh: if you have some test cases then discuss them on the linux-mm list.

Revision history for this message
In , linux-kernel-bugs (linux-kernel-bugs-linux-kernel-bugs) wrote :

(In reply to comment #608)
> Ritesh: if you have some test cases then discuss them on the linux-mm list.

Alan, I see in the prev comments you have the same explanation done in the right technical terms. :-)

I just would add 1 more comment. All these symptoms were tested and seen also on my lab machine, which is:

> 2 core CPU
> 8 GiB RAM (We have tested also with 48 GiB RAM)
> All tests were done with SAN Array (over sw iSCSI).

The slow rotating media can be mapped with the slow network in this case. The stalls were visible on these machines also after you do buffered I/O consuming up all of the system RAM.

I had then spent some time tweaking values in /proc/sys/vm but hadn't seen great improvements.

Will surely put in my results on -mm in the next run I do on it (could be in some weeks). Thank you.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Is it possible that one process can consume all (dirty?) pages and stall other processes, even if those are running from or accessing other disks?

My system is on two SSDs, one for the system and one for the data. I can stall the whole system while running a VM on a third, slow, external USB 2.0 disk.

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

Thomas - the kernel tries very hard to avoid that sort of thing happening and to throttle a process generating too much I/O. Older kernels were certainly very bad at that and an rsync to a USB disk was horrible. It ought to be much better with the most recent kernels although still not great.

Revision history for this message
In , jaroslaw.fedewicz (jaroslaw.fedewicz-linux-kernel-bugs) wrote :

> Over the years, my workaround have been to have only 1 process doing I/O.
> Never
> let 2 or more processes do I/O at the same time. Like don't do 2 cp. Don't do
> 2
> copy operations in your gui file browser.

I've heard once a while ago that Linux is a multitasking OS, so I figure they lied to me?

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

By moving to BFS, it has been shown (empirically) that IT IS a CPU scheduler issue and not a slow-rotational-media problem. The kernel can do other things while the rotational media is not giving it what it wants. And it should not let buffers and caches fill up so much (again a scheduling issue) that even the kernel has no free pages left to run its own components from. All the kernel ends up doing is spinning in search of free pages (kswapd hogging the CPU, searching through millions of pages on modern systems). Why it does not evict caches sooner by default is not clear; you need to set a bunch of /proc parameters for it to start doing that, and it still eventually keels over.

There was a bug reported by someone (I linked it above) where just pumping network traffic through the Linux kernel brought it to its knees, leading to a cluster reboot. The kernel space (softirqs) hogged so much CPU during network traffic processing that user space never got a chance to run. The person moved to BFS and could push network traffic as fast as he wanted without bringing anything to its knees. If this is not a CPU scheduler issue, then I don't know what is!
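
The comment above doesn't say which /proc parameters it means; the vm sysctls most often mentioned in this context control how aggressively the page cache is kept versus reclaimed (the values below are only illustrative, not recommendations, and need root):

sysctl -w vm.swappiness=10             # prefer dropping page cache over swapping out process memory
sysctl -w vm.vfs_cache_pressure=50     # keep dentry/inode caches around a bit longer
sysctl -w vm.dirty_background_ratio=5  # start background writeback earlier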

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #613)
> If this is not a CPU scheduler issue, then I don't know what is!

This has nothing to do with scheduling CPU time and everything to do with managing virtual memory. That the kswapd process is consuming all CPU time does not indicate that the CPU scheduler is not giving time to user processes but rather that all user processes are blocked in page faults. User processes become unresponsive during heavy I/O because all their code gets evicted from RAM, and the page faults that load their code back in from disk have to compete with all the other disk I/O.

The question is why the kernel is evicting memory-mapped pages (especially *executable* pages) rather than blocking calls to write() until more RAM becomes available.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Then, how do you explain the above behaviour? Moving to BFS solves this problem. And solves the other problem where during heavy network traffic, the kernel space does not give any chance to user space leading to user space starvation.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #615)
> Moving to BFS solves this problem.

Moving to BFS solves *a* problem *you're* having. According to comment #589, the VM thrashing problem still occurs when using BFS.

Perhaps BFS schedules a process that is encountering a serial string of page faults more favorably than the CFS, but that doesn't solve the underlying problem of executable pages being evicted from RAM to make excessive space available for caching large writes.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

> According to comment #589,
> the VM thrashing problem still occurs when using BFS.

And the person in comment #589 piled on BFQ into the equation to muddy the waters. Throwing in a new IO scheduler and then having IO problems, well yeah, that's not intuitively obvious.

So, someone please still prove that BFS alone hasn't fixed this issue.

Revision history for this message
Sifr Moja (simplexion) wrote :

I have now tried to duplicate this bug with ext2 on a usb drive. It occurs when transferring over about 1GB of data. It slows down pretty rapidly. Let me know what information is required.

Revision history for this message
elatllat (elatllat) wrote :

simplexion (simplexion)
Torsten Bronger (bronger)

If you're not going to read the history and figure out how to provide the results of a disk benchmarking tool on the exact same hardware from both an affected and an unaffected, fully upgraded OS, then please don't post.

Revision history for this message
In , Dave (dave-redhat-bugs) wrote :

# Mass update to all open bugs.

Kernel 3.6.2-1.fc16 has just been pushed to updates.
This update is a significant rebase from the previous version.

Please retest with this kernel, and let us know if your problem has been fixed.

In the event that you have upgraded to a newer release and the bug you reported
is still present, please change the version field to the newest release you have
encountered the issue with. Before doing so, please ensure you are testing the
latest kernel update in that release and attach any new and relevant information
you may have gathered.

If you are not the original bug reporter and you still experience this bug,
please file a new report, as it is possible that you may be seeing a
different problem.
(Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).

Revision history for this message
In , Hedayat (hedayat-redhat-bugs) wrote :

I should try to find a new problematic device and test (Under Fedora 17).

Revision history for this message
In , leho (leho-linux-kernel-bugs) wrote :

(In reply to comment #568)
> (In reply to comment #567)
> > Maybe this:
> > http://lwn.net/Articles/467328/
>

Whatever this patch has done or not done, I just had the load on my 3.4.11-pf laptop (CFQ, BFQ) climb from the regular 0.6 to 30+ when I did:

$ ls -l usb-_USB_FLASH_DRIVE_079605074ECA-0\:0.img
-rw-r--r-- 1 leho leho 2004877312 6. nov 12:41 usb-_USB_FLASH_DRIVE_079605074ECA-0:0.img

$ ddrescue usb-_USB_FLASH_DRIVE_079605074ECA-0\:0.img /dev/sdb --force
GNU ddrescue 1.16
Press Ctrl-C to interrupt
rescued: 2004 MB, errsize: 0 B, current rate: 2490 kB/s
   ipos: 2004 MB, errors: 0, average rate: 3671 kB/s
   opos: 2004 MB, time since last successful read: 0 s
Finished

Because of the massive stalling that occurred, the average write rate ended up at 3.5 MB/s instead of the regular 20+ MB/s.

Is this bug still alive, or related or does anyone here know what to look for? I'd really like to maintain responsiveness when working with USB drives.

Revision history for this message
In , wolfram (wolfram-linux-kernel-bugs) wrote :

Same problem here. Kernel 3.7-rc6. SSD. 4 GB RAM + 2 GB swap. When the system tries to use swap it becomes unresponsive (even the mouse cursor doesn't move smoothly).

00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
00:04.0 Signal processing controller: Intel Corporation Device 0153 (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 2 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM76 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
00:1f.6 Signal processing controller: Intel Corporation 7 Series/C210 Series Chipset Family Thermal Management Controller (rev 04)
02:00.0 Network controller: Intel Corporation Centrino Advanced-N 6235 (rev 24)

Revision history for this message
In , wolfram (wolfram-linux-kernel-bugs) wrote :

Additionally, I'm using the deadline I/O scheduler. So, maybe reopen?

Revision history for this message
In , Hedayat (hedayat-redhat-bugs) wrote :

I experienced a similar issue as comment #12. However, I witnessed the following:
1. A normal copy with Nautilus was in progress. The speed was around 2.9 MB/s, and in iotop I saw this: in one update, a write operation of around 3 MB/s happened; in the next 2 updates no read/write operation happened. This pattern repeated in iotop.

2. I did the trick I mentioned above to force Linux to flush its caches. I saw a constant write operation in iotop of 3 to 6 MB/s. Nautilus stalled for a while until the buffers were flushed. Unfortunately, I didn't see the final speed shown by Nautilus, but considering the iotop results, the speed should be better than in the normal case (2.9 MB/s).

3. I did another copy of another directory, but it was even slower than 2.9 MB/s. I undid my changes to the kernel parameters and Nautilus suddenly finished the copying (which had actually only been copied into memory). I don't know whether the actual write was faster or slower.

Unfortunately, I forgot to test with one big file rather than a number of small files. But considering what happened in number 2 above, I'd assume that Linux still needs to manage its in-RAM buffer better for slow USB devices.

Revision history for this message
Matthew (ruinairas1992) wrote :

@The people stating this is a falsely reported bug:

I really don't understand how someone can "not" notice the slowness of USB transfers... I wonder if anyone making these types of comments has used Windows in the last decade... I'm not trying to sound mean, but this is like talking to people with no common sense.

To reproduce this bug:

*slap in a USB flash drive/SD card
*transfer a 600 MB (or larger) file (maybe the Ubuntu image?)
*take note of how long it took to transfer it
*do this over and over with every type of format until you are blue in the face; it won't make a difference

Repeat this on a Mac or a Windows machine... notice the drastic difference in speed... But yet it's the USB flash drive's fault it's so slow *facepalm*

I don't know why this happens or how to fix this issue, but this is indeed a bug somewhere...

Revision history for this message
elatllat (elatllat) wrote :

@Matthew
I appreciate your frustration but try to understand the people who can fix this problem are unable to reproduce it, and the people who have the problem are unable to so much as report on it properly.

If you truly want this fixed please take the time to learn how to report a bug properly.
At 95 comments clearly there is A problem but without a stack trace, log, or conclusive benchmarking it's unlikely to be addressed.

Revision history for this message
lowsky (jpc1208) wrote :

@elatllat

The developers don't have 600 MB of files and a common USB drive on hand to test transfer speed? The issue causes programs like Banshee and Rhythmbox to force close, because the system believes there is an issue if you try to transfer an entire library of music or video to a media device. So it isn't just messing with USB drives acting as part of the file system, but also affecting devices using MTP.

It's easily reproduced but impossible to test and log for the normal user. FAT32 suffers the most significantly, but the results are reproducible with ext2 and NTFS. It's hard to benchmark for many users. They don't have the time to sit and watch 1 GB of data not do anything for 6 hours. They need their PC to be usable.

Even when this bug was first reported it was pretty common for people to have large files.
The issue is clearly there, has been ignored until recently, and is only now being looked at because of Ubuntu's move from using 700 MB CDs to using USB drives as an installation medium.

Revision history for this message
Rüdiger Kupper (ruediger.kupper) wrote :

I have seen this problem since precise, and it clearly is not fixed in the latest quantal.
Please note that whatever caused this problem, it appeared in precise. I never had USB speed issues in oneiric.

Today, I created USB installation media, using Ubuntu's "Startup disk creator", from (a) the latest quantal release, (b) precise release, (c) oneiric release.
Results:
(a) and (b): Live image takes up to 10 minutes just to boot up, then system is so slow it is unusable.
(c): Live image boots up in approx. 2 minutes and is usable.

Can anyone confirm that this problem was not present in oneiric, but appeared in precise?

Revision history for this message
In , Hedayat (hedayat-redhat-bugs) wrote :

Well, I just discovered something about the new behavior. I found that sometimes, when I insert a flash disk, it is registered in a USB 1 bus rather than a USB 2 bus, and this is why it is slow. If I re-insert the disk (even in the same port), it might be recognized as a USB 2 device and so it'll be much faster.

Therefore, I think the original bug is already solved. I just wonder why sometimes the disks are registered as USB 1?!
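
A quick way to check which bus and speed a stick ended up on is the tree view of lsusb; a high-speed USB 2.0 device should show 480M, while 12M or 1.5M means it was enumerated at USB 1.x speed:

lsusb -t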

Thanks

Revision history for this message
In , csredrat (csredrat-linux-kernel-bugs) wrote :

Progress has been made toward the goal of eliminating the timer tick while running in user space. The patches merged for 3.9 fix up the CPU time accounting code, printk() subsystem, and irq_work code to function without timer interrupts; further work can be expected in future development cycles.

A relatively simple scheduler patch fixes the "bouncing cow problem," wherein, on a system with more processors than running processes, those processes can wander across the processors, yielding poor cache behavior. For a "worst-case" tbench benchmark run, the result is a 15x improvement in performance.

The format of tracing events has been changed to remove some unused padding. This change created problems when it was first attempted in 2011, but it seems that the relevant user-space programs have since been fixed (by moving them to the libtraceevent library). It is worth trying again; smaller events require less bandwidth as they are communicated to user space. Anybody who observes any remaining problems would do well to report them during the 3.9 development cycle.

https://lwn.net/Articles/539179/

Revision history for this message
Andrey Dj (djdron) wrote :

Seen this problem for years.
Ubuntu/Linux FAIL.

Revision history for this message
Antony Jones (wrh) wrote :

This is the single most infuriating bug that I have ever seen in Ubuntu and I've been struggling with it for about 3 years now (before that there was no problem, so anybody saying large files weren't invented back then, is talking crap).

How are we supposed to send log files and output for something which doesn't create either? This bug is simple to reproduce. Copy a 3gb file to a USB stick which uses FAT32 - any HD movie is a good candidate.

If there is a developer who is able or willing to fix this I will happily send them a USB stick and a file for them to reproduce the problem to their hearts content, if only this stupid problem would be fixed.

Revision history for this message
Ming Lei (tom-leiming) wrote : Re: [Bug 500069] Re: USB file transfer causes system freezes; ops take hours instead of minutes

On Fri, Mar 29, 2013 at 5:06 AM, Andrey Dj <email address hidden> wrote:
>
> to test, i issued dd command:
> dd if=/dev/zero of=/media/usb-disk/test-file bs=32

The above dd test is a very poor one, since setting bs to 32 will make the USB storage transfer very, very slowly; the typical value (also the maximum value) used in the kernel is 120K.

Thanks,
--
Ming Lei
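
For comparison, a dd invocation that gives a more meaningful number for a USB stick (same placeholder path as the quoted command; the size is only an example, and conv=fsync makes dd flush to the device before reporting the rate):

dd if=/dev/zero of=/media/usb-disk/test-file bs=1M count=512 conv=fsync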

Revision history for this message
In , zvova7890 (zvova7890-linux-kernel-bugs) wrote :

On 3.9-rc5 I get pretty good results copying from a USB flash drive to the HDD: high speed (30+ MB/s) and 2-20% iowait, very good. HDD -> flash gives 50-60% iowait, but no performance problems. The copying speed is 12-16 MB/s (which is the maximum for my USB flash drive).

dd if=/dev/zero of=~/ololo bs=1M count=1024 produces high iowait and some performance problems (3D apps have small freezes, the fps becomes unstable, and the DE slows down; i.e., tasks are starving?).

One very important problem is swap. Even with a small amount of swap in use the system has trouble running smoothly. When the kernel starts using swap very heavily, the freezes stretch into minutes; everything freezes. It feels like the memory manager works with swap in blocking mode %) Is multitasking locked while pages are moved between RAM and swap? Any ideas why that happens?

Oh, I forgot: my SATA controller is the MCP67 (the buggiest chip?).

Revision history for this message
garzie2000 (garzie2000) wrote :

I can't believe this bug still exists in Ubuntu 12.10. It's annoying as hell. Please developers work your magic and fix this once and for all!

Revision history for this message
alexmex90 (alexmex90) wrote :

@garzie2000

Do you really expect people to do magic and fix a bug without proper logging?
This seems to work different for all people, in my case USB transfers are fine, but seems like the people affected cannot submit a bug properly for the people who can fix it, and the people who can fix it, cannot reproduce the bug because their USB transfers are fine. Everybody claims that only copying data from the hard drive to the USB is sufficient to reproduce the bug, but that's not the case, a benchmarking from an affected system is necessary to do a fair comparison and try to find the cause of the issue. Maybe this link can be useful for that goal: http://askubuntu.com/questions/11277/usb-drive-speed-testing-app-with-test-options

Revision history for this message
gonssal (gonssal) wrote :

This 'bug' is fixed by running "modprobe ehci_hcd" (as root) on my 13.04. I have been having the issue since maybe 8.04; only now did I _need_ to fix it.

So maybe load that module by default and mark this as fixed?
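
If loading ehci_hcd really is the fix on a given machine, it can be made to load at every boot by adding it to /etc/modules (Debian/Ubuntu convention; on most installs the module is built in or auto-loaded already, so treat this as a workaround to test rather than a general recommendation):

echo ehci_hcd | sudo tee -a /etc/modules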

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

All who believe that this problem has been fixed, please open this link in Google Chrome: http://ec2-54-229-117-209.eu-west-1.compute.amazonaws.com/party.html

Revision history for this message
In , datacompboy (datacompboy-linux-kernel-bugs) wrote :

Mikhail, that is not related to this bug; there is no large IO on that page, only canvas allocation, which eats up RAM:
  function partyHard( drunkenness ) {

    var mapCanvas = [];
    var mapCanvasCtx = [];
    for (var i = 0; i < drunkenness * 1200; i++) {
        mapCanvas[i] = document.createElement('canvas');
        mapCanvas[i].width = 2500;
        mapCanvas[i].height = 2500;
        mapCanvasCtx[i] = mapCanvas[i].getContext('2d');
        mapCanvasCtx[i].fillStyle = 'rgb(0, 0, 0)';
        mapCanvasCtx[i].fillRect( 0, 0, 1700, 1700 );
    }
    console.log(window);
  }

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

In this example, the large I/O is the result of swapping. Try increasing the size of the swap to 64 GB and repeating the experiment. On my system with 16 GB of RAM and no swap there are no freezes. If you increase the swap to 64 GB, the system is 100% certain to die. :(
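
For anyone who wants to repeat that experiment, a throwaway 64 GB swap file can be created roughly like this (paths and size are examples; remove it again afterwards with swapoff and rm):

sudo dd if=/dev/zero of=/swapfile bs=1M count=65536
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile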

Revision history for this message
In , zvova7890 (zvova7890-linux-kernel-bugs) wrote :

Swap in Linux is something fantastic. It feels like the scheduler is locked while a RAM page is being written to swap. We would expect lag in the program involved, but the lag is global! That's awesome! :)

Revision history for this message
Tsu Jan (tsujan2000) wrote :

Has anyone tried to disable THP (transparent huge pages) with the following command (as root) before using a USB stick?

echo "never" > "/sys/kernel/mm/transparent_hugepage/enabled"

This can be easily undone with:

echo "always" > "/sys/kernel/mm/transparent_hugepage/enabled"

Or just with a reboot.
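
The current setting can be checked before and after; the active mode is shown in brackets:

cat /sys/kernel/mm/transparent_hugepage/enabled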

Revision history for this message
Daniel Barrett (dbarrett-m) wrote :

gonssal (#104): On 13.04 (live CD), I ran "sudo modprobe ehci_hcd" and copied files from an internal SSD to an external USB3 drive. I get super-slow 1 MB/second transfer rates. I boot the same computer into Windows 7 (it's dual boot) and I get 150MB/sec.

I get the same problem if I use eSATA or Firewire (the external drive has all three connections): slow on 13.04, fast on Windows 7.

If I boot on a Knoppix 7.2 CD, however... FAST FAST FAST transfer speed. Knoppix is 3.9 kernel, while 13.04 is 3.8.

I also tried the Ubuntu nightly build of 13.10 today (kernel 3.11), and it had the same slowness problem as 13.04.

I also tried #106 (echo "never" > "/sys/kernel/mm/transparent_hugepage/enabled") and it made no difference.

My vendor blames the ASmedia Chipset on my motherboard, which he says Linux has "rudimentary at best" support for.

Revision history for this message
Tsu Jan (tsujan2000) wrote :

@Daniel Barrett

I have Debian with Liquorix kernel 3.10.X and this always works for me before inserting usb stick:

sudo bash -c 'echo "never" > /sys/kernel/mm/transparent_hugepage/enabled'

I just came to this page and was surprised that there was no trace of "THP" or "hugepages", so I added a comment.

See https://www.kernel.org/doc/Documentation/vm/transhuge.txt for explanation.

Revision history for this message
Daniel Barrett (dbarrett-m) wrote :

Referencing my previous post (#107): never mind. Something else is wrong with my machine. ALL disk writes are 1MB/second, even on my internal SSD RAID. So it's not just external drives.

Revision history for this message
penalvch (penalvch) wrote :

sbec67, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.11

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Revision history for this message
In , 3draven (3draven-linux-kernel-bugs) wrote :

On my i7 + 8 GB RAM + 750 GB SATA HDD: if the HDD swap is in use -> the system freezes and lags -> even the mouse lags!

Kernel 3.10 (and many other versions).

To mitigate it I use zram swap + HDD swap. The lags are reduced, but did not go away.
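
Roughly, a zram swap like the one mentioned above is set up along these lines (a sketch only, as root; the 2 GiB size is an example and the sysfs details vary a little between kernel versions):

modprobe zram
echo $((2 * 1024 * 1024 * 1024)) > /sys/block/zram0/disksize   # 2 GiB compressed swap device
mkswap /dev/zram0
swapon -p 100 /dev/zram0   # higher priority than the HDD swap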

Revision history for this message
In , leho (leho-linux-kernel-bugs) wrote :

(In reply to 3draven from comment #628)
> On my i7 + 8 GB RAM + 750 GB SATA HDD: if the HDD swap is in use -> the
> system freezes and lags -> even the mouse lags!
>
> Kernel 3.10 (and many other versions).
>
> To mitigate it I use zram swap + HDD swap. The lags are reduced, but did not go away.

Yep, experiencing the same, currently on 3.10.15. Getting memory usage to the point of swapping on Linux is craziness for the user. It means 8 GB of RAM is the minimum for any above-average workload.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

Nobody cares about the problems with swap I/O :(

Revision history for this message
Henry Mata (matahr) wrote :

still persists in trusty tahr
Linux 3.12.0-4-generic x86_64 GNU/Linux

Revision history for this message
penalvch (penalvch) wrote :

Henry Mata, so your hardware may be tracked, could you please file a new report via a terminal:
ubuntu-bug linux

Revision history for this message
In , loki (loki-linux-kernel-bugs) wrote :

Heya,

After some years I have resolved MY problem with the responsiveness of the computer under HIGH I/O. To everyone else I can only say: try it. If it works for you, be happy; if not...

For starters I wouldn't call this a bug. It's a DEFECT. If I have 20 servers with different Linux flavours and distributions, many of them compiled from scratch, and 200 Ubuntu desktops, and they all behave the same way when I use the command dd if=/dev/zero of=test.img bs=1M count=xxx (above 1 GB file size) - meaning this command grinds the system to a halt - and the problem has stayed around for so many years and so many kernels, then for me it is a DEFECT.

For the past week I've been trying the BFQ patch for kernel 3.9 on several machines. On one machine I have been testing heavily. I have had this machine for some years now, a Core i7 with 12 GB and 6 HDs in RAID 10. On it I also had the problem; it was somewhat better with the BFS patch, but it was still happening.
With the BFQ patch it's working perfectly. At one point I had two dd's running (dd if=/dev/zero of=test.img bs=1m count=100k), was creating a 60 GB VirtualBox VDI, opening 10 ODS documents, watching YouTube, watching an HD movie in VLC, and doing some other stuff, and the desktop / system was as responsive as if nothing was using it. Just like I remember Linux being some years ago. And now I have a sustained HD throughput of 470 MB/s without my computer going to /dev/null.

So BFQ solved this problem for me. Maybe it's not stable yet, but for me it's more stable than using CFS !!!

Just my two cents. And this bug is closed for me, but only NOW !!!
For all others out there I wish you luck
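
For anyone finding this thread today: on mainline kernels 4.12 and later, BFQ is available without out-of-tree patches, so the same experiment can be tried with a couple of commands. A hedged sketch (the device name is illustrative, and bfq only appears on devices using blk-mq):

$ cat /sys/block/sda/queue/scheduler          # the entry in [brackets] is the active scheduler
$ sudo modprobe bfq                            # load the scheduler module if it is not built in
$ echo bfq | sudo tee /sys/block/sda/queue/scheduler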

Revision history for this message
In , leho (leho-linux-kernel-bugs) wrote :

Just general FYI, BFQ just freshly did a new release where they claim another batch of significant improvements for whatever they're doing.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

> So BFQ solved this problem for me. Maybe it's not stable yet,
> but for me it's more stable than using CFS !!!

BFQ and CFS are not comparable. Maybe you meant BFS?

Revision history for this message
In , loki (loki-linux-kernel-bugs) wrote :

(In reply to devsk from comment #633)
> > So BFQ solved this problem for me. Maybe it's not stable yet,
> > but for me it's more stable than using CFS !!!
>
> BFQ and CFS are not comparable. Maybe you meant BFS?

Sorry, my mistake. I meant CFQ. But on the other hand, BFS too: BFS did give me some improvements (I could listen to music while creating a big file), but that was all. So CFQ without BFS was a no-go, CFQ with BFS helped a little, but BFQ alone solved the problems I had had for the past 4-5 years, during which I had to bend and improvise to create a 60 GB VDI and hope that my computer stayed alive until it finished the job, and mind you, on a computer that has resources in abundance. :)

Revision history for this message
In , vitaly.v.ch (vitaly.v.ch-linux-kernel-bugs) wrote :

I have successfully reproduced this bug on my HP Z200 under Ubuntu 12.04 LTS. After some investigation I found out that the main reason for this bug is a very ugly bottleneck in the block device layer: the cores of my Z200 spend almost all of their time spinning on a spinlock while IRQs are disabled on ALL cores.

Revision history for this message
AO (aofrl10n) wrote : apport information

ApportVersion: 2.14.1-0ubuntu3
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: ao 2970 F.... pulseaudio
 /dev/snd/controlC0: ao 2970 F.... pulseaudio
CurrentDesktop: Unity
DistroRelease: Ubuntu 14.04
InstallationDate: Installed on 2014-04-06 (10 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Daily amd64 (20140404)
MachineType: System manufacturer System Product Name
Package: linux (not installed)
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=b49aac97-cebf-48a0-b5b2-db2b0c148a81 ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 3.13.0-24.46-generic 3.13.9
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-24-generic N/A
 linux-backports-modules-3.13.0-24-generic N/A
 linux-firmware 1.127
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
Tags: trusty
Uname: Linux 3.13.0-24-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 08/09/2012
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1501
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: M4A88TD-M/USB3
dmi.board.vendor: ASUSTeK Computer INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1501:bd08/09/2012:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKComputerINC.:rnM4A88TD-M/USB3:rvrRevX.0x:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

tags: added: trusty
Revision history for this message
AO (aofrl10n) wrote : AlsaInfo.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : BootDmesg.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : CRDA.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : CurrentDmesg.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : IwConfig.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : Lspci.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : Lsusb.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : ProcEnviron.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : ProcInterrupts.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : ProcModules.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : PulseList.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : UdevDb.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : UdevLog.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : WifiSyslog.txt

apport information

Revision history for this message
penalvch (penalvch) wrote :

Ubuntu-QC-1, thank you for your comment. So your hardware and problem may be tracked, could you please file a new report with Ubuntu by executing the following in a terminal while booted into a Ubuntu repository kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

tags: added: bot-stop-nagging
removed: apport-collected bot-stop-nagging. slow trusty usb
Revision history for this message
Sifr Moja (simplexion) wrote :

I have a brand new motherboard and the issue still remains. I guess I will have to reinstall Ubuntu. I haven't had to reinstall it in about 5 years. So annoying.

Revision history for this message
penalvch (penalvch) wrote :

Simplexion, thank you for your comment. So your hardware and problem may be tracked, could you please file a new report with Ubuntu by executing the following in a terminal while booted into a Ubuntu repository kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

Revision history for this message
In , horst-bugme-osdl (horst-bugme-osdl-linux-kernel-bugs) wrote :

I'm still seeing this.

Setup: Debian 7 Wheezy, amd64 backports kernel (3.11-0.bpo.2-amd64), ~45MB/s write of a low number of large files by rsync (fed through a GBit ethernet link) on an ext3 FS (rw,noatime,data=ordered) in a LVM2 partition on a hardware RAID5.

Observation: The machine (32-core Xeon E5-4650, 192 GB RAM), primarily servicing multiple interactive users via SSH, x2go and SunRay sessions, gets completely unusable during and quite some time after the rsync transfer. TCP connections to SunRay clients time out, IRC connections are dropped, even simple tools like "htop" don't do anything but clear the screen after being started. "iotop" shows a [jbd2/dm-1-8] process on top, reportedly doing "99.99%" I/O (but not reading or writing a single byte, maybe because it's a kernel thread?).

Once I switch from the default CFQ I/O scheduler to "deadline" (echo deadline > /sys/block/sdb/queue/scheduler), the symptoms disappear completely.
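
For anyone following this workaround: the sysfs write above does not survive a reboot. A hedged sketch of a udev rule that re-applies it automatically (the file name and the sd[a-z] match are illustrative; adjust them to your devices):

# /etc/udev/rules.d/60-iosched.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="deadline"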

Revision history for this message
Ken Sharp (kennybobs) wrote :

Is this not fixed now? It was fixed upstream a while ago.

Revision history for this message
Damir Butmir (d4m1r2) wrote :

Still not fixed under Ubuntu 12.04 LTS x64 at least, even with the latest kernel....

Linux damir-macbook 3.13.0-45-generic #74~precise1-Ubuntu SMP Thu Jan 15 20:21:55 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Writing to a 16GB USB 2.0 stick (NTFS) goes at ~17MB/s while under Windows (same amount, machine, stick, everything) goes at 30+MB/s....

Revision history for this message
penalvch (penalvch) wrote :

Damir Butmir, it would help immensely if you filed a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.

Revision history for this message
Damir Butmir (d4m1r2) wrote :
Revision history for this message
In , yanp.bugz (yanp.bugz-linux-kernel-bugs) wrote :

Still facing this bug. Kernel 3.16.

Is it possible to reserve 5% of the I/O for user/other processes' needs? Any fast download or copy eats 99.99% of the I/O and the system becomes hard to use.
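
There is no knob that reserves a share of I/O for interactive processes, but one commonly suggested mitigation (only effective with schedulers that honour I/O priorities, such as CFQ) is to run the bulk transfer in the idle I/O class; a hedged example with illustrative paths and PID:

$ ionice -c 3 cp /path/to/big-file /media/usb-disk/    # start the copy in the "idle" I/O class
$ ionice -c 3 -p 12345                                 # or demote an already running copy by its PID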

Revision history for this message
shantanu saha (shantanucse18-gmail) wrote :

This bug still exists in the latest version. It's not only copies from USB to HDD or vice versa; it occurs for any kind of large file copy.

System:
Ubuntu 15.10 64bit
Corei7
8GB RAM

Revision history for this message
penalvch (penalvch) wrote :

shantanu saha, it will help immensely if you filed a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.

Revision history for this message
In , aros (aros-linux-kernel-bugs) wrote :

I'm curious: this bug was ostensibly fixed years ago, yet I dare everyone who owns an Android smartphone to run a simple test. Invoke any terminal emulator and execute this command:

$ cat < /dev/zero > /sdcard/EMPTY

What's terribly unpleasant is that _all_ CPU cores become busy (more than 75% load), and the CPU jumps into its highest performance state, i.e. frequency, i.e. power consumption. Obviously this is wrong, bad, and shouldn't happen. This test is somewhat artificial, as no Android app can create such a high IO load, but there are multiple phones out there with either 5GHz MIMO 802.11n or 802.11ac chips which allow up to 80MB/sec throughput, which can easily saturate most if not all internal MMC cards and have the same effect as the above command.

Perhaps vanilla kernel bugzilla is not a place to discuss bugs in Android, but latest Android releases usually feature kernels 3.10.x and 4.1.x without that many patches, so this bug is still there. Both these kernels are currently maintained and supported. Android by default never uses SWAP (one of the reasons for this bug).

Go figure.

P.S. Sample apps from Google Play:

* CPU Stats by takke
* Terminal Emulator by Jack Palevich

Revision history for this message
In , eugene.seppel (eugene.seppel-linux-kernel-bugs) wrote :

I've just experienced this issue with 3.19.0-32-generic on Ubuntu.
My KTorrent was downloading files to an NTFS filesystem on a SATA3 drive (FUSE, download speed about 100 Mbit/s); simultaneously I copied files from that filesystem to a USB 3.0 flash drive, also NTFS. That resulted in poor interactive performance, with mouse and window lags. The workaround was to suspend the torrent download until the files were copied.

Hardware: One AMD FX(tm)-8320 Eight-Core Processor, 8 GB RAM.

Revision history for this message
In , gooberslot (gooberslot-linux-kernel-bugs) wrote :

This bug is definitely not fixed. A simple cp from one drive to another makes a huge impact on my desktop. Trying to do an rsync is even worse. It seems to mainly be a problem with large files. My system is old (Athlon II 250) but even an old P3 running Win98 doesn't lag this bad from just copying files.

Revision history for this message
In , bes1002t (bes1002t-linux-kernel-bugs) wrote :

I'm trying to copy 50 GB from one tower to another via USB 3.0 and it is really no fun. If I copy all the files at once, the speed decreases constantly; after 30 minutes it copies at 1.0 MB/s. If I copy a bunch of directories at a time it is a little bit better, but the speed still decreases. For 2 GB my Linux system needs more than an hour. This bug is definitely not fixed. On Windows this USB stick works without that speed loss.

OS: Fedora 24
Kernel: 4.8.15

Revision history for this message
In , bes1002t (bes1002t-linux-kernel-bugs) wrote :

I've noticed that this does not happen every time, even with exactly the same USB stick. For my 2 GB of files (Eclipse with a workspace and a project) I needed one hour to copy. It started at 60 MB/s and decreased to 500 KB/s. Now I am copying 16 GB (Android Studio and some other projects) and it only needs about 15 minutes. The copy speed started at 70 MB/s and at the end it was 22 MB/s. So it also decreases, but not as quickly as in my 2 GB copy.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

It seems kernel developers don't look at this topic here; it would be much better to write to the mailing list.

Revision history for this message
In , oleksandr (oleksandr-linux-kernel-bugs) wrote :

Does Jens' buffered writeback throttling patchset solve your issue?

Revision history for this message
In , aros (aros-linux-kernel-bugs) wrote :

(In reply to bes1002t from comment #642)
> I'm trying to copy 50 GB from one tower to another via USB 3.0 and it is
> really no fun. If I copy all the files at once, the speed decreases
> constantly; after 30 minutes it copies at 1.0 MB/s. If I copy a bunch of
> directories at a time it is a little bit better, but the speed still
> decreases. For 2 GB my Linux system needs more than an hour. This bug is
> definitely not fixed. On Windows this USB stick works without that speed loss.
>
> OS: Fedora 24
> Kernel: 4.8.15

This bug report has nothing to do with the speed of copying data to a USB flash drive. It's about substantially degraded interactivity, which manifests itself as slowness, and it's hard to believe you can perceive it via an SSH session.

I'm inclined to believe your bug is related to other subsystems like USB.

> It seems kernel developers don't look at this topic here; it would be much
> better to write to the mailing list.

The kernel bugzilla has always been neglected: thousands of bug reports have zero comments from prospective developers. LKML is hit and miss too. Your developer skipped your e-mail because he/she was busy? Bad luck.

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

@bes1002t: I think throughput is a different issue than this, although it might well be related.

But the most important thing would be for someone to create an I/O concurrency / latency benchmark. Maybe the Phoronix Test Suite is an adequate tool for that? It can also be used for automatic bisecting..

I clearly remember pre-2.6.18 times when I had a much inferior machine, and while Gentoo's emerge was compiling stuff in the background with multiple threads, I could browse the web, switch between programs and play an HD stream without any hiccups or stalling.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

@bes1002t: Copying to a USB device always starts at the speed of the hard drive, as everything is cached until the write cache is full, and ends at the speed of the USB drive. The write process then has to wait until all the data is written.

@Artem S. Tashkinov: The stall problems over an SSH session exist, or at least existed. I migrated an old server with CentOS 6 and copied some VM images; the SSH responsiveness was very bad, and I had to wait up to 20 seconds for tab completion.

In many cases it was a swap problem: the buffers are full, and the caches need a long time to be written to the slow USB device, so the server starts to swap out process data. It's only a very small amount of data. I could increase the overall desktop performance with a RAM upgrade.
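
That caching behaviour is easy to observe while a copy to a slow device runs; an illustrative (not from the original reports) way to watch the dirty and writeback page counters grow and then drain:

$ watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'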

Revision history for this message
In , iam (iam-linux-kernel-bugs) wrote :

Try Kernel 4.10.

>Improved writeback management
>
>Since the dawn of time, the way Linux synchronizes to disk the data written to
>memory by processes (aka. background writeback) has sucked. When Linux writes
>all that data in the background, it should have little impact on foreground
>activity. That's the definition of background activity... But for as long as it
>can be remembered, heavy buffered writers have not behaved like that. For
>instance, if you do something like $ dd if=/dev/zero of=foo bs=1M count=10k,
>or try to copy files to USB storage, and then try and start a browser or any
>other large app, it basically won't start before the buffered writeback is
>done, and your desktop, or command shell, feels unresponsive. These problems
>happen because heavy writes -the kind of write activity caused by the
>background writeback- fill up the block layer, and other IO requests have to
>wait a lot to be attended (for more details, see the LWN article).
>
>This release adds a mechanism that throttles back buffered writeback, which
>makes it more difficult for heavy writers to monopolize the IO requests queue,
>and thus provides a smoother experience in Linux desktops and shells than what
>people were used to. The algorithm for when to throttle can monitor the
>latencies of requests, and shrinks or grows the request queue depth
>accordingly, which means that it's auto-tunable, and generally, a user would
>not have to touch the settings. This feature needs to be enabled explicitly in
>the configuration (and, as it should be expected, there can be regressions)
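
If you want to check whether this throttling is actually active on a given device, the sysfs attribute below should exist and be non-zero on kernels built with the writeback throttling options discussed further down (a hedged check; the device name is illustrative):

$ cat /sys/block/sda/queue/wbt_lat_usec    # 0 means writeback throttling is disabled for this device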

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

> Try Kernel 4.10.
It does not help with my workload :(
The mouse pointer and keyboard input still freeze.

Revision history for this message
In , iam (iam-linux-kernel-bugs) wrote :

Make sure your kernel has that option enabled.

>This feature needs to be enabled explicitly in
>the configuration (and, as it should be expected, there can be regressions)

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

I read the https://kernelnewbies.org/Linux_4.10 and https://kernelnewbies.org/Linux_4.10 articles, but I did not see the name of this option.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

Created attachment 255491
$ cat /boot/config-`uname -r`

Revision history for this message
In , aros (aros-linux-kernel-bugs) wrote :

(In reply to Mikhail from comment #651)

First, I'd recommend trying to disable SWAP completely - it might help:

$ sudo swapoff -a

If you compile your own kernel or your distro hasn't enabled them for you, here's the list of the options you need to enable:

BLK_WBT, enable support for block device writeback throttling
BLK_WBT_MQ, multiqueue writeback throttling
BLK_WBT_SQ, single queue writeback throttling

They are all under "Enable the block layer".
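
A quick way to check whether a distribution kernel already ships with these options enabled (assuming the config is installed under /boot, as most distributions do):

$ grep CONFIG_BLK_WBT /boot/config-$(uname -r)
$ zgrep CONFIG_BLK_WBT /proc/config.gz    # alternative, if the kernel exposes its own config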

If disabling swap and enabling these options have no effect, please ***create a new bug report*** and provide the following information:

CPU
Motherboard and BIOS version
RAM type and volume
Storage and its type
Kernel version and its .config

And also the complete output of these utilities:

dmesg
lspci -vvv
lshw
free
vmstat (when the bug is exposed)

cat /proc/interrupts
cat /proc/iomem
cat /proc/meminfo
cat /proc/mtrr

Revision history for this message
In , iam (iam-linux-kernel-bugs) wrote :

>CONFIG_BLK_WBT=y
># CONFIG_BLK_WBT_SQ is not set
>CONFIG_BLK_WBT_MQ=y

So writeback throttling is enabled only for multi queue devices in your case. I suppose you need to use blk-mq for your sd* devices to activate writeback throttling (scsi_mod.use_blk_mq=1 boot flag) or to recompile kernel with CONFIG_BLK_WBT_SQ enabled.
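
A hedged sketch of how that boot flag is typically made persistent on a GRUB-based system (paths and the regeneration command vary by distribution; on Fedora it is grub2-mkconfig -o /boot/grub2/grub.cfg): append scsi_mod.use_blk_mq=1 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:

$ sudo update-grub     # Debian/Ubuntu helper that regenerates grub.cfg
$ cat /proc/cmdline    # after a reboot, confirm the flag is present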

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

Created attachment 255501
all required files in one archive

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

After setting the boot flag "scsi_mod.use_blk_mq=1", the freezes became much shorter. I'm no longer sure that they are at the kernel level. It looks more like the window manager (GNOME Mutter) is written in such a way that the mouse freezes while the list of applications is loading. To finally defeat the freezes, it seems the window manager needs to be kept from being paged out to swap.

I'm also catch vmstat output when freeze occurred:
# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 2 6 15947052 205136 112592 4087608 32 41 93 119 7 23 43 19 37 1 0

Revision history for this message
In , aros (aros-linux-kernel-bugs) wrote :

Twice I asked you to try disabling SWAP altogether and you still haven't.

I'm unsubscribing from this bug report.

Dimitrenko (paviliong6)
Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Changed in linux (Fedora):
importance: Unknown → High
status: Unknown → Won't Fix
Revision history for this message
In , zvova7890 (zvova7890-linux-kernel-bugs) wrote :

Created attachment 274511
Per-device dirty ratio configuration support

Per device dirty bytes configuration

Revision history for this message
In , zvova7890 (zvova7890-linux-kernel-bugs) wrote :

Per-device dirty bytes configuration. The patch is not ideal; I made it for smooth flash drive writing, by passing a smaller dirty_bytes value per removable device.

>> Path
# ls /sys/block/sdc/bdi/
dirty_background_bytes dirty_background_ratio dirty_bytes dirty_ratio max_ratio min_pages_to_flush min_ratio power read_ahead_kb stable_pages_required subsystem uevent

>> udev rule for removable devices
# cat /etc/udev/rules.d/90-dirty-flash.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{removable}=="1", ATTR{bdi/dirty_bytes}="4194304"
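
To apply such a rule without unplugging and replugging the device, the rules can be reloaded and re-triggered with the standard udev tooling:

# udevadm control --reload-rules
# udevadm trigger --subsystem-match=block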

Revision history for this message
In , chriswy27 (chriswy27-linux-kernel-bugs) wrote :

Was this bug actually fixed? The status shows CLOSED CODE_FIX with a last modified date of Dec 5 2018. I don't see any updates as to what was corrected, and what version the fix will be put into?

Revision history for this message
Chris (chriswy27) wrote :

Was this bug actually fixed? The status shows Fix Released for Ubuntu with a last modified date of July 4 2017 by Dimitrenko (paviliong6). I don't see any updates as to what was corrected, and what version the fix will be put into?

Revision history for this message
In , gatekeeper.mail (gatekeeper.mail-linux-kernel-bugs) wrote :

Created attachment 282477
attachment-6179-0.html

This was never fixed, and since the bug state was gamed with no commit info ever provided even when asked directly, it will never be fixed. Nobody cares, and I guess nobody even figured out who broke the kernel, by which changeset, and when. Just buy another couple of Xeons for your super-duper web-surfing desktop and pray it's enough to cover the waits when you format your diskette. Another approach is to buy enough RAM to hold your whole set of block devices, so write-outs are quick enough and you won't see the lags. That is the complete workaround list they have provided since the bug was opened.

Revision history for this message
In , vi0oss (vi0oss-linux-kernel-bugs) wrote :

As far as I understand, this is a kind of meta-bug: there are multiple causes and multiple fixes.

"I do bulk IO and it gets slow" sounds rather general, and is a problem that can resurface at any time due to some new underlying issue. So the problem cannot really be "closed for good", no matter how much technical progress is made.

For me, 12309 basically stopped happening unless I deliberately tune the "/proc/sys/vm/dirty_*" values to non-typical ranges and forget to revert them. I see the system controllably slowing down processes doing bulk IO so that the system in general stays reasonable. This behaviour is one of the outcomes of this bug.

I don't expect meaningful technical discussion to happen in this thread. It should just serve as a hub for linking to specific new issues.
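
For reference, the knobs referred to above can be inspected in one line and compared against your distribution's defaults; an illustrative check:

$ sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_bytes vm.dirty_background_bytes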

Revision history for this message
In , powerman-asdf (powerman-asdf-linux-kernel-bugs) wrote :

Sure, it's a meta-bug, but for me 12309 is still present, and I don't use any tuning of the I/O subsystem at all.

Not as bad as years ago when it happened for the first time, but I still have to throttle rtorrent to download at 2.5 MB/s maximum instead of the usual 10 MB/s if I want to watch films in mplayer at the same time without jitter/freezes/lag. And that's on a powerful and modern enough system with kernel 4.19.27, CPU i7-2600K @ 4.5GHz, RAM 24GB, and HDD 3TB Western Digital Caviar Green WD30EZRX-00D. This is annoying, and I remember the time before 12309 when rtorrent without any throttling wouldn't make mplayer freeze, on less powerful hardware.

Revision history for this message
In , gatekeeper.mail (gatekeeper.mail-linux-kernel-bugs) wrote :

Created attachment 282483
attachment-22369-0.html

Well, I've tried to report a new bug to investigate my own case of "my CPU does nothing because waiting is too hard for it". It was of no interest to any kernel dev. So, just as Linus once said "f**k you, Nvidia", the very same goes back to Linux itself. It's a pity some devs think that making their software Linux-bound (via udev-only bindings or ALSA-only sound output) is a good idea (GNOME and even parts of KDE). They forget that 15 years ago they picketed Adobe for shipping Flash for Windows only. Now one has to use 12309-bound software for lack of a way to run one's software on another platform.

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

Can 'someone' please open a bounty for the creation of a VM test case, e.g. with `vagrant` or the `phoronix test suite`?
Basically, a way to reproduce and quantify the perceived/actual performance difference between
> Linux 2.6.17 Released 17 June, 2006
and
> Linux 5.0 Released Sun, 3 Mar 2019

(In reply to Alex Efros from comment #666)
> Not as bad as years ago […]
> And that's on a powerful and modern enough system with
> kernel 4.19.27, CPU i7-2600K @ 4.5GHz, RAM 24GB, and HDD 3TB […]
> This is annoying, and I remember the time before
> 12309 when rtorrent without any throttling wouldn't make mplayer freeze, on
> less powerful hardware.

Oh yeah, this... I can clearly remember back then when, on a then mid-range machine with a lot of compiling (Gentoo => 100% CPU 🤣) and filesystem work, VLC used to play an HD video stream even under heavy load without any hiccups or micro-stuttering.. It was impressive at the time.. and then.. it broke 🤨

Revision history for this message
Jeremy (son9ne-junk) wrote :

This bug still exists in 18.04.02. Over 20 minutes to transfer 2 GB is insane. USB and network transfers are killing productivity. Hopefully this will be fixed this decade...

Revision history for this message
In , vitaly.v.ch (vitaly.v.ch-linux-kernel-bugs) wrote :

Based on my attempts to fix this bug, I totally disagree with you.

This bug is caused purely by the design of the current block device layer. Methods which are good for developing code are absolutely improper for developing ideas; that is probably the key problem of the Linux community. Currently, there is a merged workaround for block devices with a good queue, such as the Samsung Pro NVMe.

WBR,

Vitaly

(In reply to _Vi from comment #665)
> As far as I understand, this is a kind of meta-bug: there are multiple causes
> and multiple fixes.
>
> "I do bulk IO and it gets slow" sounds rather general, and is a problem that
> can resurface at any time due to some new underlying issue. So the problem
> cannot really be "closed for good", no matter how much technical progress is
> made.
>
> For me, 12309 basically stopped happening unless I deliberately tune the
> "/proc/sys/vm/dirty_*" values to non-typical ranges and forget to revert
> them. I see the system controllably slowing down processes doing bulk IO so
> that the system in general stays reasonable. This behaviour is one of the
> outcomes of this bug.
>
> I don't expect meaningful technical discussion to happen in this thread. It
> should just serve as a hub for linking to specific new issues.

Revision history for this message
In , todorovic.s (todorovic.s-linux-kernel-bugs) wrote :

Had this again 20 minutes ago.
I was copying 8.7 GiB of data from one directory to another directory on the same filesystem (ext4 (rw,relatime,data=ordered)) on the same disk (Western Digital WDC WD30EZRX-00D8PB0 spinning metal disk).

The KDE UI became unresponsive (everything other than /home and user data is on an SSD) and I could not launch any new applications. Opening a new tab in Firefox to go to YouTube didn't load the page, and it kept saying "waiting for youtube.com" in the status bar (does the network get halted?).

dmesg shows these; are they important?

[25013.905943] INFO: task DOMCacheThread:17496 blocked for more than 120 seconds.
[25013.905945] Tainted: P OE 4.15.0-54-generic #58-Ubuntu
[25013.905947] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25013.905949] DOMCacheThread D 0 17496 2243 0x00000000
[25013.905951] Call Trace:
[25013.905954] __schedule+0x291/0x8a0
[25013.905957] schedule+0x2c/0x80
[25013.905959] jbd2_log_wait_commit+0xb0/0x120
[25013.905962] ? wait_woken+0x80/0x80
[25013.905965] __jbd2_journal_force_commit+0x61/0xb0
[25013.905967] jbd2_journal_force_commit+0x21/0x30
[25013.905970] ext4_force_commit+0x29/0x2d
[25013.905972] ext4_sync_file+0x14a/0x3b0
[25013.905975] vfs_fsync_range+0x51/0xb0
[25013.905977] do_fsync+0x3d/0x70
[25013.905980] SyS_fsync+0x10/0x20
[25013.905982] do_syscall_64+0x73/0x130
[25013.905985] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[25013.905987] RIP: 0033:0x7fc9cb839b07
[25013.905988] RSP: 002b:00007fc9a7aeb200 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[25013.905990] RAX: ffffffffffffffda RBX: 00000000000000a0 RCX: 00007fc9cb839b07
[25013.905992] RDX: 0000000000000000 RSI: 00007fc9a7aeaff0 RDI: 00000000000000a0
[25013.905993] RBP: 0000000000000000 R08: 0000000000000000 R09: 72732f656d6f682f
[25013.905994] R10: 0000000000000000 R11: 0000000000000293 R12: 00000000000001f6
[25013.905995] R13: 00007fc97fc5d038 R14: 00007fc9a7aeb340 R15: 00007fc987523380

Revision history for this message
In , alpha_one_x86 (alphaonex86-linux-kernel-bugs) wrote :

KDE has the problem too: the same copy via the CLI or via Ultracopier (GUI) has no problem. I also note that the KDE UI gets slower, with Plasma using CPU whenever I use the HDD...

Revision history for this message
In , howaboutsynergy (howaboutsynergy-linux-kernel-bugs) wrote :

What's the value of `vm.dirty_writeback_centisecs` ?, ie.
$ sysctl vm.dirty_writeback_centisecs

try setting it to 0 to disable it, ie.
`$ sudo sysctl -w vm.dirty_writeback_centisecs=0`

I found that this helps my network transfer not stall or stop at all (it would stall for a few seconds when that value is =1000, for example) while some kind of non-async `sync`(command)-like flushing goes on periodically, when transferring GiB of data files from sftp to an SSD (via Midnight Commander, on a link limited to 10MiB per second).

vm.dirty_writeback_centisecs is how often the pdflush/flush/kdmflush processes wake up and check to see if work needs to be done.

Coupled with the above I've been using another value:
`vm.dirty_expire_centisecs=1000`
for both cases (when stall and not stall), so this one remained fixed to =1000.

vm.dirty_expire_centisecs is how long something can be in cache before it needs to be written. In this case it's 1 seconds. When the pdflush/flush/kdmflush processes kick in they will check to see how old a dirty page is, and if it's older than this value it'll be written asynchronously to disk. Since holding a dirty page in memory is unsafe this is also a safeguard against data loss.

Well, with the above, at least I'm not experiencing network stalls when copying GiB of data via Midnight Commander's sftp to my SSD until some kernel-caused sync-ing is completed in the background.

I don't know if this will work for others, but if curious about any of my other (sysctl)settings, they should be available for perusing [here](https://github.com/howaboutsynergy/q1q/tree/0a2cd4ba658067140d3f0ae89a0897af54da52a4/OSes/archlinux/etc/sysctl.d)
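
If such values help, they can be made persistent with a sysctl drop-in and applied immediately (a hedged example; the file name is arbitrary):

# /etc/sysctl.d/99-writeback.conf
vm.dirty_writeback_centisecs = 0
vm.dirty_expire_centisecs = 1000

$ sudo sysctl --system    # reload all sysctl configuration files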

Revision history for this message
In , howaboutsynergy (howaboutsynergy-linux-kernel-bugs) wrote :

correction:

> In this case it's 1 seconds.

*In this case it's 10 seconds.

Also, heads up:
I found that 'tlp', via `/etc/default/tlp` on Arch Linux, will overwrite the values set in the `/etc/sysctl.d/*.conf` files if its own settings are set to non-`0` values, i.e.
MAX_LOST_WORK_SECS_ON_AC=10
MAX_LOST_WORK_SECS_ON_BAT=10
will set:
vm.dirty_expire_centisecs=1000
vm.dirty_writeback_centisecs=1000

regardless of what values you set them in `/etc/sysctl.d/*.conf` files.

/etc/default/tlp is owned by tlp 1.2.2-1

Not setting those (e.g. commenting them out) will have tlp set them to its default of 15 sec (aka =1500). So the workaround is to set them to =0, which makes tlp not set them at all; thus the values from the `/etc/sysctl.d/*.conf` files are allowed to remain as set.

Brad Figg (brad-figg)
tags: added: cscc
Revision history for this message
In , mricon (mricon-linux-kernel-bugs) wrote :

I'm making this bug private to prevent more spam from being added to it.

Revision history for this message
joseielpi (joseielpi) wrote :

I've had this same bug since Lucid and I still have it today in Xubuntu 18.04. It takes around one hour to transfer 5 GB of data.

Revision history for this message
In , konoha02 (konoha02-linux-kernel-bugs) wrote :

I am facing this issue with both Debian and Arch Linux, on both XFS and ext4.
https://forums.debian.net/viewtopic.php?p=778803
