USB file transfer causes system freezes; ops take hours instead of minutes

Bug #500069 reported by sbec67
This bug affects 143 people
Affects          Status        Importance  Assigned to  Milestone
Linux            Fix Released  High
linux (Fedora)   Won't Fix     High
linux (Ubuntu)   Fix Released  High        Unassigned

Nominated for Lucid by Mudstone

Bug Description

The USB drive is a 2GB MP3 player.

sbec@Diamant:~$ lsusb
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 002: ID 046d:c50e Logitech, Inc. MX-1000 Cordless Mouse Receiver
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 004: ID 0402:5661 ALi Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
sbec@Diamant:~$

Linux Diamant 2.6.31-15-generic #50-Ubuntu SMP Tue Nov 10 14:54:29 UTC 2009 i686 GNU/Linux
Ubuntu 2.6.31-15.50-generic

To test, I issued a dd command:
dd if=/dev/zero of=/media/usb-disk/test-file bs=32

While dd was running I ran dstat; its output is in the attached log file.

Other logs are also in the attached tar.gz file.

There is a huge USB performance bug report, #1972262; this report is something similar.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

This is an attempt at bringing sanity to bug #7372. Please only comment here if you are experiencing high I/O wait times and poor interactivity on reasonable workloads.

Latest working kernel version: 2.6.18?

Problem Description:
I/O operations on large files tend to produce extremely high iowait times and poor system I/O performance (degraded interactivity). This behavior can be seen to varying degrees in tasks such as:
 - Backing up /home (40GB with numerous large files) with diffbackup to external USB hard drive
 - Moving messages between large maildirs
 - updatedb
 - Upgrading large numbers of packages with rpm

Steps to reproduce:
The best synthetic reproduction case I have found is,
$ dd if=/dev/zero of=/tmp/test bs=1M count=1M
During this copy, IO wait times are very high (70-80%) with extremely degraded interactivity although throughput averages about 29MB/s (about the disk's capacity I think). Even starting a new shell takes minutes, especially after letting the machine copy for a while without being actively used. Could this mean it's a caching issue?
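
A minimal way to watch the iowait climb while such a dd runs (a sketch; vmstat ships with procps and iostat with the sysstat package) is to sample once per second in another shell:

vmstat 1        # the "wa" column is the percentage of CPU time spent waiting on I/O
iostat -x 1     # per-device utilisation and average wait times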

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

For the record, this is even reproducible with Linus's master.

Revision history for this message
In , ozan (ozan-linux-kernel-bugs) wrote :

I'm also having this problem.

Latest working kernel version: 2.6.18.8 with config:
http://svn.pardus.org.tr/pardus/2007/kernel/kernel/files/pardus-kernel-config.patch

Currently working on 2.6.25.20 with config:
http://svn.pardus.org.tr/pardus/2008/kernel/kernel/files/pardus-kernel-config.patch

Tested also with 2.6.28 and felt no significant performance improvement.

--

Heavy disk I/O, like running 'svn up', hogs the system, preventing me from starting a new shell, browsing the internet, doing some text editing with vim, etc.

For example, after finally being able to open a text buffer with vim, 4-5 second delays occur between consecutive search attempts.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Hello Ben,

I don't know where to post it exactly. Why Linux Memory Management? Or why -mm and not mainline? Can you do it for me, please?

I have added a second test case, which uses threads with pthread_mutex and pthread_cond instead of processes communicating over pipes, to confirm it is a CPU scheduler issue.

I have repeated the tests with some vanilla kernels again, as there is a remark in the bug report about tainted or distro kernels. As I got a segmentation fault with the 2.6.28 kernel, I added the result of the Ubuntu 9.04 kernel (see attachment). The results are not comparable to the results posted before, as I have changed the time handling (doubles instead of int32_t, as some echo messages take more than one second).
The first three results are 2*100, 2*50 and 2*20 processes exchanging 100k, 200k and 1M messages over a pipe. The last three results are 2*100, 2*50, and 2*20 threads exchanging 100k, 200k and 1M messages with pthread_mutex and pthread_cond. I have added a 10 second pause at the beginning of every thread/process to ensure the 2*100 processes or threads are all created and start to exchange messages at nearly the same time. This was not the case in the old test case with 2*100 processes, as the first thread was already destroyed before the last was created.

With the second test case with threads, I got the problems (threads:2*100/msg:1M) immediately with kernel 2.6.22.19. Kernel 2.6.20.21 was fine with both test cases.

The meaning of the results:
- min message time
- average message time (80% of the messages)
- message time at median
- maximal message time
- test duration

Here are the results.
Linux balrog704 2.6.20.21 #1 SMP Wed Jan 14 10:11:34 CET 2009 x86_64 GNU/Linux
min:0.000ms|avg:0.241-0.249ms|mid:0.244ms|max:18.367ms|duration:25.304s
min:0.002ms|avg:0.088-0.094ms|mid:0.093ms|max:17.845ms|duration:19.694s
min:0.002ms|avg:0.030-0.038ms|mid:0.038ms|max:564.062ms|duration:38.370s
min:0.002ms|avg:0.004-0.007ms|mid:0.004ms|max:1212.746ms|duration:33.137s
min:0.002ms|avg:0.004-0.005ms|mid:0.004ms|max:1092.045ms|duration:31.686s
min:0.002ms|avg:0.004-0.007ms|mid:0.004ms|max:4532.159ms|duration:59.773s

Linux balrog704 2.6.22.19 #1 SMP Wed Jan 14 10:16:43 CET 2009 x86_64 GNU/Linux
min:0.003ms|avg:0.394-0.413ms|mid:0.403ms|max:19.673ms|duration:42.422s
min:0.003ms|avg:0.083-0.188ms|mid:0.182ms|max:13.405ms|duration:37.038s
min:0.003ms|avg:0.056-0.075ms|mid:0.070ms|max:656.112ms|duration:72.943s
min:0.003ms|avg:0.005-0.010ms|mid:0.007ms|max:1756.113ms|duration:49.163s
min:0.003ms|avg:0.005-0.010ms|mid:0.007ms|max:11560.976ms|duration:52.836s
min:0.003ms|avg:0.008-0.010ms|mid:0.010ms|max:5316.424ms|duration:111.323s

Linux balrog704 2.6.24.7 #1 SMP Wed Jan 14 10:21:04 CET 2009 x86_64 GNU/Linux
min:0.003ms|avg:0.223-0.450ms|mid:0.428ms|max:8.494ms|duration:46.123s
min:0.003ms|avg:0.140-0.209ms|mid:0.200ms|max:12.514ms|duration:39.100s
min:0.003ms|avg:0.068-0.084ms|mid:0.076ms|max:38.778ms|duration:78.157s
min:0.003ms|avg:0.454-0.784ms|mid:0.625ms|max:11.063ms|duration:65.619s
min:0.004ms|avg:0.244-0.399ms|mid:0.319ms|max:21.018ms|duration:64.741s
min:0.003ms|avg:0.061-0.138ms|mid:0.111ms|max:23.861ms|durati...


Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19795
test case with processes and pipes

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19796
test case with threads and mutexes

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19797
All testresult on Core2 T7700 @ 2.40GHz / 4GB RAM

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I guess the high I/O wait time and the poor responsiveness are the same problem, caused by the CPU scheduler, as I can produce the same symptoms without disk I/O.
Since 2.6.26/27 everyone should be affected by this issue.

What I do not understand is:
Why does the test with threads and mutexes take twice as long as the test with processes and pipes, yet stress the system much more? The mouse freezes almost immediately, while the test with processes and pipes still allows moving windows.

Revision history for this message
In , l.wandrebeck (l.wandrebeck-linux-kernel-bugs) wrote :

I've met the high I/O wait problem with 3ware cards on Centos 5.x.
This is related to pci_try_set_mwi. More information here:
https://bugzilla.redhat.com/show_bug.cgi?id=444759
Now Thomas seems to have found another source for the problem. Maybe mwi is adding on top of that (not every controller driver sets MWI - BIOS is supposed to do so, but I've met a couple of boards that do not).
HTH.

Revision history for this message
In , rick.richardson (rick.richardson-linux-kernel-bugs) wrote :

If I run "google desktop indexer", then I get the long waits. E.G. vim goes away for up to 5-30 seconds, repeatably!

So, I don't run "google desktop indexer". No problem since 12/15/08!

Revision history for this message
In , humufr (humufr-linux-kernel-bugs) wrote :

You can also add the task:

- Copy a file from a CompactFlash card through a USB adapter or PCMCIA card. The computer is not usable until the copy of the file (3 to 5 MB) is finished. It doesn't matter whether it copies the whole card or only one file. It seems to be similar to the description of the bug here.

Revision history for this message
In , unixg33k (unixg33k-linux-kernel-bugs) wrote :

I have found that this may be an issue with the Completely Fair Queuing (CFQ) I/O scheduler that became the default in 2.6.18 (when most started observing this performance issue). Reverting to the old AS scheduler seems to have resolved the problem for me.

To use the AS scheduler and test for yourself, just specify "elevator=as" as a boot option.
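
A runtime alternative, for anyone who wants to compare schedulers without rebooting, is the sysfs elevator switch (a sketch; /dev/sda is an assumed device name, run as root):

cat /sys/block/sda/queue/scheduler        # the active scheduler is shown in brackets
echo deadline > /sys/block/sda/queue/scheduler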

Revision history for this message
In , brice+lklm (brice+lklm-linux-kernel-bugs) wrote :

(In reply to comment #2)
> I'm also having this problem.
>
> Latest working kernel version: 2.6.18.8 with config:
>
> http://svn.pardus.org.tr/pardus/2007/kernel/kernel/files/pardus-kernel-config.patch
>
> Currently working on 2.6.25.20 with config:
>
> http://svn.pardus.org.tr/pardus/2008/kernel/kernel/files/pardus-kernel-config.patch
>
> Tested also with 2.6.28 and felt no significant performance improvement.
>
> --
>
> During heavy disk IO's like running 'svn up' hogs the system avoiding the
> start
> a new shell, browse on the internet, do some text editing using vim, etc.
>
> For example, after being able to open a text buffer with vim, 4-5 seconds
> delays happens between consecutive search attempts.

You seem to be able to reproduce the bug easily, and you have found a non-affected kernel version.
Can you git bisect between those kernels to at least isolate the culprit commit?
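
For reference, the bisect being asked for looks roughly like this (a sketch against Linus's tree, using the good/bad versions mentioned above):

git bisect start
git bisect bad v2.6.25        # a kernel that shows the stalls
git bisect good v2.6.18       # the last known good kernel
# build, boot and test the commit git checks out, then tell git the verdict:
git bisect good               # or: git bisect bad
# repeat until git names the first bad commit, then clean up:
git bisect reset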

Revision history for this message
In , brice+lklm (brice+lklm-linux-kernel-bugs) wrote :

(In reply to comment #3)
>
> With the second test-case with threads, I got the problems
> (threads:2*100/msg:1M) immediately with the kernel 2.6.22.19. There kernel
> 2.6.20.21 was fine with both test-cases.

I'm not sure that's the same issue I had when I posted bug 7372, but since you seem to be a programmer you should git bisect between those kernels to isolate the culprit commit.

Revision history for this message
In , pvz (pvz-linux-kernel-bugs) wrote :

I'm not sure if this is related or not, but I'm getting similar behaviour on my own system, but *only* when copying files *from* my USB memory stick (a 4 GB Corsair Flash Voyager) *to* the internal SSD on my Asus Eee PC 900 running Ubuntu 8.10 with a custom build of Linux 2.6.27 (probably slightly patched) provided by array.org.

I.e. reading a file from the USB stick to /dev/null, no slowdown.
Writing /dev/zero to USB stick, no slowdown.
Reading a file from the internal SSD to /dev/null, no slowdown.
Writing /dev/zero to internal SSD, no slowdown.
Copying a file from internal SSD to USB stick, no slowdown.
Copying a file from USB stick to internal SSD, I get massive slowdowns on interactive performance. Launching a terminal, which usually takes a few seconds, suddenly takes the better part of a minute.

Linux used is 2.6.27-8-eeepc on i686 SMP, as prebuilt by http://www.array.org/ubuntu/

The filesystem on the internal SSD is ext3, running on LVM, running on LUKS (encrypted filesystem). As set up by the Ubuntu 8.10 installer. Swap is also on the same encrypted LVM.

The filesystem on the USB stick is vfat. Nothing fancy at all.

I should also add that the read performance of my USB stick is faster (about 25 MB/s) than the write performance on the built-in SSD (about 10 MB/s).

If you feel that it is useful, I can provide dumps of lspci/lsusb/lsmod or any other information. As for the exact build options and patches, that should be determinable by checking the web site specified above.

Hope more data makes it possible to determine a pattern to this bug.

Revision history for this message
In , humufr (humufr-linux-kernel-bugs) wrote :

I tried the solution from Mike in comment http://bugzilla.kernel.org/show_bug.cgi?id=12309#c11 and indeed that solved my issue. So it seems he is right, at least for my problem.

Revision history for this message
In , pvz (pvz-linux-kernel-bugs) wrote :

I tried elevator=as on my system, and it did not change the behaviour. Copying files from external USB to internal encrypted SSD still totally smashes interactive performance. So this issue might be unrelated.

Revision history for this message
In , unixg33k (unixg33k-linux-kernel-bugs) wrote :

(In reply to comment #16)
> I tried elevator=as on my system, and it did not change the behaviour.
> Copying
> files from external USB to internal encrypted SSD still totally smashes
> interactive performance. So this issue might be unrelated.
>

This may be an unrelated issue having to do with USB I/O - since USB seems to be more CPU intensive anyway.

When I experienced this bug (prior to switching from CFQ), it would happen whenever I copied a large file on ATA or SCSI devices and I noticed extremely high I/O wait times - with very low CPU usage. Not only during copying - but during any disk-intensive operation. Everything on my affected machines would come to a grinding halt until the operation was complete. Using AS for me so far has seemed to resolve the issue - as my machines are now responsive as they should be during heavy disk I/O.

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

I have had a very similar problem to this. I still have it often, but not as much since I changed from ext3 to ReiserFS. For the scheduler, I've been using BFQ or V(R), which are included in the Zen patchset. I have tried the stock kernel, and the same problem exists; however, I can't remember which scheduler I used at that point, I believe deadline.
Most of the I/O wait I get comes either when I'm copying files to the local drives or when using multiple VMs (generally Windows, as that's what is needed for work). I'm willing to try about anything to get this fixed. It's a little better since I switched filesystems on my VM drive, but it still isn't totally fixed.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #11)
> I have found that this may be an issue with the Complete Fair Queuing I/O
> scheduler that was introduced as default in 2.6.18 (when most started
> observing
> this performance issue). Reverting back to the old AS scheduler for me seems
> to have resolved the problem.
>
> To use the AS scheduler and test for yourself, just specify "elevator=as" as
> a
> boot option.
>

Fwiw, I've never used the CFQ scheduler. I'm on the deadline scheduler with my 3ware 9560SE and still see this problem crop up from time to time, usually when doing a file copy large enough to fill the page cache.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

I too have found that the choice of I/O scheduler makes little difference. Using AS generally yields no noticeable improvement.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

>
> Fwiw, I've never used the CFQ scheduler. I'm on the deadline scheduler with
> my
> 3ware 9560SE and still see this problem crop up from time to time, usually
> when
> doing a file copy large enough to fill the page cache.
>

Another deadline user here, and the same thing. There are two clear-cut triggers for me:

1. The test case thomas posted.
2. Large copies which fill up the page cache.

I think it's a process scheduling bug, because page cache fill-up might be triggering the pdflush processes (which are, btw, normal priority; why?) into hyperdrive and causing all other processes to wait. We do see various processes going into 'D' state and pdflush at the top of the CPU usage list when the symptoms occur.

If CFQ is used, and process priority determines I/O priority, aren't the pdflush processes going to compete with processes doing their own I/O when dirty_ratio is reached and the process has a priority equal to or better than 0 (-1 and higher)? That may explain some of the stories with CFQ here.

Revision history for this message
In , andi-bz (andi-bz-linux-kernel-bugs) wrote :

Re: blaming the scheduler in 2.6.26

The problem was observed long before that. There might be additional scheduler problems (this bug in general suffers from the "lots of different problems" disease), but that is unlikely to be the old, well-known disk starvation with different devices issue.

Re comment #9 vim stalls while disk is pounded:

You're running ext3 or reiser, right? That's a known problem: vim regularly does fsync on its auto-save file, and that causes a synchronous JBD transaction, and since all transactions are strictly ordered, if there are enough of them in front and the disk is busy it takes quite a long time.

At least at the higher level that is supposed to be mostly solved by ext4 or by XFS.

Of course it's another problem that the disk schedulers allow that long a starvation in the first place.

Revision history for this message
In , theparanoidone (theparanoidone-linux-kernel-bugs) wrote :

Hi Thomas~

Can you elaborate on your test?

You wrote:
"The first three results are 2*100, 2*50 and 2*20 processes exchanging 100k,
200k and 1M messages over a pipe. The last three results are 2*100, 2*50, and
2*20 threads exchanging 100k, 200k and 1M messages with pthread_mutex and
pthread_cond."

So, I'm guessing you want the test to be run like this:
./processtest 200 100000
./processtest 100 200000
./processtest 40 1000000
./threadtest 200 100000
./threadtest 100 200000
./threadtest 40 1000000

Is that correct? I just want to be sure I'm running the same tests. (Also, the code limits the number of processes to a max of 100, so I edited this to allow a max limit of 200.)

Here's our results:

2.6.15.7-ubuntu1-custom-1000HZ_CLK #1 SMP Thu Jan 15 19:06:30 PST 2009 x86_64 GNU/Linux (ubuntu 6.06.2 server LTS with clk_hz set to 1000HZ)
min:0.004ms|avg:0.004-0.271ms|mid:0.005ms|max:42.049ms|duration:34.029s
min:0.004ms|avg:0.004-0.138ms|mid:0.035ms|max:884.865ms|duration:33.105s
min:0.004ms|avg:0.004-0.042ms|mid:0.004ms|max:2319.621ms|duration:62.438s
min:0.005ms|avg:0.010-0.026ms|mid:0.012ms|max:1407.923ms|duration:92.132s
min:0.005ms|avg:0.011-0.029ms|mid:0.013ms|max:1539.929ms|duration:97.034s
min:0.005ms|avg:0.010-0.031ms|mid:0.013ms|max:18669.095ms|duration:176.555s

2.6.24-23-server #1 SMP Thu Nov 27 18:45:02 UTC 2008 x86_64 GNU/Linux (default ubuntu 64 8.04 server LTS at default 100HZ clock)
min:0.004ms|avg:0.034-0.357ms|mid:0.324ms|max:39.789ms|duration:43.390s
min:0.004ms|avg:0.006-0.149ms|mid:0.131ms|max:79.430ms|duration:39.288s
min:0.004ms|avg:0.046-0.057ms|mid:0.052ms|max:52.427ms|duration:64.481s
min:0.005ms|avg:0.006-0.650ms|mid:0.330ms|max:22.120ms|duration:60.142s
min:0.005ms|avg:0.053-0.309ms|mid:0.276ms|max:21.560ms|duration:62.353s
min:0.004ms|avg:0.033-0.123ms|mid:0.112ms|max:22.007ms|duration:131.029s

Linux la 2.6.24.6-custom #1 SMP Thu Jan 15 23:34:10 UTC 2009 x86_64 GNU/Linux (ubuntu 8.04 server LTS with clk_hz custom set to 1000HZ)
min:0.004ms|avg:0.054-0.364ms|mid:0.332ms|max:24.524ms|duration:42.522s
min:0.004ms|avg:0.125-0.156ms|mid:0.144ms|max:13.171ms|duration:33.573s
min:0.004ms|avg:0.046-0.058ms|mid:0.052ms|max:13.005ms|duration:64.388s
min:0.005ms|avg:0.006-0.594ms|mid:0.302ms|max:13.481ms|duration:61.105s
min:0.005ms|avg:0.109-0.336ms|mid:0.307ms|max:13.345ms|duration:65.000s
min:0.002ms|avg:0.070-0.130ms|mid:0.120ms|max:13.137ms|duration:133.786s

Side note: we have been experiencing problems with MySQL, specifically with sync-binlog=1 and log-bin enabled while performing a high volume of concurrent transactions. Although we run RAID-1 with the battery-backed cache on, our throughput is horrible. For some reason, we have found that by increasing CONFIG_HZ in the kernel from 100 to 1000, we get much higher throughput. Otherwise our benchmarks just sit around and have trouble context switching.

#CONFIG_HZ_100=y
#CONFIG_HZ=100
#change to:
CONFIG_HZ_1000=y
CONFIG_HZ=1000
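
A quick way to confirm which HZ a running kernel was actually built with (a sketch, assuming the distro installs its config under /boot as Ubuntu does):

grep 'CONFIG_HZ' /boot/config-$(uname -r)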

I do not know if the problems we are experiencing with the clock are related to the bug listed here. However, I did want to submit our feedback showing the difference in kernels where our bottleneck runs better.

We use sysbench for our test (wi...


Revision history for this message
In , pvz (pvz-linux-kernel-bugs) wrote :

Did some more testing. My father has an Eee PC 900 exactly the same as mine, also running Ubuntu 8.10 with the same kernel as mentioned before. The only difference I can think of: he doesn't use LUKS and LVM like me; he instead has his / directly on /dev/sdb1 (the internal SSD).

In addition to trying to launch a Terminal via GNOME (as I did previously), I tried the vim "stuttering" test by creating a file, saving it, and holding down a key to see when it stutters.

The results of these tests:

- On both my own (encrypted) and the other (unencrypted) computer, vim occasionally freezes for a few seconds while I cp a file from USB memory to the internal SSD.

- On my computer (encrypted), launching a gnome-terminal takes much longer while copying a file from SSD than on the other computer. While there is a noticeable slowdown on the unencrypted machine, on the encrypted machine sometimes the gnome-terminal won't even launch until *after* the copy is complete.

In conclusion: the effect exists on both machines, but the encryption of the SSD very significantly increases the problem. While some slowdown due to encryption should be expected, it should not make the machine almost completely unusable while copying a file from a USB stick to the internal SSD.

Revision history for this message
In , larppaxyz (larppaxyz-linux-kernel-bugs) wrote :

Different scheduler (#11) doesn't seem to do much. I did some quick and dirty testing with my laptop :

Linux lupaus 2.6.28-customlupaus #4 SMP PREEMPT Thu Dec 25 15:05:35 EET 2008 x86_64 GNU/Linux
Vanilla 2.6.28 kernel, config from Ubuntu 8.10, with some modifications to suit my laptop

with io scheduler cfq
./threadtest 100 200000
min:0.004ms|avg:0.007-0.008ms|mid:0.008ms|max:894.480ms|duration:187.588s

with elevator=as (eg. io scheduler anticipatory)
./threadtest 100 200000
min:0.004ms|avg:0.007-0.008ms|mid:0.008ms|max:884.016ms|duration:188.248s

---

with io scheduler cfq
./proctest 50 100000
min:0.005ms|avg:0.005-0.006ms|mid:0.006ms|max:460.631ms|duration:35.773s

with elevator=as (eg. io scheduler anticipatory)
./proctest 50 100000
min:0.005ms|avg:0.006-0.006ms|mid:0.006ms|max:479.695ms|duration:36.645s

Revision history for this message
In , pvz (pvz-linux-kernel-bugs) wrote :

One more observation from another experiment I did:

I have swap on the same encrypted LVM as my root partition. Disabling swap makes the terminal launch much faster while copying -- still slower than when not copying files, but within a few seconds of clicking instead of within minutes.

However! Now, instead, individual running processes (like Firefox and vim) hang much more aggressively and frequently during copying. I'm not sure what to make of this, but I hope somebody who actually knows something about the Linux kernel will find this useful. :-)

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

I'm not sure any developer will be able to pinpoint the problem in all this mess! ;-) There are likely several bugs here.

For a start, I think it could be nice to separate people whose problem is fixed by elevator=as. And then separate people using encrypted disks. And then problems occurring only with USB disks. Please open new reports. What do developers think?

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19828
Bisect results

I have done the bisect and isolated the patch. In the attachment you can find the bisect result. I have done the sysbench test too.

Tests:
 100 Process / 1k messages

Linux balrog704 2.6.20 #13 SMP Fri Jan 16 10:13:21 CET 2009 x86_64 GNU/Linux
min:0.003ms|avg:0.243-0.253ms|mid:0.246ms|max:29.503ms|duration:25.080s
min:0.002ms|avg:0.022-0.038ms|mid:0.037ms|max:756.082ms|duration:37.894s
min:0.002ms|avg:0.004-0.007ms|mid:0.004ms|max:929.790ms|duration:34.608s

Linux balrog704 2.6.20bad #14 SMP Fri Jan 16 10:52:17 CET 2009 x86_64 GNU/Linux
min:0.003ms|avg:0.411-0.434ms|mid:0.424ms|max:18.328ms|duration:43.549s
min:0.003ms|avg:0.063-0.075ms|mid:0.071ms|max:404.088ms|duration:72.860s
min:0.003ms|avg:0.005-0.010ms|mid:0.009ms|max:712.033ms|duration:51.654s

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19829
sysbench results

As I am using Firefox 3 with the bad kernel, my post was submitted by accident. With the good kernel there are (nearly) no problems with Firefox 3 any more.

The tests were run with the following parameters:
- 2*100 processes / 100k messages
- 2*20 processes / 1M messages
- 2*200 threads / 100k messages

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19830
Bisect results

wrong file

Revision history for this message
In , andi-bz (andi-bz-linux-kernel-bugs) wrote :

Re #26

There's some performance problem in general with encrypted swap. I've seen that too. But it's probably a different issue than the primary one which should be discussed here.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

> Is that correct? Just want to be sure i'm running the same tests (Also, the
> code limits number of processes to max 100... so I just edited this allowing
> the max limit to be 200)
I have used 100/50/20, as one echo process uses 2 threads or processes. But it is not important, as these tests should only compare different kernel versions on the same computer.

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

(In reply to comment #18)
> I have had a very similar problem to this. I still have it often, but not as
> much from when I changed from EXT3 to ReiserFS. For the Scheduler, I've been
> using BFQ or V(R) thats included in the Zen Patchset. I have tried the stock
> kernel, and same problem exists, however I can't remember which scheduler I
> used at that point, I believe Deadline.
> Most of the IOWait I get comes when either I'm copying files to the local
> drives, or using multiple VM's (generally Windows as thats what is needed for
> work). I'm willing to try about anything to get this fixed. It's a little
> better since I switched FS's on my VM Drive, but still isn't totally fixed.
>

I did try the AS scheduler, as that was the only thing I changed in my kernel, and it didn't change anything interactively; I still get a high I/O wait.

The other thing I noticed: at least when using AS, I start using swap. It's not a lot (within about 2 minutes I was using 10MB), but it was still climbing.

One other thing: I'm wondering if this is 64-bit related. All of my personal boxes are 64-bit, and from the reports posted here, along with other threads I've read (over on the Gentoo forums), it seems this hits 64-bit users more than 32-bit users. Any truth to this, or am I trying to relate things that aren't related?

My work box (most heavily used):
Linux PC010233L 2.6.28-zen1-2 #2 SMP PREEMPT Thu Jan 15 16:06:37 EST 2009 x86_64 Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz GenuineIntel GNU/Linux

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #30)
> Created an attachment (id=19830) [details]
> Bisect results
>
If that bisection is to be believed, the assertion that the issue is caused by a scheduling issue seems quite plausible.

(In reply to comment #33)
> One other thing, I'm wondering if this is 64bit related. All of my personal
> boxes are 64bit, and it seems of ones posted here, along with other threads
> I've read (over on Gentoo forums) that it seems this hits the 64bit users
> more
> then the 32bit users. Any truth to this, or am I trying to relate things that
> aren't related?
>
There is evidence that x86-64 is a factor here.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

It does strike me as quite odd how large a factor the size of the transfer seems to be. When I first start evolution (I have very large folders), the system will exhibit poor interactivity for upwards of 5 to 10 minutes. However, when transferring lots of small files (i.e. module_install'ing), the kernel behaves fine (although modpost also seems to produce poor interactivity).

I think it might help if we had a kernel developer here to list the kernel block/memory manager/scheduler statistics that might indicate where this I/O wait time is going. If sufficient statistics don't exist, it might be worthwhile to instrument the kernel specifically for this bug. It does seem clear that the bug I intended this ticket to describe is invariant on I/O scheduler, so that's one factor that needn't be accounted for.

Revision history for this message
In , larppaxyz (larppaxyz-linux-kernel-bugs) wrote :

I just recompiled my kernel without any SMP support and tested again. My laptop went from usable to totally unusable. Network traffic stops and it's even hard to type anything while the process/thread test is running. I have only a single CPU in my laptop. I also tried changing the scheduler with this setup, and that didn't make any difference.

Good luck :)

Revision history for this message
In , Adriaan.van.Kessel (adriaan.van.kessel-linux-kernel-bugs) wrote :

Could this be a jiffies wraparound bug?

I've seen different formulas for doing interval arithmetic,
and (not) handling wraparound.

For instance, in as_antic_expired()
::
long delta_jif;

        delta_jif = jiffies - ad->antic_start;
        if (unlikely(delta_jif < 0))
                delta_jif = -delta_jif;
::
, which seems incorrect to me. (It could alter the predictive powers of the scheduler in mysterious ways ;-)
(A different calculation is performed at other places.)
Jiffies wrap around depending on the HZ value (but still, intervals above INT_MAX should be relatively rare), and the jiffies start value
will cause the first wrap @ 5 min after booting, so that would show.

My 2 cents,
AvK

Revision history for this message
In , vapier (vapier-linux-kernel-bugs) wrote :

Adriaan: drivers shouldn't be manually doing comparisons on jiffies values. There are helpers in linux/jiffies.h for doing the comparison (time_before() / time_after()), and those should handle wraparounds. If you do see a driver that is doing the wrong thing, I'd open another bug specifically about that (or post a patch yourself :D).

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

With the following code I got negative time differences of about -127ms. The tv_sec values were equal and the second tv_usec was smaller than the first. I cannot say which kernel it was, as I am no longer able to reproduce it. A few days before, it occurred on nearly every test. As this behaviour is connected with the TSC synchronisation patch, I have posted it here. I will try to figure out the kernel version.

> gettimeofday(&tv_s, &tz);
> write(a2b[1], &c, 1);
> read(b2a[0], &c, 1);
> gettimeofday(&tv_e, &tz);
> timersub(&tv_e, &tv_s, &tv_r);

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I get the negative time difference on 2.6.17.14 kernel.org, 2.6.18.8 kernel.org and 2.6.18-92.el5 CentOS.

My system is unusable with these three kernels when I use ide_generic: disk throughput is ~3MB/s with I/O wait time at 100%.

No problems with ahci and libata on 2.6.18-92.el5.

I was not able to provoke a negative time difference with kernels 2.6.20, 2.6.21, 2.6.24, 2.6.27 and 2.6.8.

Revision history for this message
In , theparanoidone (theparanoidone-linux-kernel-bugs) wrote :

Created attachment 19839
32v64test

32 Bit Test vs 64-Bit

This test is slightly apples and oranges... however, because someone inquired whether this was a 32-bit or a 64-bit problem, I ran these tests.

I'm inclined to think it applies to both 32-bit and 64-bit, for 2 reasons:
- The 32-bit test didn't perform that great.
- The git bisect comment states "the biggest change is the removal of the 'fix up TSCs' code on x86_64 and i386".

Revision history for this message
In , theparanoidone (theparanoidone-linux-kernel-bugs) wrote :

Created attachment 19840
32v64testCleanNewLines.txt

formatting fix

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Please ignore my comments #39 and #40, as these are other problems.

Revision history for this message
In , cyrusm (cyrusm-linux-kernel-bugs) wrote :

Are you guys aware of the Latencytop utility? http://www.latencytop.org/
You have to add CONFIG_LATENCYTOP=y to your config.

Then run the tests that bring the system down while Latencytop is running. It might give additional information.
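
A minimal sketch of using it (assuming a kernel built with the option and the latencytop package installed):

# kernel config:
#   CONFIG_LATENCYTOP=y
# then, as root, while the I/O workload is running:
latencytop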

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

I've reproduced this problem with LTTng (http://ltt.polymtl.ca). It looks like the block layer is backmerging the large "dd if=/dev/zero ...." requests at a rate which leaves the request on the top of the request queue.

I've started a more thorough discussion on lkml here :

http://lkml.org/lkml/2009/1/16/487

Revision history for this message
In , unixg33k (unixg33k-linux-kernel-bugs) wrote :

Re: the 32-bit vs 64-bit idea: I've experienced this issue on both 32- and 64-bit platforms; however, all of the platforms were on x64-capable CPUs (not sure if that would matter).

Revision history for this message
In , seanj (seanj-linux-kernel-bugs) wrote :

I hit this bug on Ubuntu 8.10 (updated to 2.6.27-9-generic) running VMware Workstation 6.5.126130 with Ubuntu 8.04.1 LTS as a guest. It was especially pronounced when resuming a suspended VM.

I tried the different elevator io schedulers. Nothing helped.

Independent of VMware, if I ran bonnie in one shell and launched Firefox, the whole system behaved in a very chunky manner.

Renicing pdflush to -10 had some great improvement on basic responsiveness. The weird part was that after re-creating a new VM and not seeing the iowait problems, I then tried resuming a VM with VMware at the same time I was compressing a tar file with pbzip2 (parallel bzip2). All 4 cores were pegged and my load average was normal; system responsiveness was good. As **soon** as I tried resuming the VM with VMware Workstation, the CPU load dropped to 1-5% across all CPUs and iowait times shot way up. I have now killed VMware and iowait times have dropped, but my maximum read speed hovers around 1MB/s (as measured with iostat). This is another symptom of the iowait problem.

 iostat -c -d -m -x sda 1

rMB/s is usually never over 2MB/s
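
For anyone who wants to try the pdflush renice mentioned above, a minimal sketch (pdflush shows up as kernel threads, so the PIDs come from pgrep; run as root):

for pid in $(pgrep pdflush); do renice -10 -p "$pid"; done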

Revision history for this message
In , thomas (thomas-linux-kernel-bugs) wrote :

(In reply to comment #46)
> re: the 32bit vs 64bit idea - I've experienced this issue on both 32 and 64
> bit
> platforms, however - all of the platforms were on x64-capable CPUs (not sure
> if
> that would matter).
>

Using an IBM X40 with an old Pentium M (32-bit), Thomas.pi's test cases made my machine totally unusable. So I don't think this has anything to do with x64-capable CPUs.

Revision history for this message
In , Adriaan.van.Kessel (adriaan.van.kessel-linux-kernel-bugs) wrote :

(In reply to comment #38)
> Adriaan: drivers shouldnt be manually doing comparison on jiffies values.
> there are helps in linux/jiffies.h for doing the comparison (time_before() /
> time_after()) and those should handle wrap arounds. if you do see a driver
> that is doing the wrong thing, i'd open another bug specifically about that
> (or
> post a patch yourself :D).

Well, it was not in a driver's code but in block/as-iosched.c:as_fifo_expired().

The observed behavior indicates that something is wrong with the scheduling of disk I/O, and that most time is spent by all threads competing for one or more (spin)locks; you might call it a convoy or a thundering herd syndrome.
But it might be unrelated.
AvK

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Hi all,

More tests

 Linux ws-esp16 2.6.27-11-generic #1 SMP Thu Jan 8 08:38:33 UTC 2009 i686 GNU/Linux
 $ ./processtest 100 200000
 min:0.006ms|avg:0.278-0.520ms|mid:0.475ms|max:141.058ms|duration:107.646s
 $ ./threadtest 100 200000
 min:0.006ms|avg:0.690-0.768ms|mid:0.715ms|max:235.106ms|duration:159.355s

But if this is an I/O problem, why do the monitors not show a big I/O wait percentage? They show a high system usage percentage.
So I suppose it is not an I/O problem; it seems to be related to process handling inside the kernel. Might it be related to the preemption model?

I did some additional tests:

   1.-Change clock timing -> (no improvement)
   2.-Change preemption model (tested all of them) -> (no improvement)
   3.-Change IO scheduler -> (no improvement)

Is there any way to profile the kernel to see which functions get the most attention?

Hope you find something...

I attach a screenshot also...

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 19858
Top output while running test

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

Created attachment 19859
RFC patch to put a maximum to the number of cached bio merge done in a row

Can you try this patch, which applies to 2.6.28, to see if it helps? I have not been able to reproduce the problem with the patch applied.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Hi Mathieu,

I tried this patch against 2.6.27 because it applied cleanly there. But the results are not good. It took even more time to complete the test.

Can anyone confirm this?

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

This patch will probably diminish the overall throughput, because it is making sure that we do not merge more than 128 requests together. I am more interested in the I/O _latency_ (delay) you get when you run the system under a heavy I/O load.

Mathieu

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

Created attachment 19866
Port Attachment #19859 to Linus's master

(In reply to comment #53)
> Hi Mathieu,
>
> I tried this patch against 2.6.27 because it patched right. But the results
> are
> not good. It took even more time to complete the test.
>
> Can anyone confirm this?
>
I can. Unfortunately, not only did the patch fail to reduce latency, it also reduced throughput. Even opening the file selection dialog to attach this patch took over 30 seconds while building a kernel.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

Also, a patch set providing an ftrace interface to blktrace was recently submitted to the LKML (http://marc.info/?t=123212992300002&r=1&w=2). This could come in handy in further debugging.

Revision history for this message
In , henkjans_bagger (henkjansbagger-linux-kernel-bugs) wrote :

Just a comment that might have gone unnoticed, but to me appears relevant, as this bug again appears to be becoming a collection of multiple issues, as happened with #7372, which made the kernel devs start to ignore it.

The bisect done by thomas.pi yields a first bad commit dating from February 2007, while these symptoms first surfaced in 2.6.18, which dates from the end of 2006.

Bug #7372 basically predates this first bad commit; the bisect I did in that bug, for example, pointed towards a problem with NCQ under the CFQ scheduler from November 2006 that was clearly only present on 64-bit. See http://bugzilla.kernel.org/show_bug.cgi?id=7372#c112 as a reminder of this proof. I'm not sure that issue got resolved in the end; there were no clear pointers on what I could do to help further.

Seeing reports in this bug of improvements when switching I/O scheduler, and reports of differences between 32/64 bit, makes me think those might be more related to that commit. The bottom line is to be sceptical of reports on whether or not a patch fully helps, as to me it still appears to be multiple issues that have very similar, but difficult to reliably trigger, symptoms.

However, the test case of Thomas does bring my system to its knees as well, so it is definitely a good way to tackle at least part of the problem. But I don't think it is the only problem.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

No, the patch does not fix the problem, but I think it is now better than before.

I think it is a CPU scheduler problem, as one process with many threads and much thread switching can nearly stop the execution of other processes. This problem exists in every kernel version, even 2.6.15. You can test it by executing the thread-based test with 2*100 threads.
My system starts to become unusable with kernel 2.6.27 (Fedora 10) when executing the thread-based test with 2*40-50 threads. I don't know how many interrupts occur while copying some data, but perhaps that is the commonality between copying files and the thread-based test.

The provided bisect points to a CPU scheduler performance regression, which makes the problem more noticeable. The biggest CPU scheduler performance regression was in 2.6.24 - 2.6.27. There was another CPU scheduler performance regression between 2.6.22 and 2.6.24.

Revision history for this message
In , larppaxyz (larppaxyz-linux-kernel-bugs) wrote :

(In reply to comment #57)

> Seeing reports in this bug reporting improvements when switching IO-scheduler
> and reports on differences between 32/64 bit makes me think those might be
> more
> related to that commit.

Can nobody confirm that changing the I/O scheduler or switching between 32 and 64 bit improves the system much?

People are also testing different things; some test disk I/O and others are running process/thread tests. It's very confusing, and someone should run a couple of identical tests (including disk I/O AND the process/thread test) with different kernel options. On my setup, just disabling or enabling SMP support made a HUGE difference.

I'm happy to do testing, but only if someone really needs the information I can provide.

Again, my worthless 5 cents.. :)

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

I just created a fio job file which acts like a "ls" executed while doing a large dd. It looks like the anticipatory I/O scheduler was causing those delays for me.

The results for the ls-like jobs are interesting :

I/O scheduler   runt-min (msec)   runt-max (msec)
noop                   41             10563
anticipatory           63              8185
deadline               52             33387
cfq                    43              1420

Is it me, or do all I/O schedulers except cfq generate unexpectedly high latency?

Details here (including fio job file) :

http://lkml.org/lkml/2009/1/18/198

Mathieu

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Actually, in this bug as well as in the other (7372), there is no clear direction. None of the kernel devs have taken a leadership role and directed the reporters in a direction where we can start to get a handle on things. What we see here is a lot of speculation on the part of the users, and hence the enormous variety of things being tried. It's like everybody is shooting in the dark.

Unless someone on the kernel team takes ownership of this bug, sorts the quarters from the pennies, and directs users with a clear set of instructions to get well-defined data, I don't see this bug going anywhere.

The question is: who has the know-how and willingness to do that? We see the process scheduler as well as the I/O scheduler being involved, we see the VM having an effect, we see some libata effects. With so many components in the line of fire, and the kernel being as vast as it is, I don't see the above (one savior coming along and putting 2 and 2 together) happening.

IOW, take a beer and head away from the computer and into the sun....;-)

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

Created attachment 19894
Test job description for fio

Attaching the test case written by Mathieu Desnoyers and included in his earlier email

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

(In reply to comment #33)
>
> I did try the AS Scheduler, as that was the only thing I changed in my
> kernel,
> and it didn't change anything interactively, still get a high IO Wait.
>
> The other thing I noticed, at least when in AS, I start using Swap, it's not
> a
> lot (within about 2 minutes I was using 10MB), but it was still climbing.
>
> My work box (most heavily used):
> Linux PC010233L 2.6.28-zen1-2 #2 SMP PREEMPT Thu Jan 15 16:06:37 EST 2009
> x86_64 Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz GenuineIntel GNU/Linux
>

OK, I tried playing a little bit more, and switching to the deadline scheduler really helped things. I have topped out around 73% iowait, but it never bogged the whole box down. I still need to do definitive testing (via tests already in the bug report), but this seems to have helped. Not sure which problem this relates to in this bug, though; I'm guessing the scheduler one.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19906
fio test results of kernel 2.6.15 - 2.6.24

I have executed the test case of Mathieu Desnoyers on several different kernel versions. I took the bad and good kernels from my bisection. The results do not confirm my theory. If someone can identify a problem in them, I can run some more tests.

The only regression I can see is the regression with the noop scheduler. The value shown is the average of the average latencies.

./test.2.6.15-53-amd64-genericresult.noop 700,62ms
./test.2.6.20-17-genericresult.noop 3520,24ms
./test.2.6.20result.noop 3005,24ms
./test.2.6.20badresult.noop 3698,64ms
./test.2.6.22.19result.noop 1393,67ms
./test.2.6.24.7result.noop 589,66ms

I will check whether the 2.6.24.7 kernel test build has improved desktop responsiveness.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

There is no performance improvement in 2.6.24.7. The list below shows the average times of the 41 small jobs with the cfq scheduler. I have the best desktop responsiveness on 2.6.20: Gimp starts under heavy I/O in 10 seconds instead of 30 seconds. Application freezes exist on 2.6.20, but they are much shorter, mostly under one second, while in kernels >= 2.6.22 there are freezes of up to one minute.

                    min     max      avg    stdev
2.6.20-17-generic   9.9    126.00   49.97   59.89
2.6.20              8.66   115.05   39.68   50.41
2.6.22.19          10.34   195.29   66.88   96.07
2.6.24.7            9.93   185.02   64.38   89.95

The high I/O wait is at 75% at the start and climbs to 99-100% after ~5 seconds.

I have noticed that the freezes occur in all applications more often when Firefox is running. Currently I create a RAM disk on startup, extract the .mozilla folder to it, and save it again on shutdown. It makes my system more responsive, especially Firefox 3.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19912
fio results for kernel 2.6.28

And finally the results for 2.6.28. I have removed all the tracing stuff I could find, but the system is still sluggish under heavy I/O.

              min      max       avg      stdev
2.6.28 noop   97,61   1799,06    654,84   861,90
2.6.28 cfq     9,32    169,32     55,59    79,50

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19920
ext3 and ext4 comparison with patched and unpatched kernel

Here are some more results. I could gain or lose some latency with different kernel settings. In 2.6.20 I could reproducibly lose 10 ms, which amounts to a 25% decrease in average latency. But it makes no difference in desktop responsiveness.

I have tested the 2.6.28 kernel, patched ( http://bugzilla.kernel.org/attachment.cgi?id=19866 ) and unpatched, with ext3 and ext4 and exactly the same kernel settings. My test system is installed on an ext3 partition; the tests are executed on an extra ext3 or ext4 partition (the slower one) on the same hard drive. The write performance on ext4 is now 45MB/s instead of 35MB/s (ext3).

The desktop responsiveness in the ext4 test with the patched version decreases extremely. While copying a 10 GB file from ext4 to ext4, there are nearly no problems with the unpatched kernel; with the patched kernel, the system becomes unusable. With ext3 there is a small responsiveness improvement with the patched kernel, but it could be coincidence, as I have no exact test for desktop responsiveness.

But while copying the 10 GB file on ext4 and compiling the kernel, my system becomes unusable with the unpatched kernel too. There are freezes of >20 seconds while accessing the menu in applications for the first time.
You can easily simulate this behaviour by executing the following compression for every core (see the loop sketch after the command).

bzip2 -9 -c /dev/urandom >/dev/null &
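
For example, one instance per core can be started with a loop like this (a sketch; nproc is assumed to be available, otherwise substitute the core count by hand):

for i in $(seq "$(nproc)"); do
    bzip2 -9 -c /dev/urandom >/dev/null &
done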

And the average latencies of the last four tests.

                        min      max       avg     stdev
2.6.28 unpatched ext3   11.24   181.20     62.35    86.15
2.6.28 patched ext3     10.82   175.93     62.18    83.89
2.6.28 unpatched ext4    6.90   396.17    132.52   213.18
2.6.28 patched ext4      6.85  2078.93    707.26  1006.74

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Forget the back merge patch.

Have you tried running latencytop to spot big sleep offenders?

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19924
Latencytop results

(In reply to comment #68)
> Have you tried running latencytop to spot big sleep offenders?

I am not sure what I should look at. You can find most of the results in the file latencytop-ext4-2*bzip2.txt.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Most of them look as expected, up to about 1 second of latency for a single I/O under load. latencytop-ext4-2*bzip2.txt looks pretty bad, though. It has a 10 second wait on a single lock_page(); that's pretty slow.

Again, this whole thread confuses me. The I/O latencies from the fio jobs posted look OK, in the sense that they haven't regressed and that you can't expect zero latency when you are fully loading a disk with writes. So while we could do better there, it's not a catastrophe.

The bisect you originally did pointed to something interesting, I think. If we have clock problems, the CPU scheduler could easily delay a single process for large amounts of time if other processes are repeatedly ready to run.

Revision history for this message
In , andi-bz (andi-bz-linux-kernel-bugs) wrote :

The scheduler normally has special code to handle bad clocks (like ones going backwards). Of course it has its limits, but it should handle the typical cases.
Of course it could confuse other subsystems. For testing you could force another clock, like clock=pmtmr or clock=hpet (if you have HPET).

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

It may be something as simple as a wrapped variable. IIRC, someone recently found something like that in the scheduler, though I can't find the posting just now. It was in kernel/sched_fair.c:update_curr() I think.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

My default clock source is hpet. It is faster, but I have long freezes. With acpi_pm the system is sluggish, but the freezes were always below 5 seconds.

Test: copy 10gig file and execute "bzip2 -9 -c /dev/urandom >/dev/null" twice on core2duo.

hpet
1299.7 / 1651.3 / 39790.7 / 4580.1 / 943.9 / 2069.3 / 145.7 / 1739.2 / 691.4 / 2060.2 / 172.3 / 492.4 / 2286.4 / 3064.9 / 696.9 / 716.9 / 14096.2 / 3131.2 / 1640.2 /
min:145.7 ms|max:39790.7 ms|avg:4277.31

acpi_pm
1969 / 1276.8 / 658.8 / 16303.8 / 1604.3 / 3885.8 / 823.6 / 3659.1 / 2719.6 / 2064.2 / 672.9 / 1327.9 / 1783.9 / 604.3 / 1289 / 9535.1 / 1271.5 / 280.9 / 2621.8 / 759.1 /
min:280.9 ms|max:16303.8 ms|avg:2755.57

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

I'm not sure what my default clock source is (where does one look to determine this?), however I just booted with clock=hpet and things don't seem to be particularly better (50% IO wait time while evolution is starting, a process which takes over 5 minutes; this is with Jens' patch). These numbers are common with Jens' patch (which is a bit of an improvement, without the patch evolution pegs IO wait times at 70%+ and is very sluggish even after starting).

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

I just tried clock=acpi_pm and evolution startup performance seems no better. Tonight I'm going to try some quantitative benchmarks on these configurations so that legitimate comparisons can be made.

One thing that I have neglected to mention is that Jens' patch does seem to help overall system interactivity---an application with a high IO load doesn't degrade the latency of the entire system nearly as much---although I have no numbers to support this claim.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

On my computer, jiffies was the default clock source on the 2.6.20 kernel. Since 2.6.22, hpet is. On my old notebook it is now acpi_pm; I don't know what it was before. With jiffies under 2.6.28, my system seems much better, although there are still some short freezes. It does not solve the problem, but makes it much better. Please try clocksource=jiffies.

You can check your current clock source with:
cat /sys/devices/system/clocksource/clocksource0/*
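
The same sysfs directory also lists the alternatives and accepts a new one at runtime (a sketch, run as root; later comments in this thread do exactly this with hpet):

cat /sys/devices/system/clocksource/clocksource0/available_clocksource
echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource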

jiffies
645 / 598.3 / 462.5 / 1496.2 / 213.2 / 1353.1 / 6470.6 / 337.6 / 3406.9 / 2057.5 / 155.3 / 309 / 2332 / 463.1 / 1804.4 / 3258.6 / 261.7 / 8124.3 / 2373.2 / 2471.1
min:116.1 ms|max:8124.3 ms|avg:1843.32

The long values are freezes of firefox.
hpet 39790.7
acpi_pm 16303.8
jiffies 8124.3

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #76)
> The long values are freezes of firefox.

Do you mean startup time? Or do you click on a tab and it takes that long for it to switch?

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

Using the jiffies clocksource on Linus's master causes the machine to wedge up when attempting to start Xorg. I'll have to look into it later.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #77)
> Do you mean startup time? or you click on a tab and it takes that long for it
> to switch?

It is the longest time for switching or opening tabs during heavy I/O plus the 2*bzip2 of urandom.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

> (WW) intel(0): No outputs definitely connected, trying again...
> (WW) intel(0): Unable to find initial modes
> (EE) intel(0): No valid modes.

No Xorg comes up with the jiffies clocksource, and it takes the console with it. I have darkness on the screen... :) I can ssh into it, though.

Some weird interaction between i915 and the clocksource there.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

echo hpet > current_clocksource

and things are back to normal.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #81)
> echo hpet > current_clocksource
>
> and things are back to normal.

I got a crash when I tried to set the jiffies clocksource while Linux was running.

There is now an improvement in the process and thread tests with clock source jiffies. Here are the results. The performance is nearly as good as in 2.6.20.

Linux bugs-laptop 2.6.28t61p4 #5 SMP Wed Jan 21 14:30:24 CET 2009 x86_64 GNU/Linux
min:0.000ms|avg:0.000-0.000ms|mid:0.000ms|max:945.000ms|duration:24.354s
min:0.000ms|avg:0.000-0.000ms|mid:0.000ms|max:466.000ms|duration:24.206s
min:0.000ms|avg:0.000-0.000ms|mid:0.000ms|max:220.000ms|duration:47.452s
min:0.000ms|avg:0.000-0.000ms|mid:0.000ms|max:870.000ms|duration:34.105s
min:0.000ms|avg:0.000-0.000ms|mid:0.000ms|max:479.000ms|duration:36.610s
min:0.000ms|avg:0.000-0.000ms|mid:0.000ms|max:212.000ms|duration:77.449s

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

I booted up with clocksource=jiffies and lost Xorg and console. So, it wasn't set while running.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #83)
> I booted up with clocksource=jiffies and lost Xorg and console. So, it wasn't
> set while running.

Try blacklisting the thermal and processor kernel modules.
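
A rough sketch of how that can be done (the blacklist file name is a common convention and may differ per distribution; as discussed below, this may cost cpufreq/ACPI functionality on laptops):

# keep the modules from loading at boot
echo "blacklist thermal"   >> /etc/modprobe.d/blacklist.conf
echo "blacklist processor" >> /etc/modprobe.d/blacklist.conf
# or remove them from the running system, as root
rmmod thermal processor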

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Hi

I have currently the following running.

2 x "bzip2 -9 -c /dev/urandom >/dev/null" since I have 2 cores
and one "dd if=/dev/zero of=test.10g bs=1M count=10000"

Only small lockups happened during that time, which was about 9 minutes.
By small lockups I mean a couple of seconds.

After the dd command had finished, the lockups were still occurring, but they
were generally much shorter.

For me it is definitely a fix.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Seems like it is more complex. Running only the dd command halts my system in the same way as described earlier in this bug: ~100% iowait etc. Adding a single bzip2 command results in an iowait of around 40% and improved desktop response, and finally adding the second bzip2 command results in 5% iowait and even better desktop response.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #84)
> (In reply to comment #83)
> > I booted up with clocksource=jiffies and lost Xorg and console. So, it
> wasn't
> > set while running.
>
> Try to blacklist the thermal and the processor kernel module.
>

Wouldn't that throw everything cpufreq-related into a tizzy? It's a laptop, so losing cpufreq and other ACPI functionality is a big loss. Let me know if I am wrong about this.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

blacklisting processor and thermal didn't work either. I give up on jiffies...:-)

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

Well, looks like there's a good reason why machines hang with clock=jiffies. http://lkml.org/lkml/2009/1/21/402

Any ideas why those users whose machines didn't crash saw improvement? Does this suggest a scheduler issue?

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

> Well, looks like there's a good reason why machines hang with clock=jiffies.
> http://lkml.org/lkml/2009/1/21/402
>

This means I need to recompile the kernel without high resolution timers and then pass clocksource=jiffies?

Do we have an explanation for why the freezing period was reduced to half with acpi_pm and to a quarter with jiffies for Thomas? I would have thought faster timers would result in better behavior and that this was a step in the right direction. But we seem to be going backwards.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #90)
> This means I need to recompile kernel without high resolution timer and then
> pass clocksource=jiffies?
No, it shouldn't be possible to run the kernel using jiffies as a clocksource. The system's time source needs to have a sufficiently high resolution. Using a low resolution time source (like jiffies) can cause the kernel to hang.

>
> Do we have an explanation for why the freezing period reduced to half with
> acpi_pm and to a quarter with jiffies for Thomas? I would have thought faster
> timers will result in better behavior and it was a step in the future
> direction. But we seem to be going backwards.
It's far more complicated than that. If we have a timer wrapping around, it is entirely possible that a slower clock source would give you expected behavior whereas a higher resolution time source would fail. It completely depends upon the source of the freezes.

Jens, what do you think in light of this growing body of evidence pointing towards timer issues?

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #91)
Hmm, I think I was a little tired last night. To clarify, I guess you probably could recompile without CONFIG_HIGH_RES_TIMERS, however I'm not sure you'd want to. If I'm not mistaken, the no-tick kernel option is dependent on high-res timers, so you'd have to give that up.

Also, correction:
s/towards timer issues/towards timer-triggered-issues/

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Has anyone run latency top yet?

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

These are the total values of latency top.

http://bugzilla.kernel.org/show_bug.cgi?id=12309#c73
http://bugzilla.kernel.org/show_bug.cgi?id=12309#c76

Currently my system crashes while I am executing the copy and 2*bzip2 operation with jiffies. I will take some new measurements as soon as my test system is running again.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19954
latencytop captures with clocksource jiffies and hpet

I was not able to execute the 2*bzip2 test with jiffies any more. The system freezes forever while copying a file and bzipping urandom. It happens in runlevels 1, 3 and 5 during CPU-intensive tasks.

I have made a test with less CPU consumption. The test uses a script so that the same workload is executed with different clocksources. It copies a file, extracts the kernel source, builds the kernel and finally deletes the kernel tree. Concurrently the script starts gimp, oowriter, firefox and htop, and opens some web pages and a document.
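
A minimal sketch of such a load script (reconstructed for illustration, not the original; the file names, kernel version and document are placeholders):

#!/bin/sh
# background file copy plus desktop applications
cp bigfile bigfile.copy &
gimp & htop & firefox http://example.org & oowriter document.odt &
# extract, build and remove a kernel tree
tar xjf linux-2.6.28.tar.bz2
(cd linux-2.6.28 && make defconfig && make -j2)
rm -rf linux-2.6.28
wait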

Here are the "Total:" times from the captures.

jiffies
min:0.1 ms|max:5442.1 ms|avg:213.2

hpet
min:0.0 ms|max:14777.7 ms|avg:403.71

The full captures without the escape sequences are added in the attachment. The escape sequences are not completely removed, but it's enough to see what's necessary. I can provide the captures with the escape sequences too, if someone wants them.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Filtering out the upper and lower 10% of the times results in an average latency of 1737.18 ms for jiffies and 3164.72 ms for hpet.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Basically all of them show waiting for an async page write to finish, and that can take quite a bit of time with heavy writing going on. First thing next week I'll try and provide a 'this async write now went sync' helper for the io scheduler, so that they can make sure it gets expedited as soon as the sync io is. This should drastically reduce latencies for this situation.

It'll probably be less than straightforward, but a test patch should be quite doable.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

That sounds good.

I have to correct the last values, as I was using the filter for capture logs with escape sequences.

jiffies:
min:0.1 ms|max:5442.1 ms|avg:834.12|avg80:2248.28

hpet:
min:0 ms|max:14777.7 ms|avg:1474.09|avg80:3638.15

Why is there such a big difference in the average latency between jiffies and hpet? The total latency for 80% of the recording is 2.2 s with jiffies and 3.6 s with hpet.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Created attachment 19996
Test patch for async page promotion

First attempt at doing sync promotion of async page waiting. It actually booted, however I haven't done any sort of testing with it yet.

Note that this will only work with CFQ currently.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 19997
latencytop captures with clocksource hpet and patched kernel

Same test, with patched kernel and hpet as clocksource.

hpet:
min:10.1 ms|max:11733 ms|avg:3096.22|avg80:4082.79

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

One observation is that ext4 seems quite latency prone in waiting for write access to the journal. IIRC, that matches earlier results where ext3 was much quicker in that area. No idea what causes this, as I'm not familiar with the ext4 internals.

Another observation is that I neglected to include the buffer waiting in the async promotion, it only worked for page locking. I'll add an updated patch below after this posting.

And finally, lots of time is spent waiting for a new write request in the block layer. So you are maxing out all 128 requests in this test case. You can try increasing that to 512 for testing purposes, like so:

# echo 512 > /sys/block/sda/queue/nr_requests

That will get your async wait numbers down, but it may not reduce your latencies. Fact is that 128 writes is already a lot, and with more requests in the queue, you will have higher completion times for each individual request.
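
For completeness, the current value can be read back, and the stock default restored after testing (sda is a placeholder for the device under test):

cat /sys/block/sda/queue/nr_requests        # show the current limit
echo 128 > /sys/block/sda/queue/nr_requests # restore the usual default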

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Created attachment 19998
Test patch for async page promotion v2

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :
Download full text (3.5 KiB)

Attachement

http://bugzilla.kernel.org/attachment.cgi?id=19998&action=view

Causes the following OOPS as soon as stress-testing starts. Is it possible that bdi->unplug_io_data can be NULL in blk_backing_dev_wop ? Should we simply discard those ?

[ 138.345195] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 138.346301] IP: [<ffffffff803f997d>] elv_wait_on_page+0xd/0x20
[ 138.346301] PGD 434c05067 PUD 434c06067 PMD 0
[ 138.346301] Oops: 0000 [#1] PREEMPT SMP
[ 138.346301] LTT NESTING LEVEL : 0
[ 138.346301] last sysfs file: /sys/block/md1/md/raid_disks
[ 138.346301] Dumping ftrace buffer:
[ 138.346301] (ftrace buffer empty)
[ 138.346301] CPU 3
[ 138.346301] Modules linked in: e1000e loop ltt_tracer ltt_trace_control ltt_type_serializer ltte
[ 138.346301] Pid: 1272, comm: kjournald Not tainted 2.6.28.1 #69
[ 138.346301] RIP: 0010:[<ffffffff803f997d>] [<ffffffff803f997d>] elv_wait_on_page+0xd/0x20
[ 138.346301] RSP: 0018:ffff88043cc19cd0 EFLAGS: 00010286
[ 138.346301] RAX: 0000000000000000 RBX: ffff88043f460938 RCX: 0000000000000000
[ 138.346301] RDX: ffff880438490000 RSI: ffffe200193f0bc0 RDI: ffff88043e580a40
[ 138.346301] RBP: ffff88043cc19cd0 R08: ffff88043d09de78 R09: 0000000000000001
[ 138.346301] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88043cc19d50
[ 138.346301] R13: ffff88043cc19d60 R14: 0000000000000002 R15: ffff8800280590c8
[ 138.346301] FS: 0000000000000000(0000) GS:ffff88043f804d00(0000) knlGS:0000000000000000
[ 138.346301] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 138.346301] CR2: 0000000000000000 CR3: 0000000434817000 CR4: 00000000000006e0
[ 138.346301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 138.346301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 138.346301] Process kjournald (pid: 1272, threadinfo ffff88043cc18000, task ffff88043d09d8c0)
[ 138.346301] Stack:
[ 138.346301] ffff88043cc19ce0 ffffffff803fd2a2 ffff88043cc19d00 ffffffff802f6762
[ 138.346301] ffff88043cc19d60 0000000000000000 ffff88043cc19d40 ffffffff8067ace2
[ 138.346301] ffffffff802f6710 ffff880438490000 0000000000000002 0000000000000002
[ 138.346301] Call Trace:
[ 138.346301] [<ffffffff803fd2a2>] blk_backing_dev_wop+0x12/0x20
[ 138.346301] [<ffffffff802f6762>] sync_buffer+0x52/0x80
[ 138.346301] [<ffffffff8067ace2>] __wait_on_bit+0x62/0x90
[ 138.346301] [<ffffffff802f6710>] ? sync_buffer+0x0/0x80
[ 138.346301] [<ffffffff802f6710>] ? sync_buffer+0x0/0x80
[ 138.346301] [<ffffffff8067ad89>] out_of_line_wait_on_bit+0x79/0x90
[ 138.346301] [<ffffffff802566f0>] ? wake_bit_function+0x0/0x50
[ 138.346301] [<ffffffff802f6649>] __wait_on_buffer+0xf9/0x130
[ 138.346301] [<ffffffff8036c0c5>] journal_commit_transaction+0x7d5/0x1540
[ 138.346301] [<ffffffff80265991>] ? trace_hardirqs_on_caller+0x1b1/0x210
[ 138.346301] [<ffffffff8067d457>] ? _spin_unlock_irqrestore+0x47/0x80
[ 138.346301] [<ffffffff80249cef>] ? try_to_del_timer_sync+0x5f/0x70
[ 138.346301] [<ffffffff803708c8>] kjournald+0xe8/0x250
[ 138.346301] [<ffffffff802566b0>] ? autoremove_wake_function+0x0/0x40
[ 138.346301] [<ffffffff803707e0>] ? kjourna...

Read more...

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Yes that's expected, I didn't fixup the non-request_fn based drivers. It's trickier to do for dm/md, since you need to know where that page went. Or you can just cycle all the bottom backing_dev_info's like it's done for unplug. I'll be back at the machine in an hour or two, I'll update the patch for dm/md.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Created attachment 20001
Test patch for async page promotion v2

Adds support for raid0/1/10/5 and should not oops on dm (just not work as intended, it'll do nothing).

There's still the debug printk in there that notifies you of when something has happened, ala:

$ dmesg | tail
cfq: moving e4a348d4 to dispatch
cfq: moving e49dede4 to dispatch
cfq: moving f687d8d4 to dispatch

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Another question - are people using CONFIG_NO_HZ or not?
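
For anyone unsure how their kernel was built, a quick way to check (assuming either /proc/config.gz or a distro config file in /boot is available):

zgrep 'CONFIG_NO_HZ\|CONFIG_HIGH_RES_TIMERS' /proc/config.gz
# or, on most distro kernels:
grep -E 'CONFIG_NO_HZ|CONFIG_HIGH_RES_TIMERS' /boot/config-$(uname -r)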

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #106)
> Another question - are people using CONFIG_NO_HZ or not?

Yes, I am.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #106)
> Another question - are people using CONFIG_NO_HZ or not?
>
As am I

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #106)
Me currently too.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

So my next question would be if disabling that option makes any difference?

Revision history for this message
In , toby (toby-linux-kernel-bugs) wrote :

We are not using CONFIG_NO_HZ and get high latency (subjective) while running:

dd if=/dev/zero of=file bs=1M count=2048

Additionally, all 8 CPU cores go to at least 50% iowait, and several peg at ~95%.

We see similar results with:

2.6.18, 2.6.24, deadline, cfq.

Revision history for this message
In , ozan (ozan-linux-kernel-bugs) wrote :

Created attachment 20024
2.6.25.20 fio test with NOHZ disabled

Revision history for this message
In , ozan (ozan-linux-kernel-bugs) wrote :

Created attachment 20025
2.6.25.20 fio test with NOHZ enabled

Revision history for this message
In , ozan (ozan-linux-kernel-bugs) wrote :

What is the preferred way of testing different kernels against this bug?

I've done Mathieu's fio test, but I'm not sure whether it gives a detailed clue about the problem. I've attached the results.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 20026
latencytop captures with clocksource hpet with nohz and no high resolution timer

hpet - no hz - no high resolution timer
min:0 ms|max:10888.7 ms|avg:1311.17
hpet - no hz
min:2 ms|max:16980.9 ms|avg:1513.26

Same settings as in
http://bugzilla.kernel.org/attachment.cgi?id=19954&action=view
hpet
min:0 ms|max:14777.7 ms|avg:1474.09

jiffies
min:0.1 ms|max:5442.1 ms|avg:834.12

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 20027
latencytop captures + fio results amd64

I have run the fio job on a different machine on two different disks. While running the fio job, I captured the latency with latencytop. Each test was executed twice: once with 2*bzip2 of urandom and once without CPU load.
You can find the test results for every io scheduler in the archive.

100MB/s disk + 2bzip (2009-01-27.0847-2.6.28.2-acpi_pm)
100MB/s disk (2009-01-27.0908-2.6.28.2-acpi_pm)
40MB/s disk + 2bzip (2009-01-27.0934-2.6.28.2-acpi_pm)
40MB/s disk (2009-01-27.1029-2.6.28.2-acpi_pm)

Total latency - cfq
min: 133.3 ms | max 18555.8 ms | avg 5978.08
min: 25.5 ms | max 5057.2 ms | avg 1660.21
min: 369.0 ms | max 11872.0 ms | avg 3764.57
min: 557.0 ms | max 12215.6 ms | avg 3002.81

fio results - cfq
mint 25msec | maxt 1669msec
mint 23msec | maxt 1596msec
mint 77msec | maxt 2370msec
mint 106msec | maxt 738msec

Revision history for this message
In , ozan (ozan-linux-kernel-bugs) wrote :

// Adding myself to CC

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #101)
> One observation is that ext4 seems quite latency prone in waiting for write
> access to the journal. IIRC, that matches earlier results where ext3 was much
> quicker in that area. No idea what causes this, as I'm not familiar with the
> ext4 internals.

It is possible that the reduced latency on ext4 is a result of the increased write speed, which is nearly doubled. You can see in the results posted before (comment #116) a reduction on ext3 partitions with different hard drives.

Revision history for this message
In , bart (bart-linux-kernel-bugs) wrote :

I have really noticed this lately.

I replaced an old server running an older kernel. The replacement hardware was orders of magnitude more powerful. The I/O system in the old machine was a 4-disk hardware RAID 5 on 64-bit PCI with the very first SATA 10,000 RPM WD Raptors (WD740-00FLA). The new machine has an 8-disk hardware RAID 5 using the new 300 GB 10,000 RPM VelociRaptor SATA drives on PCI Express. The old machine had a Pentium 4 HT CPU. The new machine has a 4-core Core 2 CPU. All high-end gear.

The new machine does get far better disk throughput; however, on these workloads the latencies seem far higher, the interactivity of the machine is poor and all CPU cores show high I/O waits.

This machine serves an application that runs from Samba shares to 15 or so Windows workstations. This involves lots of file activity on large flat-file database files. Some of the files are up to 4 GB in size.

The old server was very busy, yet no huge amount of I/O wait was seen. On the new server, using a 2.6.18 kernel on an enterprise distro, the I/O waits are far higher, especially noticeable at backup times. Users have noticed the extra latencies when the system is busy, and at these times the I/O waits are high.

The server feels slower than the old machine, and this should not be so.

Just thought I would let you know this info, as it seems hard to quantify this bug against real-world workloads.

Revision history for this message
In , simon+kernelbugzilla (simon+kernelbugzilla-linux-kernel-bugs) wrote :

Just wanted to add a couple of links to places where some additional real world experience is related, for whatever they might be worth.

http://forums.storagereview.net/index.php?s=121e3f0d26cbd551c84271019f82f6d3&showtopic=25923&st=0

http://community.novacaster.com/showarticle.pl?id=7395&n=8001

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #105)

I have tried the patch with 2.6.28.2 and 2.6.29-rc3 and always get a crash when I/O starts, sometimes even after the X server has started.

kernel 2.6.29-rc3 at

cfq_remove_request 0xe3/0x251

0xffffffff811ca8fc is in cfq_remove_request (block/cfq-iosched.c:650).
645 {
646 struct cfq_queue *cfqq = RQ_CFQQ(rq);
647 struct cfq_data *cfqd = cfqq->cfqd;
648 const int sync = rq_is_sync(rq);
649
650 BUG_ON(!cfqq->queued[sync]);
651 cfqq->queued[sync]--;
652
653 elv_rb_del(&cfqq->sort_list, rq);
654

kernel 2.6.28.2
elv_rb_del+0x21/0x4b

394 }
395 EXPORT_SYMBOL(elv_rb_add);
396
397 void elv_rb_del(struct rb_root *root, struct request *rq)
398 {
399 BUG_ON(RB_EMPTY_NODE(&rq->rb_node));
400 rb_erase(&rq->rb_node, root);
401 RB_CLEAR_NODE(&rq->rb_node);
402 }
403 EXPORT_SYMBOL(elv_rb_del);

Revision history for this message
In , michiel (michiel-linux-kernel-bugs) wrote :

Could this be the same bug as: http://lkml.org/lkml/2008/6/15/163 ?

Because on the same system on which I have the same symptoms as this bug describes, the following also happens:
http://beheer.eduwijs.nl/kernellog-brikama.log

I need to say that changing the IO scheduler from CFQ to AS seems to
help a bit. It will not solve the problem, but the system will be much
more responsive.
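
For anyone wanting to try the same comparison without rebooting, a minimal sketch of checking and switching the scheduler per device (sda is a placeholder; elevator=as on the kernel command line is the boot-time equivalent):

cat /sys/block/sda/queue/scheduler                   # active one shown in brackets
echo anticipatory > /sys/block/sda/queue/scheduler   # switch from CFQ to AS, as root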

System information:
IO Scheduler: AS (default is CFQ, using elevator=as)
Timer: hpet
CONFIG_NO_HZ=y
Kernel: Linux brikama 2.6.27-9-generic #1 SMP x86_64 GNU/Linux
Distro: Ubuntu 8.10 Intrepid amd64
CPU: Intel(R) Core(TM)2 CPU E8400 @ 3.00GHz (2 cores)
Memory: 4GB
Using LVM: yes
Using LVM encryption: no
LVM version:
        LVM version: 2.02.39 (2008-06-27)
        Library version: 1.02.27 (2008-06-25)
        Driver version: 4.14.0
Using DM: yes

HDD:

/dev/sda:

 Model=WDC WD5000AACS-00G8B1 , FwRev=05.04C05,
SerialNo= WD-WCAUF0869014
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
 BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=?0?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=976773168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5,6,7

Revision history for this message
In , tchiwam (tchiwam-linux-kernel-bugs) wrote :

Has anyone here managed to reproduce this problem on an AMD platform? I can't seem to reproduce it there. But both my 965GM and 945GM chipset motherboards have the problem with the T7600 and T9500 CPUs. My old Celeron has the same problem, but it doesn't feel like it's freezing as much.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #123)
> Anyone here managed to reproduce this problem on an AMD platform ? Because I
> can't seem to be able to reproduce it. But both of 965GM and 945GM chipset
> motherboard have the problem with the T7600 and T9500 cpu. My old celeron has
> the same problem but it doesn't feel like freezing so much.
>

AMD on nForce4 here running x86_64. Look over at the Gentoo forums; there is a long thread, and almost all of the people experiencing the problem there are on AMD.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

The problem exists on an AMD platform too, but not as badly as on an Intel platform. By changing the clocksource to acpi_pm, you can reduce the problem a bit on an Intel platform, but the system feels a little slower.

Using ext4 reduces the problem enormously. Even firefox is usable while eclipse is indexing the kernel build tree. The problem still exists under heavy I/O.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Sounds like the infamous ext3 fsync() issue is also a factor. Can you try mounting ext3 with -o data=writeback and see if that makes ext3 behave better?
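
A sketch of how that could be tested (the device and mount point are placeholders; data=writeback weakens ordering guarantees, so treat it as a test setting only):

mount -t ext3 -o data=writeback /dev/sdb1 /mnt/test
# or remount an existing filesystem; this may be refused for the root
# filesystem, in which case set the option in /etc/fstab and reboot
mount -o remount,data=writeback /mnt/test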

Revision history for this message
In , harrisonmetz (harrisonmetz-linux-kernel-bugs) wrote :

On my machine (nForce 5, AMD Phenom II 940) I also experience huge slowdowns when performing I/O. For example, using Ben's:
dd if=/dev/zero of=/tmp/test bs=1M count=1M
test, it takes me about 40 secs to spawn a shell (15 secs for konsole to open a new tab, and about 25 secs for the shell to actually spawn). This was conducted on an HD with ext4. Turning off swap helps a lot with launching a shell.

On a more substantial note, I use Unison to sync files between various places, and when it is running my system is hardly responsive. This happens to me on ext4, ext3, and ReiserFS.

Changing the scheduler to noop, the dd-and-open-shell test is very responsive (with swap both on and off), but any substantial usage, such as firefox, is still slow, just as it is above with cfq.

If I can free up some space on one of my partitions, I'm going to install some pre-2.6.18 distro and "feel" what the performance is like.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I think the appearance of this bug depends on CPU speed and drive speed.

I have made some more tests. Currently I am using the following command:

for i in 1 2 ; do \
 dd if=/dev/zero of=test-$i bs=1M count=4K oflag=direct & echo test-$i; \
done

Once with oflag=direct and once without.

With ext3 the problem occurs immediately in both cases. With ext4 the problem occurs immediately without oflag=direct; with oflag=direct I can even use firefox, but sometimes the desktop is unusable.

In direct mode new applications do not start and disk-intensive operations take a long time, but I can move windows and switch desktops without problems, with iowait at 60%. With dd in non-direct mode, I can start new applications (it still takes a lot of time), but everything freezes from time to time and iowait goes immediately to 100%.

I have captured some statistics by adding a printk with the duration time in the function __make_request (blk-core.c). The time is taken directly before and after the spin_lock_irq(q->queue_lock); and finally before the unlock.

There is a dramatic difference in the requests per second between direct and non-direct mode.
W: wait time before entering lock state
D: duration time of the make_request
T: total time = W + D

ext3 - direct
requests: 209.694080/s
total: W: 0.000645 / D: 0.014584 / T: 0.015229
W: avg: 0.000000307 / min: 0.000000000 / max: 0.000007606
D: avg: 0.000006948 / min: 0.000000255 / max: 0.000085018
T: avg: 0.000007255 / min: 0.000000365 / max: 0.000085018
4294967296 Bytes (4,3 GB) kopiert, 203,66 s, 21,1 MB/s
4294967296 Bytes (4,3 GB) kopiert, 203,582 s, 21,1 MB/s

ext3
requests: 4662.272968/s
total: W: 0.013624 / D: 15.256149 / T: 15.269773
W: avg: 0.000000291 / min: 0.000000000 / max: 0.000275893
D: avg: 0.000325819 / min: 0.000000000 / max: 1.092940760
T: avg: 0.000326110 / min: 0.000000000 / max: 1.092940920
4294967296 Bytes (4,3 GB) kopiert, 203,559 s, 21,1 MB/s
4294967296 Bytes (4,3 GB) kopiert, 214,995 s, 20,0 MB/s

ext4 - direct
requests: 114.510132/s
total: W: 0.000356 / D: 0.017658 / T: 0.018014
W: avg: 0.000000311 / min: 0.000000110 / max: 0.000000630
D: avg: 0.000015408 / min: 0.000000220 / max: 0.000127249
T: avg: 0.000015719 / min: 0.000000330 / max: 0.000127689
4294967296 Bytes (4,3 GB) kopiert, 154,491 s, 27,8 MB/s
4294967296 Bytes (4,3 GB) kopiert, 157,853 s, 27,2 MB/s

ext4
requests: 7009.744726/s
total: W: 0.018928 / D: 6.110891 / T: 6.129819
W: avg: 0.000000270 / min: 0.000000000 / max: 0.000032916
D: avg: 0.000087046 / min: 0.000000000 / max: 0.603327176
T: avg: 0.000087316 / min: 0.000000000 / max: 0.603327516
4294967296 Bytes (4,3 GB) kopiert, 146,303 s, 29,4 MB/s
4294967296 Bytes (4,3 GB) kopiert, 149,361 s, 28,8 MB/s

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

And some test results with clocksource=jiffies instead of hpet (non-direct only), which runs much better on my machine. The total times are accumulated over an interval of 10 s. The 15 s with ext3 above should come from the two cores.

ext3
total: W: 0.018617 / D: total: 3.714917 / T: total: 3.733534
requests: 4050.191168/s
W: avg: 0.000000459 / min: 0.000000000 / max: 0.000048408
D: avg: 0.000091496 / min: 0.000000000 / max: 0.615268038
T: avg: 0.000091954 / min: 0.000000000 / max: 0.615268379
4294967296 Bytes (4,3 GB) kopiert, 213,215 s, 20,1 MB/s
4294967296 Bytes (4,3 GB) kopiert, 222,198 s, 19,3 MB/s

ext4
total: W: 0.026263 / D: 3.681891 / T: 3.708154
requests: 6006.413044/s
W: avg: 0.000000431 / min: 0.000000000 / max: 0.001003075
D: avg: 0.000060427 / min: 0.000000000 / max: 0.344179020
T: avg: 0.000060858 / min: 0.000000000 / max: 0.344179370
4294967296 Bytes (4,3 GB) kopiert, 147,343 s, 29,1 MB/s
4294967296 Bytes (4,3 GB) kopiert, 146,386 s, 29,3 MB/s

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Can you try with this simple patch applied?

diff --git a/block/blk.h b/block/blk.h
index 6e1ed40..a145c3a 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -5,7 +5,7 @@
 #define BLK_BATCH_TIME (HZ/50UL)

 /* Number of requests a "batching" process may submit */
-#define BLK_BATCH_REQ 32
+#define BLK_BATCH_REQ 1

 extern struct kmem_cache *blk_requestq_cachep;
 extern struct kobj_type blk_queue_ktype;

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I would say there are no changes; perhaps it is a little bit worse.
There are still freezes with non-direct write access, e.g. while painting circles in gimp.
No freezes with direct I/O, but high latency with concurrent disk access (as before).

ext3 - direct
requests: 205.795295/s
total: W: 0.000616 / D: 0.011195 / T: 0.011811
W: avg: 0.000000299 / min: 0.000000000 / max: 0.000007085
D: avg: 0.000005434 / min: 0.000000000 / max: 0.000100447
T: avg: 0.000005733 / min: 0.000000000 / max: 0.000100958
4294967296 Bytes (4,3 GB) kopiert, 210,281 s, 20,4 MB/s
4294967296 Bytes (4,3 GB) kopiert, 210,525 s, 20,4 MB/s

ext3
requests: 4960.868922/s
total: W: 0.032503 / D: 21.032077 / T: 21.064580
W: avg: 0.000000655 / min: 0.000000000 / max: 0.000069624
D: avg: 0.000423863 / min: 0.000000000 / max: 0.415194973
T: avg: 0.000424518 / min: 0.000000000 / max: 0.415195303

requests: 3588.105593/s
total: W: 0.014912 / D: 10.578434 / T: 10.593346
W: avg: 0.000000415 / min: 0.000000000 / max: 0.000077581
D: avg: 0.000294754 / min: 0.000000000 / max: 0.447073476
T: avg: 0.000295170 / min: 0.000000000 / max: 0.447073806

4294967296 Bytes (4,3 GB) kopiert, 218,708 s, 19,6 MB/s
4294967296 Bytes (4,3 GB) kopiert, 228,355 s, 18,8 MB/s

ext4 - direct
requests: 115.981745/s
total: W: 0.000344 / D: 0.016716 / T: 0.017061
W: avg: 0.000000297 / min: 0.000000110 / max: 0.000025846
D: avg: 0.000014398 / min: 0.000000650 / max: 0.000075554
T: avg: 0.000014695 / min: 0.000000990 / max: 0.000076195
4294967296 Bytes (4,3 GB) kopiert, 156,476 s, 27,4 MB/s
4294967296 Bytes (4,3 GB) kopiert, 157,78 s, 27,2 MB/s

ext4
requests: 7556.114616/s
total: W: 0.029942 / D: 9.424271 / T: 9.454213
W: avg: 0.000000396 / min: 0.000000000 / max: 0.000127857
D: avg: 0.000124722 / min: 0.000000000 / max: 0.046151790
T: avg: 0.000125119 / min: 0.000000000 / max: 0.046152130
4294967296 Bytes (4,3 GB) kopiert, 147,553 s, 29,1 MB/s
4294967296 Bytes (4,3 GB) kopiert, 151,226 s, 28,4 MB/s

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :
Download full text (11.5 KiB)

(In reply to comment #130)
> Can you try with this simple patch applied?
>
> diff --git a/block/blk.h b/block/blk.h
> index 6e1ed40..a145c3a 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -5,7 +5,7 @@
> #define BLK_BATCH_TIME (HZ/50UL)
>
> /* Number of requests a "batching" process may submit */
> -#define BLK_BATCH_REQ 32
> +#define BLK_BATCH_REQ 1
>
> extern struct kmem_cache *blk_requestq_cachep;
> extern struct kobj_type blk_queue_ktype;
>

Hi Jens,

I tried it on a 2.6.29-rc3 kernel. It made things worse for "default" config, but did help with config1.
(fio "ssh" test bench)
(config1 : quantum=1, slice_async_rq=1, queue_depth=1)

max runt 2.6.29-rc3 default no patch 14247msec
max runt 2.6.29-rc3 default patch 30833msec

max runt 2.6.29-rc3 config1 no patch 7574msec
max runt 2.6.29-rc3 config1 patch 6585msec
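
For reference, config1-style settings can be applied at runtime roughly as follows (sda is a placeholder; the iosched directory exists only while CFQ is the active scheduler, and queue_depth only for devices that expose it):

echo 1 > /sys/block/sda/queue/iosched/quantum          # quantum=1
echo 1 > /sys/block/sda/queue/iosched/slice_async_rq   # slice_async_rq=1
echo 1 > /sys/block/sda/device/queue_depth             # queue_depth=1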

Note that the results seem to indicate that the larger run times occur near the "write" job. The listings below show the runtime of the jobs (1 large write and many 2 MB reads executed at regular intervals for most of the load, ending with more randomly delayed jobs) in the order they were run. Note that all the read jobs are started at a 4 s interval, except the last 2 jobs, which are started after 50 s for the first one and after another 10 s for the last one.

Here is the listing of the 2.6.29-rc3 default no patch

  write: io=10240MiB, bw=56062KiB/s, iops=53, runt=191526msec
  read : io=2052KiB, bw=3411KiB/s, iops=141, runt= 616msec
  read : io=2084KiB, bw=409KiB/s, iops=16, runt= 5215msec
  read : io=2060KiB, bw=349KiB/s, iops=15, runt= 6031msec
  read : io=2060KiB, bw=445KiB/s, iops=17, runt= 4731msec
  read : io=2068KiB, bw=377KiB/s, iops=14, runt= 5606msec
  read : io=2084KiB, bw=558KiB/s, iops=23, runt= 3824msec
  read : io=2056KiB, bw=398KiB/s, iops=15, runt= 5279msec
  read : io=2048KiB, bw=328KiB/s, iops=13, runt= 6393msec
  read : io=2056KiB, bw=337KiB/s, iops=12, runt= 6236msec
  read : io=2072KiB, bw=596KiB/s, iops=23, runt= 3558msec
  read : io=2068KiB, bw=448KiB/s, iops=17, runt= 4723msec
  read : io=2052KiB, bw=342KiB/s, iops=14, runt= 6143msec
  read : io=2056KiB, bw=448KiB/s, iops=19, runt= 4695msec
  read : io=2060KiB, bw=362KiB/s, iops=14, runt= 5814msec
  read : io=2072KiB, bw=1202KiB/s, iops=44, runt= 1765msec
  read : io=2048KiB, bw=395KiB/s, iops=17, runt= 5308msec
  read : io=2056KiB, bw=434KiB/s, iops=17, runt= 4851msec
  read : io=2064KiB, bw=382KiB/s, iops=14, runt= 5521msec
  read : io=2072KiB, bw=412KiB/s, iops=16, runt= 5144msec
  read : io=2052KiB, bw=439KiB/s, iops=17, runt= 4784msec
  read : io=2076KiB, bw=408KiB/s, iops=15, runt= 5209msec
  read : io=2084KiB, bw=405KiB/s, iops=15, runt= 5263msec
  read : io=2052KiB, bw=379KiB/s, iops=14, runt= 5543msec
  read : io=2076KiB, bw=438KiB/s, iops=18, runt= 4852msec
  read : io=2052KiB, bw=1016KiB/s, iops=38, runt= 2068msec
  read : io=2056KiB, bw=227KiB/s, iops=9, runt= 9271msec
  read : io=2072KiB, bw=1256KiB/s, iops=48, runt= 1689msec
  read : io=2048KiB, bw=347KiB/s, iops=13, runt= 6036msec
  read : io=2068KiB, bw=594KiB/s, iops=24, runt= 3562msec
  read : io=2052KiB, bw=415KiB/s, iops=16,...

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

(edit)
Note that the results seem to indicate that the larger run times occur near
the *end* of the "write" job.

Revision history for this message
In , petrinic (petrinic-linux-kernel-bugs) wrote :

Hi.

On my laptop (Core2Duo 1.6 GHz) I run my Gentoo kernel, 2.6.28-gentoo.
I didn't have any problems with latency.

If I run "dd if=/dev/zero of=file bs=1M count=2048" or "dd if=/dev/zero of=/tmp/test bs=1M count=1M" (I tried to run it as user and also as root), my system works well and I can start firefox, another shell, or open dolphin (I'm under kde4-svn), and everything stays fast.

I have an XFS filesystem on my home and reiserfs on root.

Since I configured my kernel manually, maybe it would be useful for someone to have my .config, so I'll post it.

Revision history for this message
In , petrinic (petrinic-linux-kernel-bugs) wrote :

Created attachment 20105
With this .config I don't have latency bug.

My 2.6.28 .config. Everything is OK with this .config; I didn't have any slowdowns running "dd if=/dev/zero of=/tmp/test bs=1M count=1M" on my Core2Duo laptop (1.6 GHz).

Revision history for this message
In , harrisonmetz (harrisonmetz-linux-kernel-bugs) wrote :

After looking through Alexsandar's kernel I decided to try a new config. Changing my kernel from 250HZ and Voluntary Kernel Preemption to 1000HZ and Preemptible Kernel (Low-Latency Desktop), I can actually open tabs in firefox, new terminals, or SSH into my computer (from itself) without waiting 10-30 seconds. Perhaps there is no bug but this is just expected behavior.

I wonder if it was more of the clock change or the preemption change which made the difference, or both.

For those of you who have this problem, what is your HZ and preemption model?
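
If you are not sure what your kernel was built with, both can be read from the kernel config (assuming a distro config in /boot or /proc/config.gz):

grep -E '^CONFIG_HZ=|^CONFIG_PREEMPT' /boot/config-$(uname -r)
# e.g. CONFIG_HZ=250 together with CONFIG_PREEMPT_VOLUNTARY=y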

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Enabling the 1000 Hz timer frequency and Low-Latency Desktop as the preemption model does not solve the problem for me.

The mouse still freezes; I cannot move windows or switch between desktops under heavy I/O. The duration of these freezes is now reduced to less than 3 s and the freeze interval is 2-10 s, but the desktop is still unusable for me.

Revision history for this message
In , petrinic (petrinic-linux-kernel-bugs) wrote :

Maybe it's not only the preemption and the frequency. I think one of these things could be:

General setup:
- Control Group support DISABLED
- Group CPU Scheduler DISABLED
- Enable full-sized data structures for core ENABLED
- Enable futex support ENABLED
- Use full shmem filesystem ENABLED
- Enable AIO support ENABLED
- SLAB Allocator: SLUB

Processor type and features (ENABLED):
- Tickless System (NO_HZ)
- High Resolution Timer Support
- HPET Timer Support
- Multi-core scheduler support
- Preemptible RCU
- 64 bit Memory and IO resources
- Add LRU list to track non-evictable pages

Good luck...

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

I think it would be great if someone from the kernel team could take a look at this.

Linux is starting to lose its advantage in performance tests because of this problem.

Is there any kernel developer who can address this issue?

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #136)
> For those of you who have this problem what is your HZ and preemption model?
>
I'm currently using Voluntary Preemption and HZ=1000. However, I think we're probably losing focus here. Just randomly changing configurations seems like grasping at straws to me. There are far too many potentially relevant configuration options to realistically test them all. If we are going to make progress, we are going to have to use more targeted investigation.

(In reply to comment #139)
> Is there any kernel developer who can address this issue?
>
Jens Axboe has sent us a few patches although he doesn't seem to have a lot of time to dedicate to the issue. Honestly, I think we might need to find a distribution with a block layer developer on payroll who could focus on this issue until it is solved. In my discussions on #fedora-kernel, it doesn't look like Redhat has such a person. I haven't received any responses one way or another on #ubuntu-kernel with respect to Canonical.

Does anyone know of a company who might have someone with the requisite skill set to debug this issue? Jens, do you think you'll be able to sustainably work on this bug? (Thanks for your work so far, by the way)

I think it would be amazing if we could give 2.6.29 proper I/O performance. I know it's getting late considering we're at -rc3, but this bug has been with us for far too long.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

Well, I'm fairly certain at least part of the issue is a scheduler bug. Just now I was running make modules_install for a few kernels and after some time found that specific processes had stopped responding. This pattern continued, with more and more processes blocking. Eventually the entire X session stopped responding. For a while I could maintain an SSH session and found that IO wait time was 40%, with the rest of the CPU time going idle. After some time, however, even the ssh session stopped responding. This is the third time I have seen behavior like this, with the previous instances involving copying 15GB of data between external hard drives.

Also, Jens, what do you think is the most useful benchmark we've seen here? Testers have used several benchmarks including dd, various fio jobs. Would it help if we standardized on a single benchmark?

Revision history for this message
In , Adriaan.van.Kessel (adriaan.van.kessel-linux-kernel-bugs) wrote :

The best illustration of this behavior seems to be comments #128, #129 and #131.
IMHO this illustrates that most CPU is burned on a spinlock.
If the time spent inside the critical section also increases (which it does), there is IMHO a strong indication that there must be another (spin)lock inside this code path.
Currently I'm looking into mm/filemap.c.

My own testing consists of a toy search engine I am developing. It uses the maximum number of mmap()ed files (32K or 64K); the program maintains its own LRU.
In the first stage of its indexer, it just reads mmap()ed pages, maybe dirtying them. When it is done, it unmaps them (causing the buffers to be written back to backing store).

The frozen-cursor and non-responsive system only occurs during the first phase.
During the writing phase, things are back to normal again.

IMHO, this could mean two things:
1) There is a funneling lock in the read() pathway
2) The mm runs into the mud

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Sorry, I wish I could spend more time on it. I'll be on vacation the next 9 days, so no response until the week beginning on Feb 16th. I'll try and set aside a few days to work on it then.

With complete freezing of the mouse, it does look like some sort of spinning issue. To that extent, the most valuable information would be profiling from those 5 seconds surrounding the freeze. Hard to do, but would be very valuable.

People seem to be certain that this is a block layer issue; I'm far from convinced that is the case.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I have limited the usage of generic_file_aio_write in filemap.c for every process. First I limited the throughput of every process: when the overall throughput was below disk capacity, there were no more freezes of the mouse; when the overall throughput was above disk capacity, the problem appeared immediately.

Then I limited the usage to at most 20% of the interval time for every process and suspended the thread when it needed more. The problem was present as before, as every 20th request __generic_file_aio_write_nolock needed more than 2 s to finish.

I tried the same for the cfq scheduler in cfq_choose_req and added penalties for processes with heavy I/O, but the pid is not correctly set for all cfq_queues and I got a kernel panic after a while. Before the kernel panic there was no improvement.

Revision history for this message
In , bart (bart-linux-kernel-bugs) wrote :

Created attachment 20148
Graph of I/O waits on CPU Core 0

Running dd if=/dev/zero of=/storage/hwraid0/test1 bs=1M count=1M

On my AMD Phenom 9950 Quad-Core Processor running a distro kernel (2.6.27.12-170.2.5.fc10.x86_64). This test was run against an XFS file system on an 8-disk PCI-Express hardware RAID card. I get the same if I run against ext4 on the same hardware. I also get similar results on this machine with a single 10,000 RPM drive connected to the motherboard's SATA with ext4.

When this test was running the system was very unresponsive. In a different test run I launched evolution and it took around 60 seconds to load.

[root@bajor hwraid0]# dd if=/dev/zero of=/storage/hwraid0/test1 bs=1M count=1M
436560+0 records in
436560+0 records out
457766338560 bytes (458 GB) copied, 2535.92 s, 181 MB/s

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

<stupidmetoopost />
Now at least I know what's going on. It seems like it's somehow coupled with mm, because when this happens a) I can see invocations of the oom_killer in the logs after reboot, and b) the SysRq sync & unmount actions do not end the furious HDD LED flashing, so I presume the kernel is misusing swap space.
By the way, this is very indeterminate, and simply doing the same thing again will not reproduce the problem... so my vote is for a race condition or spinlock recursion, too.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

P5K, CPU - Core 2 Duo E8400, connected to the motherboards (ICH9) SATA - ST31000340AS, openSUSE 11.1, kernel - 2.6.28.3

yura@suse:~> dd if=/dev/zero of=test1 bs=1M count=1M
^C
128443+0 records in
128443+0 records out
134682247168 байт (135 GB), 1872,43 c, 71,9 MB/c

vmstat 1 (fragment)
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 1 7 0 46780 0 7750564 0 0 4880 12808 838 1491 2 3 0 95
 0 9 0 45480 0 7751004 0 0 1876 36888 1268 2506 2 5 0 93
 1 9 0 45420 0 7752056 0 0 7120 12296 705 1790 1 3 0 96
 0 7 0 43924 0 7751636 0 0 1416 36888 979 2178 3 4 0 93
 0 8 0 44148 0 7751480 0 0 900 28176 672 1444 2 3 0 95
 0 2 0 54144 0 7753008 0 0 2468 24680 649 1191 2 3 0 95
 0 11 0 46420 0 7757720 0 0 1508 72240 994 1696 2 7 2 88
 4 10 0 43348 0 7749244 0 0 5212 51212 1247 2436 6 6 0 87
 1 10 0 46256 0 7749752 0 0 1268 42504 799 1963 2 5 0 93
 0 1 0 45468 0 7757836 0 0 0 81959 1126 2249 1 9 6 84
 0 1 0 43880 0 7758912 0 0 0 71736 830 1818 1 8 31 60
 0 10 0 43280 0 7756472 0 0 0 59473 998 1879 1 5 8 85
 1 9 0 46832 0 7748176 0 0 0 81996 1114 2332 1 8 0 91
 0 10 0 46652 0 7747356 0 0 0 79920 867 1748 1 8 0 91
 0 10 0 45836 0 7747508 0 0 0 76848 1021 1947 1 8 0 91
 0 10 0 46724 0 7751964 0 0 0 52272 821 1775 1 6 0 93
 0 10 0 44388 0 7754660 0 0 0 77896 1054 2230 1 7 0 92
 0 6 0 45672 0 7755792 0 0 0 71736 1343 2886 1 7 0 91
 1 8 0 44624 0 7756444 0 0 0 77863 826 1736 0 7 0 92
 1 6 0 43132 0 7757664 0 0 0 63560 1036 1911 1 7 0 91
 0 3 0 43200 0 7757936 0 0 0 77896 721 1539 1 6 0 92
 1 11 0 46716 0 7760684 0 0 428 63544 1538 2789 12 8 0 79
 0 10 0 44808 0 7756940 0 0 6876 31248 1241 2857 4 4 0 91

The system dies. Even calling up the KDE main menu is extremely painful. About the rest, I will stay silent.

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

Ah !!

I think that could be the problem. I ran the dd test with a large file (20 GB) on my machine with 16 GB of RAM. Looking at top while it runs shows me that the available memory steadily shrinks, all of it being incrementally reserved for cache.

It actually shrinks down to 80 kB. Starting from that point, I experience lags when I type "ls". So I think this could be the problem. Is there any reason why the memory used for cache is allowed to grow out of proportion like this?

Mathieu

(In reply to comment #146)
> <stupidmetoopost />
> Now at least i know what's going on.. it seems like its somehow coupled with
> mm
> because when this happens a) i can see invocations of the oom_killer in the
> logs after reboot and b) SYSRQ + sync & unmount action do not end the furious
> HDD LED flashing so i presume the kernel is misusing swapspace..
> btw this is a very indeterminate and simply doing the same thing again will
> not
> reproduce the problem... so my vote is for uuhm race condition or spinlock
> recursion, too.
>

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Well, actually it is worse than that. If you have not tuned vm.swappiness to something much lower than the default of 60 (1 or something), the kernel will also start swapping stuff out to free memory. I don't know of a way to limit the cache memory's size.

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

There seems to be some information about how to tune this here. Trying out
parameter variations would be interesting :

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

Mathieu
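
A small sketch of the pdflush-related knobs discussed there (the values are illustrative only; full paths are given so the commands work from any directory):

echo 5   > /proc/sys/vm/dirty_background_ratio    # % of RAM dirty before background writeback starts
echo 10  > /proc/sys/vm/dirty_ratio               # % of RAM dirty before writers are throttled
echo 500 > /proc/sys/vm/dirty_writeback_centisecs # how often pdflush wakes up
grep -i dirty /proc/meminfo                       # watch the effect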

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

echo "1" > dirty_background_ratio
echo "1" > dirty_ratio
echo "3" > drop_caches

and vmstat says

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 2 355844 427256 3508 67544 10 21 315 180 459 781 5 3 80 12

then after doing a 10gig dd-operation vmstat says

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 1 0 355872 24532 8656 457200 10 21 338 497 456 763 5 3 79 13

So if I read the numbers correctly, around 400 MB of memory has now been used for caches. Hmm, that doesn't match setting dirty_background_ratio and dirty_ratio to 1: since I have 1 GB of memory, only 1% (10 MB) should be allowed to be dirty before applications are forced to wait. But this is apparently not the case here.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

In __block_write_full_page (buffer.c) nearly all submits to the block device are caused by pdflush.
At the beginning there are submits of 300 MB on a VM with 384 MB. After that the dd processes submit the data directly. As soon as there is available memory, it is filled and submitted immediately by pdflush. The 300 MB are submitted at once, or nearly at once.

On the VM there is the following scheme, caused by the double buffering (VM/Host).
At 67.506825 300MB (pdflush)
               100MB (dd processes)
At 72.750497 300MB (pdflush)
               100MB (dd processes)
At 74.215577 50MB (pdflush) // Host cache filled
...

My guess is that the dirty page count is not increased correctly by create_empty_buffers in __block_write_full_page. I currently don't know how to check it, as I have just started to read and understand the kernel code.

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

The following solution works for me. I use cgroups to limit the amount of memory dd can use. This shows that there is a problem with the kernel otherwise allowing the page cache to take _all_ the available memory.

mkdir -p /cgroups
mount -t cgroup none /cgroups -o memory
mkdir /cgroups/0
echo $$ > /cgroups/0/tasks
echo 4M > /cgroups/0/memory.limit_in_bytes
dd if=/dev/zero of=/tmp/bigfile bs=1024k count=20480

The same works with the fio "ssh" test case when run under the cgroups limitations :

  write: io=10240MiB, bw=34349KiB/s, iops=32, runt=312595msec
  read : io=2068KiB, bw=404KiB/s, iops=16, runt= 5239msec
  read : io=2048KiB, bw=598KiB/s, iops=25, runt= 3505msec
  read : io=2056KiB, bw=283KiB/s, iops=12, runt= 7437msec
  read : io=2056KiB, bw=542KiB/s, iops=21, runt= 3879msec
  read : io=2060KiB, bw=388KiB/s, iops=16, runt= 5431msec
  read : io=2052KiB, bw=591KiB/s, iops=25, runt= 3554msec
  read : io=2076KiB, bw=375KiB/s, iops=15, runt= 5658msec
  read : io=2048KiB, bw=522KiB/s, iops=19, runt= 4011msec
  read : io=2080KiB, bw=468KiB/s, iops=19, runt= 4548msec
  read : io=2068KiB, bw=406KiB/s, iops=16, runt= 5206msec
  read : io=2080KiB, bw=412KiB/s, iops=17, runt= 5161msec
  read : io=2068KiB, bw=410KiB/s, iops=18, runt= 5159msec
  read : io=2064KiB, bw=320KiB/s, iops=13, runt= 6603msec
  read : io=2064KiB, bw=356KiB/s, iops=13, runt= 5924msec
  read : io=2052KiB, bw=565KiB/s, iops=22, runt= 3716msec
  read : io=2060KiB, bw=396KiB/s, iops=18, runt= 5321msec
  read : io=2048KiB, bw=507KiB/s, iops=19, runt= 4129msec
  read : io=2048KiB, bw=302KiB/s, iops=12, runt= 6924msec
  read : io=2060KiB, bw=497KiB/s, iops=20, runt= 4243msec
  read : io=2072KiB, bw=3138KiB/s, iops=130, runt= 676msec
  read : io=2048KiB, bw=3472KiB/s, iops=130, runt= 604msec
  read : io=2060KiB, bw=4080KiB/s, iops=172, runt= 517msec
  read : io=2052KiB, bw=4227KiB/s, iops=171, runt= 497msec
  read : io=2048KiB, bw=3744KiB/s, iops=166, runt= 560msec
  read : io=2076KiB, bw=4201KiB/s, iops=169, runt= 506msec
  read : io=2052KiB, bw=3531KiB/s, iops=159, runt= 595msec

See Documentation/cgroups/memory.txt for more details.

Mathieu

Revision history for this message
In , marco.gatti (marco.gatti-linux-kernel-bugs) wrote :

How can we limit this with pre-2.6.29 kernels? I'm using 2.6.28.4 but there's no memory.limit_in_bytes, and the documentation doesn't help much here...
Should we completely remove cgroups support from the kernel until upgrading, or wait for a fix?

(In reply to comment #153)
[...]
> echo 4M > /cgroups/0/memory.limit_in_bytes
[...]

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

Is CONFIG_CGROUPS (and sub-options) enabled in your 2.6.28.x kernel ?

I cannot guarantee that memory limits will be available, but I can see the CONFIG_CGROUPS option in my old 2.6.28.x .config.

Mathieu

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Does not work for me. I succeed in keeping the memory usage from growing without bound, but I still get 98% iowait and a bad loss of responsiveness. I'm running 2.6.28.7.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Well, it is a little more nuanced. The 4M limit ended up killing my dd operation. A limit of 16M is better for me and seems way better than the default without any limits.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

CGROUPS are available in 2.6.28.3, but there is no memory limit.

(In reply to comment #156)
Søren, can you test it with clocksource=jiffies too? I still think that the reduced scheduler performance (#3) makes the problem worse. You can see the differences in comments #128 and #129 on my machine.

The number of dirty pages and writeback pages (/proc/meminfo) is always below 20% of memory on my systems, even under heavy I/O. But there is a lot of "traffic" caused by pdflush when the dirty page count reaches the limit. All dirty pages are passed to the blk/elevator nearly at once. Sorting the rb-tree, or perhaps taking locks, then takes more time for every request, as there are a lot of requests.

On ext3 it takes up to 1 second, and 0.3 ms on average, to insert a new request, and there are up to 7000 requests submitted on my notebook (see comments #128 and #129). I think this is one reason for the high I/O wait.

The high memory usage is also caused by pdflush, which is called via generic_perform_write (filemap.c) -> balance_dirty_pages_ratelimited. clear_page_dirty_for_io is called directly before the page is submitted to the blk/elevator in write_cache_pages. As a result the page buffers are still in the elevator queue while global_page_state(NR_FILE_DIRTY) has too small a value.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

It does not matter whether I use jiffies in these cases where memory is limited:

memory.limit_in_bytes = 4M
Responsiveness : Very good
Disk speed : 40% of disk capability
iowait : Generally around 50%

Responsiveness : Good
Disk speed : 50% of disk capability
iowait : Generally around 50%

Interestingly, I can't get the disk speed above 50% of the disk capability reported by hdparm, not even with oflag=direct.

Earlier I reported that jiffies performed better, but that was without memory limitations.

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

Created attachment 20172
mm fix page writeback accounting to fix oom condition under heavy I/O

Makes sure the page cache accounting behaves correctly with I/O elevator, thus fixing OOM condition.

Does not seem to fix the latency problem though. See changelog.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Hi Søren,

It's possible that the memory limits do not help with the problem; as you say, the HD will run below its capacity because of lack of data (due to the memory limits). So it will trigger the problem later, or not trigger it at all.

But it's good to have a way to limit the problem anyway.

So I have a question. What is the right way to behave? I mean, under a heavy IO load, what is the right behavior for a sane kernel?

I propose some cases:

A) We have 2 processes, one that performs high-load IO operations (to the HD in this case) and one that only does so occasionally.

   1.- Process 1 (high IO) starts to do IO ops. So it will switch between being blocked by IO ops and being active as it reads and sends data to the controller.
   2.- Process 2 tries to access the disk, so it has to wait for a chance to read.

In this case the IO wait of Process 1 should be almost 0, as it only waits microseconds for the last IO op to finish. But Process 2 should have high IO waits because Process 1 takes all the IO bandwidth.

B) Same case but with a round-robin style queue. CFQ?

IO wait should be nearly 0 for Process 2, as it gets a chance to write to disk, but Process 1 must wait for each operation to finish...

Which is the correct way? Is there any other?

What is clear is that it is not normal for one process to block all the other processes because it is waiting to write. Only when every process wants to write should the IO wait rise, as all processes are waiting for a chance. In that case... should we only see IO wait times? Is this our case?

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 20176
Screenshot of current status of the bug while letting a program hang the system

Here you can see that IO wait is 72.2%, with Xorg going crazy on CPU usage, while the rest of the system is completely unusable.

That was just because Transmission was verifying my torrents. So again, it's not acceptable that the system is rendered unusable because a background operation is in progress...

How can I help more?

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Could the concept from CPU scheduling be reused? Instead of talking about time-slices we could talk about IO-slices, and favour the processes that use the fewest IO-slices; this would prevent an evil dd from starving other light readers/writers. I'm not kernel-skilled at all, so maybe this sounds a lot like your RR queue, but these are just some thoughts.

Revision history for this message
In , alevkovich (alevkovich-linux-kernel-bugs) wrote :

Maybe someone can explain to me why a simple copy eats ~50% CPU? Maybe it is part of this problem? The same copy in Windows eats 5-10% CPU. UDMA 100 is enabled on my PATA controller. I have JFS partitions.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

With the last patch, the problem is permanent on my notebook on both ext4 and ext3 partitions. The I/O wait time is at 100% under heavy I/O. Mouse clicks are frequently not recognised, or keyboard input is delayed by up to 10 seconds (all under Xorg).

I got a deadlock with the patch on kernel 2.6.28.2, but only once. The I/O wait time was at 100%, but there was no disk I/O any more. I could not start any programs or save any data, but I was able to use the running programs. I am not sure whether this is a problem caused by the patch or whether it is our problem. I also got a complete freeze with clocksource=jiffies on an unpatched kernel under heavy I/O and heavy CPU usage.

I have checked some timings in the block and elevator functions
(__make_request, get_request, get_request_wait, blk_complete_request, cfq_service_tree_add and cfq_add_rq_rb).
All the timings were below 5µs, at some points climbing to 80µs, which looks fine to me. In get_request_wait the writing dd processes wait up to one second for a new free request; it was only the dd processes or sometimes the pdflush process. That should be OK.

Can prepare_to_wait_exclusive(&rl->wait[rw], &wait, TASK_UNINTERRUPTIBLE) in get_request_wait (blk-core.c) cause such a problem?

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

The patch from #160 to stop the kernel from just taking all available memory almost works for me. Thanks, Mathieu. I don't get crazy swapout as I used to, but the cache still occupies 400 MB of my 1 GB of memory, which is also wrong.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Hmm... I assume that the cache is both read and write cache. In that case everything is all right. I can confirm the almost 100% iowait.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I have limited the number of requests from a single process to 200 per second by adding an msleep_interruptible(5) just before spin_lock_irq(q->queue_lock) in __make_request whenever that process is using the queue intensively. The requests are counted in a ring buffer covering four seconds, updated every 100ms. The throughput of the two dd processes is really bad at 3MB/s (as expected). Processes with a priority higher than 0, kjournald(2), or requests where (bio_data_dir(bio) == READ || bio_sync(bio)) is true, are passed through without delay.
The wait time is at 100% of one core at the beginning and 100% of both cores after ~5-10s. Only the two dd processes and pdflush are delayed.
The problem is permanent. I cannot change between the windows of two consoles or switch desktops; there are always long delays. It is exactly the freezing known from heavy I/O, with the difference that the mouse cursor still moves. I am not able to use gedit to write text, as every 5-15 seconds the keys are recognised with a delay of at least 5 seconds, even when the dd processes are killed and there is only a maximum write speed of 3MB/s (pdflush and perhaps kjournald) in the background (0% I/O wait time). Gimp starts in 10 seconds without preloading. The cache usage is at less than 20% of memory (~800MB).

I am using kernel 2.6.28.2 with the patch from Mathieu (thanks a lot; I think it stops the mouse cursor from freezing) plus my delay in __make_request.
Removing only the delay restores the previous state.

I think this is the main problem, as I can simulate it! The high I/O wait is caused by the sleeping threads. In __make_request only 100-200 out of 7000 requests during heavy I/O end up calling get_request_wait, and only about 10 requests enter the while loop in get_request_wait and really wait more than 20ms, up to 1 second on my machine (prepare_to_wait_exclusive(&rl->wait[rw], &wait, TASK_UNINTERRUPTIBLE);...; io_schedule(); in get_request_wait).
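(As an aside, the request pool that get_request_wait blocks on is sized per queue and can be inspected or resized from userspace, which may help when experimenting with the behaviour described above; the default of 128 is an assumption about typical kernels of this era:)

cat /sys/block/sda/queue/nr_requests            # per-direction request pool, usually 128 by default
echo 512 > /sys/block/sda/queue/nr_requests     # larger pool: writers block later, at the cost of a deeper queue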

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I have just replaced prepare_to_wait_exclusive(&rl->wait[rw], &wait, TASK_UNINTERRUPTIBLE); and io_schedule(); in get_request_wait with msleep_interruptible(500). The throughput of the two dd processes is at 57MB/s (27/30). The desktop freezes for up to 100 seconds.

Revision history for this message
In , mathias.buren (mathias.buren-linux-kernel-bugs) wrote :

Is there any way I can help debugging this?

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #138)
> Maybe it's not only the preemption and the frequency. I think one of these
> things could be:
>
> General setup:
> - Control Group support DISABLED
> - Group CPU Scheduler DISABLED
> - Enable full-sized data structures for core ENABLED
> - Enable futex support ENABLED
> - Use full shmem filesystem ENABLED
> - Enable AIO support ENABLED
> - SLAB Allocator: SLUB
>
> Processor type and features (ENABLED):
> - Tickless System (NO_HZ)
> - High Resolution Timer Support
> - HPET Timer Support
> - Multi-core scheduler support
> - Preemptible RCU
> - 64 bit Memory and IO resources
> - Add LRU list to track non-evictable pages
>
> Good luck...
>

Many of these seem to be 32-bit settings. The funny thing is that if I boot into 32-bit x86, I don't see any of the slowdowns, or they are so small that effectively I don't feel them. It's only x86-64 that freezes on me during I/O.

Revision history for this message
In , bart (bart-linux-kernel-bugs) wrote :

Must admit all machines I have noticed this on are x86_64.

Revision history for this message
In , michiel (michiel-linux-kernel-bugs) wrote :

The systems I have noticed it on are also x86_64.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I have noticed this bug on a Pentium-M (32-Bit only) processor.

Revision history for this message
In , simon+kernelbugzilla (simon+kernelbugzilla-linux-kernel-bugs) wrote :

I have seen this bug on an Opteron 250 system with a 32-bit OS (CentOS 4.4 through CentOS 5) installed.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Mine is

gad@ws-esp16:~$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
stepping : 10
cpu MHz : 800.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm ida
bogomips : 4388.98
clflush size : 64
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
stepping : 10
cpu MHz : 800.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm ida
bogomips : 4389.07
clflush size : 64
power management:

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

My cpu model is :AMD Turion(tm) 64 X2 Mobile Technology TL-50
The kernel is compiled for i686, and I see large slowdowns.

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

I see this on my Intel T8100 notebook on both kernel-2.6.29-0.33.rc5.fc10.x86_64 and kernel-2.6.27.15-170.2.24.fc10.x86_64 (default Fedora config options). Just the simple dd /dev/zero test can provoke it; the desktop feels less responsive. latencytop shows things like Evolution waiting almost 10 seconds for an fsync to complete.

Hardware has an ICH8 chipset, DMA etc. seems configured properly.

vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU T8100 @ 2.10GHz
stepping : 6
cpu MHz : 800.000
cache size : 3072 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

In certain cases (2.6.28.5), with the patch "mm fix page writeback accounting to fix oom condition under heavy I/O" applied, the system gets out of control: iowait rises to ~100% and a complete system stall is observed. I cannot provide any data, as only the reset button on the box still works. Probably there is a set of influencing factors that demands a more detailed investigation.

Revision history for this message
In , mathieu.desnoyers (mathieu.desnoyers-linux-kernel-bugs) wrote :

(In reply to comment #179)
> In certain cases (2.6.28.5), with the patch "mm fix page writeback accounting
> to fix oom condition under heavy I/O" applied, the system gets out of control:
> iowait rises to ~100% and a complete system stall is observed. I cannot provide
> any data, as only the reset button on the box still works. Probably there is a
> set of influencing factors that demands a more detailed investigation.
>

My patch "mm fix page writeback accounting to fix oom condition under heavy I/O" is probably not the right solution, but rather a step in the right direction. It pinpoints that the elevator fails to increment counters that are tested by the code which decides whether the memory pressure from dirty and writeback pages is high enough to make the process fall into "sync write" mode.

Therefore, I think a cleaner solution to this particular problem could be to create a new page type counter (like dirty pages, write buffers, ...) to let the vm know how many pages are held by the elevator. The fs/buffer.c code should then check this value too when deciding whether the pressure on memory is high enough to make the process do a "sync write". However, this problem is harder than it appears, because the buffer.c code would probably put such a process into sync write mode independently of the elevator, and I really wonder what the interaction of such a solution with CFQ would be. I am not sure the CFQ I/O scheduler would behave correctly in that situation, but Jens can speak to that better than I can.

Hope this helps,

Mathieu

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #179)
> I cannot provide any data, as only the reset button on the box still works.
> Probably there is a set of influencing factors that demands a more detailed
> investigation.

I have noticed this issue with an unpatched kernel too. The "mm fix page writeback accounting to fix oom condition under heavy I/O" patch makes the problem reproducible. Sometimes the I/O wait time is at 100%, sometimes there is no I/O wait time at all; there is no problem with read access, but no write access gets executed. I can reproduce the problem with xfs. With ext4 the problem does not appear very often on either the patched or the unpatched kernel.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

(In reply to comment #181)

Then I will add a clarification: I use only xfs. Probably the patch interacts badly with it, and probably it works well with other file systems. I am sorry, I simply have no other way to check this.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

I have consistently had this problem with every kernel above 2.6.17 that I have tried, so I have stuck with 2.6.17 up until now.

There are some supposed resolutions to the problem at http://linux-ata.org/faq.html, but none of them work for me, and I don't have the mentioned BIOS setting in my BIOS.

lspci reports...
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA IDE Controller (rev 01)

I do not appear to have the problem on my Macbook 2,1, although the disk performance is about 21M/s, which is lousy. But what I'm seeing on my other machine is 1M-3M/s.

I also tried passing "pci=routeirq" and "acpi=off" (grasping at straws), but that did not change anything. I did however notice that my HD is /dev/sda in 2.6.17, and /dev/hda in 2.6.25 and 2.6.27.

On 2.6.17, dmesg tells me...
ata_piix 0000:00:1f.2: version 1.05
ata_piix 0000:00:1f.2: MAP [ P0 P2 IDE IDE ]
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 17 (level, low) -> IRQ 18
ata: 0x170 IDE port busy
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0x1F0 ctl 0x3F6 bmdma 0xBFA0 irq 14
ata1: dev 0 cfg 49:2f00 82:346b 83:7d09 84:6123 85:3469 86:bc09 87:6123 88:207f
ata1: dev 0 ATA-8, max UDMA/133, 625142448 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi2 : ata_piix
  Vendor: ATA Model: ST9320421ASG Rev: SD13
  Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3
sd 2:0:0:0: Attached scsi disk sda
sd 2:0:0:0: Attached scsi generic sg0 type 0

But on 2.6.27, I get nothing of the sort - nothing about SATA or anything. I did notice that with 2.6.27, libata was enabled, while with 2.6.17 it didn't even appear to be an option. Ever since libata, nothing seems to work right, and my computer is relatively new. I have a Dell D820 Core 2 Duo.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

I noticed the same - would it be possible to revert the libata integration?

Revision history for this message
In , akpm (akpm-linux-kernel-bugs) wrote :

Trenton, it's unclear to me what you're describing here.

> I have consistently had this problem

which problem?

Anyway, it sounds like what you're reporting is a straightforward
regression in ATA throughput?

If so, please raise a separate, new bug report against SATA for that,
thanks.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Oops, mid-air collision. I'll answer Andrew's question first.

I'm having two problems.
1. on my Dell D820 I see degraded throughput AND high io wait times as everyone else here has described
2. on my Macbook, I do not see degraded performance, but I see the extremely high io wait times.

Both of these systems have the IDENTICAL IDE chipsets. Read on with my original reply, before collision, for more information.

Quick question: is anyone else who has this problem also using the Intel 82801GBM/GHM IDE chipset???

I have a Dell D820 (64-bit) notebook, and a Macbook from late 2007 (the 64-bit ones). I noticed that they both have Intel 82801GBM/GHM IDE chipsets, and they both exhibit the problem. If I run 32-bit Gentoo Linux on the D820 with one of these bad kernels, my hard drive (which was renamed to hda) gets about 3M/sec, and the high wait times are also present.

With the Macbook, the high I/O wait times are there, but I get good throughput with 32-bit Gentoo. I am not sure what the difference is between the D820 and the Macbook, seeing that they have very similar (almost identical) hardware. I suppose it is possible that Apple made the BIOS change that the linux-ata page suggests.

This truly is debilitating. I have now tried two distributions with the latest 2.6.x kernels (Gentoo and OpenSUSE 11.1), and all of them exhibit these symptoms on my hardware. I am almost certain that if this does not get fixed, I will be unable to continue using Linux at work, unless I get a new computer (slim chance but possible). After all, eventually, Gentoo will move towards some new features that require a newer kernel, and I will be left in the dust. I will then be forced to run Linux in vmware under Windows. Please, someone save me from this awful DEATH. muhahahaha.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

(In reply to comment #172)
> Must admit all machines I have noticed this on are x86_64.
>

I am seeing both x86_64 and i686 machines exhibit this. Before my Dell D820 died on me, it was a Core Duo 32-bit machine. Then it got replaced with a newer D820, which is a Core 2 Duo 64-bit machine. This issue happened on both of those. And, as mentioned in my last comment, it also happens on my Core 2 Duo Macbook.

Revision history for this message
In , ozan (ozan-linux-kernel-bugs) wrote :

I once had a similar *traumatic* throughput regression with an Intel processor + p4_clockmod. So the issues may have completely different causes.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

(In reply to comment #134)
> Hi.
>
> On my laptop(Core2Duo 1.6 ghz) I run my gentoo kernel 2.6.28-gentoo.
> I didn't have any problems with latency.
>
> If I run "dd if=/dev/zero of=file bs=1M count=2048" or "dd if=/dev/zero
> of=/tmp/test bs=1M count=1M" (I tried to run it as user and also as root), my
> system works well and I can start firefox, another shell, open dolphin (i'm
> under kde4-svn) and everything is faster.
>
> I have XFS filesystem on my home and reiserfs on root.
>
> Since I configured my kernel manually, maybe it could be usefull for someone
> to
> have my .config so I'll post it.
>

I have just unmasked, and tried 2.6.28 on Gentoo Linux as well, and the problem appears to be gone. This is on my D820, which is the one with really bad throughput as well. As I am in the process of converting to 64bit on my D820, I am unable to try GUI stuff out. But, before, during heavy load, I was unable to switch between terminals very well either. Now, the system is EXTREMELY responsive, during these heavy load times, which is what I expect. And, I'm getting 82M/sec once the caching limit has been reached, and 256M/sec with caching. This is equivalent to what I was getting with 2.6.17.

Now, I don't know if the gentoo guys applied someone's patch from here, as comment #52 mentioned patching 2.6.28, but it's working for me now. I'm VERY happy about that. :D Based on his description, it very much sounds like the Gentoo guys must have applied the patch. I was doing a while loop, with dd, increasing the amount of data by 1M at a time. The first few, up to about 60M, were getting 256M/sec. Then, I noticed in my other terminal, running vmstat, the iowait times got pinned to nearly 100%. So, I'm thinking that all those dd's that got cached, were finally catching up to the NO LIMIT on cached items, and causing thrashing in the IO system. That caused a COMPLETE freezeup of the while loop. Also, during this time, my HD light was going crazy. Then, when the io wait times dropped to 0 again (cached items flushed), the loop did a few more iterations (and my HD light was off), and it started all over again. Then, again the loop froze, etc, etc, etc.
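(The exact loop was not posted; a rough sketch of what the growing dd test described above might have looked like, with an arbitrary file name and a 1 MB step size:)

i=1
while true; do
    # write i megabytes and print only the throughput summary line
    dd if=/dev/zero of=/tmp/grow-test bs=1M count=$i 2>&1 | tail -n1
    i=$((i + 1))
done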

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Also, I feel kind of stupid because I should have reported this back in 2007 when I saw it. But, I figured someone else would find it before too long, so I just hung back with my kernel version. SORRY!!! :(

I guess I shouldn't do that next time. Especially considering it is way easier to find bugs when a new release just came out, and there is a new bug due to the changes in that release.

Revision history for this message
In , michiel (michiel-linux-kernel-bugs) wrote :

<email address hidden> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=12309
>
> ------- Comment #189 from <email address hidden> 2009-03-01 01:26 -------
> [comment #189, quoted in full in the original mail; see the previous comment]
>
OK, so if that Gentoo version is working for you, can we compare it with the vanilla kernel?

Can you send us some system info to compare your kernel config with the
vanilla one?

Can we have a tarball with the following structure? (to make it easy to
diff over it)
--------------------------------------------------
systeminfo.txt
vanilla
    \- config (original config of the vanilla kernel, not yours)
    |- kernel-info.txt
    |- dmesg.txt
    |- lsmod-output.txt
    |- test-report.txt
gentoo-youredition
    \- config (the config file of your kernel version)
    |- dmesg.txt
    |- lsmod-output.txt
    |- test-report.txt
    |- gentoo.patch
--------------------------------------------------

If...


Revision history for this message
In , wprins (wprins-linux-kernel-bugs) wrote :

(In reply to comment #16)
> I tried elevator=as on my system, and it did not change the behaviour.
> Copying
> files from external USB to internal encrypted SSD still totally smashes
> interactive performance. So this issue might be unrelated.
>

Note that some SSDs have very poor random-write performance; this can cause stuttering and all sorts of side effects. Anandtech investigated this issue when comparing/reviewing Intel's SSDs vs. parts from OCZ that use a certain JMicron controller. See here: http://www.anandtech.com/showdoc.aspx?i=3403&p=7
You should probably just read the entire review.

It is therefore possible that your issue has more to do with the behaviour of your SSD during writes than with the kernel scheduler or anything else.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Working on it now Michiel. I'll try and get that info for 2.6.27, 2.6.28, and vanilla 2.6.28.

ttyl

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Hmmm, apparently I forgot to try vmstat. The high io wait times are still there, but I haven't been noticing it. I wonder what could have caused me to not notice it now. The performance is way better, even with the high io wait though. I'm not seeing 30 second delays on stuff. Every now and then there's a second or two delay, perhaps five tops. I'll get the info anyhow, and see what the differences are. FYI: This is still on my D820.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

(In reply to comment #192)
> (In reply to comment #16)
> > I tried elevator=as on my system, and it did not change the behaviour.
> Copying
> > files from external USB to internal encrypted SSD still totally smashes
> > interactive performance. So this issue might be unrelated.
> >
>
> Note, some SSD's have very poor random-write performance, this can cause
> stuttering and all sorts of side effects. Anandtech investigated this issue
> when comparing/reviewing Intel's SSD's vs. parts from OCZ which uses a
> certain
> JMicron controller. See here:
> http://www.anandtech.com/showdoc.aspx?i=3403&p=7
> You should probably just read the entire review.
>
> It is therefore possible that your issue has more to do with the behaviour of
> your SSD during writes than the kernel scheduler or anything else.
>

Well, if that is true, it would have to be a combination of the kernel and my system. Mainly because my system was SUPER fast before I tried upgrading my kernel past 2.6.17. As for my Mac, I don't recall having performance issues while running Mac OS X. Nothing like the article describes anyhow.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #189) - #195
> Well, if that is true, it would have to be a combination of the kernel and my
> system. Mainly because my system was SUPER fast before I tried upgrading my
> kernel past 2.6.17. As for my Mac, I don't recall having performance issues
> while running Mac OS X. Nothing like the article describes anyhow.

There is another bug in 2.6.17/18-??, which gives poor disk performance when the SATA controller on an ICH8M (or similar?) platform runs in compatibility mode; it also produces high I/O wait times and lets this bug appear.

There are dependencies between CPU power, disk throughput, task-switching time (e.g. clocksource) and this bug.

Has someone tried to identify the source of the problem, with the info provided in Comment #168 and Comment #169 ?

There is a comment in the code (blk-core.c @ ~1300)
 /*
  * After dropping the lock and possibly sleeping here, our request
  * may now be mergeable after it had proven unmergeable (above).
  * We don't worry about that case for efficiency. It won't happen
  * often, and the elevators are able to handle it.
  */
But it happens up to 20 times every second during heavy I/O, causing high I/O wait times for the writing process (or pdflush) and making desktop responsiveness poor. My proof is the really poor desktop responsiveness when prepare_to_wait_exclusive is replaced by msleep_interruptible (see Comment #169). I will be able to spend some more time on this bug in April.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Created attachment 20405
info request by Michiel in comment 191

Here's the info you wanted Michiel.

Doing a diff on the config of the bad kernel and the new one reveals this interesting tidbit...

diff -u 2.6.27-gentoo-r8-kernel-config.txt 2.6.28-gentoo-r2-kernel-config.txt
-CONFIG_BLK_DEV_IDEDISK=y
-CONFIG_IDEDISK_MULTI_MODE=y
+CONFIG_IDE_GD=y
+CONFIG_IDE_GD_ATA=y

That must have been what switched me back to using sda. Anyhow, that was obviously a separate issue.

So, my system performance, and io wait times are totally fine during normal system operation. When I do REALLY heavy io, the wait times go up, but the responsiveness is still relatively good. I can start kwrite in about 2-3 seconds. It seems like it is fixed to me. But, I'll still try that patched 2.6.28 and get back to you, to see if it is even better.

Perhaps Andrew Morton was right. Maybe my issue was entirely to do with my SATA issues.

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

(In reply to comment #196)
> There is another bug in 2.6.17/18-??, which gives a poor disc performance,
> while running the SATA controller on a ICH8M (or equal?) platform in
> compatibility mode, which gives a high i/o wait time too and lets this bug
> appear.
>
> There are dependencies between cpu-power, disc throughput, task switching
> time
> (eg. clocksource) and this bug.

This is interesting, since my notebook has an ICH8M stuck in compatibility mode (no BIOS option). I'll see how it compares to my other notebook with an ATI-IXP chipset.

Revision history for this message
In , heine.andersen (heine.andersen-linux-kernel-bugs) wrote :

Has anyone seen this on a non-SATA drive?

If I do a "dd if=/dev/zero of=outfile bs=1M count=50000" on 2.6.28, the load rises to around 8; on 2.6.29-rc5 it never gets past 4.

I'm testing on 64-bit, ICH9 + SATA, btw. I also tried installing CentOS 4.7, with a 2.6.9+ kernel, and it's just as bad as 2.6.28.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I have just tested 2.6.29-rc6. Desktop responsiveness has improved enormously; Firefox in particular is now usable. The problem still exists for me, but it is not as noticeable as before.

Revision history for this message
In , wprins (wprins-linux-kernel-bugs) wrote :

(In reply to comment #195)
> (In reply to comment #192)
> > It is therefore possible that your issue has more to do with the behaviour
> of
> > your SSD during writes than the kernel scheduler or anything else.
> >
>
> Well, if that is true, it would have to be a combination of the kernel and my
> system. Mainly because my system was SUPER fast before I tried upgrading my
> kernel past 2.6.17. As for my Mac, I don't recall having performance issues
> while running Mac OS X. Nothing like the article describes anyhow.

OK, well in that case I absolutely agree that it's obviously a software-only problem in your case, and probably this scheduler/kernel issue. (I just wanted to point out for the record, so everyone's aware, that there are some SSD hardware combinations with inherent limitations that may very well cause similar sluggishness regardless of the kernel/software itself.)

As an aside, high I/O wait percentages are, as far as I understand it, not in and of themselves problematic, since high I/O wait only means that a process is waiting for I/O. This measure will therefore predictably be high when a process is doing substantial I/O against a comparatively slow device. Normally, however, one would expect such I/O not to negatively affect other processes or general system responsiveness, *except* when the other processes are also somehow I/O hungry and there is some sort of I/O resource contention going on, or, as appears in this thread, there is actually a scheduling problem which causes runnable processes not to receive the CPU when they should, resulting in perceived sluggishness.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I must correct my last post (Comment #200). I was working with VMs the whole day and it is still as awful as before.

But there is a big improvement when using Firefox.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

I would agree that -rc6 has for some reason greatly improved system responsiveness under I/O load but there are most certainly still great issues in the block I/O world.

Just now I once again managed to completely wedge my machine by doing nothing more than copying a few gigabytes of files between drives. Furthermore, Firefox still freezes for several seconds when I first start typing in the location bar, as it looks in its history database. Lastly, Evolution still takes several minutes to start and become usable, while its I/O rate is less than 1 MB/s. All in all, things are pretty unusable.

Jens, are you around? I've been asking various distributions and vendors whether they could spare some qualified man-hours to get this problem finally worked out but it seems like you're our best hope. I know you'll be getting at least one case of beer when this is fixed ;)

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Hi Guys,

My brother has apparently been having the same problem on his computer; I hadn't realized it when I submitted my bug. He has an ICH8-family chipset.

The following works for him, and the problem goes away.
echo anticipatory > /sys/block/sda/queue/scheduler

Looks like this may be a tough one to nail down, because everyone's symptoms are slightly different. I'm wondering if perhaps there are multiple issues going on here.
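(For anyone who wants to try the same workaround, a small usage sketch: the scheduler can be inspected and switched per block device at runtime, and whether "anticipatory" is listed depends on what was compiled into the kernel:)

cat /sys/block/sda/queue/scheduler              # active scheduler shown in brackets, e.g. noop anticipatory deadline [cfq]
echo anticipatory > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler              # should now show [anticipatory]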

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Oh, crap, I forgot the details. Before the details, I also wanted to say that I am going to get him to try changing the BIOS option mentioned on the libata page I gave earlier, to see what happens.

[03:05 root@zipper ~]# lspci
00:00.0 Host bridge: Intel Corporation 82P965/G965 Memory Controller Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82G965 Integrated Graphics Controller (rev 02)
00:03.0 Communication controller: Intel Corporation 82P965/G965 HECI Controller (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Contoller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 02)
00:1c.2 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 3 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 4 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HB/HR (ICH8/R) LPC Interface Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801H (ICH8 Family) 4 port SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
00:1f.5 IDE interface: Intel Corporation 82801H (ICH8 Family) 2 port SATA IDE Controller (rev 02)
02:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6101 single-port PATA133 interface (rev b1)
06:00.0 RAID bus controller: Silicon Image, Inc. SiI 3112 [SATALink/SATARaid] Serial ATA Controller (rev 02)
06:01.0 Mass storage controller: Promise Technology, Inc. 20269 (rev 02)
06:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)

[03:05 root@zipper ~]# uname -a
Linux zipper 2.6.18-53.el5xen #1 SMP Mon Nov 12 02:46:57 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

[03:09 root@zipper ~]# cat /etc/issue
CentOS release 5 (Final)
Kernel \r on an \m

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I have noticed that while working with VMs my system starts swapping after a while. I tried -rc7 with Mathieu's patch (Comment #160) and my system seems to be usable. There is still the unfair I/O scheduling between processes, but that's another problem. I am using a kernel without "Group CPU Scheduler" and "Control Group Support", and I am writing this text in Firefox at a load average of 12.

To reach such high load avg, I have to run eight concurrent dd write operations.

for i in 1 2 3 4 5 6 7 8; do \
  dd if=/dev/zero of=test-$i bs=1M count=4K oflag=direct & echo test-$i; \
done

Copying big files with Nautilus makes my system unusable from time to time, with the known symptoms such as being unable to switch desktops and mouse freezes.

And finally, I have not seen the complete I/O freeze with the -rc7 kernel on xfs, ext3 or ext4.

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Trenton, I too set my kernel to the anticipatory scheduler, and for a while I thought all was well when I ran dd if=/dev/zero of=~/test bs=1M count=1500 as a test. Then I realized that it's not a reliable testing method, since the *anticipatory* scheduler can anticipate the coming zeroes that will be written. I ran a second dd if=/dev/zero of=~/test bs=1M count=1500 simultaneously with the one already writing from /dev/zero, and realized that part of the syndrome is fixed with AS, but the problem persists...

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Argh, I forgot to give details too...
I am running the 2.6.28-8-generic kernel (64-bit) on Ubuntu Jaunty, and I had this problem with 32-bit kernels before as well.

khaal@Xeraphim:~$ sudo lspci
[sudo] password for khaal:
00:00.0 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
00:00.1 RAM memory: nVidia Corporation C51 Memory Controller 0 (rev a2)
00:00.2 RAM memory: nVidia Corporation C51 Memory Controller 1 (rev a2)
00:00.3 RAM memory: nVidia Corporation C51 Memory Controller 5 (rev a2)
00:00.4 RAM memory: nVidia Corporation C51 Memory Controller 4 (rev a2)
00:00.5 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
00:00.6 RAM memory: nVidia Corporation C51 Memory Controller 3 (rev a2)
00:00.7 RAM memory: nVidia Corporation C51 Memory Controller 2 (rev a2)
00:02.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1)
00:04.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1)
00:09.0 RAM memory: nVidia Corporation MCP51 Host Bridge (rev a2)
00:0a.0 ISA bridge: nVidia Corporation MCP51 LPC Bridge (rev a3)
00:0a.1 SMBus: nVidia Corporation MCP51 SMBus (rev a3)
00:0a.2 RAM memory: nVidia Corporation MCP51 Memory Controller 0 (rev a3)
00:0b.0 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3)
00:0b.1 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3)
00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:10.0 PCI bridge: nVidia Corporation MCP51 PCI Bridge (rev a2)
00:10.1 Audio device: nVidia Corporation MCP51 High Definition Audio (rev a2)
00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
02:00.0 VGA compatible controller: nVidia Corporation G80 [GeForce 8800 GTS] (rev a2)
03:05.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 70)
03:06.0 Multimedia controller: Philips Semiconductors SAA7131/SAA7133/SAA7135 Video Broadcast Decoder (rev d1)
03:07.0 Multimedia audio controller: Creative Labs SB X-Fi
03:09.0 Ethernet controller: Atheros Communications Inc. AR5413 802.11abg NIC (rev 01)

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

I had done some initial testing on my x86_64 box, of 2.6.17 vanilla (downloaded from kernel.org), and it seems to me that it has the problem too. I don't understand why my problem started with 2.6.18 if the vanilla 2.6.17 has the problem. Note that I tested the first 2.6.17, and the last version of 2.6.17. I'm thoroughly confused. I think I'll switch to 2.6.17, and run that for awhile to see if there's better performance overall. Perhaps loading it is not the best way to see if there's latency issues, as there will be some.

Then, if I do see some improvement, I'll increment to 2.6.18. Hopefully, slowly but surely, I can figure out exactly which kernel has the problem, and then a kernel dev can fix it. That's the plan anyhow. :P

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Hi, I tried the new 2.6.28.7 kernel, and things seem to have got worse... Even BitTorrent checking downloaded files is able to lock up the computer...

I will upload a new screenshot showing 91.4% of processor time spent waiting for the HD to read data... This is nonsense... I will try to do the same check for every new kernel that comes out, to look for improvements.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 20464
IWait problem 91,4% 2.6.28.7

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

I wanted to add even more testing results from my side. I tried the suggestions from this source: http://stackoverflow.com/questions/392198/how-to-make-linux-gui-usable-when-lots-of-disk-activity-is-happening
by changing some vm.dirty_ variables; no improvement could be seen. Changing to the deadline scheduler didn't improve the situation either. I also changed /sys/block/sda/queue/nr_requests to 64, with the same unresponsiveness.

I'm still on the same kernel (2.6.28-8), and my fstab mounts the partition with the relatime,noatime,nodiratime flags.
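(For reference, the vm.dirty_ variables mentioned here are the writeback thresholds; a sketch of lowering them, with values that are examples rather than recommendations:)

sysctl vm.dirty_ratio vm.dirty_background_ratio    # show current thresholds (percent of RAM)
sysctl -w vm.dirty_ratio=10                        # writers are throttled once 10% of RAM is dirty
sysctl -w vm.dirty_background_ratio=5              # background writeback (pdflush) starts at 5%
sysctl -w vm.dirty_expire_centisecs=1500           # write out dirty data older than 15 seconds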

Revision history for this message
In , michiel (michiel-linux-kernel-bugs) wrote :

I am currently installing 2.6.29-rc7, hoping that it will solve some of the issues in this bug.

Could changing the SLAB allocator be an option to test for the problem? We can choose between SLAB/SLUB/SLOB. Maybe that could be helpful.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

There are still some confusing comments on I/O wait in here, so let's clear that up at least. 91% iowait does not mean the system is using 91% of its CPU power for doing the I/O; it merely means that some process is BLOCKED waiting for I/O 91% of the time. It says nothing about CPU cycles consumed. The same goes for the observed load: having a load of 2.0 due to I/O wait does not mean that you have a doubly loaded system, it just means that, on average, two processes are blocked waiting for I/O. When you start a bittorrent client and it checks the file data, you would expect iowait to be nearly 100%. It does do some CPU processing, so that's why it's not completely at 100%.

So forget IO wait, it doesn't tell you ANYTHING about whether a system is supposed to be slow or not.
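(A simple way to watch what is being described here is vmstat, which reports blocked processes and CPU consumption side by side:)

vmstat 1
# the "b" column counts processes blocked in uninterruptible sleep (usually I/O),
# "wa" is the iowait percentage, and "us"/"sy" show the CPU time actually consumed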

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

And to make a more general comment... This bug is impossible to solve, since it (once again) has degraded into somewhere for everybody to tunnel everything that relates to a system feeling sluggish. There could be at least 10 separate issues described in here, or more. And while some of these are surely things we could do better, some are also certainly expected behaviour. We are at least touching several file systems, mm issues, and io scheduler issues. I'm quite sure that some of the mentioned behaviour is completely due to ext3 sucking at fsync.

I'd LOVE to be able to look into this, but honestly I have no idea where to start. What I would also love is for someone to post a test case that actually works. This includes observed behaviour and a description of what you would EXPECT to see happen. Then we/I should be able to at least judge whether there's something we can do about it. Expecting a fully fluid system while having 100 threads writing data to the device is not reasonable, for instance. But if it behaves significantly worse than previous kernels, then there's still something to look into.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

I totally agree with you, Jens. I have been having a hard time localizing the problem myself. I went back to the 2.6.17 kernel, and it seems to be worse than my 2.6.28 kernel. But keep in mind that I was running i686 when I originally discovered the problem, and now I'm running x86_64. I think the only way I will be able to localize the issue is to restore my system to i686 Gentoo and then try 2.6.28; then I may start getting somewhere.

I also agree that it is nearly impossible to solve this one without some more concrete data. I wish I had chosen a different time to upgrade to 64bit, because then I could be fiddling with this issue on my i686 still.

I'll post again if I find something more concrete.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

I will admit that many of my issues seem to be caused by fsync() (I'm on ext4). One of the largest issues I'm currently having is Liferea blocking in fsync() for several seconds every time a new item is selected. During this time kjournald2 is writing, although iotop only shows a total write rate of ~500kB/s. This seems extremely slow and far below the disk's (a 7200 RPM SATA drive) capacity. This low I/O rate is common for all sluggish I/O cases. Does this sound like expected behavior? Perhaps my problems have been caused by just generally slow I/O?
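(One crude way to put a number on fsync latency in isolation is to time a tiny synchronous write while the heavy I/O runs in the background; a sketch assuming GNU dd, with an arbitrary probe file name:)

time dd if=/dev/zero of=fsync-probe bs=4k count=1 conv=fsync   # conv=fsync forces an fsync before dd exits
rm -f fsync-probe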

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

The last kernel without the problem was 2.6.16 (confirmation of that: SLES 10 SP2 does not give high iowait on an ASUS P5K). So let's look at what super-mega-feature appeared in 2.6.17 that was absent in 2.6.16. This feature clearly cannot belong to any single file system (all file systems are affected by the error). I have not found any changes in the schedulers between 2.6.16 and 2.6.17. The introduction of libata is the one difference. Whether the high iowait comes from libata itself or from the infrastructure for embedding it in the kernel hardly matters; only one thing matters - the kernel is crippled. And that is the sad fact.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I do not mean the fsync problem, which is no longer a problem for me in the .29 kernel. I mean the sluggish behaviour of all GUI applications, especially while working with VMware Workstation: suspend and resume times rise from less than two minutes to up to ten minutes. It started for me when I upgraded from Feisty (2.6.20) to Gutsy (2.6.22) on a 32-bit Pentium-M.

It is hard to pin this problem down, as it does not appear all the time, and there are a lot of other problems and many solved problems, which makes a comparison very difficult. And my assumption is that it depends on the CPU, hard drive and user.

The best hint for me was the duration of the process test. I did not submit this test so that the kernel would be tuned to this special test case, as I have seen happen on LKML; it should help to localize the problem. The results of these tests seem to fit with the regression of the sluggish behaviour.

See
http://bugzilla.kernel.org/attachment.cgi?id=19797&action=view
CentOS 2.6.18-92.el5 - 29.995s - good
Feisty 2.6.20.21 - 25.304s - good
Gusty 2.6.22-16 - 40.405s - bad
Hardy 2.6.24-23 - 37.604s - bad
Intrepid 2.6.27-9 - 96.922s - unusable

I have seen with powertop that the number of interrupts for keyboard input doubled from 200 to 400 when heavy I/O was running in the background.

And I know there is nothing wrong with a high I/O wait time as such, but as soon as the I/O wait time reaches 100% the desktop becomes sluggish and unusable. You can try this on an installation on a slow disk with ext3, or even on a fully encrypted disk. The slow SSDs could be related to this bug, as many SSDs show really poor write performance under Linux. I have measured transfer rates down to 2MB/s with non-direct writes (4KB cache splitting), while direct writing reaches up to 90MB/s on my SSD. My system on my SSD is completely unusable.

I will run some tests in a virtual machine, as it seems to me that an application running in a virtual machine is more affected by this sluggish behaviour than an application executed on the host. I will run exactly the same VM and test on different host kernels. But I am not able to spend more time on this before April. Perhaps someone else can start earlier?

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Jens,

I'm trying to nail this down on my computer, so I'm creating a VM of my i686 Gentoo system to see if I get the same results as before.

I used the following command, inside the vm, to extract my system tarball backup of my previous system.

ssh root@192.168.8.4 'gunzip -c /media/backup/system.tar.gz' | tar -xv --exclude './usr/portage/packages/*' --exclude './userportage/distfiles/*' --exclude './var/log/apache2/*' --exclude ./Bonnie.10218 >extract-list.txt

Now, on the host system (192.168.8.4) I am seeing the following...
trenta@tdamac ~/Desktop $ uptime
 01:39:37 up 1:21, 6 users, load average: 20.49, 14.92, 9.35

Obviously I'm getting REALLY sick performance. Normally something linear like a tar extraction does not produce these kinds of performance issues. Granted, the disk may have to seek around a little, but is it that bad?

Is there something I can do to analyze why this is happening, e.g. something like strace? I ran strace -c on kwrite during heavy load like this, and it claims that it finished everything in a tenth of a second, even though it took more like 30 seconds.

So, is there a lower level mechanism I can use to get a fix on what is making processes wait? For example, something that will tell me "kernel function X" is blocking?

Thanks.
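(Not an answer from the thread, but one low-level option for this is the magic SysRq 'w' trigger, which dumps the kernel stacks of all tasks in uninterruptible sleep and so shows which kernel function they are blocked in. A sketch, assuming SysRq is available and the commands are run as root:)

echo 1 > /proc/sys/kernel/sysrq     # enable all SysRq functions
echo w > /proc/sysrq-trigger        # dump blocked (D state) tasks with stack traces
dmesg | tail -n 50                  # the traces appear in the kernel log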

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Hi Ben,

Thank you for the clarification. I think I was really lost on this. I expected the process doing I/O to wait, but the rest of the system is supposed to get the remaining processor power in the meantime; instead, the system seems to hang until the I/O stops.

So I think best way to proceed is to start to discard problems.

I propose to start with:

    I will run a CPU-intensive task with no I/O while another process writes a file with no CPU-intensive work, to check whether the first process takes the same time to execute under high I/O or not.
            Process 1: CPU / No IO
            Process 2: High IO / No CPU
    And measure times...

    Should this test trigger the problem? Since Process 1 does no I/O, it should finish in almost the same time as under no load at all. Right?

    Can we rule out an ext3-related problem? Test case: write files from one thread, over ext3, ext4, reiser, etc., and observe responsiveness.

    Can we track whether this is an fsync problem? How (commands, test case)?

    How can we test this without the filesystem taking part in the tests?

    Can we show differences between kernel 2.6.16 and >= 2.6.28? (I will do this today)

    How do we measure responsiveness? Can we put a numeric value on this? (See the sketch after this list for one possible probe.)

Thank you all.
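(One possible numeric proxy for responsiveness, a rough sketch rather than an agreed method: repeatedly time how long starting a trivial process takes while the I/O load runs; latency spikes correspond to the freezes people describe. Assumes GNU time at /usr/bin/time:)

# append one line per second with the wall-clock seconds needed to fork/exec a trivial command
while true; do
    /usr/bin/time -f "%e" sh -c true 2>> startup-latency.log
    sleep 1
done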

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Gonzalo, I think you're asking great questions that will help us establish the cause of the problem. Even though I can't answer most of them (I'm no guru), I think we should all agree on a unified way to test and measure responsiveness. Regarding filesystems, I tried ReiserFS, ext3 and ext4 with two terminals running dd if=/dev/zero of=/test1 bs=1M count=1400 and dd if=/dev/urandom of=/tst2 bs=1M count=700 as a test, and they all gave the same sluggish feeling to the system.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

I agree that those tests let us know that there's a problem, because we see the sluggish behaviour. However, if a kernel dev is not seeing the performance issues on their machines, it won't be very convincing for them. If however, we provide some concrete tests, showing which kernels didn't have the problem, which did, and the test results, then they may be able to get somewhere. That's why I'm hoping someone can chime in and tell us what sorts of tests would be useful, such as I suggested in comment #220.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Ok. Here are my firsts tests with 2.6.28.7:

I used a modified version of ThreadSchedulerTest.cpp that removes the initial timeout, plus a dd run to simulate high I/O load.

The first hypothesis seems to be broken: high I/O load does not seem to affect processing much.

------------------------------------------------------------------
./kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 3362
min:0.008ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:19.791s
Break!
We have Burning CPU with 4855
min:0.006ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:18.754s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 6211
We have Burning CPU with 6212
min:0.007ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:20.265s
DD Finished
 --- Finish ---
Kernel tested: 2.6.28.7-level2crm i686

-----------------------------------------------------------------------

The results say that it takes 2 seconds more to complete (is this relevant for a process that takes ~18-19s to complete?).

A curious thing is that I observed no I/O wait while the processing in test 2 was running - only system processor time.

This also seems strange, as it should be 100% USER time. System time (correct me if I'm wrong) means that the OS is spending a lot of time scheduling the threads...

Anyway, I will try to reproduce high iowait times before starting the CPU intensive program to see if we are right.

I will post the test suite in bash. Feel free to add more tests.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 20489
Initial effort to build an automatic test suite for this bug

Please feel free to add tests or correct what's wrong

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Hello Gonzalo, I just ran your test suite and here are the results:

---------------------------------
khaal@Xeraphim:~/Desktop/test-suite-bug-12309$ sh kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 17986
min:0.006ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:21.873s
We have Burning CPU with 19909
min:0.004ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:17.708s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 21084
We have Burning CPU with 21085
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 12.5488 s, 16.7 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 16.0014 s, 13.1 MB/s
DD Finished
Killing 21085 process
 --- Finish ---
Kernel tested: 2.6.28-8-generic x86_64
khaal@Xeraphim:~/Desktop/test-suite-bug-12309$ 200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 18.6493 s, 11.2 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 18.9091 s, 11.1 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 20.0353 s, 10.5 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 20.1651 s, 10.4 MB/s
-------------------------------------

I'm not really familiar with what it is saying, but it did affect desktop responsiveness. I made a Google spreadsheet that is open for anyone to access, to organise test results and spot common traits among our systems: http://spreadsheets.google.com/ccc?key=p3aerC-xkjEqvo7BvMHaxXg - one thing is still missing: a place to upload the output of these test runs. Does anyone know of a service like photobucket but for text/console output?

The document is open for everyone to edit. Please pick a specific color for yourself so we keep it readable :-)

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 20491
Initial effort to build an automatic test suite for this bug V2

This fixes the killing of the process (I hope)

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

I will try to explain:

TEST 1: The first test takes two measurements of a CPU-intensive program:
 We have Burning CPU with 17986
 min:0.006ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:21.873s
 We have Burning CPU with 19909
 min:0.004ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:17.708s

It takes between 17s - 22s to complete.

Lines like:
209715200 bytes (210 MB) copied, 18.6493 s, 11.2 MB/s

tell you the throughput of your HD. This throughput is shared between the 6 processes that are writing at the same time.

TEST 2: This then tries to do the same thing, but under high IO.

Unfortunately the program was killed before finishing, because the high IO finished before the CPU-intensive program did, so it seems the load is affecting you badly.

On my computer the CPU program finished earlier.

Can you run it with the new version, please?

NOTE: It writes several 200 MB files to your hard disk. Please remove them after the tests... they take up 200 MB x 6 = 1200 MB of your disk.
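
For reference, this is roughly what the two tests boil down to (a simplified sketch, not the actual kernel-test.sh; the real script also measures scheduling latency, and the ThreadSchedulerTest binary name is assumed here):

# Test 1: time the CPU-intensive task on an otherwise idle system
time ./ThreadSchedulerTest

# Test 2: repeat it while six dd writers generate heavy IO in parallel
for i in 1 2 3 4 5 6; do
    dd if=/dev/zero of=./io-test-$i bs=1M count=200 &
done
time ./ThreadSchedulerTest
wait
rm -f ./io-test-*   # the temp files add up to 6 x 200 MB = 1200 MB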

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

For me throughput is horrible:
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 14987
min:0.005ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:21.527s
We have Burning CPU with 16371
min:0.005ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:21.833s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 17768
We have Burning CPU with 17769
min:0.007ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:22.777s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 64,2187 s, 3,3 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 75,1226 s, 2,8 MB/s
DD Finished
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.28.7-level2crm i686
gad@ws-esp16:~$ 200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 76,8811 s, 2,7 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 79,4772 s, 2,6 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 82,0248 s, 2,6 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 82,9147 s, 2,5 MB/s
---------------------------

I forgot to say ext3 filesystem here...

I will try with different kernels from now on.

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

Results from my notebook:

[james@rhapsody tsb]$ ./kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 3772
min:0.009ms|avg:0.013-0.013ms|mid:0.000ms|max:0.000ms|duration:37.528s
We have Burning CPU with 6762
min:0.011ms|avg:0.013-0.013ms|mid:0.000ms|max:0.000ms|duration:37.351s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 9489
We have Burning CPU with 9490
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 21.1718 s, 9.9 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 38.183 s, 5.5 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 41.1141 s, 5.1 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 45.3742 s, 4.6 MB/s
min:0.007ms|avg:0.012-0.013ms|mid:0.000ms|max:0.000ms|duration:38.801s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 49.0724 s, 4.3 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 50.0517 s, 4.2 MB/s
DD Finished
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.29-0.54.rc7.git3.fc10.x86_64 x86_64

Revision history for this message
In , igor.lautar (igor.lautar-linux-kernel-bugs) wrote :
Download full text (7.1 KiB)

Output on kubuntu 8.10 running on EliteBook 8530w.

While running, it felt 'sluggish' but not by much. When copying/unzipping big files, I can get 10+ seconds of Firefox inactivity.

Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 24021
min:0.004ms|avg:0.018-0.022ms|mid:0.000ms|max:0.000ms|duration:15.861s
We have Burning CPU with 25229
min:0.004ms|avg:0.008-0.009ms|mid:0.000ms|max:0.000ms|duration:15.678s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 27067
We have Burning CPU with 27068
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 15.0066 s, 14.0 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 19.0474 s, 11.0 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 21.9454 s, 9.6 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 22.6718 s, 9.3 MB/s
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 22.9066 s, 9.2 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 23.667 s, 8.9 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Fi...

Read more...

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

gad@ws-esp16:~$ ./kernel-test.sh /mnt/data/gad/
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 8103
min:0.006ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:21.766s
We have Burning CPU with 10098
min:0.007ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:21.275s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 12105
We have Burning CPU with 12106
min:0.007ms|avg:0.010-0.011ms|mid:0.000ms|max:0.000ms|duration:20.630s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 34,4896 s, 6,1 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 35,157 s, 6,0 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 37,4852 s, 5,6 MB/s
DD Finished
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.28-8-generic i686
gad@ws-esp16:~$ 200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 40,6583 s, 5,2 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 49,9392 s, 4,2 MB/s
200+0 registros de entrada
200+0 registros de salida
209715200 bytes (210 MB) copiados, 51,9306 s, 4,0 MB/s

-----

Filesystem ext4

Revision history for this message
In , igor.lautar (igor.lautar-linux-kernel-bugs) wrote :
Download full text (13.2 KiB)

Seems the last comment has a double copy/paste, making it hard to read. Here is another result (for some reason I get a bunch of "DD Finished" lines; I didn't want to cut them as I don't know if they're relevant to the test - probably not):

Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 7139
min:0.005ms|avg:0.015-0.031ms|mid:0.000ms|max:0.000ms|duration:22.600s
We have Burning CPU with 8947
min:0.004ms|avg:0.014-0.031ms|mid:0.000ms|max:0.000ms|duration:22.342s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 10772
We have Burning CPU with 10773
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 14.7651 s, 14.2 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 16.8547 s, 12.4 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 18.5809 s, 11.3 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 19.6679 s, 10.7 MB/s
DD Finished
DD Finished ...

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

What are you braking your disks with - a finger?

yura@suse:~/Desktop> sh kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 14170
min:0.003ms|avg:0.006-0.007ms|mid:0.000ms|max:0.000ms|duration:4.725s
We have Burning CPU with 14815
min:0.004ms|avg:0.006-0.007ms|mid:0.000ms|max:0.000ms|duration:4.752s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 15470
We have Burning CPU with 15471
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 2,45896 c, 85,3 MB/c
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 4,33352 c, 48,4 MB/c
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 4,51529 c, 46,4 MB/c
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 5,22602 c, 40,1 MB/c
DD Finished
DD Finished
DD Finished
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 5,97021 c, 35,1 MB/c
DD Finished
DD Finished
DD Finished
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 6,38097 c, 32,9 MB/c
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
min:0.003ms|avg:0.006-0.007ms|mid:0.000ms|max:0.000ms|duration:6.047s
DD Finished
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.28.5-default x86_64

Revision history for this message
In , mathias.buren (mathias.buren-linux-kernel-bugs) wrote :

$ ./kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 4215
min:0.005ms|avg:0.008-0.009ms|mid:0.000ms|max:0.000ms|duration:14.822s
We have Burning CPU with 5656
min:0.007ms|avg:0.008-0.009ms|mid:0.000ms|max:0.000ms|duration:15.624s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 7403
We have Burning CPU with 7404
200+0 poster in
200+0 poster ut
209715200 byte (210 MB) kopierade, 12,7466 s, 16,5 MB/s
200+0 poster in
200+0 poster ut
209715200 byte (210 MB) kopierade, 15,3423 s, 13,7 MB/s
200+0 poster in
200+0 poster ut
209715200 byte (210 MB) kopierade, 17,363 s, 12,1 MB/s
200+0 poster in
200+0 poster ut
209715200 byte (210 MB) kopierade, 18,3437 s, 11,4 MB/s
200+0 poster in
200+0 poster ut
209715200 byte (210 MB) kopierade, 18,9163 s, 11,1 MB/s
200+0 poster in
200+0 poster ut
209715200 byte (210 MB) kopierade, 19,3732 s, 10,8 MB/s

min:0.005ms|avg:0.008-0.009ms|mid:0.000ms|max:0.000ms|duration:18.564s

IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.29-rc7-zen2-ARCH-20090309 x86_64

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I have noticed that CPU clock scaling responds sluggishly during heavy IO. From time to time it stays at the lowest clock rate even though there is CPU-intensive, but discontinuous, work in other processes. I just had a 20-second freeze during such a state.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I could move the mouse, but the cursor did not change. All panels were working, but I could not move or switch windows.

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Gonzalo, is it possible to include the motherboard chipset in the test? It would be interesting to see whether everybody who is affected has the same or similar chipsets... Here's another test result, with 2.6.29-rc7. Still affected by the bug, on ext4.

khaal@Xeraphim:~/Desktop/test-suite-bug-12309-v2$ sh kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 9080
min:0.007ms|avg:0.008-0.009ms|mid:0.000ms|max:0.000ms|duration:23.801s
We have Burning CPU with 14728
min:0.007ms|avg:0.008-0.009ms|mid:0.000ms|max:0.000ms|duration:22.593s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 19811
We have Burning CPU with 19812
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 13.901 s, 15.1 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 15.2808 s, 13.7 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 15.4188 s, 13.6 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 16.1941 s, 13.0 MB/s
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 16.6363 s, 12.6 MB/s
DD Finished
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 17.1937 s, 12.2 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
min:0.004ms|avg:0.008-0.009ms|mid:0.000ms|max:0.000ms|duration:18.957s
DD Finished
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.29-020629rc7-generic x86_64

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 20503
Results in ODF for spreadsheet

This shows the information recovered by each of the tests performed.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 20504
Results in ODF for spreadsheet

This shows the information recovered by each of the tests performed.

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

I uploaded a spreadsheet to show results...

To me it looks like high IO is affecting the scheduler or the processor. Not by much in these tests, but it may matter when long processing is involved.

It is quite significant that the increase is always around 2 seconds for everyone, even in Yuriy Lalym's tests, where the run normally takes only 4.7 s and the processing time goes up by 1.3 s. Why always around 2 seconds?

We can also see that ext4 does not really seem to be affected. Maybe because of throughput? It would be interesting to know which filesystem Khalid Rashid tested on, because his run takes less time to complete under high IO, as it does for me on ext4.

And the "DD Finished" lines mean that the last IO transfer finished before the CPU-intensive task did. Maybe this also affected the result.

Ok. I will fix the format of the test-suite output and include other tests. The temp files will also be deleted after the tests.

What other tests should be included?

I will try to search for the fsync problem to include it in the tests.

I will also try to report the motherboard chipset, as requested...

Any ideas on what to test?

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

I have one question for the kernel developers...

How much processor time is normal for a dd process using DMA?
 I have two hypotheses:
   1. The kernel is taking too much time switching the process in and out, even when it is blocked on IO.
   2. There is a lock somewhere that prevents the scheduler from running freely...

How can I track down the processor time of a program (say dd)?
   I want to see whether the times for each kind of process are normal. Current computers are fast, and sometimes we do not realise that a process is taking too much time to complete.

Any good ways to profile the kernel looking at only one PID?
   I want to profile specific parts of the kernel. Any good docs?

Thank you all!

I forgot to say: for now, don't use the test suite any more until the new tests are here.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

Server on Xeon based, internal HDD SATA2 (no RAID), SLES 10 SP2

Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 31607
min:0.004ms|avg:0.013-0.049ms|mid:0.000ms|max:0.000ms|duration:19.071s
We have Burning CPU with 7637
min:0.004ms|avg:0.015-0.057ms|mid:0.000ms|max:0.000ms|duration:21.218s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 15831
We have Burning CPU with 15832
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 1,0195 секунд, 206 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 1,04578 секунд, 201 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 1,26246 секунд, 166 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 1,90053 секунд, 110 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 2,19354 секунд, 95,6 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 2,22529 секунд, 94,2 MB/s
min:0.003ms|avg:0.014-0.060ms|mid:0.000ms|max:0.000ms|duration:20.705s
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.16.60-0.21-smp x86_64

Server on Xeon based, 3-Ware RAID-1 (2 pieces SAS), SLES 10 SP2

Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 22420
min:0.004ms|avg:0.015-0.071ms|mid:0.000ms|max:0.000ms|duration:25.210s
We have Burning CPU with 28763
min:0.004ms|avg:0.018-0.083ms|mid:0.000ms|max:0.000ms|duration:33.232s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 1628
We have Burning CPU with 1629
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 0,335776 секунд, 625 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 0,367063 секунд, 571 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 0,363934 секунд, 576 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 0,430686 секунд, 487 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 0,520617 секунд, 403 MB/s
200+0 записей считано
200+0 записей написано
 скопировано 209715200 байт (210 MB), 0,531063 секунд, 395 MB/s
min:0.004ms|avg:0.014-0.065ms|mid:0.000ms|max:0.000ms|duration:22.025s
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.16.60-0.21-smp x86_64

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

bpenglas@PC010233L ~/Desktop/bug $ ./kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 10638
min:0.004ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:14.790s
We have Burning CPU with 13523
min:0.004ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:13.953s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 14793
We have Burning CPU with 14794
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 14.7986 s, 14.2 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 17.6264 s, 11.9 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 19.4253 s, 10.8 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 19.9593 s, 10.5 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 21.898 s, 9.6 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 21.9509 s, 9.6 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
min:0.004ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:14.694s
DD Finished
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.29-rc3-zen1-1-07438-g2953ca1 x86_64

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Gonzalo, as I stated before, I am on ext4 mounted with the noatime and nodiratime flags. However, even though my throughput is fast according to the test, my performance still takes a big hit during the tests. I'm considering reformatting my partitions to ext3 so I can get an older kernel running and test how it fares. Also, it would be great to collect the results in one place; I've put a sheet up at http://tinyurl.com/au4fda - feel free to rearrange it to fit your needs.

Well done with the test suite, and good bug hunting everyone :-)

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

(In reply to comment #234)
(In reply to comment #243)

File system - xfs

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

(In reply to comment #244)

Forgot to mention: on this system all filesystems are ext3. That run was also without my VMs running, and it's my work machine. I'll try it with the VMs running, and also on my home box, tomorrow (3/13/09).

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

My Work machine:

bpenglas@PC010233L ~/kernel $ ./kernel-test.sh
Using current dir to do IO tests
First Test: How much gets to run the CPU intensive task?
We have Burning CPU with 16034
min:0.004ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:19.169s
We have Burning CPU with 18771
min:0.005ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:17.182s
Second Test: Does the process queue get blocked because high IO?
Starting
We have High IO PID 21066
We have Burning CPU with 21067
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 21.8451 s, 9.6 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 21.7598 s, 9.6 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 21.9914 s, 9.5 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 24.8323 s, 8.4 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 24.9565 s, 8.4 MB/s
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
DD Finished
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 25.6149 s, 8.2 MB/s
DD Finished
DD Finished
DD Finished
min:0.004ms|avg:0.007-0.008ms|mid:0.000ms|max:0.000ms|duration:15.944s
DD Finished
IO Finished before than processing
 --- Finish ---
Kernel tested: 2.6.29-rc3-zen1-1-07438-g2953ca1 x86_64

This is while Firefox is open, Audacious is playing music, and two VMware Workstation VMs are running (Windows Vista and Windows XP).
All filesystems are ext3; the main system drive is an 80 GB WD at 10k RPM, the other drive is 250 GB at 7.2k RPM. All Intel chipset, with a Core 2 Duo E8200. It's a Dell GX755.

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

Simple test case:

dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
sleep 10
time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync

You'd expect the small file to be written fairly quickly - as in a couple seconds at most. But on every system with a recent kernel I've tried this on, it takes 6-45 seconds.

Why the huge range? I'm not sure, but available memory seems to have something to do with it: the more memory in the machine, the longer the small-file write seems to take.
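
If memory really is the variable, one way to watch it (just a suggestion, not part of the test case itself) is to print the Dirty and Writeback counters from /proc/meminfo once a second while the big write runs:

dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
while kill -0 $! 2>/dev/null; do
    grep -E '^(Dirty|Writeback):' /proc/meminfo   # dirty data waiting vs. being written back
    sleep 1
done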

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #249)
> dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
> sleep 10
> time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync

real 0m1.808s
user 0m0.001s
sys 0m0.001s

I don't think this gets to the issue.

Revision history for this message
In , igor.lautar (igor.lautar-linux-kernel-bugs) wrote :

Well, for me it does:

dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
sleep 10
time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync

1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 15.8284 s, 0.3 kB/s

real 0m16.024s
user 0m0.004s
sys 0m0.020s

Revision history for this message
In , andre (andre-linux-kernel-bugs) wrote :

(In reply to comment #249)
> dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
> sleep 10
> time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync

2.6.28-gentoo-r2 (/tmp on reiser3.6, rootfs-drive):
>4096 bytes (4,1 kB) copied, 10.618 s, 0.4 kB/s
>real 0m10.620s
>user 0m0.000s
>sys 0m0.077s

2.6.28-gentoo-r2 (/tmp on ext4, other drive):
>4096 bytes (4,1 kB) copied, 5,34679 s, 0,8 kB/s
>real 0m5.349s
>user 0m0.000s
>sys 0m0.003s

2.6.27.19-3.2-default (opensuse 11.1) (/tmp on ext3, rootfs):
>4096 bytes (4,1 kB) copied, 60.5764 s, 0.1 kB/s
>real 1m2.827s
>user 0m0.004s
>sys 0m0.036s

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #250)
> real 0m1.808s
> user 0m0.001s
> sys 0m0.001s

My 1.808s was on 2.6.27-gentoo-r8 with XFS on a 3ware 8-drive SATA RAID.

Revision history for this message
In , vaiski (vaiski-linux-kernel-bugs) wrote :

2.6.28.7 w/reiserFS

4096 bytes (4.1 kB) copied, 6.96955 s, 0.6 kB/s

real 0m6.972s
user 0m0.001s
sys 0m0.026s

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

André did you mean to take ownership of this bug away from Jens?

It looks like the test case I posted earlier is very effective at demonstrating at least one of the issues affecting people in this thread (namely those using ext3 or reiserfs).

It appears that xfs and ext4 are better at avoiding these huge latencies - I'm also assuming that the IO scheduler interacts with these filesystems differently.

Matt - I don't think this test case works for you as much because you have such a fast disk array. I imagine that you can write 10GB pretty quickly with an 8-drive array. Try increasing the 10GB to 100GB and increasing the sleep to 20-30 seconds so that you get more data waiting to be flushed to disk.

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

David, I want to stress that while my earlier test results looked good on my ext4 filesystem, I was still affected by the slow performance. I think we need a (different?) way to measure desktop responsiveness in order to get actual values from there too.

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

Where are your performance numbers from the test case, Khalid, and what is your hardware setup like? André posted numbers in comment #252 on ext4 which are better than his ext3/reiserfs numbers, but are still very poor, IMO.

It seems fairly likely that there are multiple bugs causing similar symptoms, and that they have all been jumbled into this bug report.

Jens has asked for a simple test case illustrating at least one issue discussed in this thread. I have presented one extremely simple test case which duplicates the problems I (and others) am seeing. Feel free to create another.

Revision history for this message
In , andre (andre-linux-kernel-bugs) wrote :

sorry, didn't mean to reassign the bug in the first place

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

I've been doing some testing - two tunables I've found (briefly mentioned earlier) that help immensely: setting /proc/sys/vm/dirty_background_ratio to 1 and /proc/sys/vm/dirty_ratio to 2.

On some of the systems I've run the test on it reduces latency to a fraction of a second; on others it reduces it from 20+ seconds to less than 10.
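
For anyone who wants to try the same thing, the exact commands are just the following (run as root; the values revert on reboot unless you also add them to /etc/sysctl.conf):

# start background writeback almost immediately, and block writers
# once dirty data exceeds 2% of RAM
echo 1 > /proc/sys/vm/dirty_background_ratio
echo 2 > /proc/sys/vm/dirty_ratio

# equivalent, using sysctl:
sysctl -w vm.dirty_background_ratio=1
sysctl -w vm.dirty_ratio=2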

Anyone else see similar behaviour with my simple test?

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #259)
> I've been doing some testing - two tunables I've found (briefly mentioned
> earlier) that helps immensely is setting /proc/sys/vm/dirty_background_ratio
> to
> 1 and /proc/sys/vm/dirty_ratio to 2.
>
> On some of my systems that I've run the test on it reduces latency down to a
> fraction of a second - on other systems it reduces it from 20+ seconds to
> less
> than 10.
>
> Anyone else see similar behaviour with my simple test?
>

This is right. Although it doesn't eliminate the stutter (mouse freezing for 1-2 seconds) during heavy IO, it does make the stutter tolerable. It basically turns your IO into nearly synchronous, inline writes instead of leaving the work for pdflush to pick up later and choke the IO subsystem. I have no idea why, on larger memory configurations, those defaults are set as high as 40 and 20 (IIRC). On a 4 GB RAM system we may not see any IO land until the expiry timers fire in pdflush or until 40% of 4 GB = 1.6 GB is waiting to be written.

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

PC010233L vmware # dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
[1] 10528
PC010233L vmware # sleep 10
PC010233L vmware # time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00333981 s, 1.2 MB/s

real 0m0.054s
user 0m0.000s
sys 0m0.000s
PC010233L vmware #
PC010233L vmware # time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.604249 s, 6.8 kB/s

real 0m3.219s
user 0m0.000s
sys 0m0.000s

I ran the second small dd about 2 minutes later. My / (or /tmp) is located on a WD 10k RPM SATA II drive.

And after fixing the dirty ratios....

PC010233L vmware # dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000 conv=fdatasync &
[1] 10548
PC010233L vmware # sleep 10
PC010233L vmware # time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 1.41179 s, 2.9 kB/s

real 0m2.044s
user 0m0.000s
sys 0m0.002s
PC010233L vmware # time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000649804 s, 6.3 MB/s

real 0m6.366s
user 0m0.000s
sys 0m0.002s
PC010233L vmware #

Again, second one was about 2 minutes afterwards.

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

(In reply to comment #261)
Brandon, this test case doesn't seem to reproduce any significant latency issues for you. I suspect that 10k RPM disk is able to write fast enough to keep a significant amount of data from being buffered in memory. 1.5 seconds isn't great, but all my systems are at least 5 times worse than that and often 10-40 times worse.

Do you notice a large latency hit on the system when the large write is running?

Why are you running that second small write afterwards? Was the big write done at that point or not? The latency of your small writes does seem to vary by quite a bit.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #255)
> Matt - I don't think this test case works for you as much because you have
> such
> a fast disk array. I imagine that you can write 10GB pretty quickly with an
> 8-drive array. Try increasing the 10GB to 100GB and increasing the sleep to
> 20-30 seconds so that you get more data waiting to be flushed to disk.
>

Setting dirty_background_ratio=1 and dirty_ratio=2 had a HUGE effect on my system.

$ dd if=/dev/zero of=/var/tmp/bigfile bs=1M count=100000 conv=fdatasync & sleep 30 ; time dd if=/dev/zero of=/var/tmp/smallfile bs=4k count=1 conv=fdatasync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 6.96642 s, 0.6 kB/s

real 0m8.590s
user 0m0.000s
sys 0m0.004s

100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 1354.9 s, 77.4 MB/s

# echo 1 > dirty_background_ratio ; echo 2 > dirty_ratio

$ dd if=/dev/zero of=/var/tmp/bigfile bs=1M count=100000 conv=fdatasync & sleep 30 ; time dd if=/dev/zero of=/var/tmp/smallfile bs=4k count=1 conv=fdatasync
[1] 22718
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.72366 s, 5.7 kB/s

real 0m0.725s
user 0m0.000s
sys 0m0.001s

100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 359.02 s, 292 MB/s

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

(In reply to comment #262)
> (In reply to comment #261)
> Brandon, this test case doesn't seem to reproduce any significant latency
> issues for you. I suspect that 10k RPM disk is able to write fast enough to
> keep a significant amount of data from being buffered in memory. 1.5 seconds
> isn't great, but all my systems are at least 5 times worse than that and
> often
> 10-40 times worse.
>
> Do you notice a large latency hit on the system when the large write is
> running?
>
> Why are you running that second small write afterwards? Was the big write
> done
> at that point or not? The latency of your small writes does seem to vary by
> quite a bit.
>

The large write took a while (about 10 minutes, and it had only got to 5.3 GB before I killed it), and yes, VERY degraded performance... it took me a while to ssh in and kill it, as the local session was almost unusable.

The first small write wasn't done when the system started lagging out on me; it was done while RAM usage and CPU usage were still climbing, so I decided to run it again later just to see.

I can try doing the writes on my 7.2k RPM disk tomorrow when I'm back at work; I just need to point the output to a different partition.

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

David Rees, all my test results are presented here: https://spreadsheets.google.com/ccc?key=p3aerC-xkjEqvo7BvMHaxXg&hl=en and my computer components can be seen here: http://h10025.www1.hp.com/ewfrf/wc/prodinfoCategory?lc=en&cc=se&dlc=sv&product=3387690&lang=sv&

I also tried this on a WD Raptor drive, just to make sure faulty hard drives weren't the cause, and the symptoms were still present.

Revision history for this message
In , bpenglase (bpenglase-linux-kernel-bugs) wrote :

PC010233L ~ # dd if=/dev/zero of=/home/bigfile bs=1M count=10000 conv=fdatasync &
[1] 22333
PC010233L ~ # sleep 10
PC010233L ~ # time dd if=/dev/zero of=/home/smallfile bs=4k count=1 conv=fdatasync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 6.27386 s, 0.7 kB/s

real 0m6.275s
user 0m0.000s
sys 0m0.000s
PC010233L ~ # time dd if=/dev/zero of=/home/smallfile bs=4k count=1 conv=fdatasync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 2.4702 s, 1.7 kB/s

real 0m2.482s
user 0m0.000s
sys 0m0.000s

This was going to /home, which is on a 250 GB 7200 RPM SATA II drive. Also, even though the second one (run about a minute or two later) completed quickly, it was about another 10 seconds before I got the prompt back.

Revision history for this message
In , vaiski (vaiski-linux-kernel-bugs) wrote :

(In reply to comment #249)

the same system as in #254, but I changed the kernel to the latest rc of .29

2.6.29-rc8 w/reiserFS

4096 bytes (4.1 kB) copied, 1.2374 s, 3.3 kB/s

real 0m2.843s
user 0m0.001s
sys 0m0.003s

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

Just a quick note: I've been having considerable trouble with kernels since 2.6.17 as well, but recently ran across this article http://kerneltrap.org/node/3000, citing: "Kernel maintainer Andrew Morton has said that he runs his desktop machines with a swappiness of 100"... which made me wonder whether my swappiness of 1 might not be such a good idea. An example of the misbehaviour I had been crediting to this bug can be seen here:
http://hfopi.org/files/temp/time-trouble.jpg (look at the three different clock times). That problem was the result of physical memory running full, which was happening a lot (VLC memory leak...), stalling the system sometimes for hours.
Setting swappiness... ah, let me quote Andrew: "I'm gonna stick my fingers in my ears and sing 'la la la' until people tell me 'I set swappiness to zero and it didn't do what I wanted it to do'." Well, here I am. To all of you: setting swappiness to extremely low values is a bad idea and won't achieve what you expect. So that might actually be your problem if you have done so; echo 100 > /proc/sys/vm/swappiness and run the test.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #268)

I see the problem, and I've never touched 'swappiness'.

$ cat /proc/sys/vm/swappiness
60

Actually, I have no swap at all.

# swapon -s
swapon: /proc/swaps: No such file or directory

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Well, Mathieu Desnoyers did a fix for the write-cache accounting which stops the kernel write cache from eating up all available memory plus swap. Without the fix, the slowness can be worked around by setting swappiness to 0 or disabling swap. The fix is, AFAIK, not in 2.6.29.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

So, we have one guy (#268) saying high swappiness will solve the problem and the other guy (#270) saying setting swappiness to 0 will solve the problem. I have a feeling neither is going to work, because I have run my system with both and this bug appears under high IO load in both cases. But I would like to see what others find.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

The swappiness setting is irrelevant to this bug, as it is a disk io problem no matter which way you look at it. Yes, if you are swapping, this bug will cause the system to be even slower.

p.s.
I'm convinced that swap is evil. I just disable my swap, and my system works much better, especially when I get a run-away memory hog process.

Revision history for this message
In , awebers (awebers-linux-kernel-bugs) wrote :

Hi:

A) If you have multiple hard drives:
- they are not equally affected
- if you copy a file (e.g. 7 GB) from drive A to drive B, a job running on drive C does not slow down, except, perhaps, if a swap file is used.

A job, in my case, is a VMware virtual machine.
I was spreading machines over different hard drives to reduce the trouble.

B) Isn't this slowdown a planned action of the system?

About /proc/sys/vm/dirty_ratio
> Note that all processes are blocked for writes when this happens
(see below, original text)
This is what slows everything down.

IMHO, it should be:
If "dirty_ratio" is reached, slow down the job that is creating
so much "dirt" and leave the other ones alone.

cut out from http://www.westnet.com/~gsmith/content/linux-pdflush.htm

8< -------------------

Process page writes
There is another parameter involved though that can spill over into management of user processes:

/proc/sys/vm/dirty_ratio (default 40): Maximum percentage of total memory that can be filled with dirty pages before processes are forced to write dirty buffers themselves during their time slice instead of being allowed to do more writes.

Note that all processes are blocked for writes when this happens, not just the one that filled the write buffers. This can cause what is perceived as an unfair behavior where one "write-hog" process can block all I/O on the system. The classic way to trigger this behavior is to execute a script that does "dd if=/dev/zero of=hog" and watch what happens. See Kernel Korner: I/O Schedulers for examples showing this behavior.

8< -------------------

Reference:
http://www.westnet.com/~gsmith/content/linux-pdflush.htm

Does someone have an idea how to slow down the IO-heavy job (automatically)?
If the throughput of dd, rsync or whatever were reduced the moment a trigger value is reached, the problem would affect only dd, rsync, ... and not the rest of the system.
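
One thing that might be worth trying (a suggestion only; I have not verified that it helps with this particular problem) is to put the heavy writer into the idle IO class with ionice, so the IO scheduler only services it when the disk is otherwise unused:

# start the hog with idle IO priority (only honoured by the CFQ IO scheduler)
ionice -c3 dd if=/dev/zero of=hog bs=1M count=10000

# or demote a copy that is already running (the PID is just an example)
ionice -c2 -n7 -p 12345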

Revision history for this message
In , awebers (awebers-linux-kernel-bugs) wrote :

Hi again:

My test is to throttle the bandwidth using "rsync --bwlimit=<throughput>".

I am testing using VMware on /images3.
VMware runs fluently until I copy a lot (a 7 GB vmdk file) to /images3, which is a separate hard drive holding the .vmdk files of 5 VMware systems.
Copying this 7 GB file freezes the VMware systems for > 30 seconds.

And now with limited bandwith ...

all jobs run fine, no hanging or anything:
rsync --bwlimit=10000 /images5/vmware/vlab03/STD_XP_Prof.vmdk /images3/test
rsync --bwlimit=20000 /images5/vmware/vlab03/STD_XP_Prof.vmdk /images3/test

some jobs start to become slow and hang:
rsync --bwlimit=30000 /images5/vmware/vlab03/STD_XP_Prof.vmdk /images3/test

a lot of jobs hang and are very slow, some freeze:
rsync --bwlimit=40000 /images5/vmware/vlab03/STD_XP_Prof.vmdk /images3/test

This is my estimate:
rsync is creating more dirty data than the kernel can get rid of, and the system is put into the "processes are blocked for writes" mode (see my previous posting).

I hope that my input can help.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Created attachment 20656
vmstat with high # of uninterruptible processes

I just had a hang for about 10-15 minutes. My system started to freeze, so I immediately switched to a console, and ran "vmstat 1" (see attachment).

I sat there and watched it, as I wanted to catch it immediately after it became usable again, so that I could check the load average.

uptime
 23:38:18 up 6 days, 4:49, 8 users, load average: 23.30, 26.12, 16.21

A 1-minute load average of 23, with a 5-minute load average of 26. Ouch.

I have no swap, and I think the problem happened when one of my processes did something to lock up the machine. But, take note how many processes are blocked in UNINTERRUPTIBLE sleep at various times...

I think I also realized something very interesting about this bug. It does not occur as readily when you have a fast disk. As I had mentioned in previous comments, my macbook and my D820 have the same hardware. Well I'm rarely experiencing this on my D820 now. The only difference I can see, related to IO, is that the D820 just had a 320G 80M/s drive put into it. My Macbook runs at approximately 20-25M/s.

Also, given that I am pretty sure one of my processes hung the machine, it seems (though I am not a kernel hacker) like this bug may be related to a wait on a mutex or semaphore somewhere it should not be, hence the high number of uninterruptible processes? Could that be?

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

There has been more discussion on LKML related to this issue attached to the 2.6.29 kernel release thread. I'll direct interested parties to this post from Ted Tso:

http://lkml.org/lkml/2009/3/24/227

Attached to that post is Ted's fsync latency measuring tool. If people have a workload which generates high latency, this tool may be useful for measuring it and then posting that workload to Ted/LKML.

His testing tool doesn't do anything much different than my earlier dd test, except that he writes 1MB of data which may show higher latencies.

For those interested, I picked up a couple other workarounds for people this is affecting:

1. Mount ext3 in writeback mode instead of ordered. This has the drawback of leaving your data a bit more vulnerable than the default, but data writes will no longer be forced to complete in order with metadata (see the example mount line after workaround 2).

2. Increase IO priority of kjournald:
for i in `pidof kjournald` ; do ionice -c1 -p $i ; done
One theory is that by default kjournald is fighting for IO priority with normal processes. By making the IO priority of kjournald higher, the "important" data (i.e., data that is being synced to disk) should get written out faster, reducing user-visible latency. See this post/thread for more detail: http://lkml.org/lkml/2008/10/2/205
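
For workaround 1, the mount option is data=writeback; a sketch for a separate ext3 data partition (the device and mount point are just examples - ext3 normally refuses to change the data mode on a live remount, so set it at mount time or in fstab):

# fresh mount of an ext3 partition with writeback journaling
mount -o data=writeback /dev/sdb1 /data

# or permanently, via the options column in /etc/fstab:
# /dev/sdb1  /data  ext3  defaults,data=writeback  0  2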

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

I've tested the second workaround posted by David above (high IO priority for kjournald), and it definitely improves things in my case. My test is very simple: doing normal upgrades under Ubuntu (esp. kernel packages) always makes Firefox, and even Evolution or the whole desktop, freeze for several seconds, up to about 20 sec in some cases. With that workaround, the freezes don't last more than ~1 sec; the desktop experience is not really smooth, but I can work during upgrades.

So I guess we can track down at least one specific issue here, which may be the major one affecting desktop boxes, and which seems to have appeared (maybe in different ways) between 2.6.17 and 2.6.28. I'm using a fairly basic Toshiba Satellite laptop with 512 MB of RAM and a 4200 rpm HD.

Can anybody confirm that too?

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Ok. I'm also testing the kjournald option to see if it improves. I will post after some testing...

I want to include the fsync tests you pointed out. I tested it and it gave me:
fsync time: 0.0145
fsync time: 0.0205
fsync time: 0.0221
fsync time: 0.0195
fsync time: 0.0177
fsync time: 0.0702
fsync time: 0.0456
What's the correct way to do reliable tests? I will include it in the test suite.

Revision history for this message
In , jonathan.bower (jonathan.bower-linux-kernel-bugs) wrote :

The kjournald option makes my system much more responsive.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Hi Guys,

After reading those LKML messages from Theodore regarding his sync patches, I got an idea: why not just mount my filesystem with the "sync" mount option?

I run the following command on one console...
dd if=/dev/zero of=/tmp/bigfile bs=1M count=10000

And Theodore's fsync-test on another. On the standard test, WITHOUT mounting with sync, I get these results out of Theodore's test...

fsync time: 1.5693
fsync time: 18.8047
fsync time: 21.2672
fsync time: 18.6747
fsync time: 2.3821
fsync time: 2.0494
fsync time: 2.8781
fsync time: 21.6300

Here's a "vmstat 1" snipette. All the lines while the dd is running are roughly the same.
 2 9 380388 16716 33412 1409988 0 0 0 15340 806 1188 3 4 0 93
 0 8 380388 15748 33428 1411080 0 0 0 16284 1165 2350 7 8 0 85
 0 9 380388 16620 33432 1409752 0 0 0 18240 878 1108 5 3 0 92
 1 8 380388 16776 33452 1410108 0 0 0 11888 1046 1140 10 8 0 82

When I do the following...
mount -o remount,rw,sync /dev/s/sys /

I get the following benches while running the same dd command...
fsync time: 0.0067
fsync time: 0.0369
fsync time: 0.0208
fsync time: 0.0099
fsync time: 0.1175
fsync time: 0.0337
fsync time: 0.0003
fsync time: 0.0219
fsync time: 0.0110
fsync time: 0.0142
fsync time: 0.0076
fsync time: 0.0146
fsync time: 0.0153
fsync time: 0.1104
fsync time: 0.0061
fsync time: 0.0003

With "vmstat 1" snippet of ...
 1 0 380624 1112236 93104 297252 0 0 0 13056 920 1167 5 3 49 43
 0 1 380624 1098212 93252 311044 0 0 0 15876 925 1165 5 4 52 38
 1 2 380624 1085796 93408 323296 0 0 0 13800 996 1239 10 4 47 38

Did something in the kernel change a couple years ago, in regard to syncing?

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Just an FYI, there were some mm/msync.c fsync-related changes between 2.6.16.62 and 2.6.17 vanilla. I didn't see the problem until after 2.6.17, but perhaps Gentoo had patched the kernel heavily, I don't know. I'll try to do some more diffs between the kernel versions around the time I started having the problem, in case it helps you guys figure it out.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

From the first 2.6.17 release to the first 2.6.18 release (I haven't narrowed it down to exact versions), 3 PF_SYNCWRITE-related lines were removed from mm/msync.c.

And some PF_SYNCWRITE related stuff in block/cfq-iosched.c was added in 2.6.17 (diff between 2.6.16.62 and 2.6.17), and then removed in 2.6.18.

There are also fs/ sync-related changes between 2.6.16.62 and 2.6.17.

I hope I'm not spamming. :P

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #280)
> Hi Guys,
>
> After reading those LKML messages from Theoodre, regarding his sync patches,
> it
> gave me an idea. Why not just mount my filesystem with "sync" mount option.

What are the disadvantages of the sync mount option? Reduced bandwidth? Higher latency? The data you posted doesn't show any disadvantages - or maybe I just don't know what to conclude from that data?

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

(In reply to comment #284)
> (In reply to comment #280)
> > Hi Guys,
> >
> > After reading those LKML messages from Theoodre, regarding his sync
> patches, it
> > gave me an idea. Why not just mount my filesystem with "sync" mount
> option.
>
> what are the disadvantages of sync mount option? reduced b/w? higher latency?
> data you posted doesn't show any disadvantages or may be I don't know what to
> conclude from that data?

It appears that the overall transfer rate has decreased a tiny bit. But, the
big advantage of not doing "sync" on mount, is that the system can queue the
writes. So, for anything that fits into kernel queues, the writes appear way
faster to the user. That's my understanding of the difference between sync and
not using sync.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Oh, I should have given an example. Normally, when doing a dd of say 10M, your write would be several hundred MEGABYTES per second, because it's writing to memory, not disk. In my case, I only get disk speeds, even with 10M. So yeah, the memory queueing is WAAAAY faster until you reach the limit.

One last thing, for the kernel devs, as this may be important...
The comment in 2.6.28's version of msync.c is as follows...

/*
 * MS_SYNC syncs the entire file - including mappings.
 *
 * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
 * Nor does it marks the relevant pages dirty (it used to up to 2.6.17).
 * Now it doesn't do anything, since dirty pages are properly tracked.
 *
 * The application may now run fsync() to
 * write out the dirty pages and wait on the writeout and check the result.
 * Or the application may run fadvise(FADV_DONTNEED) against the fd to start
 * async writeout immediately.
 * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to
 * applications.
 */

This is an interesting comment. Mainly because there was some logic based on MS_SYNC that was removed from msync.c in 2.6.18 (as I mentioned at the TOP of comment #282). That code would set the PF_SYNCWRITE flag. The code exists in 2.6.17 but not 2.6.18. I haven't checked whether it was the 2.6.18 change that did it, or a previous 2.6.17.x change.

Is this a problem kernel devs???????

Revision history for this message
In , amaury.deganseman (amaury.deganseman-linux-kernel-bugs) wrote :

I have the same result here when mounting with the "sync" option.

I also tried async and ionice -c1 'pidof kjournald', and it doesn't seem to improve the latency measured by fsync-tester.

(In reply to comment #280)

Revision history for this message
In , Adriaan.van.Kessel (adriaan.van.kessel-linux-kernel-bugs) wrote :

@ #286: no, the msync.c MS_[A]SYNC behaviour is IMHO _not_ related.
With the introduction of the Unified (disk) Buffer Cache, msync(MS_ASYNC) became basically a no-op. Every process will see the same contents for a block, whether it uses read() or mmap() to access it. Other unices (without UBC) may behave differently. For MS_SYNC, the situation is more complicated. (IIUC: it is hard to wait for all pages to have been written if other processes may re-dirty them simultaneously)

This bug / issue is not about throughput, it is about latency and (lack of) responsiveness (of other, unrelated processes).

BTW, to me it seems there are actually two symptoms:
1) initially, the mouse cursor is stuck ("stuck/jerky mouse syndrome")
2) later on, the cursor gets quicker, but the actions (pop-ups, window focus, ...)
   are still slow.

(1) can be associated with CPU scheduling, unix-domain socket-I/O, maybe even pagefaulting of X's code segments.
(2) can be associated with CPU scheduling, pagefaulting of code, or memory shortage ( -->> pagefaulting + induced writing of dirty pages)

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

File system: xfs (mounted with default options)
dd if=/dev/zero of=test.img bs=572041216 count=1

Kernel 2.6.28.8

# time (cp test.img test1.img && sync)
real 0m7.372s
user 0m0.021s
sys 0m1.152s

Kernel 2.6.29

# time (cp test.img test2.img && sync)
real 0m13.704s
user 0m0.016s
sys 0m1.060s

Revision history for this message
In , valentyn+_= (valentyn+-linux-kernel-bugs) wrote :

This bug has been present since at least 2.6.15, so it's older than the 2.6.18 (with question mark) reported in this bug.

@breezer:~$ { sleep 5; dd if=/dev/zero of=/tmp/bigfile bs=1M count=5000 conv=fdatasync ; } & /tmp/fsync-tester
[1] 4946
fsync time: 0.0188
fsync time: 0.0142
fsync time: 0.0142
fsync time: 0.0142
fsync time: 0.0143
fsync time: 9.2283
fsync time: 12.0892
fsync time: 11.9867
fsync time: 17.6123
fsync time: 13.5469

I've seen sync times up to 20 seconds.

This is Ubuntu 6.06LTS, 2.6.15-53-686 kernel. I am seeing this behaviour on various machines with different hardware. It is a real problem for NFS servers in combination with clients that run Firefox 3.

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

Anyone willing to do some before and after tests? It looks like the huge filesystem thread has produced some results, and latency during large writes should be much better now with 2.6.30-rc1 + Theodore Ts'o's ext3 latency fixes.

http://lkml.org/lkml/2009/4/8/760

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

Look at the difference in disk throughput when running with dirty_background_ratio=0 and dirty_ratio=0:
http://img9.imageshack.us/img9/811/fsyncgraph00.png

versus with dirty_background_ratio=40 and dirty_ratio=80:
http://img154.imageshack.us/img154/9427/fsyncgraph4080.png

Both images are graphs of vmstat output during this command:
dd if=/dev/zero of=bigfile bs=1M count=20k conv=fdatasync

I collected this data in single-user mode so no other processes were touching the disk.

Do note the fairly steady throughput in the first case, in stark contrast with the huge bursts at the beginning and end of the second case, and the slowness throughout it.

In case anyone missed the point, it took 55 seconds to write 20 GB with dbr=0,dr=0 and 593 seconds to write 20 GB with dbr=40,dr=80. For some reason, the page cache appears to be really gumming up the works.
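
For reference, a minimal sketch of how the two configurations above can be set for a test run (the values are the ones compared here; the settings take effect immediately and revert on reboot):

# sysctl -w vm.dirty_background_ratio=0
# sysctl -w vm.dirty_ratio=0
# dd if=/dev/zero of=bigfile bs=1M count=20k conv=fdatasync

and then the same dd again after:

# sysctl -w vm.dirty_background_ratio=40
# sysctl -w vm.dirty_ratio=80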

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Hi David,

I would be willing to do some before/after testing. But it may be a couple of days, at least, before I can. When is 2.6.30 going to be released?

Also, I have a local Linus git tree. How do I update it with the latest git, or do I have to re-clone the entire thing again?
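
(For what it's worth, an existing clone does not need to be re-cloned; assuming the remote is named origin, something like

$ cd linux && git pull origin master

will bring it up to date with Linus's latest tree.)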

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Before you run the comparison tests, you should ensure that you use the same journal mode with ext3. The default ext3 journal mode was changed to writeback.
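
A quick way to confirm the mode before and after a kernel switch (the mount point is just an example) is:

$ grep ' / ' /proc/mounts

which lists data=ordered or data=writeback among the ext3 mount options.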

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Is there a reliable testcase for the latency issue?

Revision history for this message
In , dixlor (dixlor-linux-kernel-bugs) wrote :

How about XFS, JFS, ReiserFS?

Revision history for this message
In , jgardiazabal (jgardiazabal-linux-kernel-bugs) wrote :

I'm using XFS, and I have the same latency problems.
I've been checking this thread, and testing the proposed ideas, without success.
If you want me to test something, I'll happily do it.

Cheers,

Jose

(In reply to comment #296)
> How about XFS, JFS, Reiserfs ???

Revision history for this message
In , bart (bart-linux-kernel-bugs) wrote :

Same here using XFS on a multi disk (8) volume and seeing high IO waits.

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

For anyone who wants to test, here's what to do:

1. Document latencies with current setup which is performing poorly.
2. Document latencies with 2.6.30-rc1 (which should be much better for most people - if you are using ext3, make sure you mount your filesystem with the same journalling mode, as the default has changed)

To document latencies, start a large streaming write:

# dd if=/dev/zero of=/tmp/bigfile bs=1M count=5000

And run Ted Ts'o's latency testing tool in parallel (grab/compile it from here: http://lkml.org/lkml/2009/3/24/227)

If you still have questions, read the last 50 or so comments to this bug for more information.
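
One way to run the two together from a single shell (the same form used in the follow-up comments below) is:

# dd if=/dev/zero of=/tmp/bigfile bs=1M count=5000 & ./fsync-tester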

Revision history for this message
In , dixlor (dixlor-linux-kernel-bugs) wrote :

(In reply to comment #299)
> For anyone who wants to test, here's what to do:

# uname -a
Linux amd64 2.6.29.1 #4 SMP PREEMPT Fri Apr 3 07:27:52 MSD 2009 x86_64 x86_64 x86_64 GNU/Linux

# cat /proc/meminfo | grep MemTotal
MemTotal: 4127376 kB

#cat /proc/cpuinfo | grep -i "Model name" | uniq

model name : Dual Core AMD Opteron(tm) Processor 265

# cat /proc/mounts | grep ' / '

/dev/sda2 / xfs rw,noatime,nodiratime,relatime,noquota 0 0

# hdparm -i /dev/sda | grep Model

Model=WDC WD1500AHFD-00RAR5, FwRev=21.07QR5, SerialNo=WD-WMAP43732535

/* Western Digital Raptor */

# dd if=/dev/zero of=./bigfile bs=1M count=5000 && ./fsync-tester
5000+0 records in
5000+0 records out
 5242880000 bytes (5,2 GB) copied, 69,7789 s, 75,1 MB/s

fsync time: 0.0076
fsync time: 0.0091
fsync time: 0.0436
fsync time: 0.0359
fsync time: 0.0359
fsync time: 0.0359
fsync time: 0.0358
fsync time: 0.0359
fsync time: 0.0359
fsync time: 0.0359
fsync time: 0.0359
fsync time: 0.0358
fsync time: 0.0359
fsync time: 0.0358
fsync time: 0.0359
fsync time: 0.0359
fsync time: 0.0359
fsync time: 0.0359

^C

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #300)
> # dd if=/dev/zero of=./bigfile bs=1M count=5000 && ./fsync-tester

That's supposed to be a single ampersand, which causes the dd process to start in the background so the fsync-tester process can run simultaneously with it.

Revision history for this message
In , dixlor (dixlor-linux-kernel-bugs) wrote :

(In reply to comment #301)
> ...to start in the background ...

dd if=/dev/zero of=./bigfile bs=1M count=5000 & ./fsync-tester;
[1] 5298
fsync time: 0.0266
fsync time: 0.7677
fsync time: 0.6938
fsync time: 0.5879
fsync time: 1.1956
fsync time: 0.9582
fsync time: 0.9866
fsync time: 1.1833
fsync time: 0.6964
fsync time: 0.9986
fsync time: 0.9624
fsync time: 0.9093
fsync time: 0.9999
fsync time: 0.4423
fsync time: 0.8406
fsync time: 1.0880
fsync time: 0.1754
fsync time: 0.9039
fsync time: 0.8727
fsync time: 0.1261
fsync time: 0.2749
fsync time: 0.8547
fsync time: 0.5241
fsync time: 0.8164
fsync time: 0.4006
fsync time: 0.6532
fsync time: 0.8521
fsync time: 0.4151
fsync time: 0.3384
fsync time: 0.3326
fsync time: 0.4330
fsync time: 0.5800
fsync time: 0.8854
fsync time: 0.5953
fsync time: 0.3899
fsync time: 0.6722
fsync time: 0.1056
fsync time: 0.5554
^C

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 20972
fsync tester kernel 17 - 30

I have tested the kernels 17, 18, 20, 28, 29, 29 (patched with http://bugzilla.kernel.org/attachment.cgi?id=20172) and 30 (f4efdd65b754ebbf41484d3a2255c59282720650), which should include the patches.

I got great results with the patched 29 kernel at the beginning, but bad results when executing the test again. Either this test case is not reliable, or my installation is changing parameters while switching kernels.

I have executed the two commands concurrently (comment #299).
dd if=/dev/zero of=./bigfile bs=1M count=5000 & ./fsync-tester

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

ASUS P5K
linux suse 2.6.29-53-default x86_64

# cat /proc/meminfo | grep MemTotal
MemTotal: 8196428 kB

# cat /proc/cpuinfo | grep -i "Model name" | uniq
model name : Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz

# cat /proc/mounts | grep ' /home '
/dev/sda3 /home xfs rw,attr2,noquota 0 0

# hdparm -i /dev/sda
Model=ST31000340AS /* Seagate SATA2 */
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6

~> dd if=/dev/zero of=./bigfile bs=1M count=5000 & ./fsync-tester
[1] 5346
setting up random write file
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 90.9677 s, 57.6 MB/s
done setting up random write file
starting fsync run
starting random io!
fsync time: 1.0965s
fsync time: 0.4574s
fsync time: 0.7729s
fsync time: 0.3746s
fsync time: 0.5232s
fsync time: 0.1928s
fsync time: 0.9374s
fsync time: 0.6353s
fsync time: 0.3625s
fsync time: 0.4970s
fsync time: 0.3150s
run done 11 fsyncs total, killing random writer
[1]+ Done dd if=/dev/zero of=./bigfile bs=1M count=5000

~> vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 1 13 0 38868 164 7778824 0 0 12 23407 959 1940 1 3 0 95
 1 13 0 47144 164 7770084 0 0 0 26260 1435 2732 2 3 0 95
 0 13 0 39740 164 7774280 0 0 60 30724 1534 2860 2 4 0 94
 0 13 0 41124 164 7776080 0 0 0 13888 1103 2038 2 3 0 95
 0 13 0 42460 164 7768056 0 0 0 52248 1320 2334 2 3 0 95
 1 13 0 40456 164 7776908 0 0 0 3028 1058 1934 2 3 0 95

While the test is running, working with the KDE graphical interface is impossible.

Revision history for this message
In , todorovic.s (todorovic.s-linux-kernel-bugs) wrote :

Just tried dd if=/dev/zero of=bigfile bs=1M count=20k conv=fdatasync on 2.6.30-rc2 and top still shows iowait of 70% to 90%, on ext3 filesystem.

Motherboard: Gigabyte M57SLI-S4
Distro: Slamd64 12.2

$ cat /proc/meminfo | grep MemTotal
MemTotal: 3089672 kB

$ cat /proc/cpuinfo | grep -i "Model name" | uniq
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+

sda:

 Model=WDC WD5000AAKS-00TMA0, FwRev=12.01C01, SerialNo=WD-WCAPW4009869
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6

I believe the ext3 partition was mounted with the data=writeback option, but I can reboot and confirm if it is important enough.

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

(In reply to comment #304)
> ASUS P5K
> linux suse 2.6.29-53-default x86_64

You're running a kernel that is known to have high write latencies, and it doesn't appear that your fsync latency test is running in parallel with the dd. With 8GB of RAM, you likely need to change your dd to write out at least 10GB of data instead of 5GB.

(In reply to comment #305)
> Just tried dd if=/dev/zero of=bigfile bs=1M count=20k conv=fdatasync on
> 2.6.30-rc2 and top still shows iowait of 70% to 90%, on ext3 filesystem.

Your system *should* show high iowait when you're stress testing it like that. If it doesn't, you're not writing to disk as fast as it can handle.

High iowait is normal and expected. It is not an indication of a problem.

What is not expected is high latency during those stress tests.

Ideally you should see sync latencies of less than a second - if latencies get higher than that you are likely using ext3 data=ordered or a broken kernel.

2.6.30-rc2 was just released - that should be used for future tests.

Revision history for this message
In , todorovic.s (todorovic.s-linux-kernel-bugs) wrote :

2.6.30-rc2

fsync-tester shows mostly < 1 second, except a few times when it goes just above 1 sec.

fsync time: 0.1964
fsync time: 0.2317
fsync time: 0.2923
fsync time: 0.0565
fsync time: 1.1033
fsync time: 0.2297
fsync time: 0.0124

fsync time: 0.0848
fsync time: 0.1049
fsync time: 0.6525
fsync time: 11.1130 <--- not sure what that was
fsync time: 2.2619
fsync time: 0.3535
fsync time: 0.1543
fsync time: 0.2699

Unfortunately, the load average shoots up, peaking at about 8 before I run out of space on the disk. System responsiveness is also affected, but I don't have a meaningful measurable quantity.

top - 21:41:06 up 16 min, 6 users, load average: 7.23, 5.93, 3.98
top - 21:42:19 up 17 min, 7 users, load average: 8.12, 6.53, 4.34

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 9 0 19428 10752 2681252 0 0 180 13957 1344 497 2 9 30 59
 1 8 0 20100 10780 2680416 0 0 0 47644 2883 1290 2 12 0 86
 0 9 0 18908 10816 2681888 0 0 0 22528 2819 858 2 11 0 88
 0 10 0 20116 10828 2680952 0 0 4 25080 2865 781 2 7 0 92
 0 9 0 18900 10844 2682280 0 0 4 32696 3496 835 0 11 0 90
 0 9 0 19040 10876 2681736 0 0 0 29936 3060 1064 1 10 0 89
 2 8 0 18880 10892 2680868 0 0 4 47736 2954 731 0 7 0 92
 0 9 0 18180 10920 2681448 0 0 0 44160 2723 971 0 13 0 87

/dev/sda4 /home ext3 rw,relatime,errors=continue,data=writeback 0 0

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Hi all!

I just ran the tests and obtained this:

######################################################
gad@ws-esp16:~$ ./kernel-test2.sh
Using current dir to do IO tests
####################
## System info
System: 2.6.28-11-generic i686
Tag: 2.6.28-11-generic
Memory MemTotal: 2060636 kB
CPU Model: model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
Running in .
Mounts:
---------------------
rootfs / rootfs rw 0 0
/dev/disk/by-uuid/ee364958-34b6-474e-8e54-9a9eaff56d12 / ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
---------------------
Sda info:
 Model=ST91608220AS , FwRev=3.ALE , SerialNo= 5MA4TF4V
####################
First Test: FsyncProblem

Starting
./test-2.6.28-11-generic-1
We have High IO PID 8949 running
We have fsync-tester with 8950 running...
fsync time: 0.1504
fsync time: 0.5174
fsync time: 0.3664
fsync time: 0.1727
fsync time: 0.2163
fsync time: 0.3080
fsync time: 0.3914
fsync time: 0.1766
fsync time: 0.4800
fsync time: 0.2304
fsync time: 0.4018
fsync time: 0.1159
fsync time: 0.4537
fsync time: 0.1837
fsync time: 0.3032
fsync time: 0.5013
fsync time: 2.0128
fsync time: 0.9343
fsync time: 0.3027
fsync time: 1.2761
fsync time: 0.7145
fsync time: 0.4678
fsync time: 2.0326
fsync time: 0.2019
fsync time: 0.5484
fsync time: 0.3867
fsync time: 0.0912
fsync time: 0.2040
fsync time: 0.3893
fsync time: 0.2703
fsync time: 0.3794
fsync time: 0.5449
fsync time: 0.7379
fsync time: 0.5957
fsync time: 0.6034
fsync time: 0.7915
fsync time: 1.0564
fsync time: 0.5795
fsync time: 0.4501
fsync time: 2.2850
fsync time: 8.1411
fsync time: 1.4754
fsync time: 1.3487
fsync time: 0.9896
fsync time: 0.6221
fsync time: 1.1703
fsync time: 0.2775
fsync time: 0.1842
fsync time: 0.3994
fsync time: 0.5275
fsync time: 0.3382
fsync time: 0.3295
fsync time: 0.6451
fsync time: 0.6803
fsync time: 1.2621
fsync time: 1.3397
fsync time: 0.3250
fsync time: 0.3182
fsync time: 0.3491
fsync time: 0.2745
fsync time: 0.3489
fsync time: 0.5478
fsync time: 0.6009
fsync time: 0.4482
fsync time: 0.3772
fsync time: 0.1414
fsync time: 0.2948
fsync time: 0.2228
fsync time: 0.3758
fsync time: 0.3091
fsync time: 0.2624
fsync time: 0.3526
fsync time: 0.0771
fsync time: 0.2078
fsync time: 0.1613
fsync time: 0.2265
fsync time: 0.2759
fsync time: 0.3231
fsync time: 0.3532
fsync time: 0.1200
fsync time: 0.2788
fsync time: 0.4866
fsync time: 0.2710
fsync time: 0.4107
fsync time: 0.4903
fsync time: 0.5680
fsync time: 0.1199
fsync time: 0.3397
fsync time: 0.3929
fsync time: 0.3373
fsync time: 0.4407
fsync time: 0.2629
fsync time: 0.2998
fsync time: 0.2175
fsync time: 0.3119
fsync time: 0.0971
fsync time: 0.1899
fsync time: 0.4977
fsync time: 0.4127
fsync time: 0.2498
fsync time: 0.8439
fsync time: 0.1513
fsync time: 0.1109
fsync time: 0.2506
fsync time: 0.3414
fsync time: 0.1470
fsync time: 0.0558
./kernel-test2.sh: line 84: 8949 Terminated dd if=/dev/zero of="$io_test_path/test-$info_tag-$i" bs=1M count=5000 oflag=direct
./kernel-test2.sh: line 86: 8950 Terminated ./fsync-tester "$io_test_path/test-$info_tag-$i.fsynctest"
./test-2.6.28-11-generic-1 deleted!
./test-2.6.28-11-generic-1.fsynctest dele...


Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

Created attachment 21007
Automatic test suite for this bug V3

This includes the fsync test

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

(In reply to comment #306)
> You're running a kernel that is known to have high write latencies, and it
> doesn't appear that your fsync latency test is running in parallel with the
> dd.

????????????????????????????
Most likely nobody knows about it. The bug status is 'NEW'.

> With 8GB of RAM, you likely need to change your dd to write out at least
> 10GB of data instead of 5GB.

OK (adding to the results in comment #304)

dd if=/dev/zero of=./bigfile bs=1M count=15000 & ./fsync-tester
...
fsync time: 2.3800
fsync time: 2.4295
fsync time: 2.4099
fsync time: 2.1599
fsync time: 2.0760
fsync time: 2.6152
fsync time: 2.1427
fsync time: 2.4893
fsync time: 2.3252
fsync time: 2.3208
fsync time: 2.4223
...
fsync time: 2.3710
fsync time: 1.3094
fsync time: 1.4473
fsync time: 2.7260
fsync time: 2.2739
fsync time: 2.2078
fsync time: 0.5446
15000+0 records in
15000+0 records out
fsync time: 1.5607
15728640000 bytes (16 GB) copied, 201,724 s, 78,0 MB/s

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 5 0 3930476 6108 3873852 0 0 0 74384 883 1632 1 4 0 94
 0 5 0 3864644 6108 3941216 0 0 0 64512 667 1088 1 5 0 93
 0 4 0 3788956 6108 4015088 0 0 0 73728 943 1738 2 5 0 93
 0 5 0 3735848 6108 4070376 0 0 0 53268 666 1181 1 5 0 94
 2 5 0 3671468 6108 4135384 0 0 0 65024 735 1277 1 4 0 94
 0 4 0 3590356 6108 4213988 0 0 0 77824 860 1590 2 5 0 93
 1 5 0 3524484 6108 4280384 0 0 0 64392 749 1495 1 4 0 94

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 21054
test case: Takes the time of mouse click events

All my results show a high probability of high latencies when there is high system time. Most posts were about high latencies during heavy IO over an SSH connection or with the X server. Both use a network/socket connection. The bug may be in the network stack and not in the IO scheduler or block layer.

Here my first test.
The "Example Network Job" test (Flexible IO Tester) shows a regression since 2.6.22.
(see the last test on http://global.phoronix-test-suite.com/?k=profile&u=ebird-3722-22013-9288 )

And here is the mouse click test. This test case shows exactly the same regression on all kernels and the same behaviour I have seen in a real environment.

It's !!not!! caused by the fsync bug.

The test case just clicks on a label and measures the time until the event arrives. It uses the platform's native input queue (see java.awt.Robot).

The test case is only a quick solution and has no error handling. It expects a factor as a parameter. A high factor like 40.0 means high sensitivity and produces a high probability of catching high latencies, but it also increases the probability of a missing precondition (no high CPU usage and no high system time) on current kernels. A value below 5.0 means poor sensitivity, which reduces the system time and the probability of capturing a high-latency event. These values may differ on other machines, as it has not been tested elsewhere.

To generate the high IO, I used the following commands, but copying a big folder (larger than memory) works too.
# for i in 1 2 3 4 5 6; do dd if=/dev/zero of=t-$i bs=1M count=1K & done

The error occurs with kernels 2.6.17, 2.6.18 and 2.6.20 only while the cache is filling up within the first five seconds.

kernel no IO high IO
2.6.17 max 160ms max 35ms (max 2.859s within the first 5 seconds)
2.6.18 max 152ms max 101ms (max 2.430s within the first 5 seconds)
2.6.20 max 164ms max 100ms (max 1.049s within the first 5 seconds)

2.6.27 max 46ms max 6.988s (during IO)
2.6.28 max 51ms max 3.778s (during IO)
2.6.29 max 99ms max 3.632s (during IO)
2.6.30-rc2 max 50ms max 4.993s (during IO)

Unable to run test on this kernel, because of missing preconditions.
2.6.22
2.6.30-rc2 (smp) max 3.624s (during IO)

An output like this, or no CPU usage, means the preconditions for the test are missing; reduce the factor.
> High total latency of last 19 events at 138.783s - total latency : 646ms

A factor below 5.0 means the test is not able to be run on this kernel.

P.S.
All tests were done on a kernel without SMP support to reduce multi-core scheduler differences, with a 250Hz timer and without CPU scaling.
On multi-core systems you should busy n-1 cores with a job like this.
# bzip2 -c /dev/zero >/dev/null &
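On a quad core, for example, a sketch of the same idea (3 here being n-1 for four cores) would be:
# for i in 1 2 3; do bzip2 -c /dev/zero >/dev/null & done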

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 21055
Complete test log

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Hi guys,

I have run my test script, which I ran with previous kernels. There is a pretty big increase in performance on 2.6.30-rc3. The biggest difference I noticed in my test output was that vmstat used to report large numbers (10) of "uninterruptible sleep" processes. Now it's down to about 1-4.

I saw some 9 and 10 second fsync latencies, but most were around 0.3 seconds, with some around 1-2 seconds.

However, I don't think the kernel is back to what it used to be yet. I never used to have problems with ext3 fsync latencies at all. It used to be that a simple file copy would not cause many latency issues in the responsiveness of my regular apps. In fact, generally speaking, I never noticed any problems when copying huge files. Now, when copying large files, I still get some choppiness, even with Ted's patches.

I'm wondering whether the real problem lies in the block IO layer, and not the filesystem layer.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

The reliability of the mouse click test case (Comment #311) can be improved by adding a random reading process.

# for i in 1 2 3 4 5 6; do dd if=/dev/zero of=t-$i bs=1M count=1K & done
# find / 2>%1 >/dev/null
# java MouseClickTester 40

I am able to catch latencies of up to 12 seconds with the 2.6.27 kernel (no SMP support). Is there a way to trace such a mouse click event in the kernel? It should be suspend/wait and resume.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

Kernel 2.6.30-rc2
Other info see comment #304

TEST 1
----------------------------------------------------------------------------
yura@suse:~> dd if=/dev/zero of=./bigfile bs=1M count=15000 & ./fsync-tester
[1] 4561
fsync time: 0.0401
fsync time: 2.4475
fsync time: 1.7808
fsync time: 1.1141
fsync time: 1.6912
fsync time: 1.0753
fsync time: 1.2931
fsync time: 0.3260
fsync time: 0.3653
fsync time: 0.5603
.....
fsync time: 1.3651
fsync time: 1.0479
fsync time: 1.0806
fsync time: 0.6021
fsync time: 0.4708
fsync time: 1.3952
fsync time: 0.6665
fsync time: 1.4431
fsync time: 1.0893
fsync time: 1.7844
fsync time: 0.6520
fsync time: 0.3665
fsync time: 0.8171
fsync time: 0.7537
fsync time: 1.2100
fsync time: 0.9319
fsync time: 1.1578
fsync time: 1.1377
fsync time: 1.4913
fsync time: 1.0317
fsync time: 0.5870
fsync time: 1.8464
fsync time: 1.4770
fsync time: 1.3934
fsync time: 1.3794
fsync time: 0.7868
15000+0 records in
15000+0 records out
15728640000 bytes (16 GB) copied, 172.839 s, 91.0 MB/s
^C

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 4 0 6189644 808 1572324 0 0 4 116748 1585 1548 2 26 6 67
 1 3 0 6098828 808 1663460 0 0 0 84472 973 1538 2 7 0 91
 0 4 0 6011692 808 1749652 0 0 0 88416 722 1248 2 6 0 92
 0 3 0 5915592 808 1844204 0 0 0 95232 996 1668 1 7 0 92
 1 4 0 5834692 808 1925564 0 0 0 77832 672 838 1 6 0 93
 0 4 0 5755452 808 2005900 0 0 0 79872 940 1472 1 5 0 93
 1 2 0 5664856 808 2096760 0 0 0 88744 746 1316 1 6 0 92
 0 4 0 5574556 808 2185520 0 0 0 86368 802 1286 1 6 0 93
 0 3 0 5492072 808 2268036 0 0 0 81408 785 1112 1 6 0 93
 0 4 0 5412744 808 2347624 0 0 0 78344 926 1400 1 5 0 93
 0 3 0 5333768 808 2428624 0 0 0 78848 659 1046 1 5 50 43
 0 4 0 5245744 808 2516336 0 0 0 86536 992 1526 1 6 50 42
 0 4 0 5153952 808 2605988 0 0 0 89088 947 4596 4 7 48 41
 0 3 0 5074720 808 2686532 0 0 0 78336 958 1768 1 6 49 43
 0 4 0 4974280 808 2787192 0 0 0 92198 706 1028 1 7 20 72
 0 3 0 4897224 808 2862716 0 0 0 80905 1046 1650 1 5 49 45
 0 4 0 4819832 808 2940944 0 0 0 77348 1193 2076 1 6 0 93
 1 2 0 4730172 808 3031732 0 0 0 82104 733 1020 1 6 1 91
 0 3 0 4648668 808 3112676 0 0 0 86864 994 1674 1 6 50 42
 1 3 0 4556864 808 3203828 0 0 0 87232 708 1136 2 6 49 43

TEST 2
----------------------------------------------------------------------------
yura@suse:~> dd if=/dev/zero of=./bigfile2 bs=1M count=15000
15000+0 records in
15000+0 records out
15728640000 bytes (16 GB) copied, 174.036 s, 90.4 MB/s

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 1 3 ...


Revision history for this message
In , bob+kernel (bob+kernel-linux-kernel-bugs) wrote :

This absolutely cannot be an ext3 bug. I'm using reiserfs for my root, and it happens here too. The system totally locks up with a swap storm when memory pressure starts forcing things into swap. Firefox using > 2GB of memory and a Wine memory bug that causes it to report ~4GB VIRT are what trigger it for me. Killing either one fixes the storm (which is often not possible because the keyboard/mouse are unresponsive). The machine has 4GB RAM, 4GB swap.

It must be in the block layer, or elsewhere.

It also seems to happen with swap *off*.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Bob, (In reply to comment #316)
> This absolutely cannot be an ext3 bug. I'm using reiserfs for my root, and
> it happens here too. System totally locks up with a swap storm when memory
> pressure starts forcing things into swap. Firefox using > 2GB memory, and a
> wine memory bug which causes it to report ~4GB VIRT are what triggers it for
> me. Killing either one fixes the storm. (which is often not possible because
> keyboard/mouse are unresponsive) Machine has 4GB RAM, 4GB swap.
>
> It must be in the block layer, or elsewhere.
>
> It also seems to happen with swap *off*.

Bob, what exact symptoms are you seeing? There is another issue in the kernel, which I have been unable to reproduce for the kernel devs. I have seen numerous occasions where the kernel has "futex" deadlocks. It is possible that yours could be related to that.

That's because the performance problem in this bug does not cause a complete lockup. It may seem that way for a bit, but if you leave the machine, it will eventually recover. The futex one appears to be a complete deadlock; no matter how long I leave it, it never recovers.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

I recently experienced a new (for me) condition wherein this bug reared its ugly head, and it actually did not involve high disk throughput. I was running mencoder, which was pegging three of my CPU cores and using a fair share of the fourth. It was reading from a file on my RAID and writing to a file on a tmpfs, not particularly quickly on either end since it was doing a lot of number crunching in between. The bug cropped up when I started an rsync at the same time, sending some files from my RAID to a remote system, again not particularly quickly (my upstream network bandwidth is only about 80 KB/s). So I wasn't stressing the disk at all, yet my system came to a crawl. I could literally watch windows repainting themselves on expose events. Pressing Ctrl+Alt+Delete to bring up the KRunner process list took at least a minute, if not more. My disks were churning an awful lot, which was odd given the quite low demands I should have been placing on them. I thought maybe the input file to mencoder might have been heavily fragmented, but I ran xfs_fsr on it, and it said it only had 4 extents. Something is seriously FUBAR here.

A possible theory: forcing the disks to seek back and forth to read from the two files "simultaneously" meant that the majority of the time was spent waiting for disk seeks. If the kernel was holding a big lock while waiting for those seeks, it could have seriously degraded the performance of the rest of the system.

Revision history for this message
In , bob+kernel (bob+kernel-linux-kernel-bugs) wrote :

The bug I'm seeing is extremely reproducible. (I just wait for about a day with Firefox running and lots of tabs open, and it will happen.) As I mentioned, it occurs when memory pressure starts forcing things into swap. This is not a hard lockup, and the system will eventually recover (where "eventually" can be > 30 minutes).

updatedb and trackerd also make my system unusable, as reported above. I have disabled them as a consequence...

Given that I can trigger it, I can run jobs in the background that could log something useful...locks? fsync? What do you suggest?

(This system has a quad core intel and raid5 root as well -- don't know if that's related)

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Matt, (In reply to comment #318)
> I recently experienced a new (for me) condition wherein this bug reared its
> ugly head, and it actually did not involve high disk throughput.

Yes, that is one of the reasons that I believe there is more to it than just ext3 fsync improvements; it doesn't always take a lot to make it happen.

Matt, do these things happen on 2.6.30-rc3? I've seen my issues almost disappear with this release. It's still not completely gone, which indicates to me that they haven't quite hit the nail on the head. But it certainly is WAY better.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #320)
> Matt, do these things happen on 2.6.30-rc3?

I'm not willing to run a pre-release kernel. In fact, the kernel is the only package on my Gentoo system that I intentionally maintain at the "Gentoo stable" level, rather than at the leading edge. This is mostly because I don't want to have to reboot every time a new patch set comes out. Right now I'm running 2.6.28-gentoo-r5, which is based on 2.6.28.9.

If this bug is indeed improved upon in 2.6.30, then I look forward to the release of 2.6.31! :)

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

(In reply to comment #317)

>
> Bob, what exact symptoms are you seeing? There is another issue in the
> kernel, to which I have been unable to reproduce for the kernel devs. I have
> seen it numerous times where the kernel has "futex" deadlocks. It is
> potentially possible that yours could be related to that.

Trenton, could you please point me to the bug report for this issue you are speaking of?

Revision history for this message
In , unggnu (unggnu-linux-kernel-bugs) wrote :

I am using Ubuntu 9.04 with a 2.6.30-rc3 x86_64 kernel and I can confirm the whole behaviour.
The irony is that it feels like Windows 95 while a floppy was being formatted. You know, the whole pseudo multitasking on top of DOS - everything was really choppy.
An easy test case is to set up two LUKS-encrypted partitions and copy from one to the other. Even if no core is under heavy load, everything is slow. The same happens with USB transfers too.
But as Matt Whitlock pointed out, it is not always a disk IO problem. It can also happen under higher CPU usage. If I encode a DVD with ogmrip/mencoder h264 and 16 threads (16 threads get the highest CPU usage out of my quad core, which is still under 80% per core), Gnome feels like a formatting Win 95.
The latter problem has become less severe with 2.6.30-rc3, but it is still noticeably slow, which makes no sense since no core is at 100% load.
For comparison with how it could work: if I fire up Prime95 with 100% load on every core in Windows Vista, I can still play modern 3D games without lagging. Windows of course also has flaws with IO and so on, but the CPU multitasking works really well. Way to go, imho.

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

FWIW, I've tried the test proposed by Thomas in comment 314:
# for i in 1 2 3 4 5 6; do dd if=/dev/zero of=t-$i bs=1M count=1K & done
# find / 2>%1 >/dev/null
(the Java part did not start for some reason)

I ended up force-rebooting my laptop, since it was impossible to control *after a few seconds*. I could only switch to a VT and back to X, but very slowly, and I couldn't even type a character there or in X. I have 500MB of RAM with a swap of the same size, Pentium M 1500 MHz: not a very powerful configuration, but it should be sufficient to work, shouldn't it? :-) This was with 2.6.28; I'll try with 2.6.30rc2.

Revision history for this message
In , rockorequin (rockorequin-linux-kernel-bugs) wrote :

My system also locks up when it tries to access swap. This is on Ubuntu Jaunty with both the Ubuntu 2.6.28 kernel and Ubuntu's vanilla 2.6.30.rc3 kernel. This machine has 4GB of RAM and 4GB of swap and is running on a root ext4 partition.

My test case is to run multiple VirtualBox VMs (eg Jaunty installations) with say 1.4GB of RAM assigned to each. When I run the third one, as soon as the kernel starts to hit swap, it thrashes the hard drive, X rapidly becomes unresponsive and I have to hard reset the machine. I am able to move the mouse (slowly), but clicking on individual windows doesn't work and the keyboard doesn't respond. atop -d manages to update itself as far as about 300MB of swap use and then stops updating.

I've left it as long as 15 minutes to see if it will recover, but it doesn't.

Revision history for this message
In , unggnu (unggnu-linux-kernel-bugs) wrote :

(In reply to comment #325)
> My system also locks up when it tries to access swap.
> My test case is to run multiple VirtualBox VMs (eg Jaunty installations) with
> say 1.4GB of RAM assigned to each. When I run the third one, as soon as the
> kernel starts to hit swap, it thrashes the hard drive, X rapidly becomes
> unresponsive and I have to hard reset the the machine.
There are definitely some huge issues with the kernel, but I think this is not one of them. If your applications try to use more RAM than is available and keep trying to access/reserve that memory, which is likely with VirtualBox, no other OS would operate fine either. Of course it should be possible to switch to a console and run some commands, but that has nothing to do with this report, I think.

Btw, I forgot to mention that I don't use swap.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #324)
> I ended force-rebooting my laptop, since it was impossible to control
> *after a few seconds*.

It's an extreme test case, as it generates a very high load. You can try with only two concurrent write processes, since your machine is PATA, only 1.5 GHz and single core. And start the Java test case at the beginning; the order was switched before (a long day).

# java MouseClickTester 40

# for i in 1 2; do dd if=/dev/zero of=t-$i bs=1M count=1K & done
# find / 2>%1 >/dev/null

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Little correction.

# java MouseClickTester 40

# for i in 1 2; do dd if=/dev/zero of=t-$i bs=1M count=1K & done
# find >/dev/null 2>&1

Revision history for this message
In , rockorequin (rockorequin-linux-kernel-bugs) wrote :

(In reply to comment #326)
>There are definitive some huge issues with the Kernel but I think this is not
>one of them. If your applications try to use more ram than it is available and
>always trying to access/reserve this mem which is likely with Virtualbox every
>other OS wouldn't operate fine anymore. Of course it should be possible to
>switch to console and run some commands but this has nothing to do with this
>report I think.
>Btw. I forgot to mention that I don't use a swap.

@unggnu: this is not a kernel issue?! If multiple apps are trying to reserve more RAM than is available and thus causing continuous access to swap, the kernel should NOT become completely unresponsive and require a hard reset, risking data loss or, in the case of a remote server that you can't hard reset, denial of service. Surely the memory management system should be able to recognise this condition and take appropriate action, eg freeze one or more processes with high RAM requirements.

At the VERY least it should allow an operator to kill off offending processes, but this is impossible because you can't even login via ssh or access a console. This is where the test case is relevant to this bug - if the system didn't become completely unresponsive, the operator could fix the problem without a hard reset.

Revision history for this message
In , drees76 (drees76-linux-kernel-bugs) wrote :

IMO, this bug has long passed the point where it is useful.

There are far too many people posting with different issues.

There is too much noise to filter through to find a single bug.

There aren't any interested kernel developers following the bug.

The bug needs to be closed and reopened with separate bugs for each issue. Each issue should be reproducible with the latest 2.6.30-rc kernel with a simple test case.

Anything else will just result in another huge bug with 300+ comments and no kernel developer interest.

(In reply to comment #329)
> @unggnu: this is not a kernel issue?!!! If multiple apps are trying to
> reserve more RAM than is available and thus causing continuous access to swap

It is not a kernel issue. It is a system configuration issue. If you have half a dozen large-memory processes fighting for more memory than is available in the system, causing each of those processes to be continuously swapped in and out as they fight to run, you're going to get horrible performance.

You either need more memory, less swap (so that the OOM killer can kill a process), or to avoid running so many large-memory processes in parallel.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #330)
> IMO, this bug has long past the point where it is useful.

Even I (the reporter) have more or less stopped tracking this bug. I absolutely agree.

> There are far too many people posting with different issues.
>
> There is too much noise to filter through to find a single bug.
>
> There aren't any interested kernel developers following the bug.

I would definitely agree; the bug has long outlived its usefulness. Closing with INSUFFICIENT_DATA.

> The bug needs to be closed and reopened with separate bugs for each issue.
> Each issue should be reproducible with the latest 2.6.30-rc kernel with a
> simple test case.

Absolutely, all of you who have commented on this bug thus far should open new bugs. While I can't stop anyone from opening bug reports, it is likely that any report without a definite test case reproducing the issue will turn into yet another grab-bag like this one.

Revision history for this message
In , simon+kernelbugzilla (simon+kernelbugzilla-linux-kernel-bugs) wrote :

Having tracked bugs 7372 and 12309 on the primary issue (performance hitting a brick wall with heavy IO) since October 2007, and now facing the prospect of needing to track yet another one, can I make a plea that whoever opens the new one(s) posts a reference to the new bug ID(s) in this thread?

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

Thomas: thanks for that update, and indeed the second and more reasonable testcase does not completely kill the system. I'm seeing a possibly interesting phenomenon: the testcase does not trigger any hang when run alone, but when Firefox is started, I can see swap usage rise, and then the mouse won't move for about a second from time to time.

So my guess is that when the system needs to swap, even for only a few MB, it's not able to do that smoothly for the user. Maybe there's a scheduling problem when the kernel needs to choose whether to give priority to swap or to the root partition. Or it's simply because writing to quite remote places on the disk leads to high latencies. Would that be worth a new bug? I think a few of us here are experiencing this problem.

I generally agree that this bug is not leading anywhere, but at the same time we don't even know how many different issues there are, so opening new reports is problematic too. Maybe we could concentrate on the few cases we're best able to describe precisely, and hope we all suffer from those...

Revision history for this message
In , Lukasz.Kurylo (lukasz.kurylo-linux-kernel-bugs) wrote :

I found this report after I had another "freeze". Just before the freeze, free memory was running out, swap was barely used, buffers were a few hundred kB, BUT the cache was over 2.7GB out of 3GB of total memory. After about 20 minutes I managed to switch to VT1, and there was now about 500MB of free memory, less cache and increased swap usage. The last output of top showed the kswapd process kicking in.

Googling gave me this thread:
http://lkml.indiana.edu/hypermail/linux/kernel/0311.3/0406.html

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Lets summarize the bugs.

- High cache usage during write process enforce swapping of processes
  Patch in comment #160 works, but is not included in the linux tree.

- Fsync Bug in Ext3
  (There is a test case and a activity)

- Too high prioritization of heavy writing processes
  (Copying a big file can delay the start of a program until the copy
   operation finishes)

- Missing read and write based scheduler

And finally the annoying bugs
- Low gui responsiveness during heavy IO
  A reliable test case is still missing.
  - The test case in comment #311 shows high click latencies 2-12s
    during heavy io on non smp kernels
    (on smp kernels too, but it's not easy to catch such an event)
  - I have a socket ping-pong test (not submitted), which shows latencies of
    ~2s after the writing processes are finished

- Low gui responsiveness in virtual machines
  no test case
  maybe the same bug as the "Low gui responsiveness during heavy IO" bug

The GUI responsiveness issues are not deterministic; there may be a day with nearly no latencies and then an hour with continuous latencies of up to 60 seconds.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Does anybody know why the caches are not dropped after I echo 3 to drop_caches? Ideally I would expect that number to come down to 0, or at least to a few megs in practice. What I see is that after some usage of the system, the caches keep increasing and never go down with drop_caches. The graph is ever increasing, almost like a cache leak. Has anybody debugged this aspect? I think this is one of the primary reasons for the slowdown, because memory is locked in the caches and new memory requests swap the crap out of the system.

Revision history for this message
In , Lukasz.Kurylo (lukasz.kurylo-linux-kernel-bugs) wrote :

In the case of GUI responsiveness, iotop showed relatively high IO (reads) on the X process after the freeze. Maybe X's poor responsiveness is caused by waiting for IO as well.

Revision history for this message
In , Lukasz.Kurylo (lukasz.kurylo-linux-kernel-bugs) wrote :

The interesting thing is the cache usage and the inability to drop most of it. From my understanding, memory cache can be dropped if it's not dirty (has been written back to disk); this brought me to this thread about lack of writeback:
http://marc.info/?l=linux-kernel&m=113919849421679&w=2

On the other hand, /proc/meminfo shows only ~160kB of dirty memory. Cache shows 880868 KB. echo 3 > /proc/sys/drop_caches doesn't do anything. So why can't the cache be freed? Is it possible to have a cache leak?

Revision history for this message
In , Lukasz.Kurylo (lukasz.kurylo-linux-kernel-bugs) wrote :

Looks like drop_caches stopped working as expected somewhere around 2.6.18:
look at the first comment:
http://jons-thoughts.blogspot.com/2007/09/tip-of-day-dropcaches.html

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

Be careful using drop_caches. I actually managed to cause a kernel crash by using it in combination with a removable medium. I think it was a double-free bug, but I don't remember for certain.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

It has been mentioned time and again that none of the kernel devs have gotten a concise description of the problem and hence none of them seems to have any answers. Well, does anybody know why my caches show 700MB in a 2GB machine and why I can't get rid of any of it? I don't think the question can get any more precise. This is the heart of the problem, folks.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #341)
> It has been mentioned time and again that none of the kernel devs have gotten
> a concise description of the problem and hence none of them seems to have any
> answers. Well, does anybody know why my caches show 700MB in a 2GB machine and
> why can't I get rid of any of it? I don't think the question can get any more
> precise. This is the heart of the problem, folks.

I don't understand why you'd assume that cache is a problem. The kernel uses available RAM as cache as it's the most productive use for it. To assume that this is buggy behavior is extremely misled logic.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #342)
> (In reply to comment #341)
> > It has been mentioned time and again that none of the kernel devs have
> > gotten a concise description of the problem and hence none of them seems to
> > have any answers. Well, does anybody know why my caches show 700MB in a 2GB
> > machine and why can't I get rid of any of it? I don't think the question can
> > get any more precise. This is the heart of the problem, folks.
>
> I don't understand why you'd assume that cache is a problem. The kernel uses
> available RAM as cache as it's the most productive use for it. To assume that
> this is buggy behavior is extremely misled logic.

What's buggy is that it's not ready to relinquish it when asked to drop it or when it's needed. echo 3 to drop_caches should drop the damn thing. If I configure swappiness=1, the cache should be dropped first before the swap disk is used. I don't like it locking 700MB out of my 2GB of RAM and then swapping heavily. If this behavior is by design, someone needs to change that design.
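
For reference, a minimal sketch of the sequence being discussed (the initial sync matters: only clean page-cache pages can be dropped, so dirty pages have to be written back before they become reclaimable):

# sync
# echo 3 > /proc/sys/vm/drop_caches
# sysctl -w vm.swappiness=1
# grep -E 'Cached|Dirty|Writeback' /proc/meminfo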

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

Kernel 2.6.30-rc3
If a task is using the processor at ~30 percent and filesystem work (cp, mv, rm) is running at the same time, the computer dies; at that point starting anything else is simply not realistic.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

The kernel 2.6.30-rc4 is no better.
This bug, "Large I/O operations result in slow performance and high iowait times", has passed from status NEW into some unclear state, but iowait is as high as it ever was. Stop these frauds. What data is still necessary?

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #346)
> The kernel 2.6.30-rc4 is not better.
> This bug "Large I/O operations result in slow performance and high iowait
> times" has passed from the status NEW in not clear state but iowait both was
> high and remained. Stop these frauds. What data is still necessary?

No. Kernel folks will not "stop these frauds". Is your system using the latest DDR3 memory running at 2000MHz? Is it using a Core i7-based processor, overclocked to 4.5GHz? Does it have SSD drives with at least 150MB/s writes? Are you using ext4 yet? If all of these are true, and your system still hangs, only then will the kernel devs "stop these frauds" and fix this bug. Until then, just use Vista (make sure to upgrade to SP1)... :D
...

...
In case you couldn't tell, I was just kidding with you! Please file a separate bug report with specific details about what you are experiencing on your system.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

Terminal 1 (no other active task)
:~/x1> time cp -r qt-x11-opensource-src-4.5.1 qt-x11-opensource-src-4.5.1-1

real 5m51.075s
user 0m0.147s
sys 0m2.192s

302.6 MB / 351 s = 0.9 MB/s

Terminal 2
:~/x1> vmstat 1

<- only cp
0 0 0 4916172 808 2774512 0 0 24 16248 794 1469 2 3 95 0
2 0 0 4915228 808 2776340 0 0 24 2180 959 1385 2 1 97 0
1 0 0 4913492 808 2778140 0 0 24 3144 841 1251 1 1 97 0
0 0 0 4912500 808 2779104 0 0 24 2636 679 936 2 1 97 0
1 0 0 4910516 808 2781112 0 0 32 2804 862 1258 2 1 96 0
0 1 0 4908872 808 2781812 0 0 36 27160 749 913 5 2 91 2
<- entering a folder in Dolphin (100 files in this folder)
2 0 0 4907012 808 2783712 0 0 48 2615 1108 1563 3 2 82 12
0 1 0 4906020 808 2784728 0 0 80 3248 890 1274 2 1 64 33
0 1 0 4905648 808 2785164 0 0 56 2933 705 920 3 1 67 28
0 1 0 4904828 808 2786028 0 0 84 2600 884 1240 2 1 49 47
0 1 0 4903456 808 2787400 0 0 44 3148 723 873 3 1 62 35
0 1 0 4902084 808 2788572 0 0 64 2681 1177 2604 3 1 49 47
0 1 0 4901464 808 2789284 0 0 48 2328 952 1407 2 1 63 34
0 1 0 4900556 808 2790416 0 0 36 2624 951 2373 4 1 59 35
1 1 0 4898040 808 2792868 0 0 60 2672 1224 4018 8 3 46 43
0 1 0 4897032 808 2793868 0 0 80 2760 693 1004 2 1 49 47
0 1 0 4895552 808 2795304 0 0 28 2459 1029 1495 2 1 81 15
0 1 0 4894552 808 2796408 0 0 84 2744 877 1279 2 1 49 47
0 1 0 4892700 808 2798272 0 0 76 2204 773 1078 4 1 48 47

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #348)

You need to open a new bug report with a thorough explanation of your test case, expected and observed results, and any pertinent data you may have collected. Leave a note here referencing your newly created bug but posting any data here is not going to help anyone. This bug is closed due to a lack of focus.

Revision history for this message
In , Lukasz.Kurylo (lukasz.kurylo-linux-kernel-bugs) wrote :

I need to get back to 2.6.17, I can't work like this! I have 3GB of RAM, of which >2GB is used by cache that won't drop even when memory is running out.

Revision history for this message
In , akpm (akpm-linux-kernel-bugs) wrote :

(In reply to comment #349)
> (In reply to comment #348)
>
> You need to open a new bug report with a thorough explanation of your test
> case, expected and observed results, and any pertinent data you may have
> collected. Leave a note here referencing your newly created bug but posting
> any data here is not going to help anyone. This bug is closed due to a lack
> of focus.

yup.

Guys, problems like this aren't solved very effectively via bugzilla.

Please prefer to report these issues via email to linux-kernel and
myself and any developers who you think might be relevant. It's confusing,
and clarity is important. Being able to provide a means by which others
can demonstrate the problem is a huge benefit.

Revision history for this message
In , trent.bugzilla (trent.bugzilla-linux-kernel-bugs) wrote :

Why am I still being CC'd on this bug, even though I'm not on the CC list?

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #352)
> Why am I still being CC'd on this bug, even though I'm not on the CC list?

Maybe you're watching Jens Axboe (the assignee), Ben Gamari (the reporter), or another user who is still in the CC list.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

kernel 2.6.30-rc6
yura@suse:~> export LANG=en
yura@suse:~> dd if=/dev/zero of=test1 bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 129.928 s, 80.7 MB/s

yura@suse:~> vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 3 8 0 44708 0 7628596 0 0 249 12283 362 821 4 3 66 28
 0 8 0 49180 0 7627532 0 0 0 40517 1061 1581 7 4 0 89
 0 7 0 47188 0 7627180 0 0 0 59694 1156 1505 5 6 0 89
 1 6 0 46692 0 7628276 0 0 0 55553 1144 1476 6 5 0 90
 0 8 0 46428 0 7628160 0 0 20 51573 900 1096 5 4 0 90
 0 7 0 46568 0 7627860 0 0 0 64024 1127 1480 5 5 0 90
 0 7 0 45796 0 7629100 0 0 12 44597 889 987 6 4 0 90
 0 8 0 46904 0 7627808 0 0 332 40500 1100 1485 6 4 0 90
 0 7 0 47772 0 7626884 0 0 168 45300 1158 1628 6 4 0 90
 0 8 0 47216 0 7624456 0 0 72 67116 958 1151 5 5 0 90
 0 7 0 47032 0 7626480 0 0 280 29244 1177 1667 5 4 0 91
 0 7 0 45936 0 7626640 0 0 248 58872 922 1060 6 5 0 89
 0 9 0 44988 0 7626640 0 0 216 62492 945 1359 2 6 0 92
 0 8 0 47548 0 7625932 0 0 152 47164 926 1425 1 4 0 95
 1 6 0 45276 0 7627256 0 0 36 54721 605 1089 2 4 0 94
 0 7 0 48208 0 7626388 0 0 44 43612 834 1198 1 4 0 95
 0 8 0 47096 0 7625644 0 0 132 53789 655 1156 1 4 0 94
 0 7 0 46344 0 7624828 0 0 468 50292 981 2089 2 4 0 94
 0 8 0 46576 0 7625416 0 0 116 44056 1155 2119 1 3 0 96
 0 8 0 47476 0 7624800 0 0 636 38936 734 1125 2 4 0 94
 0 8 0 47348 0 7626676 0 0 32 58410 885 1613 1 5 0 93
 1 6 0 48508 0 7626280 0 0 0 67256 623 969 1 4 0 94
 0 7 0 47984 0 7625328 0 0 0 64888 694 1335 2 6 0 92
 0 7 0 45800 0 7626692 0 0 0 62496 1002 1698 1 4 0 95
 0 7 0 48220 0 7625052 0 0 0 61952 614 1222 2 5 0 93
 0 7 0 48508 0 7623300 0 0 0 69632 890 1586 1 5 0 94

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

Continuing from #354, with the same bigfile:
yura@suse:~> time cp bigfile bigfile.cp

real 5m52.457s
user 0m0.343s
sys 0m21.356s

calculated speed => 10485760000 bytes / 352.457 s = 29.75 MB/s

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 1 0 0 46688 0 7686820 0 0 0 12 564 862 1 0 98 0
 0 0 0 46688 0 7686820 0 0 20 0 387 730 2 0 96 1
 0 0 0 46688 0 7686840 0 0 0 0 559 879 1 0 98 0
 0 0 0 46688 0 7686840 0 0 0 0 598 937 1 1 97 0
 0 0 0 46688 0 7686840 0 0 0 0 315 517 2 1 98 0
 0 0 0 46704 0 7686840 0 0 0 16 600 1058 2 1 97 0
 0 0 0 46704 0 7686840 0 0 0 0 328 473 2 0 98 0
 0 0 0 46704 0 7686840 0 0 0 0 610 1122 2 0 98 0
 0 0 0 46704 0 7686840 0 0 0 0 582 1013 2 0 98 0
 0 0 0 46876 0 7686840 0 0 0 1 341 475 1 0 98 0
 0 0 0 46876 0 7686840 0 0 0 0 577 988 2 0 98 0
 0 0 0 46876 0 7686840 0 0 0 0 339 543 2 1 97 0
start cp
 3 0 0 46500 0 7686704 0 0 17500 0 857 2379 2 2 91 5
 3 0 0 43840 0 7689132 0 0 90624 0 2119 5710 4 11 61 24
 0 1 0 46532 0 7686180 0 0 83968 0 2008 5246 8 11 57 24
 1 1 0 43884 0 7689020 0 0 81024 46 2159 8097 6 10 59 25
 0 1 0 45020 0 7687772 0 0 81920 1 1759 3732 4 10 60 26
 0 1 0 44948 0 7687472 0 0 91264 0 2154 4449 4 10 60 25
 0 1 0 43924 0 7688888 0 0 88064 0 2040 4500 3 11 60 26
 0 1 0 46180 0 7686288 0 0 89984 0 1919 4107 3 11 63 22
 0 2 0 44692 0 7680932 0 0 86784 39184 2156 4820 4 12 47 38
 0 2 0 44568 0 7681376 0 0 64384 22436 1569 4127 3 7 35 54
 0 2 0 44092 0 7682832 0 0 35584 37396 1331 2886 3 5 37 55
 4 2 0 46920 0 7678572 0 0 42624 43336 1544 3311 3 6 28 63
 0 2 0 45724 0 7679280 0 0 49792 31240 1301 3076 2 6 27 64
 0 2 0 45328 0 7681288 0 0 41856 31648 1473 3322 3 5 27 65
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 2 0 46328 0 7679272 0 0 52480 29232 1425 3345 3 8 33 56
 1 2 0 46276 0 7679748 0 0 48768 24844 1564 3539 3 6 32 60
 1 2 0 47196 0 7678088 0 0 63360 28688 1830 4781 3 9 14 74
 5 3 0 44052 0 7681744 0 0 58112 23612 1493 3905 3 8 5 83
 1 2 0 44988 0 7679956 0 0 18560 53021 1107 2129 2 4 0 94
 0 4 0 46872 0 7677272 0 0 55808 22541 1478 4117 3 7 1 89
 0 4 0 43824 0 7681360 0 0 52608 33800 1627 4628 3 7 0 89
 0 4 0 45536 0 7679600 0 0 41856 31720 1491 4026 3 6 2 89
 0 4 0 45688 ...

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

Bug 12309 - Large I/O operations result in slow performance and high iowait times

Where is the low iowait?
Where are the small I/O operations that also suffer?
And where does Status: RESOLVED INSUFFICIENT_DATA come from?

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :
Revision history for this message
In , rafal (rafal-linux-kernel-bugs) wrote :

There is ongoing discussion about similar issue:
http://lkml.org/lkml/2009/5/15/320
and
http://lkml.org/lkml/2009/5/16/23

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :
Download full text (3.9 KiB)

Confirm bug

My OS is Fedora Core release 6
Kernel: 2.6.22.14-72.fc6
2 CPUs: Intel® Xeon® CPU 5130 @ 2.00GHz
HDDs: SAS 3.0 Gb/s, FUJITSU
RAID: Adaptec 4800SAS
RAID10

How to test:
# dd if=/dev/zero of=testfile.1gb bs=1M count=1000

In other terminal during a copying you should run:
# vmstat 1

I see for example:
r b swpd free buff cache si so bi bo in cs us sy id wa st
14 8 460 120716 280236 1509844 0 0 9 14 0 0 9 3 66 22 0
 0 13 468 121936 279216 1550936 0 0 1368 47776 1927 4153 24 8 8 60 0
 0 15 468 121516 280200 1551200 0 0 1408 3744 1726 2846 1 2 3 94 0
 0 8 468 129804 280520 1545940 0 0 1612 4280 1854 4060 3 2 1 95 0
 0 6 468 131388 281868 1546628 0 0 2140 3620 2020 4650 12 3 13 71 0
 0 17 468 114220 282792 1571864 0 0 1208 3212 1647 2715 4 3 6 87 0
 1 12 468 115356 283164 1570704 0 0 1420 18964 1718 2397 2 2 2 94 0
 0 9 468 114320 283628 1570868 0 0 768 1204 1753 2831 3 1 0 96

iowait -> 80-90% during 'dd'
All other CPU's task work very very slow ...

AND (!!!), the output of 'dd' is:
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 112.086 seconds, 9.4 MB/s
                                                   ^^^^^^^^^

For several years I have seen the following behaviour: whenever the server does heavy disk I/O (the 'dd' examples here are enough to trigger it), iowait stays at 50-90% and many tasks freeze for several seconds (10-20 seconds, sometimes more in my case). It is easy to reproduce with 'dd'. I cannot work around it with ionice - iowait stays high even if I run the I/O tasks with ionice -c3 or ionice -c2 -n7. So, judging by the many threads I have read, every server running kernel 2.6.18 or later has this bug. People on forums write that 2.6.30-rc2 has it too, and that FreeBSD stays responsive (mouse movement, video playback and other CPU tasks) during the same 'dd' test, unlike Linux.

I don't know what information you need to track this bug down - it has existed since 2007.

Please help! Here is an example from my loaded server at various times (no 'dd' running - only the usual MySQL and Apache workload):

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r b swpd free buff cache si so bi bo in cs us sy id wa st
13 14 120 68460 574784 1286748 0 0 13 1 0 0 9 3 66 22 0
 1 11 120 74564 576080 1286976 0 0 1560 0 1632 3641 34 10 0 57 0
 0 12 120 69988 577572 1287352 0 0 1904 0 1969 3696 5 2 0 93 0
 0 11 120 66916 578984 1287860 0 0 1900 0 1809 3615 6 2 0 92 0
 0 11 120 64960 580424 1288028 0 0 1668 0 1642 2188 1 1 0 97 0
 0 11 120 72764 576508 1286788 0 0 1668 0 1681 2198 3 2 0 96 0
 1 11 120 71424 577940 1287300 0 0 1604 332 1575 2152 2 1 0 97 0
 3 11 120 58852 579528 1289100 0 0 2000 0 1984 3286 44 7 0 49 0
 1 11 120 75104 581012 1287472 0 0 1608 0 2119 2839 39 7 0 55 0
 0 13 120 72160 582572 1287672 0 0 1908 120 1645 2366 7 1 0 92 0

[root@63 logs]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r b swpd free buff cache si so bi ...

Read more...

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :
Download full text (3.5 KiB)

Created attachment 21774
Test patch against heavy io bug

I bisected and found these two patches. Reverting them improves desktop responsiveness on my notebook enormously. I tested this on a 2.6.28 non-SMP kernel (my heavy-I/O test installation) with four concurrent read and write operations running while working in two VMs. It is only a Core2 @ 2.4 GHz system, yet I can even start new applications during heavy I/O.

I have attached the patch that I applied to my test installation. Use it with care: I am not a kernel developer and do not know the dependencies inside the CFQ scheduler.

I reverted these two patches:

07db59bd6b0f279c31044cba6787344f63be87ea is first bad commit
commit 07db59bd6b0f279c31044cba6787344f63be87ea
Author: Linus Torvalds <email address hidden>
Date: Fri Apr 27 09:10:47 2007 -0700

    Change default dirty-writeback limits

    Do this really early in the 2.6.22-rc series, so that we'll get
    feedback. And don't change by half measures. Just cut the default
    dirty limit to a quarter of what it was, and see if anybody even
    notices.

    Signed-off-by: Linus Torvalds <email address hidden>

:040000 040000 b63eb9faf5b9a42a1cdad901a5f18d6cceb7fdf6 2b8b4117ca34077cb0b817c77595aa6c9e34253a M mm

a993800655ee516b6f6a6fc4c2ee13fedfb0590b is first bad commit
commit a993800655ee516b6f6a6fc4c2ee13fedfb0590b
Author: Jens Axboe <email address hidden>
Date: Fri Apr 20 08:55:52 2007 +0200

    cfq-iosched: fix sequential write regression

    We have a 10-15% performance regression for sequential writes on TCQ/NCQ
    enabled drives in 2.6.21-rcX after the CFQ update went in. It has been
    reported by Valerie Clement <email address hidden> and the Intel
    testing folks. The regression is because of CFQ's now more aggressive
    queue control, limiting the depth available to the device.

    This patches fixes that regression by allowing a greater depth when only
    one queue is busy. It has been tested to not impact sync-vs-async
    workloads too much - we still do a lot better than 2.6.20.

    Signed-off-by: Jens Axboe <email address hidden>
    Signed-off-by: Linus Torvalds <email address hidden>

:040000 040000 07c48a6930ce62d36540b6650e3ea0563bd7ec59 95fc11105fe3339c90c4e7bebb66a820f7084601 M block
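
For readers unfamiliar with how such a result is obtained: a bisection of this kind is typically driven with git roughly as follows. This is only a sketch; the v2.6.20/v2.6.22-rc1 endpoints are assumptions chosen to bracket the April 2007 commits listed above, not necessarily the exact range Thomas used.

git bisect start
git bisect bad v2.6.22-rc1      # assumed: first version showing the stalls
git bisect good v2.6.20         # assumed: last version known to behave well
# build and boot the commit git checks out, run the I/O test, then mark it:
git bisect good                 # or: git bisect bad
# repeat until git prints "<sha1> is the first bad commit", then clean up:
git bisect reset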

Here the fsync result on my machine:

**************************************************************************
Without patch
Linux balrog 2.6.28 #2 Mon Mar 23 11:19:13 CET 2009 x86_64 GNU/Linux

fsync time: 7.8282
fsync time: 17.3598
fsync time: 24.0352
fsync time: 19.7307
fsync time: 21.9559
fsync time: 21.0571
5000+0 Datensätze ein
5000+0 Datensätze aus
5242880000 Bytes (5,2 GB) kopiert, 129,286 s, 40,6 MB/s
fsync time: 21.8491
fsync time: 0.0430
fsync time: 0.0448
fsync time: 0.0451
fsync time: 0.0451
fsync time: 0.0451
fsync time: 0.0452

**************************************************************************
With patch
Linux balrog 2.6.28 #5 Fri Jun 5 22:23:54 CEST 2009 x86_64 GNU/Linux

fsync time: 2.8409
fsync time: 2.3345
fsync time: 2.8423
fsync time: 0.0851
fsync time: 1.2497
fsync time: 0.9981
fsync time...

Read more...
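
The fsync-tester program producing these numbers is not included in this excerpt; a minimal sketch of an equivalent latency tester is shown below (the 1 MB write size, the scratch-file name and the one-sample-per-second pacing are assumptions rather than details of the original tool):

/* fsync-latency sketch: overwrite 1 MB, time the fsync(), once per second */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>

int main(void)
{
    static char buf[1024 * 1024];
    int fd = open("fsync-tester.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    memset(buf, 'a', sizeof(buf));

    for (;;) {
        struct timeval start, end;

        lseek(fd, 0, SEEK_SET);                     /* overwrite the same 1 MB */
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
            perror("write");
            return 1;
        }
        gettimeofday(&start, NULL);
        fsync(fd);                                  /* the latency being compared above */
        gettimeofday(&end, NULL);

        printf("fsync time: %.4f\n",
               (end.tv_sec - start.tv_sec) +
               (end.tv_usec - start.tv_usec) / 1e6);
        sleep(1);                                   /* one sample per second */
    }
}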

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

Fantastic! Did you bisect the whole kernel tree between 2.6.17 and 2.6.20? It's really great that you found those patches.

The first one doesn't seem very important to me, and some of its changes were reverted in 2.6.30. But the second one dramatically changes my system's responsiveness. I'm now running with it reverted, and there's no comparison with the old behavior: my pointer no longer freezes when performing updates, and almost everything is smooth!

For those who would like to try the patch on 2.6.30, I've updated it as best I could and I'm attaching it. It's quite dirty and I was doubtful it would work, but it seems to be enough.

Would a kernel dev look at the patches Thomas identified and tell us what he thinks?

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

Created attachment 21816
Patch to revert second commit, updated to apply against 2.6.30rc8

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

(In reply to comment #360)
Thank you very much for you work. I can't imagine how long that bisection must have taken and it is very exciting to have finally found a potential culprit. It would be best for everyone if you opened a new bug report with this information. Developers would be far more likely to look at it if we had a clean slate on which to start.

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

Are there patches for 2.6.29 available that I can test?

Revision history for this message
In , bob+kernel (bob+kernel-linux-kernel-bugs) wrote :

Isn't the second patch just adjusting things which can be adjusted in proc?

echo 10 > /proc/sys/vm/dirty_background_ratio
echo 40 > /proc/sys/vm/dirty_ratio

Someone want to do some tests after adjusting those two?

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Created attachment 21822
Backport of the reverted CFQ commit

This is a proper backport of the commit that was identified by Thomas as the problematic one.

Thomas, can you please verify that this makes 2.6.30-rc8 behave better? And if it does, it would be interesting to narrow it down to one single change. The first always makes sure that we drain the queue before servicing a queue that has idling enabled, and the second is just a tweak for idle/async immediate expiration. I think the first one is likely the interesting bit, but it would be good to have confirmation on that.

And Thomas, thanks for all your work on this!

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #365)
> Isn't the second patch just adjusting things which can be adjusted in proc?
>
> echo 10 > /proc/sys/vm/dirty_background_ratio
> echo 40 > /proc/sys/vm/dirty_ratio
>
> Someone want to do some tests after adjusting those two?

We already determined months ago that tuning those knobs way down was a way to minimize the problem. (See comment #263 and comment #292 for test results.) It's not a solution, though; it just skirts around the real issue.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #366)
> I think the first one is likely the interesting bit, but it would
> be good to have confirmation on that.

Yes, it is the first one. I could only run my long-running test, which can only show that a kernel is bad and cannot confirm that a kernel is good, but I ran it for a long time and there were no long lame-encoding times.

It took 40s without any i/o on all kernels and 48-55s with the following lines during heavy i/o.

+ if (cfqd->rq_in_driver && cfq_cfqq_idle_window(cfqq))
+ return 0;

It took 55-80s without any patch or with the second patch during heavy i/o.

This may be related too: when the second core is enabled and the first patch is not applied, the lame encoding process gets shifted between the cores and takes up to 130 seconds. I could see it happening because the maximum clock frequency switched between the cores.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

This question has probably been answered before, but this bug is huge so I'll just ask again... Thomas, what kind of drive are you using? Does it have NCQ enabled? If so, does disabling NCQ make any difference?

You can disable NCQ on sda by doing:

# echo 1 > /sys/block/sda/device/queue_depth

(or use sdX for others, naturally).

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I did the last tests on a SATA drive with queue depth 31. Reducing the queue depth nearly halves the overall throughput of the two/four concurrent copy operations, with and without the patch. I tried to run some more tests but got some really strange results; I will try again on my test installation at home.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

(In reply to comment #366)

cd /usr/src/linux-2.6.30-rc8+
suse:/usr/src/linux-2.6.30-rc8+ # patch -p1 < cfq.dif (#360)
patching file block/cfq-iosched.c
Hunk #1 FAILED at 1073.
Hunk #2 FAILED at 1119.
Hunk #3 FAILED at 1129.
3 out of 3 hunks FAILED -- saving rejects to file block/cfq-iosched.c.rej
patching file mm/page-writeback.c
Reversed (or previously applied) patch detected! Assume -R? [n] y
Hunk #1 succeeded at 66 with fuzz 1.
Hunk #2 FAILED at 77.
1 out of 2 hunks FAILED -- saving rejects to file mm/page-writeback.c.rej

suse:/usr/src/linux-2.6.30-rc8+ # patch -p1 < cfq.dif (#360 + #366)
patching file block/cfq-iosched.c
Hunk #3 FAILED at 1119.
Hunk #4 FAILED at 1129.
2 out of 4 hunks FAILED -- saving rejects to file block/cfq-iosched.c.rej
patching file mm/page-writeback.c
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 66.
Hunk #2 FAILED at 77.
2 out of 2 hunks FAILED -- saving rejects to file mm/page-writeback.c.rej

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #371)
You should only try the patch in comment #366

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

Ok, 2.6.30-rc8 + patch in comment #366, xfs

dd if=/dev/zero of=./bigfile bs=1M count=15000 & ./fsync-tester
fsync time: 1.7085
fsync time: 1.6639
fsync time: 0.4616
fsync time: 1.3800
fsync time: 1.3603
fsync time: 1.5529
fsync time: 1.8435
fsync time: 0.2561
fsync time: 0.9318
fsync time: 0.1965
fsync time: 1.2233
fsync time: 1.3920
fsync time: 0.4677
fsync time: 0.4560
fsync time: 1.8206
fsync time: 1.8135
fsync time: 1.8342
fsync time: 0.8565
fsync time: 0.9477
fsync time: 2.8569
fsync time: 0.4323
15000+0 записей считано
15000+0 записей написано
 скопировано 15728640000 байт (16 GB), 181,923 c, 86,5 MB/c
fsync time: 1.3716
fsync time: 0.0168
fsync time: 1.5381
fsync time: 1.5649
fsync time: 0.0349
fsync time: 0.0636
fsync time: 0.0657
fsync time: 0.3337
fsync time: 0.0393

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 2 2 0 4230432 808 3417716 0 0 0 87568 1102 1850 1 7 13 79
 0 4 0 4149632 808 3499392 0 0 0 83960 722 1037 1 5 36 57
 0 4 0 4069892 808 3578140 0 0 0 76840 701 1178 1 5 0 93
 1 3 0 3988784 808 3659444 0 0 0 78848 727 1151 1 5 14 79
 0 4 0 3889380 808 3757188 0 0 0 97310 804 1200 2 6 33 59
 0 3 0 3807540 808 3838720 0 0 0 79888 614 1010 2 5 19 74
 0 4 0 3729056 808 3918092 0 0 0 76866 840 1367 0 5 29 65
0 3 0 3002860 808 4645932 0 0 0 90672 597 817 2 6 0 93
 0 4 0 2921840 808 4728132 0 0 0 80416 865 1377 1 6 0 93
 0 3 0 2841564 808 4810132 0 0 0 80384 627 933 1 5 0 93
 1 4 0 2743820 808 4906136 0 0 0 94216 892 1398 1 7 0 92
 0 3 0 2666100 808 4984280 0 0 0 77824 770 1217 1 5 0 93
 1 2 0 2590248 808 5063188 0 0 0 82496 795 1283 2 6 0 92

While copying /usr/src/linux-2.6.30-rc8 -> /usr/src/linux-2.6.30-rc8+ (in Konsole, without Dolphin or any other GUI tools), launching
:~> kdesu /usr/bin/kwrite
is impossible; it remains impossible after the copy has finished, and only logging the user out or rebooting the computer helps.

The copy speed of /usr/src/linux-2.6.30-rc8 -> /usr/src/linux-2.6.30-rc8+ stayed close to zero, as before.

time cp -r /usr/src/linux-2.6.30-rc8 /usr/src/linux-2.6.30-rc8+
real 6m14.566s
user 0m0.158s
sys 0m2.838s

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

A correction to my previous comment: sometimes kdesu /usr/bin/kwrite does launch successfully after the copy has finished, but never while the copy is running.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

(In reply to comment #369)
> This question has probably been answered before, but this bug is huge so I'll
> just ask again... Thomas, what kind of drive are you using? Does it have NCQ
> enabled? If so, does disabling NCQ make any difference?

This bug is really annoying. I was not able to reproduce the mouse freezes any more, with or without the patch and with or without NCQ. I will try again later.

Is there a way to simulate a disk in RAM with a parametrized speed and latency?

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Created attachment 21849
The corrected patch from #360 post (for 2.6.29 and may be more kernels)

I tried to apply the patch from post #360 to kernel 2.6.29 and got some rejects.
I resolved the rejects by hand and am attaching the corrected variant here.
I saw the patch from #366, but I don't think it contains the same corrections as #360.
So I would like to suggest testing this patch (only the cfq-iosched.c file).

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

I also looked at the patch from post #366, and I didn't understand why the author calls it a "proper backport": there is no code using the 'prev_cfqq' variable. I think the patch from #366 may not be a valid patch.

Please try this patch. As far as I can tell it is for 2.6.29 and later kernels.
I didn't test it because I don't have a test machine for experiments; I only have a Linux server under heavy load...

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

It IS the proper backport. I'm the author and maintainer of CFQ, I should know... I would generally advise against using patches from people who don't know what they are doing, especially for data integrity important code like the IO scheduler. There could be data loss from bad patches.

The reason the 2.6.30 and 2.6.29 patches are different is that the CFQ request dispatch mechanism is different in 2.6.30. As such there's no prev_cfqq to take into account, since we never dispatch from more than one cfqq in one round. You would need to take the prev_cfqq out of local function scope for it to have any meaning.

So, not to be rude, but the last thing this bug needs are more cooks or chefs asking people to test things. It's a huge mess already. For now the focus is making Thomas happy, since he's spent much time on this and has a reproducible (sort of) way of testing it. Once that is done, we can proceed to any other potential issues. Any comments not related to that exact issue will be ignored.

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :

Created attachment 21852
test results

Two hard drive SAMSUNG HD753LJ + NCQ + mdadm raid1 + ext3 + 2GB RAM + Core2Duo E6750 2.66 @ 3.44 GHz

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :
Download full text (45.0 KiB)

Comment on attachment 21852
test results

>==============================2.6.30==============================
>ff@home-desktop:~$ dd if=/dev/zero of=./bigfile bs=1M count=15000 &
>./fsync-tester
>[1] 6958
>fsync time: 0.1025
>fsync time: 0.8720
>fsync time: 5.5800
>fsync time: 5.6179
>fsync time: 3.7413
>fsync time: 4.2393
>fsync time: 5.2596
>fsync time: 0.0985
>fsync time: 1.7070
>fsync time: 4.1414
>fsync time: 0.1577
>fsync time: 4.8191
>fsync time: 0.6993
>fsync time: 3.6732
>fsync time: 3.6963
>fsync time: 4.7696
>fsync time: 6.0947
>fsync time: 3.4383
>fsync time: 0.7583
>fsync time: 4.0760
>fsync time: 4.1786
>fsync time: 3.9886
>fsync time: 0.3802
>fsync time: 3.4182
>fsync time: 1.1262
>fsync time: 2.8425
>fsync time: 3.9217
>fsync time: 1.4758
>fsync time: 3.7798
>fsync time: 3.9234
>fsync time: 0.3557
>fsync time: 4.1882
>fsync time: 4.4526
>15000+0 records in
>15000+0 records out
>15728640000 bytes (16 GB) copied, 231.473 s, 68.0 MB/s
>fsync time: 2.1747
>fsync time: 0.0820
>fsync time: 0.0774
>fsync time: 0.0299
>fsync time: 0.0268
>fsync time: 0.0282
>fsync time: 0.0277
>fsync time: 0.0270
>^C
>[1]+ Done dd if=/dev/zero of=./bigfile bs=1M count=15000
>
>ff@home-desktop:~$ vmstat 1
>procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 1 0 214308 1592368 6344 66260 1 5 253 653 611 592 15 5 73 7
> 0 0 214308 1592400 6344 66264 0 0 0 0 309 525 7 3 90 0
> 2 0 214308 1592448 6344 66264 0 0 0 0 365 686 5 3 91 0
> 0 0 214308 1592400 6344 66264 0 0 0 0 291 543 5 3 92 0
> 0 2 214308 1126216 6756 464876 0 0 24 398976 980 1265 7 36 37 20
> 0 4 214308 1107468 6780 489032 0 0 0 20524 671 551 9 5 35 51
> 0 6 214308 1118544 6780 489032 0 0 0 4 658 575 7 3 32 58
> 0 5 214308 1129752 6784 489032 0 0 0 4 646 578 6 5 36 53
> 0 4 214308 1142036 6784 489032 0 0 0 8 656 576 6 4 36 54
> 2 3 214308 1151708 6784 489032 0 0 0 0 590 501 8 3 16 72
> 0 1 214308 1156616 6792 491124 0 0 0 1572 587 485 7 3 29 60
> 0 2 214308 704504 7188 876836 0 0 0 392152 885 716 8 38 21 32
> 0 4 214308 637132 7252 942604 0 0 0 65728 666 494 7 10 0 83
> 0 4 214308 561368 7324 1016556 0 0 0 73984 686 499 7 12 0 81
> 0 4 214308 490020 7392 1086476 0 0 0...

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

I have read Russian forums about this problem.
I have to go now and cannot write more at the moment.
But some people there tried changing the I/O scheduler from cfq to something else and the iowait bug remained. If you think the bug is in the scheduler, should we try switching schedulers (via /sys/block/<dev>/queue/scheduler)?

Revision history for this message
In , akatopaz (akatopaz-linux-kernel-bugs) wrote :

I am hit by the same bug (I suspect), and I can reproduce it with both the anticipatory and cfq schedulers. So is this bug really tied to cfq?

I'm running my kernel with: elevator=as

Revision history for this message
In , rockorequin (rockorequin-linux-kernel-bugs) wrote :

@Jens Axboe: I tried your patch in comment 366 on the 2.6.30 kernel, and it did improve responsiveness in my initial testing. I used to have the problem that the kernel became highly unresponsive on large file copies to the same partition or as soon as it tried to use swap (in 2.6.30-rc3 and earlier), but the unpatched 2.6.30 performs quite reasonably and the patch improved responsiveness further (my unscientific test results are that moving the mouse resulted in much less 'stuttering' after the patch - note that with earlier kernels the mouse would just freeze).

I did though just find a problem where an overnight memory leak caused X to become so unresponsive it couldn't even draw the screen background until I killed the culprit (firefox). This might be unrelated to the patch, ie a problem with swap management, but it does show that the kernel can still become bogged down under high disk I/O.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Did anybody here resolve this bug?
The only workaround I can see is installing FreeBSD instead of a Linux kernel >= 2.6.18.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

I think I have found a way to cope with this bug!
I changed a few kernel settings and my server has now been running stably, with no freezes from high iowait, for 10-12 hours!

Detailed info:
My kernel now is 2.6.22.14-72.fc6
Fedora Core 6

This suggestion does not fix the bug (I think the bug is in the kernel and it is still there), but it is a workaround. I read many threads and forums and settled on these commands:

# echo 50 > /proc/sys/vm/vfs_cache_pressure
# echo deadline > /sys/block/DEVICE/queue/scheduler
# # echo 1 > /sys/block/DEVICE/device/queue_depth
# echo 1024 > /sys/block/DEVICE/queue/nr_requests

DEVICE is 'hda' or 'sda', depending on the disk. I didn't test queue_depth because for my disks (SAS SCSI + RAID10) that file is read-only (no NCQ support, I think), but that command may still help in your case.

I suggest that anybody who sees freezes with high iowait try this tuning.

I am very glad! Please try this workaround. I didn't rerun the 'dd' test, but heavy disk activity used to freeze the server and now it no longer does.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Can you try the three settings separately, to see which one makes the large difference?
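
A simple way to answer this, sketched below, is to change one knob at a time, measure, and restore the default before moving on. The device name sda, the reuse of the 1 GB dd test from earlier in the thread, and the conv=fdatasync flag (added so the reported rate reflects the device rather than the page cache) are assumptions:

# baseline (all defaults)
dd if=/dev/zero of=testfile.1gb bs=1M count=1000 conv=fdatasync; rm testfile.1gb

# 1) vfs_cache_pressure only
echo 50 > /proc/sys/vm/vfs_cache_pressure
dd if=/dev/zero of=testfile.1gb bs=1M count=1000 conv=fdatasync; rm testfile.1gb
echo 100 > /proc/sys/vm/vfs_cache_pressure      # restore default

# 2) I/O scheduler only
echo deadline > /sys/block/sda/queue/scheduler
dd if=/dev/zero of=testfile.1gb bs=1M count=1000 conv=fdatasync; rm testfile.1gb
echo cfq > /sys/block/sda/queue/scheduler       # restore default

# 3) nr_requests only
echo 1024 > /sys/block/sda/queue/nr_requests
dd if=/dev/zero of=testfile.1gb bs=1M count=1000 conv=fdatasync; rm testfile.1gb
echo 128 > /sys/block/sda/queue/nr_requests     # restore default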

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

I will try, but this is a production server under heavy load and I am afraid to touch anything on it now :-/
I will try to identify the decisive option of this tuning soon. More than 24 hours have passed and there have been no freezes. I can hardly believe it...

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Here is a test on the same server as in my post #359,
but after applying the tuning from post #385.

# dd if=/dev/zero of=testfile.1gb bs=1M count=1000

And while 'dd' was running I ran vmstat 1:

 0 2 116 103632 507240 2016112 0 0 1324 16 1024 963 1 1 50 48 0
 1 2 116 101512 507484 2015736 0 0 1436 0 1314 1253 21 5 25 48 0
 0 2 116 103632 507240 2016112 0 0 1324 16 1024 963 1 1 50 48 0
 0 7 116 25208 496944 2105464 0 0 4 26272 2892 239 0 4 23 73 0
 0 9 116 21636 496972 2109568 0 0 32 21904 2150 339 0 2 8 90 0
 0 10 116 39888 481904 2105552 0 0 4 23544 1964 368 0 4 1 96 0
 0 9 116 49036 472984 2105016 0 0 8 18252 1730 728 0 3 0 97 0
 0 7 116 61700 459736 2105412 0 0 16 74176 2167 317 0 5 13 82 0
 0 7 116 71416 450576 2104272 0 0 24 8680 1322 237 0 4 16 80 0
 1 5 116 82772 439000 2106280 0 0 24 58616 1457 3332 0 7 5 88 0
 1 5 116 97224 424752 2105804 0 0 20 60164 848 286 0 6 24 70 0
 0 7 116 110700 409384 2107036 0 0 56 105584 884 397 0 9 15 76 0
 2 5 116 116444 392304 2118776 0 0 288 95624 1096 424 1 11 10 78

As you can see, iowait no longer sits at 90-99% constantly, only occasionally...

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Here are some tests.

First I restored the default settings (before tuning):
# echo 100 > /proc/sys/vm/vfs_cache_pressure
#
# echo cfq > /sys/block/sda/queue/scheduler
#
# echo 128 > /sys/block/sda/queue/nr_requests

# dd if=/dev/zero of=testfile.1gb bs=1M count=1000
^C
116+0 records in
116+0 records out
121634816 bytes (122 MB) copied, 20.5609 seconds, 5.9 MB/s
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^ (!!!)

While 'dd' was running I ran vmstat 1:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 0 10 116 760168 503488 1329836 0 0 4 4 0 0 9 3 65 23 0
 0 11 116 756132 502488 1330536 0 0 1332 5648 1744 4909 5 2 2 91 0
 0 12 116 760208 503128 1330856 0 0 1136 4388 1875 3053 4 2 0 94 0
 0 11 116 759832 502668 1331608 0 0 1004 7488 2379 4032 1 2 0 97 0
 0 12 116 758740 503288 1331832 0 0 1280 3252 1818 2402 1 1 0 98 0
 0 10 116 733976 502936 1356780 0 0 1232 4476 1753 4143 1 3 0 96 0
 1 8 116 733596 502368 1357324 0 0 804 5792 1831 2980 20 2 0 79 0
 1 7 116 738388 502920 1357788 0 0 928 6652 1875 2349 17 2 4 77 0

**************************

After that I applied:

# echo 50 > /proc/sys/vm/vfs_cache_pressure
#
# echo deadline > /sys/block/sda/queue/scheduler
#
# echo 1024 > /sys/block/sda/queue/nr_requests

# dd if=/dev/zero of=testfile.1gb bs=1M count=1000

^C
638+0 records in
638+0 records out
668991488 bytes (669 MB) copied, 10.463 seconds, 63.9 MB/s
                                     ^^^^^^^^^^^^^^^^ (!!! :-))) )

During 'dd' I ran in another terminal:
# vmstat 1

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 1 7 116 718764 502884 1371484 0 0 4 4 0 0 9 3 65 23 0
 0 9 116 687208 502924 1405624 0 0 8 26664 2708 746 6 4 3 87 0
 1 8 116 668924 502976 1422116 0 0 16 21404 2246 8462 1 4 9 87 0
 0 8 116 654804 501632 1434492 0 0 24 30804 2072 9249 10 4 0 86 0
 0 10 116 613152 501692 1475220 0 0 20 42880 2021 4408 15 5 7 73 0
 2 10 116 559860 499464 1524600 0 0 32 58504 2108 10612 5 6 15 74 0
 0 11 116 510132 499528 1578340 0 0 36 59400 984 1748 17 5 2 77 0
 0 10 116 399420 499672 1689316 0 0 108 111332 910 957 4 11 2 84 0
 1 7 116 331556 499756 1750580 0 0 104 62268 1501 5255 11 6 10 74 0

*********************

One more observation:

I have other servers with different hardware, and I cannot reproduce the iowait problem there, with or without this tuning (they run Fedora release 7 (Moonshine), kernel 2.6.23.17-88.fc7). So I now think this problem does not affect all disks; it may be hardware dependent.

I am still investigating which option resolves the iowait problem.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

I have identified the decisive option.

Only this one helped me:

# echo deadline > /sys/block/sda/queue/scheduler

I don't understand why. I have read in many Russian threads that changing the scheduler doesn't help, and I didn't expect that changing the scheduler alone would help me either. But I changed only the scheduler, from cfq to deadline, and the 'dd' test now gives this:

# dd if=/dev/zero of=testfile.1gb bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 13.7121 seconds, 76.5 MB/s

iowait only occasionally reached 80-90%.

Here are my current settings:

# cat /proc/sys/vm/vfs_cache_pressure
100
# cat /sys/block/sda/queue/scheduler
noop anticipatory [deadline] cfq
# cat /sys/block/sda/queue/nr_requests
128

I will now keep these settings and watch whether the freezes come back.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

I ran some more experiments,
and I think I have found the main trigger of the high iowait with the cfq scheduler.

I made some tests:

I switched between the cfq and deadline schedulers on my two servers with the same hardware and OS (FC6, kernel 2.6.22.14-72.fc6): the same CPUs, motherboards, SAS/RAID controllers and disks. But only one of the two servers showed high iowait with cfq during the 'dd' command.

I think the main factor is a VERY LARGE NUMBER OF USED INODES on the partition.

For example:

The 'OK' server, where I could not reproduce the bug:
# df -i

/dev/sda1 524288 8543 515745 2% /
tmpfs 219756 1 219755 1% /dev/shm
/dev/sda6 787200 34068 753132 5% /usr
/dev/sda5 787200 25582 761618 4% /usr/local
/dev/sda7 524288 1993 522295 1% /var
/dev/sda8 30900224 1719787 29180437 6% /wwws
/dev/sda3 1048576 49655 998921 5% /wwws/accel-proxy

I wrote the test file testfile.1gb to the /wwws partition. There was no extreme iowait with either the deadline or the cfq scheduler.

The second, 'BAD' server has the same hardware and software, but its df -i shows:

Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 524288 7444 516844 2% /
tmpfs 219756 1 219755 1% /dev/shm
/dev/sda7 787200 35307 751893 5% /usr
/dev/sda6 787200 27520 759680 4% /usr/local
/dev/sda8 524288 2334 521954 1% /var
/dev/sda3 30900224 5332794 25567430 18% /wwws
/dev/sda5 524288 4128 520160 1% /wwws/accel-proxy

I ran the 'dd' tests against the /wwws partition there as well (that is where I usually write big files). With the cfq scheduler and (importantly) some active processes (Apache, MySQL - not an idle server), iowait during 'dd' reaches 90-99% and the write speed is very low (9-10 MB/s). If I switch to the deadline scheduler and write to the same /wwws partition, I get 60-80 MB/s and no extreme iowait. But if I write testfile.1gb to a different partition (for example /var), there is no iowait problem even with cfq. So cfq plus a very large number of used inodes seems to be the bad combination; deadline plus many used inodes is not.

So I think a high number of used inodes on a partition and the cfq scheduler interact badly in some way.

Maybe if my other servers (the FC7 ones where I could not reproduce the iowait problem) had as many used inodes, I could reproduce this high iowait bug there too.

Please try creating very many small files on a partition (5-6 million, for example) and then test 'dd' with the cfq scheduler.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

It would be ideal, if you could try 2.6.30 on the problematic server. I realize that this may not be easy, however there's not much I can do about a problem on an ancient kernel.

If you do try 2.6.30 and it also has the same problem, then I want you to capture some blktrace data of both deadline and cfq. Basically, right after you start the dd test, in another terminal do:

# cd /dev/shm; blktrace /dev/sda

and ctrl-c that blktrace after ~5 seconds or so. Then stop the dd as well. Save the blktrace files on the harddrive.

Now switch to deadline and repeat the exact same thing. Then tar up the two sets of files and attach them to this bug report.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Jens Axboe, I am happy to help, but I cannot try 2.6.30 :(

I never install kernels myself, and I am afraid that something might go wrong after installing one and I would lose access to the server. This server is under heavy load and is located on another continent. I cannot take the risk, sorry ;-(

Maybe someone else can create many small files on a disk (many inodes, around 5-6 million for example) and compare the cfq and deadline schedulers?

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :

Created attachment 22019
test results 2.6.30: cfq, deadline

None of the variants gives any improvement in responsiveness, or perhaps only a very slight one.

Revision history for this message
In , erbrochendes (erbrochendes-linux-kernel-bugs) wrote :

Hi, I am using the 2.6.30 kernel with the patch from #366.
Before the patch I had real trouble when downloading large files over BitTorrent at high speed (over 5 MB/s).
Now it just works great. Thanks for this patch.

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :

Created attachment 22167
test result 2.6.30 without AHCI

I turned off AHCI in the BIOS on the laptop. The system has become much more responsive; it is now possible to launch new applications while dd is running.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Created attachment 22180
Drain async IO on the hw side

This patch makes sure that async IO has completely drained from the device queue before sync IO is started. Hopefully that should make things as good as disabling NCQ, and it should even improve the situation without NCQ.

I'd like for people to test this patch and see if it makes a difference. It's against 2.6.31-rc (ish), but I _think_ it will apply against 2.6.30 as well. If not, holler, and I'll do a backport too.

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :

Created attachment 22184
test result 2.6.30 with patch from #397

(2.6.30 + NCQ + patch from #397) behaves the same as (2.6.30 + NCQ): new applications still start very slowly.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

I used to come across Toshiba notebooks and saw how Linux behaved on them. With ACPI switched off you could listen to music (though, of course, not see how much charge was left in the battery); with ACPI switched on there was no sound at all, but you could at least watch the battery state. One could blame the scheduler for that too (in our case cfq) and go looking for why it cannot schedule two processes at once - playing music and checking the battery.
In our case it turns out that all the schedulers are "broken" (I have tried them all, and none of them works well). Probability theory does not rule out such an event; but when all the schedulers are broken for one person and all of them work for another, that is already the work of supernatural forces, and fighting those is useless.
So why does everything work for one person while another's system barely crawls? What is the difference? Only the computers (or, more precisely, their exact configurations).

I may be mistaken, but perhaps someone can explain why on one set of hardware everything simply flies and on another it barely crawls (even though the second machine has a faster processor, faster disks and a faster bus).

Revision history for this message
In , kebjoern (kebjoern-linux-kernel-bugs) wrote :

I had big trouble on an ASUS PN5e motherboard with a WD 320 GB drive. I compiled 2.6.31-rc3 with your patch and it works great - thank you very much! I'd like to backport it to 2.6.29 to try it together with the realtime patch. Is there a chance of getting that to work?

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :

2.6.31-rc3-git3 + NCQ + patch from #397: new applications still start very slowly.
Without NCQ, new applications start quickly.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

There is one more interesting question.
KSYSGUARD shows "Used Memory" = 0.66Gb.
> top
top - 21:43:57 up 7:00, 3 users, load average: 0.74, 0.39, 0.29
Tasks: 149 total, 3 running, 146 sleeping, 0 stopped, 0 zombie
Cpu (s): 2.8%us, 1.3%sy, 0.0%ni, 93.1%id, 2.5%wa, 0.2%hi, 0.2%si, 0.0%st
Mem: 8035628k total, 7998716k used, 36912k free, 0k buffers
Swap: 2104472k total, 6564k used, 2097908k free, 7402836k cached

When Mem:used approaches Mem:total, the graphical interface becomes much slower (even without any disk activity).

Am I the only one seeing this problem?

Revision history for this message
In , benjfitz (benjfitz-linux-kernel-bugs) wrote :

I applied the patch in 397 to a vanilla 2.6.30.4 and the difference was dramatic (with the patch is _much_ better, ie the complete freezing for 15+ seconds when running multiple IO intensive jobs are gone). I'll work on getting some hard numbers (with iobench, etc) to see if they agree.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

(In reply to comment #397)
> Created an attachment (id=22180) [details]
> Drain async IO on the hw side
>
> This patch makes sure that async IO has completed drained from the device
> queue
> before starting sync IO. Hopefully that should make things as good as
> disabling
> NCQ, and it should even improve the situation without NCQ.
>
> I'd like for people to test this patch and see if it makes a difference. It's
> against 2.6.31-rc (ish), but I _think_ it will apply against 2.6.30 as well.
> If
> not, holler, and I'll do a backport too.

Is this in the vanilla 2.6.31-rc5 already?

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

No, the patch is queued up for 2.6.32 since it was a rather risky change for 2.6.31. But I'm glad it makes a difference, that means that the starvation experienced is largely on the device side. By draining the queue, we prevent that from happening (or, at least we lessen the effect dramatically).

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

2.6.31-rc7 + patch in 397 - There are no improvements

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

No improvements seen here with 2.6.30.5 and the patch, either. Pretty much *any* write to swap causes major latency (disruption to audio, graphics etc.).

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

There is an improvement in desktop responsiveness with kernel 2.6.31 and the anticipatory (as) scheduler compared to the cfq scheduler. It does not solve the problem, but it makes it more bearable. I am using a fully encrypted LVM drive with ext3 partitions, mounted with noatime and data=ordered.

Revision history for this message
In , rockorequin (rockorequin-linux-kernel-bugs) wrote :

I've observed something that might be relevant to this bug (using the 2.6.31.5 kernel): when I do large I/O operations from one external device (say /dev/sdb) to another slow USB flash key (say /dev/sdc), I can hear my *internal* hard drive (/dev/sda) thrashing away constantly even though its light indicates that no read/write activity is going on. During this time anything that requires access to /dev/sda is slowed right down and hence running new programs slows down disk access.

When I start copying, e.g. using Nautilus, there is usually a ~400 MB buffering delay before writing to the USB drive starts (i.e. before its light starts flashing). During this time there is NO /dev/sda thrashing; /dev/sda starts thrashing as soon as the USB key's light starts flashing.

So there appears to be a bug that makes /dev/sda constantly seek during the /dev/sdc USB write operation, and this is affecting system responsiveness.

Revision history for this message
In , axboe (axboe-linux-kernel-bugs) wrote :

Please try 2.6.32-rc5. Make sure you are using CFQ as your io scheduler.

Revision history for this message
In , rockorequin (rockorequin-linux-kernel-bugs) wrote :

I opened http://bugzilla.kernel.org/show_bug.cgi?id=14491 to track this bug separately - I've put comments in there about 2.6.32-rc5, which I don't think exhibits the problem.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :
Download full text (4.1 KiB)

Created attachment 23618
Simple sleeper test case

Since this bug shows up more persistently while working in a virtual machine or while using Java, I still think this is a process scheduler bug (or something related to it). Here is another test case that shows the suspected behaviour. Because a virtual machine issues a great many system calls, I tried to find an equivalent test: the test case simply sleeps for 1 µs and measures how long each usleep call actually takes. I use very many usleep calls because the problem does not occur deterministically and I wanted to catch as many occurrences as possible.
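
The attachment itself is not reproduced in this excerpt; a minimal sketch of such a sleeper test could look like the following (the 10 ms reporting threshold and the way the running total is reset are assumptions, not details taken from the attachment):

/* sleeper sketch: time each usleep(1) and report badly delayed wakeups */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

int main(void)
{
    double total = 0.0;

    for (unsigned long i = 0; ; i++) {
        double start = now_ms();
        usleep(1);                      /* nominally sleeps for 1 microsecond */
        double diff = now_ms() - start;

        if (diff > 10.0) {              /* report only badly delayed wakeups */
            total += diff;
            printf("Timediff %lu: %.2fms Total: %.2fms\n", i, diff, total);
        } else {
            total = 0.0;                /* reset the running total after a normal wakeup */
        }
    }
}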

I have run this test case on three machines. The first was a Core2 Duo with a first-generation SSD (OCZ Core Series) with poor write performance, running the Ubuntu kernel 2.6.31-14-generic; the partitions are block-aligned. I ran the test while my wife was using Firefox. Every time she submitted something and Firefox wrote its history via SQLite, the sleep test showed high latency.

Timediff 7629094: 16.80ms Total: 61.12ms
Timediff 7629100: 18.82ms Total: 93.68ms
Timediff 7629101: 19.96ms Total: 113.54ms
Timediff 7629102: 19.98ms Total: 133.43ms
Timediff 7629103: 19.97ms Total: 153.31ms
Timediff 7629104: 20.00ms Total: 173.24ms
Timediff 7629105: 19.96ms Total: 193.09ms
Timediff 7629106: 20.02ms Total: 213.02ms
Timediff 7629107: 19.94ms Total: 232.86ms
Timediff 7636162: 16.40ms Total: 34.44ms
Timediff 7636164: 19.90ms Total: 64.00ms

While 100 usleep(1) calls should take somewhere between 10 ms and 20 ms in total, here 10 usleep(1) calls take more than 200 ms. This behaviour is reproducible.

On my own machine, a Core2 Duo with an ordinary 2.5" hard drive and a vanilla 2.6.31.5 kernel, the behaviour is similar. While making a backup from one hard drive to another, the latency of a single usleep(1) jumps above 30 ms roughly every second, and some single usleep(1) latencies are greater than 150 ms.

Timediff 11054523: 38.23ms Total: 53.19ms
Timediff 11212737: 21.64ms Total: 31.46ms
Timediff 11213557: 35.59ms Total: 44.62ms
Timediff 11213939: 59.88ms Total: 65.76ms
Timediff 11264190: 40.83ms Total: 49.72ms
Timediff 11264709: 53.77ms Total: 63.09ms
Timediff 11265629: 145.74ms Total: 155.96ms
Timediff 11327458: 16.94ms Total: 25.23ms
Timediff 11376430: 18.91ms Total: 27.67ms
Timediff 11408941: 17.67ms Total: 26.36ms
Timediff 11424964: 19.26ms Total: 28.01ms
Timediff 11509722: 19.84ms Total: 28.30ms
Timediff 11627259: 27.01ms Total: 34.51ms
Timediff 11645718: 18.26ms Total: 29.80ms

On my server, an Athlon X2 with a fully encrypted RAID-5 with LVM running a 2.6.18-128.2.1.el5.028stab064.7 kernel (CentOS with OpenVZ), the behaviour was even worse. While copying a 4 GB ISO, single usleep(1) latencies reached up to one second.

Timediff 40397: 24.16ms Total: 122.93ms
Timediff 40417: 859.04ms Total: 981.78ms
Total 40417: 981.78ms
Timediff 45928: 22.62ms Total: 220.16ms
Timediff 50471: 25.02ms Total: 135.80ms
Timediff 51085: 19.23ms Total: 163.03ms
Timediff 51097: 205.12ms...

Read more...

Revision history for this message
In , vshader (vshader-linux-kernel-bugs) wrote :

I also had a problem with system latency under high I/O load. After applying the patch from #397 to kernel 2.6.31.5, the problem became much smaller: before patching, the machine sometimes froze for more than 5 minutes; now the maximum latency is less than half a second.

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

I have the same issue on a machine with i845e chipset, P4-1.5 Northwood, 2GB DDR RAM, GF6800 video and Audigy2 sound card. My main HDD is 160GB IDE Seagate.

When there is disk activity the system becomes virtually unusable.

For example, when I am burning a DVD on the drive attached to SII 3512 SATA controller, the CPU load goes from 40% at 7-8x to 98% at 16x.

Downloading Fedora12 ISO last night at 500 kb/s kept system busy at 90%!

If I start kernel compile, CPU load is stable 100%, which is Okay, but switching tabs in Firefox takes 10 seconds and starting any application like JUK, Dolphin, Konsole takes up to 1 minute.

Running Fedora11 with 2.6.30.9.96 FC11 i686 PAE kernel.

The system has become a bit more responsive (by about 10-20%) since I noticed p4-clockmod was being loaded and shut it down.

Revision history for this message
In , ylalym (ylalym-linux-kernel-bugs) wrote :

There have been no enthusiastic comments since 2.6.32 came out. As I read it, nothing has really changed ("and yet the cart is still there").

Revision history for this message
sbec67 (sbec) wrote :
Revision history for this message
Dexter (pogany-tamas+bug) wrote :

I have this problem too, check my logs. My USB 2.0 pendrive's speed is about 3-5 MB/s. Soooo slow.

Revision history for this message
sbec67 (sbec) wrote :

Did someone take a look at this?
This bug is really annoying, as it takes ages to get data onto a USB device ;-(

Regards

Changed in linux (Ubuntu):
status: New → In Progress
assignee: nobody → Colin King (colin-king)
Revision history for this message
Colin Ian King (colin-king) wrote :

@sbec67, can you run the following command to collect all the system specific information about your computer to help us diagnose this bug:

apport-collect 500069

please can you supply the model number of the pendrive and if you are using it via a USB hub or not.

Thanks!

Changed in linux (Ubuntu):
importance: Undecided → Medium
importance: Medium → High
Andy Whitcroft (apw)
tags: added: kernel-series-unknown
Revision history for this message
sbec67 (sbec) wrote :

@Colin: I ran "sudo apport-collect 500069".
The device I am seeing the problem with is a Tech Line DFS-1002 MP3 player,
but the problem seems to happen with almost any USB flash memory device.

@Andy: GRUB (from /boot/grub/menu.lst) boots with this kernel line:

kernel /boot/vmlinuz-2.6.31-17-generic root=UUID=96f8feb3-e65b-4696-b8f5-c32638cde861 ro quiet splash

Kind regards
Simon

Revision history for this message
Cklein (pablo-cascon) wrote : apport-collect data

AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC272 Analog [ALC272 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: i386
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC272 Analog [ALC272 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: pablo 1645 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf0340000 irq 22'
   Mixer name : 'Realtek ALC272'
   Components : 'HDA:10ec0272,144dca00,00100001'
   Controls : 19
   Simple ctrls : 12
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=c1938014-8689-4db3-add3-99b0c514b946
MachineType: SAMSUNG ELECTRONICS CO., LTD. NC10
Package: linux (not installed)
ProcCmdLine: root=UUID=5f964d43-b8cd-4886-a6e5-80376a5eb7b5 ro quiet splash
ProcEnviron:
 SHELL=/bin/bash
 LANG=es_ES.UTF-8
ProcVersionSignature: Ubuntu 2.6.31-17.54-generic
RelatedPackageVersions:
 linux-backports-modules-2.6.31-17-generic N/A
 linux-firmware 1.25
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
Tags: ubuntu-unr
Uname: Linux 2.6.31-17-generic i686
UserGroups: adm admin cdrom dialout fuse lpadmin plugdev sambashare
WpaSupplicantLog:

dmi.bios.date: 09/08/2009
dmi.bios.vendor: Phoenix Technologies Ltd.
dmi.bios.version: 11CA.M015.20090908.RHU
dmi.board.name: NC10
dmi.board.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.board.version: Not Applicable
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLtd.:bvr11CA.M015.20090908.RHU:bd09/08/2009:svnSAMSUNGELECTRONICSCO.,LTD.:pnNC10:pvrNotApplicable:rvnSAMSUNGELECTRONICSCO.,LTD.:rnNC10:rvrNotApplicable:cvnSAMSUNGELECTRONICSCO.,LTD.:ct10:cvrN/A:
dmi.product.name: NC10
dmi.product.version: Not Applicable
dmi.sys.vendor: SAMSUNG ELECTRONICS CO., LTD.

tags: added: apport-collected
Revision history for this message
David Turner (dkturner) wrote : Re: since Ubuntu karmic Filetransfer to some USB Drives got realy slow

I've also experienced very slow USB transfers, with CPU waiters piling up. The odd thing is that USB-to-USB transfers go at about four times the speed of HDD-to-USB transfers. On average I get 2MB/s from hard drive to USB, and about 8MB/s copying between flash drives.

Revision history for this message
sbec67 (sbec) wrote :

It's correct - this report is a duplicate of #197762.
But #197762 has status "Incomplete", importance Medium, and is almost 1.5 years old!

I think this bug should be High; it is really a pain for many users.
Regards
Sbec

Revision history for this message
Simon Holm (odie-cs) wrote :

sbec67:
Have you tried your device with another OS and seen different performance?

From your lsusb output it does not look like your device is listed - was it plugged in when you ran lsusb?

If that doesn't turn up anything obvious you should probably see https://help.ubuntu.com/community/DiskPerformance and test whether the raw device performance is better than when formatted. Backup your data on the device first.

dd numbers with bs=1M in addition to the bs=32 would also be interesting.
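
For example (a suggestion, not part of the original report apart from the mount point), the conv=fdatasync flag makes dd flush before reporting the rate, so the number reflects the device rather than the page cache:

dd if=/dev/zero of=/media/usb-disk/test-file bs=1M count=256 conv=fdatasync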

Revision history for this message
MMS-Prodeia (mms-prodeia) wrote :

I do second this report!

Revision history for this message
sbec67 (sbec) wrote :

Simon Holm Thøgersen:
it was connected, and it does show up in lsusb:
Bus 001 Device 004: ID 0402:5661 ALi Corp

It is a bug which hits many users - just search the web; here is one thread:
http://ubuntuforums.org/showthread.php?t=1306333

I passed all the data with the apport-collect command; the unit was connected while I ran it.

I did another test with a Sony 4 GB USB stick, and there I don't see this problem, so it seems the problem happens only with certain flash drives.
Regards

Mudstone (agovernali)
description: updated
Revision history for this message
Mathias Zubek (mathias-zubek) wrote :

Hello,

I have this problem too.
Two installations: my PC has an Intel DG965WH motherboard; my laptop is a Thinkpad T60.
Both run Karmic 9.10 with all updates applied. After transferring more than 150 MB, the write speed drops to a maximum of 3.5 MB/s.
Adding ehci_hcd to /etc/initramfs-tools/modules, as recommended in some posts, did not help.
Thanks in advance

Revision history for this message
In , Hedayat (hedayat-redhat-bugs) wrote :

Description of problem:
I was trying to copy a very large file (the 3.3 GB Fedora x86_64 ISO) to a flash drive, and after copying a few hundred megabytes (e.g. 300 MB) the copy speed dropped to around 500 KB/s and kept falling. It was really unacceptable: the remaining-time estimate climbed to over 1 hour and 30 minutes while copying the file with Nautilus. I tried different flash drives and different USB ports on my laptop, with no significant difference. I had run into this slow copying before, but usually ignored it because the files were not that big. This time it was disappointing enough that I decided to find out why it is so slow.

iotop showed slow disk writes (it was often 0, jumping to some low values (usually less than 500KB/s) and lowering to 0 again) and lots of IO waiting time for pdflush. After searching a little in the Internet, I found that playing with dirty_ratio and dirty_background_ratio kernel parameters in /proc/sys/vm/ might help.

First, I lowered the dirty_background_ratio to 1 (default value is 10), and the disk write speed jumped to around 2.5 MB/s for awhile (around 1 or a few minutes) and then the speed dropped again to around 0.

Finally, I lowered the dirty_ratio parameter to 10 (default value is 20), and it resulted in a constant 2.5MB/s to 3MB/s disk read (from hard disk) and write (to usb disk) speed to the end of the copy operation (which took a few minutes rather than more than an hour!).
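
The exact commands corresponding to the description above (the values 1 and 10 are the ones used here; run as root, and the settings last only until reboot) would be roughly:

echo 1 > /proc/sys/vm/dirty_background_ratio
echo 10 > /proc/sys/vm/dirty_ratio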

The problem is so weird and unacceptable.

As it might help, I have 1.5GBs of RAM and at the time of copy around 44% of it was used by running programs.

Version-Release number of selected component (if applicable):
kernel-2.6.31.12-174.2.3.fc12.x86_64
but the problem was observed in previous F12 kernels too.

How reproducible:
100% in my few tests.

Steps to Reproduce:
1. Start copying a large file (e.g. over 2GB) to a regular USB flash drive. Use a single big file rather than many small files.
2. Observe the copying speed

Actual results:
Very slow copying operation (around 450KB/s in my case)

Expected results:
Regular copy speed (2.5MB/s seems to be possible with my hardware and the mentioned settings)

Additional info:
I don't know if it helps, but the source file was on a mounted NTFS partition and both partitions were mounted by GNOME.
The problem might be related to the mount options GNOME uses, but it does not seem to be related to Nautilus, since I get slow speeds even when using dd to copy the file.

Revision history for this message
sbec67 (sbec) wrote :

@Colin: any news on this, since it is assigned to you?

Regards
Sbec

Revision history for this message
In , spawels13 (spawels13-linux-kernel-bugs) wrote :

Created attachment 25281
perf chart high io latency

I am using the 2.6.33 kernel and this problem is still present. When I copy a big file (a few GB) the system becomes unresponsive. I ran perf timechart and generated an SVG image; you can see that plasma-desktop (part of KDE) is blocked by I/O for a long time. I copied the file from an NTFS partition, but the same thing happens when I copy big files within my Linux partition or from the hard drive to a pendrive.

Revision history for this message
In , cmertes (cmertes-linux-kernel-bugs) wrote :

(In reply to comment #416)
> I am using 2.6.33 kernel and this problem is still present.

Yep, this definitely earns the Most Embarrassing Linux Bug Award 2009 and is a Nominee for Most Annoying Linux Bug 2009 although the ATI binary driver wins in this category. Call me unfair for allowing binary blobs.

Revision history for this message
In , bgamari (bgamari-linux-kernel-bugs) wrote :

I will agree that something still isn't right with the VM. In my uninformed opinion, it seems far too eager to swap out executable pages in favor of streaming pages. Unfortunately, it seems that very few people know the VM well enough to fix it.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

I am currently using kernel 2.6.33 and the desktop responsiveness on my machine is awful compared to the 2.6.32.x kernels - worse than I have ever seen it before. The load average rises above 7 very quickly while writing many small files to the filesystem. I can run some tests with my configuration, but a kernel developer should tell me which ones.

Revision history for this message
In , akpm (akpm-linux-kernel-bugs) wrote :

(In reply to comment #419)
> I am currently using the linux kernel 2.6.33 and the the desktop
> responsiveness
> is awful on my machine compared to the 2.6.32.x kernel. It's even worse than
> I
> have even seen it before. The load avg is rising to >7 very quickly, while
> writing many small file to the filesystem. I can make some tests with my
> configuration, but a kernel developer should tell me which tests.

This isn't really the best place to bring this up. Please send a full description to <email address hidden>. cc myself, Ingo Molnar <email address hidden>, Peter Zijlstra <email address hidden>, Jens Axboe <email address hidden>. In that email, please identify what the system is doing at the time. Is it disk-related? CPU scheduler related? etc.

Thanks.

Revision history for this message
In , frankrq2009 (frankrq2009-linux-kernel-bugs) wrote :
Download full text (8.4 KiB)

Gentlemen,
I have suffered from the high iowait problem for almost 4 years, and I have been watching the bug report (Bug 12309) on bugzilla.kernel.org for 1 year. Yesterday I finally managed to get out of this trouble by switching from CentOS 5.4 (with kernel 2.6.18) to Zenwalk 6.2 (with a snapshot kernel, 2.6.32.2).
The computer is used to collect signal data from 4 gas turbines in a power plant. The project started in 2004, and we used Mandrake 9 and Zenwalk, both with 2.4.x kernels, and there were no high iowait problems. In 2006 we switched to Fedora 6 (kernel 2.6.18) and then CentOS 5, and the iowait began to make trouble: the system's response to mouse and keyboard became very slow, and new applications took a long time to launch. During these years I always thought the main reason for this was that the computer's hardware was not good enough. But early this month the plant upgraded the computer to a new Lenovo server with two Xeon E5504 CPUs (8 cores total) and 4GB of memory, but the iowait is still very, very high. The following is the output of the "top" command on that machine:

Tasks: 215 total, 1 running, 213 sleeping, 0 stopped, 1 zombie
Cpu0 : 1.0%us, 0.3%sy, 0.0%ni, 65.9%id, 32.8%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 1.0%us, 3.6%sy, 0.0%ni, 45.0%id, 50.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 1.0%us, 4.0%sy, 0.0%ni, 94.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu3 : 1.3%us, 3.3%sy, 0.0%ni, 56.3%id, 38.3%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu4 : 1.3%us, 6.7%sy, 0.0%ni, 0.0%id, 89.7%wa, 0.7%hi, 1.7%si, 0.0%st
Cpu5 : 0.3%us, 3.3%sy, 0.0%ni, 91.7%id, 0.0%wa, 0.7%hi, 4.0%si, 0.0%st
Cpu6 : 10.3%us, 30.2%sy, 0.0%ni, 50.2%id, 2.3%wa, 1.0%hi, 6.0%si, 0.0%st
Cpu7 : 1.3%us, 8.6%sy, 0.0%ni, 83.1%id, 4.0%wa, 1.0%hi, 2.0%si, 0.0%st
Mem: 4078540k total, 3872720k used, 205820k free, 182344k buffers
Swap: 4192956k total, 0k used, 4192956k free, 2815596k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 3841 markv 15 0 72172 12m 8380 S 42.2 0.3 1984:24 lvinf
 8573 markv 15 0 60232 12m 8876 S 11.6 0.3 0:17.22 mark
 4067 markv 15 0 19056 3224 2336 S 10.6 0.1 759:52.00 dms
 3548 mysql 21 0 656m 617m 9292 S 9.0 15.5 764:42.05 mysqld
27042 markv 15 0 69404 12m 8756 S 4.3 0.3 290:36.14 walin
 3810 root 15 0 39772 15m 8224 S 1.3 0.4 3:59.76 Xorg
    1 root 15 0 2068 620 532 S 0.0 0.0 0:01.19 init
    2 root RT -5 0 0 0 S 0.0 0.0 0:00.04 migration/0
    3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
    4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
    5 root RT -5 0 0 0 S 0.0 0.0 0:00.02 migration/1
    6 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1
    7 root ...

Read more...

Revision history for this message
Badcam (kiwicameron+launchpad) wrote :

I believe that I have this very same issue on my Mint 8.0 distro. I have two 5-port USB 2.0 hubs and only 1 USB port in either device is showing as USB 2.0, with all the rest showing 1.1. I've checked the hardware and every port is supposed to be 2.0.

Running "sudo dmesg | grep usb" seems to show that all the devices are recognised as 2.0 but are somehow being limited to 1.1:

[156037.796028] usb 10-2: new full speed USB device using uhci_hcd and address 5
[156037.941105] usb 10-2: not running at top speed; connect to a high speed hub
[156037.979263] usb 10-2: configuration #1 chosen from 1 choice

I have not attached a patch, just my Terminal info.

Revision history for this message
aleth (aleth) wrote :

I also have this issue when writing to a 4GB USB stick. Transfer rates start high, then slow to a crawl or even stall completely. As they slow down, the whole computer becomes unresponsive for seconds at a time.
On the same machine, the same USB stick is working fine under Windows XP; read access is also no problem.

Revision history for this message
aleth (aleth) wrote :

Just in case, it might be worth pointing out there are many more reports of this problem at http://ubuntuforums.org/showthread.php?t=1306333&page=12

Revision history for this message
André Desgualdo Pereira (desgua) wrote :

Here is one workaround until it gets fixed: update the kernel.

Go to http://kernel.ubuntu.com/~kernel-ppa/mainline and choose a recent linux-image file. Download and install.
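A sketch of that workaround; the directory and .deb file names below are placeholders only (browse the mainline page and substitute whatever current build matches your release and architecture):

wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.33/linux-image-2.6.33-020633-generic_2.6.33-020633_i386.deb
sudo dpkg -i linux-image-2.6.33-020633-generic_2.6.33-020633_i386.deb
sudo reboot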

Changed in linux (Ubuntu):
status: In Progress → Confirmed
Revision history for this message
In , dik_again (dikagain-linux-kernel-bugs) wrote :

I'm using Mandriva 2010 with kernel 2.6.33-rc5.

The freeze is huge. The system becomes unusable with every small bit of disk activity (for example sudo urpmi blackbox).

The problem is there with kernels 2.6.31 and 2.6.32 too. Other kernels were not tested.

Please, reopen the bug. It is a huge problem for many people.

Revision history for this message
In , cmertes (cmertes-linux-kernel-bugs) wrote :

It *is* a huge problem indeed. I kinda got used to it, but it feels like the 80s. I still have a Windows install in a 10GB corner of my HDD which I use very rarely, but every time I do it feels like a miracle to see what these modern computers are able to do when they don't run a f*cked up kernel :/

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

One angle to tackle this would be to ask those who don't suffer from this bug what kind of kernel (and with what parameters) and hardware they're running. Since this seems to affect a wide range of people and setups, it could be interesting... but also a huge undertaking.

Revision history for this message
In , dik_again (dikagain-linux-kernel-bugs) wrote :

Could this bug perhaps be triggered by the GCC compiler? Has anyone tried to compile a 2.6.30-2.6.33 kernel with an earlier GCC version?

Revision history for this message
In , akatopaz (akatopaz-linux-kernel-bugs) wrote :

Could be interesting, but I've read some comments whose writers had tried to isolate two consecutive kernel versions surrounding the bug.

In the end it might be quicker, though quite boring for the operator, to try a laggy scenario with many different kernel versions, catching the bug by dichotomy (bisection)?

We might also distribute the effort between ourselves. I propose something like this: everybody starts from the same kernel version which exhibits the bug and tries the same laggy scenario across a set of kernel versions. Let's take 4 versions each to cover the 2.6.x revisions; it should not take so long. Who volunteers?

Have a nice day.
Topaz.
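For reference, the "dichotomy" approach described above amounts to a git bisect over the mainline tree; a minimal sketch, assuming a clone of Linus's repository and an agreed-upon laggy scenario as the test:

git bisect start
git bisect bad v2.6.30     # a version known to exhibit the lag
git bisect good v2.6.18    # the last version reported as good
# build and boot the kernel git checks out, run the laggy scenario, then report:
git bisect good            # or: git bisect bad
# repeat until git names the first bad commit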

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Topaz, you'll have to explain the meaning of catching the bug by dichotomy. I wonder if running with a "barebones" kernel can trigger this bug?

Revision history for this message
In , akatopaz (akatopaz-linux-kernel-bugs) wrote :

I'm currently running Ubuntu Lucid, and I've noticed the bug since the Jaunty release (Intel x86 Centrino platform with a Core 2 Duo, on two different machines, both affected).
When I first had some poor performance problems I tried compiling the vanilla kernel myself, and it didn't help: the vanilla kernel 2.6.30 was also affected by this bug.
My plan is to establish a laggy scenario, compile every version of the 2.6.x kernel, and test them all against my laggy scenario. It should not take that long, but the more the merrier :)

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

One clear angle that has not been investigated by kernel developers is that this issue is highlighted by 64-bit code. I don't see this lag and high IO wait with a 32-bit kernel. I have a laptop with 2GB RAM and I got so sick of the lag that I have gone back to a 32-bit kernel and userspace.

And the speed difference is amazing, to say the least. No more stuck mouse and no more waiting to see that Konsole window pop up. Everything is much faster. It feels like a new laptop. And this is a 2GHz Core 2 Duo based T61, not slow hardware by any means!

And I get an extra 300-400MB of RAM back (YES! that's what you are reading!) just by switching to a 32-bit system. 64-bit C++ apps like Firefox and KDE eat almost twice the RAM. Firefox is running at 250MB with the same number of tabs and windows as on the 64-bit system, where it was consuming about 450MB (RSS). Go figure!

I am running a Virtualbox copy of XP on the laptop and I still don't see swap kick in. With 64-bit, running firefox and XP in VB at the same time would lead to heavy swapping and things would be crawling!

So much for advancement to 64-bit! I have been running 64-bit systems for 4 years now and switching to 32-bit feels like I was living under a rock!

I know all this sounds backwards. But give it a try.

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

Frank Ren: If you are sure the bug doesn't happen with 2.6.32.2, but with all other releases you could test, then you should try to find what has changed in it. Were you always running vanilla upstream kernels? Or always kernels from your distribution? Built on the same machine with the same compiler? If so, then have a look at the changelog from 2.6.32.2 to 2.6.32.8, looking for the culprit. I'd suggest you try 2.6.32.3 and check if the bug is there; and if not, increase the minor version until you get it: that will make the changelog really small. Then, send a mail to LKML with your findings.

You seem to be the reporter with the most precise information out there; you may catch something interesting!

devsk: Beware not to be misled by the swapping behavior of your system. If you're often completely filling your RAM when on 64bits, then swapping may hurt responsiveness badly. When moving to 32bits, if you gain 300MB, you may not suffer from this because there's free RAM, but that's not really linked with a 64bit-only bug.

Revision history for this message
In , cmertes (cmertes-linux-kernel-bugs) wrote :

(In reply to comment #430)
> One clear angle that has not been investigated by kernel developers is that
> this issue is highlighted by 64-bit code. I don't see this lag and high IO
> wait
> in 32-bit kernel.

I do. I share your opinion on RAM use though but it surely doesn't belong here. The bug itself is definitely not restricted to 64-bit systems.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Anybody, please read this comment:
https://bugzilla.kernel.org/show_bug.cgi?id=13347#c59
I think there is a worthwhile suggestion there.

Revision history for this message
In , l.wandrebeck (l.wandrebeck-linux-kernel-bugs) wrote :

I'm really unsure CFQ is the (only ?) culprit.
I've met the same behaviour using deadline and a 3ware 9650, and the fix was something completely different (pci_set_mwi).
See https://bugzilla.redhat.com/show_bug.cgi?id=444759 for more details.

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

I'm going to chip in with my experiences:
I've had this bug with both 32-bit and 64-bit kernels.
Setting different schedulers didn't make a difference.
I've tried different versions of the kernel with no luck (though I haven't specifically tried 2.6.32.2).

Revision history for this message
In , dik_again (dikagain-linux-kernel-bugs) wrote :

I have a 32-bit system. The bug is there as well.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

This bug depends on CPU, memory and, first of all, disk and filesystem, LVM and encryption. It's a mix of transactions/s and throughput; if both are in a system-dependent range, the problem starts.
There is no per-process throughput/transaction statistic in the scheduler that could penalize processes which are causing a high load. A single process can grab all available dirty pages and block the other processes.

Revision history for this message
In , ivan1986 (ivan1986-linux-kernel-bugs) wrote :

I updated from 2.6.32 to 2.6.34 and the bug is fixed on two computers.

In vmstat, wa takes up all the free time, but the interface does not freeze.

I can give all the needed info and try building any version from git for testing.

Revision history for this message
In , ruslan (ruslan-linux-kernel-bugs) wrote :

To topaz (#429):
Ready to join. It would be nice to determine the testing methods; please advise which methods to use. With the current Lucid kernel, 2.6.32, the bug is reproduced.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

That's the problem. There is no reliable method for testing.

Revision history for this message
In , ruslan (ruslan-linux-kernel-bugs) wrote :

I know about using dd. In addition I can move really big files (4-7GB in size). My database on the server is really tiny, and there I can easily reproduce the bug (the system is Hardy 32-bit, 2.6.24-19-server) by copying archives of sites and virtual machines.

Further, in the office I use Lucid (2.6.32-22-386, 32-bit), and at home Fedora 12 (32-bit).

But this is all subjective:

Revision history for this message
In , ivan1986 (ivan1986-linux-kernel-bugs) wrote :

After updating to 2.6.34:

ivan1986@ivan1986:~/$ dd if=/dev/zero of=testfile.1gb bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 36.1762 s, 29.0 MB/s
ivan1986@ivan1986:~/$ dd if=/dev/zero of=testfile.1gb bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 26.7475 s, 39.2 MB/s
ivan1986@ivan1986:~/$ dd if=/dev/zero of=testfile.1gb bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 32.8729 s, 31.9 MB/s

 1 3 0 20940 19664 315188 0 0 128 7860 571 1108 6 10 3 81
 2 2 0 15744 19668 320272 0 0 68 65332 893 1593 3 32 7 58
 0 3 0 11932 19668 323260 0 0 96 49384 579 1142 3 11 0 85
 0 3 0 17252 19704 318232 0 0 0 6832 516 1131 2 3 0 94
 0 4 0 12732 19704 323204 0 0 128 6520 940 1145 4 22 4 69
 2 4 0 11808 19980 323796 0 0 88 30492 1093 1393 7 20 2 70
 0 4 0 32860 19980 302340 0 0 148 70892 1117 2026 2 10 4 84
 0 4 0 11856 19980 323400 0 0 176 6652 553 1217 3 9 33 54
 1 4 0 12340 19980 323156 0 0 12 12396 604 1269 2 8 4 85
 0 4 0 12228 19980 323768 0 0 0 13520 816 1612 2 6 0 91
 0 4 0 13136 19980 322244 0 0 0 21924 937 1504 7 8 0 85
 0 3 0 11820 19980 324064 0 0 112 42740 857 1404 1 33 14 52
 0 3 0 11896 19980 323468 0 0 48 9668 600 1161 4 6 1 88
 0 4 0 12608 19980 322604 0 0 128 55032 746 1342 10 11 19 61
 0 3 0 11328 19980 323508 0 0 76 27868 498 1087 4 3 6 86
 0 4 0 11952 20020 322996 0 0 36 1196 502 1268 5 3 0 92
 0 4 0 11952 20020 323512 0 0 0 4036 540 1064 3 8 0 89
 0 4 0 11868 20304 323064 0 0 112 64560 893 1190 5 28 3 64
 0 5 0 21888 20304 312760 0 0 336 35284 639 1520 4 15 0 82
 0 5 0 21764 20304 313068 0 0 0 20936 572 1490 6 3 0 90
 0 4 0 11844 20316 323896 0 0 248 364 610 1165 5 12 0 83
 1 3 0 12336 20360 323368 0 0 0 31160 1113 1188 3 18 0 78

Max 30% CPU in htop.

The interface does NOT freeze, music plays normally, and everything else works fine.

Revision history for this message
In , cat (cat-linux-kernel-bugs) wrote :

The simplest way to reproduce this bug on most hardware is (a command-level sketch follows at the end of this comment):

1. create a cryptsetup partition (on LVM or without LVM, both variants are OK). Preferably, all partitions used in the test case should be encrypted;
2. install VirtualBox and try to create a preallocated hard disk image; the size must be 4GB or more.

That's it! If you try to use other applications at the same time, you will see 5-10 second freezes.

I've reproduced the bug on many hardware configurations with 2.6.34 and older kernels, such as:
C2Q Q9650 / 8GB RAM / Seagate HDD / x86_64
i7 920 / 6GB / WD HDD / x86_64
C2D U7600 / 2GB / Samsung SSD / i686
C2D T7200 / 3GB / Seagate HDD / i686

So it's not a hardware problem - the hardware ranges in age from 4 years to 1 year and the results are the same.
Also, on *BSD and Windows there are no problems with that hardware.
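A minimal command-level sketch of that reproduction, assuming a spare partition /dev/sdb1 and mount point /mnt/crypt (both hypothetical), and VBoxManage syntax of that VirtualBox generation (check VBoxManage createhd --help on your version):

# encrypt and mount a scratch partition (this DESTROYS any data on /dev/sdb1)
sudo cryptsetup luksFormat /dev/sdb1
sudo cryptsetup luksOpen /dev/sdb1 crypttest
sudo mkfs.ext4 /dev/mapper/crypttest
sudo mount /dev/mapper/crypttest /mnt/crypt

# create a fixed-size (preallocated) 4GB disk image on the encrypted filesystem
VBoxManage createhd --filename /mnt/crypt/test.vdi --size 4096 --variant Fixed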

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

I'm wondering: isn't bad responsiveness equal to starvation of processes in the CPU scheduler? In that case it would be better to measure the amount of CPU cycles it is possible to burn during pekmop1024's procedure.

I have tried to just dd an 8 GB file, and it gives me stalls in the GUI, but that is because of stat64 calls in the application. Under normal circumstances the file that is stat'ed is cached, but during high throughput the cache is filled up with other data, so the stat64 call has to read from the disk, which then competes with my dd. Running glxgears alongside the dd shows a constant fps during the whole dd.

I have followed this thread for a long time and I do not remember anyone mentioning that a single high-throughput application renders the cache useless to other applications.

Is it possible to avoid filling the cache with data that is written?

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

I'm wondering: isn't bad responsiveness equal to starvation of processes in the CPU scheduler? In that case it would be better to measure the amount of CPU cycles it is possible to burn during pekmop1024's procedure.

I have tried to just dd an 8 GB file, and it gives me stalls in the GUI, but that is because of stat64 calls in the application. Under normal circumstances the file that is stat'ed is cached, but during high throughput the cache is filled up with other data, so the stat64 call has to read from the disk, which then competes with my dd. Running glxgears alongside the dd shows a constant fps during the whole dd.

I have followed this thread for a long time and I do not remember anyone mentioning that a single high-throughput application renders the cache useless to other applications. I'm guessing that a simple application that once per second reads the first byte from a memory-mapped file will stall, even if it is only a single byte that needs to be cached.

I'm sorry if my thoughts have been mentioned before in this thread :)
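One way to probe that question, as a sketch (GNU dd's oflag=direct bypasses the page cache, so the streamed data never displaces cached metadata; the file name and size are arbitrary examples):

# normal buffered write: fills the page cache and evicts other cached data
dd if=/dev/zero of=/tmp/bigfile bs=1M count=8192

# O_DIRECT write: same amount of data without going through the page cache
dd if=/dev/zero of=/tmp/bigfile bs=1M count=8192 oflag=direct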

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

I've tested my assumption about the 1-byte mmap'ed file. It turned out that it runs fine during my dd. Probably 1 byte is not enough.

Revision history for this message
In , ivan1986 (ivan1986-linux-kernel-bugs) wrote :

It still repeats itself - compiling psi freezes the interface.

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :
Download full text (24.0 KiB)

(In reply to comment #421)
> I have suffered the high iowait problem for almost 4 years
Then let's finally kill it!

> I got information from this bugzilla report that kernel 2.6.32 has fixed
> this high iowait problem, and I tested the snapshot kernel 2.6.32.2 of
> zenwalk
> on my notebook, and found the high iowait is gone

> I found in kernel 2.6.32.8 the high iowait is back. How do I know
> that? When I copy a 700MB avi file from my notebook disk to a 3.5" usb
> mobile disk, I found the reading side disk LED start to flash quickly and
> immediately, but the writing side disk LED will keep still for a long
> time(like
> 25-30 seconds), and then start to flash slowly,and the course is abnormally
> long and low responsive.

> The kernel 2.6.32.2 is the only 2.6 kernel (since 2.6.18) on which I found
> both of the reading and writing side disk LED will start to flash
> quickly and immediately. There must be something wrong with the write
> cache behavior which will cause the high iowait, and it has been fixed in
> 2.6.32.2 and brought back in 2.6.32.8.

This is the complete git log 2.6.32.2..2.6.32.8:
b0e4370 Linux 2.6.32.8
6117db7 NET: fix oops at bootime in sysctl code
e4a6a35 powerpc: TIF_ABI_PENDING bit removal
a420e9f ath9k: fix beacon slot/buffer leak
1c97637 ath9k: fix eeprom INI values override for 2GHz-only cards
2c7f87e pktcdvd: removing device does not remove its sysfs dir
b31aa5c uartlite: fix crash when using as console
e06fbe9 kernel/cred.c: use kmem_cache_free
35cfb03 starfire: clean up properly if firmware loading fails
906f68d mx3fb: some debug and initialisation fixes
682efb8 imxfb: correct location of callbacks in suspend and resume
b260729 mac80211: fix NULL pointer dereference when ftrace is enabled
3a9353f mm: flush dcache before writing into page to avoid alias
78da404 be2net: Fix memset() arg ordering.
e38d76e be2net: Bug fix to support newer generation of BE ASIC
43d7ff2 connector: Delete buggy notification code.
f06f00e usb: r8a66597-hdc disable interrupts fix
0ae2b7d block: fix bugs in bio-integrity mempool usage
9648148 random: Remove unused inode variable
8857a1a random: drop weird m_time/a_time manipulation
94af44b Fix 'flush_old_exec()/setup_new_exec()' split
cb723ba block: fix bio_add_page for non trivial merge_bvec_fn case
e52299d mm: purge fragmented percpu vmap blocks
56d4b77 mm: percpu-vmap fix RCU list walking
dce6a09 libata: retry link resume if necessary
42f7e23 oprofile/x86: fix crash when profiling more than 28 events
9c66557 oprofile/x86: add Xeon 7500 series support
4f7d666 KVM: allow userspace to adjust kvmclock offset
a74e62c ax25: netrom: rose: Fix timer oopses
3125258 af_packet: Don't use skb after dev_queue_xmit()
ecb7287 net: restore ip source validation
1681333 sky2: Fix oops in sky2_xmit_frame() after TX timeout
16b8efa tcp: update the netstamp_needed counter when cloning sockets
359e2f2 clocksource: fix compilation if no GENERIC_TIME
253f887 x86/amd-iommu: Fix possible integer overflow
d1a3103 x86: Add quirk for Intel DG45FC board to avoid low memory corruption
8159070 x86: Add Dell OptiPlex 760 reboot quirk
00362b9 regulator: Specify REGULATOR_CHANGE_STATUS for WM835x LED constraints
6db6ace ...

Revision history for this message
In , frankrq2009 (frankrq2009-linux-kernel-bugs) wrote :

(In reply to comment #448 and comment #431)
Sorry, I really want to help, but I am not a kernel developer; hacking the kernel source is too difficult for me. Besides, the gas turbine historian is a live production system, so it cannot be used as a debug system. I will keep watching for the final resolution; for now, we will stick with 2.6.32.2.

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

Wait, wait, what is this :O
Updating to yesterday's git kernel (from 2.6.34-git12) gave me a huge perceived speed boost? I haven't specifically compared iowait times, but all processes seem to be using less CPU time? My BOINC likes that very much ;)
There is a lot of concurrent IO here, and the system, apart from minor application stalling (despite 8GiB RAM and no swap), hasn't been this un-sluggish for a loooong time (2.6.18? ;)
Feels like someone finally released the brakes - hope you guys can confirm this!

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

wrt comment #450, 2.6.35-rc1 is out! I hope that has something for all of us sufferers. I will try it later today. Can other folks also try and report here?

Revision history for this message
In , andre (andre-linux-kernel-bugs) wrote :

(In reply to comment #450)
Maybe this is related to the observations in Phoronix's kernel tracker [1]. An in-depth article was also posted [2].

1: http://www.phoromatic.com/kernel-tracker.php?sys_1=yes&sys_3=yes&sys_4=yes&sub_type_System=yes&sub_type_Processor=yes&sub_type_Disk=yes&sub_type_Graphics=yes&sub_type_Memory=yes&sub_type_Network=yes&date_range=15&regression_threshold=0.15&only_show_regressions=yes&submit=Update+Results
2: http://www.phoronix.com/scan.php?page=article&item=linux_2635_fail&num=1

Note: Link 1 is valid for the next few days, thereafter you have to raise the displayed days to get the regression back into view

Revision history for this message
In , akpm (akpm-linux-kernel-bugs) wrote :

lol, this bug was marked "resolved". I wish.

(Hi, everyone).

I suspect we have about 25 different bugs here. Really the only way we'll make progress here is if people can come up with specific test cases which developers can run on their own machines, and reproduce the bug.

So if any of you guys have time to try that and are successful then please attach that testcase here, or send it out via email to the relevant culprits.

It's really that important. There's practically a 1:1 ratio between reproduction-test-cases and bugfixes.

Revision history for this message
In , desasterman (desasterman-linux-kernel-bugs) wrote :

Let me point out a potential pitfall: For a long while I thought my machine was suffering from this bug. However, the real reason for my high IO wait and extremely poor performance was this:

http://www.osnews.com/story/22872/Linux_Not_Fully_Prepared_for_4096-Byte_Sector_Hard_Drives

So everyone should rule out that one first... for me, a repartitioning of my drive helped a lot :).

Revision history for this message
In , Khalid.rashid (khalid.rashid-linux-kernel-bugs) wrote :

Just want to report that I've had great success with the kernel 2.6.35-020635rc4-generic on Ubuntu 32-bit. Apps can still grey out when allocating space for big files, but the interface stays responsive in other apps. I'll try it out on more setups and report back here if I notice it appearing in other places.

Finally I can say that my linux machines are usable again. Cheers!

Revision history for this message
blahde (daisy-ice) wrote :

@André Desgualdo Pereira:

Which kernel exactly do you recommend? Linux 2.6.35-020635rc1-generic does not bring any changes for me.

Revision history for this message
André Desgualdo Pereira (desgua) wrote :

@blahde

The 2.6.33 kernel has worked with Karmic Koala, but I haven't tested it with Lucid Lynx.

Regards.

Revision history for this message
In , psypher246 (psypher246-linux-kernel-bugs) wrote :

(In reply to comment #453)
> lol, this bug was marked "resolved". I wish.
>
> (Hi, everyone).
>
> I suspect we have about 25 different bugs here. Really the only way we'll
> make
> progress here is if people can come up with specific test cases which
> developers can run on their own machines, and reproduce the bug.
>
> So if any of you guys have time to try that and are successful then please
> attach that testcase here, or send it out via email to the relevant culprits.
>
> It's really that important. There's practically a 1:1 ratio between
> reproduction-test-cases and bugfixes.

Hi Andrew,

Very simple testing procedure:

Launch Firefox

Run 'stress -d 1'

Try to open some websites

Machine hangs

Thanks

Revision history for this message
In , psypher246 (psypher246-linux-kernel-bugs) wrote :

(In reply to comment #455)
> Just want to report that I've had great success with the kernel
> 2.6.35-020635rc4-generic on ubuntu 32 bit. Apps can still grey out when
> allocating space for big files, but the interface is still responsive on
> other
> apps. I'll try it out on more setups and report back here if i notice it
> appearing on other places.
>
> Finally I can say that my linux machines are usable again. Cheers!

I will try that, but I have no issues in XP and my hard drive is at least 2 1/2 years old and this issue has been around for even longer than that.

Doubt it's the reason for my issues.

I have also tried playing around with other schedulers and disk mounting options. I have tried writeback and journal mode. Writeback provides a very minimal improvement, not enough to make it worth my while to run it always. Changing between ATA and AHCI mode makes no difference, nor does changing the scheduler from cfq to anticipatory or deadline.

I am testing this on a Dell Precision M6300 laptop with a SATA drive, but I have experienced this issue on all my various types of PCs since at least Ubuntu Gutsy or Intrepid.

Revision history for this message
In , akpm (akpm-linux-kernel-bugs) wrote :

(In reply to comment #456)
>
> Very simple testing procedure:
>
> Launch Firefox
>
> Run 'stress -d 1'
>

From where does one obtain a copy of `stress'?

Thanks.

Revision history for this message
In , benjfitz (benjfitz-linux-kernel-bugs) wrote :

I believe this is the website (according to Gentoo Portage).
http://weather.ou.edu/~apw/projects/stress/
Benj
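On Debian/Ubuntu systems of that era the tool is also packaged (assuming the package is simply named 'stress'), so building from the website is not required:

sudo apt-get install stress
stress -d 1    # one hdd worker spinning on write()/unlink(), as used in this thread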

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

I've tried stress also.
I have 2 GB of memory and 1.5 GB of swap.

With swap activated, stress -d 1 hangs my machine.

The same happens with stress -d 1 while swappiness is set to 0.

With swap deactivated, things run pretty fine. Of course, apps using synchronous disk IO fight stress for priority.

There must be a reasonable explanation for why everything stops when swap is activated. Even a simple app like "dstat" stalls.

Revision history for this message
In , nels.nielson (nels.nielson-linux-kernel-bugs) wrote :

I can also confirm this. Disabling swap with swapoff -a solves the problem.
I have 8GB of RAM and 8GB of swap with a fakeraid mirror.

Before this I couldn't do backups without the whole system grinding to a halt. Right now I am doing a backup from the drives, watching a movie from the same drives, and more. No more huge iowait times and programs freezing because they are starved of access to the drives.

Revision history for this message
In , andrew (andrew-linux-kernel-bugs) wrote :

Perhaps you could capture some vmstat 1 output from just before/when the stall occurs?
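A minimal way to do that, assuming the stress workload used earlier in the thread (the log file name is just an example):

# log memory/swap/IO counters once per second while reproducing the stall
vmstat 1 > vmstat-stall.log &
stress -d 1
# reproduce the stall, then stop both (Ctrl-C / kill %1) and attach vmstat-stall.log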

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Created attachment 27230
vmstat for my system running "stress -d 1" without hanging.

My system had just logged into KDE, with around 650 MB of memory used by applications prior to starting "stress -d 1".

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Created attachment 27231
vmstat for my system running "stress -d 1". System hangs.

My system had just logged into KDE, with around 860 MB of memory used by applications prior to starting "stress -d 1". The applications using the extra memory are digiKam and Kontact - both sitting there doing nothing.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Created attachment 27232
vmstat for my system (without swap) running "stress -d 1" without hanging.

Same setup as stress_swap_hang.vmstat except that swap is turned off using
"swapoff -a" in this run.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

The strange thing about every high-throughput IO operation is that *every* byte of memory is used up, until a certain limit. That use of memory will even swap stuff out.

Also, looking especially at stress_noswap_nohang.vmstat, the behavior mimics this:

1. Place data to be written into memory
2. Write some data to the disk
3. Go to 1 if not all allowed memory is used.

It is interesting that "stress -d 1" places data into memory a lot faster than a normal hard disk can handle, so the memory will be filled up eventually (the limit will be reached eventually).

So for me, I only have a hanging system when the "stress -d 1" writes compete with swap-out - which is itself caused by "stress -d 1" filling the memory.

So the big question: why does the kernel allow large data writes to fill up the memory, and even swap stuff out, just to hold data that is waiting to be written?

Revision history for this message
In , cmertes (cmertes-linux-kernel-bugs) wrote :

(In reply to comment #466)
> So the big question: Why do the kernel allow large data writes to fill up the
> memory and even swap out stuff just to get data to be written into memory?

A good question, but not the real source of this problem, I guess. Judging by the previous posts and my own experience, this problem seems to occur with any concurrent I/O, possibly promoted by encryption. Provided that it is only one bug we are talking about.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

I've noticed that earlier in the long list of comments. But could it be that others are confusing the real issue with swap-out slowing things down during heavy disk writes?

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

(In reply to comment #468)
> I've notices that earlier in the long list of comments. But could it be that
> others confuse the real issue with swapout slowing things down during high
> disk
> write?

This squares somewhat with my own experience:

1. The file cache is *very* aggressive, even pushing out to swap stuff I think I might be using.

2. Large writes to swap trounce interactivity (and little gets scheduled).

Small writes seem not to have an adverse effect. OK, I understand pushing out pages that haven't been used in a while in favour of more current caches; however, doing something that can result in 1.5 GiB going to the page cache on a 2 GiB system (large copy, kernel compile) seems to provoke these large writes which make everything go slow.

Revision history for this message
blahde (daisy-ice) wrote :

@André Desgualdo Pereira:

Thank you. Unfortunately 2.6.33 doesn't make any difference for me either - on Lucid Lynx. Whereas I can confirm that the transfer rate from USB to USB seems to be not affected by this.

Revision history for this message
André Desgualdo Pereira (desgua) wrote :

Sorry I can't help any further.
If I found something I will post here.
Regards.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

(In reply to comment #469)
>
> 1. The file cache is *very* aggressive, even pushing out to swap stuff I
> think
> I might be using.
>

Now, I'm not a kernel hacker, but I am a programmer after all, and to me it seems an easier job to fix the aggressive file cache than to fix this "large I/O operations..." thing - which is not at all concrete and varies across platforms, machine specs, etc.

Maybe fixing the aggressive file cache would fix a lot of people's problems - I'm guessing that the file cache behaves 100% the same on all systems. Is that a correct assumption?

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #470)
> (In reply to comment #469)
> >
> > 1. The file cache is *very* aggressive, even pushing out to swap stuff I
> think
> > I might be using.
> >
>
> Now, I'm not a kernel hacker, but a programmer afterall, and to me it seems
> to
> be a an easier job to fix the aggressive file cache than to fix this "large
> I/O
> operations ......"-thing - which is not at all that concrete and varies over
> platforms, machine specs etc.

Isn't there already a knob for controlling the kernel's preference for swapping anonymous pages out to disk versus retaining cached/buffered block-device pages?

/proc/sys/vm/swappiness — http://kerneltrap.org/node/3000

Our apps are appearing to hang because their GUI threads have stalled while waiting on pages (containing either executable code or auxiliary data like pixmaps) to come back into RAM from the disk. Reading those pages back in is taking forever because the disk queue is full of writes. The situation is worsened because reading the pages is not pipelined since the requests are being submitted from the page fault handler, so a program executing while huge disk activity is in progress will submit a request to load one page from disk and stall; then when that request is fulfilled, the program will execute a few hundred instructions more until its instruction pointer crosses into another page that isn't loaded from disk, whereupon the page fault handler will be invoked again, a new request will be submitted to the disk queue, and the application will hang again. Repeat ad infinitum. Meanwhile, while the program is stalled waiting for the page it needs to be loaded in from disk, all the rest of its pages are being evicted from RAM to make room for the huge disk buffers, thus perpetuating the problem.

I would think the easiest and most reliable solution to this problem would be for the kernel to prefer fulfilling page-in requests ahead of dirtying blocks. If there are any requests to read pages in from disk to satisfy page faults, those requests should be fulfilled and a process's request to dirty a new page should be blocked. In other words, as dirty blocks are flushed to disk, thus freeing up RAM, the process performing the huge write shouldn't be allowed to dirty another block (thus consuming that freed RAM) if there are page-ins waiting to be fulfilled.
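For reference, the swappiness knob mentioned above can be read and changed at runtime; this is only a generic illustration of the tunable, not a fix proposed in this comment (lower values make the VM prefer dropping file cache over swapping anonymous pages out):

cat /proc/sys/vm/swappiness                 # distribution default is commonly 60
echo 10 | sudo tee /proc/sys/vm/swappiness  # prefer keeping anonymous pages in RAM
# add "vm.swappiness = 10" to /etc/sysctl.conf to keep the setting across reboots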

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Created attachment 27243
vmstat for my system running "stress -d 1" without hanging.

My system had just logged into KDE, with around 650 MB of memory used by applications prior to starting "stress -d 1".

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

(In reply to comment #471)
>
> I would think the easiest and most reliable solution to this problem would be
> for the kernel to prefer fulfilling page-in requests ahead of dirtying
> blocks.
> If there are any requests to read pages in from disk to satisfy page faults,
> those requests should be fulfilled and a process's request to dirty a new
> page
> should be blocked. In other words, as dirty blocks are flushed to disk, thus
> freeing up RAM, the process performing the huge write shouldn't be allowed to
> dirty another block (thus consuming that freed RAM) if there are page-ins
> waiting to be fulfilled.

I agree with you on the preference part. It will fix the race-like situation. But as I understand it, it will not keep the file cache from swapping out a single page?

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #473)
> I agree with you on the preference-part. It will fix the race-like situation.
> But as I understand, it will not keep the file cache from swapping out a
> single
> page?

Implementing my suggestion wouldn't prevent mmap'd pages from being evicted from RAM to make room for file cache. It would only mean (1) that the file cache wouldn't be allowed to consume pages that are needed to satisfy page faults, and (2) that requests to read pages in from disk (whether from swap (anonymous pages) or from mmap'd files such as executables) would be serviced ahead of any other reads or writes in the disk queue.

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

(In reply to comment #471)

> Isn't there already a knob for controlling the kernel's preference for
> swapping
> anonymous pages out to disk versus retaining cached/buffered block-device
> pages?
>
> /proc/sys/vm/swappiness — http://kerneltrap.org/node/3000

(For some reason playing with this doesn't seem to do anything, but perhaps that's another bug report.)

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

> I would think the easiest and most reliable solution to this problem would be
> for the kernel to prefer fulfilling page-in requests ahead of dirtying
> blocks.
> If there are any requests to read pages in from disk to satisfy page faults,
> those requests should be fulfilled and a process's request to dirty a new
> page
> should be blocked. In other words, as dirty blocks are flushed to disk, thus
> freeing up RAM, the process performing the huge write shouldn't be allowed to
> dirty another block (thus consuming that freed RAM) if there are page-ins
> waiting to be fulfilled.

Matt: Wouldn't setting dirty_bytes to a low value make sure that processes never dirty more than a fixed number of pages, and hence never get to consume more RAM until their existing dirty pages are flushed? Or maybe that's not how dirty_*bytes is designed to work. Maybe (I am guessing here) it just controls when the flush of dirty pages begins to happen, and the application can still continue to dirty more pages. But if dirty_bytes controls when the process itself has to flush its dirty buffers, then it would be busy flushing and waiting on IO to complete and couldn't be dirtying more memory, right? So it does look like setting dirty_bytes to a low value like 4096 will produce an extreme case where the process's writes are almost completely synchronous and the page cache is not pounded at all.

Can someone try this extreme test? Set dirty_bytes to 4096 and rerun your scenario. The sequential bandwidth seen by the disk stresser will go down the drain, but your system should survive.

Revision history for this message
In , andrew (andrew-linux-kernel-bugs) wrote :

According to http://www.kernel.org/doc/Documentation/sysctl/vm.txt

"Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
value lower than this limit will be ignored and the old configuration will be
retained."

Better make that 8192.

Also you could try lowering /proc/sys/vm/dirty_ratio
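A concrete version of that experiment, as a sketch (dirty_bytes and dirty_ratio are alternatives - writing one disables the other; the stress workload is the one used throughout this thread):

# throttle writers once 8192 bytes (two pages) of dirty data accumulate
echo 8192 | sudo tee /proc/sys/vm/dirty_bytes

# or, less extreme: cap dirty memory at 10% of RAM via the ratio knob instead
echo 10 | sudo tee /proc/sys/vm/dirty_ratio

# then rerun the workload and compare responsiveness
stress -d 1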

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

(In reply to comment #477)
> According to http://www.kernel.org/doc/Documentation/sysctl/vm.txt
>
> "Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
> value lower than this limit will be ignored and the old configuration will be
> retained."
>
> Better make that 8192
>
> Also you could try lowering /proc/sys/vm/dirty_ratio

Setting dirty_bytes to 8192 solves the slowdown for me. Of course it ends up with a throughput from "stress -d 1" which is considerably lower than when dirty_bytes was set to 0 (i.e.

<quote-from-doc>
If dirty_bytes is written, dirty_ratio becomes a function of its value
(dirty_bytes / the amount of dirtyable system memory).
</quote-from-doc>

Now, dirty_ratio is 60 by default, so 60% of my system memory can be used for dirty pages. On my system that is 1.2GB. So if I do not have 1.2GB free and I am doing some high-throughput write to disk, my system will hang. I think that is a bit of an overkill, especially seen from the perspective that a standard hard disk can write no more than 100MB/s.

The kernel should be reasonable enough to behave and not just hog the majority of system memory during high-throughput operations. Just think of a system with 8GB of memory where 6GB is used by running applications. Running "stress -d 1" on such a setup would kill it: the writing application would be allowed to use 60% of the 8GB for dirty pages. It seems massive, so please correct me if I'm wrong, since I have not tested such a system.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Søren: These parameters exist to tune the system behavior. There are other parameters which control the behavior of pdflush and the FS journal threads, but getting all of these in harmony so the system performs well in every scenario is not an easy task. I think the hope is that pages will be reclaimed fast enough by pdflush if its parameters are tuned as well.

But I agree that letting one process dirty 60% of physical RAM by default before it blocks itself on an IO flush is a bad thing. Particularly when filling RAM is many orders of magnitude faster than emptying it to disk. A couple of rogue user processes can bring the system down in a hurry.

Linux needs to account for the disparity between RAM and disk, and how that disparity has increased many-fold in recent times. A 2GB system is considered the minimum these days. Filling 60% of it will take a few microseconds even on the slowest of RAM, but emptying it to disk will take many seconds if not minutes on the fastest drives.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Søren: These parameters exist to tune the system behavior. There are other parameters which control the behavior of pdflush and the FS journal threads, but getting all of these in harmony so the system performs well in every scenario is not an easy task. I think the hope is that pages will be reclaimed fast enough by pdflush if its parameters are tuned as well.

But I agree that letting one process dirty 60% of physical RAM by default before it blocks itself on an IO flush is a bad thing. Particularly when filling RAM is many orders of magnitude faster than emptying it to disk. A couple of rogue user processes can bring the system down in a hurry.

Linux needs to account for the disparity between RAM and disk, and how that disparity has increased many-fold in recent times. A 2GB system is considered the minimum these days. Filling 60% of it will take a few microseconds even on the slowest of RAM, but emptying it to disk will take many seconds if not minutes on normal drives.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Apologies for the double post. The first one timed out on me. While reposting, I realized fastest drives on market today (the SSDs) will likely be able to do stuff in seconds, so, I changed the word fastest to normal...:-)

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

devsk: Yeah, but shouldn't those knobs be for squeezing the most out of your system? The defaults should be set in a way that is not destructive.

E.g.:

swappiness = 0 - 10
 or
dirty_ratio = 10

or a combination of both, or some other settings.

People will experience trouble with the default settings anyway, and reports like "high-throughput disk writes are slow" are certainly a lot better than "a high-throughput disk write locks up my machine".

What are the best first steps to solving this:
1. Changing the defaults of the existing knobs?
2. Changing the kernel code?

Revision history for this message
In , andrew (andrew-linux-kernel-bugs) wrote :

There are currently various patches dealing with various aspects of writeback. Some or all of these _may_ be ready for inclusion in 2.6.36

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

Nice... where are those? If they apply to 2.6.35-something I will be happy to try them out.

Revision history for this message
In , andrew (andrew-linux-kernel-bugs) wrote :

Here are a couple of things being worked on.

http://lwn.net/Articles/397003/
http://lwn.net/Articles/396512/

You'll need to dig around for the patches.

Revision history for this message
In , andrew (andrew-linux-kernel-bugs) wrote :

Wu Fengguang of Intel has started looking through this bug report. He has some patches that he'd like people to try.

http://lkml.org/lkml/2010/8/1/40
http://lkml.org/lkml/2010/8/1/45

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

Created attachment 27313
screenshot of extreme iowait at ridiculously low throughput

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

Created attachment 27314
Wu Fengguang's anti-io-stall patch rebased for vanilla 2.6.35

@#486
The posted patches didn't apply to recent kernels, so I just rebased them onto the latest kernel release and compiled. Will restart the machine now and party wildly if this small change FINALLY fixes this issue.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

(In reply to comment #487)
> Created an attachment (id=27313) [details]
> screenshot of extreme iowait at ridiculously low throughput

I have found that even when dstat shows 0B throughput, the disk can be very much active. So dstat does not seem to measure the number of bytes actually going to the disk.

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :

2.6.35 + patch from #488

The mouse froze four times for 1-1.5 seconds while dd was writing.

When sweep opened the file and swap grew from 0 to 1.3 GiB, the mouse froze. After the file was opened, Kopete lost its connection to the Jabber account and KWin disabled desktop effects.

Revision history for this message
In , hassium (hassium-linux-kernel-bugs) wrote :

Created attachment 27324
test results

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

(In reply to comment #490)
> 2.6.35 + patch from #488
>
> Mouse froze four times at 1 - 1.5 seconds, while dd wrote.
>
> When the sweep opens the file and swap grew from 0 to 1.3 GiB, mouse frozen.
> After opening the file Kopete loses connection to the Jabber account and KWin
> disables desktop effects.

Did you make sure memory usage was at least 50% before starting the test, just to make sure pageout is triggered?

Revision history for this message
In , gaguilar (gaguilar-linux-kernel-bugs) wrote :

I'm just copying a few files from an NFS folder to USB on my computer.

I found that the IO wait times are huge but the network is not in use. This is strange, as the folder is an NFS one attached over gigabit Ethernet.

The problem is that the iowait times are making my desktop unusable. The window manager takes a lot of time to move a window around, the desktop does not respond well, the mouse hangs sometimes... This is a mess.

This is the kernel:

Linux azul1 2.6.35-10-generic #15-Ubuntu SMP Thu Jul 22 11:10:38 UTC 2010 x86_64 GNU/Linux

Some kernel maintainer should triage this bug: separate it into a few different bugs (because I'm sure there is more than one involved here) and try to resolve them. Divide and conquer!

Thank you guys!

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

The patch from #488 does not solve the problem on my machine. My machine starts to stall even if there are still 2GiB of the 8GiB of RAM free. The menu stalls if the icons are not loaded and there is heavy IO.

It starts to stall sooner while executing
dd if=/dev/zero of=t1 bs=1M count=8K (throughput ~48.2MiB/s)
than with
dd if=/dev/zero of=t1 bs=4K count=2M (throughput ~52.7MiB/s)

The test data is written on the inner part of the disk, while the OS is on the outer part. All partitions are ext4.

High fragmentation caused by LVM snapshots increases this problem.

Revision history for this message
In , pedrib (pedrib-linux-kernel-bugs) wrote :

Hi,

I did some tests with the patch from #488.

Test procedure:
- filled up memory to 70/80% (4GB physical memory total)
- executed "stress -d 1"
- played around switching windows, changing tabs in Chromium, accessing menus, etc.

-----------------------------------------
2.6.35 vanilla, 10GB swap partition on:
Complete hang, no response at all from mouse or keyboard, had to reboot manually

2.6.35 vanilla, 10GB swap partition off:
A few hiccups, but system was still usable, although slow.

2.6.35 + patch from #488, swap partition on:
A few hiccups, but system was still usable, although slow.

2.6.35 + patch from #488, swap partition off:
A few hiccups, but system was still usable, although slow.
-----------------------------------------

So the patch from #488 seems to solve the problem for me. The hiccups and slowness can be attributed to my relatively slow magnetic disk and the fact that my partition is encrypted under LUKS.

This is a very important bug for Linux on the desktop. I'm glad there is a patch out for it and I'll continue to use the patch for my kernels, but it should definitely be fixed in mainline!

Revision history for this message
Adam Kulagowski (fidor-fidor) wrote :

I have a SanDisk Backup U3 (32GB). I've tested it on 4 different computers. I always got a writing speed of around 3MB/s, which is slow, because this pendrive is capable of 16-17MB/s writing speed. The only way I'm able to copy files faster is to use dd with bs=64 AND oflag=direct. Using these options I get the full writing speed.

dd if=ubuntu-10.04-server-amd64.iso of=/media/FE35-228F/file.bin bs=64k oflag=direct
10840+1 records in
10840+1 records out
710412288 bytes (710 MB) copied, 43,8788 s, 16,2 MB/s

What is also important: I can interrupt the copying process at any time. Without oflag=direct, dd ignores Ctrl-C or even kill -9.

This was tested on 10.04, with similar results on the "Recovery Is Possible" distro (kernel 2.6.34-git16).

uname -a
Linux fidor 2.6.32-24-generic #38-Ubuntu SMP Mon Jul 5 09:20:59 UTC 2010 x86_64 GNU/Linux

Maybe it will help.

Revision history for this message
In , psypher246 (psypher246-linux-kernel-bugs) wrote :

Hi all, has anyone seen this article?

http://www.phoronix.com/scan.php?page=news_item&px=ODQ3Mw

Are they talking about the same patches? Sounds like the same issue.

Revision history for this message
Adam Kulagowski (fidor-fidor) wrote :

I've made a typo in my previous comment: you have to specify (at least in my case) bs=64k, not bs=64. The command line example was correct. Any other value, bigger or smaller (32k, 128k, 256k), brings the speed back down to 3MB/s.

One more thing. I've found a second pen drive which works correctly (full 5MB/s writing speed). There are some small differences in lsusb between the two. I'm attaching the lsusb -v output.

On the working pen drive (Adata) bs doesn't really matter. Of course, with a bigger block size you get a bigger writing speed, up to 128k. A bigger bs than 128k doesn't change anything. I've tested from 256k up to 2048k, still achieving full writing speed.

I'll try to test more USB sticks.

Revision history for this message
In , coornail (coornail-linux-kernel-bugs) wrote :

I tried the patch from #488 on 2.6.35.
When running dd if=/dev/zero of=/tmp/test bs=1M count=1M the system was almost flawless, windows switched quickly, opened programs reacted instantly.

It might be that I'm mistaken, but I'm under the impression that my programs takes more time to launch. I wonder if anyone else have that.

Revision history for this message
In , uzytkownik2 (uzytkownik2-linux-kernel-bugs) wrote :

*** Bug 15463 has been marked as a duplicate of this bug. ***

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

#496:
Yes, the patch mentioned on Phoronix IS the one from #488, and as reported by several people it seems to improve IO latency (at the cost of throughput?) but falls short of completely preventing stalls. The strange thing for me is that the problems seemingly increase with uptime... Besides, I noticed some rogue flush-btrfs-1 threads causing 1MiB/s average disk writing (uptime > 2 days, even after bringing down the services causing heavy IO). I posted a blktrace of that to the linux-btrfs mailing list but no answer yet ^^

Wow, this one's tricky.
One thing I noticed a few kernel revisions back that might be relevant: there were a lot of processes in iowait state (the result of compiling packages, BOINC, munin-graph, ntop... and then some) and I wanted to prioritize a single process, so I issued an ionice -p xxx -c1 -n0 (realtime: prio 0). What I expected was that that process would instantly get its IO through and pick up work - alas, it took SEVERAL MINUTES before it did. That really wtfed me. Is this broken by design? Shouldn't IO-renicing take effect immediately?

Revision history for this message
In , gatekeeper.mail (gatekeeper.mail-linux-kernel-bugs) wrote :

#496 doesn't solve the problem IMHO.

Tested on Ubuntu Karmic (10.04) with vanilla 2.6.35.

A simple 'dd if=/dev/zero of=/some/file bs=1M' caused 100% load (dual-head Core2 Duo E8500) and high latency even when ^C'ing the dd process itself. Need more info? Please ask.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

I tried the patch rebased for 2.6.35
https://bugzilla.kernel.org/attachment.cgi?id=27314

It is probably OK, but my first test is to fill my memory with all the apps I can find and then run "stress -d 1". And as expected, it started paging stuff out. You other guys must have the exact same problem, at least you, Pedro. For me the responsiveness drops because of paging out.

Revision history for this message
In , sprutos (sprutos-linux-kernel-bugs) wrote :

echo 10 > /proc/sys/vm/vfs_cache_pressure
echo 4096 > /sys/block/sda/queue/nr_requests
echo 4096 > /sys/block/sda/queue/read_ahead_kb
echo 100 > /proc/sys/vm/swappiness
echo 0 > /proc/sys/vm/dirty_ratio
echo 0 > /proc/sys/vm/dirty_background_ratio

This solution works for me.
Or use the "sync" fs mount option.

Revision history for this message
In , sgh (sgh-linux-kernel-bugs) wrote :

(In reply to comment #502)
> echo 10 > /proc/sys/vm/vfs_cache_pressure
> echo 4096 > /sys/block/sda/queue/nr_requests
> echo 4096 > /sys/block/sda/queue/read_ahead_kb
> echo 100 > /proc/sys/vm/swappiness
> echo 0 > /proc/sys/vm/dirty_ratio
> echo 0 > /proc/sys/vm/dirty_background_ratio
>
> this solution work for me.
> or use "sync" fs-mount option.

Yeah, but testing a kernel patch with those settings is not good for seeing its effects.

Revision history for this message
In , pedrib (pedrib-linux-kernel-bugs) wrote :

(In reply to comment #501)
> I tried the patch rebased for 2.6.35
> https://bugzilla.kernel.org/attachment.cgi?id=27314
>
> It is problably ok, byt my first test is to fill my memory with all apps I
> can
> find and then run "stress -d 1". And as expected it started paging stuff out.
> You other guys must have the exact same problem, at least you Pedro. To me
> the
> responsiveness drop because of paging out.

Hi Søren,

as said in my comment, I do have the responsiveness drop, but I don't think that is a bug. If you are swapping to a slow disk, that is kind of expected. However, what is not expected is a complete loss of responsiveness, with the UI hanging, even if only for a few seconds.

I find that the mentioned patch improves this situation a lot versus the vanilla kernel. Of course, the best option yet is to disable swap, but for me 4GB of RAM is not enough...

Revision history for this message
In , alpha_one_x86 (alphaonex86-linux-kernel-bugs) wrote :

I too have responsiveness problems on Linux when I do large file copies.
Other OSes stay very responsive when doing multiple file copies, but not Linux.
Windows does all its IO asynchronously (no sync possible, as noted in the Qt docs); why not have the same option in the Linux kernel?

Revision history for this message
In , pedrib (pedrib-linux-kernel-bugs) wrote :

After testing the patches intensively, I have to say that although they do improve the situation, they do it only slightly. I guess the best solution is still disabling swap.

Also, what's the idea of having a swappiness tunable if it doesn't work? I can set it to 0, and even though I have only 70% of physical memory in use the system starts swapping to disk.

Revision history for this message
In , rsarraf (rsarraf-linux-kernel-bugs) wrote :

(In reply to comment #506)
> After testing the patches intensively, I have to say that although they do
> improve the situation, they do it only slightly. I guess the best solution is
> still disabling swap.
>

It does help initially but not always. Under memory crunch, I found my laptop completely unresponsive even though swap was off (RAM is 3GiB)

> Also, what's the idea of having a swappiness tunable if it doesn't work? I can set it to 0,
> and even though I have only 70% of physical memory in use the system starts swapping to disk.

That's weird. On my box, it does work the way it is designed. I have overall concluded that the default value of 60 is correct. If there is a buggy application, that should be fixed. I wouldn't be interested in OOMs on my box.

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

Memory count actually drops when the system becomes unresponsive during copying of a large file, if a bunch of small files was copied immediately before.

Revision history for this message
In , peterhoeg (peterhoeg-linux-kernel-bugs) wrote :

I've added some information on the Ubuntu bug page, but will add it here for completeness' sake:

1) I'm seeing this problem extremely frequently due to an unrelated bug that makes X leak memory.

2) On a machine with 4GB memory and no swap, the disk starts thrashing like crazy when 60-70% of the memory is used. It's so bad that I can't even log in on a console as getty times out before I get a chance to enter the password.

3) If swap is enabled on the same machine, it will start swapping out. Doing a "swapoff -a" will force the swapped pages back in as planned, but it happens at approximately 500 KB/s.

Revision history for this message
In , frankrq2009 (frankrq2009-linux-kernel-bugs) wrote :

I compiled the new 2.6.36 kernel today and found this bug is REALLY fixed on my notebook! Copying a 700 MB movie to a USB disk became very smooth and quick, GUIs are very responsive, much better than 2.6.35.4 (the last kernel of Zenwalk). Just like someone said, the angels are singing again! Congratulations! Great work! Long live Linux!

Revision history for this message
In , mihel (mihel-linux-kernel-bugs) wrote :

I'm not seeing this issue on 2.6.36 amd64, 4 GB RAM, 3 GB swap, swappiness 20.
Running 'stress -d 1' and browsing websites for 15 minutes with no issues.

Revision history for this message
In , vi0oss (vi0oss-linux-kernel-bugs) wrote :

2.6.36-zen0-00214-g665fe96

Still seeing page faults of about 1 second when running "stress -d 1" or "pv /dev/zero > qqq".

Swap is off.

This:

> echo 10 > /proc/sys/vm/vfs_cache_pressure
> echo 4096 > /sys/block/sda/queue/nr_requests
> echo 4096 > /sys/block/sda/queue/read_ahead_kb
> echo 100 > /proc/sys/vm/swappiness
> echo 0 > /proc/sys/vm/dirty_ratio
> echo 0 > /proc/sys/vm/dirty_background_ratio

does not help.

Uniprocessor system, i386. 1.5G of RAM. 1G of it was in use by applications when testing.

Revision history for this message
In , loki (loki-linux-kernel-bugs) wrote :
Download full text (5.3 KiB)

Just wanted to add my two cents, since I've been experiencing this problem for a very long time now on various machines. I just adapted by doing nothing on the OS when I have large file copies running. But somehow I may have stumbled upon a solution for this. I had these problems, the one you are talking about in this bug and some others, after I started using MD-RAID. First I thought it was something with the IO scheduler. Tried all the schedulers there are: noop, CFQ, Deadline, Anticipatory... Some helped a little bit, some didn't. Then I thought it was something with the FS; tried ext2, ext3, XFS and now ext4. The same problem prevailed. When I started copying large files I had OS "hiccups". Everything that had to do some disk work stopped. Music and OpenGL were still functioning normally; only the responsiveness of the system was gone for 1 or 2 seconds. No browsing, no changing terminal windows. Then I thought that it had something to do with swap, too.

A few days ago I got myself a new machine, i7/950, 2 x SATA3 WD HDs, 12 GB of RAM, and I installed a new OS, pure 64-bit, kernel 2.6.36. The thing I had to do was copy my old data to the new disks and reuse the old disks. Now, the way I did it is very important. I took a 1 TB WD SATA3 HD, made some partitions (6 to be exact) and compiled a new OS. Then I copied the old data from the old RAID. The old RAID was 4 partitions on each disk with MD RAID 1 on two partitions each. While I copied the data I had these hiccups also, with the new system.

I had this idea, since it is now possible to make a partitioned RAID with MD and you can take whole disks for an array, to make a RAID 10 out of these four disks, 2 new ones and 2 old ones. So it was like "mdadm --create /dev/md0 ... --raid-devices=4 /dev/sda /dev/sdb..."
Worked like a charm. Then I partitioned the array with "fdisk /dev/md0". No problem there. Then I copied the old stuff from the single hard disk, with 6 partitions, to the new array. Now here is the interesting bit. No hiccups!!! Throughput was around 120 MB/s and the OS was working as smoothly as a baby's bottom. And it was the same OS, no changes at all regarding the kernel build or anything else. Read throughput was 270 MB/s (dd test). But since rootfs won't work on a partitioned MD array (some kernel racing problem, but that's another story) I had to change my setup on the new HDs. So again I created 4 normal partitions on each disk: one from each HD for bootfs RAID 1, another 4 for swap, another 4 for the rootfs, also RAID 1, and the last four for RAID 10, which I partitioned into two separate partitions (srv and home). And the hiccups came back. So this isn't hardware related, because I can reproduce this problem on a lot of hardware. A list will follow. It's not the file system either, because I tried them all. It's not swap, because on this new machine it didn't start to swap while I was copying. But this problem always comes up when I make more partitions (normal ones) for MD-RAID.

The list of Hardware:

Quad-Core 6600, I think it was an ICH7 chipset, 8 GB RAM, 2 x WD10EARS. I think the kernel was 2.6.20-something, 32-bit system, LinuxFromScratch 6.1 or 6.2. Can't remember. The system worked for three yrs to...

Read more...

Revision history for this message
In , michiel (michiel-linux-kernel-bugs) wrote :
Download full text (3.6 KiB)

To tackle this bug, there needs to be deep digging by the people who have it, or good debug data has to be generated. And good information has to be given about the system.

There can be several bugs out there with the same symptoms as this one. To solve this bug, the best you can do is file individual bug reports with complete information. If you cannot give complete information, don't post that report, because then you can be sure it cannot be solved. The more relevant info we get, the easier it becomes to detect the problems.

First, install the newest kernel, because it has the newest code and that reduces the chance that you'll run into an old, already-fixed bug. At the time of writing that's 2.6.36. Then test again; if it still happens, file a bug report.

First give correct system information:
Kernel: uname -a and cat /proc/version
Architecture: also from uname -a
Distro: name and version (could be handy for distro specific patches)
CPU info: cat /proc/cpuinfo | grep -e '\(model name\|bogomips\|MHz\|flags\)'
Mem info: cat /proc/meminfo | grep MemTotal
IO scheduler used: cat /sys/block/sdX/queue/scheduler

Hard disk configuration: whether RAID is used, type of disks, speed of the disks, partitions used and filesystems used

Hard disk speed via hdparm:
hdparm -tT --direct /dev/sdX
hdparm -tT /dev/sdX

give dumps of the following commands:
lshw
dmesg
lsmod
cat /proc/swaps
cat /proc/meminfo
cat /proc/cmdline
cat /proc/config.gz | gunzip -

and give dumps of the following files:
for every disk:
 /sys/block/<disk>/queue/*
 /sys/block/<disk>/queue/iosched/*
/proc/sys/vm/*

This is for information, so the developers can see what configuration the system has. And if there are known configurations or drivers which are bad and may give the same symptoms, they will be noticed earlier.

If you want a script to help you collect the information, you can use the one located at http://github.com/meghuizen/systeminfo, which will build a tar.bz2 that you can give as an attachment, so you'll have complete information.
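If that script is not available, a rough sketch of collecting roughly the same information by hand (DISK is a placeholder for your device; run as root):

#!/bin/sh
DISK=sda
OUT=sysinfo-$(uname -r)
mkdir -p "$OUT"
uname -a          > "$OUT/uname.txt"
cat /proc/version > "$OUT/version.txt"
cat /proc/cpuinfo > "$OUT/cpuinfo.txt"
cat /proc/meminfo > "$OUT/meminfo.txt"
cat /proc/swaps   > "$OUT/swaps.txt"
cat /proc/cmdline > "$OUT/cmdline.txt"
dmesg             > "$OUT/dmesg.txt"
lsmod             > "$OUT/lsmod.txt"
# one "file:value" line per sysfs/procfs entry; unreadable files are skipped
grep . /sys/block/$DISK/queue/* /sys/block/$DISK/queue/iosched/* > "$OUT/queue.txt" 2>/dev/null
grep . /proc/sys/vm/* > "$OUT/vm.txt" 2>/dev/null
tar cjf "$OUT.tar.bz2" "$OUT"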

After that, learn a bit about the I/O scheduler, to make it easier for yourself to debug and understand the situation:
  - http://www.linuxjournal.com/article/6931 (info on I/O schedulers)
  - http://www.devshed.com/c/a/BrainDump/Linux-IO-Schedulers/
  - http://kerneltrap.org/node/7637
  - kernel-source/Documentation/block/iosched-description.txt (see: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=tree;f=Documentation/block;hb=HEAD)
  - http://www.westnet.com/~gsmith/content/linux-pdflush.htm
  - http://www.docunext.com/blog/2009/10/debugging-and-reducing-io-wait.html

There are some tools which are very handy to use. The Linux perf tool, for example, is very handy for debugging slowness and latencies in your system.

For some documentation on perf see:
  - https://perf.wiki.kernel.org/index.php/Main_Page
  - http://anton.ozlabs.org/blog/2010/01/10/using-perf-the-linux-performance-analysis-tool-on-ubuntu-karmic/
  - http://blog.fenrus.org/?p=5

perf --help also gives you a lot of information.

And other profiling tools:
  - http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/basic_...

Read more...

Revision history for this message
In , vi0oss (vi0oss-linux-kernel-bugs) wrote :

Tried with recent official master: 18cb657ca1bafe635f368346a1676fb04c512edf

http://vi-server.org/vi/12309_report/linux-2.6.36-09212-g18cb657_i686-sysinfo.tar.bz2

While running "pv /dev/zero > qqq" (http://vi-server.org/vi/12309_report/fill.txt), after about 2 GB I get pagefaults: http://vi-server.org/vi/12309_report/pagefault.txt http://vi-server.org/vi/12309_report/pagefault2.txt

If I try deadline or noop scheduler, I still get pagefaults, but after about 5 GB of copied data (and probably not that often)

With cfq the speed jumps between 10 MB/s and 200 MB/s.

With deadline or noop it is more stable, around 40 MB/s.

Trying
> echo 10 > /proc/sys/vm/vfs_cache_pressure
> echo 4096 > /sys/block/sda/queue/nr_requests
> echo 4096 > /sys/block/sda/queue/read_ahead_kb
> echo 100 > /proc/sys/vm/swappiness
> echo 0 > /proc/sys/vm/dirty_ratio
> echo 0 > /proc/sys/vm/dirty_background_ratio
on this kernel leads to a low filling speed (lower than 10 MB/s, measured with pv).
Also, after applying those settings, applications (starting with gpg2) begin to hang in uninterruptible sleep. I cannot stop the filling process (probably it hangs too).

P.S. With this kernel I also cannot start the X server.

If somebody wants, I can try other settings, other kernel revisions, patches, or other configs.

Revision history for this message
In , vi0oss (vi0oss-linux-kernel-bugs) wrote :

Checked a bit more with CONFIG_HZ_100 and CONFIG_PREEMPT_NONE: the same.

Filling rate with vm.dirty_ratio=0 is 1 MB/s (with periodic stalls of everything).

If I set vm.dirty_ratio to 1, it rises to 40 MB/s (stable). Long page faults when loading programs are present as well.

Was testing with only 200 MB (of 1.5G) of memory filled.

Revision history for this message
In , Bug (bug-redhat-bugs) wrote :

This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 12 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

While it feels like a general improvement with 2.6.36 (no audio stutter with swap, and building a kernel no longer drags the system down (and fills up cache) like it did with 2.6.35), I still see cursor jerkiness when I first log in and start loading Firefox, Evolution and Pidgin (all at the same time).

Revision history for this message
In , Bug (bug-redhat-bugs) wrote :

Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Revision history for this message
In , snigurmd (snigurmd-linux-kernel-bugs) wrote :

I've come to face this problem when using the new cgroup-scheduler patch.
PC: Samsung NC10 netbook, kernel 2.6.36 vanilla, Zenwalk snapshot.
When trying to upgrade some packages in an X session and browsing the net at the same time, the latency increases badly, but not constantly, just in hitches. If I stop surfing the net and return to my package manager, the system keeps working; otherwise it may hang so badly that I have to reboot with a SysRq key.
If I turn off the cgroup scheduler in /sys, everything works fine.
The kernel is compiled with full preemption and a 1000 Hz timer.

Revision history for this message
In , vi0oss (vi0oss-linux-kernel-bugs) wrote :

Trying 162253844be6caa9ad8bd84562cb3271690ceca9 from zenstable/io-less-dirty-throttling-2.6.37 - the same.

Page faults of random processes (including Xorg) jump over 1 second while "pv /dev/zero > qqq".

The speed measurements by "pv" fluctuate (from 64 kB/s to 120 MB/s; avg 40 MB/s), just like on the usual 2.6.35-zen2.

Revision history for this message
In , anonymous (anonymous-linux-kernel-bugs) wrote :

Reply-To: <email address hidden>

I'm currently Out Of Office. I'll be responding to emails, but expect some delay in replies.

For any urgent issues, please contact my manager, Kugesh Veeraraghavan <email address hidden>

Changed in linux (Ubuntu):
assignee: Colin King (colin-king) → nobody
Revision history for this message
In , dik_again (dikagain-linux-kernel-bugs) wrote :

I have a reproducible test sequence for bug 12309. It's easy:

Take a _SCRATCHED_ DVD. Put it into the drive and copy all files on it to a HDD. The bug comes early :)

The system freezes COMPLETELY at the moment the drive reads the scratched sectors.

Distro: Arch

Linux linuxhost 2.6.36-ARCH #1 SMP PREEMPT Fri Dec 10 20:01:53 UTC 2010 i686 AMD Athlon(TM) XP AuthenticAMD GNU/Linux

Drive (dmesg |grep TSS)

Feb 14 20:11:45 linuxhost kernel: scsi 2:0:0:0: CD-ROM TSSTcorp CDDVDW SH-S203B SB00 PQ: 0 ANSI: 5
Feb 10 12:05:36 linuxhost kernel: ata1.00: ATAPI: TSSTcorp CDDVDW SH-S203B, SB00, max UDMA/100

SATA-Controller (on the PCI-bus, drive connected to it):

00:0a.0 RAID bus controller: VIA Technologies, Inc. VT6421 IDE RAID Controller (rev 50)

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #521)
> I have a reproducible test sequence for a 12309. It's easy:
>
> Take a _SCRATCHED_ DVD. Put it into the drive and copy all files on it to a
> HDD. The bug comes early :)
>
> The system freezes COMPLETELY at the moment the drive reads the scratched sectors.

I suspect this has more to do with the IDE bus than with the interaction between the kernel's block layer and the VM.

Try this:
dd if=/dev/dvd of=/dev/null bs=2048

I bet you get the same freezes when it reaches the scratches.

Revision history for this message
In , dik_again (dikagain-linux-kernel-bugs) wrote :

I checked the same DVD with another DVD drive (this drive is on the IDE bus, not on the SATA bus). All was OK. No freezes at all. Any ideas? Is this another bug?

Revision history for this message
In , dik_again (dikagain-linux-kernel-bugs) wrote :

>Try this:
>dd if=/dev/dvd of=/dev/null bs=2048

>I bet you get the same freezes when it reaches the scratches.

You're right.

But this is still the 12309 bug, isn't it?

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #524)
> But this is still the 12309 bug, isn't it?

No.

However, this bug report has turned into a dumping ground for anyone experiencing any lagginess, regardless of cause. The actual bug here is related to the kernel preferring to evict memory-mapped executable pages when a process dirties blocks faster than they can be flushed to disk. The apparent hangs in responsiveness are due to threads (particularly GUI threads) triggering page faults and being unable to make progress until their code is re-fetched from disk. The fix should be to block the writing process from dirtying any more blocks well before the kernel starts evicting mapped executable pages from memory, but so far no one has been able to make it work correctly in all cases (afaik).

Revision history for this message
In , dik_again (dikagain-linux-kernel-bugs) wrote :

Also, should I make a new bug report for my bug?

Revision history for this message
In , vi0oss (vi0oss-linux-kernel-bugs) wrote :

Trying kernel from writeback/dirty-throttling-v6

Nothing seems to be changed, as usual. Still lengthy "Page Faults" (and others) for firefox-bin while "pv /dev/zero > qqq".

Should I provide more info about dirty-throttling-v6? (And how should I collect it?)

Revision history for this message
In , jaroslaw.fedewicz (jaroslaw.fedewicz-linux-kernel-bugs) wrote :

> The actual bug here is
> related to the kernel preferring to evict memory-mapped executable pages when
> a
> process dirties blocks faster than they can be flushed to disk.

Okay.

Let it be so. However, the subject line for this bug is

> Large I/O operations result in poor interactive performance and high iowait
> times

and that's what I'm experiencing now, rsync'ing 100 GB worth of data with almost everything already there on the receiving side (thus making the receiving rsync read files heavily for the checksums). And I am dead sure this has nothing to do with virtual memory, as swap is completely off (I would probably need to compile a different kernel with no support for swapping to reconfirm). iowait rises to 90%, LA shows disturbingly large numbers of up to 20, and unrelated processes like Xorg freeze, taking around 15 seconds to redraw the screen or move the mouse cursor or whatever.

What I thought this bug was about is that while one process does overwhelmingly large volumes of I/O, it should by no means impact other, unrelated processes which might not even use the disc subsystem, or not use the same disc. At least this is what Mac OS X does: for example, Transmission preallocates space for 40 GB worth of torrent data, naturally freezing in the process and ceasing to respond to any events, but then again, I can minimise its window, type code in Eclipse or anything — barely noticing the disc thrashing. I think I'm reiterating this example for the umpteenth time here, sorry if that's the case.

If I'm wrong and bug #12309 has been reduced to its VM part, I just ask which bug is about the above problem — high iowait affecting unrelated processes, with no swapping involved. Is that #13347? I cannot follow it because the submitter uses a dialect of English I'm not quite capable of parsing. If there's no specific bug, I'll take the time to report it, because it bugs me a great deal, though I'm afraid I'll have to repeat most of the tests already conducted here.

Please don't take it as if I'm trying to offend anyone, because I'm not. I just want to know where the specific symptom described above belongs.

Thank you all for every effort to have it resolved.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :
Download full text (3.1 KiB)

@Yaroslav: Your misconception is that having swap disabled means that memory pages are never backed by disk blocks. That is simply not true. All it means is that *anonymous* pages cannot be backed by disk.

All Linux kernels launch processes from disk (via execve(2)) by memory-mapping the executable image on disk and then jumping to the entry point address in the mapped image. Since the entry point address is in a non-resident page, the CPU's attempt to fetch an instruction from it triggers a page fault, which the kernel then handles by loading the needed page (and usually several more) from disk.

When physical memory becomes scarce, the kernel has several tricks it may employ to attempt to free up memory. One of the first of these tricks is dropping cached blocks from the block layer and cached directory entries from the file system layer, which means that those blocks and dentries will have to be fetched from disk the next time they are accessed. One of the last tricks the kernel has is the OOM killer, which selects the "most offending" process and KILLs it in order to reclaim the memory it was using.

Somewhere in between those two tricks, the kernel has another trick it attempts for freeing up physical memory. It can force memory pages out to disk. If the system has swap enabled, the kernel may force anonymous pages (e.g., process heaps and stacks) out to disk. In all cases, however, the kernel may also choose to force memory-mapped pages out to disk. If those memory-mapped pages are read-only (such as is the case with executable images), then "forcing them out to disk" really just means dropping them from physical memory, since they can always be fetched back in later.

So, what does this mean in the context of this bug? The process that's hitting the disk a lot (usually it's dirtying blocks, but maybe it's possible that this happens even if it's just reading blocks) causes RAM to fill up with disk blocks. The kernel starts attempting its tricks to free up physical memory. One of those tricks is dropping memory-mapped pages from RAM, since they can always be fetched back into RAM from disk later. Then you the user switch applications or click on a button in the GUI or try to log into an SSH session, and what happens? Page fault! The code for repainting the X11 window or handling the button click or spawning a login session is not resident in memory because it was forced out by the kernel. That code now must be refetched from disk to satisfy the page fault, but uh oh, the disk is VERY busy and has very long queue depths, so it will be a while before the needed pages can be fetched. And at the same time as those pages are being fetched, the kernel is evicting other memory-mapped pages from RAM, so the responsiveness problem is just going to persist until the pressure on RAM subsides.

Ideally, the kernel should not allow so many blocks to be dirtied that it has to resort to dropping memory-mapped pages from RAM. The dirty_ratio knob is supposed to control how much of RAM a process is allowed to fill with dirty blocks before it's forced to write them to disk itself (synchronously), but that does not appear to be working p...

Read more...
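One way to watch this eviction/re-fetch cycle while it is happening (a sketch; <pid> is whichever interactive process is stalling, e.g. the X server or a shell):

watch -n1 'grep -E "pgmajfault|nr_dirty|nr_writeback" /proc/vmstat'
ps -o pid,maj_flt,min_flt,comm -p <pid>    # major faults climbing during the stalls means code is being re-read from disk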

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

Incidentally, one reason this bug seems to manifest a lot more on 64-bit systems than on 32-bit systems is that 64-bit systems use Position-Independent Code (PIC) in their shared libraries universally, whereas 32-bit systems usually don't. Not using PIC means that 32-bit systems usually have to perform relocations throughout their shared libraries upon memory-mapping them, and those relocations cause private (anonymous) copies of those pages to be created, and those anonymous pages cannot be forced out to disk on systems without swap, so accessing those pages can never cause page faults. On 64-bit systems, PIC virtually eliminates the need to perform relocations in shared libraries, meaning most mappings of shared-library code are directly backed by the images on the disk and thus *may* be forced out of RAM and *may* cause page faults. In principle, using PIC (on 64-bit systems, which have new addressing modes to make it efficient) is a good idea because it means only one copy of a library needs to be in RAM, regardless of how many processes map it, rather than one relocated, private copy for each process, but because of this bug, *not* making private copies of the library code is what's killing us, as the only copy we have in memory is evictable. Please note, I am not arguing that the kernel should be making private copies of all executable pages; that would be the wrong solution. A better solution would be to prevent processes from dirtying so much RAM that the kernel has to start evicting pages that were memory-mapped by execve or dlopen (but not by plain old mmap!).

Revision history for this message
In , jaroslaw.fedewicz (jaroslaw.fedewicz-linux-kernel-bugs) wrote :

Thanks for the prompt reply and the patience to explain these things,

but then there's one more misconception on my side in desperate need of debunking. And it's about the I/O queues.

This misconception starts from a suggestion that not all data are equal. For example, non-resident executable pages are tier-0. I/O buffers for application usage like those for read(), write() and friends are tier-1. If there are no priorities on the queue, we cannot tell the origins of I/O requests apart and thus get what we have: swapping a process in has to wait until the queue is emptied by a disk-hungry application beast which just happened to fill it up.

If we prioritize the queue and find a way to tell swap-in reads from application reads (say), on the other hand, it might improve interactive responsiveness. And the expense of having a tiered queue might be mitigated by employing it only on media which have at least one mmap'ed process. I say "it might improve things" because the solution is so obvious, in fact, that I have little doubt it has been thoroughly thought through and ultimately rejected.

And I have no doubt that everyone who gets a single line of code accepted and committed into mainline is smarter than me in this respect[1], so this must have popped up a while ago.

[1] I'm no kernel hacker at all, just your average applications developer.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

@Yaroslav: I agree. I've had the same thoughts regarding priority in the I/O queues. The biggest problem with this approach is that much of the queues actually sit inside the hardware nowadays. SCSI TCQ (tagged command queuing) and SATA NCQ (native command queuing) have exacerbated this. The Linux kernel can't do anything to prioritize queues inside the hardware, but it can limit how much of the hardware queue it will use, thus effectively keeping the queue in software only. Some proposed workarounds to this bug 12309 involve reducing the depth of the hardware queue that Linux is allowed to use, and that does seem to improve the worst case, although it severely degrades the common case.

Another workaround might be to prevent the kernel from evicting executable memory-mapped pages in the first place. This would be only a partial solution, though, as applications often memory map resources that are not executable (for example, fonts, pixmaps, databases), so their responsiveness could hang on page faults for those resources just as readily as on page faults for code.
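For the record, the knob for that queue-depth workaround is per device (a sketch; sdX is a placeholder, and the attribute is only meaningful for devices doing NCQ/TCQ):

cat /sys/block/sdX/device/queue_depth        # current depth, e.g. 31
echo 1 > /sys/block/sdX/device/queue_depth   # effectively disables NCQ for that device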

Revision history for this message
In , jaroslaw.fedewicz (jaroslaw.fedewicz-linux-kernel-bugs) wrote :

You are right about the workaround, but having a queue prioritised would be of help when, despite all workarounds, pages were actually evicted.

I actually imagine it as a 4-tier queue: tier 0 for realtime processes, 1 for swap-ins we are talking about now, 2 for every other virtual memory operations, and 3 for everything else (or count 2 and 3 as everything else, maybe).

My question then will be as follows:

yes, we cannot control how commands are queued once they enter the hardware. But if we happen to know the hardware command queue size (which we do) and if we are able to tell how full it currently is (which I'm not quite sure about, but I think it can be figured out), we could split it so that every tier is permitted to fill no more than some percentage of the hardware queue. It would of course hit average-case performance, but it would still guarantee some bandwidth for higher-tier I/O, which is a good thing IMHO.

Sorry for bugging you, and probably for my ignorance, but I really want this nailed.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

To everyone interested in this bug:

An easy and reliable way to demonstrate the issues surrounding this bug (on a system without anonymous swap) is to mount a tmpfs that is sized as large as your physical RAM. Then start writing to it (slowly!!!). The kernel will be unable to flush those blocks to disk, as they are not backed by disk. As you continue writing to the tmpfs, the kernel will gradually evict everything else in your block cache and file system cache.

At some point, the kernel will have run out of caches to evict and will start evicting memory-mapped pages. You'll know this has happened when the system responsiveness comes to a crawl and your disk starts thrashing. Yes, your disk will thrash, even though you're only writing to a tmpfs. The thrashing is due to all the page-ins of executable pages that are being accessed as various processes on your system struggle to keep executing their background threads and event processing loops.

If your writer process continues writing to the tmpfs, your system will become completely unusable. If you're lucky, eventually the kernel's OOM killer will be invoked. The OOM killer probably won't choose your tmpfs writer as its victim, though, so you'll have only a short time to kill the writer yourself before your system grinds to a halt again. If you do manage to get it killed, you can simply unmount the tmpfs, and everything will return to normal in short order. You will notice a bit of lag the first time you switch back to other applications that were running, as they will trigger page faults to get their code loaded back into RAM, but once that's done, everything will be as usual.
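A minimal sketch of that demonstration (the size and write rate are arbitrary examples; pv -L just slows the writer down so the cache eviction can be watched from another terminal):

mkdir -p /mnt/ramtest
mount -t tmpfs -o size=90% tmpfs /mnt/ramtest
pv -L 10m /dev/zero > /mnt/ramtest/fill &
watch -n1 'grep -E "MemFree|^Cached|Dirty|Mapped" /proc/meminfo'
# when responsiveness collapses, kill the writer and clean up:
kill %1; rm /mnt/ramtest/fill; umount /mnt/ramtest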

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

It would have made sense if only starting new processes were slow. Copying large volumes of data slows down even the mouse cursor, where the Xorg HID driver already sits in memory. If what you've described affects a driver already in memory, the entire architecture has to be abandoned. So to say, a definition of the problem, not an excuse.

Revision history for this message
In , jaroslaw.fedewicz (jaroslaw.fedewicz-linux-kernel-bugs) wrote :

Hm, I then have another wild suggestion.

It is in fact a very rare event that a process needs to hang around in memory but wake up only once in a blue moon, so that it can be harmlessly paged out without bringing the system to a halt. From my desktop experience I can only remember LibreOffice sitting on my long-running machine and actually being used once in two weeks or so.

If the problem is really so grave that an often-running process (like Xorg!) is selected by the kernel to be paged out, why not work around this by disabling the eviction of processes' pages altogether? I think it must be somewhat easier than designing an over-engineered strategy for choosing what pages to throw away, testing it over a couple of years, finding bugs in the very design, throwing it away, designing another one, and so on.

I would love to see a flag which I could set per control group. If the flag is set, pages owned by processes in that cgroup are never swapped out. Combined with pessimistic overcommit policy, it could help at least a bit.

Or at least worth a try.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #535)
> It would have made sense if only starting new processes were slow. Copying large
> volumes of data slows down even the mouse cursor, where the Xorg HID driver already
> sits in memory. If what you've described affects a driver already in memory,
> the entire architecture has to be abandoned. So to say, a definition of the
> problem, not an excuse.

If you're seeing the mouse cursor lag/skip while copying large volumes of data, an alternative explanation could be that you're using PIO mode for your data transfers rather than DMA. However, as you identify, it's possible that the X.org driver that handles the mouse input is indeed being paged out, and that would result in mouse interrupts triggering page faults, and the mouse cursor would not update on screen until the code for doing so had been paged back in.

To say the entire architecture must be abandoned is too extreme. Memory-mapping executable images is a very efficient mechanism that ordinarily works beautifully. This bug is creating pathological conditions that should never occur.

(In reply to comment #536)
> If the problem is really so grave that an often-running process (like Xorg!)
> is
> selected by the kernel to be paged out, why not work this around by disabling
> evicting processes' pages altogether?

You can't do that. Consider a process that maps a 1 TB file into memory and then starts randomly reading from it, thus causing more and more of the file to be loaded from disk into physical memory. You *must* allow pages to be evicted, or you will run out of RAM.

Don't try to solve a problem that doesn't exist. The actual problem here is that the block layer is using too much RAM for dirty (or possibly even clean) blocks. To demonstrate to yourself that this is so, you may try another of the proposed workarounds, which is to mount your file system in "sync" mode, which causes all file writes to be performed synchronously rather than being buffered and written back later. Under that constraint, you will never run into this bug, because the block layer is never allowed to use so much RAM that the kernel starts paging out "hot" memory-mapped pages. (By "hot," I mean pages that are regularly being accessed, such that you would notice if they had to be paged back in from disk.)

Revision history for this message
In , jaroslaw.fedewicz (jaroslaw.fedewicz-linux-kernel-bugs) wrote :

Okay, sync might work, but it also would make filesystems slow as hell and contribute to media wear on another front. If what you say is the case, and I have no reason for disbelief, then there must be a way to limit the number of dirty blocks (and total blocks) which may exist before buffers are flushed. E.g., there's X seconds of commit interval or Y dirty blocks, whichever comes first, and a maximum of Z buffered blocks in total per device or per system. This would be 'almost sync', I think, and it would also solve one more problem with USB flash media.

The problem is that overly large write buffers tend to be flushed at a sub-optimal speed, thus increasing the total time needed to copy and sync the data. Again, this occurs neither with Windows nor with OS X. And they don't mount 'sync'; they buffer writes (which is a good thing with any device with expensive, wear-prone writes), it's just that their buffers are considerably smaller than those of Linux.

I'd be happy to know that a solution for limiting buffer sizes exists; this at least would enable us to fine-tune the system so that in 90% of use cases the problem wouldn't appear, and it would appear only in the cases where it's tough anyway.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

@Yaroslav: There is already a knob for tuning the maximum amount of RAM that may be used for holding dirty blocks.

From Documentation/sysctl/vm.txt:
> dirty_ratio
>
> Contains, as a percentage of total system memory, the number of pages at which
> a process which is generating disk writes will itself start writing out dirty
> data.

The intent is as you describe: asynchronous writing until dirty_ratio is reached, and then synchronous writing only. "dirty_ratio" is 10% by default. You can test if it's working by starting a large write to disk (`dd if=/dev/zero of=/bigfile bs=1M`) and monitoring the "Dirty" counter in /proc/meminfo (`watch grep Dirty /proc/meminfo`).

For what it's worth, it does work for me (and I haven't seen this bug manifest on my system in quite a while). I'm running Linux 2.6.36-gentoo-r5. I can still get the unresponsiveness and disk thrashing to happen using the tmpfs test case I described in comment #534, but that's not a failing of the kernel; that's a failing of the user (filling a tmpfs too much).

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

(In reply to comment #537)
> an alternative explanation could be that you're using PIO mode for your data
> transfers rather than DMA. However, as you identify, it's possible that the

Excuse me, I am using PIO, you say? That would be like, specifically configuring the kernel to use PIO? Why would anyone do that?

[ 1.101092] ata2.00: ATA-7: WDC WD3200KS-00PFB0, 21.00M21, max UDMA/133
[ 1.101205] ata2.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 1), AA
[ 1.102146] ata2.00: configured for UDMA/133

[ 2.191312] ata13.00: ATA-7: ST3160215A, 3.AAD, max UDMA/100
[ 2.191343] ata13.00: 312581808 sectors, multi 16: LBA48
[ 2.266143] ata13.00: configured for UDMA/100

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #540)
> That would be like, specifically configuring
> the kernel to use PIO? Why would anyone do that?

The kernel can fall back to PIO mode if DMA mode is encountering problems (which can happen with faulty hardware). It happens with CD/DVD drives more often than with hard drives.

The next time you encounter system sluggishness and the mouse cursor starts skipping, see if you can get a readout of /proc/meminfo (while the sluggishness is happening). If your "MemFree" is very low *and* your "Cached" or "Dirty" is very high, then you might be suffering from this bug.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

dirty_ratio is not really a good measure of when to start flushing to disk. On a 24 GB system, even 1% may be more than your disks can handle. It's better to configure dirty_bytes and dirty_background_bytes. dirty_bytes applies to the process which is doing the IO, and dirty_background_bytes applies to the kernel flush threads. When these thresholds are hit, if the sum total of IO happening in the system is at a rate higher than your disks can take, you will start seeing the initial symptoms of this bug. The overall flow has been described well by Matt. I think this is precisely what's happening.

One way to avoid the issue would be to set dirty_bytes and dirty_background_bytes in such a way that their sum total stays within a reasonable ratio of your disk's sequential bandwidth. When a Linux system is in steady state with a reasonable uptime, it will likely use all RAM for read-side caches. It will free those up on demand when it comes under memory pressure (which may be created by large IO). By keeping (dirty_bytes + dirty_background_bytes) a multiple of your disk's raw speed, you can put a bound on the overall latency of the system. For example, I don't let dirty memory go beyond 200 MB on my laptop. It makes all my sequential operations bound by the sequential speed of the disk, but lets small random IO be buffered (so it's better than the "sync" mode of the FS in that sense).
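A sketch of that tuning (the values are only examples, sized for a disk that writes on the order of 100 MB/s; note that setting the *_bytes knobs makes the corresponding *_ratio knobs ignored):

sysctl -w vm.dirty_background_bytes=67108864   # ~64 MB: background flusher threads start writing back
sysctl -w vm.dirty_bytes=201326592             # ~192 MB: the dirtying process itself must write synchronously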

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

And can we find a solution that would apply in the case where the system is running out of free RAM and starts swapping out everything? I have often experienced total unresponsiveness of both X and the consoles when a program tries to use more RAM than is available, and I wasn't even able to kill the process manually (forced reboot). Maybe that should be considered a pathological case requiring just the OOM killer to be more aggressive - I don't know.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #543)
> And can we find a solution that would apply in the case where the system is
> running out of free RAM and starts swapping out everything? I have often experienced
> total unresponsiveness of both X and the consoles when a program tries to use
> more RAM than is available, and I wasn't even able to kill the process
> manually (forced reboot). Maybe that should be considered a pathological
> case requiring just the OOM killer to be more aggressive - I don't know.

If you have the Magic SysRq key enabled in your kernel, you could do AltGr+SysRq+F to invoke the OOM killer manually.

I do agree in principle, though, that the offending process should be denied the allocation of any additional memory before any frequently used memory-mapped pages start getting evicted from RAM.

One possible solution might be to set a threshold for the minimum number of memory-mapped pages that the kernel must allow to remain in RAM. As an example, setting such a knob to 100000 would mean that the kernel would not evict any memory-mapped pages if fewer than 100000 memory-mapped pages were resident in RAM. Assuming that the kernel uses a least-recently-used eviction policy, this would prevent the debilitating thrashing scenario that occurs when essentially all memory-mapped pages have been and continue to be evicted.
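For reference, a sketch of triggering that by hand (the first line is only needed if the distribution ships with SysRq disabled or restricted):

echo 1 > /proc/sys/kernel/sysrq    # enable all Magic SysRq functions
echo f > /proc/sysrq-trigger       # same as Alt+SysRq+F: invoke the OOM killer once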

Revision history for this message
In , jaroslaw.fedewicz (jaroslaw.fedewicz-linux-kernel-bugs) wrote :

(In reply to comment #544)
> Assuming that the kernel uses a least-recently-used eviction
> policy, this would prevent the debilitating thrashing scenario that occurs when
> essentially all memory-mapped pages have been and continue to be evicted.

Given the fact that Xorg all too often falls victim to that, and it is active most of the time, I cannot help but assume something is wrong with the kernel's definition of "least recently used."

By the way, setting vm.overcommit_memory to 2 and overcommit_ratio to 80 seems to at least somewhat reduce the problem; the same rsync command which has triggered this bug (or similar bug if you prefer) now behaves a lot better, letting me type these words.

Revision history for this message
In , Andrej (andrej-redhat-bugs) wrote :

This bug is still present (in version F14):
2.6.35.6-48.fc14.x86_64

Copying big files is fast at the beginning but gradually becomes slower and then stops near the end of the file; after a few moments (minutes or so) it continues and finally finishes.

What should I do to gather more details?

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

I find that the amount of slowness strongly depends on the writing driver.
Today I had to evacuate a Win7 machine onto Fedora 14, and copying from NTFS to ext3 was painful. Now I am returning the files back onto NTFS and there is no slowdown at all. Dig into the ext3 filesystem; it should be in the writing code.

Revision history for this message
In , vesok (vesok-linux-kernel-bugs) wrote :

This seems to be a hardware related issue, at least in some cases.
Can the other people experiencing it confirm whether they have a WD Green hard disk?
Google search for "wd15eads firmware" reveals quite a few people having similar problems.
I have one of these hard disks and I was using it on a fanless VIA Samuel 2 (pre-686) CPU and I was seeing the high IOWait problem and associated poor performance. When I put the same hard disk in a dual AMD opteron it had the same problem.
Then I did a full backup and restore on a different hard disk. It is the same debian system on the same VIA cpu but now the high IOWait times are gone and the performance is adequate for the CPU.
I should point out that the kernel should not suffer poor overall performance during disk I/O even on flakey hardware, especially with swap disabled.
The offending hard disk is now blanked. I can run a few tests with it if somebody is interested.

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

Blaming hardware is the lamest practice in the IT world, and it surely earns those who practice it a great deal of disrespect.

Revision history for this message
In , powerman-asdf (powerman-asdf-linux-kernel-bugs) wrote :

(In reply to comment #548)
> Blaming hardware is the lamest practice in the IT world, and it surely earns those
> who practice it a great deal of disrespect.

Vesselin Kostadinov doesn't blame hardware; he says this bug (or one of the bugs discussed here) is hardware-dependent. I can confirm this too: initially I used a Barracuda 7200.10 320GB ST3320620AS, then I tried to replace it with a Seagate Barracuda LP 2TB without success (nothing changed), then I replaced it with a Samsung HD103UJ 1TB and this helped a lot - the bug is still noticeable, but very rarely, and it has much less impact on overall system performance. You can find more details about this in my comments on bug 13347.

Revision history for this message
In , loki (loki-linux-kernel-bugs) wrote :

Regarding the WD Green (EADS) disks: it has something to do with disk geometry. We had some problems with them as well; we have some 30 of them. But actually it's not a problem, it's more an RTFM thingy. I think there's something on the WD site, not sure. To partition these disks under Linux / Windows XP (Win 7 does it automagically) you have to use fdisk -H 224 -S 56 /dev/sd...

You can read my comment at https://bugzilla.kernel.org/show_bug.cgi?id=12309#c513
Two of the disks are green WDs partitioned with this fdisk method. Until then I also had problems with speed, where the HDs only had a throughput of 2-5 MB/s. After the fdisk trick I had a throughput of up to 100 MB/s. But again, the problem with this bug is not throughput; it's that if you start a big file copy, like dd if=/dev/zero of=test.img bs=1M count=5000, your desktop comes almost to a halt. But after some time I think this isn't even a bug, it's more a new kernel queueing methodology. After entering this:

vm.swappiness=1
vm.dirty_background_ratio=1
vm.dirty_ratio=1

into sysctl.conf, I almost don't have this problem anymore. I read a lot about this problem, and as far as I can understand, the new way the kernel works is that, depending on the above configuration, it first puts data into RAM and then writes it to disk (very simplified). So if you have a lot of RAM (in my case 12 GB) and the above configuration is at the default of 40%, then the kernel is putting almost 5 GB as cache into RAM and then writes it to disk. And yes, I have a very fast RAID system, but even with 400 MB/s I have to wait 10 seconds or more while it writes that out to disk. I forgot with which kernel version this started, but I know that I checked it and that my problems with responsiveness started after changing to this new kernel (methodology). So you can say that this is not a bug but merely a kernel configuration matter, because with this new methodology a default vm configuration doesn't work for everyone, especially those with a lot of RAM.
And yes, I would like the old methodology to be integrated again into the new kernels, but until then I'll try to circumvent this problem by understanding and configuring the kernel. The above sysctl configuration is working for me with the setup that I have in my comment #513 in this bug. There are slight hiccups but nothing as severe as earlier, when I couldn't do anything until the file writing finished.
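A quick way to check whether the fdisk -H 224 -S 56 trick (or any other partitioning) actually left a partition 4-KiB-aligned on these Advanced Format drives; 224 x 56 = 12544 sectors per cylinder is divisible by 8, which is why cylinder-aligned partitions end up aligned (a sketch; sdX/sdX1 are placeholders):

cat /sys/block/sdX/sdX1/start    # start sector of the partition; divisible by 8 means it begins on a 4 KiB boundary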

Revision history for this message
In , mihel (mihel-linux-kernel-bugs) wrote :

Sorry for interrupting your research with my naive question, but does this bug have clear steps to reproduce it?

The initial comment says 'starting a new shell takes minutes' after the system is left with dd running for significant time.

But for me shells/browsers etc. take just maybe 1 or 2 seconds longer to start after I have had 'stress -d 1' or 'dd if=/dev/zero of=bigfile bs=1M' running for ~10 minutes (bigfile is 30 GB after my tests; dirty blocks quickly reach ~670 MB (3.67 GB RAM total) and stay there).

The small file test that I accidentally ran with TWO simultaneous bigfile dd processes in the background finished in 0.073s (or is this bad?):

$ dd if=/dev/zero of=/tmp/bigfile bs=1M count=30000 conv=fdatasync & sleep 30 ; time dd if=/dev/zero of=/tmp/smallfile bs=4k count=1 conv=fdatasync
[2] 27953
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.0718053 s, 57.0 kB/s

real 0m0.073s
user 0m0.001s
sys 0m0.001s

dd: writing `/tmp/bigfile': No space left on device
dd: writing `/var/tmp/bigfile': No space left on device
22891+0 records in
22890+0 records out
24002064384 bytes (24 GB) copied, 1211.53 s, 19.8 MB/s
21957+0 records in
21956+0 records out
23022534656 bytes (23 GB) copied, 1189.07 s, 19.4 MB/s

[1]- Exit 1 dd if=/dev/zero of=/var/tmp/bigfile bs=1M count=100000 conv=fdatasync
[2]+ Exit 1 dd if=/dev/zero of=/tmp/bigfile bs=1M count=30000 conv=fdatasync

I'm noticing a loss of interactivity when my RAM gets filled up and swap grows >500 MB, but this bug is not about such a case, is it?

Could it be my HW on latest stable vanilla 2.6.38.2 amd64 (swappiness 20, the rest being defaults)? Or could I have just configured my kernel in some genius way?

[ 2.051391] ata1.00: ATA-8: HITACHI HTS545025B9A300, PB2ZC61H, max UDMA/100
[ 2.054162] ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 2.065605] ata1.00: configured for UDMA/100
[ 2.087958] scsi 0:0:0:0: Direct-Access ATA HITACHI HTS54502 PB2Z PQ: 0 ANSI: 5

$ sudo hdparm -i /dev/sda

/dev/sda:

 Model=HITACHI HTS545025B9A300, FwRev=PB2ZC61H, SerialNo=100408PBNXXXXXXXXXX
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=DualPortCache, BuffSize=7208kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=488397168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=yes: mode=0x80 (128) WriteCache=enabled
 Drive conforms to: unknown: ATA/ATAPI-2,3,4,5,6,7

PS: I'm on ext3

Revision history for this message
In , mihel (mihel-linux-kernel-bugs) wrote :

controller in the previous comment was
        *-storage
             description: SATA controller
             product: Ibex Peak 6 port SATA AHCI Controller
             vendor: Intel Corporation
             physical id: 1f.2
             bus info: pci@0000:00:1f.2
             logical name: scsi0
             version: 06
             width: 32 bits
             clock: 66MHz
             capabilities: storage msi pm ahci_1.0 bus_master cap_list emulated
             configuration: driver=ahci latency=0
             resources: irq:41 ioport:1860(size=8) ioport:1814(size=4) ioport:1818(size=8) ioport:1810(size=4) ioport:1840(size=32) memory:f2727000-f27277ff

Revision history for this message
In , vesok (vesok-linux-kernel-bugs) wrote :
Download full text (3.8 KiB)

OK, the fun continues.

Installed the offending hard disk in another system, booted Fedora 14 live and the drive worked OK:
[root@localhost ~]# dd if=/dev/zero of=/dev/sd_ bs=1M count=4000 conv=fdatasync
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 50.0265 s, 83.8 MB/s

(Replaced /dev/sda with /dev/sd_ in case someone decides to copy/paste the command).

Then I booted Knoppix 5.1.1 (from 2007) and saw the fault. CPU usage was 49.7%wa (dual cpu) and had to interrupt dd because it was taking way too long. Then I tried again with a smaller file:

root@Knoppix:~# uname -a
Linux Knoppix 2.6.19 #7 SMP PREEMPT Sun Dec 17 22:01:07 CET 2006 i686 GNU/Linux
root@Knoppix:~# dd if=/dev/zero of=/dev/sd_ bs=1M count=40 conv=fdatasync
40+0 records in
40+0 records out
41943040 bytes (42 MB) copied, 20.8245 seconds, 2.0 MB/s

Then I booted Fedora again and saw the fault again:
[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.35.6-45.fc14.i686 #1 SMP Mon Oct 18 23:56:17 UTC 2010 i686 i686 i386 GNU/Linux
[root@localhost ~]# dd if=/dev/zero of=/dev/sd_ bs=1M count=40 conv=fdatasync
40+0 records in
40+0 records out
41943040 bytes (42 MB) copied, 20.3055 s, 2.1 MB/s

@ #548 From Zenith88:
Ignoring the possibility of a hardware fault when the evidence points that way surely brings those who practice it a great deal of fruitless debugging and frustration.

@ #550 From D.M.
I don't think it is the "partition starts at the wrong sector" issue. In the dd commands listed above I was writing to the drive as a whole, without messing with partitions at all.
For the sake of it I decided to create a new partition and see what will happen:
[root@localhost ~]# fdisk -H 224 -S 56 /dev/sd_
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x9b81ad16.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
   e extended
   p primary partition (1-4)
p
Partition number (1-4, default 1): 1
First sector (2048-2930275054, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-2930275054, default 2930275054): +10G

Command (m for help): p

Disk /dev/sda: 1500.3 GB, 1500300828160 bytes
224 heads, 56 sectors/track, 233599 cylinders, total 2930275055 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x9b81ad16

   Device Boot Start End Blocks Id System
/dev/sda1 2048 20973567 10485760 83 Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@localhost ~]# mkfs.ext2 -q /dev/sda_
[root@localhost ~]# mount /dev/sda1 /mnt
[root@localhost ~]# dd if=/dev/zero of=/mnt/bigfile bs=1M count=100 conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 77.3839 s, 1.4 MB/s

I guess the perfor...

Read more...

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

You can continue debating hard drives, or look into a comparison of the NTFS vs ext3 code, taking the cue from post #546, which is a reproducible test case. Your call.

Revision history for this message
In , loki (loki-linux-kernel-bugs) wrote :

Oleg: Unfortunately, no clear steps. If you read my comment #513 you'll see that I didn't have any trouble with whole-disk software RAID 10. After that I thought it was something filesystem-related, but I tested ext2 through ext4 and XFS, and this also answers your question, Zenith. Same thing, no matter what.

   And regarding the hardware, it may be that this particular HD is broken, and in Kostadinov's case I even think it is a broken-hardware problem, because on one system (Fedora) it worked and then, after using Knoppix and getting back to Fedora, it didn't. I'm just mentioning the troubles we had with the green WDs, and not only under Linux, until I read about this fdisk thing. Now I have two of them and they didn't give me any trouble when I had them in the whole-disk RAID 10, or when I had an older kernel, or now with the new kernel settings.

   But to get back to the substance. Yes, if you dd several times your RAM's worth onto the HD, the system comes to a halt. With the old kernels it was: "I'm doing dd and the system automagically knows that Firefox or mail or whatever is of higher priority to me than dd, so it slows dd down a bit so Firefox can get some time reading from the HD. Or maybe the queueing was fairer, so all processes got some time hammering the HD; I don't know, I'm not a kernel developer. I'm just a user, and as a user I'm describing the differences between the old and the new kernels." With the new kernel it's not like that; whoever is writing has all the power over the HD. But again, that is more a perception than a fact.

   The difference from earlier, before I tuned the vm settings, is that wa (iowait) was up to 98 and now it's at most 45-50.

Revision history for this message
In , zenith22.22.22 (zenith22.22.22-linux-kernel-bugs) wrote :

You can deny reality however much you see fit, it won't change the fact that writing onto ext3 partition causes freeze, while writing to ntfs does not on the same system. And this is not a VM but physical machine. Denial of reality and passing the blame is what's causing this project to sit on its hands for 3 years.

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

There are probably different bugs at stake here, and investigating one doesn't mean denying the other. Please be more respectful of people that try to improve our understanding of the problem instead of ranting.

Just a guess: the ntfs-3g driver uses FUSE, while the ext3 driver is in kernel space. *Maybe* this can explain the difference (ntfs-3g isn't treated as in-kernel with regard to I/O scheduling).

Revision history for this message
In , loki (loki-linux-kernel-bugs) wrote :

I'm sorry if I offended you in any way. Again, I'm not in denial, and I'm not blaming anyone; I'm merely pointing out that it's not only an ext3 problem, because I had the same problem on xfs, and that, as you pointed out, the kernels from 3 years ago didn't have this kind of problem. And by vm I didn't mean Virtual Machine but virtual memory, because I was referring to sysctl.conf (i.e.
...
vm.dirty_background_ratio = 1
vm.dirty_background_bytes = 0
vm.dirty_ratio = 1
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
...
and so on.)

Again, I'm sorry if I have offended you in any way.
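
(For reference, a minimal sketch of how settings like those would typically be applied, assuming they live in /etc/sysctl.conf as described; the values themselves are the ones quoted above, not a recommendation.)

    # reload /etc/sysctl.conf without rebooting, then check what the kernel is actually using
    sudo sysctl -p
    sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_writeback_centisecs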

Revision history for this message
In , mihel (mihel-linux-kernel-bugs) wrote :

Another attempt to narrow down the use case for the issue.
You are not going to get anywhere if you continue reporting issues against all of the different breeds of Linuxes. You never know how Fedora or Knoppix patched the kernel, and you should report issues with their kernels to them instead of posting your observations here.

As I see it, the only way to track down the issue is to use the same version of the VANILLA kernel (preferably the latest) with different build and runtime configs.
I personally have ext3 compiled into the kernel - could that be the reason why I can't reproduce the issue?

Zenith88: would it take you a lot of effort to build the latest unpatched vanilla kernel with ext3 compiled into it and to see if it makes things any better for you?

Revision history for this message
fgr (f-gritsch) wrote :

Whoa, this bug has a very long history....
Are you sure that this is the same problem that I have reported? Because with 10.10 I did not have the problem; the read performance was as good as it is now in Windows 7 on the same PC. It slowed down just after the update to 11.04!

What information do you need to investigate the problem? Can anybody give a hint?

Revision history for this message
Badcam (kiwicameron+launchpad) wrote :

I haven't had this issue since 10.10
Mint 10 is awesome.

Revision history for this message
Maxime Ritter (airmax) wrote : you have got new "show interests" from ladie

I am Nastya, 22 y.o,
I am looking for man to have a strong family.
Please let me know if you are ready :)))
I am on-line now,
my profile is here:

http://sonya201010.com.ua/?message_from=Nastya

Note!
New free services! check info at the site!
( to unsubscribe - please, click link and enter e-mail address .)

Revision history for this message
FriedChicken (domlyons) wrote : Re: since Ubuntu karmic Filetransfer to some USB Drives got realy slow

OT: Is there a way to report spam as in the message above?

Revision history for this message
nomnex (nomnex) wrote :

Yes, send a message to Nastya... Joke aside, tell Maxime to change his password!

Revision history for this message
Brad Figg (brad-figg) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Confirmed → Won't Fix
Revision history for this message
Janusz (yorashtan2) wrote : Re: since Ubuntu karmic Filetransfer to some USB Drives got realy slow

Ubuntu team won't fix this bug as it affects all distributions.

Take a look at this, probably might help:

http://mailman.archlinux.org/pipermail/arch-general/2010-June/014470.html

Revision history for this message
In , Hedayat (hedayat-redhat-bugs) wrote :

This problem still persists in Fedora 16. Again, lowering dirty_ratio and dirty_background_ratio to 2 and 1 respectively (instead of 20 and 10) resulted in a constant 4.5 MB/s copy speed, while with the default settings the speed kept going down... (I stopped it when it was around 1.5MB/s).
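
(A minimal sketch of that workaround, in case anyone wants to try it: these are the standard procfs knobs for the ratios mentioned above, and the values are the ones quoted, not a general recommendation.)

    # lower the dirty-page thresholds for the running system (defaults here were 20 and 10)
    echo 2 | sudo tee /proc/sys/vm/dirty_ratio
    echo 1 | sudo tee /proc/sys/vm/dirty_background_ratio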

Revision history for this message
In , Dave (dave-redhat-bugs) wrote :

This is probably going to get fixed for real in 3.3, but there's a hack that might make things at least slightly better until then. I'll throw it into the next Fedora 16 build.

Revision history for this message
In , Dave (dave-redhat-bugs) wrote :

oh, actually we have that hack in f16 since 3.1.2-0.rc1.1

Revision history for this message
In , Hedayat (hedayat-redhat-bugs) wrote :

IIRC, I experienced the problem on kernel-3.1.2-1.fc16.x86_64 :(
I hope that it'll be at least really fixed in 3.3.

Revision history for this message
In , oleksandr (oleksandr-linux-kernel-bugs) wrote :

It seems to be fixed in 3.2.

Revision history for this message
In , wolfram (wolfram-linux-kernel-bugs) wrote :

> It seems to be fixed in 3.2.

Somewhere in parallel universe I think.

Nothing changed for me on

> Intel Corporation 5 Series/3400 Series Chipset SMBus Controller

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

(In reply to comment #561)
> Nothing changed for me on
>
> > Intel Corporation 5 Series/3400 Series Chipset SMBus Controller

Nor here on my ICH8-based notebook, with 2GiB RAM. If anything, 3.2 seems worse than 3.1 when it comes to the ability of one process to binge out on dirtying pages, and then bring the rest of the system down to a snail's pace.

One consistent example case is unpacking to the local SATA drive an ISO image (using Nautilus, for example) stored on another drive. Compute-heavy processes with little disc access suffer (and even those without any I/O do --- CPU usage shoots right down).

Another one is a kernel build. The file cache goes bananas, and even with no other desktop applications loaded, everything gets paged out and it takes around a minute (in the worst case) for the unlock screen prompt to appear.

Revision history for this message
In , fedora (fedora-linux-kernel-bugs) wrote :

(In reply to comment #561)
> > It seems to be fixed in 3.2.
> Somewhere in parallel universe I think.

There are multiple issues that can lead to a behaviour like the one that is discussed in this bug.

A few patches that went into 3.2 make some situation better. But some problems were still known back then; see http://lwn.net/Articles/467328/

Fixes for those went into 3.3-rc1. Quoting from this week's LWN.net kernel page (I'm quite sure Jonathan won't mind):

"""
There have been some significant changes made to the memory compaction code to avoid the lengthy stalls experienced by some users when writing data to slow devices (USB keys, for example). This problem was described in this article (http://lwn.net/Articles/467328/), but the solution has evolved considerably. By making a number of changes to how compaction works, the memory management hackers (and Mel Gorman in particular) were able to avoid disabling synchronous compaction, which had the unfortunate effect of reducing huge page usage. See this commit (
http://git.kernel.org/linus/a77ebd333cd810d7b680d544be88c875131c2bd3 ) for a lot of information on how this problem was addressed.
"""

IOW: Best to test 3.3-rc and report bugs if there are still issues.

While at it (and with a view from someone that is not very active in this bug tracker): I'd say opening a new bug and mentioning it here in this report might be the best way forward for any remaining issues, as the long history might be misleading/confusing when it comes to solving today's bugs. Just my 2 cent.

Revision history for this message
Adam Porter (alphapapa) wrote :

This article explains that the problem is the Transparent Huge Pages feature of the kernel: http://lwn.net/Articles/467328/

According to this, some of the fixes are in 3.2, and some in 3.3: https://bugzilla.kernel.org/show_bug.cgi?id=12309#c563

This is a horrible bug for desktop use, and for some server use as well. This should be a top priority bug. Ubuntu needs to backport the fixes or consider disabling Transparent Huge Pages in desktop kernels.

Having the entire system freeze for minutes at a time and file copy operations take hours instead of minutes is entirely unacceptable behavior. As Corbet said, it's enough to make users consider the benefits of proprietary operating systems (i.e. Bug #1).

P.S. Automated scripts marking critically important bugs as WONTFIX is also unacceptable behavior.

summary: - since Ubuntu karmic Filetransfer to some USB Drives got realy slow
+ USB file transfer causes system freezes; ops take hours instead of
+ minutes
Brad Figg (brad-figg)
tags: added: precise
Changed in linux (Ubuntu):
status: Won't Fix → Confirmed
Revision history for this message
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.2.0-12.20)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-12.20
Revision history for this message
Adam Porter (alphapapa) wrote :

As I noted, the kernel bugzilla says that some of the fixes for this bug will be in 3.3. Since the request was to test 3.2, I think it's reasonable to assume that 3.2 will not solve the bug, and that fixes will need to be backported to 3.2. And even if 3.2 were to completely address it, the fixes would still need to be backported to earlier, supported kernels, because this is a very serious bug.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: bot-stop-nagging.
Revision history for this message
Brad Figg (brad-figg) wrote :

The upstream patchset has been applied to a version of the Precise kernel. For those wishing to give it a spin to see if it addresses the issue for them, you will find built versions at:

http://people.canonical.com/~bradf/lp500069/

Changed in linux:
importance: Unknown → High
status: Unknown → Confirmed
Revision history for this message
Rui Barreiros (rbarreiros) wrote :

I've been having this problem for more than a year, and it's terribly annoying.

Only lately have I managed to focus on getting rid of it, since the amount of data in my weekly backups is getting huge, and I have to leave the desktop doing its backups at night, only to sometimes arrive in the morning with it still running and be forced to cancel in order to work.

Indeed, as Adam said, this is unacceptable, and this is coming from someone with 12+ years of using Linux as my only OS and being a huge Linux advocate.

I'm going to try this 3.2 kernel today and see how it goes; more news later.

Best regards,

P.S.
My disappointment is not towards Ubuntu, as I believe Ubuntu actually brought Linux to a wider range of users and innovated more in Linux than any other distro, but mainly towards kernel development (which I gave up contributing to at all due to most of their elitist behaviour).

Revision history for this message
Rui Barreiros (rbarreiros) wrote :

Hi there,

As promised, I'm right now using 3.2, but on Ocelot, and apparently the bug is fixed. As I'm writing this I'm copying/deleting about 5 GB on an external USB 2.0 HDD and have had no system lock-ups yet, and the speed is acceptable.

I couldn't install linux-tools and linux-headers due to dependency issues, obviously (although I think linux-headers-3.2.0-13-generic_3.2.0-13.21~lp500069_amd64.deb has a circular dependency; maybe it's a bug?).

I'll try to build all these packages here on Ocelot and start using them to test this better.

Best regards,

Revision history for this message
bth73 (bth1969) wrote :

Wow, this sucks. I too am having the same problem with mint 9x64. How is it that such basic, basic functions can be handled so badly? Ubuntu now sucks and it seems that Mint is no better. Taking over 2 hours to transfer 7.3gigs to a 8gig stick (fat32). USB2 not USB1. What is up? Will we ever have a OS that just works? Sh%^ I'm about to go buy Win 7 or start beta testing Win8 to find a system that can do BASIC FUNCTIONS, IE: open files, manipulate them and move and transfer to different hard drives.
TRANSFERRING AT THE BLAZING SPEED OF 930 KB/SEC.
1TB TRANSFER TO NEW WD 1TB 2.5" USB DRIVE TOOK OVER 19HRS!
WAY TO GO PROGRAMMERS! REALLY YOU SHOULD BE PROUD.

Revision history for this message
bth73 (bth1969) wrote :

It is like designing a car that the wheels and tires fall off every 2 miles. The radio works and the motor runs fine, BUT don't try to go anywhere cause there is no tires or wheels - just axles. Or better yet an airplane with no wings.

Revision history for this message
bth73 (bth1969) wrote :

740 KB/sec.
 sudo apt-get upgrade only hangs with no progress. Probably would only BORK the system anyway.

Revision history for this message
FriedChicken (domlyons) wrote :

Linux 3.2 contains some fixes and Linux 3.3 is said to finally fix it.

@bth73:
Spamming is no solution.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

The problem is really fixed in 3.3rc4. I installed two guest systems on a first-generation SSD. That SSD was used only for the virtualization guests; my system is on a >40000 IOPS SSD.
The first installation was done with kernel 3.2.6, in which the long stalls of up to 10 seconds reappeared, even as bad as in kernels 2.6.2[4-9].
The second installation was done with kernel 3.3rc4. I could even work in another running virtualization guest. It's really great. Thanks to all the people involved in solving this bug.

Revision history for this message
In , oleksandr (oleksandr-linux-kernel-bugs) wrote :

Could someone else confirm it?

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :
Download full text (3.7 KiB)

(In reply to comment #564)
> Thanks to all people involved in solving this bug.

Does anyone have a link to a discussion list post or a technical article detailing the theory behind the solution to this bug? Since this "bug" encompasses so many scenarios, I have doubts about whether all of them have indeed been resolved. I'm glad one person's problem went away, but until a kernel hacker can stand up and explain exactly what was wrong and how they fixed it, I'm going to assume there are still lurking problems in Linux's I/O subsystem.

One problem we've seen and discussed in this thread is that large numbers of dirty blocks waiting to be flushed to disk can cause eviction of "hot" pages of code that are needed by interactive user processes, thus bringing the system to a state of thrashing in which processes continually trigger page faults because their actively executing code keeps being forced out of RAM by the large buffered write to disk. Even if this problem has been solved (presumably by fixing a bug in the code that is supposed to force a process to flush its own dirty pages to disk once dirty_ratio has been reached), there would still be the problem of the kernel's evicting hot pages from RAM so aggressively in low-memory conditions that interactivity of the system is compromised to the point where it's impossible for the user to resolve the memory shortage.

It's pretty easy to reproduce the thrashing scenario: just mount a tmpfs whose max size is close to the amount of physical memory in the system and start writing data to it. Eventually you may find that you are no longer able to do anything, even to give input focus to your terminal emulator so you can interrupt your writing process (or in some setups, even to move your mouse cursor on the screen), because your entire desktop environment and even the X server have been evicted from RAM and are continually paging back in from disk (and being immediately evicted again), hindering your ability to do anything. I've encountered this scenario while compiling Chromium in a tmpfs. I'd expect the OOM killer to activate, but instead I find that all of my running applications are responding at a snail's pace because they have to keep paging in bits of their program code from disk. I should mention that I run without swap.
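
A minimal sketch of that reproduction, for the record (the mount point, size and file name are placeholders; the size should be close to the machine's physical RAM):

    # mount a tmpfs nearly as large as RAM and fill it, forcing the page cache and hot pages out
    sudo mkdir -p /mnt/bigtmp
    sudo mount -t tmpfs -o size=90% tmpfs /mnt/bigtmp
    dd if=/dev/zero of=/mnt/bigtmp/fill bs=1M
    sudo umount /mnt/bigtmp   # clean up afterwards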

I would think one way to solve the thrashing problem would be to introduce a kernel knob that would set how much time must elapse between a page being fetched from disk into RAM due to a page fault and that page becoming eligible for eviction from RAM. If set to, say, 30 seconds, then the user's interactive processes could retain a usable degree of interactivity, even under extremely low memory conditions. This would, of course, mean that the OOM killer would activate sooner than it does now, since pages that the kernel would presently choose to evict in order to free up RAM would be ineligible under this new time limit. Setting the knob to zero would yield the behavior we have now, in which the kernel is free to evict all unlocked pages.

I'll reiterate once more, as a refresher, that this was formerly not such a problem on 32-bit x86 systems because most library code ther...

Read more...

Revision history for this message
In , nalimilan (nalimilan-linux-kernel-bugs) wrote :

Maybe this:
http://lwn.net/Articles/467328/

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #567)
> Maybe this:
> http://lwn.net/Articles/467328/

Interesting. Thanks for the link. However, this article doesn't explain why we see thrashing and extremely degraded interactivity on systems that don't have HugeTLB support enabled in the kernel (such as mine). This reinforces the point that there are many scenarios that exhibit poor interactive responsiveness under heavy disk writing load.

Regarding this debate about the transparent huge pages, I have to wonder why the kernel would bother trying to create a huge page in a location where there are dirty pages waiting to be written to disk. Shouldn't it just choose some other area in RAM that doesn't intersect any dirty buffers? This isn't really the place for a discussion of page compaction, though, so I'll discourage anyone from responding to my idle musing here.

Revision history for this message
In , Josh (josh-redhat-bugs) wrote :

There were further fixes for this issue in 3.2. Is this problem still there on 3.2.7 or newer?

If anyone is willing to test 3.3-rc5, this build should also contain the fixes Dave mentioned:

http://koji.fedoraproject.org/koji/buildinfo?buildID=301620

Revision history for this message
Janusz (yorashtan2) wrote :

I discovered that using the noop scheduler helps. However, they seem to have finally fixed this bug.

I'm on Ubuntu 11.10 with 3.3.0-rc5 and these are my results (copying from an external usb drive):

rsync:
   733507584 100% 30.89MB/s 0:00:22 (xfer#1, to-check=0/1)

Somebody should verify with 3.2 as I guess this will be the kernel that will ship with Precise.

Revision history for this message
Janusz (yorashtan2) wrote :

I did that test with cfq.

Revision history for this message
Damir Butmir (d4m1r2) wrote :

This is still an issue in Ubuntu 11.10 (32 bit) with the most up to date kernel provided through update manager:

Linux Damir-Ubuntu 3.0.0-16-generic-pae #28-Ubuntu SMP Fri Jan 27 19:24:01 UTC 2012 i686 athlon i386 GNU/Linux

I do not get transfer speeds faster than 7-8 MB/s to an 8GB external USB stick... I cannot believe this issue is so old and still hasn't been addressed; this is a critical bug!!!

tags: added: kernel-fixed-upstream
Revision history for this message
adri58 (adri58) wrote : Re: [Bug 500069] Re: USB file transfer causes system freezes; ops take hours instead of minutes

I filed several bugs almost a year ago and today I still have the same
problem,
even with the latest 3.2.5 kernel.
Therefore, it seems that reporting problems is useless. That's my point of
view.
I think that slow USB transfer must be a highly critical bug, and most of
the effort should be put on it.
I have another bug pending to be solved (not critical), and there is no
solution yet.

Sorry for my English.
Bye!
 On 09/03/2012 18:12, "Joseph Salisbury" <email address hidden>
wrote:

> ** Tags added: kernel-fixed-upstream

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
In , Dave (dave-redhat-bugs) wrote :

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Revision history for this message
In , gatekeeper.mail (gatekeeper.mail-linux-kernel-bugs) wrote :

Large iowait on writing/reading to/from any hard disk drive still occurs. Wasting huge amounts of ticks on any disk IO while _waiting_ is nonsense.

Revision history for this message
Ming Lei (tom-leiming) wrote :

Anyway, if you still have this kind of slow USB problem, please
post the usbmon trace (see the guide at the link below); otherwise it is
difficult to say what is wrong.

[1], http://www.mjmwired.net/kernel/Documentation/usb/usbmon.txt

Thanks,

On Sat, Mar 10, 2012 at 3:08 AM, adri58 <email address hidden> wrote:
> I filled several bugs almost 1 year ago and, today I still have the same
> problem.
> Even with the latest 3.2.5 kernel
> Therefore, it seems that reporting problems is useless. That's my point of
> view.
> I think that slow usb transfer must be a highly critical bug, and most of
> the effort should be put on it.
> I have another bug pending to be solved (not critical), and there is no
> solution yet.
>
> Sorry for my English.

Revision history for this message
adri58 (adri58) wrote :
Download full text (6.9 KiB)

USBMON trace:

T: Bus=08 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1d6b ProdID=0001 Rev= 3.02
S: Manufacturer=Linux 3.2.0-2-amd64 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.2
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=07 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1d6b ProdID=0001 Rev= 3.02
S: Manufacturer=Linux 3.2.0-2-amd64 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.1
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=06 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1d6b ProdID=0001 Rev= 3.02
S: Manufacturer=Linux 3.2.0-2-amd64 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.0
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=05 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 41/900 us ( 5%), #Int= 3, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1d6b ProdID=0001 Rev= 3.02
S: Manufacturer=Linux 3.2.0-2-amd64 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1a.2
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=05 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=1.5 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=046d ProdID=c050 Rev=27.20
S: Manufacturer=Logitech
S: Product=USB-PS/2 Optical Mouse
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr= 98mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=usbhid
E: Ad=81(I) Atr=03(Int.) MxPS= 5 Ivl=10ms

T: Bus=05 Lev=01 Prnt=01 Port=01 Cnt=02 Dev#= 3 Spd=1.5 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=045e ProdID=00dd Rev= 1.73
S: Manufacturer=Microsoft
S: Product=Comfort Curve Keyboard 2000
C:* #Ifs= 2 Cfg#= 1 Atr=a0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=01 Driver=usbhid
E: Ad=81(I) Atr=03(Int.) MxPS= 8 Ivl=10ms
I:* If#= 1 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver=usbhid
E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=10ms

T: Bus=04 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1d6b ProdID=0001 Rev= 3.02
S: Manufacturer=Linux 3.2.0-2-amd64 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1a.1
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00...

Read more...

Revision history for this message
Ming Lei (tom-leiming) wrote :

On Sat, Mar 31, 2012 at 9:32 PM, adri58 <email address hidden> wrote:
> USBMON trace:

The below is not a usbmon trace at all; please read the doc at the link below

        http://www.mjmwired.net/kernel/Documentation/usb/usbmon.txt

and then post your usbmon trace.

Also you can refer to LP624510 about how to do it.

       https://bugs.launchpad.net/bugs/624510

Thanks,


Revision history for this message
elatllat (elatllat) wrote :

maybe Ming is trying to say, do this:

1) start a capture using this command:
 cat /sys/kernel/debug/usb/usbmon/1u > /tmp/1u.mon.out

2) connect the external drive

3) copy a file to the external drive

4) kill the capture with CTRL-C

5) zip and attach 1u.mon.out.zip here.
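
(The same steps as a single rough script, for convenience; bus number 1, the file to copy and the mount point are assumptions, adjust them to match the bus your drive shows up on in lsusb.)

 #!/bin/sh
 # capture usbmon traffic on bus 1 while copying a file to the USB drive
 sudo sh -c 'cat /sys/kernel/debug/usb/usbmon/1u > /tmp/1u.mon.out' &
 CAPTURE=$!
 cp big-test-file /media/usb-disk/    # hypothetical file and mount point
 sync
 sudo kill $CAPTURE
 gzip /tmp/1u.mon.out                 # attach /tmp/1u.mon.out.gz to the bug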

Revision history for this message
Ming Lei (tom-leiming) wrote :

On Sat, Mar 31, 2012 at 11:55 PM, Dmole <email address hidden> wrote:
> maybe Ming is trying to say, do this:
>
> 1) started a capture using this command:
>  cat /sys/kernel/debug/usb/usbmon/1u > /tmp/1u.mon.out
>
> 2) connect the external drive
>
> 3) copy a file to the external drive
>
> 4) kill the capture with CTRL-C
>
> 5) zip and add attach 1u.mon.out.zip here.

Exactly, that is just what I wanted.

Thanks,

Revision history for this message
adri58 (adri58) wrote :
  • 1u.mon.out (4.2 MiB, application/octet-stream; name="1u.mon.out")

Here's the file

2012/4/1 Ming Lei <email address hidden>

> cat /sys/kernel/debug/usb/usbmon/1u > /tmp/1u.mon.out
>

Revision history for this message
Ming Lei (tom-leiming) wrote :

On Sun, Apr 1, 2012 at 1:15 PM, adri58 <email address hidden> wrote:
https://bugs.launchpad.net/bugs/500069/+attachment/2980204/+files/1u.mon.out

adri58, thanks for your post.

From your usbmon trace, I found that it takes about ~22ms on average
to complete writing 120KB[1] to your usb mass storage, so the max write
performance is about 5.3MB/sec, for example:

/*send WRITE cmd from host to usb mass storage device*/
ffff880037674d40 905709519 S Bo:2:007:2 -115 31 = 55534243 9f080000
00e00100 00000a2a 00000457 c60000f0 00000000 000000
ffff880037674d40 905709611 C Bo:2:007:2 0 31 >

/*write 120KB data to usb mass storage device*/
ffff8801139ac080 905709619 S Bo:2:007:2 -115 122880 = 831683e5
c00e55d7 83e9c00e 95c1f2bf fb0300c6 81908c93 c98144a9 5980441a
ffff8801139ac080 905731863 C Bo:2:007:2 0 122880 >

/*read the status of writing operation*/
ffff880037674d40 905731871 S Bi:2:007:1 -115 13 <
ffff880037674d40 905733112 C Bi:2:007:1 0 13 = 55534253 9f080000 00000000 00

The above 3 steps are one complete procedure for writing 120KB to the usb
mass storage device.

Also, no error information appears in your trace, so your problem
is probably that the usb mass storage is a slow device, especially wrt. write
performance.

I suggest you do some tests on Windows to see whether you get the same
performance as with Ubuntu.

[1] 120KB is the max transfer unit per SCSI command; it is also the most
frequent transfer unit in Linux usb mass storage reads/writes.

Thanks
--
Ming Lei
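
(For anyone wanting to redo that estimate on their own capture: a small sketch, assuming the text usbmon format shown above, that pairs each bulk-out submission (S) with its completion (C) and reports the average rate; the file name is just an example. Since usbmon timestamps are in microseconds, bytes per microsecond comes out numerically as MB/s.)

 awk '$4 ~ /^Bo:/ && $3 == "S" { start[$1] = $2; len[$1] = $6 }
      $4 ~ /^Bo:/ && $3 == "C" && ($1 in start) {
          bytes += len[$1]; us += $2 - start[$1]; delete start[$1] }
      END { if (us) printf "bulk-out write rate: %.1f MB/s\n", bytes / us }' 1u.mon.out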

Revision history for this message
adri58 (adri58) wrote :

In Windows I have no problem at all. So, there must be something wrong with
the Linux kernel


Revision history for this message
Ming Lei (tom-leiming) wrote :

On Mon, Apr 2, 2012 at 10:28 PM, adri58 <email address hidden> wrote:
> In Windows I have no problem at all.

OK, so what is your problem on Linux? Just that you feel the writing
is very slow?
Or that USB file transfer may cause system freezes?

> So, there must be something wrong with
> the Linux kernel

Could you post the output of the below commands on your affected machine?

        uname -a
        lsusb -vv #plug your usb mass storage into machine
        lspci -vv -n

Thanks
--
Ming Lei

Revision history for this message
elatllat (elatllat) wrote :
Download full text (3.2 KiB)

USB 1 is 001.5 MB/s
USB 2 is 060.0 MB/s
USB 3 is 625.0 MB/s

adri58 is getting 5.3 MB/s, I would bet that is the max speed of his device.
But if not please post the output of a disk speed testing tool from some other OS.

Speed tests showing how slow flash drives are:

---------------------------------------------------------------------------------------------------------------------------
--flash fat32 drive

Darwin imac 11.3.0 Darwin Kernel Version 11.3.0: Thu Jan 12 18:47:41 PST 2012; root:xnu-1699.24.23~1/RELEASE_X86_64 x86_64

writing:
1024+0 records in
1024+0 records out
102400000 bytes transferred in 32.315127 secs (3168795 bytes/sec)

reading:
1024+0 records in
1024+0 records out
102400000 bytes transferred in 3.597781 secs (28461989 bytes/sec)

---------------------------------------------------------------------------------------------------------------------------
--flash fat32 drive

Linux ubuntu 3.2.0-20-generic-pae #33-Ubuntu SMP Tue Mar 27 17:05:18 UTC 2012 i686 i686 i386 GNU/Linux

writing:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 47.9621 s, 2.1 MB/s

reading:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 3.9149 s, 26.2 MB/s

---------------------------------------------------------------------------------------------------------------------------
--sata ext4 drive

Linux ubuntu 2.6.32-39-generic #86-Ubuntu SMP Mon Feb 13 21:47:32 UTC 2012 i686 GNU/Linux

writing:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 0.759231 s, 135 MB/s

reading:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 1.74171 s, 58.8 MB/s

---------------------------------------------------------------------------------------------------------------------------
--usb ntfs drive

Linux ubuntu 2.6.32-39-generic #86-Ubuntu SMP Mon Feb 13 21:47:32 UTC 2012 i686 GNU/Linux

writing:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 3.08867 s, 33.2 MB/s

reading:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 3.27496 s, 31.3 MB/s

---------------------------------------------------------------------------------------------------------------------------
--3 usb lvm dm_crypt ext4 drives (dm_crypt is messing with the write speed here)

Linux ubuntu 2.6.32-39-generic #86-Ubuntu SMP Mon Feb 13 21:47:32 UTC 2012 i686 GNU/Linux

writing:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 0.463585 s, 221 MB/s

reading:
1024+0 records in
1024+0 records out
102400000 bytes (102 MB) copied, 3.04551 s, 33.6 MB/s

---------------------------------------------------------------------------------------------------------------------------
--the test used:

#!/bin/bash

#
# test_drive_speed.sh
#

OUT=./file1G.tmp
uname -a
echo "spin you right round" >$OUT;
sleep 1
echo -e "\nwriting:"
if [ "$1" == "-u" ]; then
 dd if=/dev/urandom of=/dev/shm/$OUT bs=100000 count=1024 >/dev/null 2>&1
 dd if=/dev/shm/$OUT of=$OUT bs=100000 count=1024
 rm /dev/shm/$OUT;
else
 dd if=/dev/zero of=$OUT bs=100000 count=1024
fi
sync
W=$(which purge);
if [ "$W" == "" ] ; then
 sudo echo 3 > /proc/sys/vm/drop_caches;
else
 purge;
fi
sleep 1
echo...

Read more...
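
(One aside on the script above, offered as a sketch: "sudo echo 3 > /proc/sys/vm/drop_caches" elevates only the echo, while the redirection is still done by the unprivileged shell, so that line typically fails with "Permission denied" unless the whole script is run as root. A form that keeps the write itself privileged would be:)

 # flush dirty data, then drop the page cache, dentries and inodes
 sync
 echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null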

Revision history for this message
adri58 (adri58) wrote :
  • lspci (13.1 KiB, application/octet-stream; name=lspci)
  • lsusb (19.3 KiB, application/octet-stream; name=lsusb)

First of all, thanks everybody for helping me with this problem.
When trying to transfer big files (>1GB) the system freezes until it
finishes. Anyway, the speed is really slow compared to Windows:

uname -a

Linux adrian-PC 3.2.0-2-amd64 #1 SMP Tue Mar 20 18:36:37 UTC 2012 x86_64
GNU/Linux

I also attach the two outputs you asked for.


Revision history for this message
adri58 (adri58) wrote :

And the output of the speed test

Linux adrian-PC 3.2.0-2-amd64 #1 SMP Tue Mar 20 18:36:37 UTC 2012 x86_64
GNU/Linux

writing:
1024+0 registros leídos
1024+0 registros escritos
102400000 bytes (102 MB) copiados, 0,315309 s, 325 MB/s
./script: línea 22: /proc/sys/vm/drop_caches: Permiso denegado

reading:
1024+0 registros leídos
1024+0 registros escritos
102400000 bytes (102 MB) copiados, 0,0303001 s, 3,4 GB/s

Anyway, I'll test it under Windows


Revision history for this message
elatllat (elatllat) wrote :

Hi Adrian,

You need to run that script as root for it to work
(3,4 GB/s is your RAM speed, not your disk speed);
also cd to your USB drive before running it.

Revision history for this message
Marius B. Kotsbak (mariusko) wrote :

Are you sure that this is not caused by the USB media being mounted with the "sync" option? (Check with the "mount" command.) I had the problem that Ubuntu mounted with sync, and I experienced this behavior.

Revision history for this message
adri58 (adri58) wrote :

No no, I checked that several times

2012/4/2 Marius Kotsbak <email address hidden>

> Are you sure that this is not caused by the USB media mounted with
> "sync" option? (check with the "mount" command). I had the problem that
> Ubuntu mounted with sync and experienced this behavior.

Revision history for this message
adri58 (adri58) wrote :

Script run as root from the USB drive

Linux adrian-PC 3.2.0-2-amd64 #1 SMP Tue Mar 20 18:36:37 UTC 2012 x86_64
GNU/Linux
-e
writing:
/home/adrian/script: 12: [: unexpected operator
1024+0 registros leídos
1024+0 registros escritos
102400000 bytes (102 MB) copiados, 14,9117 s, 6,9 MB/s
/home/adrian/script: 21: [: unexpected operator
/home/adrian/script: 24: /home/adrian/script: purge: not found
-e
reading:
1024+0 registros leídos
1024+0 registros escritos
102400000 bytes (102 MB) copiados, 0,0297004 s, 3,4 GB/s


Revision history for this message
Ming Lei (tom-leiming) wrote :

On Tue, Apr 3, 2012 at 2:47 AM, adri58 <email address hidden> wrote:
> First of all, thanks everybody for helping me with this problem.
> When trying to transfer big files (>1GB) the system freezes until it
> finishes. Anyway, the speed is really slow compared to Windows:

From the usbmon trace you posted, we can see that the writing is slow:
it completes writing more than 700MB to the usb mass storage device in
about 260sec.

So firstly, it is just very slow, and has nothing to do with system freezes
(you can do other things while the transfer is ongoing).

Also, could you check whether you get better writing performance
on Windows on the same machine with the same usb mass storage
device?

Thanks
--
Ming Lei

Revision history for this message
philinux (philcb) wrote :

I've just tested this in Precise. Copying and pasting a 734MB iso from an internal drive to a second internal drive took less than ten seconds.

Copying the same iso to a 4gig usb stick started off well at about 15MB/sec and has now slowed to 2.8MB/sec.

The copy is still not finished as I write this. It's taken about 3 mins to copy the iso to usb.

System does not freeze though.

Revision history for this message
elatllat (elatllat) wrote :

philinux and adri58, this bug will never get fixed if you guys don't do a proper test.

You can't compare the speed of a disk drive to a usb flash stick.

1) You have to use the same disk internally, then over USB, or the same external disk on 2 OSes.
2) You have to provide quantitative numbers using a standard method.

My tests show that there is no problem on the current or the next LTS,
you guys have not run any "scientific" tests,
and AFAIK this bug report should be closed.

Revision history for this message
philinux (philcb) wrote :

@Dmole.

In my case the device is a little 4 gig USB stick. You cannot test that internally.

This bug has been a bane for years. The copy starts well, then progressively slows down to a crawl, as has been reported many times.

In Windows this does not happen; therefore the kernel may be the reason.

If you have a test for me to do on my USB stick I'm more than willing to participate.

I even tried the mainline kernel once, as suggested a while back in this bug.

Revision history for this message
elatllat (elatllat) wrote :

@philinux,

You can't buy a 15mb/sec usb stick.
For all we know the slowness you are experiencing could be due to it being formatted as NTFS or just a slow flash drive.
3MB/s is the expected speed of a flash drive; anything more than that can likely be attributed to compression / caching / delayed-write settings.
(That's why the SDHC classes were introduced http://www.sakoman.com/OMAP/microsd-card-perfomance-test-results.html)

You said that windows was "about" 7MB/s and Linux was "about" 3MB/s
You need to post the results of a disk benchmarking tool on the same drive from both OSes.
(I have no recommendations for what windows benchmarking tool to use.)

Revision history for this message
elatllat (elatllat) wrote :

@philinux, this might work for you though: http://www.iozone.org/src/current/IozoneSetup.zip

Revision history for this message
elatllat (elatllat) wrote :

At this time:

There are UHS-1 "SDHC compatible" cards that claim read speeds of 90MB/s but you need custom hardware to get speeds above normal SDHC.

25MB/s is the max write speed for SDHC.

Class 4 is the max for micro/mini SDHC.

There are some 15MB/s usb sticks http://usbspeed.nirsoft.net/usb_drive_speed_summary.html?o=11

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

I confirm this. The system becomes unresponsive when it begins swapping memory to disk.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Good day, anybody!

I found фт optimal options against this and like this bug
Anybody please try a following options.

I found that my kernels 2.6* and 3.2.* and 3.3.* versions of my server has periodical freezings 4-15 secs. I found that this occur in writeback time (flushing to disk) in time when 30sec expires for expired dirty pages occur. I tried many variants of dirty_* options and found optimal these:

I can suggest two veriants
Here 1st and 2nd variants
The second variant commented
Only uncommment second lines and nothing

#######################################
# Check dirty status every 3 seconds
# (this smooths out writeback; maybe 100 would be even better)
echo 300 > /proc/sys/vm/dirty_writeback_centisecs

# Start background writeback after only 100 KB of dirty pages
# This is a very important option :)
echo 102400 > /proc/sys/vm/dirty_background_bytes

# Second variant - uncomment it - you will still get freezes, but rarely
# echo 225280000 > /proc/sys/vm/dirty_background_bytes

# My freezes happen when dirty pages expire (default 30 sec),
# so I increased it (it doesn't matter for the 1st variant - it will never be reached there)
echo 864000 > /proc/sys/vm/dirty_expire_centisecs

# I increased the limit for non-background (blocking) writeback (I think it is never reached)
echo 10 > /proc/sys/vm/dirty_ratio

#######################################

I prefer the 1st variant - my system now works smoothly.
I found that the freezes occur while dirty pages are being written to disk.
You can watch it with:

watch -n1 grep -A 1 dirty /proc/vmstat

The new kernel features from 3.2.* (writeback throttling) did not help me. I have now tested kernel 3.3.2-6 on FC16 and it has the same trouble. But these settings work for me!

I don't have time for a detailed description,
but if you test it and it helps, I'm ready to discuss it.
Sorry for my English :)

Bye!
Perlover
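
For anyone who wants to keep settings like the ones above across reboots, one way (a sketch only, assuming the standard sysctl mechanism; the values are just perlover's first variant and should be tuned per machine) is to put the equivalent keys into /etc/sysctl.conf:

vm.dirty_writeback_centisecs = 300
vm.dirty_background_bytes = 102400
vm.dirty_expire_centisecs = 864000
vm.dirty_ratio = 10

and load them with "sysctl -p" (as root).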

Revision history for this message
Torsten Bronger (bronger) wrote :

I observe a drop to 1/10th in writing speed if I switch from NTFS to FAT32 on an external USB disk. Is this related to this issue?

Revision history for this message
Adam Porter (alphapapa) wrote :

I think a proper test methodology on both Windows and Linux should basically be:

1. Start copy operation and start stopwatch.
2. As soon as the copy is finished on the screen, umount/"Safely
Remove" the drive.
3. Wait for activity light on USB drive to go out and stop stopwatch.

I'm not sure if this bug is truly fixed on Linux, either, but because
of write caching and buffering at different levels and differences in
OSes, any benchmark must take into account unmounting/removing the
drive, otherwise it's apples and oranges. The progress bars, they
lie!

On Fri, May 18, 2012 at 4:21 PM, Torsten Bronger
<email address hidden> wrote:
> I observe a drop to 1/10th in writing speed if I switch from NTFS to
> FAT32 on an external USB disk.  Is this related to this issue?

Revision history for this message
Torsten Bronger (bronger) wrote :

I copied very large files with rsync. rsync prints the transfer rate on the screen, and in the case of NTFS it was the expected 10 MByte/s (I copied through 100 MBit Ethernet onto the USB disk), and in the case of FAT32 it was 900 kByte/s. So it was even more than a factor of 10, because for NTFS the transfer was limited by the Ethernet.

Thus, FAT32 is definitely responsible for ridiculously slow writing on the USB disk on my Lubuntu 12.04. This surely is a bug. My question is whether this is covered by this bug report here.

Revision history for this message
Ming Lei (tom-leiming) wrote :

On Sun, May 20, 2012 at 2:07 AM, Torsten Bronger
<email address hidden> wrote:
> I copied very large files with rsync.  rsync prints the transfer rate on
> the screen, and in the case of NTFS it was the expected 10 MByte/s (I
> copied through 100 MBit Ethernet onto the USB disk), and in the case of
> FAT32 it was 900 kByte/s.  So it was even more than a factor of 10,
> because for NTFS the transfer was limited by the Ethernet.
>
> Thus, FAT32 is definitely responsible for ridiculously slow writing on
> the USB disk on my Lubuntu 12.04.  This surely is a bug.  My question is
> whether this is covered by this bug report here.

Of course, the fat32 bug is not covered by this bug report because the bug
title is 'USB file transfer causes system freezes; ops take hours
instead of minutes'.

So I suggest submitting a new bug entry for the FAT32 bug.

Thanks,

Revision history for this message
Torsten Bronger (bronger) wrote : Re: [Bug 500069] USB file transfer causes system freezes; ops take hours instead of minutes

Hi there!

Ming Lei writes:

> [...]
>
> Of course, the fat32 bug is not covered by this bug report because
> the bug title is 'USB file transfer causes system freezes; ops
> take hours instead of minutes'.

Well, bug #392089 "Slow USB transfer for FAT32" has been marked as a
duplicate of this one. However, the intriguing NTFS/FAT32
difference doesn't seem to play any role in this discussion.

Has anybody affected by this bug tried with a different filesystem
(if possible)?

Cheers,
Torsten.

--
Torsten Bronger Jabber ID: <email address hidden>
                                  or http://bronger-jmp.appspot.com

Revision history for this message
In , datacompboy (datacompboy-linux-kernel-bugs) wrote :

After upgrading from 3.0.x to 3.2.0, this bug completely eats my brain :(
I have tried the solution from #571 -- now it's not the whole system that hangs, but just some applications (browser, terminal, ooffice, etc.).
Disk is SSD:
  Read : 1145044992 bytes (1,1 GB), 1,56616 s, 731 MB/s
  Write: 1145044992 bytes (1,1 GB), 14,30301 s, 80 MB/s
RAM:
  MemTotal: 3969340 kB
  MemFree: 112720 kB
  Buffers: 721196 kB
  Cached: 1246456 kB
  SwapCached: 656 kB
  Active: 918656 kB
  Inactive: 1666252 kB
  Active(anon): 507868 kB
  Inactive(anon): 158192 kB
  Active(file): 410788 kB
  Inactive(file): 1508060 kB
  SwapTotal: 6290428 kB
  SwapFree: 6288604 kB

But it still freezes sometimes, on simple actions like just Alt+Tabbing to another app, and that app hangs for 3-6 seconds.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

The in-kernel process scheduler is generally crap. Ok, make that majorly crap. Move away from it. Use BFS (search Con Kolivas) if you want sanity. Someone recently posted a simple test case where heavy kernel space starves the user space processes to death. The person switched to BFS and all his troubles went away. Nobody replied to him on the list. I don't think even Ingo knows what's wrong with CFS. So, don't have your hopes of ever seeing this fixed.

Here is the user space starvation thread I am talking about:

https://lkml.org/lkml/2012/6/7/448

Revision history for this message
In , linux-kernel-bugs (linux-kernel-bugs-linux-kernel-bugs) wrote :

Wow!! That's a pretty bold statement to make. Given that the code is all open, why don't you instrument the kernel and pinpoint where exactly the crap is.

Those of you who suffer from the stall problem might want to give Daniel Poelzleithner's ulatency a try.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

@Ritesh: you are assuming I am capable of debugging kernel. None of the users who have reported on this thread are. The only person capable of debugging this issue is Ingo. How many comments have you seen from him? Go ahead and count them! I will tell you the answer: Zilch!

Process scheduling in stock Linux kernel is a REAL problem. Nobody wants to debug it, that is a different story. That does not mean the problem goes away. After seeing that thread I linked above, I am convinced it is some manifestation of CFS issues at fault here.

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

Anton, can you file yours as a separate bug - it's clear the main problem has been fixed and the scenario you described seems different.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

> it's clear the main problem has been fixed

Alan: Can you describe how it is clear to you, when the general public keeps suffering and reporting the issue or, worse, just gives up?

What is the code change that "fixed" the issue? Just because someone mentioned BFS in a message somewhere, and someone is pointing out a potential problem with the in-kernel scheduler, doesn't give you the right to close this bug randomly. That's arrogant behavior and does a disservice to all the reporters here.

Revision history for this message
In , datacompboy (datacompboy-linux-kernel-bugs) wrote :

Installed a BFQ + BFS patched 3.4 kernel ( http://pf.natalenko.name/ ) -- there are no hangs so far.

Changed in linux:
status: Confirmed → Fix Released
Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Hey Anton! Big Alan says this problem does not exist. How dare you claim otherwise...:)

I am just kidding....I am moving to BFS myself. So...

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

No I said that Anton's case appears to be different and asked him to open a new bug for it, given the other cases seem fixed. If BFS fixes your case that's also interesting and wants putting in the bug too.

Revision history for this message
In , akpm (akpm-linux-kernel-bugs) wrote :

Comment #571 at least indicates that the problems remain, and are unrelated to the CPU scheduler. So describing all this as "fixed" seems a tad optimistic.

That being said, this bugzilla report clearly isn't getting the job done. I suggest that people who are still seeing writeback-related problems should report them via email. Suitable recipients are

<email address hidden>
<email address hidden>
Wu Fengguang <email address hidden>
Andrew Morton <email address hidden>

And please, the thing to spend time on is to work out how to enable kernel developers to locally reproduce the problem. If we can do this, we'll fix it.

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

571 indicates someone has a possible problem of the same type. It's separate from all the other debugging - hence I asked for it to be filed as a new bug, otherwise nothing useful is going to occur.

(eg I can get 3 second freezes on alt-tab out of gnome 3 but it doesn't appear to be anything to do with the kernel)

Alan

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Andrew: if it is not CPU scheduler, then how come JUST replacing the CPU scheduler fixes the issue? This does not make basic CS101 sense!

Revision history for this message
In , datacompboy (datacompboy-linux-kernel-bugs) wrote :

Hmm... Are you sure that only the CPU scheduler was replaced? In my case I replaced both -- the CPU and disk schedulers, with BFS and BFQ.
Jun 10 12:05:54 nuuzerpogodible kernel: [ 1.611737] io scheduler bfq registered (default)
Jun 10 12:05:54 nuuzerpogodible kernel: [ 1.826589] BFS CPU scheduler v0.422 by Con Kolivas.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

I like stability and typically use a very minimalistic approach. I only changed just the CPU scheduler. And I haven't noticed any hangs or stuck mouse so far.

Maybe you can change one variable at a time as well and tell us which one (or both) helped.

I will update back if I have any new findings. For now, I am happy that I can use my system without getting annoyed with it.

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

devsk: it makes basic systems 101 sense, however. All the bits interact.

The fact that replacing just the CPU scheduler makes a difference is valuable info though.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

I propose an interesting experiment.

1. Install Opera from this location: http://snapshot.opera.com/unix/rc4_12.00-1456/
2. Switch on hardware acceleration opera:config#UserPrefs|EnableHardwareAcceleration
set to 1
3. Open the test http://ie.microsoft.com/testdrive/Performance/LoveIsInTheAir/ or http://ie.microsoft.com/testdrive/Performance/ParticleAcceleration/

Try switching between tty, also use your GUI.

I consider that no program should affect the responsiveness of the system as a whole, should it?

Revision history for this message
In , alpha_one_x86 (alphaonex86-linux-kernel-bugs) wrote :

(In reply to comment #587)
> I propose an interesting experiment.
>
> 1. Install Opera from this location:
> http://snapshot.opera.com/unix/rc4_12.00-1456/
> 2. Switch on hardware acceleration
> opera:config#UserPrefs|EnableHardwareAcceleration
> set to 1
> 3. Open the test
> http://ie.microsoft.com/testdrive/Performance/LoveIsInTheAir/
> or http://ie.microsoft.com/testdrive/Performance/ParticleAcceleration/
>
> Try switching between tty, also use your GUI.
>
> I consider that no program should not affect the responsiveness of the system
> as a whole, is not it?

My case is very similar. I play a movie with VLC on one screen, have some Konsole windows open with transparency, and KWin with desktop effects. Whenever the graphics card is at 100% (very often with a low-end card like mine), the whole system has a general slowdown (the same on a tty too).

Revision history for this message
In , powerman-asdf (powerman-asdf-linux-kernel-bugs) wrote :

(In reply to comment #578)
> Installed a BFQ + BFS patched 3.4 kernel ( http://pf.natalenko.name/ ) --
> there are no hangs so far.

BFS + BFQ really helps… but only until you run a couple of VMware virtual machines. :(
With BFS and BFQ it results in incredible freezes, both in the host OS and the guest OSes, especially when one of the OSes does intensive I/O like installing updates.
Without BFS and BFQ the freezes still happen, but they are much less noticeable!
Probably only one of BFS and BFQ is responsible for such bad behavior, but I haven't tested them separately.

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

Continuing from post #571.

Sorry, my English is not as good as I would like :)

Now I have Fedora with the 3.3.2-6.fc16.x86_64 kernel. My server has 48 GB of memory and a hardware RAID1 array.

Now I use my server with these settings (good settings for me):

echo 1000 > /proc/sys/vm/dirty_writeback_centisecs
echo 20 > /proc/sys/vm/dirty_background_ratio
echo 9000000 > /proc/sys/vm/dirty_expire_centisecs
echo 30 > /proc/sys/vm/dirty_ratio

Before these settings, as I wrote in post #571, I had regular freezes of up to 10-20 seconds every 2-5 minutes. I found that the reason for this is the writeback phase of dirty pages (you can watch it with "watch -n1 grep -A 1 dirty /proc/vmstat"; nr_writeback is the number of dirty pages currently being written to disk). The writeback phase can be started by the 'sync' command, for example, or when dirty pages in memory expire (default setting: 30 seconds). If at the next writeback there are many dirty pages (even just 2000-3000 of them), my server would freeze during this phase.

Now I have the settings above, and once a day I run 'sync' from crontab (when load is at a minimum). During that run my server's load average climbs from 1-2 up to 80-90, and it takes ~1-2 minutes. My system is frozen for those 1-2 minutes! The rest of the time (24 hours * 60 minutes - 3 minutes) I now have a load average of 1-2 and no I/O freezes. Before these settings I had a load average of 8-9. I know that if the server loses power I will have stale data on disk (up to 24 hours old).
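
For what it's worth, a daily 'sync' like the one described above can be scheduled with a single crontab line (the 04:00 time is only an example of a low-load hour):

0 4 * * * /bin/sync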

I think the system stops all I/O until every dirty page marked for writeback has actually been written to disk. I think a sane system should not block all I/O and should instead spread the writing of dirty pages out over time.

And I noticed that I don't have this problem on my second server, with the same OS, the same kernel version, and the same amount of RAM. It uses software RAID1 (/dev/md*). During writeback that server works smoothly. I think software RAID may have a different buffering mechanism for writing to disk. So maybe somebody here could test this problem with software RAID?

And I think these articles will be useful and are related to this:

http://lwn.net/Articles/405076/
https://lwn.net/Articles/456904/

As I understand it, this feature is partly implemented in kernel 3.3, but I didn't see any improvement with the new kernel. As I understand it, this is still under development.

Sorry for my English

Bye! :)

Revision history for this message
In , perlover (perlover-linux-kernel-bugs) wrote :

And nowadays (for maybe the last 1-2 years) I don't see the high iowait values reported at the top of this thread. But the problem of freezes during large I/O operations remains. So maybe the iowait problem no longer exists, but all other I/O is still being blocked during high-volume writes.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

About I/O schedulers: I have read a lot that devices with NCQ support don't need a scheduler. Is that so?
$ dmesg | grep NCQ
[2.145261] ata1.00: 175836528 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[3.109745] ata5.00: 2930277168 sectors, multi 16: LBA48 NCQ (depth 31/32)

It seems all my devices support NCQ. I manually set the noop scheduler, and the system was apparently much more responsive. I hope this is not a placebo. If it's true, why doesn't the kernel automatically switch off the scheduler for devices that have NCQ support?
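
For anyone who wants to repeat this experiment, the active I/O scheduler and the NCQ queue depth can be inspected and changed per device through sysfs (sda is a placeholder; run as root, and the change does not survive a reboot):

cat /sys/block/sda/queue/scheduler          # active scheduler is shown in [brackets]
cat /sys/block/sda/device/queue_depth       # NCQ depth negotiated with the drive
echo noop > /sys/block/sda/queue/scheduler  # switch this disk to the noop scheduler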

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #592)
> [...] devices with NCQ support don't need a scheduler [...]
>
> It seems all my devices support NCQ. I manually set the noop scheduler, and
> the system was apparently much more responsive. [...]

While it is true that the hard drive will reorder I/O requests within its native command queue to optimize armature movements, the on-device queue is really very shallow (only 32 requests maximum on your hardware). By circumventing the kernel's I/O scheduler (by selecting "noop"), you are losing the benefits of merging adjacent I/O requests and of distributing I/O throughput fairly across multiple processes.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

I think in my case the system hangs for two reasons. I don't know what Opera does there, but it looks like the video card's throughput is also limited, and when an application tries to push too much data to it, the GUI starts feeling less responsive and hangs, despite the fact that htop does not show any CPU utilization or I/O wait. The second case is more traditional: it hangs when memory runs out (2 GB of RAM); htop shows that 2 GB of swap is allocated as well. Then the freezes occur because of waiting on the hard disk. That is expected, but it is bad that it affects all applications, even those for which the available memory would be enough. The worst thing is that it affects the responsiveness of the whole system and the GUI. I want to help fix these problems; tell me what I can do.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

And I have been thinking about why the noop scheduler could be better... A silly idea, but what if the scheduler's queue itself gets swapped out? Is that theoretically possible? If so, it would be understandable why noop is better.

Revision history for this message
In , alex (alex-linux-kernel-bugs) wrote :

I wonder if someone has tried oprofile while forcing a machine to fall into #12309? It may be stalling somewhere waiting for locks or hardware action.

Alas, I myself have no hardware at hand to reproduce #12309 on.

Revision history for this message
In , Hedayat (hedayat-redhat-bugs) wrote :

I have experienced very slow copying with 3.4.4-5.fc17.x86_64 when the number of files / the volume of the files to copy becomes large. Unfortunately, it didn't improve with the trick I mentioned in the report.

Revision history for this message
AO (aofrl10n) wrote :

I'm afraid this is not fixed in Ubuntu Quantal AMD64 with the latest kernel 3.5.0-6. I'm transferring data to a Sansa Clip+'s internal storage, over 2 GB in this case, starting at over 20 MB/s and ending up below 1 MB/s. So whatever fix was released did not get rid of the problem at all.

Revision history for this message
elatllat (elatllat) wrote :

Alain-OIivier Breysse: read the previous posts; provide proof.

Revision history for this message
Torsten Bronger (bronger) wrote :

What kind of proof?

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

Created attachment 78231
htop screenshot

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

Please look at my htop screenshot https://bugzilla.kernel.org/attachment.cgi?id=78231

I just copied a file from HDD to HDD. Is it normal to have such high I/O wait on the CPU?

I think the bug is not fixed. What other information should I provide?

$ uname -a
Linux u3s3 3.5.2-1.fc17.i686.PAE #1 SMP Wed Aug 15 16:30:14 UTC 2012 i686 i686 i386 GNU/Linux

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

Looks fairly normal to me - I'd expect a lot of waiting for I/O during a big copy because rotating disks are incredibly slow relative to processor performance. The CPU is also generally having to work harder on a 32bit machine with > 1GB of RAM doing MMU management due to the lack of address space.

The scheduler btw is kernel side so doesn't get paged/swapped out.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

Please correct me if I'm wrong, but I do believe that "I/O Wait" time is the amount of time that processes are blocked on disk I/O operations. What I don't understand is why I/O Wait appears to consume CPU time. Is the kernel spinning in a busy wait loop while an I/O operation is pending on a disk? If so, why? The kernel should be allowing some other task to use the CPU during the I/O wait.

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

I/O wait isn't consuming CPU time but the process of reading/writing disks does consume CPU time because the process is doing work in the kernel managing the I/O and the things that go with it.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

(In reply to comment #601)
> I/O wait isn't consuming CPU time but the process of reading/writing disks
> does
> consume CPU time because the process is doing work in the kernel managing the
> I/O and the things that go with it.

Alan, if I understand you correctly, why doesn't the kernel switch to another process while the current process is waiting for I/O?

For example, why does the GUI (i.e. GNOME Shell) stall while another application is swapping or doing a lot of writes to disk?

Revision history for this message
In , gatekeeper.mail (gatekeeper.mail-linux-kernel-bugs) wrote :

Alan, isn't what you just described called PIO? Isn't DMA the solution that resolved high CPU load on storage I/O? Isn't high CPU load on VM I/O (iowait) very similar to the PIO storage operation mode? And to repeat my earlier question: is a polling technique suitable for VM I/O, as it was some years ago for network I/O?

> I/O wait isn't consuming CPU time but the process of reading/writing disks
> does
> consume CPU time because the process is doing work in the kernel managing the
> I/O and the things that go with it.
>

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

The data transfers are done by DMA where possible, but you still have to do all the housekeeping, controller management, I/O queue handling and the like. On a 32bit box there can also be a lot of memory management work involved.

Old (pre AHCI) controllers need PIO for some parts of a transfer. That is a hardware limit.

And the kernel does switch to other processes and back and forth between them when one is waiting for I/O. The gnome shell is a very large program so on any system without vast quantities of memory the shell tends to be waiting for stuff to come from disk when there is any memory pressure. Last time I looked the compositor was single threaded with all of that so Gnome 3 stalled horribly under paging. That I'm afraid is mostly a problem in Gnome 3.

Rotating disks are in relative terms very very slow. They've not materially improved in the past ten years yet memory sizes have grown vastly, processor speeds have grown likewise. They are also very bad at trying to do two things at once so writing a large file to disk tends to really slow down reading.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

(In reply to comment #604)
> And the kernel does switch to other processes and back and forth between them
> when one is waiting for I/O. The gnome shell is a very large program so on
> any
> system without vast quantities of memory the shell tends to be waiting for
> stuff to come from disk when there is any memory pressure. Last time I looked
> the compositor was single threaded with all of that so Gnome 3 stalled
> horribly
> under paging. That I'm afraid is mostly a problem in Gnome 3.

OK, but why is mouse movement also choppy? And why is switching to a virtual terminal slow?

How can I verify that the locks do not occur in the kernel? And how can I find where the locks occur? I really want to help find and fix them. Apologies for the many questions.

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

Because the GNOME compositor is going to end up stalling while waiting to get data back. Ditto, switching to/from X will be pulling lots of data off disk if your machine has been paging stuff out. To actually get detailed data you need to start profiling the system and generating detailed information to analyse - that's way beyond a bugzilla discussion (but the linux-mm list might be a starting point if you want to get involved in understanding what is a very complicated area - because so many things interact).

Ultimately though I suspect that unless someone does something drastic about its memory footprint the "fix" is not to run huge bloated inefficient desktops on a box with 1GB of RAM.

Revision history for this message
In , linux-kernel-bugs (linux-kernel-bugs-linux-kernel-bugs) wrote :

Alan: What would be your example of a huge bloated inefficient desktop? I guess KDE/GNOME. And the efficient one might be icewm/fvwm etc. Not common unfortunately.

The I/O wait problem is still valid. It is just that you need different patterns to hit it. A lot has improved with the latest writeback work but still, when hit, this is a terrible problem.

If you want to reproduce it, take your laptop/desktop with 4 GiB of memory and a regular SATA disk. Pump (buffered) I/O into it with dd: write zeroes with a block size of 1 MiB. Since it is buffered, you'll start off fine until you have consumed all 4 GiB of memory. After that is when you will start seeing the problem.

At that moment (i.e. after you have consumed all of your RAM), every write() will contend for page availability. And given that you also have a slow rotating disk (this also includes remote storage - both block and file), try to execute a task following the I/O. A simple sync command is a good start. The CPU stays blocked until the pages are scanned for best fits and the buffers are synced. You can run dstat and observe the CPU wait time there.
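
A concrete version of that recipe might look like the following (a sketch only; /tmp/bigfile is a placeholder on the rotating disk, and count is chosen so the file is roughly twice the 4 GiB of RAM):

dd if=/dev/zero of=/tmp/bigfile bs=1M count=8192   # terminal 1: buffered writes, ~2x RAM
dstat 1                                            # terminal 2: watch the cpu "wai" column climb
time sync                                          # terminal 3: once RAM is full, this is where the stall shows up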

(In our tests) Linux is good at pumping I/O. This doesn't always fit in the regular OS model where the user could also be doing other random stuff while I/O is in progress. They expect the machine to be responsive.

MS Windows, while not the best, is still better than Linux desktop in this use case.

Over the years, my workaround has been to have only 1 process doing I/O. Never let 2 or more processes do I/O at the same time. Like, don't run 2 cp commands. Don't do 2 copy operations in your GUI file browser. If you follow this policy, you have a higher chance of avoiding this ugly bug.

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

Ritesh: if you have some test cases then discuss them on the linux-mm list.

Revision history for this message
In , linux-kernel-bugs (linux-kernel-bugs-linux-kernel-bugs) wrote :

(In reply to comment #608)
> Ritesh: if you have some test cases then discuss them on the linux-mm list.

Alan, I see in the prev comments you have the same explanation done in the right technical terms. :-)

I just would add 1 more comment. All these symptoms were tested and seen also on my lab machine, which is:

> 2 core CPU
> 8 GiB RAM (We have tested also with 48 GiB RAM)
> All tests were done with SAN Array (over sw iSCSI).

The slow rotating media can be mapped with the slow network in this case. The stalls were visible on these machines also after you do buffered I/O consuming up all of the system RAM.

I had then spent some time tweaking values in /proc/sys/vm but hadn't seen great improvements.

Will surely put in my results on -mm in the next run I do on it (could be in some weeks). Thank you.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Is it possible that one process can consume all (dirty?) pages and stall other processes, even if those are running from or accessing other disks?

My system is on two SSDs, one for the system and one for the data. I can stall the whole system while running a VM on a third, slow, external USB 2.0 disk.

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

Thomas - the kernel tries very hard to avoid that sort of thing happening and to throttle a process generating too much I/O. Older kernels were certainly very bad at that and an rsync to a USB disk was horrible. It ought to be much better with the most recent kernels although still not great.

Revision history for this message
In , jaroslaw.fedewicz (jaroslaw.fedewicz-linux-kernel-bugs) wrote :

> Over the years, my workaround have been to have only 1 process doing I/O.
> Never
> let 2 or more processes do I/O at the same time. Like don't do 2 cp. Don't do
> 2
> copy operations in your gui file browser.

I've heard once a while ago that Linux is a multitasking OS, so I figure they lied to me?

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

By moving to BFS, it has been shown (empirically) that IT IS a CPU scheduler issue and not a slow-rotational-media problem. The kernel can do other things while the rotational media is not giving it what it wants. And it should not let buffers and caches fill up so much (again a scheduling issue) that even the kernel has no free pages left to run its own components from. All the kernel ends up doing is spinning in search of free pages (kswapd hogging the CPU, searching through millions of pages on modern systems). Why it does not evict caches sooner by default is not clear; you need to set a bunch of /proc parameters for it to start doing that, and it still eventually keels over.

There was a bug reported by someone (I linked it above) where just pumping network traffic through the Linux kernel brought it to its knees, leading to a cluster reboot. The kernel space (softirqs) hogged so much CPU during network traffic processing that user space never got a chance to run. The person moved to BFS and could push network traffic as fast as he wanted without bringing anything to its knees. If this is not a CPU scheduler issue, then I don't know what is!
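
The comment above doesn't say which /proc parameters it means; the vm sysctls most often mentioned in this context control how aggressively the page cache is kept versus reclaimed (the values below are only illustrative, not recommendations, and need root):

sysctl -w vm.swappiness=10             # prefer dropping page cache over swapping out process memory
sysctl -w vm.vfs_cache_pressure=50     # keep dentry/inode caches around a bit longer
sysctl -w vm.dirty_background_ratio=5  # start background writeback earlier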

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #613)
> If this is not a CPU scheduler issue, then I don't know what is!

This has nothing to do with scheduling CPU time and everything to do with managing virtual memory. That the kswapd process is consuming all CPU time does not indicate that the CPU scheduler is not giving time to user processes but rather that all user processes are blocked in page faults. User processes become unresponsive during heavy I/O because all their code gets evicted from RAM, and the page faults that load their code back in from disk have to compete with all the other disk I/O.

The question is why the kernel is evicting memory-mapped pages (especially *executable* pages) rather than blocking calls to write() until more RAM becomes available.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

Then, how do you explain the above behaviour? Moving to BFS solves this problem. And solves the other problem where during heavy network traffic, the kernel space does not give any chance to user space leading to user space starvation.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #615)
> Moving to BFS solves this problem.

Moving to BFS solves *a* problem *you're* having. According to comment #589, the VM thrashing problem still occurs when using BFS.

Perhaps BFS schedules a process that is encountering a serial string of page faults more favorably than the CFS, but that doesn't solve the underlying problem of executable pages being evicted from RAM to make excessive space available for caching large writes.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

> According to comment #589,
> the VM thrashing problem still occurs when using BFS.

And the person in comment #589 piled on BFQ into the equation to muddy the waters. Throwing in a new IO scheduler and then having IO problems, well yeah, that's not intuitively obvious.

So, someone please still prove that BFS alone hasn't fixed this issue.

Revision history for this message
Sifr Moja (simplexion) wrote :

I have now tried to duplicate this bug with ext2 on a usb drive. It occurs when transferring over about 1GB of data. It slows down pretty rapidly. Let me know what information is required.

Revision history for this message
elatllat (elatllat) wrote :

simplexion (simplexion)
Torsten Bronger (bronger)

If you're not going to read the history and figure out how to provide the results of a disk benchmarking tool on the exact same hardware from both an affected and an unaffected, fully upgraded OS, then please don't post.

Revision history for this message
In , Dave (dave-redhat-bugs) wrote :

# Mass update to all open bugs.

Kernel 3.6.2-1.fc16 has just been pushed to updates.
This update is a significant rebase from the previous version.

Please retest with this kernel, and let us know if your problem has been fixed.

In the event that you have upgraded to a newer release and the bug you reported
is still present, please change the version field to the newest release you have
encountered the issue with. Before doing so, please ensure you are testing the
latest kernel update in that release and attach any new and relevant information
you may have gathered.

If you are not the original bug reporter and you still experience this bug,
please file a new report, as it is possible that you may be seeing a
different problem.
(Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).

Revision history for this message
In , Hedayat (hedayat-redhat-bugs) wrote :

I should try to find a new problematic device and test (Under Fedora 17).

Revision history for this message
In , leho (leho-linux-kernel-bugs) wrote :

(In reply to comment #568)
> (In reply to comment #567)
> > Maybe this:
> > http://lwn.net/Articles/467328/
>

Whatever this patch has done or not done, I just had the load on my 3.4.11-pf laptop (CFQ, BFQ) climb from the regular 0.6 to 30+ when I did:

$ ls -l usb-_USB_FLASH_DRIVE_079605074ECA-0\:0.img
-rw-r--r-- 1 leho leho 2004877312 6. nov 12:41 usb-_USB_FLASH_DRIVE_079605074ECA-0:0.img

$ ddrescue usb-_USB_FLASH_DRIVE_079605074ECA-0\:0.img /dev/sdb --force
GNU ddrescue 1.16
Press Ctrl-C to interrupt
rescued: 2004 MB, errsize: 0 B, current rate: 2490 kB/s
   ipos: 2004 MB, errors: 0, average rate: 3671 kB/s
   opos: 2004 MB, time since last successful read: 0 s
Finished

Because of the massive stalling that occurred, the average write rate ended up at 3.5 MB/s instead of the regular 20+ MB/s.

Is this bug still alive, or related or does anyone here know what to look for? I'd really like to maintain responsiveness when working with USB drives.

Revision history for this message
In , wolfram (wolfram-linux-kernel-bugs) wrote :

Same problem here. Kernel 3.7-rc6. SSD. 4 GB RAM + 2 GB swap. When the system tries to use swap it becomes unresponsive (even the mouse cursor doesn't move smoothly).

00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
00:04.0 Signal processing controller: Intel Corporation Device 0153 (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 2 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM76 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
00:1f.6 Signal processing controller: Intel Corporation 7 Series/C210 Series Chipset Family Thermal Management Controller (rev 04)
02:00.0 Network controller: Intel Corporation Centrino Advanced-N 6235 (rev 24)

Revision history for this message
In , wolfram (wolfram-linux-kernel-bugs) wrote :

Additionally, I'm using the deadline I/O scheduler. So, maybe reopen?

Revision history for this message
In , Hedayat (hedayat-redhat-bugs) wrote :

I experienced a similar issue as comment #12. However, I witnessed the following:
1. A normal copy with Nautilus was in progress. The speed was around 2.9 MB/s, and in iotop I saw this: in one update, a write operation of around 3 MB/s happened; in the next 2 updates no read/write operation happened. This pattern repeated in iotop.

2. I did the trick I mentioned above to force Linux to flush its caches. I saw a constant write operation in iotop of 3 to 6 MB/s. Nautilus stalled for a while until the buffers were flushed. Unfortunately, I didn't see the final speed shown by Nautilus, but considering the iotop results, the speed should be better than in the normal case (2.9 MB/s).

3. I did another copy of another directory, but it was even slower than 2.9 MB/s. I undid my changes to the kernel parameters and Nautilus suddenly finished the copying (which had actually only been copied into memory). I don't know whether the actual write was faster or slower.

Unfortunately, I forgot to test with one big file rather than a number of small files. But considering what happened in number 2 above, I'd assume that Linux still needs to manage its in-RAM buffer better for slow USB devices.

Revision history for this message
Matthew (ruinairas1992) wrote :

@The people stating this is a falsely reported bug:

I really don't understand how someone can "not" notice the slowness of USB transfers... I wonder if anyone making these types of comments has used Windows in the last decade... I'm not trying to sound mean, but this is like talking to people with no common sense.

To reproduce this bug:

*slap in a USB flash drive/SD card
*transfer a 600 MB (or larger) file (maybe the Ubuntu image?)
*take note of how long it took to transfer it
*do this over and over with every type of format until you are blue in the face; it won't make a difference

Repeat this on a Mac or a Windows machine... notice the drastic difference in speed... But yet it's the USB flash drive's fault it's so slow *facepalm*

I don't know why this happens or how to fix this issue, but this is indeed a bug somewhere...

Revision history for this message
elatllat (elatllat) wrote :

@Matthew
I appreciate your frustration but try to understand the people who can fix this problem are unable to reproduce it, and the people who have the problem are unable to so much as report on it properly.

If you truly want this fixed please take the time to learn how to report a bug properly.
At 95 comments clearly there is A problem but without a stack trace, log, or conclusive benchmarking it's unlikely to be addressed.

Revision history for this message
lowsky (jpc1208) wrote :

@elatllat

The developers don't have 600 MB of files and a common USB drive on hand to test transfer speed? The issue causes programs like Banshee and Rhythmbox to force close, because the system believes there is an issue if you try to transfer an entire library of music or video to a media device. So it isn't just messing with USB drives acting as part of the file system, but also affecting devices using MTP.

It's easily reproduced but impossible to test and log for the normal user. FAT32 suffers the most significantly, but the results are reproducible with ext2 and NTFS. It's hard to benchmark for many users. They don't have the time to sit and watch 1 GB of data not do anything for 6 hours. They need their PC to be usable.

Even when this bug was first reported it was pretty common for people to have large files.
The issue is clearly there, has been ignored until recently, and is only now being looked at because of Ubuntu's move from using 700 MB CDs to using USB drives as an installation medium.

Revision history for this message
Rüdiger Kupper (ruediger.kupper) wrote :

I have seen this problem since precise, and it clearly is not fixed in the latest quantal.
Please note that whatever caused this problem, it appeared in precise. I never had USB speed issues in oneiric.

Today, I created USB installation media, using Ubuntu's "Startup disk creator", from (a) the latest quantal release, (b) precise release, (c) oneiric release.
Results:
(a) and (b): Live image takes up to 10 minutes just to boot up, then system is so slow it is unusable.
(c): Live image boots up in approx. 2 minutes and is usable.

Can anyone confirm that this problem was not present in oneiric, but appeared in precise?

Revision history for this message
In , Hedayat (hedayat-redhat-bugs) wrote :

Well, I just discovered something about the new behavior. I found that sometimes, when I insert a flash disk, it is registered in a USB 1 bus rather than a USB 2 bus, and this is why it is slow. If I re-insert the disk (even in the same port), it might be recognized as a USB 2 device and so it'll be much faster.

Therefore, I think the original bug is already solved. I just wonder why sometimes the disks are registered as USB 1?!
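
A quick way to check which bus and speed a stick ended up on is the tree view of lsusb; a high-speed USB 2.0 device should show 480M, while 12M or 1.5M means it was enumerated at USB 1.x speed:

lsusb -t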

Thanks

Revision history for this message
In , csredrat (csredrat-linux-kernel-bugs) wrote :

Progress has been made toward the goal of eliminating the timer tick while running in user space. The patches merged for 3.9 fix up the CPU time accounting code, printk() subsystem, and irq_work code to function without timer interrupts; further work can be expected in future development cycles.

A relatively simple scheduler patch fixes the "bouncing cow problem," wherein, on a system with more processors than running processes, those processes can wander across the processors, yielding poor cache behavior. For a "worst-case" tbench benchmark run, the result is a 15x improvement in performance.

The format of tracing events has been changed to remove some unused padding. This change created problems when it was first attempted in 2011, but it seems that the relevant user-space programs have since been fixed (by moving them to the libtraceevent library). It is worth trying again; smaller events require less bandwidth as they are communicated to user space. Anybody who observes any remaining problems would do well to report them during the 3.9 development cycle.

https://lwn.net/Articles/539179/

Revision history for this message
Andrey Dj (djdron) wrote :

Seen this problem for years.
Ubuntu/Linux FAIL.

Revision history for this message
Antony Jones (wrh) wrote :

This is the single most infuriating bug that I have ever seen in Ubuntu and I've been struggling with it for about 3 years now (before that there was no problem, so anybody saying large files weren't invented back then, is talking crap).

How are we supposed to send log files and output for something which doesn't create either? This bug is simple to reproduce. Copy a 3gb file to a USB stick which uses FAT32 - any HD movie is a good candidate.

If there is a developer who is able or willing to fix this I will happily send them a USB stick and a file for them to reproduce the problem to their hearts content, if only this stupid problem would be fixed.

Revision history for this message
Ming Lei (tom-leiming) wrote : Re: [Bug 500069] Re: USB file transfer causes system freezes; ops take hours instead of minutes

On Fri, Mar 29, 2013 at 5:06 AM, Andrey Dj <email address hidden> wrote:
>
> to test, i issued dd command:
> dd if=/dev/zero of=/media/usb-disk/test-file bs=32

The above dd test is a very poor one, since setting bs to 32 will make the USB storage transfer very, very slowly; the typical value (also the maximum value) used in the kernel is 120K.

Thanks,
--
Ming Lei
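
For comparison, a dd invocation that gives a more meaningful number for a USB stick (same placeholder path as the quoted command; the size is only an example, and conv=fsync makes dd flush to the device before reporting the rate):

dd if=/dev/zero of=/media/usb-disk/test-file bs=1M count=512 conv=fsync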

Revision history for this message
In , zvova7890 (zvova7890-linux-kernel-bugs) wrote :

On 3.9-rc5 I get pretty good results copying from a USB flash drive to the HDD: high speed (30+ MB/s) and 2-20% iowait, very good. HDD -> flash gives 50-60% iowait, but no performance problems. The copying speed is 12-16 MB/s (which is the maximum for my USB flash drive).

dd if=/dev/zero of=~/ololo bs=1M count=1024 produces high iowait and some performance problems (3D apps have small freezes, the fps becomes unstable, and the DE slows down; i.e., tasks are starving?).

One very important problem is swap. Even with a small amount of swap in use the system has trouble running smoothly. When the kernel starts using swap very heavily, the freezes stretch into minutes; everything freezes. It feels like the memory manager works with swap in blocking mode %) Is multitasking locked while pages are moved between RAM and swap? Any ideas why that happens?

Oh, I forgot: my SATA controller is the MCP67 (the buggiest chip?).

Revision history for this message
garzie2000 (garzie2000) wrote :

I can't believe this bug still exists in Ubuntu 12.10. It's annoying as hell. Please developers work your magic and fix this once and for all!

Revision history for this message
alexmex90 (alexmex90) wrote :

@garzie2000

Do you really expect people to do magic and fix a bug without proper logging?
This seems to work different for all people, in my case USB transfers are fine, but seems like the people affected cannot submit a bug properly for the people who can fix it, and the people who can fix it, cannot reproduce the bug because their USB transfers are fine. Everybody claims that only copying data from the hard drive to the USB is sufficient to reproduce the bug, but that's not the case, a benchmarking from an affected system is necessary to do a fair comparison and try to find the cause of the issue. Maybe this link can be useful for that goal: http://askubuntu.com/questions/11277/usb-drive-speed-testing-app-with-test-options

Revision history for this message
gonssal (gonssal) wrote :

This 'bug' is fixed by running "modprobe ehci_hcd" (as root) on my 13.04. I have been having the issue since maybe 8.04; only now did I _need_ to fix it.

So maybe load that module by default and mark this as fixed?
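
If loading ehci_hcd really is the fix on a given machine, it can be made to load at every boot by adding it to /etc/modules (Debian/Ubuntu convention; on most installs the module is built in or auto-loaded already, so treat this as a workaround to test rather than a general recommendation):

echo ehci_hcd | sudo tee -a /etc/modules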

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

All who believe that this problem has been fixed, please open this link in Google Chrome: http://ec2-54-229-117-209.eu-west-1.compute.amazonaws.com/party.html

Revision history for this message
In , datacompboy (datacompboy-linux-kernel-bugs) wrote :

Mikhail, that is not related to this bug; there is no large IO on that page, only canvas allocation, which eats up RAM:
  function partyHard( drunkenness ) {

    var mapCanvas = [];
    var mapCanvasCtx = [];
    for (var i = 0; i < drunkenness * 1200; i++) {
        mapCanvas[i] = document.createElement('canvas');
        mapCanvas[i].width = 2500;
        mapCanvas[i].height = 2500;
        mapCanvasCtx[i] = mapCanvas[i].getContext('2d');
        mapCanvasCtx[i].fillStyle = 'rgb(0, 0, 0)';
        mapCanvasCtx[i].fillRect( 0, 0, 1700, 1700 );
    }
    console.log(window);
  }

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

In this example, the large I/O is the result of swapping. Try increasing the size of the swap to 64 GB and repeating the experiment. On my system with 16 GB of RAM and no swap there are no freezes. If you increase the swap to 64 GB, the system is 100% certain to die. :(
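
For anyone who wants to repeat that experiment, a throwaway 64 GB swap file can be created roughly like this (paths and size are examples; remove it again afterwards with swapoff and rm):

sudo dd if=/dev/zero of=/swapfile bs=1M count=65536
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile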

Revision history for this message
In , zvova7890 (zvova7890-linux-kernel-bugs) wrote :

Swap in Linux is something fantastic. It feels like the scheduler is locked while a RAM page is being written to swap. We would expect lag in the program involved, but the lag is global! That's awesome! :)

Revision history for this message
Tsu Jan (tsujan2000) wrote :

Has anyone tried to disable THP (transparent huge pages) with the following command (as root) before using a USB stick?

echo "never" > "/sys/kernel/mm/transparent_hugepage/enabled"

This can be easily undone with:

echo "always" > "/sys/kernel/mm/transparent_hugepage/enabled"

Or just with a reboot.
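
The current setting can be checked before and after; the active mode is shown in brackets:

cat /sys/kernel/mm/transparent_hugepage/enabled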

Revision history for this message
Daniel Barrett (dbarrett-m) wrote :

gonssal (#104): On 13.04 (live CD), I ran "sudo modprobe ehci_hcd" and copied files from an internal SSD to an external USB3 drive. I get super-slow 1 MB/second transfer rates. I boot the same computer into Windows 7 (it's dual boot) and I get 150MB/sec.

I get the same problem if I use eSATA or Firewire (the external drive has all three connections): slow on 13.04, fast on Windows 7.

If I boot on a Knoppix 7.2 CD, however... FAST FAST FAST transfer speed. Knoppix is 3.9 kernel, while 13.04 is 3.8.

I also tried the Ubuntu nightly build of 13.10 today (kernel 3.11), and it had the same slowness problem as 13.04.

I also tried #106 (echo "never" > "/sys/kernel/mm/transparent_hugepage/enabled") and it made no difference.

My vendor blames the ASmedia Chipset on my motherboard, which he says Linux has "rudimentary at best" support for.

Revision history for this message
Tsu Jan (tsujan2000) wrote :

@Daniel Barrett

I have Debian with Liquorix kernel 3.10.X and this always works for me before inserting usb stick:

sudo bash -c 'echo "never" > /sys/kernel/mm/transparent_hugepage/enabled'

I just came to this page and was surprised that there was no trace of "THP" or "hugepages", so I added a comment.

See https://www.kernel.org/doc/Documentation/vm/transhuge.txt for explanation.

Revision history for this message
Daniel Barrett (dbarrett-m) wrote :

Referencing my previous post (#107): never mind. Something else is wrong with my machine. ALL disk writes are 1MB/second, even on my internal SSD RAID. So it's not just external drives.

Revision history for this message
penalvch (penalvch) wrote :

sbec67, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.11

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Revision history for this message
In , 3draven (3draven-linux-kernel-bugs) wrote :

On my i7 + 8 GB RAM + 750 GB SATA HDD: if the HDD swap is in use -> the system freezes and lags -> even the mouse lags!

Kernel 3.10 (and many other versions).

To mitigate it I use zram swap + HDD swap. The lags are reduced, but did not go away.
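
Roughly, a zram swap like the one mentioned above is set up along these lines (a sketch only, as root; the 2 GiB size is an example and the sysfs details vary a little between kernel versions):

modprobe zram
echo $((2 * 1024 * 1024 * 1024)) > /sys/block/zram0/disksize   # 2 GiB compressed swap device
mkswap /dev/zram0
swapon -p 100 /dev/zram0   # higher priority than the HDD swap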

Revision history for this message
In , leho (leho-linux-kernel-bugs) wrote :

(In reply to 3draven from comment #628)
> On my i7 + 8 GB RAM + 750 GB SATA HDD: if the HDD swap is in use -> the
> system freezes and lags -> even the mouse lags!
>
> Kernel 3.10 (and many other versions).
>
> To mitigate it I use zram swap + HDD swap. The lags are reduced, but did not go away.

Yep, experiencing the same, currently on 3.10.15. Getting memory usage to the point of swapping on Linux is craziness for the user. It means 8 GB of RAM is the minimum for any above-average workload.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

Nobody cares about the problems with swap I/O :(

Revision history for this message
Henry Mata (matahr) wrote :

still persists in trusty tahr
Linux 3.12.0-4-generic x86_64 GNU/Linux

Revision history for this message
penalvch (penalvch) wrote :

Henry Mata, so your hardware may be tracked, could you please file a new report via a terminal:
ubuntu-bug linux

Revision history for this message
In , loki (loki-linux-kernel-bugs) wrote :

Heya,

After some years I have resolved MY problem with the responsiveness of the computer under HIGH I/O. To everyone else I can only say: try it. If it works for you, be happy; if not...

For starters I wouldn't call this a bug. It's a DEFECT. If I have 20 servers with different Linux flavours and distributions, many of them compiled from scratch, and 200 Ubuntu desktops, and they all behave the same way when I use the command dd if=/dev/zero of=test.img bs=1M count=xxx (above 1 GB file size) - meaning this command grinds the system to a halt - and the problem has stayed around for so many years and so many kernels, then for me it is a DEFECT.

For the past week I've been trying the BFQ patch for kernel 3.9 on several machines. On one machine I have been testing heavily. I have had this machine for some years now, a Core i7 with 12 GB and 6 HDs in RAID 10. On it I also had the problem; it was somewhat better with the BFS patch, but it was still happening.
With the BFQ patch it's working perfectly. At one point I had two dd's running (dd if=/dev/zero of=test.img bs=1m count=100k), was creating a 60 GB VirtualBox VDI, opening 10 ODS documents, watching YouTube, watching an HD movie in VLC, and doing some other stuff, and the desktop / system was as responsive as if nothing was using it. Just like I remember Linux being some years ago. And now I have a sustained HD throughput of 470 MB/s without my computer going to /dev/null.

So BFQ solved this problem for me. Maybe it's not stable yet, but for me it's more stable than using CFS !!!

Just my two cents. And this bug is closed for me, but only NOW !!!
For all others out there I wish you luck
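
For anyone finding this thread today: on mainline kernels 4.12 and later, BFQ is available without out-of-tree patches, so the same experiment can be tried with a couple of commands. A hedged sketch (the device name is illustrative, and bfq only appears on devices using blk-mq):

$ cat /sys/block/sda/queue/scheduler          # the entry in [brackets] is the active scheduler
$ sudo modprobe bfq                            # load the scheduler module if it is not built in
$ echo bfq | sudo tee /sys/block/sda/queue/scheduler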

Revision history for this message
In , leho (leho-linux-kernel-bugs) wrote :

Just general FYI, BFQ just freshly did a new release where they claim another batch of significant improvements for whatever they're doing.

Revision history for this message
In , funtoos (funtoos-linux-kernel-bugs) wrote :

> So BFQ solved this problem for me. Maybe it's not stable yet,
> but for me it's more stable than using CFS !!!

BFQ and CFS are not comparable. Maybe you meant BFS?

Revision history for this message
In , loki (loki-linux-kernel-bugs) wrote :

(In reply to devsk from comment #633)
> > So BFQ solved this problem for me. Maybe it's not stable yet,
> > but for me it's more stable than using CFS !!!
>
> BFQ and CFS are not comparable. Maybe you meant BFS?

Sorry, my mistake. I meant CFQ. But on the other hand, BFS too: BFS did give me some improvements (I could listen to music while creating a big file), but that was all. So CFQ without BFS was a no-go, CFQ with BFS helped a little, but BFQ alone solved the problems I had had for the past 4-5 years, during which I had to bend and improvise to create a 60 GB VDI and hope that my computer stayed alive until it finished the job, and mind you, on a computer that has resources in abundance. :)

Revision history for this message
In , vitaly.v.ch (vitaly.v.ch-linux-kernel-bugs) wrote :

I have successfully reproduced this bug on my HP Z200 under Ubuntu 12.04 LTS. After some investigation I found out that the main reason for this bug is a very ugly bottleneck in the block device layer: the cores of my Z200 spend almost all of their time spinning on a spinlock while IRQs are disabled on ALL cores.

Revision history for this message
AO (aofrl10n) wrote : apport information

ApportVersion: 2.14.1-0ubuntu3
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: ao 2970 F.... pulseaudio
 /dev/snd/controlC0: ao 2970 F.... pulseaudio
CurrentDesktop: Unity
DistroRelease: Ubuntu 14.04
InstallationDate: Installed on 2014-04-06 (10 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Daily amd64 (20140404)
MachineType: System manufacturer System Product Name
Package: linux (not installed)
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=b49aac97-cebf-48a0-b5b2-db2b0c148a81 ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 3.13.0-24.46-generic 3.13.9
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-24-generic N/A
 linux-backports-modules-3.13.0-24-generic N/A
 linux-firmware 1.127
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
Tags: trusty
Uname: Linux 3.13.0-24-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 08/09/2012
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1501
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: M4A88TD-M/USB3
dmi.board.vendor: ASUSTeK Computer INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1501:bd08/09/2012:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKComputerINC.:rnM4A88TD-M/USB3:rvrRevX.0x:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

tags: added: trusty
Revision history for this message
AO (aofrl10n) wrote : AlsaInfo.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : BootDmesg.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : CRDA.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : CurrentDmesg.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : IwConfig.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : Lspci.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : Lsusb.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : ProcEnviron.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : ProcInterrupts.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : ProcModules.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : PulseList.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : UdevDb.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : UdevLog.txt

apport information

Revision history for this message
AO (aofrl10n) wrote : WifiSyslog.txt

apport information

Revision history for this message
penalvch (penalvch) wrote :

Ubuntu-QC-1, thank you for your comment. So your hardware and problem may be tracked, could you please file a new report with Ubuntu by executing the following in a terminal while booted into a Ubuntu repository kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

tags: added: bot-stop-nagging
removed: apport-collected bot-stop-nagging. slow trusty usb
Revision history for this message
Sifr Moja (simplexion) wrote :

I have a brand new motherboard and the issue still remains. I guess I will have to reinstall Ubuntu. I haven't had to reinstall it in about 5 years. So annoying.

Revision history for this message
penalvch (penalvch) wrote :

Simplexion, thank you for your comment. So your hardware and problem may be tracked, could you please file a new report with Ubuntu by executing the following in a terminal while booted into a Ubuntu repository kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

Revision history for this message
In , horst-bugme-osdl (horst-bugme-osdl-linux-kernel-bugs) wrote :

I'm still seeing this.

Setup: Debian 7 Wheezy, amd64 backports kernel (3.11-0.bpo.2-amd64), ~45MB/s write of a low number of large files by rsync (fed through a GBit ethernet link) on an ext3 FS (rw,noatime,data=ordered) in a LVM2 partition on a hardware RAID5.

Observation: The machine (32-core Xeon E5-4650, 192 GB RAM), primarily servicing multiple interactive users via SSH, x2go and SunRay sessions, gets completely unusable during and quite some time after the rsync transfer. TCP connections to SunRay clients time out, IRC connections are dropped, even simple tools like "htop" don't do anything but clear the screen after being started. "iotop" shows a [jbd2/dm-1-8] process on top, reportedly doing "99.99%" I/O (but not reading or writing a single byte, maybe because it's a kernel thread?).

Once I switch from the default CFQ I/O scheduler to "deadline" (echo deadline > /sys/block/sdb/queue/scheduler), the symptoms disappear completely.
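
For anyone following this workaround: the sysfs write above does not survive a reboot. A hedged sketch of a udev rule that re-applies it automatically (the file name and the sd[a-z] match are illustrative; adjust them to your devices):

# /etc/udev/rules.d/60-iosched.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="deadline"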

Revision history for this message
Ken Sharp (kennybobs) wrote :

Is this not fixed now? It was fixed upstream a while ago.

Revision history for this message
Damir Butmir (d4m1r2) wrote :

Still not fixed under Ubuntu 12.04 LTS x64 at least, even with the latest kernel....

Linux damir-macbook 3.13.0-45-generic #74~precise1-Ubuntu SMP Thu Jan 15 20:21:55 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Writing to a 16GB USB 2.0 stick (NTFS) goes at ~17MB/s while under Windows (same amount, machine, stick, everything) goes at 30+MB/s....

Revision history for this message
penalvch (penalvch) wrote :

Damir Butmir, it would help immensely if you filed a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.

Revision history for this message
Damir Butmir (d4m1r2) wrote :
Revision history for this message
In , yanp.bugz (yanp.bugz-linux-kernel-bugs) wrote :

Still facing this bug. Kernel 3.16.

Is it possible to reserve 5% of the I/O for user/other processes' needs? Any fast download or copy eats 99.99% of the I/O and the system becomes hard to use.
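
There is no knob that reserves a share of I/O for interactive processes, but one commonly suggested mitigation (only effective with schedulers that honour I/O priorities, such as CFQ) is to run the bulk transfer in the idle I/O class; a hedged example with illustrative paths and PID:

$ ionice -c 3 cp /path/to/big-file /media/usb-disk/    # start the copy in the "idle" I/O class
$ ionice -c 3 -p 12345                                 # or demote an already running copy by its PID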

Revision history for this message
shantanu saha (shantanucse18-gmail) wrote :

This bug still exists in the latest version. It's not only copies from USB to HDD or vice versa; it occurs for any kind of large file copy.

System:
Ubuntu 15.10 64bit
Corei7
8GB RAM

Revision history for this message
penalvch (penalvch) wrote :

shantanu saha, it will help immensely if you filed a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.

Revision history for this message
In , aros (aros-linux-kernel-bugs) wrote :

I'm curious: this bug was ostensibly fixed years ago, yet I dare everyone who owns an Android smartphone to run a simple test. Invoke any terminal emulator and execute this command:

$ cat < /dev/zero > /sdcard/EMPTY

What's terribly unpleasant is that _all_ CPU cores become busy (more than 75% load), and the CPU jumps into its highest performance state, i.e. frequency, i.e. power consumption. Obviously this is wrong, bad, and shouldn't happen. This test is somewhat artificial, as no Android app can create such a high IO load, but there are multiple phones out there with either 5GHz MIMO 802.11n or 802.11ac chips which allow up to 80MB/sec throughput, which can easily saturate most if not all internal MMC cards and have the same effect as the above command.

Perhaps vanilla kernel bugzilla is not a place to discuss bugs in Android, but latest Android releases usually feature kernels 3.10.x and 4.1.x without that many patches, so this bug is still there. Both these kernels are currently maintained and supported. Android by default never uses SWAP (one of the reasons for this bug).

Go figure.

P.S. Sample apps from Google Play:

* CPU Stats by takke
* Terminal Emulator by Jack Palevich

Revision history for this message
In , eugene.seppel (eugene.seppel-linux-kernel-bugs) wrote :

I've just experienced this issue with 3.19.0-32-generic on Ubuntu.
My KTorrent was downloading files to an NTFS filesystem on a SATA3 drive (FUSE, download speed about 100 Mbit/s); simultaneously I copied files from that filesystem to a USB 3.0 flash drive, also NTFS. That resulted in poor interactive performance, with mouse and window lags. The workaround was to suspend the torrent download until the files were copied.

Hardware: One AMD FX(tm)-8320 Eight-Core Processor, 8 GB RAM.

Revision history for this message
In , gooberslot (gooberslot-linux-kernel-bugs) wrote :

This bug is definitely not fixed. A simple cp from one drive to another makes a huge impact on my desktop. Trying to do an rsync is even worse. It seems to mainly be a problem with large files. My system is old (Athlon II 250) but even an old P3 running Win98 doesn't lag this bad from just copying files.

Revision history for this message
In , bes1002t (bes1002t-linux-kernel-bugs) wrote :

I'm trying to copy 50 GB from one tower to another via USB 3.0 and it is really no fun. If I copy all the files at once, the speed decreases constantly; after 30 minutes it copies at 1.0 MB/s. If I copy a bunch of directories at a time it is a little bit better, but the speed still decreases. For 2 GB my Linux system needs more than an hour. This bug is definitely not fixed. On Windows this USB stick works without that speed loss.

OS: Fedora 24
Kernel: 4.8.15

Revision history for this message
In , bes1002t (bes1002t-linux-kernel-bugs) wrote :

I've noticed that this does not happen every time, even with exactly the same USB stick. For my 2 GB of files (Eclipse with a workspace and a project) I needed one hour to copy. It started at 60 MB/s and decreased to 500 KB/s. Now I am copying 16 GB (Android Studio and some other projects) and it only needs about 15 minutes. The copy speed started at 70 MB/s and at the end it was 22 MB/s. So it also decreases, but not as quickly as in my 2 GB copy.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

It seems kernel developers don't look at this topic here; it would be much better to write to the mailing list.

Revision history for this message
In , oleksandr (oleksandr-linux-kernel-bugs) wrote :

Does Jens' buffered writeback throttling patchset solve your issue?

Revision history for this message
In , aros (aros-linux-kernel-bugs) wrote :

(In reply to bes1002t from comment #642)
> I'm trying to copy 50 GB from one tower to another via USB 3.0 and it is
> really no fun. If I copy all the files at once, the speed decreases
> constantly; after 30 minutes it copies at 1.0 MB/s. If I copy a bunch of
> directories at a time it is a little bit better, but the speed still
> decreases. For 2 GB my Linux system needs more than an hour. This bug is
> definitely not fixed. On Windows this USB stick works without that speed loss.
>
> OS: Fedora 24
> Kernel: 4.8.15

This bug report has nothing to do with the speed of copying data to a USB flash drive. It's about substantially degraded interactivity, which manifests itself as slowness, and it's hard to believe you can perceive it via an SSH session.

I'm inclined to believe your bug is related to other subsystems like USB.

> It seems kernel developers don't look at this topic here; it would be much
> better to write to the mailing list.

The kernel bugzilla has always been neglected: thousands of bug reports have zero comments from prospective developers. LKML is hit and miss too. Your developer skipped your e-mail because he/she was busy? Bad luck.

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

@bes1002t: I think throughput is a different issue than this, although it might well be related.

But the most important thing would be for someone to create an I/O concurrency / latency benchmark. Maybe the Phoronix Test Suite is an adequate tool for that? It can also be used for automatic bisecting..

I clearly remember pre-2.6.18 times when I had a much inferior machine, and while Gentoo's emerge was compiling stuff in the background with multiple threads, I could browse the web, switch between programs and play an HD stream without any hiccups or stalling.

Revision history for this message
In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

@bes1002t: Copying to a USB device always starts at the speed of the hard drive, as everything is cached until the write cache is full, and ends at the speed of the USB drive. The write process then has to wait until all the data is written.

@Artem S. Tashkinov: The stall problems over an SSH session exist, or at least existed. I migrated an old server with CentOS 6 and copied some VM images; the SSH responsiveness was very bad, and I had to wait up to 20 seconds for tab completion.

In many cases it was a swap problem: the buffers are full, and the caches need a long time to be written to the slow USB device, so the server starts to swap out process data. It's only a very small amount of data. I could increase the overall desktop performance with a RAM upgrade.
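
That caching behaviour is easy to observe while a copy to a slow device runs; an illustrative (not from the original reports) way to watch the dirty and writeback page counters grow and then drain:

$ watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'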

Revision history for this message
In , iam (iam-linux-kernel-bugs) wrote :

Try Kernel 4.10.

>Improved writeback management
>
>Since the dawn of time, the way Linux synchronizes to disk the data written to
>memory by processes (aka. background writeback) has sucked. When Linux writes
>all that data in the background, it should have little impact on foreground
>activity. That's the definition of background activity... But for as long as it
>can be remembered, heavy buffered writers have not behaved like that. For
>instance, if you do something like $ dd if=/dev/zero of=foo bs=1M count=10k,
>or try to copy files to USB storage, and then try and start a browser or any
>other large app, it basically won't start before the buffered writeback is
>done, and your desktop, or command shell, feels unresponsive. These problems
>happen because heavy writes -the kind of write activity caused by the
>background writeback- fill up the block layer, and other IO requests have to
>wait a lot to be attended (for more details, see the LWN article).
>
>This release adds a mechanism that throttles back buffered writeback, which
>makes it more difficult for heavy writers to monopolize the IO requests queue,
>and thus provides a smoother experience in Linux desktops and shells than what
>people were used to. The algorithm for when to throttle can monitor the
>latencies of requests, and shrinks or grows the request queue depth
>accordingly, which means that it's auto-tunable, and generally, a user would
>not have to touch the settings. This feature needs to be enabled explicitly in
>the configuration (and, as it should be expected, there can be regressions)
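
If you want to check whether this throttling is actually active on a given device, the sysfs attribute below should exist and be non-zero on kernels built with the writeback throttling options discussed further down (a hedged check; the device name is illustrative):

$ cat /sys/block/sda/queue/wbt_lat_usec    # 0 means writeback throttling is disabled for this device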

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

> Try Kernel 4.10.
It does not help with my workload :(
The mouse pointer and keyboard input still freeze.

Revision history for this message
In , iam (iam-linux-kernel-bugs) wrote :

Make sure your kernel has that option enabled.

>This feature needs to be enabled explicitly in
>the configuration (and, as it should be expected, there can be regressions)

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

I read the https://kernelnewbies.org/Linux_4.10 and https://kernelnewbies.org/Linux_4.10 articles, but I did not see the name of this option.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

Created attachment 255491
$ cat /boot/config-`uname -r`

Revision history for this message
In , aros (aros-linux-kernel-bugs) wrote :

(In reply to Mikhail from comment #651)

First, I'd recommend trying to disable SWAP completely - it might help:

$ sudo swapoff -a

If you compile your own kernel or your distro hasn't enabled them for you, here's the list of the options you need to enable:

BLK_WBT, enable support for block device writeback throttling
BLK_WBT_MQ, multiqueue writeback throttling
BLK_WBT_SQ, single queue writeback throttling

They are all under "Enable the block layer".
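
A quick way to check whether a distribution kernel already ships with these options enabled (assuming the config is installed under /boot, as most distributions do):

$ grep CONFIG_BLK_WBT /boot/config-$(uname -r)
$ zgrep CONFIG_BLK_WBT /proc/config.gz    # alternative, if the kernel exposes its own config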

If disabling swap and enabling these options have no effect, please ***create a new bug report*** and provide the following information:

CPU
Motherboard and BIOS version
RAM type and volume
Storage and its type
Kernel version and its .config

And also the complete output of these utilities:

dmesg
lspci -vvv
lshw
free
vmstat (when the bug is exposed)

cat /proc/interrupts
cat /proc/iomem
cat /proc/meminfo
cat /proc/mtrr

Revision history for this message
In , iam (iam-linux-kernel-bugs) wrote :

>CONFIG_BLK_WBT=y
># CONFIG_BLK_WBT_SQ is not set
>CONFIG_BLK_WBT_MQ=y

So writeback throttling is enabled only for multi queue devices in your case. I suppose you need to use blk-mq for your sd* devices to activate writeback throttling (scsi_mod.use_blk_mq=1 boot flag) or to recompile kernel with CONFIG_BLK_WBT_SQ enabled.
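
A hedged sketch of how that boot flag is typically made persistent on a GRUB-based system (paths and the regeneration command vary by distribution; on Fedora it is grub2-mkconfig -o /boot/grub2/grub.cfg): append scsi_mod.use_blk_mq=1 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:

$ sudo update-grub     # Debian/Ubuntu helper that regenerates grub.cfg
$ cat /proc/cmdline    # after a reboot, confirm the flag is present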

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

Created attachment 255501
all required files in one archive

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-linux-kernel-bugs) wrote :

After setting the boot flag "scsi_mod.use_blk_mq=1", the freezes became much shorter. I'm no longer sure that they are at the kernel level. It looks more like the window manager (GNOME Mutter) is written in such a way that the mouse freezes while the list of applications is loading. To finally defeat the freezes, it seems the window manager needs to be kept from being paged out to swap.

I'm also catch vmstat output when freeze occurred:
# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 2 6 15947052 205136 112592 4087608 32 41 93 119 7 23 43 19 37 1 0

Revision history for this message
In , aros (aros-linux-kernel-bugs) wrote :

Twice I asked you to try disabling SWAP altogether and you still haven't.

I'm unsubscribing from this bug report.

Dimitrenko (paviliong6)
Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Changed in linux (Fedora):
importance: Unknown → High
status: Unknown → Won't Fix
Revision history for this message
In , zvova7890 (zvova7890-linux-kernel-bugs) wrote :

Created attachment 274511
Per-device dirty ratio configuration support

Per device dirty bytes configuration

Revision history for this message
In , zvova7890 (zvova7890-linux-kernel-bugs) wrote :

Per-device dirty bytes configuration. The patch is not ideal; I made it for smooth flash drive writing, by passing a smaller dirty_bytes value per removable device.

>> Path
# ls /sys/block/sdc/bdi/
dirty_background_bytes dirty_background_ratio dirty_bytes dirty_ratio max_ratio min_pages_to_flush min_ratio power read_ahead_kb stable_pages_required subsystem uevent

>> udev rule for removable devices
# cat /etc/udev/rules.d/90-dirty-flash.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{removable}=="1", ATTR{bdi/dirty_bytes}="4194304"
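
To apply such a rule without unplugging and replugging the device, the rules can be reloaded and re-triggered with the standard udev tooling:

# udevadm control --reload-rules
# udevadm trigger --subsystem-match=block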

Revision history for this message
In , chriswy27 (chriswy27-linux-kernel-bugs) wrote :

Was this bug actually fixed? The status shows CLOSED CODE_FIX with a last modified date of Dec 5 2018. I don't see any updates as to what was corrected, and what version the fix will be put into?

Revision history for this message
Chris (chriswy27) wrote :

Was this bug actually fixed? The status shows Fix Released for Ubuntu with a last modified date of July 4 2017 by Dimitrenko (paviliong6). I don't see any updates as to what was corrected, and what version the fix will be put into?

Revision history for this message
In , gatekeeper.mail (gatekeeper.mail-linux-kernel-bugs) wrote :

Created attachment 282477
attachment-6179-0.html

This was never fixed, and since the bug state was gamed with no commit info ever provided even when asked directly, it will never be fixed. Nobody cares, and I guess nobody even figured out who broke the kernel, by which changeset, and when. Just buy another couple of Xeons for your super-duper web-surfing desktop and pray it's enough to cover the waits when you format your diskette. Another approach is to buy enough RAM to hold your whole set of block devices, so write-outs are quick enough and you won't see the lags. That is the complete workaround list they have provided since the bug was opened.

Revision history for this message
In , vi0oss (vi0oss-linux-kernel-bugs) wrote :

As far as I understand, this is a kind of meta-bug: there are multiple causes and multiple fixes.

"I do bulk IO and it gets slow" sounds rather general, and is a problem that can resurface at any time due to some new underlying issue. So the problem cannot really be "closed for good", no matter how much technical progress is made.

For me, 12309 basically stopped happening unless I deliberately tune the "/proc/sys/vm/dirty_*" values to non-typical ranges and forget to revert them. I see the system controllably slowing down processes doing bulk IO so that the system in general stays reasonable. This behaviour is one of the outcomes of this bug.

I don't expect meaningful technical discussion to happen in this thread. It should just serve as a hub for linking to specific new issues.
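
For reference, the knobs referred to above can be inspected in one line and compared against your distribution's defaults; an illustrative check:

$ sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_bytes vm.dirty_background_bytes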

Revision history for this message
In , powerman-asdf (powerman-asdf-linux-kernel-bugs) wrote :

Sure, it's a meta-bug, but for me 12309 is still present, and I don't use any tuning of the I/O subsystem at all.

Not as bad as years ago when it happened for the first time, but I still have to throttle rtorrent to download at 2.5 MB/s maximum instead of the usual 10 MB/s if I want to watch films in mplayer at the same time without jitter/freezes/lag. And that's on a powerful and modern enough system with kernel 4.19.27, CPU i7-2600K @ 4.5GHz, RAM 24GB, and HDD 3TB Western Digital Caviar Green WD30EZRX-00D. This is annoying, and I remember the time before 12309 when rtorrent without any throttling wouldn't make mplayer freeze, on less powerful hardware.

Revision history for this message
In , gatekeeper.mail (gatekeeper.mail-linux-kernel-bugs) wrote :

Created attachment 282483
attachment-22369-0.html

Well, I've tried to report a new bug to investigate my own case of "my CPU does nothing because waiting is too hard for it". It was of no interest to any kernel dev. So, just as Linus once said "f**k you, Nvidia", the very same goes back to Linux itself. It's a pity some devs think that making their software Linux-bound (via udev-only bindings or ALSA-only sound output) is a good idea (GNOME and even parts of KDE). They forget that 15 years ago they picketed Adobe for shipping Flash for Windows only. Now one has to use 12309-bound software for lack of a way to run one's software on another platform.

Revision history for this message
In , mpartap (mpartap-linux-kernel-bugs) wrote :

Can 'someone' please open a bounty for the creation of a VM test case, e.g. with `vagrant` or the `phoronix test suite`?
Basically, a way to reproduce and quantify the perceived/actual performance difference between
> Linux 2.6.17 Released 17 June, 2006
and
> Linux 5.0 Released Sun, 3 Mar 2019

(In reply to Alex Efros from comment #666)
> Not as bad as years ago […]
> And that's on a powerful and modern enough system with
> kernel 4.19.27, CPU i7-2600K @ 4.5GHz, RAM 24GB, and HDD 3TB […]
> This is annoying, and I remember the time before
> 12309 when rtorrent without any throttling wouldn't make mplayer freeze, on
> less powerful hardware.

Oh yeah, this... I can clearly remember back then when, on a then mid-range machine with a lot of compiling (Gentoo => 100% CPU 🤣) and filesystem work, VLC used to play an HD video stream even under heavy load without any hiccups or micro-stuttering.. It was impressive at the time.. and then.. it broke 🤨

Revision history for this message
Jeremy (son9ne-junk) wrote :

This bug still exists in 18.04.02. Over 20 minutes to transfer 2 GB is insane. USB and network transfers are killing productivity. Hopefully this will be fixed this decade...

Revision history for this message
In , vitaly.v.ch (vitaly.v.ch-linux-kernel-bugs) wrote :

Based on my attempts to fix this bug, I totally disagree with you.

This bug is caused purely by the design of the current block device layer. Methods which are good for developing code are absolutely improper for developing ideas; that is probably the key problem of the Linux community. Currently, there is a merged workaround for block devices with a good queue, such as the Samsung Pro NVMe.

WBR,

Vitaly

(In reply to _Vi from comment #665)
> As far as I understand, this is a kind of meta-bug: there are multiple causes
> and multiple fixes.
>
> "I do bulk IO and it gets slow" sounds rather general, and is a problem that
> can resurface at any time due to some new underlying issue. So the problem
> cannot really be "closed for good", no matter how much technical progress is
> made.
>
> For me, 12309 basically stopped happening unless I deliberately tune the
> "/proc/sys/vm/dirty_*" values to non-typical ranges and forget to revert
> them. I see the system controllably slowing down processes doing bulk IO so
> that the system in general stays reasonable. This behaviour is one of the
> outcomes of this bug.
>
> I don't expect meaningful technical discussion to happen in this thread. It
> should just serve as a hub for linking to specific new issues.

Revision history for this message
In , todorovic.s (todorovic.s-linux-kernel-bugs) wrote :

Had this again 20 minutes ago.
I was copying 8.7 GiB of data from one directory to another directory on the same filesystem (ext4 (rw,relatime,data=ordered)) on the same disk (Western Digital WDC WD30EZRX-00D8PB0 spinning metal disk).

The KDE UI became unresponsive (everything other than /home and user data is on an SSD) and I could not launch any new applications. Opening a new tab in Firefox to go to YouTube didn't load the page, and it kept saying "waiting for youtube.com" in the status bar (does the network get halted?).

dmesg shows these; are they important?

[25013.905943] INFO: task DOMCacheThread:17496 blocked for more than 120 seconds.
[25013.905945] Tainted: P OE 4.15.0-54-generic #58-Ubuntu
[25013.905947] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25013.905949] DOMCacheThread D 0 17496 2243 0x00000000
[25013.905951] Call Trace:
[25013.905954] __schedule+0x291/0x8a0
[25013.905957] schedule+0x2c/0x80
[25013.905959] jbd2_log_wait_commit+0xb0/0x120
[25013.905962] ? wait_woken+0x80/0x80
[25013.905965] __jbd2_journal_force_commit+0x61/0xb0
[25013.905967] jbd2_journal_force_commit+0x21/0x30
[25013.905970] ext4_force_commit+0x29/0x2d
[25013.905972] ext4_sync_file+0x14a/0x3b0
[25013.905975] vfs_fsync_range+0x51/0xb0
[25013.905977] do_fsync+0x3d/0x70
[25013.905980] SyS_fsync+0x10/0x20
[25013.905982] do_syscall_64+0x73/0x130
[25013.905985] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[25013.905987] RIP: 0033:0x7fc9cb839b07
[25013.905988] RSP: 002b:00007fc9a7aeb200 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[25013.905990] RAX: ffffffffffffffda RBX: 00000000000000a0 RCX: 00007fc9cb839b07
[25013.905992] RDX: 0000000000000000 RSI: 00007fc9a7aeaff0 RDI: 00000000000000a0
[25013.905993] RBP: 0000000000000000 R08: 0000000000000000 R09: 72732f656d6f682f
[25013.905994] R10: 0000000000000000 R11: 0000000000000293 R12: 00000000000001f6
[25013.905995] R13: 00007fc97fc5d038 R14: 00007fc9a7aeb340 R15: 00007fc987523380

Revision history for this message
In , alpha_one_x86 (alphaonex86-linux-kernel-bugs) wrote :

KDE has the problem too: the same copy via the CLI or via Ultracopier (GUI) has no problem. I also note that the KDE UI gets slower, with Plasma using CPU whenever I use the HDD...

Revision history for this message
In , howaboutsynergy (howaboutsynergy-linux-kernel-bugs) wrote :

What's the value of `vm.dirty_writeback_centisecs` ?, ie.
$ sysctl vm.dirty_writeback_centisecs

try setting it to 0 to disable it, ie.
`$ sudo sysctl -w vm.dirty_writeback_centisecs=0`

I found that this helps my network transfer not stall or stop at all (it would stall for a few seconds when that value is =1000, for example) while some kind of non-async `sync`(command)-like flushing goes on periodically, when transferring GiB of data files from sftp to an SSD (via Midnight Commander, on a link limited to 10MiB per second).

vm.dirty_writeback_centisecs is how often the pdflush/flush/kdmflush processes wake up and check to see if work needs to be done.

Coupled with the above I've been using another value:
`vm.dirty_expire_centisecs=1000`
for both cases (when stall and not stall), so this one remained fixed to =1000.

vm.dirty_expire_centisecs is how long something can be in cache before it needs to be written. In this case it's 1 seconds. When the pdflush/flush/kdmflush processes kick in they will check to see how old a dirty page is, and if it's older than this value it'll be written asynchronously to disk. Since holding a dirty page in memory is unsafe this is also a safeguard against data loss.

Well, with the above, at least I'm not experiencing network stalls when copying GiB of data via Midnight Commander's sftp to my SSD until some kernel-caused sync-ing is completed in the background.

I don't know if this will work for others, but if curious about any of my other (sysctl)settings, they should be available for perusing [here](https://github.com/howaboutsynergy/q1q/tree/0a2cd4ba658067140d3f0ae89a0897af54da52a4/OSes/archlinux/etc/sysctl.d)
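
If such values help, they can be made persistent with a sysctl drop-in and applied immediately (a hedged example; the file name is arbitrary):

# /etc/sysctl.d/99-writeback.conf
vm.dirty_writeback_centisecs = 0
vm.dirty_expire_centisecs = 1000

$ sudo sysctl --system    # reload all sysctl configuration files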

Revision history for this message
In , howaboutsynergy (howaboutsynergy-linux-kernel-bugs) wrote :

correction:

> In this case it's 1 seconds.

*In this case it's 10 seconds.

Also, heads up:
I found that 'tlp', via `/etc/default/tlp` on Arch Linux, will overwrite the values set in the `/etc/sysctl.d/*.conf` files if its own settings are set to non-`0` values, i.e.
MAX_LOST_WORK_SECS_ON_AC=10
MAX_LOST_WORK_SECS_ON_BAT=10
will set:
vm.dirty_expire_centisecs=1000
vm.dirty_writeback_centisecs=1000

regardless of what values you set them in `/etc/sysctl.d/*.conf` files.

/etc/default/tlp is owned by tlp 1.2.2-1

Not setting those (e.g. commenting them out) will have tlp set them to its default of 15 sec (aka =1500). So the workaround is to set them to =0, which makes tlp not set them at all; thus the values from the `/etc/sysctl.d/*.conf` files are allowed to remain as set.

Brad Figg (brad-figg)
tags: added: cscc
Revision history for this message
In , mricon (mricon-linux-kernel-bugs) wrote :

I'm making this bug private to prevent more spam from being added to it.

Revision history for this message
joseielpi (joseielpi) wrote :

I've had this same bug since Lucid and I still have it today in Xubuntu 18.04. It takes around one hour to transfer 5 GB of data.

Revision history for this message
In , konoha02 (konoha02-linux-kernel-bugs) wrote :

I am facing this issue with both Debian and Arch Linux, on both XFS and ext4.
https://forums.debian.net/viewtopic.php?p=778803
