Comment 324 for bug 620074

In , thomas.pi (thomas.pi-linux-kernel-bugs) wrote :

Created attachment 21054
test case: measures the latency of mouse click events

All my results show that high latencies are likely whenever system time is high. Most posts were about high latencies during heavy I/O, either over an SSH connection or with the X server. Both use a network/socket connection, so the bug may be in the network stack rather than in the I/O scheduler or block layer.

Here is my first test.
The "Example Network Job" test (Flexible IO Tester) shows a regression starting with 2.6.22.
(see the last test on http://global.phoronix-test-suite.com/?k=profile&u=ebird-3722-22013-9288 )

And here is the mouse click test. It shows exactly the same regression on all kernels, and the same behavior I have observed in a real environment.

It's !!not!! caused by the fsync bug.

The test case simply clicks on a label and measures the time until the event arrives. It uses the platform's native input queue (see java.awt.Robot).
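For readers without the attachment, here is a minimal sketch of the same idea: a Swing label whose mousePressed handler timestamps the event, and a java.awt.Robot that injects clicks through the native input queue. This is not the attached test case. The class name ClickLatencyProbe, the click count and the 500 ms pause are assumptions chosen for illustration, and the program needs a real display (it will not run headless).

```java
import java.awt.Point;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.SwingUtilities;

// Hypothetical sketch, not the attached test case: measures the time from
// injecting a native mouse press with java.awt.Robot until the Swing
// mousePressed event for a label is delivered on the event dispatch thread.
public class ClickLatencyProbe {
    private static final int CLICKS = 100;   // assumption
    private static final int PAUSE_MS = 500; // assumption: pause between clicks

    // Largest latency in the array, converted from nanoseconds to milliseconds.
    static long maxMillis(long[] latenciesNanos) {
        long max = 0;
        for (long n : latenciesNanos) {
            max = Math.max(max, n);
        }
        return TimeUnit.NANOSECONDS.toMillis(max);
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<Long> arrivals = new LinkedBlockingQueue<>();
        JLabel[] holder = new JLabel[1];

        SwingUtilities.invokeAndWait(() -> {
            JLabel label = new JLabel("Click target", JLabel.CENTER);
            label.addMouseListener(new MouseAdapter() {
                @Override
                public void mousePressed(MouseEvent e) {
                    arrivals.offer(System.nanoTime()); // arrival time on the EDT
                }
            });
            JFrame frame = new JFrame("ClickLatencyProbe");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setAlwaysOnTop(true);
            frame.add(label);
            frame.setSize(300, 150);
            frame.setVisible(true);
            holder[0] = label;
        });

        Robot robot = new Robot();
        robot.waitForIdle(); // let the window appear before measuring
        Point[] target = new Point[1];
        SwingUtilities.invokeAndWait(() -> {
            Point p = holder[0].getLocationOnScreen();
            target[0] = new Point(p.x + holder[0].getWidth() / 2,
                                  p.y + holder[0].getHeight() / 2);
        });
        robot.mouseMove(target[0].x, target[0].y);

        long[] latencies = new long[CLICKS];
        for (int i = 0; i < CLICKS; i++) {
            long sent = System.nanoTime();
            robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);
            robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);
            Long arrived = arrivals.poll(30, TimeUnit.SECONDS);
            if (arrived == null) {
                System.err.println("click " + i + " never arrived");
                System.exit(1);
            }
            latencies[i] = arrived - sent;
            System.out.printf("click %d: %d ms%n", i,
                    TimeUnit.NANOSECONDS.toMillis(latencies[i]));
            robot.delay(PAUSE_MS);
        }
        System.out.println("max latency: " + maxMillis(latencies) + " ms");
        System.exit(0);
    }
}
```

Timestamping in mousePressed rather than mouseClicked means the measurement covers only the path of the press event through the native queue and the Swing event dispatch thread.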

The test case is only a quick solution and has no error handling. It expects a factor as its parameter. A high factor such as 40.0 means high sensitivity and a high probability of capturing high latencies, but on current kernels it also increases the probability of a missing precondition (no high CPU usage and no high system time). A value below 5.0 means low sensitivity, which reduces the system time and the probability of capturing a high-latency event. These values may differ on other machines, because the test has not been run on any other machine.
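The precondition (high CPU usage with a high share of system time) can also be checked independently of the test. A minimal sketch, assuming Linux and the aggregate cpu line of /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal, ...). The class name SystemTimeShare and the one-second sampling interval are hypothetical choices, and this is not how the attached test checks its preconditions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: samples the aggregate "cpu" line of /proc/stat twice
// and reports which fraction of the elapsed CPU time was spent in system mode.
public class SystemTimeShare {

    // Returns the jiffy counters of a "cpu  user nice system idle ..." line.
    static long[] parseCpuLine(String line) {
        String[] fields = line.trim().split("\\s+");
        if (!fields[0].equals("cpu")) {
            throw new IllegalArgumentException("not the aggregate cpu line: " + line);
        }
        long[] counters = new long[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            counters[i - 1] = Long.parseLong(fields[i]);
        }
        return counters;
    }

    // Fraction of time spent in system mode (third counter) between two samples.
    // Only user..steal (first 8 counters) are summed; guest time is already
    // included in user/nice, so adding it again would double count.
    static double systemShare(long[] before, long[] after) {
        int n = Math.min(8, Math.min(before.length, after.length));
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += after[i] - before[i];
        }
        long system = after[2] - before[2];
        return total == 0 ? 0.0 : (double) system / total;
    }

    static long[] sample() throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/proc/stat"));
        return parseCpuLine(lines.get(0)); // first line is the aggregate "cpu" line
    }

    public static void main(String[] args) throws Exception {
        long[] before = sample();
        Thread.sleep(1000); // assumed sampling interval
        long[] after = sample();
        System.out.printf("system time: %.1f%%%n", 100.0 * systemShare(before, after));
    }
}
```

Running it while the test and the I/O load are active gives a rough idea of whether the system-time precondition holds.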

To generate the heavy I/O, I used the following command; copying a large folder (larger than the memory size) is also enough.
# for i in 1 2 3 4 5 6; do dd if=/dev/zero of=t-$i bs=1M count=1K & done

With kernels 2.6.17, 2.6.18 and 2.6.20, the error occurs only while the cache is filling up, within the first five seconds.

kernel            no IO       high IO
2.6.17            max 160ms   max 35ms (max 2.859s within the first 5 seconds)
2.6.18            max 152ms   max 101ms (max 2.430s within the first 5 seconds)
2.6.20            max 164ms   max 100ms (max 1.049s within the first 5 seconds)

2.6.27            max 46ms    max 6.988s (during IO)
2.6.28            max 51ms    max 3.778s (during IO)
2.6.29            max 99ms    max 3.632s (during IO)
2.6.30-rc2        max 50ms    max 4.993s (during IO)

Unable to run the test on this kernel because of missing preconditions:
2.6.22

2.6.30-rc2 (SMP)  -           max 3.624s (during IO)

Output like the following, or no CPU usage, means the test's preconditions are missing; reduce the factor.
> High total latency of last 19 events at 138.783s - total latency : 646ms
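The message above reports the summed latency of the most recent events. The attachment's exact rule is not described here, so the following is only a hypothetical sketch of that kind of check: keep the last N latencies in a sliding window and flag it when their total exceeds a threshold. LatencyWindow, the window size of 19 and the 500 ms threshold are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not the attachment's code: sliding window over the
// last N click latencies that flags the window when the summed latency
// crosses a threshold.
public class LatencyWindow {
    private final int size;
    private final long thresholdMs;
    private final Deque<Long> window = new ArrayDeque<>();
    private long totalMs = 0;

    LatencyWindow(int size, long thresholdMs) {
        this.size = size;
        this.thresholdMs = thresholdMs;
    }

    // Adds one latency, drops the oldest one if the window is full,
    // and returns true if the window total is now above the threshold.
    boolean add(long latencyMs) {
        window.addLast(latencyMs);
        totalMs += latencyMs;
        if (window.size() > size) {
            totalMs -= window.removeFirst();
        }
        return totalMs > thresholdMs;
    }

    long totalMs() { return totalMs; }

    int count() { return window.size(); }

    public static void main(String[] args) {
        LatencyWindow w = new LatencyWindow(19, 500); // assumed window and threshold
        long[] samples = {20, 25, 30, 600, 15};       // made-up latencies in ms
        for (long s : samples) {
            if (w.add(s)) {
                System.out.printf("High total latency of last %d events - total latency : %dms%n",
                        w.count(), w.totalMs());
            }
        }
    }
}
```

Summing over a window rather than looking at single events catches a burst of moderately slow events that no single maximum would reveal.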

If the factor has to be reduced below 5.0, the test cannot be run on that kernel.

P.S.
All tests were done on a kernel without SMP support (to reduce differences caused by the multi-core scheduler), with a 250 Hz timer and without CPU frequency scaling.
On a multi-core system, you should keep n-1 cores busy with a job like this:
# bzip2 -c /dev/zero >/dev/null &
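If bzip2 is not at hand, a similar CPU-bound load can be produced from Java. A minimal sketch, assuming Runtime.availableProcessors() reports the number of cores you want to cover; CoreHog is a hypothetical name, and the threads simply spin until the process is killed.

```java
// Hypothetical alternative to starting "bzip2 -c /dev/zero >/dev/null &" n-1 times:
// keeps n-1 cores busy with spinning threads, leaving one core for the test.
public class CoreHog {
    static volatile long sink; // stops the JIT from discarding the loop's work

    // Number of busy threads for a machine with the given core count.
    static int busyThreads(int cores) {
        return Math.max(0, cores - 1);
    }

    public static void main(String[] args) {
        int n = busyThreads(Runtime.getRuntime().availableProcessors());
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                long x = System.nanoTime();
                while (true) {
                    // xorshift step: cheap, purely CPU-bound work
                    x ^= x << 13;
                    x ^= x >>> 7;
                    x ^= x << 17;
                    sink = x;
                }
            }, "hog-" + i);
            t.start(); // non-daemon threads keep the JVM running after main returns
        }
        System.out.println("busy threads: " + n + " (stop with Ctrl+C)");
    }
}
```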