Comment 22 for bug 838811

Doug Smythies (dsmythies) wrote :

1.) Doing what was asked:
First, note that the image I needed was not available in the RC5 directory, so I used the appropriate image from the RC4 directory. See "2" below for the explanation as to why I know the results would be the same for RC5.

While the local terminal did not work with this version, I was able to connect via SSH and perform the test.

The issue is the same. See also "2" below.

doug@test-smy:~$ uname -a
Linux test-smy 3.3.0-030300rc4-generic-pae #201202181935 SMP Sun Feb 19 00:53:06 UTC 2012 i686 i686 i386 GNU/Linux
doug@test-smy:~$ cat /proc/version
Linux version 3.3.0-030300rc4-generic-pae (root@gomeisa) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #201202181935 SMP Sun Feb 19 00:53:06 UTC 2012

2.) Why these various kernel tests are not actually needed (please challenge me on this if you disagree):
Everything related to the load average calculations is contained in a fairly small piece of code in kernel/sched.c (or, in the 3.3 kernel, kernel/sched/core.c).
It is about 380 lines, starting with the comment "/* Variables and functions for calc_load */".
It is not possible for this issue to be fixed without code changes in this area, either my proposed patch or some other.
Therefore it is sufficient to compare this code area for changes, and there are none. I got the 3.3RC5 code from kernel.org and compared it both with the 3.3RC2 code that I got a few weeks ago (see post #3 above) and with the code I have been using on my 11.10 test machine.
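
For readers unfamiliar with that code area, the following is a minimal user-space sketch of the fixed-point exponential average at the heart of the load average calculation. It is modelled on calc_load and the FSHIFT / FIXED_1 / EXP_1 constants as found in kernels of this era, but it is only an illustration: run_samples is a hypothetical helper that does not exist in the kernel, and nothing here touches the nohz idle accounting that this bug is about.

```c
#include <assert.h>

#define FSHIFT   11                /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)   /* 1.0 in fixed point */
#define EXP_1    1884UL            /* 1/exp(5sec/1min) in fixed point */
#define EXP_5    2014UL            /* 1/exp(5sec/5min) */
#define EXP_15   2037UL            /* 1/exp(5sec/15min) */

/*
 * One update step: load = load * exp + active * (1 - exp),
 * all in fixed point, rounded to nearest.
 * 'active' must already be scaled by FIXED_1.
 */
static unsigned long calc_load(unsigned long load, unsigned long exp,
                               unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    load += 1UL << (FSHIFT - 1);   /* rounding term */
    return load >> FSHIFT;
}

/*
 * Hypothetical helper (not in the kernel): apply n consecutive
 * samples, each seeing nr_active runnable tasks.
 */
static unsigned long run_samples(unsigned long load, unsigned long exp,
                                 unsigned long nr_active, int n)
{
    assert(n >= 0);
    while (n-- > 0)
        load = calc_load(load, exp, nr_active * FIXED_1);
    return load;
}
```

With one task permanently active, 12 samples (roughly one minute at one sample per 5 seconds) bring the 1 minute average from 0 to about 63% of FIXED_1. Any error in the active count fed into this step is what shows up as an incorrect load average.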

3.) Some other notes:

I have done further reading of CodingStyle and found that my proposed patch had some violations. I have fixed them and will post a new proposed patch.

I didn't know how to post a patch showing the differences in the way I have seen others do it. I read how to do it properly (I think) and the new posting will be better (I hope).

I thought that, if the process idle frequency is high enough, my proposed patch was showing lower load averages than the control tests done with CONFIG_NO_HZ=n. However, more detailed testing showed that it was not. Another proposed patch (variant 2) came out of this work.
Of course, it is understood that at some frequency everything will break down, and that, to also prevent incorrectly high load averages, the nohz disabled/enabled results may need to differ under some conditions. It is now realized that there is no difference between the two patches, as the one call to idle_fold almost always returned zero.

I have struggled to understand the timing relationship between calc_load_account_active and calc_global_nohz, and the number of times that calc_load_fold_idle will be called in a LOAD_FREQ interval. On my (i7 8 cpu) system I have seen that calc_load_account_active is typically executed 8 times, as expected, but not always. I have tried various counting and handshake schemes between the two, but never had results as good as mindlessly calling calc_load_fold_idle just once (or not at all, see proposed patch variant 2) from calc_global_nohz during the 10 tick window.
So then one wonders what the proposed patch might have compromised, i.e. is there a scenario where the patch would cause incorrectly high load averages? I have not been able to find such a scenario. (I am not saying they do not exist, just that I could not find them.)
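
The bookkeeping being discussed can be pictured with a deliberately simplified, single-threaded model. The function names below echo the kernel ones, but this is only a sketch under stated assumptions: struct rq_model, reset_model and MAX_CPUS are hypothetical, and the atomics, the jiffies checks, LOAD_FREQ and the 10 tick window are all left out.

```c
#include <assert.h>

#define MAX_CPUS 8   /* hypothetical; sized like the 8 cpu test system */

/* Hypothetical stand-in for the per-CPU run queue fields involved. */
struct rq_model {
    long nr_active;         /* runnable + uninterruptible tasks now */
    long calc_load_active;  /* value last folded into a global count */
};

static long calc_load_tasks;       /* global active count that gets sampled */
static long calc_load_tasks_idle;  /* deltas parked by CPUs going idle */

/* Change in this CPU's active count since it was last folded. */
static long calc_load_fold_active(struct rq_model *rq)
{
    long delta = rq->nr_active - rq->calc_load_active;
    rq->calc_load_active = rq->nr_active;
    return delta;
}

/* A CPU going idle parks its delta instead of updating the global count. */
static void calc_load_account_idle(struct rq_model *rq)
{
    calc_load_tasks_idle += calc_load_fold_active(rq);
}

/* Drain whatever idle CPUs have parked; returns 0 if nothing is parked. */
static long calc_load_fold_idle(void)
{
    long delta = calc_load_tasks_idle;
    calc_load_tasks_idle = 0;
    return delta;
}

/* Periodic per-CPU update: fold own delta plus any parked idle deltas. */
static void calc_load_account_active(struct rq_model *rq)
{
    calc_load_tasks += calc_load_fold_active(rq) + calc_load_fold_idle();
}

/* Hypothetical helper: zero the model before a run. */
static void reset_model(struct rq_model *rqs, int ncpus)
{
    int i;

    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    for (i = 0; i < ncpus; i++) {
        rqs[i].nr_active = 0;
        rqs[i].calc_load_active = 0;
    }
    calc_load_tasks = 0;
    calc_load_tasks_idle = 0;
}
```

In this model, a delta parked by a CPU going idle only reaches calc_load_tasks when some later caller drains it, and once it has been drained a further call to calc_load_fold_idle returns zero until another CPU goes idle. Who drains it, and when relative to the global sample, is the timing question raised above; the model does not answer it.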

I did have a crash on 2012.02.28 running the proposed patch under extreme load conditions (c/waiter 12 100), but have been unable to repeat it.

Attachments to be posted: Updated original proposed patch; Proposed patch, variant 2; 410 hertz control graph; 410 hertz proposed solution graph; 410 hertz variant 2 graph.
See also my web notes (link in post #3 above), which contain more test results than are posted herein.