I've been putting up with the high load averages on our production system for a few months now. I've also been seeing what I thought was an unrelated problem, but which I've come to suspect is tied up with this bug: every now and again the system appears to lock up and become unresponsive. Because I often keep an SSH session open to the server, I can see that it's still running, but load averages have spiked to over 50, nothing can be killed, and only simple processes (such as ps) can be started. In the past I have been able to just wait it out and it fixes itself, but because this is an important production system my best option is to force a restart (it usually responds to a 'reboot').
This used to happen every couple of weeks, but recently it seems to have been happening more often. As far as I can recall this is new since Lucid, so I suspect that it's related to this load reporting problem.
It's happened 3 times in the last week and is becoming increasingly frustrating, so I've restarted this system with one of the test kernels posted here (aki-84b75ded). I can confirm that this has fixed the original load average bug, and the system has been running for 24 hours with no appreciable problems. I can report back here if the same load spike problem happens again. If it does, then I guess it's a new bug, but I wouldn't be confident pinning it on Lucid in particular. If it doesn't show up again, I guess we can assume that (a) the originally reported problem caused wider problems and (b) the new kernels have fixed those problems.