Comment 37 for bug 708920

Matt Wilson (msw-amazon) wrote :

I've spent a lot of time looking at this today. It feels like the problem may lie in the process scheduler. When I pin the CPU-burning process to CPU0 (through "taskset -pc 0 $pid_printed_by_a_out"), and also pin a bash shell to CPU0, the bash process fails to wake after sleeping (i.e., it's runnable, but CFS isn't giving it any CPU time). I've seen the bash process start to be scheduled again after around 3 minutes, and I've also seen it just sit there.
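
For anyone who wants to put a number on the "fails to wake" symptom, here is a rough sketch rather than the exact procedure I used. It assumes a Linux guest with Python 3 available; the helper names pin_to_cpu and measure_wakeup_overshoot are made up for illustration. It pins itself to CPU0, sleeps repeatedly, and records how late each wake-up arrives.

```python
import os
import time


def pin_to_cpu(cpu=0):
    """Try to restrict this process to one CPU; return True on success."""
    try:
        os.sched_setaffinity(0, {cpu})  # pid 0 means "the calling process"
        return True
    except (AttributeError, OSError, ValueError):
        # Not on Linux, or that CPU is not available to this process.
        return False


def measure_wakeup_overshoot(sleep_s=0.1, samples=10):
    """Sleep repeatedly and record how late each wake-up was, in seconds."""
    overshoots = []
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(sleep_s)
        overshoots.append(time.monotonic() - start - sleep_s)
    return overshoots


if __name__ == "__main__":
    print("pinned to CPU0:", pin_to_cpu(0))
    for late in measure_wakeup_overshoot():
        print(f"woke {late * 1000:.1f} ms late")
```

Run it while the CPU burner is pinned to the same CPU. On a healthy scheduler I would expect the lateness to stay small; on an affected instance a single sleep may not return for minutes, in which case the script will simply appear to hang.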

Every time I've seen a scheduler debug trace (triggered via "echo w > /proc/sysrq-trigger"), there have been other runnable processes on the spinning CPU that don't seem to be getting scheduled at all.
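
As a cross-check that doesn't need sysrq, the following sketch lists tasks that are in state R and last ran on a given CPU by reading /proc/<pid>/stat (field 3 is the state, field 39 is the CPU the task last executed on). This is a different technique from the sysrq dump above, and the function names parse_stat and runnable_on_cpu are hypothetical. It only looks at one entry per process, not at individual threads.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state, last_cpu) from the text of /proc/<pid>/stat."""
    lparen = text.index("(")
    rparen = text.rindex(")")  # comm may itself contain spaces or parentheses
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    rest = text[rparen + 1:].split()  # rest[0] is field 3 (state)
    return pid, comm, rest[0], int(rest[36])  # rest[36] is field 39 (processor)


def runnable_on_cpu(cpu=0, proc_root="/proc"):
    """List (pid, comm) for tasks in state R that last ran on the given CPU."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no procfs here (not Linux, or not mounted)
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, last_cpu = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited, or the line was not in the expected shape
        if state == "R" and last_cpu == cpu:
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in runnable_on_cpu(0):
        print(pid, comm)
```

If more than one task keeps showing up in state R on the spinning CPU across repeated runs, that would be consistent with what the sysrq traces show. Note that the script itself will normally appear in the list if it happens to be running on that CPU.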

I've not been able to reproduce this problem on the kernel used in the Amazon Linux AMI (currently 2.6.34.7). This is in line with other users' observations (http://twitter.com/#!/synack/status/30415380321140737).

I think that Canonical might need to look into what (if any) changes they've made to CFS in the 10.04 kernel tree. It's also possible that improvements have been made in CFS between 2.6.32 and 2.6.34 that account for better performance.