Comment 14 for bug 999755

Stefan Bader (smb) wrote : Re: Kernel crash on EC2 m1.large instances

So from the disassembly and the registers of the crash, it is clear that both crash variants at some point call schedule(), which ends up in pick_next_task_fair(). Since that in turn calls pick_next_entity(), it can be assumed that (struct cfs_rq *)->nr_running is not 0.
But then (struct cfs_rq *)->rb_leftmost seems to be NULL (as is (struct cfs_rq *)->skip).
The code in pick_next_entity() never checks whether rb_leftmost is NULL, but I assume that this cannot happen as long as the number of running entities is non-zero.
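To make the assumed invariant concrete, here is a minimal userspace C model of the check that pick_next_entity() does not perform. struct cfs_rq_model and struct sched_entity_model are hypothetical, heavily simplified stand-ins that keep only the fields named above (the real struct cfs_rq has many more members and an rb-tree of sched_entity nodes), and cfs_rq_sane() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a scheduling entity; only used via pointers. */
struct sched_entity_model {
	int id;
};

/*
 * Hypothetical, heavily simplified model of struct cfs_rq that keeps only
 * the fields discussed in the analysis.
 */
struct cfs_rq_model {
	unsigned long nr_running;               /* number of queued entities */
	struct sched_entity_model *rb_leftmost; /* cached first (leftmost) entity */
	struct sched_entity_model *skip;        /* "skip buddy", may be NULL */
};

/*
 * Assumed invariant: a run queue with nr_running != 0 must have a
 * leftmost entity. Returns false (and logs) when it is violated, which
 * matches the state observed in the crash dumps.
 */
static bool cfs_rq_sane(const struct cfs_rq_model *cfs_rq)
{
	assert(cfs_rq != NULL);
	if (cfs_rq->nr_running != 0 && cfs_rq->rb_leftmost == NULL) {
		fprintf(stderr, "cfs_rq %p: nr_running=%lu but rb_leftmost=NULL\n",
			(const void *)cfs_rq, cfs_rq->nr_running);
		return false;
	}
	return true;
}
```

In a debug kernel, a check of this kind could sit in front of the pick so that the broken state is reported instead of ending in a NULL dereference.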

This does not really explain what goes wrong, but it should allow us to build a special kernel that dumps the memory of the affected structure before dying. It might be interesting to see whether the memory around that pointer looks corrupted as well. I have not yet heard of similar issues on systems that are not running as PVM guests on EC2. That points to some kind of Xen interference, though I cannot explain how.
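A minimal userspace sketch of the "dump the structure before dying" idea, assuming the caller knows a region of memory that is safe to read: it hex-dumps an object plus a configurable margin on either side, clamped to that region. hexdump_window() and append() are hypothetical helper names; inside the kernel, the existing print_hex_dump() helper would be the more natural tool, and reading memory around a possibly corrupted pointer would need more care than this model shows.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Append formatted text snprintf-style; *used counts every character needed. */
static void append(char *out, size_t outsz, size_t *used, const char *fmt, ...)
{
	size_t room = *used < outsz ? outsz - *used : 0;
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(room ? out + *used : NULL, room, fmt, ap);
	va_end(ap);
	if (n > 0)
		*used += (size_t)n;
}

/*
 * Hex-dump the object at region[obj_off .. obj_off + obj_len) plus `margin`
 * bytes before and after it, clamped to [0, region_len). Offsets are printed
 * relative to the start of the region, 16 bytes per line. Returns the number
 * of characters the full dump needs (output is truncated if outsz is small).
 */
static size_t hexdump_window(const unsigned char *region, size_t region_len,
			     size_t obj_off, size_t obj_len, size_t margin,
			     char *out, size_t outsz)
{
	size_t start = obj_off > margin ? obj_off - margin : 0;
	size_t end = obj_off + obj_len + margin;
	size_t used = 0;

	assert(region != NULL && (out != NULL || outsz == 0));
	if (outsz > 0)
		out[0] = '\0';
	if (end > region_len)
		end = region_len;

	for (size_t i = start; i < end; i++) {
		if ((i - start) % 16 == 0)
			append(out, outsz, &used, "%04zx:", i);
		append(out, outsz, &used, " %02x", (unsigned)region[i]);
		if ((i - start) % 16 == 15 || i + 1 == end)
			append(out, outsz, &used, "\n");
	}
	return used;
}
```

With a margin of a few cache lines, a dump like this would show whether the bytes next to the cfs_rq look like plausible scheduler data or like an unrelated overwrite.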