Comment 10 for bug 999755

Revision history for this message
Stefan Bader (smb) wrote : Re: Kernel crash on EC2 m1.large instances

The Xen version is mostly to check for correlations. So far both times those have been reported it seemed to be a 3.4.3 version (not sure whether one can still hit other versions, but if then it would be great to know whether those are affected the same way). I know its not possible to know what one gets beforehand.

At least in all cases it is the same fatal condition. I suspect some pointer is unexpectedly NULL and then trying to reference an element at offset 0x10. It is in the scheduling code but at least there were two paths that seemed to lead there. So maybe it is not so much a specific action than scheduling new processes a lot...

This likely will be a beast to find as whatever is the cause is not the place it breaks. I'll try to isolate the exact instruction that breaks. Maybe it is possible to find out more by detecting the corruption and dumping more information. And also try to re-create this on a local system hopefully close enough to ec2...

Meanwhile if anybody can confirm breakage (or not) with other versions of Xen that may or may not be seen, please add to this report.