Comment 12 for bug 651370

Brandon Black (blblack) wrote :

Well, I had a hunch this morning that my test AMI was faulty (perhaps some stupid issue related to block-device mapping or the like, which differs among the c1.xlarge variants), since it wasn't packaged with the same methods and tools as the official one.

It seems this may be the case. Following Mikael's hint that m2.4xlarge may exhibit the problem more reliably, I ran the following experiment this morning, using an EBS-backed root volume to persist the change rather than building custom instance-store AMIs:

1) Booted ami-548c783d (Maverick 64-bit EBS official) on m1.large in us-east-1.
2) Logged into this machine and manually edited /boot/grub/menu.lst to add "intel_idle.max_cstate=0 idle=nomwait" to the kernel boot flags.
3) Rebooted; the instance came up fine, with boot messages showing intel_idle disabled.
4) Stopped the instance and used ec2-modify-instance-attribute to change its type to m2.4xlarge.
5) Booted on m2.4xlarge successfully, no crash (cpuinfo shows Xeon X5550, which is also "model 26" like the failing c1.xlarges)
6) Edited menu.lst to remove the added boot flags and rebooted the instance again (staying on the same m2.4xlarge hardware).
7) Instance crashed on boot in the intel_idle code, as always.
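
For anyone who wants to repeat steps 2 and 6 without hand-editing, here is a rough sketch of the menu.lst change in Python. It is only an illustration, not what I actually ran: the function names are made up, and it assumes a GRUB legacy menu.lst where each boot entry has a line starting with "kernel". It operates on the file's text, so reading and writing /boot/grub/menu.lst (as root, after taking a backup) is left to the caller.

```python
import re

# Flags from step 2 of the experiment above.
FLAGS = ["intel_idle.max_cstate=0", "idle=nomwait"]

# Matches active GRUB legacy "kernel" lines; commented-out lines start with "#".
KERNEL_LINE = re.compile(r"\s*kernel\s")


def add_boot_flags(menu_text, flags=FLAGS):
    """Append each flag to every active 'kernel' line that lacks it (hypothetical helper)."""
    out = []
    for line in menu_text.splitlines():
        if KERNEL_LINE.match(line):
            present = line.split()
            missing = [f for f in flags if f not in present]
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        out.append(line)
    return "\n".join(out) + "\n"


def remove_boot_flags(menu_text, flags=FLAGS):
    """Drop the given flags from every active 'kernel' line (hypothetical helper)."""
    out = []
    for line in menu_text.splitlines():
        if KERNEL_LINE.match(line):
            tokens = line.split()
            kept = [t for t in tokens if t not in flags]
            if len(kept) != len(tokens):
                # Keep the original indentation; inner whitespace is collapsed.
                indent = line[: len(line) - len(line.lstrip())]
                line = indent + " ".join(kept)
        out.append(line)
    return "\n".join(out) + "\n"
```

Calling add_boot_flags on the file contents corresponds to step 2, and remove_boot_flags to step 6; running add_boot_flags twice leaves the text unchanged, so it is safe to re-apply.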

Given these results, I think the kernel flags will work around this issue; I just built a bad test AMI during my first tests yesterday. Could someone rebuild a set of Maverick AMIs with these flags added from the get-go, using whatever the official method of packaging Maverick AMIs is, for public testing among those of us experiencing the bug?