Comment 56 for bug 1011792

Andrew Shieh (shandrew) wrote :

Mike, I've been successfully running this configuration:

hi1.4xlarge instance in VPC
ami-eafa5883, official Ubuntu 12 PV instance-store AMI
linux 3.2.0-31-virtual (picked up on upgrade), booting with grub kernel option "noautogroup"
These are running a write-heavy mysql replica load to XFS/MD/LVM on the SSDs.
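
A quick way to confirm that an instance actually booted with the option is to look at the kernel command line. This is only a minimal sketch: the helper name has_kernel_option is made up for illustration, and it assumes a Linux guest where /proc/cmdline is readable.

```python
def has_kernel_option(cmdline: str, option: str) -> bool:
    """Return True if `option` appears as a whole token on the kernel command line."""
    # Split on whitespace so that e.g. "noautogroupX" does not count as a match.
    return option in cmdline.split()


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the parameters the running kernel was booted with.
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        # Not on Linux, or /proc is unavailable; treat as "no options seen".
        cmdline = ""
    print("noautogroup present:", has_kernel_option(cmdline, "noautogroup"))
```

If it reports False after a reboot, the option most likely did not make it into the boot entry that was actually used.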

This configuration has been stable across 10 instances for the past 34 hours.

Prior to running with "noautogroup", each of these instances would fail at a rate of approximately 20% per day, so there was only about a 10% chance (0.8^10) that all 10 would get through a day without at least one crashing. The failures I saw were mostly complete failures, aside from a couple where processes were left in a state where they'd only partially work. The console logs had "INFO: rcu_sched detected stalls on CPUs/tasks [...]". Curiously, during the failures, CloudWatch would report 6% CPU usage, which is one core on these instances.
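
The arithmetic behind that estimate can be sketched as follows. This assumes the failures are independent and the per-instance rate is a constant 20% per day, which is a simplification. The function name fleet_survival is just for illustration.

```python
def fleet_survival(p_fail_per_day: float, instances: int, days: float) -> float:
    """Probability that no instance fails over `days`, assuming independent
    failures at a constant per-instance daily failure probability."""
    return (1.0 - p_fail_per_day) ** (instances * days)


if __name__ == "__main__":
    # Figures from the comment: ~20%/day per instance, 10 instances.
    one_day = fleet_survival(0.20, 10, 1.0)
    # The stable run so far: 34 hours, expressed in days.
    run_so_far = fleet_survival(0.20, 10, 34 / 24)
    print(f"all 10 survive 1 day:    {one_day:.1%}")
    print(f"all 10 survive 34 hours: {run_so_far:.1%}")
```

Under those assumptions, 34 crash-free hours across 10 instances would happen by chance only about 4% of the time at the old failure rate, so the run is suggestive but not yet conclusive.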