Comment 41 for bug 1011792

Steven Noonan (steven-valvesoftware) wrote :

Stefan, the kernel version in the Amazon Linux AMI that Matt pointed at is 3.2.21-1.32.6.amzn1.x86_64, so it is closely comparable to the affected Ubuntu kernel (yes, there are source differences, but they at least have a merge base of 3.2.21, so they share significant lineage).

I was able to reproduce the deadlock on the latest Ubuntu 12.04 PV AMI (ami-3d4ff254), running linux-image-3.2.0-31-virtual (3.2.0-31.50). It didn't take very long until the thing was totally frozen (less than a GiB of space on /mnt consumed).

I'm trying something new now. I've built the same Ubuntu kernel from git (Ubuntu-3.2.0-31.50-0-g0d9657d), but instead of using the Ubuntu kernel config, I used the kernel config from the Amazon Linux AMI Matt mentioned (ami-aecd60c7). So far it hasn't keeled over (at 13 GiB right now, after running for about 1 hour 20 minutes).

I'm looking through the config diff right now, going from the Ubuntu config to the Amazon Linux config. The two differ significantly, but mostly in which drivers are selected. These are some highlights that stand out to me:

CONFIG_DEFAULT_IOSCHED="noop" instead of "deadline"
CONFIG_HZ=1000 instead of CONFIG_HZ=250
No CONFIG_IOSCHED_{DEADLINE,CFQ}
No CONFIG_COMPACTION
No CONFIG_CLEANCACHE
No CONFIG_CFS_BANDWIDTH
No CONFIG_SCHED_AUTOGROUP
No CONFIG_CGROUP_MEM_RES_CTLR*
No CONFIG_XEN_SELFBALLOONING
No hugepage-related options (HUGETLBFS, TRANSPARENT_HUGEPAGE, etc)
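
For anyone who wants to produce the same kind of comparison, here is a rough Python sketch of diffing two kernel configs. It is not the tool I used; the function names are made up for illustration, and the two config fragments are tiny invented samples (only the IOSCHED, HZ and COMPACTION values mirror the list above). It assumes the usual .config format, where a disabled option appears as "# CONFIG_FOO is not set", and it treats an option missing from a file the same as one that is not set.

```python
import re

# A set option looks like CONFIG_FOO=value; a disabled one is a comment line.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each CONFIG_ symbol to its value; disabled symbols map to None."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = None
    return options


def diff_configs(old, new):
    """Return sorted (symbol, old_value, new_value) tuples for symbols that differ.

    A symbol absent from one config is treated the same as "not set".
    """
    changes = []
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name), new.get(name)
        if a != b:
            changes.append((name, a, b))
    return changes


# Invented sample fragments, NOT the real Ubuntu / Amazon Linux configs.
ubuntu_text = """\
CONFIG_DEFAULT_IOSCHED="deadline"
CONFIG_HZ=250
CONFIG_COMPACTION=y
CONFIG_XEN_BLKDEV_FRONTEND=y
"""
amazon_text = """\
CONFIG_DEFAULT_IOSCHED="noop"
CONFIG_HZ=1000
# CONFIG_COMPACTION is not set
CONFIG_XEN_BLKDEV_FRONTEND=y
"""

changes = diff_configs(parse_config(ubuntu_text), parse_config(amazon_text))
for name, a, b in changes:
    print(f"{name}: {a or 'not set'} -> {b or 'not set'}")
```

The kernel source tree also ships scripts/diffconfig, which does this job properly; the sketch above is only meant to show the idea.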

Numerous device drivers not relevant to a VM are disabled (CONFIG_DVB_*, CONFIG_VIDEO_*, CONFIG_SND_*, etc), though these code paths are largely not exercised in the guest regardless of whether they're built, and I'd find it difficult to believe one of these causes the deadlock.

I suspect the faulting path is enabled by one of the config options listed above. I'll continue my investigation.