Comment 42 for bug 1011792

Steven Noonan (steven-valvesoftware) wrote :

OK, with a few more iterations over kernel configs, it looks like it all comes down to this:

--- config-3.2.0-31-virtual 2012-10-10 01:02:10.000000000 +0000
+++ config-3.2.0-31-virtual-noautogroup 2012-10-11 01:33:14.886307000 +0000
@@ -144,7 +144,7 @@
 CONFIG_USER_NS=y
 CONFIG_PID_NS=y
 CONFIG_NET_NS=y
-CONFIG_SCHED_AUTOGROUP=y
+# CONFIG_SCHED_AUTOGROUP is not set
 CONFIG_MM_OWNER=y
 # CONFIG_SYSFS_DEPRECATED is not set
 CONFIG_RELAY=y

So the initial theory that autogroups were screwing things up seems to be correct. With this configuration, pgslam.py has now been running for 4 hours, 45 minutes.

Next we should find out whether 'noautogroup' on the kernel boot line is enough to resolve this. Earlier results in this bug report indicate that setting kernel.sched_autogroup_enabled=0 via sysctl didn't resolve the problem. I postulate that the sysctl is applied too late in the boot process, so the test-critical processes are already autogrouped by the time we try to opt out.
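
To check that postulate on a running box, something like the following rough sketch might help. It looks for 'noautogroup' in /proc/cmdline, reads the sysctl value, and reads /proc/<pid>/autogroup for a given process. The helper names (parse_autogroup, cmdline_disables_autogroup, report) are made up for illustration, and I'm assuming the usual "/autogroup-N nice M" format for the per-process file. On a kernel built without CONFIG_SCHED_AUTOGROUP those /proc entries should simply be missing, which the script reports as "not available".

```python
import re


def parse_autogroup(text):
    """Parse a /proc/<pid>/autogroup line such as '/autogroup-12 nice 0'.

    Returns (group_id, nice) or None if the text does not match the
    assumed format.
    """
    m = re.match(r"^/autogroup-(\d+) nice (-?\d+)$", text.strip())
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def cmdline_disables_autogroup(cmdline):
    """True if 'noautogroup' appears as its own word on the boot line."""
    return "noautogroup" in cmdline.split()


def read_text(path):
    """Return the file contents, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


def report(pid="self"):
    """Print the autogroup-related state visible through /proc."""
    cmdline = read_text("/proc/cmdline")
    sysctl = read_text("/proc/sys/kernel/sched_autogroup_enabled")
    group = read_text("/proc/%s/autogroup" % pid)

    if cmdline is None:
        print("boot line: not available")
    else:
        print("noautogroup on boot line:", cmdline_disables_autogroup(cmdline))

    if sysctl is None:
        print("sched_autogroup_enabled: not available")
    else:
        print("sched_autogroup_enabled:", sysctl.strip())

    if group is None:
        print("autogroup for pid %s: not available" % pid)
    else:
        print("autogroup for pid %s:" % pid, parse_autogroup(group))
```

Calling report() with the pid of one of the test-critical processes before and after flipping the sysctl would show what /proc reports for that process at each point.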