Comment 36 for bug 999755

Stefan Bader (smb) wrote :

Sorry for not updating this sooner. The problem has finally been identified. It is somewhat papered over by a change that went into linux-3.3, but the underlying race was never fixed. When a process calls setsid, it is placed into a separate scheduling task group. However, that group (a pointer) is changed while holding a different lock than the one held when a task is moved to another CPU to balance the load. Even worse, while moving the task, the earlier code looked up the task group pointer four times in a row without being protected against changes. So whenever this crash happened, the pointer had changed between the two lookups that assign values to the CFS scheduling elements. There could also be an inconsistency between the CFS elements and the RT scheduling elements, or between the two RT elements. I am not sure exactly what effects that would have.
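
To illustrate why four unprotected lookups are dangerous, here is a minimal user-space sketch. It is not the kernel code and not the upstream patch. The structures, the fake task_group() helper, simulate(), and the way the concurrent change is replayed deterministically are all assumptions made for illustration. A single lookup only keeps the four assignments consistent with each other; closing the race itself still requires that the group pointer cannot change unprotected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task group. Every field carries the group id,
 * so a mix of two groups is easy to detect. */
struct group {
    int cfs_rq, cfs_se, rt_rq, rt_se;
};

/* Hypothetical stand-in for a task and its four scheduling elements. */
struct task {
    struct group *grp;  /* may be changed by another context */
    int cfs_rq, cfs_parent, rt_rq, rt_parent;
};

/* Deterministic replacement for a concurrent writer: after
 * old_lookups_left lookups, the next lookup sees the new group. */
static struct group *pending;
static int old_lookups_left;

static struct group *task_group(struct task *t)
{
    if (pending != NULL && old_lookups_left == 0) {
        t->grp = pending;   /* the "other CPU" changes the group here */
        pending = NULL;
    } else if (old_lookups_left > 0) {
        old_lookups_left--;
    }
    return t->grp;
}

/* Pattern described above: four independent lookups. */
static void set_task_rq_racy(struct task *t)
{
    t->cfs_rq     = task_group(t)->cfs_rq;
    t->cfs_parent = task_group(t)->cfs_se;
    t->rt_rq      = task_group(t)->rt_rq;
    t->rt_parent  = task_group(t)->rt_se;
}

/* Sketch of a safer pattern: look the pointer up once and reuse it. */
static void set_task_rq_single(struct task *t)
{
    struct group *g = task_group(t);

    t->cfs_rq     = g->cfs_rq;
    t->cfs_parent = g->cfs_se;
    t->rt_rq      = g->rt_rq;
    t->rt_parent  = g->rt_se;
}

/* Returns 1 if all four elements came from the same group, else 0.
 * switch_after is the number of lookups that still see the old group. */
int simulate(int racy, int switch_after)
{
    struct group a = { 1, 1, 1, 1 };
    struct group b = { 2, 2, 2, 2 };
    struct task t = { .grp = &a };

    assert(switch_after >= 0);
    pending = &b;
    old_lookups_left = switch_after;

    if (racy)
        set_task_rq_racy(&t);
    else
        set_task_rq_single(&t);

    pending = NULL;
    return t.cfs_rq == t.cfs_parent &&
           t.cfs_parent == t.rt_rq &&
           t.rt_rq == t.rt_parent;
}
```

In this sketch, simulate(1, 2) lets the group change between the second and third lookup, so the CFS elements come from the old group and the RT elements from the new one. simulate(0, 2) performs only one lookup and therefore stays consistent.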

There is now a patch making its way upstream that will close the race. Quantal is not affected as badly, but Natty, Oneiric and Precise should be fixed.