Comment 152 for bug 296167

Peter Hutterer (peter-hutterer) wrote:

Here's a braindump of what I know so far.
I have not been able to reproduce the bug yet, and since everything so far points to a race condition, it is not an easy one to track down. The information below comes from a (very) remote ssh debugging session.

The server keeps two screen pointers for the Event Queue (EQ): miEventQueue.pEnqueueScreen and miEventQueue.pDequeueScreen. pEnqueueScreen is used during signal handling to shove new events into the EQ, pDequeueScreen during event processing to take them out and process them further.
Both are modified through mieqSwitchScreen(); the update to pDequeueScreen is conditional on a parameter.
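
To make the enqueue/dequeue split concrete, here is a minimal toy model in C. It is not the server's code: ModelScreen, ModelEventQueue, model_switch_screen() and model_eq_in_sync() are hypothetical names made up for illustration, and the real mieqSwitchScreen() takes different arguments. The only thing it tries to show is that one end of the queue always follows a screen switch while the other follows only on request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the server's screen type; only identity matters here. */
typedef struct {
    int id;
} ModelScreen;

/* Mirrors the two fields named above; the rest of the EQ is left out. */
typedef struct {
    ModelScreen *pEnqueueScreen; /* used by the signal-handler side */
    ModelScreen *pDequeueScreen; /* used by the event-processing side */
} ModelEventQueue;

/*
 * Hypothetical model of a screen switch (assumption, not the real function):
 * the enqueue screen always follows, the dequeue screen follows only when
 * the caller asks for it.
 */
static void model_switch_screen(ModelEventQueue *eq, ModelScreen *screen,
                                bool set_dequeue)
{
    assert(eq != NULL);
    eq->pEnqueueScreen = screen;
    if (set_dequeue)
        eq->pDequeueScreen = screen;
}

/* True when both ends of the queue agree on the screen. */
static bool model_eq_in_sync(const ModelEventQueue *eq)
{
    assert(eq != NULL);
    return eq->pEnqueueScreen == eq->pDequeueScreen;
}
```

Calling model_switch_screen() with set_dequeue false leaves the two pointers on different screens until a later call with set_dequeue true catches the dequeue side up. In this model that window is harmless as long as the second call always arrives with the same screen.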

Other interesting variables are miPointer.pScreen (the screen of the rendered sprite) and sprite.screen (the screen as seen during event processing).

The usual order of updates to these four variables is:
miEventQueue.pEnqueueScreen -> miPointer.pScreen -> miEventQueue.pDequeueScreen -> sprite.screen

When the screwup happens, the order isn't 1,2,3,4 as above, but 1,2,1,2,3,4 instead. From then on, the first two variables always have different values than the other two.
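
If someone wants to look for this pattern in a log of variable updates, a small checker along these lines might help. This is only a sketch: the enum and model_first_out_of_order() are hypothetical names, and it assumes the updates have already been recorded as a list of numbers 1 to 4 in the order they happened, which the server does not do on its own.

```c
#include <assert.h>
#include <stddef.h>

/* The four variables, numbered in their usual update order. */
enum ModelVar {
    VAR_ENQUEUE_SCREEN = 1, /* miEventQueue.pEnqueueScreen */
    VAR_POINTER_SCREEN = 2, /* miPointer.pScreen */
    VAR_DEQUEUE_SCREEN = 3, /* miEventQueue.pDequeueScreen */
    VAR_SPRITE_SCREEN  = 4  /* sprite.screen */
};

/*
 * Returns the index of the first update that breaks the repeating
 * 1,2,3,4 cycle, or -1 if the whole trace follows it.
 */
static int model_first_out_of_order(const int *trace, size_t len)
{
    int expected = VAR_ENQUEUE_SCREEN;

    assert(trace != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (trace[i] != expected)
            return (int)i;
        /* Advance to the next variable, wrapping from 4 back to 1. */
        expected = (expected == VAR_SPRITE_SCREEN) ? VAR_ENQUEUE_SCREEN
                                                   : expected + 1;
    }
    return -1;
}
```

For the bad sequence 1,2,1,2,3,4 this reports index 2: the third update touches pEnqueueScreen again where pDequeueScreen was expected.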

The question is how the screens get out of sync. I looked at the code and I can't explain it.
One remote guess is XineramaCheckMotion(), which uses the root x/y coordinates. I think by then they should be in per-screen coordinates already, so the lookup would give us the wrong screen, possibly triggering the desync. That should happen all the time though, not just sometimes.
Without being able to run gdb on a busted server, I can't really say more.
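
To illustrate the coordinate guess (and only the guess, nothing here is confirmed): a lookup that expects root coordinates picks the wrong screen when it is handed per-screen coordinates. The sketch below uses a made-up ModelScreenRect type and model_screen_at() helper and an assumed layout of two 1024x768 screens side by side; it does not reproduce what XineramaCheckMotion() actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Screen rectangle with its origin given in root (global) coordinates. */
typedef struct {
    int x, y, width, height;
} ModelScreenRect;

/*
 * Returns the index of the screen containing (x, y), or -1 if none does.
 * x and y are expected to be root coordinates.
 */
static int model_screen_at(const ModelScreenRect *screens, size_t n,
                           int x, int y)
{
    assert(screens != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const ModelScreenRect *s = &screens[i];
        if (x >= s->x && x < s->x + s->width &&
            y >= s->y && y < s->y + s->height)
            return (int)i;
    }
    return -1;
}
```

With screens at x origins 0 and 1024, a pointer at root position (1124, 100) is on the second screen. Its per-screen position is (100, 100), and feeding that into the same lookup lands on the first screen instead.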