Comment 33 for bug 595555

Revision history for this message
In , tim.liim (tim.liim-redhat-bugs) wrote :

Created attachment 437758
Root cause analysis 1

I believe the issue is with Xorg, in Xext/sync.c. Here is the trace.
I saw this issue about 60-80% of the runs, on several F13 with various
(three, that is) video cards.

#1 What is g-ss thinking?
   output from "gnome-screensaver --no-daemon --debug"; see [1] for details

   Good case:
        [_gs_watcher_set_session_idle_notice] gs-watcher-x11.c:202 (12:11:16):
  Changing idle notice state: 1
      <User hit a key>
        [watcher_idle_notice_cb] gs-monitor.c:125 (12:11:19):
  Idle notice signal detected: 0 <=== '0' in good case

   Bad case:
 [_gs_watcher_set_session_idle_notice] gs-watcher-x11.c:202 (12:08:15):
  Changing idle notice state: 1
      <User hit a key>
 [watcher_idle_cb] gs-monitor.c:95 (12:08:25):
  Idle signal detected: 1 <=== '1' in bad case

#2 Does g-ss receive that fading-interruption msg at all?
   output from "strace -tt -e read=all -e write=all -o g-ss.strace
  gnome-screensaver --no-daemon --debug"; see [2] for details.
   Good case:
 write(2, "... Changing idle notice state: 1")
 read (6, "/org/gnome/SessionManager/Presence ... StatusChanged ..."
 write(2, "... Idle notice signal detected: 0")
   Bad case:
 write(2, "... Changing idle notice state: 1")
 write(2, "... Idle signal detected: 1")

   So in bad case, no, g-ss did not receive fading-interruption msg at
   all, which came in at fd 6. Who sent (or should send) this msg?
   Luckily the sender announced its name in msg: SessionManager
   ("gnome-session", "g-s" for short).

#3 Why does g-s not sending the fading-interrupt msg?
   output from "strace -tt -e read=all -e write=all -o g-s.strace
  -p $(pgrep gnome-session)" [3]
   - good case:
     g-s received a msg from fd 3,
 00:23:24.569436 read(3, "`\1\237\1\206\0\300\0\0"..., 4096) = 32
     then wrote "StatusChanged" msg on fd 6, which goes to g-ss (You
     can do strace on both g-s and g-ss at the same time to confirm;
     the msg content will be identical, but the time stamp will be
     slightly off. There is a dbus in the middle acting as event
     channel, thus the delay.).
   - bad case:
     g-s receives no msg from fd 3; instead, at the end of fading, g-ss
     sends "ActiveChanged" to inform g-s "screen saver started already."

#4 Who sent the msg to g-s on fd 3?
   This one took much tracing. Previously (bug566350) I tracked it
   down to Xorg (the X sever), in Xext/sync.c.

   sync.c provides some counter services, one of which is the idle
   counter. The idle counter (in msec) goes up as time passes, and
   resets to 0 upon user activity (mouse, keyboard). (sync.c keeps a
   "last activity time stamp," which is updated with every user
   activity. The idle counter is actually the difference of current
   time and last activity time stamp).

   g-ss uses two "alarms" on idle counter: positive transition
   ("+trans") and negative transition ("-trans"). Eg. with idle time
   set to 60 sec, g-ss registers two alarms: one +trans 60,000ms, the
   other -trans 60,000ms. When the user idles, the idle counter goes
   from 0ms and up. When it reaches 60,000ms or more, the +trans
   alarm is fired. The "transition" means the alarm is fired only
   once when the counter cross the threshold. When the user hits a
   key after idling for 70 sec, the counter drops from 70,000ms to
   0ms, triggering the -trans alarm (the counter was more than
   60,000ms, then becomes less then 60,000, thus a negative
   transition).

#5 How do I debug Xext/sync.c?
   For debug code, see bug591750 Coment#17.
   Also see bug591750 Coment#11 on how to run the debug code.
   For this bug, we will need this msg [4].
        "%s #5 SyncChangeCounter newval=%8u, oldval=%8u
               pIdleTimeValueLess=%8d, pIdleTimeValueGreater=%8d\n"
   See [5] for output for good and bad cases.
   Note that the oldval is always 60000 in bad cases.
        oldval= 60000 pIdleTimeValueLess= 9999,
   In good cases, oldval is 60001-60003ms, but never 60000.

#6 So how is a "+/- transition" defined, exactly?
   See [6] for code listing; after stripping the anti-syntatic-sugar,
   they boil down to
        test #1 (+trans): oldval < test_value <= newval
        test #2 (-trans): newval <= test_value < oldval
   This works under all usual cases. But here is a corner case that
   breaks this code.

  - Assume we have test_value of 60000ms, with a pair of +trans
    trigger (when the user is idle for >=60sec) and -trans (when the
    user is no longer idle, ie. idle time drops to be <60 sec from
    >=60 sec).
  - In good cases, the newval is usually 60001-60003ms when +trans
    passes test #1
        oldval (eg.59999) < test_value (60000) <= newval (eg. 60002)
    So when it comes to test for -trans, we have test #2
        newval (eg. 1ms) <= test_value (60000) < oldval (eg. 60002)
    which is true, so the -trans is triggered, and g-ss is no longer idle.

  - In bad cases, the newval is always 6000ms when +trans passes test #1
        oldval (eg.59999) < test_value (60000) <= newval (eg. 60000)
    So when it comes to test for -trans, we have test #2
        newval (eg. 1ms) <= test_value (60000) < oldval (eg. 60000)
    which is false, so the -trans is *not* triggered, and g-ss does not
    receives no-longer-idle notice.

  - so I think test #2 should be
        newval (eg. 1ms) < test_value (60000) <= oldval (eg. 60000)

#7 Changing test #2 to be
    SyncCheckTriggerNegativeTransition(SyncTrigger *pTrigger, CARD64 oldval)
    {
        return (pTrigger->pCounter == NULL ||
                (XSyncValueGreaterOrEqual(oldval, pTrigger->test_value) &&
                 XSyncValueLessThan(pTrigger->pCounter->value,
                                       pTrigger->test_value)));
    }
   And now my g-ss fading is interruptable, everytime. It seems the
   proposed change in Xext/sync.c fixes this bug.

I opened
    bug622651 Xext/sync.c IDLETIME counter sometimes fails to fire
              negative transition alarms
to track this issue in xorg-x11-server-Xorg-1.8.2-3.fc13.i686.

See the attachment for Notes [1]-[6], which is too long and boring for
the body of a bug comment.