Comment 40 for bug 595555

tim.liim (tim.liim-redhat-bugs) wrote:

Some thoughts on this issue.

1. Why are only 60-80% of the cases bad? Why not 100%? Where does the
   randomness come from?
   - It looks like it comes from timing. When an alarm is triggered, it
     may or may not fire exactly at the requested moment. It is
     perfectly fine for it to be late by a few msec. When it fires
     exactly on time, it turns out to be a corner case (bug612620
     Comment#21) that is not handled well by sync.c.

   - People with faster machines (than my 1997 vintage desktop) will
     probably see a higher rate (e.g. 95%) of bad cases, because
     faster machines have a higher chance of triggering the alarm
     exactly on time.

2. Why was it not an issue in F12?
   - In F12, the alarm is also triggered right on time, which
     _would_ be the corner case. However, in F12 sync.c always invokes
     SyncChangeCounter() a few more times than necessary after the
     alarm is triggered. The net result is that the counter value
     never ends up right on the border.
 00:02:11.110 #5 SyncChangeCounter newval=60000, oldval=10003
 00:02:11.110 #4 SyncAlarmTriggerFired alarm id 0x00c0000d,counter=60000
 00:02:11.111 #5 SyncChangeCounter newval=60001, oldval=60000
     Note that newval=60001, not 60000 (the border, a.k.a.
     test_value). In F12, newval always ends up a few msec more
     than the test value.

   - In F13, these extra invocations of SyncChangeCounter are
     eliminated, so when the alarm is triggered, newval remains right
     on the border.
 17:34:58.532 #5 SyncChangeCounter newval=60000, oldval=20010
 17:34:58.533 #4 SyncAlarmTriggerFired alarm id 0x00c00015,counter=60000
 17:35:04.796 #5 SyncChangeCounter newval= 1, oldval=60000
     Note that after #4 SyncAlarmTriggerFired, newval remains 60000,
     the boundary condition that exposes an existing old bug. Also
     note that the second "#5 SyncChangeCounter" in F13 came 6 sec
     later, unlike in F12, where it came within 1 msec.

   - So my guess is that sync.c in F13 has some good improvements
     (removing the extra calls to SyncChangeCounter) which expose a
     pre-existing boundary-condition bug.
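
For anyone who wants to check their own traces for the same pattern, here is a rough sketch of a helper that measures how long the counter sits on the border after the alarm fires. It is not part of sync.c or any X server tooling. The name dwell_on_border() is made up for illustration, and the line format is assumed from nothing more than the six trace lines quoted above, so the regular expressions may need adjusting for other debug output.

```python
import re

# Assumed line shape, taken only from the trace lines quoted in this comment:
#   HH:MM:SS.mmm #N EventName details...
LINE_RE = re.compile(r"(\d+):(\d+):(\d+)\.(\d+)\s+#\d+\s+(\w+)\s*(.*)")
NEWVAL_RE = re.compile(r"newval=\s*(\d+)")


def to_msec(hours, minutes, seconds, msec):
    """Convert the four timestamp fields to milliseconds since midnight."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(msec)


def dwell_on_border(trace_lines):
    """Hypothetical helper: find the first SyncAlarmTriggerFired line and the
    first SyncChangeCounter line after it.

    Returns (next_newval, dwell_msec), or None if either line is missing.
    dwell_msec is how long the counter kept its value after the alarm fired.
    """
    fired_at = None
    for line in trace_lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip anything that does not look like a trace line
        stamp = to_msec(*match.group(1, 2, 3, 4))
        event, details = match.group(5), match.group(6)
        if event == "SyncAlarmTriggerFired" and fired_at is None:
            fired_at = stamp
        elif event == "SyncChangeCounter" and fired_at is not None:
            newval = NEWVAL_RE.search(details)
            if newval:
                return int(newval.group(1)), stamp - fired_at
    return None


F12_TRACE = [
    " 00:02:11.110 #5 SyncChangeCounter newval=60000, oldval=10003",
    " 00:02:11.110 #4 SyncAlarmTriggerFired alarm id 0x00c0000d,counter=60000",
    " 00:02:11.111 #5 SyncChangeCounter newval=60001, oldval=60000",
]
F13_TRACE = [
    " 17:34:58.532 #5 SyncChangeCounter newval=60000, oldval=20010",
    " 17:34:58.533 #4 SyncAlarmTriggerFired alarm id 0x00c00015,counter=60000",
    " 17:35:04.796 #5 SyncChangeCounter newval= 1, oldval=60000",
]
```

With the traces above, dwell_on_border(F12_TRACE) gives (60001, 1) and dwell_on_border(F13_TRACE) gives (1, 6263). In other words, in F12 the counter moves off the border within 1 msec, while in F13 it stays at 60000 for about 6 sec before the next update.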