TSC Clocksource can cause hangs and time jumps

Bug #221351 reported by Tim Gardner on 2008-04-24
Affects           Status   Importance   Assigned to   Milestone
linux (Ubuntu)             Medium       Unassigned
Hardy                      Medium       Tim Gardner
Intrepid                   Medium       Unassigned

Bug Description

Comments from the original post:

We already catch most of the TSC problems by sanity checks, but there
is a subtle bug which has been in the code forever. This can cause
time jumps in the range of hours.

This was reported in:
     http://lkml.org/lkml/2007/8/23/96
and
     http://lkml.org/lkml/2008/3/31/23

Tim Gardner (timg-tpi) wrote :
Changed in linux:
assignee: nobody → timg-tpi
importance: Undecided → Medium
milestone: none → ubuntu-8.04.1
status: New → Fix Committed
Tim Gardner (timg-tpi) wrote :

SRU Justification:

Impact: Time jumps can cause processes to appear to hang.

Fix Description: To prevent this TSC-specific wreckage, we need to compare the TSC value against the reference value and return the latter when it is larger than the actual TSC value.
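
As a rough illustration of the fix description above, here is a minimal user-space sketch in C. It is not the kernel patch itself: the function names (elapsed_cycles, clamp_to_reference, safe_elapsed_cycles) are hypothetical, and plain 64-bit integers stand in for the TSC reading and the clocksource's reference value. It shows why a TSC reading slightly below the reference turns into an enormous elapsed value under unsigned arithmetic, and how returning the reference instead avoids that.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed cycles as an unsigned difference. If `now` is smaller than
 * `last`, the subtraction wraps around to a huge value. */
uint64_t elapsed_cycles(uint64_t now, uint64_t last)
{
    return now - last;
}

/* The comparison described in the fix: return the reference value
 * whenever it is larger than the TSC value that was just read. */
uint64_t clamp_to_reference(uint64_t tsc_now, uint64_t reference)
{
    return tsc_now >= reference ? tsc_now : reference;
}

/* Hypothetical helper: clamp first, then compute the elapsed cycles. */
uint64_t safe_elapsed_cycles(uint64_t tsc_now, uint64_t reference)
{
    uint64_t now = clamp_to_reference(tsc_now, reference);

    assert(now >= reference); /* holds by construction after clamping */
    return elapsed_cycles(now, reference);
}
```

With a reference of 1000 and a reading of 990, elapsed_cycles() wraps to a value just below 2^64, while safe_elapsed_cycles() returns 0.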

Patch: http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=commit;h=91d34e2c845a1c24a5802fd353165b1d68720a1a

TEST CASE: (From the patch description) I was able to reproduce the problem with a gettimeofday loop test on a dual core and a quad core machine which both have synchronized TSCs. The TSCs seem not to be perfectly in sync though, but the kernel is not able to detect the slight delta in the sync check. Still, there exists an extremely small window where this delta can be observed with a really big time jump. So far I was only able to reproduce this with the vsyscall gettimeofday implementation, but in theory this might be observable with the syscall based version as well.
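
The original loop test is not included in this report, so the following is only a sketch of what such a test might look like. The helper names (tv_diff_usec, count_time_jumps) and the limit_usec threshold are assumptions made for illustration. The idea is to call gettimeofday() in a tight loop and count consecutive samples that either go backwards or move forward by more than the chosen threshold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Signed difference (b - a) in microseconds. */
int64_t tv_diff_usec(const struct timeval *a, const struct timeval *b)
{
    return ((int64_t)b->tv_sec - (int64_t)a->tv_sec) * 1000000
         + ((int64_t)b->tv_usec - (int64_t)a->tv_usec);
}

/* Sample gettimeofday() `iterations` times and count the consecutive
 * pairs whose difference is negative or larger than `limit_usec`. */
unsigned long count_time_jumps(unsigned long iterations, int64_t limit_usec)
{
    struct timeval prev, cur;
    unsigned long jumps = 0;
    int rc = gettimeofday(&prev, NULL);

    assert(rc == 0);
    for (unsigned long i = 0; i < iterations; i++) {
        rc = gettimeofday(&cur, NULL);
        assert(rc == 0);

        int64_t delta = tv_diff_usec(&prev, &cur);
        if (delta < 0 || delta > limit_usec)
            jumps++;
        prev = cur;
    }
    (void)rc; /* silence unused-variable warnings when NDEBUG is set */
    return jumps;
}
```

A caller would pick a generous limit_usec (several seconds, say), because ordinary scheduling delays also produce forward gaps. Since the window is described as extremely small, a long run on a multi-core machine is probably needed before anything shows up.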

Colin Watson (cjwatson) wrote :

Accepted into hardy-proposed.

Tim Gardner (timg-tpi) wrote :

linux (2.6.24-17.31) hardy-proposed; urgency=low

  [Alessio Igor Bogani]

  * rt: Fix mutex in the toshiba_acpi driver
  * rt: Updated configuration files

  [Ben Collins]

  * build: Fix revert detection in git-ubuntu-log
  * SAUCE: Re-add eeprom_bad_csum_allow module-param
    - LP: #60388

  [Stefan Bader]

  * Pulled updates to openvz custom build. Fixes openvz 'refuses to boot' problem.
    - LP: #210672
  * sched: retain vruntime, fix delayed key events when CONFIG_FAIR_GROUP_SCHED.
    - LP: #218516
  * UBUNTU: SAUCE: Add blacklist support to fix Belkin bluetooth dongle.
    - LP: #140511

  [Tim Gardner]

  * Enable CONFIG_ISCSI_TCP for -virtual
    - LP: #218215
  * build: Add fancontrol modules to powerpc64-smp debian installer
  * Fix Xen Dom0/DomU bridging
    - LP: #218126
  * TSC Clocksource can cause hangs and time jumps
    - LP: #221351
  * Kernel should use CONFIG_FAIR_CGROUP_SCHED. Fixes high load issues
    with pulseaudio.
    - LP: #188226

  [Upstream Kernel Changes]

  * KVM: MMU: prepopulate guest pages after write-protecting
    - LP: #221032

 -- Tim Gardner <email address hidden>  Fri, 11 Apr 2008 07:59:10 -0600

Changed in linux:
assignee: timg-tpi → nobody
status: Fix Committed → Fix Released
Steve Langasek (vorlon) wrote :

Tim,

Have you already tested that the package in the archive resolves this bug? I don't think we need any further verification than that, given that this seems to be a relatively infrequent bug.

Changed in linux:
importance: Undecided → Medium
milestone: none → ubuntu-8.04.1
status: New → Fix Committed
milestone: ubuntu-8.04.1 → none
status: Fix Released → Triaged
Martin Pitt (pitti) wrote :

linux 2.6.24-17.31 copied to hardy-updates.

Changed in linux:
status: Fix Committed → Fix Released
Tim Gardner (timg-tpi) wrote :

This commit appears to cause suspend/resume regressions. Reverted in linux_2.6.24-19.33.

Changed in linux:
status: Fix Released → Invalid
Tim Gardner (timg-tpi) wrote :

Already upstream

Changed in linux:
status: Triaged → Invalid
peterr_100 (hp-rosinger) wrote :

Hi,

I think I have the same problem on my machine, and it's really annoying because everything freezes for a couple of minutes :-(
I also reported bug 244376, but no response so far :-(

I updated to the latest release yesterday but still got:

Jul 9 19:20:21 whiterabbit-desktop kernel: [ 6190.968249] Clocksource tsc unstable (delta = 159352314008 ns)

Is there a fix for this somewhere?

How can I install the fix and solve the problem?

Thx
Hans-Peter
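
For readers trying to interpret the log line above: the delta is given in nanoseconds. The small C sketch below (the struct duration type and split_ns function are hypothetical names, not part of any kernel or libc API) splits such a value into minutes, seconds, and leftover nanoseconds. For 159352314008 ns this gives 2 minutes and 39 seconds, which is at least consistent with the reported freezes of a couple of minutes.

```c
#include <assert.h>
#include <stdint.h>

/* A nanosecond count broken into coarser units. */
struct duration {
    uint64_t minutes;
    uint64_t seconds;
    uint64_t nanos;
};

/* Split a nanosecond delta, such as the one printed in the
 * "Clocksource tsc unstable" message, into minutes/seconds/nanoseconds. */
struct duration split_ns(uint64_t ns)
{
    struct duration d;
    uint64_t total_seconds = ns / 1000000000ULL;

    d.minutes = total_seconds / 60;
    d.seconds = total_seconds % 60;
    d.nanos = ns % 1000000000ULL;
    assert(d.seconds < 60);
    return d;
}
```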

Tim Gardner (timg-tpi) wrote :

This is the second attempt to fix this problem, this time by applying the whole patch:

http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=commit;h=898ad535e2c81e0b02628c1ee5d8400c971500dd

Changed in linux:
assignee: nobody → timg-tpi
milestone: ubuntu-8.04.1 → ubuntu-8.04.2
status: Invalid → Fix Committed
Steve Langasek (vorlon) wrote :

Accepted into -proposed, please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Steve Beattie (sbeattie) wrote :

Tim, can you attach your gettimeofday() loop test for verification? Thanks!

Steve Beattie (sbeattie) wrote :

I ended up writing my own gettimeofday(2) loop program; unfortunately, I was unable to get it to trigger the TSC problem on my single-die quad-core machine. I've attached the program so that others who saw this bug can reproduce it for verification. I note that the original upstream reporter claimed to have more success triggering the problem with the vsyscall version of gettimeofday(2) (i.e. on amd64/x86_64).

Martin Pitt (pitti) wrote :

linux 2.6.24-21 copied to hardy-updates.

Changed in linux:
status: Fix Committed → Fix Released