Addition of leap second causes spuriously high CPU usage and futex lockups
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
base-files (Ubuntu) | | Undecided | Unassigned |
Lucid | | Undecided | Unassigned |
Natty | | Undecided | Unassigned |
Oneiric | | Undecided | Unassigned |
Precise | | Undecided | Unassigned |
Quantal | | Undecided | Unassigned |
linux (Ubuntu) | | Medium | Brad Figg |
Lucid | | Medium | Brad Figg |
Natty | | Medium | Brad Figg |
Oneiric | | Medium | Brad Figg |
Precise | | Medium | Brad Figg |
Quantal | | Medium | Brad Figg |
Bug Description
[Impact]
Software that relies on fine-grained pthread timeouts will spin indefinitely and drive up system load following a leap second, because the kernel's idea of time has become desynced and all sub-1s timeouts are hit immediately. MySQL and Java in particular are reported to be affected. This is a transient issue: it goes away the first time the system is rebooted after the leap second, and it is expected to be fixed before the next leap second occurs. Nevertheless, admins have been caught off guard by this misbehavior and in some cases may not have noticed the problem or may not know what to do about it, so we should help them along by resetting the kernel clock with a minimal-risk base-files update.
[Test Case]
1. Find a system that has been online, with mysqld or a Java-based process running, since before 2012-06-30.
2. Verify that one or more processes on the system are spinning in futex and driving up the system load.
3. Upgrade to the base-files package from -proposed.
4. Verify that the system load comes back down immediately.
5. A stress-test for leap-second handling has been provided at https:/
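Steps 2 and 4 of the test case compare system load before and after the upgrade. A minimal sketch of that check in Python, assuming the 1-minute load average is a good enough proxy; the function name and the per-CPU threshold are illustrative and not part of the SRU:

```python
import os


def load_is_elevated(threshold_per_cpu=1.0):
    """Return (elevated, one_minute_load) based on the 1-minute load average.

    threshold_per_cpu is an assumed cut-off, not a value from the bug report.
    """
    one_minute, _five, _fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1
    return one_minute > threshold_per_cpu * cpus, one_minute


if __name__ == "__main__":
    elevated, load = load_is_elevated()
    state = "elevated" if elevated else "normal"
    print(f"1-minute load: {load:.2f} ({state})")
```

Running it once before and once after step 3 gives a rough before/after comparison. It does not show which processes are spinning in futex.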
[Regression potential]
No analysis has been done on the effect of resetting the date on applications that require a high-accuracy clock. While this fixes the problem with the pthreads interfaces, it may cause other problems for other software. Since the proposed fix is to reset the kernel's date to the current date, which is not atomic, there will be a slight skew of the clock backwards in time. ntp *should* fix this shortly thereafter for machines that have it enabled.
Also, because there's a single version check for each copy of the SRU, users whose applications are negatively affected by the running of this date command will also be negatively affected on each subsequent upgrade of the system, up to and including the quantal devel release.
As widely reported, the addition of the leap second on 2012-06-30 has
caused high CPU usage and futex lockups in a lot of applications,
including JVMs and MySQL, as well as desktop apps like Firefox and
Thunderbird.
https:/
http://
https:/
We've seen this ourselves on the Canonical infrastructure on both
current Lucid and Precise kernels, i.e.
ii linux-image-
ii linux-image-
We can also confirm the 'date -s $(date)' workaround fixes the problem
without requiring a reboot.
Adam Conrad (adconrad) wrote : | #1 |
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
tags: | added: kernel-da-key lucid precise |
James Troup (elmo) wrote : | #2 |
This is perhaps redundant, but for the avoidance of doubt, a reboot
does appear to fix the problem too.
Changed in linux (Ubuntu): | |
assignee: | nobody → Canonical Kernel Team (canonical-kernel-team) |
importance: | Medium → Undecided |
status: | New → Triaged |
importance: | Undecided → Medium |
Michael S. Fischer (otterley) wrote : | #3 |
Relevant LKML thread:
Changed in linux (Ubuntu Precise): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Oneiric): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Natty): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Lucid): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Quantal): | |
assignee: | Canonical Kernel Team (canonical-kernel-team) → Andy Whitcroft (apw) |
Changed in linux (Ubuntu Quantal): | |
assignee: | Andy Whitcroft (apw) → Brad Figg (brad-figg) |
status: | Triaged → In Progress |
Changed in linux (Ubuntu Precise): | |
status: | New → Confirmed |
status: | Confirmed → Triaged |
Changed in linux (Ubuntu Oneiric): | |
status: | New → Triaged |
Changed in linux (Ubuntu Natty): | |
status: | New → Triaged |
Changed in linux (Ubuntu Lucid): | |
status: | New → Triaged |
tags: | added: kernel-key |
Changed in linux (Ubuntu Lucid): | |
assignee: | nobody → Brad Figg (brad-figg) |
Changed in linux (Ubuntu Natty): | |
assignee: | nobody → Brad Figg (brad-figg) |
Changed in linux (Ubuntu Oneiric): | |
assignee: | nobody → Brad Figg (brad-figg) |
Changed in linux (Ubuntu Precise): | |
assignee: | nobody → Brad Figg (brad-figg) |
Steve Langasek (vorlon) wrote : | #5 |
13:41 < infinity> It would, perhaps, be vaguely close to harmless to have a one-time addition to the kernel postinst
13:47 < slangasek> infinity: any package other than the kernel could use a version check in the postinst
13:47 < slangasek> e.g. base-files
Launchpad Janitor (janitor) wrote : | #6 |
This bug was fixed in the package base-files - 6.5ubuntu8
---------------
base-files (6.5ubuntu8) quantal; urgency=low
* Call date -s $(date -R) on upgrade, to resync any clocks that might
be desynced (and causing pthread spinning in the kernel) due to the leap
second. LP: #1020285.
-- Steve Langasek <email address hidden> Tue, 03 Jul 2012 10:43:12 -0700
Changed in base-files (Ubuntu Quantal): | |
status: | New → Fix Released |
Hello James, or anyone else affected,
Accepted base-files into precise-proposed. The package will build now and be available at http://
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-
Further information regarding the verification process can be found at https:/
Changed in base-files (Ubuntu Precise): | |
status: | New → Fix Committed |
Changed in linux (Ubuntu Precise): | |
status: | Triaged → Fix Committed |
status: | Fix Committed → New |
status: | New → Triaged |
Changed in base-files (Ubuntu Oneiric): | |
status: | New → Fix Committed |
tags: | added: verification-needed |
Adam Conrad (adconrad) wrote : | #8 |
Hello James, or anyone else affected,
Accepted base-files into oneiric-proposed. The package will build now and be available at http://
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-
Further information regarding the verification process can be found at https:/
description: | updated |
Adam Conrad (adconrad) wrote : | #9 |
Hello James, or anyone else affected,
Accepted base-files into natty-proposed. The package will build now and be available at http://
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-
Further information regarding the verification process can be found at https:/
description: | updated |
Changed in base-files (Ubuntu Natty): | |
status: | New → Fix Committed |
description: | updated |
Adam Conrad (adconrad) wrote : | #10 |
Hello James, or anyone else affected,
Accepted base-files into lucid-proposed. The package will build now and be available at http://
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-
Further information regarding the verification process can be found at https:/
Changed in base-files (Ubuntu Lucid): | |
status: | New → Fix Committed |
> For the record, those looking for a runtime workaround might prefer:
>
> date -u -s "$(date -u -R)"
>
> The extra switches are to avoid locales and ambiguous timezones getting in your way, and the quoting is, well, for proper quoting. :P
For what it's worth, an even simpler command to do this is:
date -s now
(as mentioned at the top of
http://
)
Nathan
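For anyone who would rather script the workaround than shell out to date, here is a rough Python sketch of the same idea: read the realtime clock and write the same value straight back. It assumes that setting the clock through clock_settime has the same effect on the kernel as the date -s commands above, which this bug does not confirm. The function name and the dry_run flag are made up for illustration.

```python
import time


def resync_realtime_clock(dry_run=True):
    """Set CLOCK_REALTIME to its own current value.

    Returns the timestamp that was (or would be) written back.
    With dry_run=False this needs root (CAP_SYS_TIME) and, like the
    date-based workaround, is not atomic: the clock may step back slightly.
    """
    now = time.clock_gettime(time.CLOCK_REALTIME)
    if not dry_run:
        time.clock_settime(time.CLOCK_REALTIME, now)
    return now


if __name__ == "__main__":
    # Dry run by default, so running this as-is changes nothing.
    print("would reset clock to", resync_realtime_clock(dry_run=True))
```

The dry-run default is deliberate, since stepping the clock can upset software that needs a high-accuracy clock (see the regression notes in the description).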
Debbugs #679882 pulls together a list of various leap-second-related kernel patches:
http://
Steve Langasek (vorlon) wrote : | #13 |
SRUs withdrawn; the window when this would have been a useful SRU has since passed.
Changed in base-files (Ubuntu Lucid): | |
status: | Fix Committed → Won't Fix |
Changed in base-files (Ubuntu Natty): | |
status: | Fix Committed → Won't Fix |
Changed in base-files (Ubuntu Oneiric): | |
status: | Fix Committed → Won't Fix |
Changed in base-files (Ubuntu Precise): | |
status: | Fix Committed → Won't Fix |
Changed in base-files (Ubuntu Quantal): | |
status: | Fix Released → Won't Fix |
Note that even though it's been a while since the leap second, a kernel affected by this bug could persist with its desynced internal idea of time, and the system would show no noticeable symptoms until someone eventually runs an affected application. (See https:/
The attached short Python script can be used to check a particular system to see if the kernel is left in that desynced state (and avoids causing high CPU usage during the test).
(Revised the script to set its exit status based on the results of the testing.)
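The attachment itself is not reproduced in this report. As a rough illustration of the approach (not the attached script), the sketch below takes a small, fixed number of sub-second timed waits and checks whether they all return far too early, then sets the exit status accordingly. The names, the 0.2 s timeout and the 50% tolerance are assumptions. Which clock Python's timed wait uses depends on the Python version and platform, so this may not reproduce the symptom everywhere.

```python
import sys
import threading
import time


def measure_wait(timeout):
    """Return how long a timed wait on a never-set event actually took."""
    event = threading.Event()
    start = time.monotonic()
    event.wait(timeout)
    return time.monotonic() - start


def looks_desynced(elapsed_samples, requested, tolerance=0.5):
    """True if every sample returned in under tolerance * requested seconds."""
    return bool(elapsed_samples) and all(
        elapsed < requested * tolerance for elapsed in elapsed_samples
    )


if __name__ == "__main__":
    requested = 0.2
    # A bounded number of waits, so the check itself cannot spin.
    samples = [measure_wait(requested) for _ in range(5)]
    desynced = looks_desynced(samples, requested)
    print("timeouts firing early" if desynced else "timeouts look normal")
    sys.exit(1 if desynced else 0)
```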
Red Hat bug no. 836803, "RHEL6: Potential fix for leapsecond caused futex related load spikes":
https:/
Changed in linux (Ubuntu Lucid): | |
status: | Triaged → Fix Committed |
Luis Henriques (henrix) wrote : | #17 |
This bug is awaiting verification that the kernel for Lucid in -proposed solves the problem (2.6.32-42.95). Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
tags: | added: verification-needed-lucid |
Luis Henriques (henrix) wrote : | #18 |
I've executed the test case on a Lucid system. Here's the output for the 2.6.32-41.94 kernel:
# uname -a
Linux lucid 2.6.32-41-generic #94-Ubuntu SMP Fri Jul 6 16:51:39 UTC 2012 i686 GNU/Linux
# ./leap_seconds
now: 1343779191:18314044 diff: 0:275081 rem: 0:0
now: 1343779191:
now: 1343779192:18885544 diff: 0:132426 rem: 0:0
now: 1343779192:
now: 1343779193:19389186 diff: 0:203142 rem: 0:0
now: 1343779193:
now: 1343779194:20769790 diff: 0:154342 rem: 0:0
now: 1343779194:
now: 1343779195:21301107 diff: 0:368364 rem: 0:0
now: 1343779195:
now: 1343779196:21818618 diff: 0:207906 rem: 0:0
now: 1343779196:
now: 1343779197:22276862 diff: 0:206450 rem: 0:0
now: 1343779197:
now: 1343779198:22686111 diff: 0:208255 rem: 0:0
now: 1343779198:
now: 1343779199:24238584 diff: 0:1313652 rem: 0:0
now: 1343779199:
now: 1343779199:24829471 diff: 0:-999711119 rem: 0:0
now: 1343779199:24938078 diff: 0:-499891393 rem: 0:0
now: 1343779199:24982361 diff: 0:-499955717 rem: 0:0
now: 1343779199:25022793 diff: 0:-499959568 rem: 0:0
now: 1343779199:25063446 diff: 0:-499959347 rem: 0:0
now: 1343779199:25104451 diff: 0:-499958995 rem: 0:0
now: 1343779199:25147178 diff: 0:-499957273 rem: 0:0
now: 1343779199:25193846 diff: 0:-499953332 rem: 0:0
now: 1343779199:25243894 diff: 0:-499949952 rem: 0:0
now: 1343779199:25286937 diff: 0:-499956957 rem: 0:0
now: 1343779199:25332647 diff: 0:-499954290 rem: 0:0
now: 1343779199:25374189 diff: 0:-499958458 rem: 0:0
now: 1343779199:25413411 diff: 0:-499960778 rem: 0:0
...
# dmesg
[ 161.756872] Clock: inserting leap second 23:59:60 UTC
With the new 2.6.32-42.95 kernel, here's the output:
# uname -a
Linux lucid 2.6.32-42-generic #95-Ubuntu SMP Wed Jul 25 15:57:54 UTC 2012 i686 GNU/Linux
root@lucid:
now: 1343692791:
now: 1343692791:
now: 1343692792:
now: 1343692792:
now: 1343692793:
now: 1343692793:
now: 1343692794:
now: 1343692794:
now: 1343692795:
now: 1343692795:
now: 1343692796:
now: 1343692796:
now: 1343692797:
now: 1343692797:
now: 1343692798:
now: 1343692798:
now: 1343692799:
now: 1343692799:
now: 1343692800:
now: 1343692800:
now: 1343692801:
now: 1343692801:
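To scan long leap_seconds logs like the ones above, a small parser helps. The sketch below assumes the "now: A:B diff: C:D rem: E:F" layout seen in the 2.6.32-41.94 output and assumes that a negative diff marks a wakeup that came earlier than requested. The test's own documentation is not quoted here, so treat that reading as a guess. Truncated lines are skipped rather than guessed at.

```python
import re

# Matches complete lines such as:
#   now: 1343779199:24829471 diff: 0:-999711119 rem: 0:0
LINE_RE = re.compile(
    r"now:\s*(\d+):(\d+)\s+diff:\s*(-?\d+):(-?\d+)\s+rem:\s*(-?\d+):(-?\d+)\s*$"
)


def parse_line(line):
    """Parse one output line into three (whole, fraction) pairs, or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # truncated or unrelated line
    now_a, now_b, diff_a, diff_b, rem_a, rem_b = map(int, match.groups())
    return {
        "now": (now_a, now_b),
        "diff": (diff_a, diff_b),
        "rem": (rem_a, rem_b),
    }


def early_wakeups(lines):
    """Return the parsed lines whose diff has a negative component."""
    parsed = (parse_line(line) for line in lines)
    return [
        p for p in parsed
        if p is not None and (p["diff"][0] < 0 or p["diff"][1] < 0)
    ]
```

Feeding it the first log should flag the run of lines stuck at 1343779199. The second log is truncated in this report, so nothing in it would parse.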
tags: |
added: verification-done-lucid removed: verification-needed-lucid |
Adam Conrad (adconrad) wrote : Update Released | #19 |
The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
Launchpad Janitor (janitor) wrote : | #20 |
This bug was fixed in the package linux - 2.6.32-42.95
---------------
linux (2.6.32-42.95) lucid-proposed; urgency=low
[Luis Henriques]
* Release Tracking Bug
- LP: #1027831
[ Upstream Kernel Changes ]
* hugetlb: fix resv_map leak in error path
- LP: #1004621
- CVE-2012-2390
* mm: fix vma_resv_map() NULL pointer
- LP: #1004621
- CVE-2012-2390
* net: sock: validate data_len before allocating skb in
sock_
- LP: #1006622
- CVE-2012-2136
* 2.6.32.x: ntp: Fix leap-second hrtimer livelock
- LP: #1020285
* 2.6.32.x: ntp: Correct TAI offset during leap second
- LP: #1020285
* 2.6.32.x: timekeeping: Fix CLOCK_MONOTONIC inconsistency during
leapsecond
- LP: #1020285
* 2.6.32.x: time: Move common updates to a function
- LP: #1020285
* 2.6.32.x: hrtimer: Provide clock_was_
- LP: #1020285
* 2.6.32.x: timekeeping: Fix leapsecond triggered load spike issue
- LP: #1020285
* 2.6.32.x: timekeeping: Maintain ktime_t based offsets for hrtimers
- LP: #1020285
* 2.6.32.x: hrtimers: Move lock held region in hrtimer_interrupt()
- LP: #1020285
* 2.6.32.x: timekeeping: Provide hrtimer update function
- LP: #1020285
* 2.6.32.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt
- LP: #1020285
* 2.6.32.x: timekeeping: Add missing update call in timekeeping_
- LP: #1020285
-- Luis Henriques <email address hidden> Tue, 24 Jul 2012 16:34:35 +0100
Changed in linux (Ubuntu Lucid): | |
status: | Fix Committed → Fix Released |
tags: | removed: kernel-key |
Herton R. Krzesinski (herton) wrote : | #21 |
Already fixed on Oneiric (3.0.0-24.40) and Precise (3.2.0-29.46). The fixes had no BugLink pointing here because they were applied through stable updates, so there was no automatic status change to Fix Released.
Changed in linux (Ubuntu Oneiric): | |
status: | Triaged → Fix Released |
Changed in linux (Ubuntu Precise): | |
status: | Triaged → Fix Released |
Herton R. Krzesinski (herton) wrote : | #22 |
By the way, I confirmed with the leap_seconds test (as integrated into our autotest) that Oneiric and Precise are OK.
Herton R. Krzesinski (herton) wrote : | #23 |
This is already fixed for Quantal as well, and the leap_seconds test passes on the latest Quantal kernel.
Changed in linux (Ubuntu Quantal): | |
status: | In Progress → Fix Released |
jan (jan-ubuntu-h-i-s) wrote : | #24 |
This solution may have introduced a regression problem on Lucid.
https:/
Abel Lopez (al592b) wrote : | #25 |
Any update on a Natty build for this? The next leap second is coming up at the end of December.
Paul Collins (pjdc) wrote : | #26 |
No leap second is scheduled for December 2012.
Abel Lopez (al592b) wrote : | #27 |
Sorry, didn't mean to split hairs. For all intents and purposes, with the holiday schedule, I'm considering Jan 1 2013 as good as Dec 31st 2012.
The point being, Jan 1 has a leap second:
ftp://tycho.
Paul Collins (pjdc) wrote : | #28 |
Leap seconds are inserted at the end of a given six-month period. You can see that the leap second that was inserted at the end of 30 June 2012 (IERS bulletin C43) results in an update for "1 Jul 2012" in your file. You can then see that the line for "1 Jan 2013", which is not yet enabled (note the "#" at the beginning of the line) would correspond to a leap second inserted at the end of 31 December 2012, which, as IERS bulletin C44 states, will not occur.
You can obtain bulletin C44 at my link above, and more of them are available at http://
Also, Ubuntu 11.04 (Natty) has reached end of life, so I would imagine it's unlikely that this issue will be addressed for that release. https:/
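The way to read the file, as described above, can be sketched in Python. This assumes the file uses the common leap-seconds list layout (seconds since 1900-01-01 UTC, then the TAI-UTC offset, then a "#" comment), with a leading "#" marking a line that is not in effect. The sample text is an illustrative excerpt rather than a copy of the real file, and its commented-out 2013 entry is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Timestamps in the list are assumed to count seconds from the NTP epoch.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def enabled_leap_entries(text):
    """Return (effective_datetime, tai_offset) for each uncommented entry."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank, or commented out and therefore not in effect
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        ntp_seconds, tai_offset = int(fields[0]), int(fields[1])
        entries.append((NTP_EPOCH + timedelta(seconds=ntp_seconds), tai_offset))
    return entries


SAMPLE = """\
# illustrative excerpt, not copied from the real file
3550089600  35  # 1 Jul 2012
#3565987200  36  # 1 Jan 2013 (hypothetical, commented out)
"""

if __name__ == "__main__":
    for when, offset in enabled_leap_entries(SAMPLE):
        print(when.date(), "TAI-UTC =", offset)
```

An entry dated 1 Jul 2012 takes effect at 00:00 UTC that day, which corresponds to a leap second inserted at the end of 30 June 2012. The commented-out line is skipped, which matches the point that no leap second was scheduled for the end of December 2012.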
Julian Wiedmann (jwiedmann) wrote : | #29 |
This release has reached end-of-life [0].
Changed in linux (Ubuntu Natty): | |
status: | Triaged → Invalid |