buggy TSC_DEADLINE not disabled for xenial/trusty kernels

Bug #1741564 reported by Dan Streetman
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
  Trusty         New            Low          Unassigned
  Xenial         New            Low          Unassigned

Bug Description

[Impact]
An upstream commit added Intel CPU model/microcode checks that disable TSC_DEADLINE on CPUs affected by an erratum:

commit bd9240a18edfbfa72e957fc2ba831cf1f13ea073
Author: Peter Zijlstra <email address hidden>
Date: Wed May 31 17:52:03 2017 +0200

    x86/apic: Add TSC_DEADLINE quirk due to errata

That commit is included in the Ubuntu kernels starting at artful v4.13.

The Xenial 4.4 and Trusty 3.13 kernels do not yet have this commit, and so may trigger the TSC_DEADLINE microcode bug.

Some details on the errata, under HSD173 (page 66):
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf

The kernel history describes what happens when the bug is triggered: "TSC deadline timer stops working or creates an interrupt storm..." (from commit 855615eee9b1989cac8ec5eaae4562db081a239b, which removes the initial workaround for this microcode erratum after the microcode level check was added).

[Test Case]

On a system with a CPU model and microcode containing the TSC errata, check the /proc/cpuinfo contents; if the 'tsc_deadline_timer' feature is listed, this bug exists. After patching, it should not be listed as a CPU feature. The specific model/stepping/ucode numbers are listed in the commits (bd9240a18edfbfa72e957fc2ba831cf1f13ea073 and 616dd5872e52493863b0202632703eebd51243dc).
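
The check above can be scripted. The following is a minimal Python sketch, not part of the original report: it parses text in the /proc/cpuinfo format and reports whether 'tsc_deadline_timer' appears on the first "flags" line. The helper names (cpu_flags, has_tsc_deadline, read_local_cpuinfo) are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_tsc_deadline(cpuinfo_text):
    """True if the 'tsc_deadline_timer' feature is listed in the given cpuinfo text."""
    return "tsc_deadline_timer" in cpu_flags(cpuinfo_text)


def read_local_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file of the running system (Linux only)."""
    with open(path) as f:
        return f.read()
```

On an affected machine, has_tsc_deadline(read_local_cpuinfo()) would be expected to return True before patching and False afterwards. Only the first CPU's flags line is inspected, on the assumption that all CPUs in the system report the same flags.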

[Regression Potential]

The major regression potentials I see for this are:

1) incorrectly disabling the TSC_DEADLINE timer, on a system where the microcode errata doesn't apply, and
2) new reports of problems by T and X users who previously did not know their CPU microcode was buggy and now see the "TSC_DEADLINE disabled" boot log error, and report that as a "new bug".

[Other Info]

There are two follow-on fixes/refinements for this that should also be backported to T and X kernels:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1724612
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1724912

Those commits update the specific microcode levels for specific cpu model/steppings, and prevent disabling the timer on virtualized guests (where the timer is also virtualized), respectively.
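
As a rough illustration of the combined behaviour described above (a per-model/stepping minimum microcode level, plus skipping the quirk on virtualized guests), here is a hedged Python sketch. It is not the kernel code: the table entries are placeholder values invented for the example, not the real errata data, and the function and table names are hypothetical. The real model/stepping/microcode values are in the commits referenced in this report.

```python
# Placeholder table, NOT the real errata data:
# (model, stepping) -> minimum microcode revision that contains the fix.
PLACEHOLDER_MIN_MICROCODE = {
    (0x01, 0x1): 0x10,
    (0x02, 0x3): 0x200,
}


def should_disable_tsc_deadline(model, stepping, microcode, is_guest,
                                table=PLACEHOLDER_MIN_MICROCODE):
    """Decide whether the TSC deadline timer should be disabled.

    Assumed simplification: every table entry is keyed on an exact
    (model, stepping) pair.
    """
    if is_guest:
        # The timer is virtualized in a guest, so the quirk is skipped.
        return False
    required = table.get((model, stepping))
    if required is None:
        # CPU is not listed as affected.
        return False
    # Disable only when the loaded microcode is older than the fixed level.
    return microcode < required
```

With the placeholder table, should_disable_tsc_deadline(0x01, 0x1, 0x0f, is_guest=False) returns True, while the same CPU with microcode 0x10, or the same CPU running as a guest, returns False.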

Additionally, several related commits add, and then remove (once the microcode-level checks are in place), workarounds for this microcode erratum: commits 5bae156241e05d25171b18ee43e49f103c3f8097 and 8c9b9d87b855226a823b41a77a05f42324497603 (and possibly others) add the workaround, then 855615eee9b1989cac8ec5eaae4562db081a239b removes it. Those are first included in 4.10, so neither the T nor X kernels include the workaround.

Dan Streetman (ddstreet)
Changed in linux (Ubuntu Trusty):
importance: Undecided → Low
Changed in linux (Ubuntu Xenial):
importance: Undecided → Low
Changed in linux (Ubuntu):
status: New → Fix Released
Changed in linux (Ubuntu Trusty):
status: New → In Progress
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Trusty):
assignee: nobody → Dan Streetman (ddstreet)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Dan Streetman (ddstreet)
Dan Streetman (ddstreet)
Changed in linux (Ubuntu Xenial):
assignee: Dan Streetman (ddstreet) → nobody
Changed in linux (Ubuntu Trusty):
assignee: Dan Streetman (ddstreet) → nobody
Changed in linux (Ubuntu Xenial):
status: In Progress → Triaged
Changed in linux (Ubuntu Trusty):
status: In Progress → New
Changed in linux (Ubuntu Xenial):
status: Triaged → New