Improve TSC refinement (and calibration) reliability

Bug #1877858 reported by Guilherme G. Piccoli
This bug affects 1 person
Affects          Status        Importance  Assigned to           Milestone
linux (Ubuntu)   Fix Released  High        Guilherme G. Piccoli
  Xenial         Fix Released  High        Guilherme G. Piccoli
  Bionic         Fix Released  High        Guilherme G. Piccoli

Bug Description

[Impact]
* We recently received a report of TSC refinement failing intermittently across multiple reboots of a server with an Intel Skylake-based processor. This was only reproducible with pre-5.0 Bionic kernels.

* After checking kernel commits, we identified two that significantly improve the situation: a786ef152cdc ("x86/tsc: Make calibration refinement more robust") [git.kernel.org/linus/a786ef152cdc] and 604dc9170f24 ("x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency") [git.kernel.org/linus/604dc9170f24]. We request an SRU for both of them.

* The first commit improves comments and adjusts a threshold to suit more recent (faster) machines, but its most important part is a retry mechanism in the TSC refinement, for the case where refinement fails because a disturbance such as an NMI or SMI hits the TSC read.

* The second commit improves TSC calibration on Skylake (and some other models) by reading CPUID leaf 0x16 to calculate the crystal frequency when it is not reported, instead of relying on hardcoded, table-based values.

* A note for Xenial (kernel 4.4): backporting the second patch would require pulling in more commits. Given the maturity of this release (and the fact that kernel 4.15 is available to Xenial as an HWE kernel), I've kept it out of Xenial and backported only the first, more important patch to 4.4.
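The retry idea behind the first commit can be illustrated with a minimal, hypothetical C sketch. It is not the kernel's code: the clocks are simulated through callbacks, and the names read_refs_with_retry, READ_FAILED, struct sim_clock, sim_read_tsc, sim_read_ref and the 10000-cycle threshold are assumptions for illustration. The pattern is to read the TSC, read a reference clock (the HPET or ACPI PM timer, in the kernel's case), and read the TSC again; if the two TSC reads are too far apart, the sample was probably disturbed, so it is discarded and retried. The commit's tuning of the threshold for faster machines is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads one clock; ctx carries whatever state the clock needs. */
typedef uint64_t (*clock_read_fn)(void *ctx);

/* Sentinel returned when every attempt was disturbed. */
#define READ_FAILED UINT64_MAX

/*
 * Sample the reference clock "sandwiched" between two TSC reads. If the
 * TSC reads are `thresh` or more cycles apart, something (NMI, SMI, bus
 * contention...) delayed the sample, so discard it and try again, up to
 * max_retries times. On success, stores the reference value in *ref and
 * returns the closing TSC value; otherwise returns READ_FAILED so the
 * caller can schedule another attempt later instead of giving up.
 */
static uint64_t read_refs_with_retry(clock_read_fn read_tsc,
                                     clock_read_fn read_ref, void *ctx,
                                     uint64_t thresh, int max_retries,
                                     uint64_t *ref)
{
    assert(read_tsc != NULL && read_ref != NULL && ref != NULL);

    for (int i = 0; i < max_retries; i++) {
        uint64_t t1 = read_tsc(ctx);
        *ref = read_ref(ctx);
        uint64_t t2 = read_tsc(ctx);

        if (t2 - t1 < thresh)
            return t2;              /* undisturbed sample */
    }
    return READ_FAILED;
}

/* Simulated clocks, for illustration only: each TSC read advances the
 * counter by 100 cycles, except the first `disturbed_reads` reads, which
 * jump by 1,000,000 cycles to mimic an SMI landing mid-sample. */
struct sim_clock {
    uint64_t tsc;
    uint64_t ref;
    int tsc_reads;
    int disturbed_reads;
};

static uint64_t sim_read_tsc(void *ctx)
{
    struct sim_clock *c = ctx;
    c->tsc += (c->tsc_reads < c->disturbed_reads) ? 1000000 : 100;
    c->tsc_reads++;
    return c->tsc;
}

static uint64_t sim_read_ref(void *ctx)
{
    struct sim_clock *c = ctx;
    return ++c->ref;
}
```

With `struct sim_clock c = { .disturbed_reads = 2 };` and a 10000-cycle threshold, the first attempt is rejected (its TSC reads are 1,000,000 cycles apart) and the second succeeds, returning 2000200 with a reference value of 2. Returning a sentinel rather than a bogus sample is what lets a caller retry later instead of leaving the TSC unrefined.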

[Test case]
* Unfortunately, there is no easy way to test the effectiveness of the commits, especially the refinement improvement.

* The user who reported the missing refinements tested 300 reboots with a regular Bionic kernel, and the issue reproduced at least once; when they tested the Bionic kernel plus both proposed commits, the problem did not occur.

* The calibration commit was well tested by the community on multiple machines, comparing the calibrated TSC frequency against the tables at instlatx64.atw.hu.
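The arithmetic behind the calibration commit can be sketched as follows. This is a hypothetical illustration, not the kernel's code. It assumes the CPUID layout documented by Intel: leaf 0x15 reports the TSC/crystal ratio as EBX/EAX and the crystal frequency in Hz in ECX (0 when not enumerated), and leaf 0x16 reports the processor base frequency in MHz in EAX. The struct and function names are made up, the CPUID values are passed in rather than read so the sketch runs on any machine, and the numbers in the usage note are illustrative, not taken from a real CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Raw CPUID outputs needed for calibration. They are passed in rather
 * than read with the CPUID instruction so the sketch runs anywhere. */
struct tsc_cpuid {
    uint32_t leaf15_eax;     /* denominator of the TSC/crystal ratio */
    uint32_t leaf15_ebx;     /* numerator of the TSC/crystal ratio */
    uint32_t leaf15_ecx_hz;  /* crystal frequency in Hz; 0 if not reported */
    uint32_t leaf16_eax_mhz; /* processor base frequency in MHz; 0 if unknown */
};

/* Returns the TSC frequency in kHz, or 0 if it cannot be derived. */
static uint64_t calibrate_tsc_khz(const struct tsc_cpuid *c)
{
    assert(c != NULL);

    if (c->leaf15_eax == 0 || c->leaf15_ebx == 0)
        return 0;               /* ratio not enumerated */

    uint64_t crystal_khz = c->leaf15_ecx_hz / 1000;

    /*
     * Crystal frequency missing: instead of a per-model table, derive it
     * from the base frequency, assuming the TSC runs at base frequency:
     *   base_khz    = crystal_khz * ebx / eax
     *   crystal_khz = base_khz * eax / ebx
     */
    if (crystal_khz == 0 && c->leaf16_eax_mhz != 0)
        crystal_khz = (uint64_t)c->leaf16_eax_mhz * 1000 * c->leaf15_eax /
                      c->leaf15_ebx;

    if (crystal_khz == 0)
        return 0;               /* fall back to another calibration method */

    return crystal_khz * c->leaf15_ebx / c->leaf15_eax;
}
```

With illustrative values EAX = 2, EBX = 250, ECX = 0 and a 3000 MHz base frequency, the derived crystal frequency is 24000 kHz and the function returns 3000000 kHz (3 GHz), the same result as when ECX reports a 24 MHz crystal directly.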

[Regression potential]
* We consider the regression potential low, especially given the nature of the patches: the first is basically a retry mechanism (plus a threshold adjustment to reflect more recent machines), and the second improves TSC calibration on some platforms whose values are currently hardcoded in a table in the kernel. Also, the patches have been upstream for a while and I couldn't find any follow-up fixes for them.

* A hypothetical regression from the second patch could be in the TSC precision calculation, which the refinement itself could mitigate anyway. For the first patch, the only hypothetical regression I can think of is a bug in the code itself.


description: updated
Changed in linux (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Guilherme G. Piccoli (gpiccoli) wrote :

SRU just submitted to kernel team mailing-list: https://lists.ubuntu.com/archives/kernel-team/2020-May/109698.html

Cheers,

Guilherme

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
tags: added: verification-needed-xenial
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Guilherme G. Piccoli (gpiccoli) wrote :

Verified by code inspection on Bionic (4.15.0-102) and Xenial (4.4.0-180). I don't have a Skylake system available; a user who experienced this issue is testing, and I'll comment here as soon as he responds. I'm marking this as verified anyway based on the code inspection, since we need the patches in this cycle.

Cheers,

Guilherme

tags: added: verification-done-bionic verification-done-xenial
removed: verification-needed-bionic verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-106.107

---------------
linux (4.15.0-106.107) bionic; urgency=medium

  * CVE-2020-0543
    - SAUCE: x86/cpu: Add a steppings field to struct x86_cpu_id
    - SAUCE: x86/cpu: Add 'table' argument to cpu_matches()
    - SAUCE: x86/speculation: Add Special Register Buffer Data Sampling (SRBDS)
      mitigation
    - SAUCE: x86/speculation: Add SRBDS vulnerability and mitigation documentation
    - SAUCE: x86/speculation: Add Ivy Bridge to affected list

linux (4.15.0-103.104) bionic; urgency=medium

  * bionic/linux: 4.15.0-103.104 -proposed tracker (LP: #1881272)

  * "BUG: unable to handle kernel paging request" when testing
    ubuntu_kvm_smoke_test.kvm_smoke_test with B-KVM in proposed (LP: #1881072)
    - KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
    - KVM: VMX: Mark RCX, RDX and RSI as clobbered in vmx_vcpu_run()'s asm blob

linux (4.15.0-102.103) bionic; urgency=medium

  * bionic/linux: 4.15.0-102.103 -proposed tracker (LP: #1878856)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * debian/scripts/file-downloader does not handle positive failures correctly
    (LP: #1878897)
    - [Packaging] file-downloader not handling positive failures correctly

  * Kernel log flood "ceph: Failed to find inode for 1" (LP: #1875884)
    - ceph: don't check quota for snap inode
    - ceph: quota: cache inode pointer in ceph_snap_realm

  * [UBUNTU 18.04] zpcictl --reset - contribution for kernel (LP: #1870320)
    - s390/pci: Recover handle in clp_set_pci_fn()
    - s390/pci: Fix possible deadlock in recover_store()

  * Bionic update: upstream stable patchset 2020-05-12 (LP: #1878256)
    - drm/edid: Fix off-by-one in DispID DTD pixel clock
    - drm/qxl: qxl_release leak in qxl_draw_dirty_fb()
    - drm/qxl: qxl_release leak in qxl_hw_surface_alloc()
    - drm/qxl: qxl_release use after free
    - btrfs: fix block group leak when removing fails
    - btrfs: fix partial loss of prealloc extent past i_size after fsync
    - mmc: sdhci-xenon: fix annoying 1.8V regulator warning
    - mmc: sdhci-pci: Fix eMMC driver strength for BYT-based controllers
    - ALSA: hda/realtek - Two front mics on a Lenovo ThinkCenter
    - ALSA: hda/hdmi: fix without unlocked before return
    - ALSA: pcm: oss: Place the plugin buffer overflow checks correctly
    - PM: ACPI: Output correct message on target power state
    - PM: hibernate: Freeze kernel threads in software_resume()
    - dm verity fec: fix hash block number in verity_fec_decode
    - RDMA/mlx5: Set GRH fields in query QP on RoCE
    - RDMA/mlx4: Initialize ib_spec on the stack
    - vfio: avoid possible overflow in vfio_iommu_type1_pin_pages
    - vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn()
    - iommu/qcom: Fix local_base status check
    - scsi: target/iblock: fix WRITE SAME zeroing
    - iommu/amd: Fix legacy interrupt remapping for x2APIC-enabled system
    - ALSA: opti9xx: shut up gcc-10 range warning
    - nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl
    - dmaengine: dmatest: Fix iteration non-stop logic
    - selinux: properly handle multiple messages in ...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-184.214

---------------
linux (4.4.0-184.214) xenial; urgency=medium

  * CVE-2020-0543
    - SAUCE: x86/cpu: Add a steppings field to struct x86_cpu_id
    - SAUCE: x86/cpu: Add 'table' argument to cpu_matches()
    - SAUCE: x86/speculation: Add Special Register Buffer Data Sampling (SRBDS)
      mitigation
    - SAUCE: x86/speculation: Add SRBDS vulnerability and mitigation documentation
    - SAUCE: x86/speculation: Add Ivy Bridge to affected list

linux (4.4.0-181.211) xenial; urgency=medium

  * xenial/linux: 4.4.0-181.211 -proposed tracker (LP: #1881170)

  * CVE-2020-12769
    - spi: spi-dw: Add lock protect dw_spi rx/tx to prevent concurrent calls

  * I2C bus on Dell Edge Gateway stops working after upgrading to
    Ubuntu-4.4.0-180.210 (LP: #1881124)
    - SAUCE: Revert: Revert "ACPI / LPSS: allow to use specific PM domain during
      ->probe()"

linux (4.4.0-180.210) xenial; urgency=medium

  * xenial/linux: 4.4.0-180.210 -proposed tracker (LP: #1878873)

  * Xenial update: 4.4.223 upstream stable release (LP: #1878232)
    - mwifiex: fix PCIe register information for 8997 chipset
    - drm/qxl: qxl_release use after free
    - drm/qxl: qxl_release leak in qxl_draw_dirty_fb()
    - staging: rtl8192u: Fix crash due to pointers being "confusing"
    - usb: gadget: f_acm: Fix configfs attr name
    - usb: gadged: pch_udc: get rid of redundant assignments
    - usb: gadget: pch_udc: reorder spin_[un]lock to avoid deadlock
    - usb: gadget: udc: core: don't starve DMA resources
    - MIPS: Fix macro typo
    - MIPS: ptrace: Drop cp0_tcstatus from regoffset_table[]
    - MIPS: BMIPS: Fix PRID_IMP_BMIPS5000 masking for BMIPS5200
    - MIPS: smp-cps: Stop printing EJTAG exceptions to UART
    - MIPS: scall: Handle seccomp filters which redirect syscalls
    - MIPS: BMIPS: BMIPS5000 has I cache filing from D cache
    - MIPS: BMIPS: Clear MIPS_CACHE_ALIASES earlier
    - MIPS: BMIPS: local_r4k___flush_cache_all needs to blast S-cache
    - MIPS: BMIPS: Pretty print BMIPS5200 processor name
    - MIPS: Fix HTW config on XPA kernel without LPA enabled
    - MIPS: BMIPS: Adjust mips-hpt-frequency for BCM7435
    - MIPS: math-emu: Fix BC1{EQ,NE}Z emulation
    - MIPS: Fix BC1{EQ,NE}Z return offset calculation
    - MIPS: perf: Fix I6400 event numbers
    - MIPS: KVM: Fix translation of MFC0 ErrCtl
    - MIPS: SMP: Update cpu_foreign_map on CPU disable
    - MIPS: c-r4k: Fix protected_writeback_scache_line for EVA
    - MIPS: Octeon: Off by one in octeon_irq_gpio_map()
    - bpf, mips: fix off-by-one in ctx offset allocation
    - MIPS: RM7000: Double locking bug in rm7k_tc_disable()
    - MIPS: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
    - mips/panic: replace smp_send_stop() with kdump friendly version in panic
      path
    - ARM: dts: armadillo800eva Correct extal1 frequency to 24 MHz
    - ARM: imx: select SRC for i.MX7
    - ARM: dts: kirkwood: gpio pin fixes for linkstation ls-wxl/wsxl
    - ARM: dts: kirkwood: gpio pin fixes for linkstation ls-wvl/vl
    - ARM: dts: kirkwood: gpio-leds fixes for linkstation ls-wxl/wsxl
    - ARM: dts: kirkwood: gpio-leds fixes for linkstation ls-wvl/v...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: In Progress → Fix Released