Hard lockups due to unrestricted lapic timer delay

Bug #1817918 reported by Guilherme G. Piccoli
Affects          Status         Importance   Assigned to            Milestone
linux (Ubuntu)   Fix Released   Undecided    Guilherme G. Piccoli
Xenial           Fix Released   High         Guilherme G. Piccoli
Bionic           Fix Released   Low          Guilherme G. Piccoli

Bug Description

[Impact]

* There is a long-standing report of an issue with the TSC delay in
wait_lapic_expire(): a guest can configure its expiration timer in a way
that makes the host wait a long time with preemption disabled, which
creates a potential scenario for host lockups.

* The stack trace we have access to (from a user report of this issue)
is summarized below:

NMI watchdog: Watchdog detected hard LOCKUP on cpu 16
[...]
CPU: 16 PID: 3024910 Comm: CPU 0/KVM Not tainted 4.4.0-139-generic #165-Ubuntu
RIP: 0010:[<addr>] [<addr>] delay_tsc+0x20/0x60
[...]
__delay+0x15/0x20
wait_lapic_expire+0xc3/0x150 [kvm]
vcpu_enter_guest+0x743/0x11d0 [kvm]
kvm_arch_vcpu_ioctl_run+0xe6/0x410 [kvm]
kvm_vcpu_ioctl+0x33d/0x620 [kvm]
do_vfs_ioctl+0x2af/0x4b0
? __do_page_fault+0x1c1/0x410
? fire_user_return_notifiers+0x3e/0x50
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x22/0xc1

This matches the problem reported on the KVM mailing list: https://marc.info/?l=kvm&m=146374488028339

* A fix was proposed in the above thread, but it was discarded in favor of the
following approach: https://marc.info/?l=kvm&m=146647260109315
That patch was merged into Linus' tree, hence we hereby request the SRU of:
b606f189c7d5 ("KVM: LAPIC: cap __delay at lapic_timer_advance_ns").
One additional patch is needed, which is just a header adjustment
to export a necessary function.

* The patch is missing only from the 4.4 kernel series; Bionic (4.15) and newer releases already have it.
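
As a rough illustration of the capping idea named in the patch title, here is a minimal user-space sketch in Python. It is not the kernel code: the function names, the parameter names and the example numbers are assumptions made for illustration, and it only models the arithmetic of bounding the busy-wait by an "advance" window instead of waiting for whatever the guest programmed.

```python
def nsec_to_cycles(nsec: int, tsc_khz: int) -> int:
    """Convert nanoseconds to TSC cycles for a TSC running at tsc_khz kHz."""
    return nsec * tsc_khz // 1_000_000


def capped_delay_cycles(guest_tsc: int, tsc_deadline: int,
                        advance_ns: int, tsc_khz: int) -> int:
    """Return how many cycles the host would busy-wait (hypothetical model).

    Without a cap, the wait would be tsc_deadline - guest_tsc, a value the
    guest controls. With the cap, the wait never exceeds advance_ns.
    """
    if guest_tsc >= tsc_deadline:
        return 0  # deadline already reached, nothing to wait for
    remaining = tsc_deadline - guest_tsc
    return min(remaining, nsec_to_cycles(advance_ns, tsc_khz))


if __name__ == "__main__":
    # Assumed values: 2 GHz TSC, 1000 ns advance window, huge guest deadline.
    print(capped_delay_cycles(0, 2**32 + 866, 1000, 2_000_000))
```

With the assumed values above, a guest deadline more than 2^32 cycles in the future results in a wait of only 2000 cycles, while a deadline closer than the cap is still waited for in full.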

[Test Case]

* Unfortunately this issue is hard to reproduce; we have reports of
this lockup from a user, hence the SRU request here.
The patch was originally introduced in kernel 4.7, approx. 2.5 years
ago, so we are confident that the community has been running this code long
enough without errors being reported. We also checked Linus' tree, and no
fixes for this code have been introduced since kernel 4.7.

[Regression Potential]

* The code modification requested here affects the amount of delay in
a specific timer; the patch introduces a maximum delay time, preventing unbounded delays in the host.
The regression potential is considered low; given the nature of the
modification, latency issues in guests are the most likely form a regression would take.

Changed in linux (Ubuntu Bionic):
status: New → Fix Released
importance: Undecided → Low
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in linux (Ubuntu Xenial):
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
description: updated
Guilherme G. Piccoli (gpiccoli) wrote :

SRU request sent to the kernel team mailing list: https://lists.ubuntu.com/archives/kernel-team/2019-February/098872.html

Changed in linux (Ubuntu Xenial):
status: Confirmed → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Guilherme G. Piccoli (gpiccoli) wrote :

Tested the kernel in -proposed, version 4.4.0-144-generic (in Xenial). The problem no longer reproduces.

I used the tscdeadline_delay test written by Thadeu Cascardo as a kvm-unit-tests "module". This specific test is not upstream yet; Cascardo wrote it for this LP bug in particular and is polishing it before submission. As soon as it gets merged, I'll post the link here for reference.

In summary, I ran the test against 4.4.0-143, which gave me this output:

$ ./x86/run x86/tscdeadline_delay.flat -smp 1 -cpu host,+tsc-deadline

/usr/bin/qemu-system-x86_64 -nodefaults -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -machine accel=kvm -kernel x86/tscdeadline_delay.flat -smp 1 -cpu host,+tsc-deadline # -initrd /tmp/tmp.bDXlFhhhBJ

enabling apic
paging enabled
cr0 = 80010011
cr3 = 458000
cr4 = 20
apic version: 50014
PASS: apic existence
tsc deadline timer enabled
FAIL: delta: 4294968162

SUMMARY: 2 tests, 1 unexpected failures

After that, I rebooted the machine and tested against 4.4.0-144:

$ ./x86/run x86/tscdeadline_delay.flat -smp 1 -cpu host,+tsc-deadline

/usr/bin/qemu-system-x86_64 -nodefaults -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -machine accel=kvm -kernel x86/tscdeadline_delay.flat -smp 1 -cpu host,+tsc-deadline # -initrd /tmp/tmp.qQvOapKgzp

enabling apic
paging enabled
cr0 = 80010011
cr3 = 458000
cr4 = 20
apic version: 50014
PASS: apic existence
tsc deadline timer enabled
PASS: no large delta
SUMMARY: 2 tests

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-145.171

---------------
linux (4.4.0-145.171) xenial; urgency=medium

  * linux: 4.4.0-145.171 -proposed tracker (LP: #1821724)

  * linux-generic should depend on linux-base >=4.1 (LP: #1820419)
    - [Packaging] Fix linux-base dependency

linux (4.4.0-144.170) xenial; urgency=medium

  * linux: 4.4.0-144.170 -proposed tracker (LP: #1819660)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - [Packaging] update helper scripts
    - [Packaging] resync retpoline extraction

  * C++ demangling support missing from perf (LP: #1396654)
    - [Packaging] fix a mistype

  * CVE-2019-9213
    - mm: enforce min addr even if capable() in expand_downwards()

  * CVE-2019-3460
    - Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt

  * Xenial update: 4.4.176 upstream stable release (LP: #1818815)
    - net: fix IPv6 prefix route residue
    - vsock: cope with memory allocation failure at socket creation time
    - hwmon: (lm80) Fix missing unlock on error in set_fan_div()
    - net: Fix for_each_netdev_feature on Big endian
    - net: Add header for usage of fls64()
    - tcp: tcp_v4_err() should be more careful
    - net: Do not allocate page fragments that are not skb aligned
    - tcp: clear icsk_backoff in tcp_write_queue_purge()
    - vxlan: test dev->flags & IFF_UP before calling netif_rx()
    - net: stmmac: Fix a race in EEE enable callback
    - net: ipv4: use a dedicated counter for icmp_v4 redirect packets
    - x86: livepatch: Treat R_X86_64_PLT32 as R_X86_64_PC32
    - mfd: as3722: Handle interrupts on suspend
    - mfd: as3722: Mark PM functions as __maybe_unused
    - net/x25: do not hold the cpu too long in x25_new_lci()
    - mISDN: fix a race in dev_expire_timer()
    - ax25: fix possible use-after-free
    - Linux 4.4.176

  * sky2 ethernet card don't work after returning from suspension
    (LP: #1798921) // Xenial update: 4.4.176 upstream stable release
    (LP: #1818815)
    - sky2: Increase D3 delay again

  * Xenial update: 4.4.175 upstream stable release (LP: #1818813)
    - drm/bufs: Fix Spectre v1 vulnerability
    - staging: iio: adc: ad7280a: handle error from __ad7280_read32()
    - ASoC: Intel: mrfld: fix uninitialized variable access
    - scsi: lpfc: Correct LCB RJT handling
    - ARM: 8808/1: kexec:offline panic_smp_self_stop CPU
    - dlm: Don't swamp the CPU with callbacks queued during recovery
    - x86/PCI: Fix Broadcom CNB20LE unintended sign extension (redux)
    - powerpc/pseries: add of_node_put() in dlpar_detach_node()
    - serial: fsl_lpuart: clear parity enable bit when disable parity
    - ptp: check gettime64 return code in PTP_SYS_OFFSET ioctl
    - staging:iio:ad2s90: Make probe handle spi_setup failure
    - staging: iio: ad7780: update voltage on read
    - ARM: OMAP2+: hwmod: Fix some section annotations
    - modpost: validate symbol names also in find_elf_symbol
    - perf tools: Add Hygon Dhyana support
    - soc/tegra: Don't leak device tree node reference
    - f2fs: move dir data flush to write checkpoint process
    - f2fs: fix wrong return value of f2fs_acl_create
    - sunvdc: Do not spin in an infin...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: Confirmed → Fix Released