KVM: arm64: softlockups in stage2_apply_range

Bug #2056227 reported by Krister Johansen
This bug affects 1 person
Affects          Status        Importance  Assigned to       Milestone
linux (Ubuntu)   Invalid       Undecided   Unassigned
  Jammy          Fix Released  Medium      Krister Johansen

Bug Description

[Impact]

Tearing down KVM VMs on arm64 can cause softlockups to appear on the console.
When terminating VMs with more than 100 GB of memory backed by 4k pages, the
memory unmap often takes longer than 20 seconds, which can trigger the
softlockup detector. Portions of the unmap path also run TLB invalidation
instructions with interrupts disabled, which can further contribute to latency
problems. My team has observed networking latency problems when the CPU
performing the teardown is also assigned to handle a NIC interrupt.
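
A back-of-the-envelope count shows why 4k pages make the teardown so
expensive. This is only a sketch: it assumes "100 GB" means 100 GiB of guest
memory mapped entirely with leaf entries of a single size, and leaf_entries()
is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of leaf mappings of map_size bytes needed to
 * cover guest_bytes of guest memory. Assumes a single uniform mapping size. */
static uint64_t leaf_entries(uint64_t guest_bytes, uint64_t map_size)
{
	assert(map_size != 0);
	/* Round up so that a partially used final mapping still counts. */
	return (guest_bytes + map_size - 1) / map_size;
}
```

For 100 GiB this gives 26,214,400 entries with 4 KiB mappings versus 51,200
with 2 MiB mappings, 512 times fewer entries to visit, which helps explain why
migrating hypervisors to hugepages is an alternative to this fix.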

Fortunately, a solution has been in place since Linux 6.1. A pair of small
patches modifies stage2_apply_range to operate on smaller memory ranges,
performing a cond_resched between them. With these patches applied, softlockups
are no longer observed when tearing down VMs with large amounts of memory.
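
The batching idea can be illustrated with a small userspace C sketch. It is not
the code from the patches: apply_range_batched(), batch_end(), count_batch(),
and sketch_cond_resched() are hypothetical names, the callback stands in for
whatever per-range operation (such as an unmap) is being applied, and
sketch_cond_resched() only counts the points where the kernel would yield the
CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Callback applied to each [addr, addr + size) batch, e.g. an unmap. */
typedef int (*range_fn)(uint64_t addr, uint64_t size, void *ctx);

/* Hypothetical stand-in for cond_resched(): it only counts yield points. */
static unsigned resched_points;
static void sketch_cond_resched(void) { resched_points++; }

/* End of the batch starting at addr: the next block_size-aligned boundary,
 * or end if that comes first. block_size must be a power of two. Comparing
 * boundary - 1 with end - 1 stays correct if addr + block_size wraps to 0. */
static uint64_t batch_end(uint64_t addr, uint64_t end, uint64_t block_size)
{
	uint64_t boundary = (addr + block_size) & ~(block_size - 1);
	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Walk [addr, end) in batches no larger than block_size, yielding between
 * batches so that no single stretch of work covers the whole range. */
static int apply_range_batched(uint64_t addr, uint64_t end, uint64_t block_size,
			       range_fn fn, void *ctx)
{
	assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
	int ret = 0;

	while (addr != end) {
		uint64_t next = batch_end(addr, end, block_size);

		ret = fn(addr, next - addr, ctx);
		if (ret)
			break;
		if (next != end)
			sketch_cond_resched(); /* yield point between batches */
		addr = next;
	}
	return ret;
}

/* Example callback: records the number of batches, total bytes, and the
 * largest batch seen. */
struct batch_stats { unsigned batches; uint64_t total, largest; };

static int count_batch(uint64_t addr, uint64_t size, void *ctx)
{
	struct batch_stats *s = ctx;

	(void)addr;
	s->batches++;
	s->total += size;
	if (size > s->largest)
		s->largest = size;
	return 0;
}
```

With a 1 GiB block_size, walking a 100 GiB range makes 100 callbacks with 99
yield points between them. Because every batch is capped at block_size, the
work done between yield points no longer grows with the total size of the
range.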

Although I also submitted the patches to 5.15 LTS (the link to the LTS
submission is in the "Backport" section), I'd appreciate it if Ubuntu were
willing to take this submission in parallel: the impact has left us unable to
use arm64 for KVM until we either migrate our hypervisors to hugepages, pick up
this fix, or some combination of the two.

[Backport]

Backport the following fixes from Linux 6.1:

3b5c082bbf KVM: arm64: Work out supported block level at compile time
5994bc9e05 KVM: arm64: Limit stage2_apply_range() batch size to largest block

The fix is in 5994bc9e05; 3b5c082bbf is a dependency that was submitted as
part of the same series. The original submission is here:

https://<email address hidden>/

I've also submitted the patches to 5.15 LTS here:

https://<email address hidden>/

Both fixes cherry-picked cleanly with no conflicts.
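
The dependency, 3b5c082bbf, works out the supported block level at compile
time, presumably so that the largest block size used as the batch limit in
5994bc9e05 is a constant. The sketch below illustrates the idea under stated
assumptions: SKETCH_PAGE_SHIFT, SKETCH_MIN_BLOCK_LEVEL, granule_size(), and
largest_block_size() are hypothetical names loosely modelled on the kernel's
KVM_PGTABLE_MIN_BLOCK_LEVEL and kvm_granule_size(), and the size formula
assumes arm64's four-level (0..3) translation table layout, in which each
level resolves (page_shift - 3) address bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the host PAGE_SHIFT; 12 means 4 KiB pages. */
#ifndef SKETCH_PAGE_SHIFT
#define SKETCH_PAGE_SHIFT 12
#endif

/* Shallowest level that may hold a block mapping, chosen by the
 * preprocessor: level 1 for 4 KiB pages, level 2 otherwise. */
#if SKETCH_PAGE_SHIFT == 12
#define SKETCH_MIN_BLOCK_LEVEL 1U
#else
#define SKETCH_MIN_BLOCK_LEVEL 2U
#endif

/* Bytes mapped by one entry at `level`, assuming each level resolves
 * (page_shift - 3) address bits and level 3 maps a single page. */
static inline uint64_t granule_size(unsigned page_shift, unsigned level)
{
	assert(level <= 3);
	return 1ULL << ((page_shift - 3) * (4 - level) + 3);
}

/* Largest block size, usable as the batch size for a range walk. */
static inline uint64_t largest_block_size(void)
{
	return granule_size(SKETCH_PAGE_SHIFT, SKETCH_MIN_BLOCK_LEVEL);
}
```

With 4 KiB pages this yields level 1 and a 1 GiB largest block; with 16 KiB or
64 KiB pages the largest block sits at level 2 (32 MiB and 512 MiB
respectively).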

[Test]

Ran the test from 5994bc9e05 as well as my own run of kvm_page_table_test on a
VM with 4k pages and more than 100 GB of memory. Without the patches,
softlockups were observed in both tests. With the patches applied, both tests
ran without incident.

This was tested against both LTS 5.15.150 and linux-aws-5.15.0-1055.

[Potential Regression]

Regression potential is low. These patches have been present in Linux since 6.1
and appear to have needed no further maintenance.


Krister Johansen (kmjohansen) wrote :

This specifically affects Jammy and the 5.15 series. I have the necessary patches prepared and will e-mail those to the kernel team's mailing list.

Krister Johansen (kmjohansen) wrote :
Changed in linux (Ubuntu Jammy):
importance: Undecided → Medium
status: New → In Progress
Changed in linux (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
assignee: nobody → Krister Johansen (kmjohansen)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.15.0-104.114 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux' to 'verification-done-jammy-linux'. If the problem still exists, change the tag 'verification-needed-jammy-linux' to 'verification-failed-jammy-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-v2 verification-needed-jammy-linux
Krister Johansen (kmjohansen) wrote :

For posterity, the 5.15 LTS branch picked up this fix in 5.15.154.

tags: added: verification-done-jammy-linux
removed: verification-needed-jammy-linux
Krister Johansen (kmjohansen) wrote :

I've tested linux/5.15.0-104.114 and it passes my tests. Marking verification-done-jammy-linux.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/5.15.0-1064.73 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-azure' to 'verification-done-jammy-linux-azure'. If the problem still exists, change the tag 'verification-needed-jammy-linux-azure' to 'verification-failed-jammy-linux-azure'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-azure-v2 verification-needed-jammy-linux-azure
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-106.116

---------------
linux (5.15.0-106.116) jammy; urgency=medium

  * jammy/linux: 5.15.0-106.116 -proposed tracker (LP: #2061812)

  * CVE-2024-2201
    - x86/bugs: Use sysfs_emit()
    - KVM: x86: Update KVM-only leaf handling to allow for 100% KVM-only leafs
    - KVM: x86: Advertise CPUID.(EAX=7,ECX=2):EDX[5:0] to userspace
    - KVM: x86: Use a switch statement and macros in __feature_translate()
    - x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file
    - x86/syscall: Don't force use of indirect calls for system calls
    - x86/bhi: Add support for clearing branch history at syscall entry
    - x86/bhi: Define SPEC_CTRL_BHI_DIS_S
    - x86/bhi: Enumerate Branch History Injection (BHI) bug
    - x86/bhi: Add BHI mitigation knob
    - x86/bhi: Mitigate KVM by default
    - KVM: x86: Add BHI_NO
    - [Config] Set CONFIG_BHI to enabled (auto)

  * Drop fips-checks script from trees (LP: #2055083)
    - [Packaging] Remove fips-checks script

  * alsa/realtek: adjust max output volume for headphone on 2 LG machines
    (LP: #2058573)
    - ALSA: hda/realtek: fix the hp playback volume issue for LG machines

  * A general-protection exception during guest migration to unsupported PKRU
    machine (LP: #2032164)
    - x86/fpu: Allow caller to constrain xfeatures when copying to uabi buffer
    - KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}

  * [ICX] [SPR] [ipc/msg] performance: Mitigate the lock contention with percpu
    counter (LP: #2058485)
    - ipc: check checkpoint_restore_ns_capable() to modify C/R proc files
    - ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL
    - ipc: Store mqueue sysctls in the ipc namespace
    - ipc: Store ipc sysctls in the ipc namespace
    - ipc: Use the same namespace to modify and validate
    - ipc: Remove extra1 field abuse to pass ipc namespace
    - ipc: Check permissions for checkpoint_restart sysctls at open time
    - percpu: add percpu_counter_add_local and percpu_counter_sub_local
    - ipc/msg: mitigate the lock contention with percpu counter

  * Jammy update: v5.15.149 upstream stable release (LP: #2059014)
    - ksmbd: free ppace array on error in parse_dacl
    - ksmbd: don't allow O_TRUNC open on read-only share
    - ksmbd: validate mech token in session setup
    - ksmbd: fix UAF issue in ksmbd_tcp_new_connection()
    - ksmbd: only v2 leases handle the directory
    - iio: adc: ad7091r: Set alert bit in config register
    - iio: adc: ad7091r: Allow users to configure device events
    - iio: adc: ad7091r: Enable internal vref if external vref is not supplied
    - dmaengine: fix NULL pointer in channel unregistration function
    - scsi: ufs: core: Simplify power management during async scan
    - scsi: ufs: core: Remove the ufshcd_hba_exit() call from ufshcd_async_scan()
    - iio:adc:ad7091r: Move exports into IIO_AD7091R namespace.
    - ext4: allow for the last group to be marked as trimmed
    - btrfs: sysfs: validate scrub_speed_max value
    - crypto: api - Disallow identical driver names
    - PM: hibernate: Enforce ordering during image compression/decompression
    - hwrng...

Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-gke/5.15.0-1058.63 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-gke' to 'verification-done-jammy-linux-gke'. If the problem still exists, change the tag 'verification-needed-jammy-linux-gke' to 'verification-failed-jammy-linux-gke'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-gke-v2 verification-needed-jammy-linux-gke
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-gcp/5.15.0-1059.67 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-gcp' to 'verification-done-jammy-linux-gcp'. If the problem still exists, change the tag 'verification-needed-jammy-linux-gcp' to 'verification-failed-jammy-linux-gcp'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-gcp-v2 verification-needed-jammy-linux-gcp
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-ibm/5.15.0-1054.57 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-ibm' to 'verification-done-jammy-linux-ibm'. If the problem still exists, change the tag 'verification-needed-jammy-linux-ibm' to 'verification-failed-jammy-linux-ibm'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-ibm-v2 verification-needed-jammy-linux-ibm
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws/5.15.0-1061.67 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-aws' to 'verification-done-jammy-linux-aws'. If the problem still exists, change the tag 'verification-needed-jammy-linux-aws' to 'verification-failed-jammy-linux-aws'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-aws-v2 verification-needed-jammy-linux-aws
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-intel-iotg/5.15.0-1056.62 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-intel-iotg' to 'verification-done-jammy-linux-intel-iotg'. If the problem still exists, change the tag 'verification-needed-jammy-linux-intel-iotg' to 'verification-failed-jammy-linux-intel-iotg'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-intel-iotg-v2 verification-needed-jammy-linux-intel-iotg
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-raspi/5.15.0-1054.57 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-raspi' to 'verification-done-jammy-linux-raspi'. If the problem still exists, change the tag 'verification-needed-jammy-linux-raspi' to 'verification-failed-jammy-linux-raspi'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-raspi-v2 verification-needed-jammy-linux-raspi
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-hwe-5.15/5.15.0-106.116~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal-linux-hwe-5.15' to 'verification-done-focal-linux-hwe-5.15'. If the problem still exists, change the tag 'verification-needed-focal-linux-hwe-5.15' to 'verification-failed-focal-linux-hwe-5.15'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-focal-linux-hwe-5.15-v2 verification-needed-focal-linux-hwe-5.15
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-intel-iotg-5.15/5.15.0-1056.62~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal-linux-intel-iotg-5.15' to 'verification-done-focal-linux-intel-iotg-5.15'. If the problem still exists, change the tag 'verification-needed-focal-linux-intel-iotg-5.15' to 'verification-failed-focal-linux-intel-iotg-5.15'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-focal-linux-intel-iotg-5.15-v2 verification-needed-focal-linux-intel-iotg-5.15
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-bluefield/5.15.0-1043.45 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-done-jammy-linux-bluefield'. If the problem still exists, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-failed-jammy-linux-bluefield'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-bluefield-v2 verification-needed-jammy-linux-bluefield