kvm: Running perf against qemu processes results in page fault inside guest

Bug #2054218 reported by Matthew Ruffell
This bug affects 1 person
Affects          Status         Importance   Assigned to       Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
Mantic           Fix Released   Medium       Matthew Ruffell

Bug Description

BugLink: https://bugs.launchpad.net/bugs/2054218

[Impact]

Running perf against a QEMU/KVM process causes the guest to take a page fault while trying to store Precise Event Based Sampling (PEBS) records for the host. This happens both when perf targets a single process, in which case the targeted guest crashes, and when perf runs system wide, in which case all running guests on the system crash.

The issue was introduced in 6.0 by:

commit c59a1f106f5cd4843c097069ff1bb2ad72103a67
Author: Like Xu <email address hidden>
Date: Mon Apr 11 18:19:36 2022 +0800
Subject: KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c59a1f106f5cd4843c097069ff1bb2ad72103a67

This affects all 6.2 and 6.5 kernels. There is no known workaround, apart from not using perf on affected systems.

[Fix]

The issue was fixed in 6.7 by:

commit 971079464001c6856186ca137778e534d983174a
Author: Paolo Bonzini <email address hidden>
Date: Thu Jan 4 16:15:17 2024 +0100
Subject: KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=971079464001c6856186ca137778e534d983174a

This reinstates the logic for setting MSR_CORE_PERF_GLOBAL_CTRL to what it was before "KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS".

- .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
+ .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,

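The faulty logic keeps any bit that isn't both marked as exclude_guest and using PEBS, so a counter is removed from the guest value only when both conditions hold. The guest value should instead drop every counter marked exclude_guest and every counter the host is using for PEBS.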
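
To make the difference between the two expressions in the diff concrete, here is a minimal sketch using shell arithmetic. The function names and the sample mask values are hypothetical, chosen only for illustration, and are not taken from the kernel source. In the sample, bits 0 and 1 stand for counters in intel_ctrl_host_mask and bits 0 and 2 stand for counters in pebs_mask.

```shell
#!/bin/sh
# Illustration only: evaluate both forms of the .guest expression on sample masks.
# Arguments: intel_ctrl, intel_ctrl_host_mask, pebs_mask (hypothetical values).

# Faulty form: a bit is cleared only if it is set in BOTH masks.
guest_ctrl_faulty() {
    printf '0x%x\n' "$(( $1 & (~$2 | ~$3) ))"
}

# Fixed form: a bit is cleared if it is set in EITHER mask.
guest_ctrl_fixed() {
    printf '0x%x\n' "$(( $1 & ~$2 & ~$3 ))"
}

guest_ctrl_faulty 0xF 0x3 0x5   # prints 0xe: bits 1 and 2 stay set in the guest value
guest_ctrl_fixed  0xF 0x3 0x5   # prints 0x8: only bit 3 stays set in the guest value
```

With these sample values the faulty form leaves bit 2, a counter the host is using for PEBS, set in the value loaded while the guest runs. That is consistent with the failure described in [Impact].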

[Testcase]

Boot a bare metal server, enable KVM, and start a few VMs. The VMs can be idle; they don't require any workload.

$ sudo apt-get install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils uvtool
$ sudo reboot
$ ssh-keygen
$ uvt-simplestreams-libvirt sync --source http://cloud-images.ubuntu.com/daily release=jammy arch=amd64
$ uvt-kvm create --cpu 4 --memory 4096 --disk 10 jammy-a release=jammy arch=amd64
$ uvt-kvm create --cpu 4 --memory 4096 --disk 10 jammy-b release=jammy arch=amd64
$ uvt-kvm create --cpu 4 --memory 4096 --disk 10 jammy-c release=jammy arch=amd64
$ virsh list
 Id   Name      State
-------------------------
 2    jammy-a   running
 3    jammy-b   running
 4    jammy-c   running
$ uvt-kvm ssh jammy-a
Check it works.
$ ps aux | grep qemu
Find the pid of jammy-a
$ perf top -p $PID
$ virsh console jammy-a
Escape character is ^] (Ctrl + ])
[ 357.793039] BUG: unable to handle page fault for address: fffffe49178c6028
$ uvt-kvm ssh jammy-a
(no response)

Test packages are available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf379502-test

Once the test kernel is installed, running perf against the PID of a QEMU process no longer crashes the guest, and the guest remains accessible over SSH afterward.
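
The "Find the pid of jammy-a" step in the test case can be scripted. The helper below is a sketch, not part of the original test case: the function name qemu_pid_from_ps is hypothetical, and it assumes the `ps aux` column layout (PID in column 2, command starting in column 11) and the libvirt `-name guest=NAME,` argument form seen in this report's ps output.

```shell
#!/bin/sh
# Print the PID of the QEMU process for a given libvirt guest name.
# $1: guest name (e.g. jammy-a); stdin: output of `ps aux`.
qemu_pid_from_ps() {
    # Match on column 11 (the executable) so that the awk process itself,
    # whose own arguments contain the search strings, is never selected.
    awk -v pat="guest=$1," \
        'index($11, "qemu-system") > 0 && index($0, pat) > 0 { print $2 }'
}

# Example use on a host with running guests:
#   PID=$(ps aux | qemu_pid_from_ps jammy-a)
#   sudo perf top -p "$PID"
```

The trailing comma in the pattern keeps jammy-a from also matching a guest whose name merely starts with jammy-a.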

[Where problems could occur]

We are rearranging the logic for setting the PEBS MSRs, which affects processor sampling of events. This will affect any profiling tool run against KVM-based virtual machines, namely perf against QEMU.

If a regression were to occur, running perf against a VM could cause it to page fault and subsequently crash, resulting in downtime.

The only workaround will be to disable all profiling tools until a fix is available.

Changed in linux (Ubuntu Mantic):
status: New → In Progress
Changed in linux (Ubuntu):
status: New → Fix Released
Changed in linux (Ubuntu Mantic):
importance: Undecided → Medium
assignee: nobody → Matthew Ruffell (mruffell)
description: updated
tags: added: mantic sts
Stefan Bader (smb)
Changed in linux (Ubuntu Mantic):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.5.0-27.28 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-mantic-linux' to 'verification-done-mantic-linux'. If the problem still exists, change the tag 'verification-needed-mantic-linux' to 'verification-failed-mantic-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-mantic-linux-v2 verification-needed-mantic-linux
Matthew Ruffell (mruffell) wrote :

Performing verification for mantic.

I deployed mantic onto a bare metal server, with kernel 6.5.0-26-generic from
-updates.

I installed a KVM stack, synced a cloud image, and tested VM creation.

$ uvt-kvm create --cpu 4 --memory 4096 --disk 10 jammy-a release=jammy arch=amd64
$ uvt-kvm ssh jammy-a
Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-101-generic x86_64)

From there, I found the PID of the VM:

$ ps aux | grep qemu
libvirt+ 1799 107 1.5 9642380 1044752 ? Sl 03:21 0:38 /usr/bin/qemu-system-x86_64 -name guest=jammy-a,debug-threads=on -S -object {"qom-ty

We run perf:

$ sudo perf top -p 1799

$ virsh console jammy-a
Connected to domain 'jammy-a'
Escape character is ^] (Ctrl + ])
[ 161.413890] BUG: unable to handle page fault for address: fffffe18cdb1c028
[ 161.419474] #PF: supervisor read access in kernel mode
[ 161.423707] #PF: error_code(0x0000) - not-present page
[ 161.427949] PGD 17ffca0[ 161.429508] BUG: unable to handle page fault for address: fffffe18cdb1c028
[ 161.429513] #PF: supervisor read access in kernel mode
[ 161.429516] #PF: error_code(0x0000) - n

The VM suffers a page fault and crashes, and is not accessible over ssh.

$ uvt-kvm ssh jammy-a
(no response).

I then enabled -proposed, and installed 6.5.0-27-generic:

$ uname -rv
6.5.0-27-generic #28-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 7 18:21:00 UTC 2024

$ uvt-kvm create --cpu 4 --memory 4096 --disk 10 jammy-a release=jammy arch=amd64
$ uvt-kvm ssh jammy-a
Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-101-generic x86_64)

We get the pid of the VM:

$ ps aux | grep qemu
libvirt+ 1786 42.5 1.5 7960876 1022308 ? Sl 03:37 0:40 /usr/bin/qemu-system-x86_64 -name guest=jammy-a,debug-threads=on -S -object {"qom-ty

We run perf:

$ sudo perf top -p 1786

This time, the VM does not crash, and stays running:

$ virsh console jammy-a
Connected to domain 'jammy-a'
Escape character is ^] (Ctrl + ])

jammy-a login:
jammy-a login:

We can also ssh to the VM just fine:

$ uvt-kvm ssh jammy-a
Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-101-generic x86_64)

This is much better. The kernel in -proposed fixes the issue, so I am happy to mark this as verified for mantic.
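
For anyone repeating this verification, the kernel check can be scripted. This is a sketch under stated assumptions: the function name mantic_kernel_has_fix is hypothetical, and the check only covers the mantic generic kernel, where this report shows 6.5.0-26 as affected and 6.5.0-27 as fixed. Other flavours use different ABI numbers, so the function reports them as "not applicable" rather than guessing.

```shell
#!/bin/sh
# Exit status: 0 = fix present, 1 = fix absent, 2 = check does not apply.
# $1: kernel release string as printed by `uname -r`.
mantic_kernel_has_fix() {
    mkhf_release=$1
    case $mkhf_release in
        6.5.0-*-generic) ;;         # mantic generic kernel: the check applies
        *) return 2 ;;              # other series or flavour: no claim made
    esac
    mkhf_abi=${mkhf_release#6.5.0-} # e.g. "27-generic"
    mkhf_abi=${mkhf_abi%%-*}        # e.g. "27"
    case $mkhf_abi in
        ''|*[!0-9]*) return 2 ;;    # ABI is not a plain number: no claim made
    esac
    [ "$mkhf_abi" -ge 27 ]
}

# Example use:
#   mantic_kernel_has_fix "$(uname -r)" && echo "kernel carries the fix"
```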

tags: added: verification-done-mantic-linux
removed: verification-needed-mantic-linux
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 6.5.0-27.28

---------------
linux (6.5.0-27.28) mantic; urgency=medium

  * mantic/linux: 6.5.0-27.28 -proposed tracker (LP: #2055584)

  * Packaging resync (LP: #1786013)
    - [Packaging] drop ABI data
    - [Packaging] update annotations scripts
    - debian.master/dkms-versions -- update from kernel-versions (main/2024.03.04)

  * CVE-2024-26597
    - net: qualcomm: rmnet: fix global oob in rmnet_policy

  * CVE-2024-26599
    - pwm: Fix out-of-bounds access in of_pwm_single_xlate()

  * Drop ABI checks from kernel build (LP: #2055686)
    - [Packaging] Remove in-tree abi checks

  * Cranky update-dkms-versions rollout (LP: #2055685)
    - [Packaging] remove update-dkms-versions
    - Move debian/dkms-versions to debian.master/dkms-versions
    - [Packaging] Replace debian/dkms-versions with $(DEBIAN)/dkms-versions

  * linux: please move erofs.ko (CONFIG_EROFS for EROFS support) from linux-
    modules-extra to linux-modules (LP: #2054809)
    - UBUNTU [Packaging]: Include erofs in linux-modules instead of linux-modules-
      extra

  * performance: Scheduler: ratelimit updating of load_avg (LP: #2053251)
    - sched/fair: Ratelimit update to tg->load_avg

  * IB peer memory feature regressed in 6.5 (LP: #2055082)
    - SAUCE: RDMA/core: Introduce peer memory interface

  * linux-tools-common: man page of usbip[d] is misplaced (LP: #2054094)
    - [Packaging] rules: Put usbip manpages in the correct directory

  * CVE-2024-23851
    - dm: limit the number of targets and parameter size area

  * CVE-2024-23850
    - btrfs: do not ASSERT() if the newly created subvolume already got read

  * x86: performance: tsc: Extend watchdog check exemption to 4-Sockets platform
    (LP: #2054699)
    - x86/tsc: Extend watchdog check exemption to 4-Sockets platform

  * linux: please move dmi-sysfs.ko (CONFIG_DMI_SYSFS for SMBIOS support) from
    linux-modules-extra to linux-modules (LP: #2045561)
    - [Packaging] Move dmi-sysfs.ko into linux-modules

  * Fix AMD brightness issue on AUO panel (LP: #2054773)
    - drm/amdgpu: make damage clips support configurable

  * Mantic update: upstream stable patchset 2024-02-28 (LP: #2055199)
    - f2fs: explicitly null-terminate the xattr list
    - pinctrl: lochnagar: Don't build on MIPS
    - ALSA: hda - Fix speaker and headset mic pin config for CHUWI CoreBook XPro
    - mptcp: fix uninit-value in mptcp_incoming_options
    - wifi: cfg80211: lock wiphy mutex for rfkill poll
    - wifi: avoid offset calculation on NULL pointer
    - wifi: mac80211: handle 320 MHz in ieee80211_ht_cap_ie_to_sta_ht_cap
    - debugfs: fix automount d_fsdata usage
    - nvme-core: fix a memory leak in nvme_ns_info_from_identify()
    - drm/amd/display: update dcn315 lpddr pstate latency
    - drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer
    - smb: client, common: fix fortify warnings
    - blk-mq: don't count completed flush data request as inflight in case of
      quiesce
    - nvme-core: check for too small lba shift
    - hwtracing: hisi_ptt: Handle the interrupt in hardirq context
    - hwtracing: hisi_ptt: Don't try to attach a task
    - ASoC: wm8974:...

Changed in linux (Ubuntu Mantic):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-raspi/6.5.0-1014.17 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-mantic-linux-raspi' to 'verification-done-mantic-linux-raspi'. If the problem still exists, change the tag 'verification-needed-mantic-linux-raspi' to 'verification-failed-mantic-linux-raspi'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-mantic-linux-raspi-v2 verification-needed-mantic-linux-raspi