XSAVE consistency problem disabling avx on KVM guests

Bug #2019108 reported by gberche
This bug affects 4 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

On an OpenStack KVM guest VM, the following kernel message is displayed: "XSAVE consistency problem: size 2560 != kernel_size 0" (see the further traces collected with the "ubuntu-bug linux" command).

As a result, the kernel does not enable the avx feature, which breaks programs that require it (e.g. MongoDB requires the AVX instruction set).
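To check whether a given guest is affected, one can search its kernel log for the warning quoted above. The following is a minimal Python sketch; the function name `find_xsave_mismatch` is hypothetical, and the regular expression only assumes the `size N != kernel_size M` wording shown in this report. It takes the log as a string, for example the output of `dmesg` (which may require root).

```python
import re
from typing import Optional, Tuple

# Matches the wording reported here, e.g.
# "XSAVE consistency problem: size 2560 != kernel_size 0".
# Any timestamp or prefix before it on the line is ignored by search().
XSAVE_WARN = re.compile(
    r"XSAVE consistency problem: size (\d+) != kernel_size (\d+)"
)


def find_xsave_mismatch(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (size, kernel_size) from the first matching line, or None."""
    match = XSAVE_WARN.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Fed the message from this report, it returns `(2560, 0)`; a log without the warning returns `None`.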

This problem reproduces on the 23.04 Lunar kernel (6.2.0-20) and on the 22.04 Jammy HWE kernel (5.19.0-35), but not on the Jammy GA kernel (5.15.0-71.78).
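The affected and unaffected kernels above fall on either side of 5.19, the version that the Xen commit message quoted later in this report names as the first to size the XSAVEC area from CPUID. A small sketch with hypothetical helper names; it compares upstream version numbers only, so fixes or features backported into an older Ubuntu series are not accounted for.

```python
import re


def kernel_version(release: str) -> tuple:
    """Turn a `uname -r` string such as '5.19.0-35-generic' into (5, 19, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


# Assumption: threshold taken from the Xen commit message quoted in this
# report ("Linux 5.19 unhappy ... x86/fpu/xsave: Support XSAVEC in the kernel").
XSAVEC_SIZING_SINCE = (5, 19, 0)


def may_hit_xsavec_check(release: str) -> bool:
    """True if the upstream version is at or above the assumed threshold."""
    return kernel_version(release) >= XSAVEC_SIZING_SINCE
```

With the kernels from this report, `5.15.0-71-generic` falls below the threshold while `5.19.0-35-generic` and `6.2.0-20-generic` do not.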

I suspect a kernel regression introduced after 5.15: newer kernels may be stricter about XSAVE-related CPUID features, possibly as emulated by QEMU.

Related yet unanswered support question: https://askubuntu.com/questions/1467238/xsave-consistency-problem-preventing-avx-on-kvm-guests-running-jammy-hwe

This looks similar to the following kernel thread https://<email address hidden>/T/

Although I don't yet have access to the KVM hypervisor host, I'd like to know whether a workaround is possible, either by fixing the KVM/QEMU configuration or by upgrading KVM/QEMU.

Full diagnostics performed so far are available at https://github.com/orange-cloudfoundry/paas-templates/issues/1960#issuecomment-1534484042

ProblemType: Bug
DistroRelease: Ubuntu 23.04
Package: linux-image-6.2.0-20-generic 6.2.0-20.20
ProcVersionSignature: Ubuntu 6.2.0-20.20-generic 6.2.6
Uname: Linux 6.2.0-20-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 May 10 09:35 seq
 crw-rw---- 1 root audio 116, 33 May 10 09:35 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.26.1-0ubuntu2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: unknown
Date: Wed May 10 12:31:18 2023
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lspci: Error: [Errno 2] No such file or directory: 'lspci'
Lspci-vt: Error: [Errno 2] No such file or directory: 'lspci'
Lsusb: Error: [Errno 2] No such file or directory: 'lsusb'
Lsusb-t: Error: [Errno 2] No such file or directory: 'lsusb'
Lsusb-v: Error: [Errno 2] No such file or directory: 'lsusb'
MachineType: OpenStack Foundation OpenStack Nova
PciMultimedia:

ProcFB: 0 cirrusdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-20-generic root=UUID=03a4e7a1-715a-4ceb-8fa6-d86a817a7092 ro vconsole.keymap=us net.ifnames=0 biosdevname=0 crashkernel=auto selinux=0 plymouth.enable=0 console=ttyS0,115200n8 earlyprintk=ttyS0 rootdelay=300 ipv6.disable=1 audit=1 cgroup_enable=memory swapaccount=1 systemd.unified_cgroup_hierarchy=false quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-6.2.0-20-generic N/A
 linux-backports-modules-6.2.0-20-generic N/A
 linux-firmware 20230323.gitbcdcfbcf-0ubuntu1
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.release: 0.0
dmi.bios.vendor: SeaBIOS
dmi.bios.version: rel-1.10.2-0-g5f4c7b1-20181220_000000-szxrtosci10000
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-2.8
dmi.modalias: dmi:bvnSeaBIOS:bvrrel-1.10.2-0-g5f4c7b1-20181220_000000-szxrtosci10000:bd04/01/2014:br0.0:svnOpenStackFoundation:pnOpenStackNova:pvr13.2.1-20220808115737_bd245dd:cvnQEMU:ct1:cvrpc-i440fx-2.8:sku:
dmi.product.family: Virtual Machine
dmi.product.name: OpenStack Nova
dmi.product.version: 13.2.1-20220808115737_bd245dd
dmi.sys.vendor: OpenStack Foundation
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 May 10 09:35 seq
 crw-rw---- 1 root audio 116, 33 May 10 09:35 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.26.1-0ubuntu2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: unknown
DistroRelease: Ubuntu 23.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lspci: Error: [Errno 2] No such file or directory: 'lspci'
Lspci-vt: Error: [Errno 2] No such file or directory: 'lspci'
Lsusb: Error: [Errno 2] No such file or directory: 'lsusb'
Lsusb-t: Error: [Errno 2] No such file or directory: 'lsusb'
Lsusb-v: Error: [Errno 2] No such file or directory: 'lsusb'
MachineType: OpenStack Foundation OpenStack Nova
Package: linux (not installed)
PciMultimedia:

ProcFB: 0 cirrusdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-20-generic root=UUID=03a4e7a1-715a-4ceb-8fa6-d86a817a7092 ro vconsole.keymap=us net.ifnames=0 biosdevname=0 crashkernel=auto selinux=0 plymouth.enable=0 console=ttyS0,115200n8 earlyprintk=ttyS0 rootdelay=300 ipv6.disable=1 audit=1 cgroup_enable=memory swapaccount=1 systemd.unified_cgroup_hierarchy=false quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 6.2.0-20.20-generic 6.2.6
RelatedPackageVersions:
 linux-restricted-modules-6.2.0-20-generic N/A
 linux-backports-modules-6.2.0-20-generic N/A
 linux-firmware 20230323.gitbcdcfbcf-0ubuntu1
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: lunar
Uname: Linux 6.2.0-20-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: False
dmi.bios.date: 04/01/2014
dmi.bios.release: 0.0
dmi.bios.vendor: SeaBIOS
dmi.bios.version: rel-1.10.2-0-g5f4c7b1-20181220_000000-szxrtosci10000
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-2.8
dmi.modalias: dmi:bvnSeaBIOS:bvrrel-1.10.2-0-g5f4c7b1-20181220_000000-szxrtosci10000:bd04/01/2014:br0.0:svnOpenStackFoundation:pnOpenStackNova:pvr13.2.1-20220808115737_bd245dd:cvnQEMU:ct1:cvrpc-i440fx-2.8:sku:
dmi.product.family: Virtual Machine
dmi.product.name: OpenStack Nova
dmi.product.version: 13.2.1-20220808115737_bd245dd
dmi.sys.vendor: OpenStack Foundation

gberche (guillaume-berche) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2019108

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
gberche (guillaume-berche) wrote : Re: XSAVE consistency problem preventing avx on KVM guests

This looks similar to the following kernel thread https://<email address hidden>/T/
> Apparently "size 832 != kernel_size 0" so let the debugging continue...
[...]
> So we've actually found and fixed the issue, but XSAVE and therefore automatically gnarly.
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=c3bd0b83ea5b7c0da6542687436042eeea1e7909
>
> There is no real hardware with XSAVEC but not XSAVES; the spec does try
> to distinguish the two, and it's useful for virt to offer XSAVEC without
> XSAVES.
>
> CPUID.0xd[1].ebx is spec'd as the total size for XSAVES of all current
> XCR0|XSS states. This is known dodgy already for native, as it leaks
> the current MSR_XSS setting into userspace.
>
> I had written the logic originally to hide this dynamic field if XSAVES
> wasn't enumerated, but Linux now uses it if XSAVEC is enumerated, to
> cross-check what it can see elsewhere in the CPUID state.
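As a rough illustration of the cross-check described in this quote, here is a simplified Python model. It is not the kernel's code, and all names are hypothetical. It adds up a compacted XSAVE layout from per-feature sizes, the way CPUID leaf 0xD sub-leaf i describes them (EAX = size, ECX bit 1 = 64-byte alignment), and compares the total with the value reported in CPUID.0xD[1].EBX. A hypervisor that enumerates XSAVEC but leaves that field at 0 then produces a warning of the same shape as the one in this report.

```python
from typing import Iterable, Optional, Tuple

LEGACY_AREA = 512   # x87/SSE legacy region at the start of the XSAVE area
XSAVE_HEADER = 64   # XSAVE header that follows the legacy region


def compacted_size(components: Iterable[Tuple[int, int, bool]]) -> int:
    """Total size of a compacted XSAVE area (simplified model).

    `components` lists (feature_number, size_in_bytes, align64) for each
    enabled extended feature (numbers >= 2). In the compacted format the
    features are packed in feature-number order, and a feature flagged
    align64 starts on a 64-byte boundary.
    """
    offset = LEGACY_AREA + XSAVE_HEADER
    for _, size, align64 in sorted(components):
        if align64:
            offset = (offset + 63) & ~63
        offset += size
    return offset


def consistency_check(components, cpuid_d1_ebx: int) -> Optional[str]:
    """Compare the computed size with the CPUID-reported one."""
    size = compacted_size(components)
    if size != cpuid_d1_ebx:
        return f"XSAVE consistency problem: size {size} != kernel_size {cpuid_d1_ebx}"
    return None
```

With only AVX state enabled (feature 2, 256 bytes), the model gives 512 + 64 + 256 = 832 bytes, the same figure that appears in the Xen thread; a reported size of 0 then yields `size 832 != kernel_size 0`. The real kernel performs further checks that this sketch leaves out.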

On a guest VM running on the same hypervisor, we see that the xsavec feature is present but xsaves is not.
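The xsavec/xsaves observation can be reproduced from `/proc/cpuinfo`. A minimal sketch, assuming the usual `flags : ...` line format; `xsave_flags` is a hypothetical helper. Note that `/proc/cpuinfo` only lists features the running kernel kept enabled, so on a kernel that has turned XSAVE off, dependent flags such as avx may be missing as well; an unaffected guest gives a clearer view of what the hypervisor exposes.

```python
def xsave_flags(cpuinfo_text: str) -> dict:
    """Report XSAVE-related flags from the first 'flags' line of /proc/cpuinfo."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())
            break
    names = ("xsave", "xsaveopt", "xsavec", "xsaves", "avx", "avx2")
    report = {name: name in flags for name in names}
    # The combination discussed in the Xen thread: XSAVEC without XSAVES.
    report["xsavec_without_xsaves"] = report["xsavec"] and not report["xsaves"]
    return report
```

On a guest it could be called as `xsave_flags(open("/proc/cpuinfo").read())`.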

https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=c3bd0b83ea5b7c0da6542687436042eeea1e7909 mentions
> While the SDM isn't very clear about this, our present behavior make
> Linux 5.19 unhappy. As of commit 8ad7e8f69695 ("x86/fpu/xsave: Support
> XSAVEC in the kernel") they're using this CPUID output also to size
> the compacted area used by XSAVEC. Getting back zero there isn't really
> liked, yet for PV that's the default on capable hardware: XSAVES isn't
> exposed to PV domains.
>
> Considering that the size reported is that of the compacted save area,
> I view Linux'es assumption as appropriate (short of the SDM properly
> considering the case). Therefore we need to populate the field also when
> only XSAVEC is supported for a guest.
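The hypervisor-side change described in that commit can be summarized as a toy model of the value reported in CPUID.0xD[1].EBX. This is a hedged sketch of the Xen behavior as the commit message describes it; whether KVM/QEMU behave the same way for this guest is not established in this report, and the function name is hypothetical.

```python
def cpuid_d1_ebx(xsavec: bool, xsaves: bool, compacted_size: int, fixed: bool) -> int:
    """Toy model of what a hypervisor reports in CPUID.(EAX=0xD, ECX=1).EBX.

    fixed=False: the size field is populated only when XSAVES is exposed,
                 so a guest offered XSAVEC alone reads back 0.
    fixed=True:  the field is also populated when only XSAVEC is exposed,
                 as the quoted Xen commit proposes.
    """
    if xsaves or (fixed and xsavec):
        return compacted_size
    return 0
```

In this model, a guest that sees XSAVEC without XSAVES gets 0 before the change and the compacted size after it, which is the difference between triggering and avoiding the kernel's consistency warning.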

gberche (guillaume-berche) wrote : CurrentDmesg.txt

apport information

description: updated
tags: added: apport-collected
description: updated
gberche (guillaume-berche) wrote : ProcCpuinfoMinimal.txt

apport information

gberche (guillaume-berche) wrote : ProcEnviron.txt

apport information

gberche (guillaume-berche) wrote : ProcInterrupts.txt

apport information

gberche (guillaume-berche) wrote : ProcModules.txt

apport information

gberche (guillaume-berche) wrote : UdevDb.txt

apport information

gberche (guillaume-berche) wrote : WifiSyslog.txt

apport information

gberche (guillaume-berche) wrote : acpidump.txt

apport information

gberche (guillaume-berche) wrote : Re: XSAVE consistency problem preventing avx on KVM guests

> If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'

I'm running a VM image optimized for use within Cloud Foundry (see https://bosh.io/stemcells/#ubuntu-jammy), built for Jammy 22.04 and then upgraded to Lunar 23.04. I suspect this breaks some of the reports, although I have run the `apport-collect 2019108` command again.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
description: updated
summary: - XSAVE consistency problem preventing avx on KVM guests
+ XSAVE consistency problem disabling avx on KVM guests