[Xenial] KVM trusty guest 3.13.0-68 raid6-pq panic in raid6_avx21_gen_syndrome() while probing grub devices [was: Xenial KVM: updating Trusty guest from 3.13.0-68 to 3.13.0-71 causes kernel exception]
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) | | High | Unassigned | | |
Bug Description
The symptom I saw was this (note the segfault, and apt-get upgrade hangs after this):
Setting up linux-image-
Running depmod.
update-initramfs: deferring update (hook will be called later)
Examining /etc/kernel/
run-parts: executing /etc/kernel/
run-parts: executing /etc/kernel/
update-initramfs: Generating /boot/initrd.
run-parts: executing /etc/kernel/
run-parts: executing /etc/kernel/
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Segmentation fault
done
Setting up linux-firmware (1.127.19) ...
Setting up linux-image-
run-parts: executing /etc/kernel/
run-parts: executing /etc/kernel/
update-initramfs: Generating /boot/initrd.
run-parts: executing /etc/kernel/
run-parts: executing /etc/kernel/
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
In dmesg, I saw a corresponding kernel stack trace:
[ 522.649091] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
[ 522.654031] JFS: nTxBlock = 8192, nTxLock = 65536
[ 522.660515] NTFS driver 2.1.30 [Flags: R/O MODULE].
[ 522.672519] QNX4 filesystem 0.2.3 registered.
[ 522.677257] xor: measuring software checksum speed
[ 522.715613] prefetch64-sse: 17306.000 MB/sec
[ 522.755589] generic_sse: 16039.000 MB/sec
[ 522.755590] xor: using function: prefetch64-sse (17306.000 MB/sec)
[ 522.823619] raid6: sse2x1 10481 MB/s
[ 522.891614] raid6: sse2x2 13303 MB/s
[ 522.959616] raid6: sse2x4 15209 MB/s
[ 522.963634] invalid opcode: 0000 [#1] SMP
[ 522.963645] Modules linked in: raid6_pq(+) xor ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables snd_hda_intel snd_hda_codec snd_hwdep qxl snd_pcm kvm_intel ttm snd_page_alloc kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drm_kms_helper snd_timer aesni_intel snd aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd drm soundcore lp parport serio_raw i2c_piix4 mac_hid pata_acpi floppy psmouse
[ 522.963746] CPU: 2 PID: 11288 Comm: modprobe Not tainted 3.13.0-68-generic #111-Ubuntu
[ 522.963751] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-
[ 522.963755] task: ffff880059363000 ti: ffff88005dec6000 task.ti: ffff88005dec6000
[ 522.963759] RIP: 0010:[<
[ 522.963767] RSP: 0018:ffff88005d
[ 522.963771] RAX: 0000000000000000 RBX: ffff88005dec7c88 RCX: ffff880059363000
[ 522.963774] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000000000000012
[ 522.963778] RBP: ffff88005dec7c70 R08: 0000000000000086 R09: 000000000000025f
[ 522.963781] R10: 0000000000000000 R11: ffff88005dec79ae R12: 0000000000001000
[ 522.963785] R13: ffff880043a42000 R14: ffff880043a43000 R15: 0000000000000012
[ 522.963789] FS: 00007fa33082374
[ 522.963793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 522.963796] CR2: 00007fa330623000 CR3: 0000000036acf000 CR4: 00000000001006e0
[ 522.963801] Stack:
[ 522.963803] 0000000000000080 ffffffffa049e238 ffffffffa04b0720 ffff880043a42000
[ 522.963810] 0000000000003cd7 000000010000d992 ffff88005dec7d40 ffffffffa00d20fb
[ 522.963817] 0000000000000000 ffffffffa04a0600 ffffffffa04a1600 ffffffffa04a2600
[ 522.963824] Call Trace:
[ 522.963838] [<ffffffffa00d2
[ 522.963843] [<ffffffffa00d2
[ 522.963849] [<ffffffff81002
[ 522.963854] [<ffffffff81059
[ 522.963859] [<ffffffff810e2
[ 522.963863] [<ffffffff810de
[ 522.963868] [<ffffffff810e3
[ 522.963873] [<ffffffff81734
[ 522.963876] Code: 00 00 00 00 53 48 89 d3 48 83 ec 08 48 89 75 d0 4c 8b 2c c2 4c 8b 74 32 08 e8 13 f9 b7 e0 84 c0 0f 84 f1 00 00 00 e8 c6 f9 b7 e0 <c5> fd 6f 05 ee 2a 01 00 c5 e5 ef db 4d 85 e4 0f 84 c0 00 00 00
[ 522.963940] RIP [<ffffffffa049d
[ 522.963946] RSP <ffff88005dec7c40>
[ 522.963949] ---[ end trace 7324d498bc862f81 ]---
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-
ProcVersionSign
Uname: Linux 3.13.0-68-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version k3.13.0-68-generic.
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3.19
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/
CRDA: Error: [Errno 2] No such file or directory: 'iw'
Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
Card0.Amixer.
Date: Tue Dec 8 13:08:13 2015
HibernationDevice: RESUME=
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
ProcEnviron:
TERM=xterm-
PATH=(custom, no user)
XDG_RUNTIME_
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB: 0 qxldrmfb
ProcKernelCmdLine: BOOT_IMAGE=
RelatedPackageV
linux-
linux-
linux-firmware 1.127.19
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: Ubuntu-
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.
dmi.modalias: dmi:bvnSeaBIOS:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.
dmi.sys.vendor: QEMU
Mike Pontillo (mpontillo) wrote : | #1 |
summary: |
- Xenial KVM: updating guest from 3.13.0-68 to 3.13.0-71 causes kernel exception
+ Xenial KVM: updating Trusty guest from 3.13.0-68 to 3.13.0-71 causes kernel exception |
tags: | added: kernel-key |
Mike Pontillo (mpontillo) wrote : Re: Xenial KVM: updating Trusty guest from 3.13.0-68 to 3.13.0-71 causes kernel exception | #2 |
Mike Pontillo (mpontillo) wrote : | #3 |
Also, the hypervisor's /proc/cpuinfo reports the following CPU configuration (8 cores of the same):
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz
stepping : 3
microcode : 0x1e
cpu MHz : 2500.097
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs :
bogomips : 4988.49
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
And the hypervisor's kernel is version 4.2.0-19-generic.
This change was made by a bot.
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
Mike Pontillo (mpontillo) wrote : Re: Xenial KVM: updating Trusty guest from 3.13.0-68 to 3.13.0-71 causes kernel exception | #5 |
I just tried again with two logical CPUs and the core set to "Westmere" (I think that was the default), and didn't see the bug. So this seems related specifically to "[x] Copy host CPU configuration" being checked. Here's the /proc/cpuinfo from the "virtual Westmere" where it works:
Changed in linux (Ubuntu): | |
importance: | Undecided → High |
So the host exposing some of the physical cpu characteristics is triggering use of this specific syndrome generator (which makes sense as it is h/w specific) raid6_avx21_
summary: |
- Xenial KVM: updating Trusty guest from 3.13.0-68 to 3.13.0-71 causes kernel exception
+ [trusty] 3.13.0-68 raid6-pq panic in raid6_avx21_gen_syndrome() while probing grub devices (was: Xenial KVM: updating Trusty guest from 3.13.0-68 to 3.13.0-71 causes kernel exception) |
Andy Whitcroft (apw) wrote : | #7 |
I will note that this has thrown an illegal instruction trap while doing this:
[ 522.963634] invalid opcode: 0000 [#1] SMP
So is it likely that, despite offering up the avx2 flag to the guest, the host is not actually providing the instructions?
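To see which instruction trapped, the byte marked with angle brackets in the oops "Code:" line can be pulled out and inspected. The sketch below is only an illustration: the helper names are made up here, and it decodes just the 2-byte VEX prefix (0xC5) as it appears in 64-bit code, nothing else.

```python
import re

# "Code:" line copied from the oops in the bug description.
CODE_LINE = ("00 00 00 00 53 48 89 d3 48 83 ec 08 48 89 75 d0 4c 8b 2c c2 "
             "4c 8b 74 32 08 e8 13 f9 b7 e0 84 c0 0f 84 f1 00 00 00 e8 c6 "
             "f9 b7 e0 <c5> fd 6f 05 ee 2a 01 00 c5 e5 ef db 4d 85 e4 0f "
             "84 c0 00 00 00")


def faulting_bytes(code_line):
    """Return the bytes starting at the <xx> marker (the faulting RIP)."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        match = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if match:
            rest = [match.group(1)] + tokens[i + 1:]
            return bytes(int(t, 16) for t in rest)
    raise ValueError("no <xx> marker found in Code: line")


def describe_vex2(insn):
    """Decode the L and pp fields of a 2-byte VEX prefix, or return None."""
    if insn[0] != 0xC5:
        return None
    payload = insn[1]
    return {
        "vector_bits": 256 if payload & 0x04 else 128,  # VEX.L
        "simd_prefix": {0: None, 1: 0x66, 2: 0xF3, 3: 0xF2}[payload & 0x03],
        "opcode": insn[2],
    }


print(describe_vex2(faulting_bytes(CODE_LINE)))
```

For this oops the faulting bytes start with c5 fd 6f, which decodes as a 256-bit, 0x66-prefixed opcode 0x6f: a VEX-encoded vmovdqa into a ymm register. That is consistent with the trap happening on the first AVX instruction of the syndrome generator.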
summary: |
- [trusty] 3.13.0-68 raid6-pq panic in raid6_avx21_gen_syndrome() while probing grub devices (was: Xenial KVM: updating Trusty guest from 3.13.0-68 to 3.13.0-71 causes kernel exception)
+ [Xenial] KVM trusty guest 3.13.0-68 raid6-pq panic in raid6_avx21_gen_syndrome() while probing grub devices [was: Xenial KVM: updating Trusty guest from 3.13.0-68 to 3.13.0-71 causes kernel exception] |
Chris J Arges (arges) wrote : | #8 |
Ok I attempted to reproduce without success. What I tried:
- Xenial on Xenial on machine/guest with avx2
- Trusty 3.13.0-{68,71} on Xenial on machine/guest with avx2
In both of these instances I modprobed raid6_pq; in the Trusty instance I also triggered an upgrade from 68 to 71. Neither caused a segfault.
The following information would be useful in debugging this:
1) The differences between the guest's /proc/cpuinfo when the issue reproduces and when it does not. This can verify whether the avx2 bit is to blame.
2) Can you simply 'sudo modprobe -r raid6_pq && sudo modprobe raid6_pq' to trigger this issue?
3) Can you test with a Xenial guest and a similar configuration to see if this still triggers the issue?
4) If you have the ability, testing with different versions of the hypervisor (Trusty vs Xenial) might also be useful.
Thanks,
--chris j arges
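The comparison asked for in item 1 above can be sketched as a small script that diffs the "flags" lines of two /proc/cpuinfo dumps. The function names are hypothetical, and the sketch assumes both dumps were saved as text from the reproducing and non-reproducing guests.

```python
def cpu_flags(cpuinfo_text):
    """Return the flag set of the first processor in a /proc/cpuinfo dump."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flag_diff(failing_text, working_text):
    """Return (flags only in the failing guest, flags only in the working guest)."""
    bad, good = cpu_flags(failing_text), cpu_flags(working_text)
    return sorted(bad - good), sorted(good - bad)
```

Reading each saved dump with `open(path).read()` and passing both strings to `flag_diff` would show directly whether avx2 appears only in the failing configuration.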
Changed in linux (Ubuntu): | |
assignee: | nobody → Chris J Arges (arges) |
Mike Pontillo (mpontillo) wrote : | #9 |
On my Trusty VM with "[x] Copy host CPU configuration" checked in virt-manager, running 'sudo modprobe -r raid6_pq && sudo modprobe raid6_pq' is enough to reproduce the kernel exception in dmesg (though I didn't see it print "Segmentation Fault", so that may be somewhat of a red herring).
Here's /proc/cpuinfo on the guest:
http://
And here's /proc/cpuinfo on the hypervisor host:
http://
I have a separate issue (which I have not filed, because I haven't triaged, and am not sure I can reproduce yet) that was blocking me from creating a Xenial guest, but I can try again shortly.
Launchpad Janitor (janitor) wrote : | #10 |
Status changed to 'Confirmed' because the bug affects multiple users.
Changed in kvm (Ubuntu): | |
status: | New → Confirmed |
Serge Victor (ser) wrote : | #11 |
This affects me too: bootstrapping a Xenial VM on a Xenial host produces an oops like the one above.
# virt-install --name $SHORT \
--ram $MEM \
--vcpus $CPUS \
--location=http://
--disk path=$DISK \
--autostart --nographics \
Serge Victor (ser) wrote : | #12 |
To be honest, it's a pretty annoying bug, as I am not able to install any Xenial VM :-(
Mike Pontillo (mpontillo) wrote : | #13 |
@Serge, A workaround for me is to use "hypervisor default" in virt-manager. I'm not sure what the equivalent is in virt-install, but maybe using --cpu=host-
From the man page:
of the host CPUs features (better performance), but may cause issues if migrating the guest to a
host without an identical CPU.
--cpu host-model-only
used for a guest on any of the hosts.
Use --cpu=? to see a list of all available sub options. Complete details at
<http://
Serge Victor (ser) wrote : | #14 |
The bug persists in kernel 4.3.0-2-generic #11-Ubuntu.
@Mike - thanks for the workaround, it works indeed :-)
--cpu=host-
for virt-manager.
William Grant (wgrant) wrote : | #15 |
If you need avx2 support, --cpu Haswell-
The most confusing problem is that qemu's definition of "Haswell" is actually Haswell-E, -EP and -EX; Haswell itself lacks x2apic, which qemu's Haswell requires. x2apic dates back to Nehalem, but qemu's CPU definitions only include it back to Sandy Bridge, so a standard desktop or laptop Haswell CPU falls all the way back to Westmere and then adds flags including avx, avx2 and xsave on top of that[0].
Advertising support for AVX and AVX2 is just a matter of setting CPUID.1:ECX.AVX and CPUID.7:EBX.AVX2, but the instructions won't actually work unless XCR0.AVX is set, and kvm_load_guest_xcr0 only sets XCR0 from the guest if the guest's CR4.OSXSAVE is set. The guest's fpu__init_
The bug is probably that raid6_have_avx2 only checks that AVX2 is supported in CPUID, not that it's enabled. Checking for X86_FEATURE_OSXSAVE might work, though I'm not sure if the value checked by boot_cpu_has is stored too early for that.
I am also a little suspicious of kvm_load_
[0] "-cpu Westmere,
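The distinction drawn in the comment above, between AVX2 being advertised in CPUID and AVX2 being enabled, can be modelled as follows. This is a sketch of the usual CPUID/XCR0 detection sequence operating on raw register values supplied by the caller; it is not kernel code, and the function and constant names are made up for illustration.

```python
# Bit positions from the x86 architecture definition.
CPUID1_ECX_OSXSAVE = 1 << 27  # OS has set CR4.OSXSAVE
CPUID1_ECX_AVX = 1 << 28      # AVX advertised
CPUID7_EBX_AVX2 = 1 << 5      # AVX2 advertised (CPUID leaf 7, subleaf 0)
XCR0_SSE = 1 << 1             # XMM state enabled
XCR0_AVX = 1 << 2             # YMM state enabled


def avx2_advertised(cpuid7_ebx):
    """CPUID-only check: true whenever the AVX2 bit is exposed."""
    return bool(cpuid7_ebx & CPUID7_EBX_AVX2)


def avx2_usable(cpuid1_ecx, cpuid7_ebx, xcr0):
    """Advertised AND enabled: OSXSAVE set and XCR0 covers SSE and AVX state."""
    if not cpuid1_ecx & CPUID1_ECX_OSXSAVE:
        return False  # XCR0 is not under OS control, so YMM state is off
    if not cpuid1_ecx & CPUID1_ECX_AVX or not avx2_advertised(cpuid7_ebx):
        return False
    needed = XCR0_SSE | XCR0_AVX
    return (xcr0 & needed) == needed
```

With AVX and AVX2 advertised but OSXSAVE clear, `avx2_advertised` returns True while `avx2_usable` returns False, which matches the failure mode suspected here: a check based on CPUID alone would select the AVX2 code path even though the instructions would trap.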
William Grant (wgrant) wrote : | #16 |
One could argue that libvirt should exclude x2apic from the host-model checks, as it's emulated by qemu whether or not the host supports it.
no longer affects: | kvm (Ubuntu) |
tags: |
added: kernel-da-key removed: kernel-key |
Dr. Jens Harbott (j-harbott) wrote : | #17 |
I am seeing the same issue on some of my OpenStack compute nodes, interestingly those which seem to have a newer CPU than others.
Affected CPU: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
Mapped in guest to: Intel Core i7 9xx (Nehalem Class Core i7)
Unaffected Host CPU: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
Mapped in guest to: Intel Xeon E312xx (Sandy Bridge)
Booting a Xenial guest on the affected host crashes during boot, see http://
Both hosts are running nova-compute from Wily. Please let me know if you need any further details.
Blake Rouse (blake-rouse) wrote : | #18 |
Just hit this same issue with nova-compute on Xenial and creating a Xenial instance in nova. I worked around the issue with:
juju set-config nova-compute cpu-mode=
Dmitry Sutyagin (dsutyagin) wrote : | #19 |
Hit this issue when booting an Ubuntu Trusty VM on an Ubuntu host; worked around it by editing the VM XML and adding this to the <cpu> section:
<feature policy='disable' name='avx2'/>
With this change, plus a destroy and start of the VM, it booted fine.
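For several guests, the same XML edit could be scripted. The sketch below assumes the domain XML has already been obtained as a string (for example from `virsh dumpxml`) and already contains a <cpu> element; the function name is hypothetical, and the result would still need to be loaded back into libvirt and the guest restarted.

```python
import xml.etree.ElementTree as ET


def disable_cpu_feature(domain_xml, feature="avx2"):
    """Return domain XML with <feature policy='disable' name=feature/> in <cpu>."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        # Assumption: a bare feature list without a CPU mode or model is not
        # useful, so do not invent a <cpu> section here.
        raise ValueError("domain XML has no <cpu> section")
    for feat in cpu.findall("feature"):
        if feat.get("name") == feature:
            feat.set("policy", "disable")  # reuse an existing entry
            break
    else:
        ET.SubElement(cpu, "feature", {"policy": "disable", "name": feature})
    return ET.tostring(root, encoding="unicode")
```

The function is idempotent: running it twice on the same XML leaves a single avx2 entry.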
Changed in linux (Ubuntu): | |
assignee: | Chris J Arges (arges) → nobody |
shane (sygibson) wrote : | #20 |
I've hit this bug as well, OpenStack Nova Compute 2:12.0.
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
(NOTE "avx2")
CPU model is Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Guest running Ubuntu Xenial (16.04.1) - started up with the following arguments (note again "avx2"):
libvirt+ 7221 1 5 14:22 ? 00:01:14 qemu-system-x86_64 -enable-kvm -name instance-00000002 -S -machine pc-i440fx-
Symptom: the VM crashes on boot with a stack trace similar to the one documented here.
For what it's worth, unchecking "[ ] Copy host CPU configuration" in virt-manager and selecting "Hypervisor default" for the CPU is a workaround. (/proc/cpuinfo reports "QEMU Virtual CPU version 2.4.0")