[regression] Xenial host with 4.4.0-108.131 fails to power some KVM guests

Bug #1742286 reported by Simon Déziel
This bug affects 1 person
Affects          Status     Importance  Assigned to       Milestone
linux (Ubuntu)   Confirmed  Critical    Joseph Salisbury
Xenial           Confirmed  Critical    Joseph Salisbury

Bug Description

I have an (old) hypervisor with an Intel Xeon E3110 (Core 2 Duo E8400 rebranded) that is unable to boot 2 of my 17 KVM guests. The guests that boot fine are all Xenial (running 4.4.0-104.127). The 2 that do not work are: a Trusty VM (3.13.0-137.186) and an OpenBSD 6.2 VM (latest kernel).

With previous kernels, the host used these boot arguments:

  hugepages=3008 kaslr nmi_watchdog=0 possible_cpus=2 transparent_hugepage=never vsyscall=none

Now with 4.4.0-108.131, to be able to boot the Trusty and OpenBSD VMs, I had to drop most boot args to only have this:

  hugepages=3008

I will try to find which arg(s) is responsible for the guest boot problem.
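
The candidate arguments are the ones present on the old command line but absent from the one that works. A minimal Python sketch of that comparison, using the two argument lists quoted above; the helper name `dropped_args` is hypothetical and not part of any existing tool:

```python
from typing import List


def dropped_args(before: str, after: str) -> List[str]:
    """Return arguments present in `before` but missing from `after`,
    keeping the order in which they appeared in `before`."""
    kept = set(after.split())
    return [arg for arg in before.split() if arg not in kept]


# Boot arguments as quoted in this report.
previous = ("hugepages=3008 kaslr nmi_watchdog=0 possible_cpus=2 "
            "transparent_hugepage=never vsyscall=none")
working = "hugepages=3008"

# Each entry is a suspect to re-add one at a time, rebooting in between.
suspects = dropped_args(previous, working)
print(suspects)
```

Re-adding the suspects one at a time (or half at a time) and rebooting narrows the list down to the argument that breaks the guests.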

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-108-generic 4.4.0-108.131
ProcVersionSignature: Ubuntu 4.4.0-108.131-generic 4.4.98
Uname: Linux 4.4.0-108-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
AlsaDevices:
 total 0
 crw-rw----+ 1 root audio 116, 1 Jan 9 15:28 seq
 crw-rw----+ 1 root audio 116, 33 Jan 9 15:28 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: [Errno 2] No such file or directory: 'fuser'
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Date: Tue Jan 9 15:54:18 2018
HibernationDevice: RESUME=UUID=36924004-26f8-48e3-abe3-41be311bd7af
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: System manufacturer P5E-VM HDMI
PciMultimedia:

ProcEnviron:
 LANGUAGE=en_CA:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_CA.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-108-generic root=UUID=40a302c9-82d9-4574-ac2f-efcf5c9d126f ro hugepages=3008
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-108-generic N/A
 linux-backports-modules-4.4.0-108-generic N/A
 linux-firmware 1.157.14
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/26/2010
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 0709
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: P5E-VM HDMI
dmi.board.vendor: ASUSTeK Computer INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr0709:bd03/26/2010:svnSystemmanufacturer:pnP5E-VMHDMI:pvrSystemVersion:rvnASUSTeKComputerINC.:rnP5E-VMHDMI:rvrRev1.xx:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.name: P5E-VM HDMI
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

Simon Déziel (sdeziel) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Simon Déziel (sdeziel) wrote :

xeon is the host/hypervisor and it has the following boot args:

sdeziel@xeon:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.4.0-108-generic root=UUID=40a302c9-82d9-4574-ac2f-efcf5c9d126f ro hugepages=3008 kaslr nmi_watchdog=0 possible_cpus=2 transparent_hugepage=never vsyscall=none

The Trusty VM hangs during boot:

sdeziel@xeon:~$ virsh start --console smtp # Trusty VM
Domain smtp started
Connected to domain smtp
Escape character is ^]
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] Linux version 3.13.0-137-generic (buildd@lgw01-amd64-058) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) ) #186-Ubuntu SMP Mon Dec 4 19:09:19 UTC 2017 (Ubuntu 3.13.0-137.186-generic 3.13.11-ckt39)
[ 0.000000] Command line: root=UUID=4c6c4ccd-99e2-43fa-af96-91bdeb1258aa ro console=ttyS0
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Centaur CentaurHauls
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000017fdffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000017fe0000-0x0000000017ffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 2.8 present.
[ 0.000000] Hypervisor detected: KVM
[ 0.000000] No AGP bridge found
[ 0.000000] e820: last_pfn = 0x17fe0 max_arch_pfn = 0x400000000
[ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[ 0.000000] found SMP MP-table at [mem 0x000f6630-0x000f663f] mapped at [ffff8800000f6630]
[ 0.000000] Scanning 1 areas for low memory corruption
[ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.000000] init_memory_mapping: [mem 0x17800000-0x179fffff]
[ 0.000000] init_memory_mapping: [mem 0x14000000-0x177fffff]
[ 0.000000] init_memory_mapping: [mem 0x00100000-0x13ffffff]
[ 0.000000] init_memory_mapping: [mem 0x17a00000-0x17fdffff]
[ 0.000000] RAMDISK: [mem 0x17a62000-0x17fcffff]
[ 0.000000] ACPI: RSDP 00000000000f6450 000014 (v00 BOCHS )
[ 0.000000] ACPI: RSDT 0000000017fe16fa 000034 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: FACP 0000000017fe0c14 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
[ 0.000000] ACPI: DSDT 0000000017fe0040 000BD4 (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: FACS 0000000017fe0000 000040
[ 0.000000] ACPI: SSDT 0000000017fe0c88 0009C2 (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: APIC 0000000017fe164a 000078 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[ 0.000000] ACPI: HPET 0000000017fe16c2 000038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[ 0.000000] No NUMA configurat...

description: updated
Simon Déziel (sdeziel) wrote :

After multiple reboots, it turns out that the problematic host boot arg is "nmi_watchdog=0".

If I boot the host with only these args:

sdeziel@xeon:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.4.0-108-generic root=UUID=40a302c9-82d9-4574-ac2f-efcf5c9d126f ro hugepages=3008 kaslr possible_cpus=2 transparent_hugepage=never vsyscall=none

All 17 guests boot fine.
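
To check whether a given host is booted with the problematic argument, its kernel command line can be split into key/value pairs. A rough sketch, assuming Python 3 is available; `parse_cmdline` and `has_nmi_watchdog_disabled` are hypothetical helper names, and the plain whitespace split does not handle quoted values. On a live host the input string would be the contents of /proc/cmdline; here the two command lines from this report are used instead:

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Map each boot argument to its value, or to None for bare flags."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        # Split on the first "=" only, so root=UUID=... keeps its full value.
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def has_nmi_watchdog_disabled(cmdline: str) -> bool:
    """True when the command line contains nmi_watchdog=0."""
    return parse_cmdline(cmdline).get("nmi_watchdog") == "0"


# Command lines copied from this report.
failing = ("BOOT_IMAGE=/boot/vmlinuz-4.4.0-108-generic "
           "root=UUID=40a302c9-82d9-4574-ac2f-efcf5c9d126f ro hugepages=3008 "
           "kaslr nmi_watchdog=0 possible_cpus=2 transparent_hugepage=never "
           "vsyscall=none")
working = ("BOOT_IMAGE=/boot/vmlinuz-4.4.0-108-generic "
           "root=UUID=40a302c9-82d9-4574-ac2f-efcf5c9d126f ro hugepages=3008 "
           "kaslr possible_cpus=2 transparent_hugepage=never vsyscall=none")

for label, line in (("failing", failing), ("working", working)):
    print(label, has_nmi_watchdog_disabled(line))
```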

Simon Déziel (sdeziel) wrote :

For what it's worth, a similar setup but using an AMD CPU (C-60) has no problem with nmi_watchdog=0.

Joseph Salisbury (jsalisbury) wrote :

Can you see if this bug happens with the following kernel:

http://kernel.ubuntu.com/~jsalisbury/lp1741934/

Changed in linux (Ubuntu):
importance: Undecided → Critical
Changed in linux (Ubuntu Xenial):
importance: Undecided → Critical
status: New → Confirmed
tags: added: kernel-key
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Emilio (ereus004) wrote :

I cannot boot the system on a desktop computer (physical, not virtual). The system freezes and I need to press the reset button to restart it.
I have an i5-4460 processor. I went back to the 4.4.0-104 generic kernel to be able to boot my computer.
