Frequent kernel panic crashes in Linux KVM guest (happens on different host hardware, so hardware fault unlikely)

Bug #1643533 reported by ellie
This bug affects 1 person
Affects         Status   Importance  Assigned to  Milestone
linux (Ubuntu)  Expired  High        Unassigned
Xenial          Expired  High        Unassigned

Bug Description

With Ubuntu 16.04, I get frequent kernel panic crashes in my Linux KVM guest. After I notified the hosting company, the machine was migrated to a different physical host, which didn't fix the problem. A complete reinstall of Ubuntu 16.04 didn't fix it either. I therefore suspect a bug in the kernel rather than a hardware fault or a corrupted install.

/proc/version_signature: Ubuntu 4.4.0-47.68-generic 4.4.24

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-47-generic 4.4.0-47.68
ProcVersionSignature: Ubuntu 4.4.0-47.68-generic 4.4.24
Uname: Linux 4.4.0-47-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Nov 21 12:51 seq
 crw-rw---- 1 root audio 116, 33 Nov 21 12:51 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: [Errno 2] No such file or directory: 'fuser'
Date: Mon Nov 21 12:57:11 2016
HibernationDevice: RESUME=UUID=704a373a-9754-4c59-bff2-001601f1c0bf
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd
 Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: Hetzner vServer
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-47-generic root=UUID=fc92f1ac-ee98-4991-8a92-93451bd36d56 ro nomodeset elevator=noop net.ifnames=0
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-47-generic N/A
 linux-backports-modules-4.4.0-47-generic N/A
 linux-firmware 1.157.4
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: 1.8.2
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-2.6
dmi.modalias: dmi:bvnSeaBIOS:bvr1.8.2:bd04/01/2014:svnHetzner:pnvServer:pvr2:cvnQEMU:ct1:cvrpc-i440fx-2.6:
dmi.product.name: vServer
dmi.product.version: 2
dmi.sys.vendor: Hetzner

Revision history for this message
ellie (et1234567) wrote :

The kernel panic locks up the system instantly, so the error is never logged to disk. Please see the attached screenshot of the QEMU screen taken after the crash.

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.9 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.9-rc6

Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
status: New → Incomplete
importance: Undecided → High
tags: added: kernel-da-key
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
ellie (et1234567) wrote :

For what it's worth, I have been running 4.9.0-040900rc7-generic for a few minutes now. The bug has always hit me after 2-5 days and never later than a week, so in about a week at the earliest (around the 9th of December) I can tell you with high certainty whether the bug is present upstream.

Revision history for this message
ellie (et1234567) wrote :

It appears that the issue is NOT present in mainline 4.9-rc7. Since the bug always appeared at a varying interval of 2-5 days, I will wait a few more days before marking this 'kernel-fixed-upstream', to get a higher level of confidence.

Revision history for this message
ellie (et1234567) wrote :

I am now very confident that this bug is NOT present in 4.9-rc7, so I am marking it kernel-fixed-upstream.

tags: added: kernel-fixed-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Xenial):
status: Incomplete → Confirmed
Revision history for this message
ellie (et1234567) wrote :

I got the impression from my virtual server hosting provider (which runs the hypervisor) that other customers don't have this issue.

I was also asked about my use of BTRFS as a filesystem. I also use Docker very heavily, which makes heavy use of special BTRFS, DeviceMapper and cgroup features, and that may be something not every other customer does. I am adding this information in case it is helpful for tracking down the cause of this crash.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

We can perform a "Reverse" kernel bisect to identify the commit that fixes this in v4.9-rc7.

We first need to identify the last bad kernel and the first good one. Can you test the v4.9-rc1 kernel to narrow down the affected versions further? It can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.9-rc1/

If v4.9-rc1 is also good, we would want to test 4.8 final. If it is bad, we would want to test some of the newer 4.9 release candidates.
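
The narrowing procedure described here amounts to a binary search over an ordered list of kernel builds, looking for the first build where the crash no longer occurs. The following is only a minimal sketch of that idea, not a tool used by the kernel team. The names `VERSIONS` and `first_good_version` are hypothetical. Only the two endpoints (Ubuntu 4.4.0-47 bad, v4.9-rc7 good) come from this report; the intermediate entries are an illustrative, incomplete selection of builds, and the `is_good` callback stands in for the manual step of booting a kernel and waiting about a week for a crash.

```python
from typing import Callable, Sequence

# Builds ordered oldest to newest. The first entry is known bad and the last
# is known good according to this report; everything in between is an
# illustrative candidate list, not a complete list of releases.
VERSIONS = [
    "Ubuntu 4.4.0-47",  # known bad (crashes within 2-5 days)
    "v4.8",
    "v4.9-rc1",
    "v4.9-rc2",
    "v4.9-rc3",
    "v4.9-rc4",
    "v4.9-rc5",
    "v4.9-rc6",
    "v4.9-rc7",         # known good (no crash observed)
]


def first_good_version(
    versions: Sequence[str], is_good: Callable[[str], bool]
) -> str:
    """Return the earliest build that tests good.

    Assumes versions[0] is bad, versions[-1] is good, and that once the
    fix is present it stays present in all later builds.
    """
    last_bad = 0                     # index of the newest build known bad
    first_good = len(versions) - 1   # index of the oldest build known good
    while first_good - last_bad > 1:
        mid = (last_bad + first_good) // 2
        if is_good(versions[mid]):
            first_good = mid         # fix is at or before mid
        else:
            last_bad = mid           # fix is after mid
    return versions[first_good]
```

With nine candidates, at most three or four builds need to be tested, which matters here because each test takes up to a week of uptime. Once the last bad and first good builds are adjacent, the same search can continue over the commits between them.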

Thanks in advance!

Changed in linux (Ubuntu Xenial):
status: Confirmed → Incomplete
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Xenial) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Xenial):
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired