qemu VM crashes the host

Bug #1367932 reported by Simon Déziel
This bug affects 1 person
Affects          Status    Importance   Assigned to      Milestone
linux (Ubuntu)   Invalid   Medium       Chris J Arges

Bug Description

My qemu host keeps crashing every once in a while and this seems to happen when there is some load and/or network traffic happening in the 2 guests it powers.

This time I was able to capture a crash trace:

Sep 10 16:49:45 ocelot kernel: [204709.569951] general protection fault: 0000 [#1] SMP
Sep 10 16:49:45 ocelot kernel: [204709.570004] Modules linked in: vhost_net vhost macvtap macvlan bridge nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_LOG xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner xt_conntrack nf_conntrack iptable_filter ip_tables x_tables eeepc_wmi asus_wmi sparse_keymap video 8021q garp stp mrp llc radeon kvm_amd kvm ttm drm_kms_helper drm sp5100_tco serio_raw k10temp i2c_algo_bit i2c_piix4 wmi mac_hid configfs nls_iso8859_1 hid_generic usbhid hid psmouse r8169 mii ahci libahci
Sep 10 16:49:45 ocelot kernel: [204709.570337] CPU: 0 PID: 2944 Comm: qemu-system-x86 Not tainted 3.13.0-36-generic #63-Ubuntu
Sep 10 16:49:45 ocelot kernel: [204709.570375] Hardware name: System manufacturer System Product Name/C60M1-I, BIOS 0305 08/07/2012
Sep 10 16:49:45 ocelot kernel: [204709.570414] task: ffff8802120f1800 ti: ffff88020f428000 task.ti: ffff88020f428000
Sep 10 16:49:45 ocelot kernel: [204709.570446] RIP: 0010:[<ffffffff81092ef8>] [<ffffffff81092ef8>] preempt_notifier_unregister+0x18/0x40
Sep 10 16:49:45 ocelot kernel: [204709.570497] RSP: 0018:ffff88020f429e00 EFLAGS: 00010206
Sep 10 16:49:45 ocelot kernel: [204709.570523] RAX: 0020000000000000 RBX: ffff88020f430000 RCX: 0000000000000176
Sep 10 16:49:45 ocelot kernel: [204709.572483] RDX: ffff8802120f1a18 RSI: 0000000081730c20 RDI: ffff88020f430008
Sep 10 16:49:45 ocelot kernel: [204709.574491] RBP: ffff88020f429e00 R08: ffffffffa013e180 R09: 0000000000000000
Sep 10 16:49:45 ocelot kernel: [204709.576508] R10: 00ffffffffffffff R11: 0000000000000000 R12: 0000000000000000
Sep 10 16:49:45 ocelot kernel: [204709.578511] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Sep 10 16:49:45 ocelot kernel: [204709.580492] FS: 00007f77aefe2700(0000) GS:ffff88021ec00000(0000) knlGS:0000000000000000
Sep 10 16:49:45 ocelot kernel: [204709.582461] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 10 16:49:45 ocelot kernel: [204709.584402] CR2: 000009f39cad593d CR3: 00000002110b6000 CR4: 00000000000007f0
Sep 10 16:49:45 ocelot kernel: [204709.586351] Stack:
Sep 10 16:49:45 ocelot kernel: [204709.588261] ffff88020f429e18 ffffffffa0141d8b ffff88020f430000 ffff88020f429ec0
Sep 10 16:49:45 ocelot kernel: [204709.590194] ffffffffa0141eae ffffffff810d8e51 000000010f429e78 00007f77bf54c000
Sep 10 16:49:45 ocelot kernel: [204709.592119] ffff880212f14a80 00000000000001c0 00007f77bf54c1c0 0000000000000081
Sep 10 16:49:45 ocelot kernel: [204709.594027] Call Trace:
Sep 10 16:49:45 ocelot kernel: [204709.595921] [<ffffffffa0141d8b>] vcpu_put+0x1b/0x30 [kvm]
Sep 10 16:49:45 ocelot kernel: [204709.597806] [<ffffffffa0141eae>] kvm_vcpu_ioctl+0x10e/0x5b0 [kvm]
Sep 10 16:49:45 ocelot kernel: [204709.599645] [<ffffffff810d8e51>] ? futex_wake+0x1b1/0x1d0
Sep 10 16:49:45 ocelot kernel: [204709.601449] [<ffffffff81636f85>] ? sk_run_filter+0x295/0x700
Sep 10 16:49:45 ocelot kernel: [204709.603222] [<ffffffff811d0390>] do_vfs_ioctl+0x2e0/0x4c0
Sep 10 16:49:45 ocelot kernel: [204709.604957] [<ffffffff8110dfa3>] ? __secure_computing+0x73/0x260
Sep 10 16:49:45 ocelot kernel: [204709.606676] [<ffffffff811d05f1>] SyS_ioctl+0x81/0xa0
Sep 10 16:49:45 ocelot kernel: [204709.608358] [<ffffffff8172f1bf>] tracesys+0xe1/0xe6
Sep 10 16:49:45 ocelot kernel: [204709.610003] Code: 05 18 02 00 00 48 89 47 08 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 07 48 8b 57 08 55 48 85 c0 48 89 e5 48 89 02 74 04 <48> 89 50 08 48 b8 00 01 10 00 00 00 ad de 48 89 07 48 b8 00 02
Sep 10 16:49:45 ocelot kernel: [204709.613642] RIP [<ffffffff81092ef8>] preempt_notifier_unregister+0x18/0x40
Sep 10 16:49:45 ocelot kernel: [204709.615392] RSP <ffff88020f429e00>
Sep 10 16:49:45 ocelot kernel: [204709.622204] ---[ end trace 8804eab7229156e0 ]---
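
One possible reading of the trace: the faulting bytes marked "<48> 89 50 08" in the Code line appear to decode to a store through RAX (mov %rdx,0x8(%rax)), and RAX holds 0x0020000000000000. On x86-64 a memory access through a non-canonical address raises a general protection fault rather than a page fault, which would match the first line of the trace. The sketch below checks the register values for canonical form. It assumes 48-bit virtual addressing, which is an assumption about this CPU and not something the report states; the helper name is_canonical is made up for illustration.

```python
# Sketch: test whether a 64-bit value is a canonical x86-64 virtual address.
# ASSUMPTION: 48-bit virtual addressing (not stated in the report).

def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) are all zeros or all ones."""
    top = addr >> (va_bits - 1)                # bits 63..47 when va_bits == 48
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits when va_bits == 48
    return top == 0 or top == all_ones

# Register values copied from the trace above.
registers = {
    "RAX": 0x0020000000000000,
    "RDX": 0xFFFF8802120F1A18,
    "RDI": 0xFFFF88020F430008,
}

for name, value in registers.items():
    print(f"{name}={value:#018x} canonical={is_canonical(value)}")
```

Under that assumption only RAX fails the check. This says nothing about how the bad pointer got there; it only narrows down which operand triggered the fault.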

CPU 0 is powering a 64-bit OpenBSD 5.5 guest that is pinned to this CPU. At the time of the crash, the other guest (64-bit Trusty) was running and pinned to CPU 1.

$ lsb_release -rd
Description: Ubuntu 14.04.1 LTS
Release: 14.04

# Note: I am currently running a kernel from -proposed, but the crash also happened before I installed it.
$ apt-cache policy linux-image-3.13.0-36-generic qemu-system-x86
linux-image-3.13.0-36-generic:
  Installed: 3.13.0-36.63
  Candidate: 3.13.0-36.63
  Version table:
 *** 3.13.0-36.63 0
        500 http://archive.ubuntu.com/ubuntu/ trusty-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
qemu-system-x86:
  Installed: 2.0.0+dfsg-2ubuntu1.3
  Candidate: 2.0.0+dfsg-2ubuntu1.3
  Version table:
 *** 2.0.0+dfsg-2ubuntu1.3 0
        500 http://archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu/ trusty-security/main amd64 Packages
        500 http://archive.ubuntu.com/ubuntu/ trusty-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     2.0.0~rc1+dfsg-0ubuntu3 0
        500 http://archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-36-generic 3.13.0-36.63
ProcVersionSignature: Ubuntu 3.13.0-36.63-generic 3.13.11.6
Uname: Linux 3.13.0-36-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Sep 10 16:58 seq
 crw-rw---- 1 root audio 116, 33 Sep 10 16:58 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3.4
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
Date: Wed Sep 10 17:11:33 2014
HibernationDevice: RESUME=/dev/mapper/vgocelot-swap
InstallationDate: Installed on 2014-07-30 (42 days ago)
InstallationMedia: Ubuntu-Server 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.3)
MachineType: System manufacturer System Product Name
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-36-generic root=/dev/mapper/vgocelot-root ro noswapaccount possible_cpus=2 nodelayacct debug ignore_loglevel
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-36-generic N/A
 linux-backports-modules-3.13.0-36-generic N/A
 linux-firmware 1.127.5
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/07/2012
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 0305
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: C60M1-I
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr0305:bd08/07/2012:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnC60M1-I:rvrRevX.0x:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

Simon Déziel (sdeziel) wrote :

Attaching the libvirt definition of the 2 guests

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue occur in a previous version of Ubuntu, or is this a new issue?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.17 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.17-rc4-utopic/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: kernel-da-key
Simon Déziel (sdeziel) wrote : Re: [Bug 1367932] Re: qemu VM crashes the host

On 09/10/2014 06:03 PM, Joseph Salisbury wrote:
> Did this issue occur in a previous version of Ubuntu, or is this a new
> issue?

The problem also happened when the host was running Saucy and the
OpenBSD guest was at version 5.4.

Looking at the trace more closely, I noticed
"__secure_computing+0x73/0x260". Could it be related to the seccomp
filtering that I enabled on QEMU (I had that on with Saucy too)?
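
A note on reading the trace: in a kernel call trace, a "?" before a symbol means the kernel is not sure that address is part of the actual call chain (it was found on the stack but may be stale). The sketch below separates the two kinds of entries for the frames in this report. The regular expression and the function name split_frames are illustrative assumptions, written for the "[<addr>] symbol+off/len" format shown above and not for every kernel version.

```python
import re

# Matches frames such as "[<ffffffff810d8e51>] ? futex_wake+0x1b1/0x1d0".
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<unsure>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def split_frames(trace_lines):
    """Return (reliable, unsure) symbol lists, in trace order."""
    reliable, unsure = [], []
    for line in trace_lines:
        m = FRAME_RE.search(line)
        if not m:
            continue  # not a call-trace frame
        (unsure if m.group("unsure") else reliable).append(m.group("sym"))
    return reliable, unsure

# Call Trace frames copied from the report (syslog prefix dropped).
trace = """\
[<ffffffffa0141d8b>] vcpu_put+0x1b/0x30 [kvm]
[<ffffffffa0141eae>] kvm_vcpu_ioctl+0x10e/0x5b0 [kvm]
[<ffffffff810d8e51>] ? futex_wake+0x1b1/0x1d0
[<ffffffff81636f85>] ? sk_run_filter+0x295/0x700
[<ffffffff811d0390>] do_vfs_ioctl+0x2e0/0x4c0
[<ffffffff8110dfa3>] ? __secure_computing+0x73/0x260
[<ffffffff811d05f1>] SyS_ioctl+0x81/0xa0
[<ffffffff8172f1bf>] tracesys+0xe1/0xe6
""".splitlines()

reliable, unsure = split_frames(trace)
print("reliable:", reliable)
print("unsure:  ", unsure)
```

Both seccomp-related symbols (__secure_computing and sk_run_filter) land in the "unsure" list, so the trace alone does not show that seccomp code was executing at the time of the fault.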

> Would it be possible for you to test the latest upstream kernel? Refer
> to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
> v3.17 kernel[0].

I will test the latest upstream eventually but I just noticed a newer
BIOS/EFI was available. The changelog says "Improve system stability"
so I'll give that a try and see if I can reproduce the issue with
Ubuntu's stock kernel and this new BIOS/EFI.

Thanks Joseph

Chris J Arges (arges) wrote :

@sdeziel:

Hi,

How often does this occur?
Have you been able to reproduce this issue without pinning your VMs to cpus?
Can you still reproduce the issue after disabling seccomp filtering?
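
To confirm which of these two settings is actually in effect for a running guest, the Cpus_allowed_list and Seccomp fields of /proc/<pid>/status can be inspected. The following is a minimal sketch, not part of any QEMU or libvirt tooling: the function names are made up, and the sample text is a hypothetical excerpt, not output captured from the reporter's host.

```python
# Sketch: summarize CPU affinity and seccomp mode from /proc/<pid>/status text.

SECCOMP_MODES = {"0": "disabled", "1": "strict", "2": "filter"}

def parse_status(text):
    """Turn 'Key:\\tvalue' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def summarize(text):
    fields = parse_status(text)
    return {
        "cpus_allowed": fields.get("Cpus_allowed_list", "unknown"),
        "seccomp": SECCOMP_MODES.get(fields.get("Seccomp", ""), "unknown"),
    }

# HYPOTHETICAL sample; on a real host read open(f"/proc/{pid}/status").read()
sample = "Name:\tqemu-system-x86\nSeccomp:\t2\nCpus_allowed_list:\t0\n"
print(summarize(sample))
```

Affinity is tracked per thread, so for pinned vCPUs the files under /proc/<pid>/task/<tid>/status are the ones to check.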

Another approach would be to get a full system crash using the following:
https://wiki.ubuntu.com/Kernel/CrashdumpRecipe

Thanks for your bug report,

Simon Déziel (sdeziel) wrote :

Hi Chris,

On 09/11/2014 01:16 PM, Chris J Arges wrote:
> How often does this occur?

It depends: can be once a week or 3 days in a row. It almost always
happened at night (during backup jobs).

> Have you been able to reproduce this issue without pinning your VMs to cpus?
> Can you still reproduce the issue after disabling seccomp filtering?

Good idea, I will try to play with both settings but I will first try to
reproduce it with the newer BIOS/EFI version.

> Another approach would be to get a full system crash using the following:
> https://wiki.ubuntu.com/Kernel/CrashdumpRecipe

Interesting, I am enabling this now to capture a useful dump at the next crash.

Thanks!
Simon

penalvch (penalvch)
tags: added: bios-outdated-0502
tags: added: saucy
Simon Déziel (sdeziel) wrote :

On 09/12/2014 07:33 AM, Christopher M. Penalver wrote:
> ** Tags added: bios-outdated-0502

I'm not sure about this flag since the outdated BIOS version was 305 and
the 502 one is the latest.

Chris J Arges (arges)
Changed in linux (Ubuntu):
assignee: nobody → Chris J Arges (arges)
Simon Déziel (sdeziel) wrote :

No crashes since I updated the BIOS to version 502 and installed linux-crashdump. It's been 11 days since the last crash, so maybe the BIOS update was the fix. I'll wait a few more days and then remove linux-crashdump, just to see whether it has any impact.

Simon Déziel (sdeziel) wrote :

Finally marking as invalid since the BIOS upgrade fixed the problem.

Changed in linux (Ubuntu):
status: Incomplete → Invalid