VM shuts down due to error in qemu block.c

Bug #1652011 reported by Andrew Parker
20
This bug affects 4 people
Affects Status Importance Assigned to Milestone
qemu (Fedora)
Won't Fix
Medium
qemu (Ubuntu)
Confirmed
Undecided
Unassigned

Bug Description

On a Trusty KVM host one of the guest VMs shut down without any user interaction. The system is running:

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.5 LTS"

$ dpkg -l libvirt0 qemu-kvm qemu-system-common qemu-system-x86
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-============================================================-===================================-===================================-==============================================================================================================================
ii libvirt0 1.2.2-0ubuntu13.1.17 amd64 library for interfacing with different virtualization systems
ii qemu-kvm 2.0.0+dfsg-2ubuntu1.27 amd64 QEMU Full virtualization
ii qemu-system-common 2.0.0+dfsg-2ubuntu1.27 amd64 QEMU full system emulation binaries (common files)
ii qemu-system-x86 2.0.0+dfsg-2ubuntu1.27 amd64 QEMU full system emulation binaries (x86)

In the VMs log in /var/lib/libvirt/qemu/hostname we see:

2016-11-17 09:18:42.603+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name hostname,process=qemu:hostname -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 12697 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 51766564-ed8e-41aa-91b5-574220af4ac3 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/hostname.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/disk1/hostname,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/disk2/hostname_mnt_data,if=none,id=drive-virtio-disk1,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/dev/disk1/hostname_tmp,if=none,id=drive-virtio-disk2,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk2,id=virtio-disk2 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=30 -device virtio-net-pci,tx=bh,netdev=hostnet0,id=net0,mac=52:54:00:45:e7:d9,bus=pci.0,addr=0x6 -netdev tap,fd=31,id=hostnet1,vhost=on,vhostfd=32 -device virtio-net-pci,tx=bh,netdev=hostnet1,id=net1,mac=52:54:00:f6:6c:77,bus=pci.0,addr=0x7 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 0.0.0.0:5 -device VGA,id=video0,bus=pci.0,addr=0x2 -device AC97,id=sound0,bus=pci.0,addr=0x3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9
char device redirected to /dev/pts/6 (label charserial0)
qemu-system-x86_64: /build/qemu-PVxDqC/qemu-2.0.0+dfsg/block.c:3491: bdrv_error_action: Assertion `error >= 0' failed.
2016-12-22 09:49:49.392+0000: shutting down

In /var/lib/libvirt/libvirtd.log we see:

2016-12-22 09:49:49.298+0000: 6946: error : qemuMonitorIO:656 : internal error: End of file from monitor

We investigated to see if this is a known issue and came across a bug report for Fedora (https://bugzilla.redhat.com/show_bug.cgi?id=1147398), but nothing references changes upstream that fix this.

The guest OS is Ubuntu Precise (12.04.5) running kernel linux-image-3.2.0-101-virtual 3.2.0-101.141. There wasn't any significant load (CPU or IO) on the guest at the time that it shut down and there wasn't any appreciable disk IO on the KVM host either. The disks for the guest are on the KVM host box.

Revision history for this message
In , Evan (evan-redhat-bugs) wrote :

Description of problem:
VM Guests occasionally hard shutdown unexpectedly with error:

"qemu-system-x86_64: block.c:2806: bdrv_error_action: Assertion `error >= 0' failed."

In my environment it only appears to be my Windows 2008R2 guests that are affected, although I know of another environment with SLES guests that are affected. As per email here: http://lists.gnu.org/archive/html/qemu-discuss/2014-06/msg00094.html

Version-Release number of selected component (if applicable):

  * qemu-system-x86-1.6.2-5.fc20.x86_64

How reproducible:

It is difficult to reproduce, it occurs roughly once a week each Guest VM for me.

Additional info:
  * I am running Openstack Icehouse on Fedora 20 (via packstack)
  * Kernels: kernel-3.14.4-200.fc20.x86_64 / kernel-3.14.8-200.fc20.x86_64
  * libvirt-1.1.3.5-2.fc20.x86_64
  * Guest OS: Windows 2008R2
  * Guest VirtIO driver version 0.1-81
  * Guest Storage is via NFS export from a Netapp FAS 6220 cluster.
  * These unexpected shutdowns do not occur for me at busy times for either the guests or the hosts.

Revision history for this message
In , Richard (richard-redhat-bugs) wrote :

*** Bug 1147398 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Richard (richard-redhat-bugs) wrote :

bdrv_error_action is called from 3 places. What is going to
help most of all here is a stack trace. Easiest thing is
to enable core dumps and make sure the core dump is captured when
qemu fails.

Revision history for this message
In , Markus (markus-redhat-bugs) wrote :

Thanks for the idea. Sounds better than mine to recompile qemu with debug messages. Can you give a hint how to achieve it in an OVirt/libvirt environment.

Revision history for this message
In , QiangGuan (qiangguan-redhat-bugs) wrote :

I met this problem with qemu-1.6.1 too, while my problem is found at debian7 guests.

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 20 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged change the 'version' to a later Fedora
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Revision history for this message
In , Cole (cole-redhat-bugs) wrote :

Since F20 is EOL soon, closing this. If anyone can still reproduce with F21+, please reopen and I'll take a look

Revision history for this message
In , Matt (matt-redhat-bugs) wrote :

We see this in upstream openstack CI testing, viewable here:

http://logs.openstack.org/07/251407/2/check/gate-tempest-dsvm-full/144f7fc/logs/libvirt/libvirtd.txt.gz#_2015-11-30_18_20_18_168

2015-11-30 18:20:18.168+0000: 31539: error : qemuMonitorIO:656 : internal error: End of file from monitor
2015-11-30 18:20:18.168+0000: 31539: debug : qemuMonitorIO:710 : Error on monitor internal error: End of file from monitor
2015-11-30 18:20:18.168+0000: 31539: debug : qemuMonitorIO:731 : Triggering EOF callback
2015-11-30 18:20:18.168+0000: 31539: debug : qemuProcessHandleMonitorEOF:300 : Received EOF on 0x7fa310011240 'instance-00000066'
2015-11-30 18:20:18.168+0000: 31539: debug : qemuProcessHandleMonitorEOF:318 : Monitor connection to 'instance-00000066' closed without SHUTDOWN event; assuming the domain crashed
2015-11-30 18:20:18.168+0000: 31539: debug : virObjectEventNew:643 : obj=0x7fa340aab850
2015-11-30 18:20:18.168+0000: 31539: debug : qemuProcessStop:4235 : Shutting down vm=0x7fa310011240 name=instance-00000066 id=150 pid=17830 flags=0

This was the domain log:

http://logs.openstack.org/07/251407/2/check/gate-tempest-dsvm-full/144f7fc/logs/libvirt/qemu/instance-00000066.txt.gz

I noticed this:

char device redirected to /dev/pts/1 (label charserial1)
qemu-system-x86_64: /build/qemu-5LgLIn/qemu-2.0.0+dfsg/block.c:3491: bdrv_error_action: Assertion `error >= 0' failed.
2015-11-30 18:20:18.168+0000: shutting down

This is a volume-backed VM. I think around the time that this fails, we should be trying to plug a virtual interface.

Possibly also helpful:

http://logs.openstack.org/07/251407/2/check/gate-tempest-dsvm-full/144f7fc/logs/screen-n-net.txt.gz#_2015-11-30_18_19_45_252

2015-11-30 18:19:45.251 DEBUG oslo_concurrency.processutils [req-8911e8c7-2466-408f-832e-af4b78e9adec tempest-TestVolumeBootPattern-2142876884 tempest-TestVolumeBootPattern-1970238908] CMD "sudo nova-rootwrap /etc/nova/rootwrap.conf ebtables --concurrent -t nat -D PREROUTING --logical-in br100 -p ipv4 --ip-src 10.1.0.3 ! --ip-dst 10.1.0.0/20 -j redirect --redirect-target ACCEPT" returned: 255 in 0.147s execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:297

Revision history for this message
In , Matt (matt-redhat-bugs) wrote :

For comment 7, this is mitaka openstack.

libvirt version: 1.2.2

QEMU 2.0.0

Ubuntu 14.04 for the compute host.

Revision history for this message
In , Cole (cole-redhat-bugs) wrote :

If you are hitting this on ubuntu, you need to file an ubuntu bug.

Revision history for this message
Thomas Huth (th-huth) wrote :

Please do not report distribution specific bugs (since you're using qemu-kvm 2.0.0+dfsg-2ubuntu1.27) against the QEMU project bugtracker - use the distribution's bugtracker instead.

affects: qemu → qemu (Ubuntu)
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu (Ubuntu):
status: New → Confirmed
Changed in qemu (Fedora):
importance: Unknown → Medium
status: Unknown → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.