Bug #1022901 “Ubuntu 12.04 64bit Guest (qemu/kvm) dies randomly” : Bugs : qemu-kvm package : Ubuntu

Revision history for this message

Ubuntu Foundations Team Bug Bot (crichton) wrote on 2012-07-10:

#1

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1022901/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags:

added: bot-comment

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-10:

#2

I think it is the qemu-kvm package responsible...

affects:

ubuntu → qemu-kvm (Ubuntu)

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-07-10:

#3

@Todor,

do you know if any other virsh activity is going on every time that happens? For instance, create, destroying, or even just listing guests?

I'm going to mark this as also affect libvirt (which i think is more likely to be the problem). Coudl you run 'apport-collect 1022901' to have apport upload some debugging information?

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-07-10:

#4

@Todor,

actually, before you run apport-collect, could you

cd /usr/share/apport/package-hooks
sudo mv source_libvirt-bin.py source_libvirt.py

to make sure that the libvirt debugging information is collected.

thanks!

Changed in libvirt (Ubuntu):
importance:	Undecided → High
Changed in qemu-kvm (Ubuntu):
importance:	Undecided → High
status:	New → Incomplete
Changed in libvirt (Ubuntu):
status:	New → Incomplete

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-10:

#5

Maybe listing, because virt-manager is going on constantly. I'll stop it and see what happens. I'm not sure what are these apport things about, but will do, immediately!
Thanks!

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-10: BootDmesg.txt

#6

BootDmesg.txt Edit (52.7 KiB, text/plain)

apport information

tags:	added: apport-collected precise
description:	updated

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-10: CurrentDmesg.txt

#7

CurrentDmesg.txt Edit (12.9 KiB, text/plain)

apport information

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-10: Dependencies.txt

#8

Dependencies.txt Edit (2.8 KiB, text/plain)

apport information

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-10: KernLog.txt

#9

KernLog.txt Edit (1.5 KiB, text/plain)

apport information

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-10: KvmCmdLine.txt

#10

KvmCmdLine.txt Edit (2.3 KiB, text/plain)

apport information

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-10: Lspci.txt

#11

Lspci.txt Edit (16.0 KiB, text/plain)

apport information

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-10: ProcCpuinfo.txt

#12

ProcCpuinfo.txt Edit (3.6 KiB, text/plain)

apport information

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-10: ProcInterrupts.txt

#13

ProcInterrupts.txt Edit (2.2 KiB, text/plain)

apport information

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-10: ProcModules.txt

#14

ProcModules.txt Edit (2.6 KiB, text/plain)

apport information

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-10: RelatedPackageVersions.txt

#15

RelatedPackageVersions.txt Edit (2.0 KiB, text/plain)

apport information

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-10: UdevDb.txt

#16

UdevDb.txt Edit (109.8 KiB, text/plain)

apport information

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-10: UdevLog.txt

#17

UdevLog.txt Edit (247.6 KiB, text/plain)

apport information

Serge Hallyn (serge-hallyn) on 2012-07-11

Changed in libvirt (Ubuntu):
status:	Incomplete → New
Changed in qemu-kvm (Ubuntu):
status:	Incomplete → New

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-07-11:

#18

Have you been able to verify whether this does not happen when virt-manager is not running?

Please try the following:

sudo mv /var/log/libvirt/libvirtd.log /var/log/libvirt/libvirtd.log.orig
echo "log_level = 1" | sudo tee -a /etc/libvirt/libvirtd.conf
sudo stop libvirt-bin; sudo start libvirt-bin

Then after one of the VMs has shut itself down,

sudo cp /var/log/libvirt/libvirtd.log /var/log/libvirt/log.submit
sudo stop libvirt-bin
sudo sed -i '/^log_level/d' /etc/libvirt/libvirtd.conf
sudo start libvirt-bin

Then attach /var/log/libvirt/log.submit to this report. This should provide more debugging information from libvirt about what has happened.

Changed in libvirt (Ubuntu):
status:	New → Incomplete
Changed in qemu-kvm (Ubuntu):
status:	New → Incomplete

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-07-11:

#19

Sorry, one more thing - could you also attach /var/log/kernel.log from a guest which has recently shut down? This may include useful acpi and kernel information.

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-11:

#20

Kern.log at the times of diying Edit (125.6 KiB, text/plain)

Of course. It has shut off at these hours (directly from host's /var/log/libvirt/qemu/ubuntu-pdc.log):
2012-07-10 09:56:23.767+0000: shutting down
and
2012-07-10 11:50:09.191+0000: shutting down

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-12:

#21

I have set the log_level and have virt_manager disconnected. Thank you for your cooperation! Now I'm waiting, when it happens will write immideately.

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-13:

#22

libvirtd.log Edit (98.4 MiB, text/plain)

Now it happened again. By the way, I have made a script to autostart it 10 seconds after ping lose. This time the virt-manager was not connected at the time it died. Also, the script couldn't start it again, because of some error. When I connected with virt-manager to see what is the status - it was Paused. I had to do a right-mouse-button > Force Off.
Now it did it again, at the time I'm writing this. Again had to Force Off, because it was Paused.

The log is about 100MB in size. The machine was lost at 7:47 and 8:05 (log times) (the logs are in UTC time, please write me if I can make them in my time-zone, which is +3 hours now).

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-13:

#23

There are no errors in the guest's syslog. Well I now saw the log and it seems it has IO errors on "device": "drive-virtio-disk1". How could I determine which of the two is this virtual disk - is there drive-virtio-disk0 or this is the first?
Also, now it was obviously in Paused state, but the majority of cases it just dies, it is not in Paused state, so I'm not sure this is the actual case we are waiting. Only god knows :-)
Thanks for any help!

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-13:

#24

Now it did it again - at 10:47. Same errors and symptoms (Paused).

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-13:

#25

Now it did it again - at 11:07. Same errors and symptoms (Paused).
Well, I see a pattern here. Every max. 15 min. before a round hour (for example 11.00) and every max. 10min. after this hour there are IO_ERRORs in the log and therefore pausing the machine.
I saw that other dyings are also 15 minutes around the hour. Not exactly at some minute, but definetly within 15 minutes before or after the round hour.

Revision history for this message

Alexander Hawk (saclista) wrote on 2012-07-13:

#26

Hi Serge & Todor,

Same problem here :(

Ubuntu 12.04 server 64 bit host
Guests:
1. Debian 6 32 bit graphic desktop
2. Debian 6 64 bit graphic desktop

I`m new to kvm.
I created and maintain 2 debian machines over LAN from Virtual Machine Manager(http://virt-manager.org/).

Once or twice a day error message is:
qemuMonitorIO:603 : internal error End of file from monitor

and one or both of Debian shut down :(

Any help appreciated.
Alex

Revision history for this message

Alexander Hawk (saclista) wrote on 2012-07-13:

#27

Download full text (3.5 KiB)

A part of /var/log/libvirtd/libvirtd.log file:

2012-07-11 15:15:11.083+0000: 1230: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-11 15:15:32.479+0000: 1225: error : virNetSocketReadWire:996 : End of file while reading data: Input/output error
2012-07-11 18:06:25.243+0000: 1225: error : qemuMonitorIO:603 : internal error End of file from monitor
2012-07-12 09:25:37.092+0000: 1231: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 09:25:37.092+0000: 1231: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 09:25:37.857+0000: 1236: error : virConnectNumOfInterfaces:9803 : this function is not supported by the connection driver: virConnectNumOfInterfaces
2012-07-12 09:25:47.537+0000: 1232: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 09:25:47.537+0000: 1232: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 09:25:48.291+0000: 1233: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 09:25:48.291+0000: 1233: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 09:26:08.761+0000: 1229: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 09:26:08.761+0000: 1229: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 09:26:09.636+0000: 1230: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 09:26:09.636+0000: 1230: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 14:03:55.930+0000: 1225: error : virNetSocketReadWire:996 : End of file while reading data: Input/output error
2012-07-12 14:04:16.928+0000: 1231: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 14:04:16.928+0000: 1231: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 14:04:17.536+0000: 1237: error : virConnectNumOfInterfaces:9803 : this function is not supported by the connection driver: virConnectNumOfInterfaces
2012-07-12 14:04:28.160+0000: 1232: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 14:04:28.160+0000: 1232: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 14:04:29.021+0000: 1230: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 14:04:29.021+0000: 1230: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 14:04:53.009+0000: 1230: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 14:04:53.009+0000: 1230: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 14:04:53.645+0000: 1233: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 14:04:53.645+0000: 1233...

A part of /var/log/libvirtd/libvirtd.log file:

2012-07-11 15:15:11.083+0000: 1230: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-11 15:15:32.479+0000: 1225: error : virNetSocketReadWire:996 : End of file while reading data: Input/output error
2012-07-11 18:06:25.243+0000: 1225: error : qemuMonitorIO:603 : internal error End of file from monitor
2012-07-12 09:25:37.092+0000: 1231: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 09:25:37.092+0000: 1231: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 09:25:37.857+0000: 1236: error : virConnectNumOfInterfaces:9803 : this function is not supported by the connection driver: virConnectNumOfInterfaces
2012-07-12 09:25:47.537+0000: 1232: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 09:25:47.537+0000: 1232: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 09:25:48.291+0000: 1233: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 09:25:48.291+0000: 1233: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 09:26:08.761+0000: 1229: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 09:26:08.761+0000: 1229: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 09:26:09.636+0000: 1230: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 09:26:09.636+0000: 1230: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 14:03:55.930+0000: 1225: error : virNetSocketReadWire:996 : End of file while reading data: Input/output error
2012-07-12 14:04:16.928+0000: 1231: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 14:04:16.928+0000: 1231: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 14:04:17.536+0000: 1237: error : virConnectNumOfInterfaces:9803 : this function is not supported by the connection driver: virConnectNumOfInterfaces
2012-07-12 14:04:28.160+0000: 1232: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 14:04:28.160+0000: 1232: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 14:04:29.021+0000: 1230: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 14:04:29.021+0000: 1230: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 14:04:53.009+0000: 1230: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 14:04:53.009+0000: 1230: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 14:04:53.645+0000: 1233: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
2012-07-12 14:04:53.645+0000: 1233: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2012-07-12 17:01:45.926+0000: 1225: error : virNetSocketReadWire:996 : End of file while reading data: Input/output error
2012-07-13 01:04:39.631+0000: 1225: error : qemuMonitorIO:603 : internal error End of file from monitor
2012-07-13 01:04:49.163+0000: 1225: error : qemuMonitorIO:603 : internal error End of file from monitor

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-07-13:

#28

@Alexander,

do you also see the same 15 minute pattern as Todor?

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-07-13:

#29

@Todor,

could you run
   sudo grep -i cron /var/log/* > /tmp/crongrep
   ps -ef > /tmp/psout
   ls -l /etc/cron* > /tmp/cronls

in the ubuntu-pdc guest, and attach the resulting /tmp/crongrep, /tmp/cronls, and /tmp/psout files?

How many VMs do you have, how much memory is each allocated, and how much RAM do you have in the main machine (I think the boot messages said 4G?) Is it possible that VMs are shutting down because of memory pressure on the host, or do the VMs shut down even when you leave at least 500M free on the host?

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-13:

#30

I can confirm that the "Paused" case with IO_ERROR is happening when a user (samba client) starts to copy files (70GB of pictures with 10MB/s). It crashes almost immediately!
@Serge, the Ubuntu machine is with 1GB allocated and the Windows machine is with 2GB allocated. The host is 4GB. I had a look at this possibility already, but it is not the case.
At the time of writing this, I saw that the space on host's partition is finished and now it cannot start at all, becoming Paused right after the guest tries to write something. Well, OK, but the image size by definition is this:

root@nobel-host1:/home/images# qemu-img info ubuntu-pdc-vdb.img
image: ubuntu-pdc-vdb.img
file format: qcow2
virtual size: 1.7T (1825361100800 bytes)
disk size: 1.7T
cluster_size: 65536
Snapshot list:
ID TAG VM SIZE DATE VM CLOCK
11 snapshot.sh 0 2012-07-06 00:02:02 00:00:00.000
12 snapshot.sh 0 2012-07-07 00:02:02 00:00:00.000
13 snapshot.sh 0 2012-07-10 00:02:02 00:00:00.000
14 snapshot.sh 0 2012-07-12 00:06:01 00:00:00.000
15 snapshot.sh 0 2012-07-13 00:06:03 00:00:00.000

But the real size is (as I am observing from a while) increasing. I don't know if it is from snapshot taking into the image file itself? Also now it didn't do any snapshot, the only thing is copying files into the guest. Isn't the size of the image supposed to be fixed at that, mentioned above? Why is it now this:

root@nobel-host1:/home/images# du -sm ubuntu-pdc-vdb.img
1785438 ubuntu-pdc-vdb.img

and at the beginning it was this:
1741809 ubuntu-pdc-vdb.img

It has grown with over 40GB for about half a month. I'm doing a snapshot every evening at working days, but am deleting the old ones too, leaving only 5 snapshots at a time.

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-16:

#31

I have copied all of the content of the guest machine, deleted the image ubuntu-pdc-vdb.img and created a new one without "fallocate", but only with preallocation=metadata. The problem is that the image is growing progressively and has obviously no limit, although I gave it one. The root filesystem's image is the same case:

qemu-img info ubuntu-pdc-vda.img
image: ubuntu-pdc-vda.img
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 14G
cluster_size: 65536

and for confirmation:
du -sh ubuntu-pdc-vda.img
15G ubuntu-pdc-vda.img

I made a test and saw that when I delete something from the guest, the real size of the image is not decreasing (I read it is normal). OK, but when I write something again, it doesn't use the freed space, but instead grows the image. So for example:
1. The initial physical size of the image is 1GB.
2. I copy 1GB of data in the guest. It's physical size becomes 2GB.
3. I delete this data (1GB). The physical size of the image remains 2GB.
4. I copy another 1GB of data to the guest.
5. The physical size of the image becomes 3GB.
6. And so on with no limit. It doesn't care if the virtual size is less.

Is this normal - the real/physical size of the image to be larger than the virtual limit???

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-07-16:

#32

OK, I have created another bug report for the increasing size of image file:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1025244

I'm monitoring now, after creating a new image (with no fallocate) and having enough space on the host, if the machine will die again. I suppose it will, because when there were no space, libvirtd.log said IO_ERROR and the machine was ALWAYS paused, not dead. My original problem, however, was the machine dies randomly and appeared as Shutoff. So now I'm monitoring with libvirtd.log in debugging mode and waiting.
Will write as soon as I have info.

Revision history for this message

Alexander Hawk (saclista) wrote on 2012-07-16:

#33

@Serge There are no "15 minutes patterns" and no any IO_ERRORS (in guests` /var/log/syslog files)

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-07-16:

#34

@Alexander,

could you please file a new bug? It sounds like yours has a different cause from Todor's. Please first rename /usr/share/apport/package-hooks/source_libvirt-bin.py to /usr/share/apport/package-hooks/source_libvirt.py. In other words:

1. sudo mv /usr/share/apport/package-hooks/source_libvirt-bin.py /usr/share/apport/package-hooks/source_libvirt.py
2. apport-bug libvirt-bin

@Todor,

I will mark this a duplicate of the new bug you created. thanks.

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-09-11:

#35

I can confirm that the machine doesn't die or pause when there is no "space" problem. All of the problems were caused by the space issue!
Although now I have another problem - qcow2 image is broken after 6 snapshots :-) Will file another bug.

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-09-11:

#36

@Todor,

since you filed this bug, should this bug be marked invalid?

Revision history for this message

Todor Andreev (toshko3) wrote on 2012-09-11:

#37

Well, I don't think that this is a desired behavior. I don't think it should be put in the wishlist neither.
Since the machine dies, pauses and whatever, I think this is not a desired/predicted way of managing such a situation.
I think it should manage to be at least alive! Don't you think so?
Then it should give some warnings in any way to the administrator.
I imagine how many admins will go through this and only after some nightmare days will realize what is happening. Because, strangely enough, the space was not at the end when I was checking it, it has been going to end only when the client was moving some 100GB of files to the virtual machine, through samba. And after getting an error, he was canceling the transfer and the image was back at its original size, again. Black magic or whatever, but a fact!
I'm sorry I can't help with it... I'm not a programmer and can help only with filling bugs :-(

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-09-11:

#38

@Todor,

sorry, please disregard - I was misremembering the cause.

Revision history for this message

Johan Boger (jboger) wrote on 2012-09-14:

#39

I have the same issue, without filling virtual space. All the other things are exactly the same, though.

Is there a patch available yet?

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-09-14:

#40

@Johan

Could you please file a new bug using 'ubuntu-bug qemu-kvm' and describe exactly what you see and how you reproduce it?

Revision history for this message

Davide Guerri (davide-guerri) wrote on 2012-10-19:

#41

I also think this is not a duplicate of bug #1025244 although they might be somewhat related to each other.

Same issue here: random "internal error End of file from monitor" and "graceful" shutdown. No space issue on the VM

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-10-20:

#42

@Davide,

what exactly do you mean by not a duplicate? Do you mean your bug is not caused by out of space? If so, then again please file a new bug.

Technically I'd probably agree that this is not a duplicate of bug 1025244 - rather it is correct behavior (cleanly shutting down when unable to get enough space) in the face of the other bug. But this bug, as filed by the submitter, is about shutdowns caused by lack of disk space.

Affects		Status	Importance	Assigned to	Milestone
	libvirt (Ubuntu)	Incomplete	High	Unassigned
	qemu-kvm (Ubuntu)	Incomplete	High	Unassigned

Ubuntu
qemu-kvm package

Ubuntu 12.04 64bit Guest (qemu/kvm) dies randomly

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches

Ubuntuqemu-kvm package

Ubuntu 12.04 64bit Guest (qemu/kvm) dies randomly

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches

Ubuntu
qemu-kvm package