kvm segfaults
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
KVM | Unknown | Unknown | | |
kvm (Ubuntu) | | High | Unassigned | |
Hardy | | High | Dustin Kirkland | |
Intrepid | | High | Dustin Kirkland | |
Jaunty | | High | Unassigned | |
Bug Description
Binary package hint: kvm
Hello!
KVM segfaults quite frequently. It has segfaulted twice within 3 days (FreeBSD 6.2 i386 as guest):
Apr 6 18:06:03 t-x kernel: [696152.848709] kvm[2706]: segfault at 29c ip 0000000000431879 sp 00007fffc8296040 error 4 in kvm[400000+1e1000]
Apr 9 23:47:54 t-x kernel: [47546.239426] kvm[2790]: segfault at 29c ip 0000000000431859 sp 00007fffde8025b0 error 4 in kvm[400000+1e1000]
I also tried to run JeOS as "kvm -m 128 -smp 1 -drive file=disk0.qcow2 -curses" in screen. When I came back to the server, I saw this string:
"Segmentation fault" and the relevant line in dmesg:
[95718.809774] kvm[28765]: segfault at 4e0 ip 00007f7f57b61011 sp 00007fff6179edd0 error 4 in libncurses.
I'm using Jaunty AMD64 with all updates on a dual Xeon L5410 server.
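For anyone triaging lines like the ones above, here is a minimal Python sketch that splits a kernel segfault message into its fields. It assumes the x86 page-fault error-code convention (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The names `parse_segfault` and `decode_error` are made up for this illustration and are not part of any tool mentioned in this report.

```python
import re

# Matches the log layouts seen in this report:
#   "... ip 0000000000431879 sp ... error 4 in kvm[400000+1e1000]"  (newer kernels)
#   "... rip 431db7 rsp ... error 4"                                 (older kernels)
#   "... eip 080bcbd8 esp ... error 4"                               (32-bit hosts)
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"[re]?ip (?P<ip>[0-9a-f]+) [re]?sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+)(?:\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?)?"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def parse_segfault(line):
    """Return the fields of a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),   # address whose access faulted
        "ip": int(m.group("ip"), 16),       # faulting instruction
        "error": decode_error(int(m.group("err"), 16)),
        "object": m.group("obj"),           # mapped object, if the kernel printed one
        "offset": None,                     # ip relative to the start of that mapping
    }
    if m.group("base") is not None:
        info["offset"] = info["ip"] - int(m.group("base"), 16)
    return info


line = ("Apr 6 18:06:03 t-x kernel: [696152.848709] kvm[2706]: segfault at 29c "
        "ip 0000000000431879 sp 00007fffc8296040 error 4 in kvm[400000+1e1000]")
print(parse_segfault(line))
```

Under that convention, "error 4" reads as a user-mode read of a page that is not present, and the very low fault address (29c) points at a near-NULL pointer rather than a wild one.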
Changed in kvm (Ubuntu): | |
importance: | Undecided → Medium |
Dustin Kirkland (kirkland) wrote : | #1 |
Soren Hansen (soren) wrote : | #2 |
I get the exact same problem with the kvm-84 backport for Hardy. I've got a ton of these:
[9257662.920957] kvm[32576]: segfault at 29c rip 431db7 rsp 7fff34552300 error 4
[9326565.509652] kvm[21403]: segfault at 29c rip 431f87 rsp 7fff8f921710 error 4
[9326565.539824] kvm[574]: segfault at 29c rip 431f87 rsp 7fffaeff4da0 error 4
[9552622.565234] kvm[27268]: segfault at 29c rip 431f87 rsp 7fff8b9d4780 error 4
[9708621.294288] kvm[19549]: segfault at 29c rip 431f87 rsp 7fffde194f40 error 4
[9760586.598334] kvm[3009]: segfault at 29c rip 431f87 rsp 7fffc9d3a990 error 4
[9760584.171162] kvm[3013]: segfault at 29c rip 431f87 rsp 7fff798af660 error 4
[9771095.594420] kvm[697]: segfault at 29c rip 431f87 rsp 7fffe235ee10 error 4
[9771093.117852] kvm[703]: segfault at 29c rip 431f87 rsp 7fff45d7bb30 error 4
[9771205.352935] kvm[1628]: segfault at 29c rip 431f87 rsp 7fff73a0b4e0 error 4
[9771205.353095] kvm[1635]: segfault at 29c rip 431f87 rsp 7fffa2fdcd90 error 4
[9774766.320499] kvm[23869]: segfault at 29c rip 431f87 rsp 7fff6c3235b0 error 4
[9779884.735849] kvm[23940]: segfault at 29c rip 431f87 rsp 7fff78e76c70 error 4
[9779887.686334] kvm[2554]: segfault at 29c rip 431f87 rsp 7fff0509ae90 error 4
[9797955.573851] kvm[20882]: segfault at 29c rip 431f87 rsp 7fff6f8a2220 error 4
[9797952.661145] kvm[20893]: segfault at 29c rip 431f87 rsp 7fff76cb2a60 error 4
[9808209.562440] kvm[20890]: segfault at 29c rip 431f87 rsp 7fffde9ea7e0 error 4
[9816715.343731] kvm[14903]: segfault at 29c rip 431f87 rsp 7fff09dd2790 error 4
[9828895.668024] kvm[16372]: segfault at 29c rip 431f87 rsp 7fffc8cb8a70 error 4
[9828892.784692] kvm[16375]: segfault at 29c rip 431f87 rsp 7fff8edd3bc0 error 4
[9853494.092946] kvm[5473]: segfault at 29c rip 431f87 rsp 7fff81c4fa00 error 4
[9853494.206620] kvm[5464]: segfault at 29c rip 431f87 rsp 7fff24d7bbc0 error 4
[9917668.410834] kvm[28710]: segfault at 29c rip 431f87 rsp 7fff199c3f60 error 4
[9917671.202794] kvm[28722]: segfault at 29c rip 431f87 rsp 7fffc36f54a0 error 4
These are all with guests that ran perfectly under the stock Hardy kvm userspace and the kernelspace stuff from the hardy kernel.
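With this many entries it helps to group the crashes by where they happened. In the log above, every entry has the same fault address (29c) and all but the first share rip 431f87. The sketch below does that grouping in Python; the function name `tally_fault_sites` is hypothetical and the sample is just three lines copied from the log.

```python
import re
from collections import Counter

# Captures the fault address and the instruction pointer (ip, rip or eip).
FAULT_RE = re.compile(r"segfault at ([0-9a-f]+) [re]?ip ([0-9a-f]+)")


def tally_fault_sites(lines):
    """Count crashes per (fault address, instruction pointer) pair."""
    sites = Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            sites[(int(m.group(1), 16), int(m.group(2), 16))] += 1
    return sites


sample = [
    "[9257662.920957] kvm[32576]: segfault at 29c rip 431db7 rsp 7fff34552300 error 4",
    "[9326565.509652] kvm[21403]: segfault at 29c rip 431f87 rsp 7fff8f921710 error 4",
    "[9326565.539824] kvm[574]: segfault at 29c rip 431f87 rsp 7fffaeff4da0 error 4",
]

for (addr, ip), count in tally_fault_sites(sample).most_common():
    print(f"{count} crash(es): fault address {addr:#x}, instruction {ip:#x}")
```

A single dominant pair suggests one code path is responsible, which makes a backtrace from any one of the crashes representative of the rest.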
Changed in kvm (Ubuntu): | |
importance: | Medium → High |
status: | New → Triaged |
Soren Hansen (soren) wrote : | #3 |
I forgot to mention that I don't use curses. I'm VNC all the way :)
Soren Hansen (soren) wrote : | #4 |
I managed to extract this backtrace.
Soren Hansen (soren) wrote : | #5 |
It seems the bug is somewhere in the block layer. The segfault is caused by the IDE DMA callback being called, but the embedded IDEState is NULL. Lots of stuff has changed in the block layer since kvm-84 came out, so this might prove tricky to narrow down.
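The constant fault address 29c in the logs is what one would expect if code read a structure member through a NULL pointer, which is consistent with the diagnosis above. A small Python sketch of the arithmetic follows. The layout is invented for illustration: this report does not show the real IDEState layout or which member sits at that offset.

```python
import ctypes

FAULT_OFFSET = 0x29C  # fault address reported by the kernel in this bug


class HypotheticalState(ctypes.Structure):
    # Invented layout for illustration only; NOT the real IDEState.
    _fields_ = [
        ("padding", ctypes.c_char * FAULT_OFFSET),  # everything before the member
        ("member", ctypes.c_uint32),                # member that the callback reads
    ]


def address_read_through(base_pointer, field):
    """Address the CPU touches when reading `field` through `base_pointer`."""
    return base_pointer + field.offset


NULL = 0
print(hex(address_read_through(NULL, HypotheticalState.member)))
```

Reading `member` through a NULL base touches address 0x29c, which lies in the unmapped first page and so produces a "page not present" fault on a user-mode read.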
Soren Hansen, I have the same thought, because the last message from the guest was something like "ad0 read dma timeout, retrying...".
Dustin Kirkland, yes, it segfaults anyway :(
Soren Hansen (soren) wrote : | #7 |
exe, did you just happen to be looking at the guest at the time of the crash or do you have a way to trigger it?
I wasn't looking at the guest when it crashed. It was just the last message on the serial console.
I can't trigger it. It seems to occur only once per boot or under heavy I/O load. I can't experiment with the server because it's in a production environment. I will make backups soon. If the problem is in the I/O subsystem, it will crash again and I will try to reproduce it in another (testing) virtual container.
Last message on serial console from guest:
=======
FreeBSD/i386 (virt002.serv) (ttyd0)
login: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=194373119
virsh #
=======
And then in dmesg:
[736228.331135] kvm[2933]: segfault at 29c ip 0000000000431859 sp 00007fffa94391e0 error 4 in kvm[400000+1e1000]
Unfortunately I cannot say how to reproduce this bug. I tried to use bonnie++ to load the I/O subsystem, but the guest didn't fail.
klighter6 (klighter6) wrote : | #10 |
I get a segfault whenever I run a qemu-img convert, or copy a large file (over 5 GB) as root in a terminal, not to mention have firefox open and trying to surf at the same time... very annoying.
I have a package in my company PPA (~linux2go) that fixes this. It includes the following upstream commits:
5f7a4ea7ad2355d
1f310e069d8e6c9
2f615dfb773043e
c48260ed4c7824d
82aa3e8ddba30f1
These should be SRU'd into Jaunty.
Soren Hansen (soren) wrote : | #12 |
Sorry, "Eucalyptus bugbot" is my alter ego.
Soren Hansen (soren) wrote : | #13 |
Proposed debdiff.
Changed in kvm (Ubuntu Jaunty): | |
importance: | Undecided → High |
milestone: | none → jaunty-updates |
status: | New → Triaged |
Dustin Kirkland (kirkland) wrote : | #14 |
For those experiencing this problem on kvm-84 on Hardy or Intrepid, I have uploaded a new package with soren's patch to the PPA.
* https:/
If you are able to test that, please leave feedback here.
:-Dustin
Martin Pitt (pitti) wrote : | #15 |
Ugh, quite intrusive, but I agree that crashing VMs on LTSes are bad. So please upload this to -propose and give it proper testing.
On Wed, May 06, 2009 at 09:48:20AM -0000, Martin Pitt wrote:
> Ugh, quite intrusive, but I agree that crashing VMs on LTSes are bad.
This is for Jaunty, so it's not an LTS issue at the moment. However,
we're working on getting kvm-84 into shape to get it into Hardy, and
this bug is a serious blocker for that.
> So please upload this to -propose and give it proper testing.
It's already there (since yesterday evening).
Thanks.
Martin Pitt (pitti) wrote : | #17 |
Accepted kvm into jaunty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https:/
Changed in kvm (Ubuntu Jaunty): | |
status: | Triaged → Fix Committed |
tags: | added: verification-needed |
Dustin Kirkland (kirkland) wrote : | #18 |
Now, I'm nominating for Hardy and Intrepid. This fix should be included in the kvm-84 backport that I'm working on.
:-Dustin
Changed in kvm (Ubuntu Hardy): | |
assignee: | nobody → Dustin Kirkland (kirkland) |
importance: | Undecided → High |
milestone: | none → ubuntu-8.04.3 |
status: | New → Triaged |
Changed in kvm (Ubuntu Intrepid): | |
importance: | Undecided → High |
milestone: | none → intrepid-updates |
status: | New → Triaged |
assignee: | nobody → Dustin Kirkland (kirkland) |
I'm beginning to test this. I will post to this topic if it segfaults.
Mark Darbyshire (markdarb) wrote : | #20 |
Installing the update from jaunty-proposed (amd64) has stopped my virtual machine from segfaulting. Before my XP virtual machine was segfaulting all the time. It would always segfault during a disk consistency check when booting up, and if I skipped that check it wouldn't take all that long before it segfaulted while I was doing things anyway. Anyway, it's working now so thank you very much for the fix. :)
With kvm 1:84+dfsg-
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=20875903
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0xc90ecfad
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc07e1624
stack pointer = 0x28:0xe9e01c30
frame pointer = 0x28:0xe9e01c3c
code segment = base 0x0, limit 0xfffff, type 0x1b
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 21 (swi6: task queue)
trap number = 12
panic: page fault
cpuid = 2
Uptime: 3d17h7m41s
Cannot dump. No dump device defined.
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset: Stopping other CPUs
commit 7403b14eeb4670d
Author: aliguori <aliguori@
Date: Sat Mar 28 16:11:25 2009 +0000
Fix DMA API when handling an immediate error from block layer (Avi Kivity)
--
commit c240b9af599d20e
Author: aliguori <aliguori@
Date: Sat Mar 28 16:11:20 2009 +0000
Fix vectored aio bounce handling immediate errors (Avi Kivity)
Should fix this issue.
Oh-oh-oh, after a couple of unsleepy nights it seems bug https:/
Anyway, I can easily trigger this problem. No solution yet :(((
oops, segfaults have returned. I migrated to kvm-85.
On Mon, Jun 22, 2009 at 3:47 AM, exe<email address hidden> wrote:
> oops, segfaults have returned. I migrated to kvm-85.
Please clarify... are you seeing the segfaults in kvm-85 or kvm-84?
:-Dustin
I saw segfaults with kvm-84.
I have built the package from Debian unstable. I also had to rebuild libvirt, as the old libvirt-bin from Karmic doesn't support kvm newer than 84.
Now the uptime of my kvm-85 virtual machine is almost 3 days, with no problems detected. Unfortunately I must reboot the server soon to replace a defective HDD.
DanielB (dbareiro-gmx) wrote : | #27 |
Apparently I am having the same problem with KVM-62 on Ubuntu Hardy Heron server amd64 with kernel 2.6.24-19-server. So far it has happened only once, but I can confirm that the VM is under heavy I/O load. Even so, this is a _very_ serious problem, since the VM is a production host.
KVM start configuration:
# APS2
$KVM -hda /dev/vm/aps2-raiz -hdb /dev/vm/aps2-space \
-hdc /dev/vm/aps2-index -hdd /dev/vm/aps2-cache -m 4096 -smp 4 \
-net nic,vlan=
-localtime -monitor telnet:
-serial telnet:
Next, I copy the lines found in /var/log/messages:
Jun 29 10:28:21 ss02 kernel: [5867223.459407] kvm[30711]: segfault at 284 rip 42ff7f rsp 7fffd9b9f580 error 4
Jun 29 10:28:21 ss02 kernel: [5867223.629174] br0: port 6(tap4) entering disabled state
Jun 29 10:28:21 ss02 kernel: [5867223.670810] device tap4 left promiscuous mode
Jun 29 10:28:21 ss02 kernel: [5867223.670819] audit(124628210
Jun 29 10:28:21 ss02 kernel: [5867223.670821] br0: port 6(tap4) entering disabled state
Hardware:
HP Proliant DL380 G5
2 x Xeon QuadCore - 2 GHz
16 GiB RAM / 1 GiB swap
Network interface: 2 x NC373i 10 / 100 / 1000
HP Smart Array E200 with 8 x 300 GB SAS - 10k / RAID 5.
The VM is running Debian GNU/Linux Lenny 5.0.2 with kernel 2.6.26-2-686 and the 'pci=noacpi' kernel option, in order to avoid transmit timeouts on the network interface, which make it inaccessible to the rest of the network. Neither /var/log/messages nor /var/log/syslog in the VM shows messages about IDE DMA callback problems.
Regards,
Daniel
Garry Dolley (gdolley) wrote : | #28 |
I'm running jaunty with everything up-to-date.
I was experiencing segfaults similar to those described in this ticket when disk I/O was high. For example, dd'ing a raw image file of about 60G to an LVM volume caused several kvm processes to crash.
I was able to reproduce these crashes in an interesting way:
I gave a FreeBSD VM an AoE volume, with the following command line:
/usr/bin/kvm -S -M pc -m 256 -smp 1 -name freebsd-test -uuid fa4f4230-
Slightly prior to booting this VM, I would turn off the AoE volume where it was exported from. When FreeBSD would try to mount it, the kvm process would crash within about 30 seconds.
After installing the following ppa:
1:84+dfsg-
I have not been able to reproduce the problem. Instead of the kvm process crashing, FreeBSD just reports a crap load of DMA errors:
[root@beta ~]# fsck /dev/ad1s1a
** /dev/ad1s1a
** Last Mounted on /backup
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
5939 files, 145406 used, 108409 free (2713 frags, 13212 blocks, 1.1% fragmentation)
ad1: FAILURE - WRITE_DMA status=
CANNOT WRITE BLK: 128
CONTINUE? [yn] y
ad1: FAILURE - WRITE_DMA status=
ad1: FAILURE - WRITE_DMA status=
ad1: FAILURE - WRITE_DMA status=
ad1: FAILURE - WRITE_DMA status=
ad1: FAILURE - WRITE_DMA status=
ad1: FAILURE - WRITE_DMA status=
ad1: FAILURE - WRITE_DMA status=
Since I pulled the rug out from under it (probably equivalent to unplugging a real hard drive if this were a physical machine), I would expect these kinds of DMA errors.
So, all in all, +1 on this PPA.
Garry Dolley, you hit another bug. As I understand it, the problem is inside I/O. When your OS flushes dirty pages, all read requests are frozen. If at that time FreeBSD tries to read from the disk via the DMA channel, it fails because kvm can't read from the disk. So FreeBSD thinks that the disk is broken and reports this error.
Dustin Kirkland (kirkland) wrote : | #30 |
Can someone please try to reproduce this with the latest (rc3) packages in:
* https:/
I have backported the patch that Anthony suggested. Please let me know if this solves your problem.
:-Dustin
Garry Dolley (gdolley) wrote : | #31 |
exe, OK. Nevertheless, the 1:84+dfsg-
Dustin Kirkland (kirkland) wrote : | #32 |
Garry-
Could you please test the rc4 package in the PPA? I strongly believe that it should fix your segfault.
:-Dustin
Yann Hamon (yannh) wrote : | #33 |
I had the very same bug a few days ago with the kvm84 backport for hardy:
Jul 4 07:54:38 toulouse kernel: [4540400.485118] kvm[23661]:
segfault at 3e9d20a9 eip 080bcbd8 esp bf9a8774 error 4
It was happening after a while under heavy network load with virtio. The latest version of the package on the ubuntu-virt ppa seems to have fixed it: no segfault since, despite the loads and loads of traffic I generated to test it.
Dustin Kirkland (kirkland) wrote : | #34 |
Okay, I just uploaded kvm_84+
Please test and give feedback here. If we can get verification of the fix, then we can see this pushed to jaunty-updates.
Thanks.
:-Dustin
Dustin Kirkland (kirkland) wrote : | #35 |
Uploaded kvm_84+
Please test that in proposed and leave feedback.
:-Dustin
Launchpad Janitor (janitor) wrote : | #36 |
This bug was fixed in the package kvm - 1:84+dfsg-0ubuntu15
---------------
kvm (1:84+dfsg-
* Cherry-pick qcow2 corruption patch from upstream git
- Fix-at-
* Cherry-pick dma error handling patch from upstream git, LP: #359447
- Fix-DMA-
* debian/control: depend on linux-server and linux-generic headers;
this may be a bit overkill, as you only need one of the two,
however, we don't know which one of the two until postinst;
because of this, we get *tons* of bug reports about kvm-source not
being able to build because of missing headers (even though we print
a helpful warning message in postinst), LP: #394953
* debian/
the new kvm module gets loaded and running
-- Dustin Kirkland <email address hidden> Tue, 07 Jul 2009 14:06:52 -0500
Changed in kvm (Ubuntu): | |
status: | Triaged → Fix Released |
Garry Dolley (gdolley) wrote : | #37 |
Dustin,
rc2 in the PPA (1:84+dfsg-
No segfaults in 3 days and I've applied some high disk I/O to the machine.
I can't test rc4 right away b/c rc2 is on a production machine. Once I can declare another maintenance window and reboot the machine, I'll try rc4. However, it seems the only difference between rc2 and rc4 are packaging related items, which I was able to walk through myself.
Dustin Kirkland (kirkland) wrote : | #38 |
Thanks Garry. You are correct about the differences between those two
packages. Thank you for the verification.
:-Dustin
Garry Dolley (gdolley) wrote : | #39 |
No problem! Thanks for cherry-picking those fixes and providing us a more stable kvm package.
Martin Pitt (pitti) wrote : | #40 |
Accepted kvm into jaunty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https:/
Garry Dolley (gdolley) wrote : | #41 |
I'm setting up a new jaunty box within the next few days. I will install the jaunty-proposed version of kvm and report the results.
Dustin Kirkland (kirkland) wrote : | #42 |
So there's quite a list of people on this thread reporting segfault problems.
Would one of those please test the package in jaunty-proposed?
:-Dustin
Dr_Knuth (dbareiro) wrote : | #43 |
Hi, Dustin.
I would like to test the backport of KVM-84 for Hardy Heron server amd64. As I commented above, I found that the KVM process of a VM with a high rate of I/O terminated with a segmentation fault on an installation of Hardy Heron server amd64 with KVM-62.
Adding the following lines to /etc/apt/
deb http://
deb-src http://
The only reference that I found to KVM-84 is given by the following package:
# aptitude show kvm-source
Package: kvm-source
State: partially configured
Automatically installed: yes
Version: 1:84+dfsg-
Priority: opcional
Section: misc
Maintainer: Ubuntu Core Developers <email address hidden>
Uncompressed Size: 1655k
Depends: build-essential, bzip2, debhelper (>= 5), dkms, linux-headers, linux-headers-
Suggests: kernel-package
Description: Source for the KVM driver
This package provides the source code for the KVM kernel modules. The kvm package is also required in order to make use of
these modules. Kernel source or headers are required to compile these modules.
Not needed for Ubuntu systems.
Is this the latest version that would fix the bug? Would it only be necessary to install this package and compile it?
Garry Dolley (gdolley) wrote : | #44 |
I enabled jaunty-proposed on a new jaunty box and upgraded kvm:
$ sudo apt-get install kvm
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
ubuntu-vm-builder hal vde2 samba kvm-source kvm-pxe
The following packages will be upgraded:
kvm
1 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
Need to get 0B/1143kB of archives.
After this operation, 4096B disk space will be freed.
(Reading database ... 23790 files and directories currently installed.)
Preparing to replace kvm 1:84+dfsg-0ubuntu11 (using .../kvm_
Unpacking replacement kvm ...
Processing triggers for man-db ...
Setting up kvm (1:84+dfsg-
* Loading kvm module kvm_intel
...done.
$
The install, as you can see, went without a hitch.
Will report how the VMs handle when I have more data.
After migrating to kvm 85+dfsg-4, the uptime of the FreeBSD guest is constantly increasing (15 days now). No problems occurred in this period.
Garry Dolley (gdolley) wrote : | #46 |
Update:
No crashes on jaunty-proposed, although I have not stressed the box much yet.
No crashes on RC2 (1:84+dfsg-
tags: |
added: verification-done removed: verification-needed |
Launchpad Janitor (janitor) wrote : | #47 |
This bug was fixed in the package kvm - 1:84+dfsg-
---------------
kvm (1:84+dfsg-
* Cherry-pick qcow2 corruption patch from upstream git
- Fix-at-
LP: #392295
* Cherry-pick patch series from upstream to fix segfaults when
cancelling DMA operations in virtual machines. LP: #359447
* Cherry-pick dma error handling patch from upstream git, LP: #359447
- Fix-DMA-
* debian/control: depend on linux-server and linux-generic headers;
this may be a bit overkill, as you only need one of the two,
however, we don't know which one of the two until postinst;
because of this, we get *tons* of bug reports about kvm-source not
being able to build because of missing headers (even though we print
a helpful warning message in postinst), LP: #394953
* debian/
the new kvm module gets loaded and running
* debian/
LP: #382077
-- Dustin Kirkland <email address hidden> Tue, 07 Jul 2009 14:22:26 -0500
Changed in kvm (Ubuntu Jaunty): | |
status: | Fix Committed → Fix Released |
Dustin Kirkland (kirkland) wrote : | #48 |
This bug was fixed in kvm-84 as published to hardy-backports and intrepid-backports. If you're suffering from this bug, please try that package.
:-Dustin
Changed in kvm (Ubuntu Hardy): | |
milestone: | ubuntu-8.04.3 → none |
Changed in kvm (Ubuntu Intrepid): | |
milestone: | intrepid-updates → none |
Changed in kvm (Ubuntu Hardy): | |
status: | Triaged → Fix Released |
Changed in kvm (Ubuntu Intrepid): | |
status: | Triaged → Fix Released |
Kent Tong (kent-tong) wrote : | #49 |
Is it really fixed? I am running kvm 1:84+dfsg-
Sep 3 15:43:00 vm-01 kernel: [2395791.060789] kvm[19481]: segfault at 0 ip 0000000000479a20 sp 00007fffe4f56690 error 4 in kvm[400000+1e1000]
Sep 3 18:00:02 vm-01 kernel: [2404013.508648] kvm[4649]: segfault at 0 ip 0000000000479a20 sp 00007fffee56f070 error 4 in kvm[400000+1e1000]
Looks like a problem in curses.
Do you get the segfault if you leave off the -curses option?
:-Dustin