Running KVM guest causes kernel panic on host

Reported by SergeiFranco on 2011-05-04
This bug affects 3 people
Affects            Status        Importance  Assigned to  Milestone
linux (Ubuntu)     Fix Released  High        Unassigned
qemu-kvm (Ubuntu)  Invalid       Critical    Unassigned

Bug Description

Binary package hint: qemu-kvm

as per http://ubuntuforums.org/showthread.php?p=10766552#post10766552

Host machine Ubuntu Natty server 2.6.38-8-generic i686
Guest machine Ubuntu Natty server 2.6.38-8-generic i686

The panic occurs after anywhere from a few minutes to several hours of uptime. It is load independent.

SergeiFranco (sergei-franco) wrote :
Serge Hallyn (serge-hallyn) wrote :

Thanks for submitting this bug and helping to make Ubuntu better.

In order to give us some more info on your system, could you please run 'apport-collect 776936'?

I wonder whether disabling KSM and memballoon might help. Could you disable memballoon by doing

   virsh edit deluge

and deleting the lines:

    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </memballoon>

Then disable KSM by editing /etc/default/qemu-kvm and changing

  KSM_ENABLED=1
to
  KSM_ENABLED=0

then restarting qemu-kvm:

  restart qemu-kvm

Then see if the problem still reproduces.
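
The KSM edit described above can also be scripted. This is only a minimal sketch: the function name disable_ksm is hypothetical, it assumes the defaults file contains a line starting with KSM_ENABLED= as shown above, and it assumes a sed that supports -i with a backup suffix (GNU sed does).

```shell
# Hypothetical helper: set KSM_ENABLED=0 in a qemu-kvm defaults file.
# $1 is the file to edit; on the host this is assumed to be /etc/default/qemu-kvm.
disable_ksm() {
    # Keep a .bak copy of the file, then rewrite only the KSM_ENABLED line.
    sed -i.bak 's/^KSM_ENABLED=.*/KSM_ENABLED=0/' "$1"
}

# Usage (as root), followed by the restart from the steps above:
#   disable_ksm /etc/default/qemu-kvm && restart qemu-kvm
```

On kernels built with KSM, reading /sys/kernel/mm/ksm/run afterwards shows the current state (0 means ksmd is stopped), which is a quick way to check that the change took effect.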

SergeiFranco (sergei-franco) wrote :

I am trying to run apport-collect 776936 and I get a text-mode browser in the terminal. I log in and allow access, then what? It is a little confusing, and I could not find a non-disruptive way to exit apport-collect other than Ctrl+Z and kill. I am not sure whether it even did anything.

Sorry about that. Could you try doing

   apport-cli qemu-kvm

Then when it asks

  What would you like to do? Your options are:
  S: Send report (813.3 KB)
  V: View report
  K: Keep report file for sending later or copying to somewhere else
  C: Cancel
  Please choose (S/V/K/C):

please choose K. Then attach the report file it names to this bug (or simply reply to this email with that file appended or attached).

SergeiFranco (sergei-franco) wrote :

I ran it while running "watch 'dmesg | tail'" and caught a glimpse of this:

[91476.783713] [<c142fdd0>] ? __netif_receive_skb+0x180/0x4e0
[91476.785510] [<c14301c2>] ? process_backlog+0x92/0x160
[91476.787300] [<c14319bd>] ? net_rx_action+0x10d/0x200
[91476.789093] [<c1056622>] ? __do_softirq+0x82/0x170
[91476.790892] [<c10565a0>] ? __do_softirq+0x0/0x170
[91476.792808] <IRQ> [<c1431d38>] ? netif_rx_ni+0x28/0x30
[91476.794925] [<c13987fd>] ? tun_chr_aio_write+0x23d/0x4b0
[91476.797021] [<c13985c0>] ? tun_chr_aio_write+0x0/0x4b0
[91476.799036] [<c1127676>] ? do_sync_readv_writev+0xa6/0xe0
[91476.800995] [<c11278a0>] ? do_readv_writev+0xa0/0x190

It still panics after I deleted memballoon (and redefined the config) and disabled KSM.
I can only afford to experiment once a day, as I don't want it to panic during a resync, for obvious reasons.

SergeiFranco (sergei-franco) wrote :

The highlight of this panic is:

"Kernel panic - not syncing: Fatal exception in interrupt"

Changed in qemu-kvm (Ubuntu):
importance: Undecided → Critical
Serge Hallyn (serge-hallyn) wrote :

(installing an i386 natty desktop in order to try to reproduce)

Could you post your host network configuration? Please include the output of 'brctl show', 'ip link', 'ufw status', 'iptables -L' and 'iptables -t nat -L' (all run as root or under sudo).
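
To gather all of that output in one go, something like the following could be used. It is a rough sketch, not part of any Ubuntu tooling: the function name collect and the file name netconfig.txt are made up for illustration.

```shell
# Hypothetical helper: run each command given as an argument and print its
# output under a header line, so everything can be pasted into one comment.
collect() {
    for cmd in "$@"; do
        printf '### %s\n' "$cmd"
        sh -c "$cmd" 2>&1   # include error messages in the capture
        printf '\n'
    done
}

# Usage (as root); netconfig.txt is an arbitrary file name:
#   collect 'brctl show' 'ip link' 'ufw status' 'iptables -L' 'iptables -t nat -L' > netconfig.txt
```

Passing each command as a single quoted string keeps the header identical to what was run, which makes the pasted output easier to follow.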

SergeiFranco (sergei-franco) wrote :

brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.00270e160ec4       no              eth1

ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:27:0e:16:0e:c4 brd ff:ff:ff:ff:ff:ff
3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 00:27:0e:16:0e:c4 brd ff:ff:ff:ff:ff:ff
6: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1000 qdisc pfifo_fast state UNKNOWN qlen 100
    link/ether 06:fc:dc:4a:a4:cb brd ff:ff:ff:ff:ff:ff

ufw status
Status: inactive

iptables -L
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED
ACCEPT     all  --  192.168.1.0/24       anywhere
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh state NEW limit: avg 1/min burst 5
LOG        tcp  --  anywhere             anywhere             tcp dpt:ssh state NEW limit: avg 1/min burst 5 LOG level warning prefix `rate-limited SSH: '
REJECT     tcp  --  anywhere             anywhere             tcp dpt:ssh state NEW reject-with icmp-port-unreachable
DROP       udp  -- !192.168.1.0/24       anywhere             udp dpt:domain

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  192.168.1.0/24       anywhere             ctstate NEW
ACCEPT     all  --  192.168.1.0/24       anywhere             ctstate NEW
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
MASQUERADE all  --  anywhere             anywhere

Serge Hallyn (serge-hallyn) wrote :

I couldn't reproduce this with a simple kvm guest on an i386 natty host (64-bit hardware). I'll try mimicking your networking setup, as one of your panics suggests a netfilter+bridge issue (though the other did not, so I'm not optimistic).

SergeiFranco (sergei-franco) wrote :

One thing I noticed is that the kernel panic comes much faster if there is network activity on the guest, although I might be wrong.
Is there anything else I could do? I will try to run the guest without any network activity and see whether it panics.

Serge Hallyn (serge-hallyn) wrote :

Quoting SergeiFranco (<email address hidden>):
> One thing I noticed is that the kernel panic comes much faster if there is network activity on the guest, although I might be wrong.
> Is there anything else I could do? I will try to run the guest without any network activity and see whether it panics.

If it could be shown that you can't reproduce either with a NATed bridge, or without the custom iptables rules, that would be very informative.

Serge Hallyn (serge-hallyn) wrote :

Can you append whatever script you use to set up the iptables rules, so I can emulate your setup as closely as possible?

SergeiFranco (sergei-franco) wrote :

Hi, here is the iptables-save output:

*filter
:INPUT DROP [992:148145]
:FORWARD ACCEPT [24:1496]
:OUTPUT ACCEPT [215528:64537610]
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -s 192.168.1.0/24 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -m limit --limit 1/min -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -m limit --limit 1/min -j LOG --log-prefix "rate-limited SSH: "
-A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -j REJECT --reject-with icmp-port-unreachable
-A INPUT ! -s 192.168.1.0/24 -p udp -m udp --dport 53 -j DROP
-A FORWARD -i eth1 -o tun0 -j ACCEPT
-A FORWARD -i br0 -o tun0 -j ACCEPT
-A FORWARD -i eth1 -s 192.168.1.0/24 -m conntrack --ctstate NEW -j ACCEPT
-A FORWARD -i br0 -s 192.168.1.0/24 -m conntrack --ctstate NEW -j ACCEPT
-A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
COMMIT
*nat
:PREROUTING ACCEPT [1920:290675]
:POSTROUTING ACCEPT [16622:1247968]
:OUTPUT ACCEPT [16618:1247716]
-A POSTROUTING -j MASQUERADE
COMMIT
*mangle
:PREROUTING ACCEPT [171074:67106601]
:INPUT ACCEPT [169741:66910030]
:FORWARD ACCEPT [24:1496]
:OUTPUT ACCEPT [215528:64537610]
:POSTROUTING ACCEPT [215565:64541384]
COMMIT


SergeiFranco (sergei-franco) wrote :

Today I noticed that although I had removed the memballoon, the change didn't stick (possibly because the panic kept the file from being synced).
I removed the memballoon once again and started the guest, with slightly better results.
The system does not freeze immediately, but it leaves a lot of "oopses" in syslog/dmesg.
I am attaching these "oopses".

The highlight of the "oopses" is "BUG: unable to handle kernel paging request at".

I am going to try kvm in NAT mode (btw how do I do that?) and see whether this still happens.

SergeiFranco (sergei-franco) wrote :

In nat mode it does not panic.

Having a virtual machine in NAT mode is piss poor, especially when network performance in NAT mode is very unreliable (perhaps an MTU issue? apt-get install gets stuck at 0% [Waiting for headers] and wget google.com also hangs, while I can ping everything fine). Another issue is that I have to port forward and deal with all the associated nightmares. I am seriously considering moving to a physical box, or ditching Ubuntu, installing Debian and going with Xen...

I need this box in bridge mode (it is a virtual server after all).

Serge Hallyn (serge-hallyn) wrote :

Quoting SergeiFranco (<email address hidden>):
> In nat mode it does not panic.

Thanks for that info.

Is that true even with memballoon enabled?

...

> I need this box in bridge mode (it is a virtual server after all).

Absolutely. This just gives us a better chance of tracking down the
bug(s).

SergeiFranco (sergei-franco) wrote :

Yes, memballooning is enabled right now:

<domain type='kvm'>
  <name>deluge</name>
  <uuid>de900b25-6dc3-cfd6-4fb1-55c6f9755b00</uuid>
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='i686' machine='pc-0.14'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/deluge.qcow2'/>
      <target dev='hda' bus='ide'/> ...


SergeiFranco (sergei-franco) wrote :

This problem looks very similar to this: http://www.spinics.net/lists/linux-net/msg17690.html

Serge Hallyn (serge-hallyn) wrote :

Quoting SergeiFranco (<email address hidden>):
> This problem looks very similar to this: http://www.spinics.net/lists/linux-net/msg17690.html

Ah, thanks very much. That commit is in the natty-proposed kernel, but
not yet in the natty kernel. Can you test the natty-proposed kernel and
confirm whether that fixes the bug for you?

Adam Koczur (aerradon) wrote :

I have the same problem, and kernel 2.6.38-9-server from natty-proposed does not fix anything. Once I start any kvm guest, the whole machine hangs after a few seconds with a kernel panic.

Serge Hallyn (serge-hallyn) wrote :

@Adam,

Yours may be the same bug, but it may not be. Please open a new bug with the information I'd asked of Sergei, in particular your /proc/cpuinfo, uname -a, iptables -L, brctl show, and virsh dumpxml and release info for an affected guest, and any relevant log info from both guest and host.

SergeiFranco (sergei-franco) wrote :

Good news! The kernel from the proposed repository works. It didn't crash overnight in bridge mode.

Serge Hallyn (serge-hallyn) wrote :

Great, thanks for the info, Sergei.

Changed in qemu-kvm (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu):
status: New → Fix Committed
importance: Undecided → High
Changed in linux (Ubuntu):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
Stefan Bader (smb) wrote :

Reading through the comments again, I am rather sure this should be fixed now.

Changed in linux (Ubuntu):
assignee: Stefan Bader (stefan-bader-canonical) → nobody
status: Fix Committed → Fix Released