KVM guests networking issues with no virbr0 and with vhost_net kernel modules loaded

Bug #1029430 reported by Sergio Rubio
This bug affects 10 people
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  Medium      Adam Gandelman
  Folsom                  Fix Released  Medium      Adam Gandelman
Ubuntu Cloud Archive      Fix Released  Undecided   Unassigned
libvirt (Ubuntu)          Won't Fix     Low         Unassigned
nova (Ubuntu)             Fix Released  High        Unassigned
  Quantal                 Fix Released  Undecided   Unassigned

Bug Description

We've found that having the vhost_net module loaded while using bridged networking breaks DHCP, and (some?) guests do not get an IP address.

The issue has been properly described in the following RH doc:

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Host_Configuration_and_Guest_Installation_Guide/ch11s02.html

"11.2.1. Checksum correction for older DHCP clients"

Their workaround also works in Precise.

Another workaround is to disable/unload the vhost_net module so new guests do not make use of it.
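
As a rough sketch of that second workaround, one could first check whether the module is loaded at all. The helper name `vhost_net_loaded` is made up for illustration, and the sketch assumes the usual `lsmod` / `/proc/modules` layout with the module name in the first column:

```shell
# vhost_net_loaded: hypothetical helper, not part of any package.
# Reads a module listing (lsmod or /proc/modules format) on stdin and
# succeeds only if vhost_net appears in the first column.
vhost_net_loaded() {
    awk '$1 == "vhost_net" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Example use on a compute node (needs root; as noted above, only new
# guests are expected to stop using vhost_net afterwards):
#   vhost_net_loaded < /proc/modules && modprobe -r vhost_net
```

Reading the listing from stdin keeps the check usable against either `lsmod` output or `/proc/modules`.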

RH has fixed this stuff in libvirt apparently:

"This iptables rule is programmed automatically on the host when the server is started by libvirt, so no further action is required"

My apologies if this stuff does not belong to libvirt.

Some info from our environment:

compute-002:~# lsb_release -rd
Description: Ubuntu 12.04 LTS
Release: 12.04

compute-002:~# uname -a
Linux compute-002 3.2.0-27-generic #43-Ubuntu SMP Fri Jul 6 14:25:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

compute-002:~# dpkg -l|egrep "kvm|libvirt|dhcp|dns"
ii dnsmasq-base 2.59-4 Small caching DNS proxy and DHCP/TFTP server
ii dnsmasq-utils 2.59-4 Utilities for manipulating DHCP leases
ii dnsutils 1:9.8.1.dfsg.P1-4ubuntu0.1 Clients provided with BIND
ii isc-dhcp-client 4.1.ESV-R4-0ubuntu5.1 ISC DHCP client
ii isc-dhcp-common 4.1.ESV-R4-0ubuntu5.1 common files used by all the isc-dhcp* packages
ii kvm-ipxe 1.0.0+git-3.55f6c88-0ubuntu1 PXE ROM's for KVM
ii libdns81 1:9.8.1.dfsg.P1-4ubuntu0.1 DNS Shared Library used by BIND
ii libnet-dns-perl 0.66-2ubuntu3 Perform DNS queries from a Perl script
ii libvirt-bin 0.9.8-2ubuntu17.2 programs for the libvirt library
ii libvirt0 0.9.8-2ubuntu17.2 library for interfacing with different virtualization systems
ii munin-libvirt-plugins 0.0.6-1 Munin plugins using libvirt
ii nova-compute-kvm 2012.1+stable~20120612-3ee026e-0ubuntu1.2 OpenStack Compute - compute node (KVM)
ii python-libvirt 0.9.8-2ubuntu17.2 libvirt Python bindings
ii qemu-kvm 1.0+noroms-0ubuntu14 Full virtualization on i386 and amd64 hardware

We've also tested this with kernel 3.5 backport from Quantal:

Linux compute-002 3.5.0-6-generic #6~precise1-Ubuntu SMP Tue Jul 24 14:45:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

The guest is running Debian Squeeze:

debian-squeeze-amd64-ext3:~$ uname -a
Linux debian-squeeze-amd64-ext3 2.6.32-5-amd64 #1 SMP Mon Jan 16 16:22:28 UTC 2012 x86_64 GNU/Linux

debian-squeeze-amd64-ext3:~$ dpkg -l|grep dhcp
ii isc-dhcp-client 4.1.1-P1-15+squeeze3 ISC DHCP client
ii isc-dhcp-common 4.1.1-P1-15+squeeze3 common files used by all the isc-dhcp* packages

Sergio Rubio (rubiojr) wrote :
description: updated
description: updated
summary: - KVM guests networking issues with bridge and vhost_net loaded
+ KVM guests networking issues when bridge and vhost_net modules loaded
summary: - KVM guests networking issues when bridge and vhost_net modules loaded
+ KVM guests networking issues when bridge and vhost_net kernel modules
+ loaded
Sergio Rubio (rubiojr) wrote : Re: KVM guests networking issues when bridge and vhost_net kernel modules loaded

Forgot to mention that the workaround isn't required when using openvswitch with the brcompat module since everything works as expected.

description: updated
Serge Hallyn (serge-hallyn) wrote :

Thanks for submitting this bug.

The redhat page you linked suggests that the checksum-fill iptables rule should solve the problem. When I start a 12.04 or 12.10 server, sudo iptables -L -t mangle shows

CHECKSUM udp -- anywhere anywhere udp dpt:bootpc CHECKSUM fill

Is that rule not present on your systems? If so, have you removed the virbr0 NATed bridge? When I remove that from autostart and reboot, I do not see the rule.

Assuming I understand this right, does that mean we should have the libvirt-bin upstart job always unconditionally add that rule?
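
One way to answer the "is that rule present" question for a particular bridge is sketched below. The function name `rule_covers_bridge` is hypothetical; it only inspects text in the `iptables -S` format shown later in this thread and treats a rule with no `-o` interface as covering every bridge:

```shell
# rule_covers_bridge BRIDGE: hypothetical helper.
# Reads `iptables -t mangle -S` output on stdin and succeeds if some
# DHCP (UDP port 68) checksum-fill rule applies to BRIDGE, either by
# naming it with -o or by naming no output interface at all.
rule_covers_bridge() {
    awk -v br="$1" '
        /--dport 68 .*-j CHECKSUM --checksum-fill/ {
            out = ""
            for (i = 1; i < NF; i++) if ($i == "-o") out = $(i + 1)
            if (out == "" || out == br) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Example use on a host (needs root):
#   iptables -t mangle -S POSTROUTING | rule_covers_bridge br100 || echo "no rule for br100"
```

A rule restricted to virbr0 would therefore not count for a bridge such as br100 that libvirt does not manage.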

Changed in libvirt (Ubuntu):
importance: Undecided → High
status: New → Incomplete
Sergio Rubio (rubiojr) wrote :

Thanks Serge,

Honestly, I did not check whether the rule was present in a vanilla install, I'm sorry. We're now investigating whether our install is broken in this regard. Since nova-network also manages rules there, I'm not sure at this point what is interfering with libvirt's default rules.

Thank you for pointing us in the right direction.

Besides that, I'm curious about the openvswitch brcompat module and why we don't need that rule with it. Any idea?

Thank you.

Serge Hallyn (serge-hallyn) wrote :

I'm not sure why openvswitch would not need the rule - since the problem is really on the dhcp client - unless it always adds the checksum.

Changed in libvirt (Ubuntu):
status: Incomplete → Triaged
importance: High → Medium
importance: Medium → Low
Serge Hallyn (serge-hallyn) wrote :

(dropped priority as there is a workaround)

Sergio Rubio (rubiojr) wrote :

I can confirm that we removed "/etc/libvirt/qemu/networks/autostart/default.xml" in the past, probably trying to avoid conflicts with our current nova-network setup.

We've been running openvswitch-brcompat for a while, where the problem is apparently not present, so we didn't notice until now.

I'll test our stack with libvirt's default network enabled, and resort to adding the rule to rc.local or libvirt-bin upstart job if we can't work the issues out.

Thanks a bunch.

chuanyu (x77126) wrote :

Here is my compute worker's iptables rule (Ubuntu 12.04):

$ sudo iptables -S -t mangle
-A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill

And I changed it to:
-A POSTROUTING -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill

then my Ubuntu guest can get a DHCP IP correctly.

summary: - KVM guests networking issues when bridge and vhost_net kernel modules
- loaded
+ KVM guests networking issues with no virbr0 and with vhost_net kernel
+ modules loaded
Cirroz (pomozoff-gmail) wrote :

> And I change it to:
> -A POSTROUTING -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill

my debian shows me:

# iptables -t mangle -S -A POSTROUTING -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
iptables v1.4.8: Cannot use -A with -E

Try `iptables -h' or 'iptables --help' for more information.

I don't understand him :(
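
For what it's worth, the quoted text is a line of `iptables -S` output (a rule specification), not a complete command, and the failing invocation passes both `-S` (list rules) and `-A` (append a rule) at once, which is presumably what iptables rejects. A minimal sketch of turning such a line into a command, with `rule_to_command` being a hypothetical helper name:

```shell
# rule_to_command RULESPEC: hypothetical helper.
# Takes one line as printed by `iptables -t mangle -S` and prints the
# command that would add the same rule to the mangle table.
rule_to_command() {
    printf 'iptables -t mangle %s\n' "$1"
}

# Example (prints the command; run the printed command as root to apply it):
#   rule_to_command '-A POSTROUTING -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill'
```

In other words, drop the `-S` and keep the `-t mangle -A POSTROUTING ...` part when adding the rule by hand.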

Adam Gandelman (gandelman-a) wrote :

Tagging this as affecting nova, as it manages its own out-of-band bridge and iptables rules for its libvirt guests. These rules should probably be set up by nova-network (and potentially quantum?)

Changed in nova (Ubuntu):
status: New → Confirmed
importance: Undecided → High
tags: added: folsom-rc-potential
Changed in nova:
status: New → Triaged
importance: Undecided → Medium
tags: added: folsom-backport-potential
removed: folsom-rc-potential
Jian Wen (wenjianhn)
Changed in nova:
assignee: nobody → Jian Wen (wenjianhn)
Changed in nova (Ubuntu):
assignee: nobody → Jian Wen (wenjianhn)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/18336

Changed in nova:
assignee: Jian Wen (wenjianhn) → Adam Gandelman (gandelman-a)
status: Triaged → In Progress
Jian Wen (wenjianhn)
Changed in nova (Ubuntu):
assignee: Jian Wen (wenjianhn) → nobody
Jian Wen (wenjianhn) wrote :

I can't assign the bug to you. :(
It says "No items matched "gandelman-a"" and "Adam Gandelman ".

I was fixing this bug yesterday.
That makes 3 bugs I was going to fix that were fixed by other people in the end.
Glad to see we are closing bugs :)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/18336
Committed: http://github.com/openstack/nova/commit/901a3dacb6f2d36cbe8d23707dba75452e91df33
Submitter: Jenkins
Branch: master

commit 901a3dacb6f2d36cbe8d23707dba75452e91df33
Author: Adam Gandelman <email address hidden>
Date: Tue Dec 18 09:50:46 2012 -0800

    Add an iptables mangle rule per-bridge for DHCP.

    When vhost-net is present on a host, and DHCP services are
    run on the same system as guests (multi_host), an iptables
    rule is needed to fill packet checksums. This adds a rule
    per-bridge for multi_host networks when vhost-net is present,
    similar to how newer versions of libvirt handle the issue for
    bridges/networks that it manages.

    Fixes LP: #1029430

    EDIT: Updated tests and pep8.

    Change-Id: I1a51c1d808fa47a77e713dbfe384ffad183d6031
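
The change itself lives in nova's Python code. Purely as an illustration of the behaviour the commit message describes, a shell sketch of "one rule per bridge, only when vhost-net is present" could look like this. The function name `dhcp_checksum_rules` is hypothetical, and using the existence of the `/dev/vhost-net` device node as the presence test is an assumption, not something the message states:

```shell
# dhcp_checksum_rules VHOST_DEV BRIDGE...: hypothetical helper.
# Prints one iptables argument list per bridge, but only if the given
# vhost-net device node exists (assumed presence test). Prints nothing
# otherwise.
dhcp_checksum_rules() {
    vhost_dev=$1
    shift
    [ -e "$vhost_dev" ] || return 0
    for bridge in "$@"; do
        printf '%s\n' "-t mangle -A POSTROUTING -o $bridge -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill"
    done
}

# Example: show the rules that would be needed for two bridges
#   dhcp_checksum_rules /dev/vhost-net br100 br101
```

The function only prints rule arguments, so it can be inspected safely before anything is applied.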

Changed in nova:
status: In Progress → Fix Committed
tags: removed: folsom-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/folsom)

Fix proposed to branch: stable/folsom
Review: https://review.openstack.org/18450

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/folsom)

Reviewed: https://review.openstack.org/18450
Committed: http://github.com/openstack/nova/commit/4bfc8f1165b05c2cc7c5506641b9b85fa8e1e144
Submitter: Jenkins
Branch: stable/folsom

commit 4bfc8f1165b05c2cc7c5506641b9b85fa8e1e144
Author: Adam Gandelman <email address hidden>
Date: Tue Dec 18 09:50:46 2012 -0800

    Add an iptables mangle rule per-bridge for DHCP.

    When vhost-net is present on a host, and DHCP services are
    run on the same system as guests (multi_host), an iptables
    rule is needed to fill packet checksums. This adds a rule
    per-bridge for multi_host networks when vhost-net is present,
    similar to how newer versions of libvirt handle the issue for
    bridges/networks that it manages.

    Fixes LP: #1029430

    EDIT: Updated tests and pep8.

    (Backported from commit 901a3dacb6f2d36cbe8d23707dba75452e91df33)

    Change-Id: I1a51c1d808fa47a77e713dbfe384ffad183d6031

Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-2
status: Fix Committed → Fix Released
Mark McLoughlin (markmc)
Changed in nova:
milestone: grizzly-2 → 2012.2.3
status: Fix Released → Fix Committed
Mark McLoughlin (markmc)
Changed in nova:
milestone: 2012.2.3 → grizzly-2
status: Fix Committed → Fix Released
Changed in nova (Ubuntu):
status: Confirmed → Fix Released
Changed in cloud-archive:
status: New → Confirmed
Changed in nova (Ubuntu Quantal):
status: New → Confirmed
Scott Moser (smoser) wrote :

For reference, it seems that devstack running on a 3.3 or later kernel will see this. I found it running on quantal:

<adam_g> smoser: devstack doesn't configure its networks as multi_host. my fix only addresses the issue for multi_host networks, where it's assumed the dhcp server is always running on the same host as compute
<adam_g> smoser: with devstack that just happens to be the case, but it's not multihost
<adam_g> smoser: anyway, the workaround is to just rmmod vhost_net or add the iptables mangle rule described in that bug

So specifically for devstack:
    rmmod vhost_net
or
    [ -e /dev/vhost-net ] &&
        sudo iptables -t mangle -A POSTROUTING -o br100 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
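
Running the `-A` command repeatedly appends duplicate rules. A hedged sketch of an idempotent variant follows. It relies on `iptables -C` (check), which exists only in newer iptables releases (1.4.11 and later, as far as I know), so it would not work on older systems such as the Debian 1.4.8 mentioned earlier in this thread. The function name and the `IPTABLES` override variable are hypothetical:

```shell
# ensure_dhcp_checksum_rule BRIDGE: hypothetical helper.
# Appends the DHCP checksum-fill rule for BRIDGE only if `iptables -C`
# reports that it is not already present. Set IPTABLES to use a
# different binary or wrapper (defaults to "iptables").
ensure_dhcp_checksum_rule() {
    ipt=${IPTABLES:-iptables}
    bridge=$1
    set -- POSTROUTING -o "$bridge" -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
    "$ipt" -t mangle -C "$@" 2>/dev/null || "$ipt" -t mangle -A "$@"
}

# Example use for devstack's br100 (needs root):
#   [ -e /dev/vhost-net ] && ensure_dhcp_checksum_rule br100
```

Keeping the rule arguments in one place means the check and the append always describe exactly the same rule.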

Clint Byrum (clint-fewbar) wrote : Please test proposed package

Hello Sergio, or anyone else affected,

Accepted nova into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/nova/2012.2.3-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Quantal):
status: Confirmed → Fix Committed
tags: added: verification-needed
Alex Vitola (vitola) wrote :

I had the same problem. I updated the kernel to 3.2.0-38-generic and now it is working again.

Ubuntu 12.04 LTS 64Bits

Feb 28 08:23:16 os0 dnsmasq-dhcp[2092]: DHCPREQUEST(br100) 172.16.88.229 fa:16:3e:44:90:46
Feb 28 08:23:16 os0 dnsmasq-dhcp[2092]: DHCPACK(br100) 172.16.88.229 fa:16:3e:44:90:46 00000021

nova-api=2012.2.1+stable-20121212-a99a802e-0ubuntu1.1~cloud0
nova-cert=2012.2.1+stable-20121212-a99a802e-0ubuntu1.1~cloud0
nova-common=2012.2.1+stable-20121212-a99a802e-0ubuntu1.1~cloud0
nova-compute=2012.2.1+stable-20121212-a99a802e-0ubuntu1.1~cloud0
nova-compute-kvm=2012.2.1+stable-20121212-a99a802e-0ubuntu1.1~cloud0
nova-consoleauth=2012.2.1+stable-20121212-a99a802e-0ubuntu1.1~cloud0
nova-network=2012.2.1+stable-20121212-a99a802e-0ubuntu1.1~cloud0
nova-novncproxy=2012.2.1+stable-20121212-a99a802e-0ubuntu1.1~cloud0
nova-scheduler=2012.2.1+stable-20121212-a99a802e-0ubuntu1.1~cloud0
python-nova=2012.2.1+stable-20121212-a99a802e-0ubuntu1.1~cloud0
python-novaclient=1:2.9.0-0ubuntu1~cloud0

Alex Vitola (vitola)
Changed in nova (Ubuntu):
assignee: nobody → Alex Vitola (vitola)
assignee: Alex Vitola (vitola) → nobody
Serge Hallyn (serge-hallyn) wrote :

My understanding is that this relates to non-libvirt-maintained bridges, so marking wontfix for libvirt.

If we wanted a generic way to address this, perhaps /etc/init/qemu-kvm.conf, if it is modprobing vhost_net, should add the iptables rule if not already present?

no longer affects: libvirt (Ubuntu Quantal)
Changed in libvirt (Ubuntu):
status: Triaged → Won't Fix
Lorin Hochstein (lorinh) wrote :

I'm running into an issue with similar symptoms on precise, but I don't have the vhost_net kernel module loaded on my compute nodes, and adding in the iptables rule doesn't seem to help. I'm at a loss as to how to try to debug this, since DHCP leases work when I set:

libvirt_use_virtio_for_bridges=false

But they fail if I set it to true. I've tested with quantal and raring guests, and both fail to get an IP via DHCP if the above flag is true, but they succeed if the above flag is false.

Adam Gandelman (gandelman-a) wrote :

Lorin- What is the kernel version?

Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-2 → 2013.1
Dave Walker (davewalker) wrote :

Hello Sergio, or anyone else affected,

Accepted nova into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/nova/2012.2.3-0ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Adam Gandelman (gandelman-a) wrote : Verification report.

Please find the attached test log from the Ubuntu Server Team's CI infrastructure. As part of the verification process for this bug, the OpenStack components have been deployed and configured across multiple nodes using quantal-proposed as an installation source. After successful bring-up and configuration of the cluster, a number of exercises and smoke tests have been invoked to ensure the updated package did not introduce any regressions. A number of test iterations were carried out to catch any possible transient errors.

These proposed packages were deployed and tested in several different configurations. Attached are tarballs with various test logs from each configuration. In addition to the base components, variables in deployments include:

quantal_folsom.tar: nova-network (FlatDHCP), glance (Ceph backend), cinder (Ceph backend)
quantal_folsom_nova-volume.tar: nova-network (FlatDHCP), glance (local file), nova-volume (iSCSI backend)
quantal_folsom_quantum.tar: quantum (OVS plugin), glance (Ceph backend), nova-volume (Ceph backend)

Please note the versions_tested file in each tarball, which contains details about relevant package versions installed and tested.

For records of upstream test coverage of this update, please see the Jenkins links in the comments of the relevant upstream code-review(s):

Trunk review: https://review.openstack.org/18336
Stable review: https://review.openstack.org/18450

As per the provisional Micro Release Exception granted to this package by the Technical Board, we hope this contributes toward verification of this update.

Adam Gandelman (gandelman-a) wrote :

Test coverage log.

Adam Gandelman (gandelman-a) wrote :

Test coverage log.

Adam Gandelman (gandelman-a) wrote :

Test coverage log.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2012.2.3-0ubuntu2

---------------
nova (2012.2.3-0ubuntu2) quantal-proposed; urgency=low

  * Re-sync with latest security updates.
  * SECURITY UPDATE: fix denial of service via fixed IPs when using extensions
    - debian/patches/CVE-2013-1838.patch: add explicit quota for fixed IP
    - CVE-2013-1838
  * SECURITY UPDATE: fix VNC token validation
    - debian/patches/CVE-2013-0335.patch: force console auth service to flush
      all tokens associated with an instance when it is deleted
    - CVE-2013-0335
  * SECURITY UPDATE: fix denial of service
    - CVE-2013-1664.patch: Add a new utils.safe_minidom_parse_string function
      and update external API facing Nova modules to use it
    - CVE-2013-1664
 -- James Page <email address hidden> Fri, 22 Mar 2013 12:40:07 +0000

Changed in nova (Ubuntu Quantal):
status: Fix Committed → Fix Released
tags: added: cloud-archive
Changed in cloud-archive:
status: Confirmed → Fix Released