Comment 5 for bug 1006898

Chuck Short (zulcss) wrote :

** Issue **

There is an issue with the way nova uses dnsmasq in VLAN mode. It starts
up a single copy of dnsmasq for each VLAN on the network host (or on
every host in multi_host mode). The problem is in the way that dnsmasq
binds to an IP address and port[2]. All copies can respond to broadcast
packets, but unicast packets can only be answered by one of the copies.

In nova this means that guests from only one project will get responses
to their unicast DHCP renew requests. Unicast requests from guests in
other projects get ignored. What happens next depends on the guest OS.
Linux generally sends a broadcast packet after the unicast fails, so
the only effect is a small (tens of ms) hiccup while the interface is
reconfigured. It can be much worse than that, however. I have seen
cases where Windows just gives up and ends up with a non-configured
interface.

This bug was first noticed by some users of OpenStack who rolled their
own fix. Basically, on Linux, setting the SO_BINDTODEVICE socket option
allows different daemons to share the port and respond to unicast
packets, as long as they listen on different interfaces. I managed to
communicate with Simon Kelley, the maintainer of dnsmasq, and he has
integrated a fix[3] for the issue in the current version[1] of dnsmasq.

[3] http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=9380ba70d67db6b69f817d8e318de5ba1e990b12
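
To illustrate the SO_BINDTODEVICE approach described above, here is a
minimal Python sketch. It is not the dnsmasq patch itself, only an
illustration of the socket option. The helper names (encode_ifname,
open_listener) and the interface names in the usage note are
hypothetical and were chosen for this example.

```python
import socket

# Linux limit on interface name length, including the trailing NUL.
IFNAMSIZ = 16


def encode_ifname(ifname):
    """Return the NUL-terminated byte form of an interface name."""
    raw = ifname.encode("ascii")
    if not raw or len(raw) >= IFNAMSIZ:
        raise ValueError("interface name must be 1 to 15 bytes: %r" % ifname)
    return raw + b"\0"


def open_listener(ifname, port):
    """Open a UDP socket on `port` that is tied to a single interface.

    Hypothetical helper: with SO_BINDTODEVICE set, the socket only
    receives packets that arrive on `ifname`, so several daemons can
    listen on the same port as long as each uses a different interface.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Let several sockets bind the same address and port.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # Linux-only option; typically needs root or CAP_NET_RAW.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                        encode_ifname(ifname))
        sock.bind(("", port))
    except OSError:
        sock.close()
        raise
    return sock
```

For example, calling open_listener("vlan100", 67) and
open_listener("vlan101", 67) (assumed interface names) would give two
sockets on the DHCP server port, each of which only sees traffic from
its own interface. Binding to port 67 normally requires root.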

** Development Fix **

This has been fixed in quantal with the newer version of dnsmasq.

** Stable Fix **

I have backported the patch that fixes this issue and attached the debdiff and the build log.

** Test Case **

1. Install OpenStack in VLAN mode.
2. Watch instances lose their IP addresses.

** Regression Potential **

Minimal; most installations don't use this type of networking.