Heavy Network IO Crashed dnsmasq with no recovery

Bug #1037065 reported by Thomas Vachon
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Nova: 2012.1+stable~20120612-3ee026e-0ubuntu1.2

I was running a load test against a 4-node Cassandra cluster in OpenStack. Each node had separate tenancy to ensure there was no funny contention. Running the test three times produced the same result each time.

About a third of the way through the test, the dnsmasq process crashes, with no warning or error in any log. The instance keeps "working", but only inside the VNC console, as all outside connectivity is now unroutable.

Here is a log from the dnsmasq process. The first two lines show dnsmasq working (a DHCPREQUEST answered by a DHCPACK). After that, the instance keeps sending DHCPDISCOVER and dnsmasq keeps answering with DHCPOFFER, but no further DHCPREQUEST or DHCPACK appears, so the replies apparently fail to route back to the instance.

<snip>
Aug 14 15:34:17 compute4 dnsmasq-dhcp[4146]: DHCPREQUEST(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:17 compute4 dnsmasq-dhcp[4146]: DHCPACK(br300) 10.0.0.11 fa:16:3e:03:89:9d cass1
Aug 14 15:34:28 compute4 dnsmasq-dhcp[4146]: DHCPDISCOVER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:28 compute4 dnsmasq-dhcp[4146]: DHCPOFFER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:31 compute4 dnsmasq-dhcp[4146]: DHCPDISCOVER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:31 compute4 dnsmasq-dhcp[4146]: DHCPOFFER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:36 compute4 dnsmasq-dhcp[4146]: DHCPDISCOVER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:36 compute4 dnsmasq-dhcp[4146]: DHCPOFFER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:47 compute4 dnsmasq-dhcp[4146]: DHCPDISCOVER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:47 compute4 dnsmasq-dhcp[4146]: DHCPOFFER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:59 compute4 dnsmasq-dhcp[4146]: DHCPDISCOVER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:59 compute4 dnsmasq-dhcp[4146]: DHCPOFFER(br300) 10.0.0.11 fa:16:3e:03:89:9d
</snip>
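A quick way to spot this pattern in a longer syslog is to count how many DHCPOFFERs each MAC has received since its last DHCPACK. The following is only a minimal sketch: the function name `unanswered_offers` and the regular expression are illustrative assumptions, not part of nova or dnsmasq, and the sample lines are copied from the log above.

```python
import re

# Matches dnsmasq-dhcp syslog lines like the ones in the excerpt above.
# Groups: message type, interface, IP address, MAC address.
LINE_RE = re.compile(
    r"dnsmasq-dhcp\[\d+\]: (DHCP[A-Z]+)\((\S+?)\) (\S+) ([0-9a-f:]{17})"
)


def unanswered_offers(lines):
    """Return {mac: number of DHCPOFFERs logged since that MAC's last DHCPACK}."""
    pending = {}
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a dnsmasq-dhcp line
        msg, mac = match.group(1), match.group(4)
        if msg == "DHCPACK":
            pending[mac] = 0  # lease confirmed, reset the counter
        elif msg == "DHCPOFFER":
            pending[mac] = pending.get(mac, 0) + 1
    return pending


SAMPLE = [
    "Aug 14 15:34:17 compute4 dnsmasq-dhcp[4146]: DHCPREQUEST(br300) 10.0.0.11 fa:16:3e:03:89:9d",
    "Aug 14 15:34:17 compute4 dnsmasq-dhcp[4146]: DHCPACK(br300) 10.0.0.11 fa:16:3e:03:89:9d cass1",
    "Aug 14 15:34:28 compute4 dnsmasq-dhcp[4146]: DHCPDISCOVER(br300) 10.0.0.11 fa:16:3e:03:89:9d",
    "Aug 14 15:34:28 compute4 dnsmasq-dhcp[4146]: DHCPOFFER(br300) 10.0.0.11 fa:16:3e:03:89:9d",
    "Aug 14 15:34:31 compute4 dnsmasq-dhcp[4146]: DHCPDISCOVER(br300) 10.0.0.11 fa:16:3e:03:89:9d",
    "Aug 14 15:34:31 compute4 dnsmasq-dhcp[4146]: DHCPOFFER(br300) 10.0.0.11 fa:16:3e:03:89:9d",
]

if __name__ == "__main__":
    for mac, count in unanswered_offers(SAMPLE).items():
        print(mac, "offers since last ACK:", count)
```

A count that keeps growing for one MAC suggests the instance never sees the offers, which matches the symptom described here.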

Here are the running dnsmasq processes after the meltdown. Of note: 10.0.0.3 is NOT assigned to an instance here; in fact, a different server has that IP for a different instance.

nobody 4146 1 0 Aug07 ? 00:00:02 /usr/sbin/dnsmasq --strict-order --bind-interfaces --conf-file= --domain=ewr.domain.com --pid-file=/var/lib/nova/networks/nova-br300.pid --listen-address=10.0.0.5 --except-interface=lo --dhcp-range=10.0.0.3,static,120s --dhcp-lease-max=256 --dhcp-hostsfile=/var/lib/nova/networks/nova-br300.conf --dhcp-script=/usr/bin/nova-dhcpbridge --leasefile-ro
root 4147 4146 0 Aug07 ? 00:00:01 /usr/sbin/dnsmasq --strict-order --bind-interfaces --conf-file= --domain=ewr.domain.com --pid-file=/var/lib/nova/networks/nova-br300.pid --listen-address=10.0.0.5 --except-interface=lo --dhcp-range=10.0.0.3,static,120s --dhcp-lease-max=256 --dhcp-hostsfile=/var/lib/nova/networks/nova-br300.conf --dhcp-script=/usr/bin/nova-dhcpbridge --leasefile-ro
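To make the long command line easier to read, the options can be split into a dictionary. This is a rough sketch only; `parse_long_options` is a hypothetical helper, not a nova or dnsmasq function. As far as I understand the dnsmasq man page, the `static` mode in `--dhcp-range` means the first field identifies the network to serve and only hosts listed in the hosts file get leases, which may be why 10.0.0.3 shows up without being assigned to an instance; treat that reading as an assumption.

```python
import shlex

# Command line copied from the ps output above (PID 4146).
CMDLINE = (
    "/usr/sbin/dnsmasq --strict-order --bind-interfaces --conf-file= "
    "--domain=ewr.domain.com --pid-file=/var/lib/nova/networks/nova-br300.pid "
    "--listen-address=10.0.0.5 --except-interface=lo "
    "--dhcp-range=10.0.0.3,static,120s --dhcp-lease-max=256 "
    "--dhcp-hostsfile=/var/lib/nova/networks/nova-br300.conf "
    "--dhcp-script=/usr/bin/nova-dhcpbridge --leasefile-ro"
)


def parse_long_options(cmdline):
    """Return (program, {option: value}); flags without '=' map to True."""
    argv = shlex.split(cmdline)
    options = {}
    for arg in argv[1:]:
        if not arg.startswith("--"):
            continue  # ignore anything that is not a long option
        name, sep, value = arg[2:].partition("=")
        options[name] = value if sep else True
    return argv[0], options


if __name__ == "__main__":
    program, options = parse_long_options(CMDLINE)
    print(program)
    for name, value in sorted(options.items()):
        print(f"  {name} = {value!r}")
    # Fields of --dhcp-range as given on the command line.
    print(options["dhcp-range"].split(","))
```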

Thomas Vachon (vachon) wrote :
Mark McLoughlin (markmc) wrote :

Thanks for finding the root cause. Marking this bug as a duplicate of the libvirt bug.
