Heavy Network IO Crashed dnsmasq with no recovery
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) | New | Undecided | Unassigned | | |
Bug Description
Nova: 2012.1+
I was running a load test against a 4-node Cassandra cluster in OpenStack. Each node has its own tenancy to ensure there was no contention between them. Running the test 3 times produced the same result each time.
About 1/3 of the way through the test, the dnsmasq process crashes (with no warning or error in any log). The instance continues "working", but only from the VNC console, as all outside connectivity is now unroutable.
Here is a log from the dnsmasq process. The first two rows show that dnsmasq was working (DHCPREQUEST answered by DHCPACK). After that, the instance keeps sending DHCPDISCOVER and dnsmasq keeps answering with DHCPOFFER, but no DHCPACK follows and traffic is no longer routed back to the instance.
<snip>
Aug 14 15:34:17 compute4 dnsmasq-dhcp[4146]: DHCPREQUEST(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:17 compute4 dnsmasq-dhcp[4146]: DHCPACK(br300) 10.0.0.11 fa:16:3e:03:89:9d cass1
Aug 14 15:34:28 compute4 dnsmasq-dhcp[4146]: DHCPDISCOVER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:28 compute4 dnsmasq-dhcp[4146]: DHCPOFFER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:31 compute4 dnsmasq-dhcp[4146]: DHCPDISCOVER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:31 compute4 dnsmasq-dhcp[4146]: DHCPOFFER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:36 compute4 dnsmasq-dhcp[4146]: DHCPDISCOVER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:36 compute4 dnsmasq-dhcp[4146]: DHCPOFFER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:47 compute4 dnsmasq-dhcp[4146]: DHCPDISCOVER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:47 compute4 dnsmasq-dhcp[4146]: DHCPOFFER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:59 compute4 dnsmasq-dhcp[4146]: DHCPDISCOVER(br300) 10.0.0.11 fa:16:3e:03:89:9d
Aug 14 15:34:59 compute4 dnsmasq-dhcp[4146]: DHCPOFFER(br300) 10.0.0.11 fa:16:3e:03:89:9d
</snip>
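For anyone trying to confirm the same pattern in a longer syslog, here is a rough Python sketch that flags clients stuck in the DISCOVER/OFFER loop shown above. It is only an illustration: the function name `stalled_clients` and the `threshold` parameter are made up for this example, and the regex assumes the dnsmasq-dhcp line layout seen in this excerpt.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the excerpt above:
#   "... dnsmasq-dhcp[4146]: DHCPOFFER(br300) 10.0.0.11 fa:16:3e:03:89:9d"
DHCP_LINE = re.compile(
    r"dnsmasq-dhcp\[\d+\]: (DHCP[A-Z]+)\((\S+)\) (\S+) "
    r"((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})"
)


def stalled_clients(lines, threshold=3):
    """Return {mac: count} for clients that sent at least `threshold`
    DHCPDISCOVER messages since their last DHCPACK (hypothetical helper)."""
    discovers = defaultdict(int)
    for line in lines:
        match = DHCP_LINE.search(line)
        if not match:
            continue  # not a dnsmasq-dhcp line in the assumed layout
        message, _interface, _ip, mac = match.groups()
        if message == "DHCPDISCOVER":
            discovers[mac] += 1
        elif message == "DHCPACK":
            discovers[mac] = 0  # lease acknowledged, reset the counter
    return {mac: n for mac, n in discovers.items() if n >= threshold}
```

It can be fed any iterable of log lines, for example an open file handle on the syslog file. A MAC that shows up in the result has been rediscovering without ever getting an ACK, which matches what the instance above was doing.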
Here are the running dnsmasq processes after the meltdown. Of note, 10.0.0.3 is NOT assigned to an instance, and in fact a different server has that IP for a different instance.
nobody 4146 1 0 Aug07 ? 00:00:02 /usr/sbin/dnsmasq --strict-order --bind-interfaces --conf-file= --domain=
root 4147 4146 0 Aug07 ? 00:00:01 /usr/sbin/dnsmasq --strict-order --bind-interfaces --conf-file= --domain=
Root bug, I think: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/997978/