libvirt dnsmasq causes runaway chain reaction

Bug #956655 reported by Martin Pool
This bug affects 1 person
Affects: Launchpad itself
Status: Invalid
Importance: High
Assigned to: Unassigned

Bug Description

I've repeatedly seen dnsmasq get into a state where it spams out huge numbers of queries. I'm not sure why; perhaps it's sending queries to itself? I don't know exactly how to reproduce it, but it seems to be related to my machine suspending and resuming and/or wired and wireless interfaces being brought up and down.

partial strace:

select(76, [3 5 6 7 8 11 12 13 14 15 16 17 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(6667), sin_addr=inet_addr("192.168.178.23")}, msg_iov(1)=[{"\214(\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 47
sendto(16, " X\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 47
select(76, [3 5 6 7 8 11 12 13 14 15 16 17 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(6667), sin_addr=inet_addr("192.168.178.23")}, msg_iov(1)=[{"\214(\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 47
sendto(16, " X\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 47
select(76, [3 5 6 7 8 11 12 13 14 15 16 17 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(42547), sin_addr=inet_addr("192.168.178.23")}, msg_iov(1)=[{"\6\216\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 47
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 18
fcntl(18, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(18, F_SETFL, O_RDWR|O_NONBLOCK) = 0
bind(18, {sa_family=AF_INET, sin_port=htons(52043), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
sendto(18, "\27\376\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 47
select(76, [3 5 6 7 8 11 12 13 14 15 16 17 18 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(42547), sin_addr=inet_addr("192.168.178.23")}, msg_iov(1)=[{"\6\216\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 47
sendto(18, "\27\376\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 47
select(76, [3 5 6 7 8 11 12 13 14 15 16 17 18 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(42547), sin_addr=inet_addr("192.168.178.23")}, msg_iov(1)=[{"\6\216\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 47
sendto(18, "\27\376\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 47
select(76, [3 5 6 7 8 11 12 13 14 15 16 17 18 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(42547), sin_addr=inet_addr("192.168.178.23")}, msg_iov(1)=[{"\6\216\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 47
sendto(18, "\27\376\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 47
select(76, [3 5 6 7 8 11 12 13 14 15 16 17 18 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(42547), sin_addr=inet_addr("192.168.178.23")}, msg_iov(1)=[{"\6\216\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 47
sendto(18, "\27\376\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 47
select(76, [3 5 6 7 8 11 12 13 14 15 16 17 18 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(42547), sin_addr=inet_addr("192.168.178.23")}, msg_iov(1)=[{"\6\216\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 47
sendto(18, "\27\376\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 47
select(76, [3 5 6 7 8 11 12 13 14 15 16 17 18 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(42547), sin_addr=inet_addr("192.168.178.23")}, msg_iov(1)=[{"\6\216\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 47
sendto(18, "\27\376\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 47
select(76, [3 5 6 7 8 11 12 13 14 15 16 17 18 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(42547), sin_addr=inet_addr("192.168.178.23")}, msg_iov(1)=[{"\6\216\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 47
sendto(18, "\27\376\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 47
select(76, [3 5 6 7 8 11 12 13 14 15 16 17 18 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(44461), sin_addr=inet_addr("192.168.178.23")}, msg_iov(1)=[{"\237\30\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 47
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 19
fcntl(19, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(19, F_SETFL, O_RDWR|O_NONBLOCK) = 0
bind(19, {sa_family=AF_INET, sin_port=htons(58093), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
sendto(19, "SQ\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 47
select(76, [3 5 6 7 8 11 12 13 14 15 16 17 18 19 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(44461), sin_addr=inet_addr("192.168.178.23")}, msg_iov(1)=[{"\237\30\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 47
sendto(19, "SQ\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 47
select(76, [3 5 6 7 8 11 12 13 14 15 16 17 18 19 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(44461), sin_addr=inet_addr("192.168.178.23")}, msg_iov(1)=[{"\237\30\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 47
sendto(19, "SQ\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 47
select(76, [3 5 6 7 8 11 12 13 14 15 16 17 18 19 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(44461), sin_addr=inet_addr("192.168.178.23")}, msg_iov(1)=[{"\237\30\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 47
sendto(19, "SQ\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 47
select(76, [3 5 6 7 8 11 12 13 14 15 16 17 18 19 21 25 27 29 32 37 40 42 44 46 47 49 51 53 56 58 59 60 63 66 67 71 73 75], [], [], NULL) = 1 (in [6])
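The payloads in this trace are DNS queries in wire format, with strace printing non-printable bytes as octal escapes. A minimal Python sketch (the helper `parse_query_prefix` is hypothetical, not part of any tool mentioned here) that decodes the 12-byte header and the visible QNAME labels shows that each received packet and the packet sent on to 127.0.0.1:53 carry the same question under different transaction IDs, consistent with dnsmasq forwarding the same query again and again. strace truncated every payload (the trailing `...`), so only a prefix of the name can be decoded.

```python
import struct


def parse_query_prefix(data: bytes) -> dict:
    """Decode the DNS header and the QNAME labels visible in a possibly truncated packet."""
    if len(data) < 12:
        raise ValueError("need at least the 12-byte DNS header")
    # Header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT, all big-endian 16-bit.
    txid, flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", data[:12])
    labels = []
    pos = 12
    # QNAME is a run of length-prefixed labels ending in a zero byte; stop early
    # if the capture was cut off first. Compression pointers are not handled.
    while pos < len(data) and data[pos] != 0:
        length = data[pos]
        labels.append(data[pos + 1 : pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return {
        "id": txid,
        "recursion_desired": bool(flags & 0x0100),
        "qdcount": qdcount,
        "qname_prefix": ".".join(labels),
    }


# Byte prefixes copied from the first recvmsg/sendto pair in the trace above.
received = b"\214(\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"
forwarded = b" X\1\0\0\1\0\0\0\0\0\0\10accounts\6google\3com"

for label, pkt in (("received", received), ("forwarded", forwarded)):
    info = parse_query_prefix(pkt)
    print(label, hex(info["id"]), info["qname_prefix"], info["recursion_desired"])
```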

Martin Pool (mbp) wrote:

So every time I booted, my machine had several dnsmasq instances running:

mbp@joy% ps wwwax -o pid,ppid,cmd |grep dnsm
 1752 1 dnsmasq -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/lxc/dnsmasq.pid --conf-file= --listen-address 10.0.3.1 --dhcp-range 10.0.3.2,10.0.3.254 --dhcp-lease-max=253 --dhcp-no-override --except-interface=lo --interface=lxcbr0
 1779 1 /usr/sbin/dnsmasq -u libvirt-dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override
 2780 1385 /usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts --bind-interfaces --pid-file=/var/run/sendsigs.omit.d/network-manager.dnsmasq.pid --listen-address=127.0.0.1 --conf-file=/var/run/nm-dns-dnsmasq.conf --cache-size=0 --proxy-dnssec
 3242 3179 grep --color=auto dnsm
mbp@joy% ps 1385
  PID TTY STAT TIME COMMAND
 1385 ? Ssl 0:00 NetworkManager

It looks like 1752 and 2780 were furiously talking to each other while also flooding the network: 1752 was sending to 127.0.0.1:53, and 2780 received each message and forwarded it back to 10.0.3.1 as well as to my external DNS server:

2780 recvmsg(4, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5269), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"\252\305\1\0\0\1\0\0\0\0\0\0\4talk\6google\3com\5fri"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 43
2780 sendto(12, "\270\25\1\0\0\1\0\0\0\0\0\0\4talk\6google\3com\5fri"..., 43, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.3.1")}, 16) = 43
2780 sendto(12, "\270\25\1\0\0\1\0\0\0\0\0\0\4talk\6google\3com\5fri"..., 43, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.178.1")}, 16) = 43
2780 sendto(12, "\270\25\1\0\0\1\0\0\0\0\0\0\4talk\6google\3com\5fri"..., 43, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.3.1")}, 16) = 43
2780 sendto(12, "\270\25\1\0\0\1\0\0\0\0\0\0\4talk\6google\3com\5fri"..., 43, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.178.1")}, 16) = 43
2780 select(19, [3 4 5 6 9 10 11 12 14 15 17 18], [], [], NULL) = 1 (in [4])
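The traces identify peers only by address, so it helps to map each pid to the address it listens on. A minimal Python sketch (the helper `listen_addresses` is hypothetical) that extracts the `--listen-address` values from the `ps` command lines shown earlier, accepting both the `--listen-address X` and `--listen-address=X` spellings that appear there:

```python
import shlex
from typing import List


def listen_addresses(cmdline: str) -> List[str]:
    """Return every --listen-address value found in a dnsmasq command line."""
    args = shlex.split(cmdline)
    found = []
    for i, arg in enumerate(args):
        if arg == "--listen-address" and i + 1 < len(args):
            found.append(args[i + 1])  # "--listen-address 10.0.3.1" spelling
        elif arg.startswith("--listen-address="):
            found.append(arg.split("=", 1)[1])  # "--listen-address=127.0.0.1" spelling
    return found


# Abbreviated copies of the three dnsmasq command lines from the ps output, keyed by pid.
ps_lines = {
    1752: "dnsmasq -u lxc-dnsmasq --bind-interfaces --conf-file= --listen-address 10.0.3.1 --interface=lxcbr0",
    1779: "/usr/sbin/dnsmasq -u libvirt-dnsmasq --bind-interfaces --conf-file= --listen-address 192.168.122.1",
    2780: "/usr/sbin/dnsmasq --no-resolv --bind-interfaces --listen-address=127.0.0.1 --cache-size=0",
}

for pid, cmd in ps_lines.items():
    print(pid, listen_addresses(cmd))
```

With this mapping, 10.0.3.1 belongs to the lxc dnsmasq (1752) and 127.0.0.1 to NetworkManager's dnsmasq (2780), which fits the reading of the traces above.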

summary: "dnsmasq spams out thousands of queries" → "libvirt dnsmasq causes runaway chain reaction"
Martin Pool (mbp) wrote:

mbp@joy% cat /var/run/nm-dns-dnsmasq.conf
server=10.0.3.1
server=192.168.178.1
server=10.0.3.1
server=192.168.178.1
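The generated config lists each upstream twice, which appears to match the trace of 2780 above, where every query goes to both 10.0.3.1 and 192.168.178.1 twice. A minimal Python sketch for spotting such duplicates; it assumes only plain `server=ADDR` lines and does not parse dnsmasq's other `server=` forms (such as `/domain/` prefixes or `#port` suffixes):

```python
from collections import Counter
from typing import List


def server_lines(conf_text: str) -> List[str]:
    """Collect the upstream addresses from plain server=ADDR lines, in file order."""
    servers = []
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("server="):
            servers.append(line.split("=", 1)[1])
    return servers


# Contents of /var/run/nm-dns-dnsmasq.conf as shown above.
conf = """server=10.0.3.1
server=192.168.178.1
server=10.0.3.1
server=192.168.178.1
"""

servers = server_lines(conf)
dupes = {addr: n for addr, n in Counter(servers).items() if n > 1}
print("upstreams:", servers)
print("listed more than once:", dupes)
```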

I think this is an interaction between a few things that are themselves reasonable:

 - network-manager depends on dnsmasq, and makes it the default nameserver in /etc/resolv.conf
 - lxc also sets up a dnsmasq for the guest, forwarding to the host
 - lxc(?) inserts this line into dhclient.conf:
   ./dhcp/dhclient.conf:prepend domain-name-servers 10.0.3.1;
 - so the host ends up querying the guest, which loops back to the host

This might just be something in my own setup, and obviously I can avoid it here, but it looks like it could bite others.

Martin Pool (mbp) wrote:

Removing lxc avoided the problem.

Martin Pool (mbp) wrote:

OK, after discussion on IRC: apparently it's Launchpad's setuplxc that inserts the "prepend domain-name-servers 10.0.3.1;" line into dhclient.conf.

I'm not sure if this problem could still be hit without it.

Martin Pool (mbp) wrote:

See bug 936817 for a broadly similar problem with setuplxc modifying the host OS.

Martin Pool (mbp) wrote:

It's arguably a dnsmasq/lxc problem that this can occur, but I think the most relevant problem is Launchpad's setuplxc changing my dhclient configuration.

I guess this was done so that the host could resolve launchpad.dev and similar names.

But:

 - changing basic networking of the host without warning people seems rude to say the least
 - it screws up my whole network

affects: dnsmasq (Ubuntu) → launchpad
Serge Hallyn (serge-hallyn) wrote: Re: [Bug 956655] Re: libvirt dnsmasq causes runaway chain reaction

Quoting Martin Pool (<email address hidden>):
> - lxc(?) inserts into dhclient.conf
> ./dhcp/dhclient.conf:prepend domain-name-servers 10.0.3.1;
> - so the host ends up querying the guest, which loops back to the host
>
> This might be just something in my own set up, and obviously I can avoid
> it here, but it does look like it could bite others.

Hi Martin,

thanks for looking into this. While the package currently isn't inserting
that dhclient.conf entry, we are, in the lxc server guide, recommending
adding that entry to /etc/resolvconf/resolv.conf.d/head. So if it is
causing a problem, it would be good to figure out why!

(see lp:~serge-hallyn/serverguide/serverguide-lxc, the lxc section under
virtualization)

Changed in launchpad:
status: New → Triaged
importance: Undecided → High
Martin Pool (mbp) wrote:

> So if it is causing a problem, it would be good to figure out why!

I think the situation is:

 * host is configured to query the 'real' dns servers (in my case, my router) plus the lxc guest
 * guest is configured to query the host

So, when the loop occurs, they both query each other back and forth, at the same time as querying the external server as fast as they can. In my case that makes the router fall over.

One question is why this doesn't happen every time. Perhaps the loop terminates if the external DNS server responds fast enough for the host dnsmasq to answer with its response. Either way, it seems like a dangerous setup at the very least.

If this is correct, it would be good to:
- teach dnsmasq never to send a query back to the originating client (does it really do this??), and also
- arrange the servers in an acyclic graph
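The second suggestion, keeping the servers in an acyclic graph, can be checked mechanically. A minimal Python sketch using depth-first search; `find_cycle` is a hypothetical helper, and the graph is written by hand from the observations in this thread (NetworkManager's `server=` lines, and the lxc dnsmasq seen in strace sending to 127.0.0.1:53), not read from any real configuration API:

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one forwarding cycle as a list of nodes, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # Back edge: the cycle is the part of the current path from nxt onwards.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in graph:
        if color[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# Forwarding edges as observed in this report; the router's own upstreams are unknown.
forwarding = {
    "127.0.0.1": ["10.0.3.1", "192.168.178.1"],  # NetworkManager's dnsmasq (2780)
    "10.0.3.1": ["127.0.0.1"],                   # lxc dnsmasq (1752), per its strace
    "192.168.178.1": [],                         # home router
}
print(find_cycle(forwarding))
```

An arrangement in which the lxc dnsmasq forwarded straight to the router instead of back to 127.0.0.1 would make `find_cycle` return `None`.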

Robert Collins (lifeless) wrote:

On Mon, Mar 26, 2012 at 8:35 PM, Martin Pool <email address hidden> wrote:
>>  So if it is causing a problem, it would be good to figure out why!
>
> I think the situation is:
>
>  * host is configured to query the 'real' dns servers (in my case, my router) plus the lxc guest
>  * guest is configured to query the host

It shouldn't ever be configured to talk to the lxc guest; libvirt runs
a dnsmasq on the bridge interface, and every container (and VM) that
is started gets registered with it.

So, perhaps you have something unusual in your libvirt setup that
triggered the code to configure things wrongly.

-Rob

Martin Pool (mbp) wrote:

I think setuplxc configures the host to talk to the guest. If that
never normally happens, perhaps this bug is only against lp.

Robert Collins (lifeless) wrote:

On Tue, Mar 27, 2012 at 8:53 PM, Martin Pool <email address hidden> wrote:
> I think setuplxc configures the host to talk to the guest.  If that
> never normally happens, perhaps this bug is only against lp.

It configures it to talk to the *host side address of the libvirt
bridge*. Or that is the intent. setuplxc is not intended to configure
resolv.conf to talk to guest IP addresses.

-Rob
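Whether a nameserver address is the host side of the bridge or a guest can be read off the lxc dnsmasq command line in the `ps` output above (pid 1752): it listens on 10.0.3.1 and hands out 10.0.3.2 to 10.0.3.254 to guests. A small Python sketch applying that to addresses such as the one in the `prepend domain-name-servers 10.0.3.1;` line; `classify` is a hypothetical helper and 10.0.3.57 is an arbitrary example guest address:

```python
import ipaddress

# Values copied from the lxc dnsmasq command line (pid 1752) in the ps output above.
BRIDGE_HOST_ADDR = ipaddress.ip_address("10.0.3.1")  # --listen-address
GUEST_FIRST = ipaddress.ip_address("10.0.3.2")       # start of --dhcp-range
GUEST_LAST = ipaddress.ip_address("10.0.3.254")      # end of --dhcp-range


def classify(addr: str) -> str:
    """Say whether an address is lxcbr0's host side, a DHCP guest, or neither."""
    ip = ipaddress.ip_address(addr)
    if ip == BRIDGE_HOST_ADDR:
        return "host side of lxcbr0"
    if GUEST_FIRST <= ip <= GUEST_LAST:
        return "lxc guest (DHCP range)"
    return "not on lxcbr0"


for ns in ("10.0.3.1", "10.0.3.57", "192.168.178.1"):
    print(ns, "->", classify(ns))
```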

William Grant (wgrant)
Changed in launchpad:
status: Triaged → Invalid