Comment 15 for bug 1314697

Simon Kelley (simon-thekelleys) wrote : Re: [Bug 1314697] Re: DNS resolution no longer works; dnsmasq uses 100% CPU

Annoyingly, I still can't reproduce this on the systems I have
available. On a system where the problem occurs, can it be reproduced
when dnsmasq is started standalone with the same command-line
parameters? The ideal situation would be to get the bug to show up in a
dnsmasq instance running under gdb.

The strace gives lots of valuable information:

During initialisation in network.c, create_bound_listeners() calls
create_listeners() once. This creates a UDP socket and a TCP socket each
bound to 127.0.1.1:53, with file-descriptors 4 and 5. Those file
descriptors are stored in a struct listener object which will be the
only one in a chain pointed to by daemon->listeners (or
dnsmasq_daemon->listeners, in gdb). The file descriptors are stored in
the ->fd and ->tcpfd fields of the struct.

By the time dnsmasq gets to the select loop in dnsmasq.c, those two
fields have somehow been zeroed - that's enough to exactly match what's
in the strace. dnsmasq selects for read events on fd 0 instead of fd 4
and fd 5, and when select says that reading is OK, it makes the syscalls
it would make for fd 4 (recvfrom) and fd 5 (accept), but on fd 0 instead.
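
To make that failure mode concrete, here is a minimal sketch of how zeroed
fields end up in the select read set. The struct is a cut-down stand-in for
struct listener holding only the two fields discussed above, and
add_listeners() is a hypothetical helper written for illustration, not the
actual code in dnsmasq.c.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Cut-down stand-in for dnsmasq's struct listener: only the two
   descriptor fields and the chain pointer. The real struct has more. */
struct listener {
  int fd;                 /* UDP socket, expected to be 4 in the strace */
  int tcpfd;              /* TCP listening socket, expected to be 5 */
  struct listener *next;
};

/* Hypothetical helper: put every listener's descriptors into the read
   set and return the highest descriptor seen (-1 for an empty chain). */
static int add_listeners(const struct listener *chain, fd_set *rset)
{
  int maxfd = -1;

  for (const struct listener *l = chain; l != NULL; l = l->next) {
    assert(l->fd >= 0 && l->tcpfd >= 0);
    FD_SET(l->fd, rset);
    FD_SET(l->tcpfd, rset);
    if (l->fd > maxfd)
      maxfd = l->fd;
    if (l->tcpfd > maxfd)
      maxfd = l->tcpfd;
  }

  return maxfd;
}
```

With fd = 4 and tcpfd = 5 the set contains descriptors 4 and 5. If both
fields have been zeroed, the same loop sets only descriptor 0, so select
ends up watching stdin and the later recvfrom and accept calls go to fd 0.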

If someone can reproduce this in gdb I suggest doing the following.

1) Set a breakpoint in create_bound_listeners() and trace through until
dnsmasq_daemon->listeners->fd and dnsmasq_daemon->listeners->tcpfd have
the correct values (4 and 5)

2) Set watchpoints on those two expressions, and then continue
execution. Gdb should then tell us where those locations are being
overwritten.
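
For anyone who wants to rehearse the watchpoint step before trying it on
dnsmasq, the following toy sketch has a deliberate overwrite of the same
shape. Everything in it (the global `listeners`, reset_scratch() and
buggy_init()) is made up for the exercise and does not correspond to real
dnsmasq code; the gdb commands in the comment are standard ones.

```c
#include <stddef.h>
#include <string.h>

/* Cut-down stand-in for struct listener, as described above. */
struct listener {
  int fd;
  int tcpfd;
  struct listener *next;
};

static struct listener only = { 4, 5, NULL };
static struct listener *listeners = &only;   /* toy stand-in for daemon->listeners */

/* Hypothetical routine that is meant to clear a scratch buffer. */
static void reset_scratch(void *p, size_t n)
{
  memset(p, 0, n);
}

/* Deliberate bug: the wrong pointer is handed to reset_scratch(), so the
   bytes before `next` (that is, fd and tcpfd) are zeroed. */
static void buggy_init(void)
{
  reset_scratch(listeners, offsetof(struct listener, next));
}

/*
 * Suggested session, after building with -g -O0 and calling buggy_init()
 * from a main() of your own:
 *
 *   (gdb) break buggy_init
 *   (gdb) run
 *   (gdb) print listeners->fd
 *   (gdb) watch listeners->fd
 *   (gdb) watch listeners->tcpfd
 *   (gdb) continue
 *   (gdb) bt
 */
```

When a watchpoint fires, gdb should stop in or just after the memset, and
the backtrace should lead back to buggy_init(). On dnsmasq itself the same
commands would be applied to dnsmasq_daemon->listeners->fd and
dnsmasq_daemon->listeners->tcpfd. Using `watch -l` on those expressions may
be worth trying, since it watches the address the expression currently
refers to rather than re-evaluating the pointer chain.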

Cheers,

Simon.