Comment 33 for bug 94940

ed_p (edpizzi) wrote :

After some stracing and tcpdumping, it looks like the changed behavior here is that when mdns gets an NXDomain response, it retries up to 5 seconds, then reports a "timeout" to the requesting client, rather than immediately reporting that the record doesn't exist.

Is there a reason why requests that get NXDomain responses are retried? I can't think of a situation where that would be what you'd want, but maybe I'm missing something.

Trace excerpts are below.

Disabling mdns, request is over in 13 ms, and we do not retry (stracing sshd, following forks):

[pid 7728] 02:10:20.474565 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
[pid 7728] 02:10:20.474649 connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("68.87.76.178")}, 28) = 0
[pid 7728] 02:10:20.474745 fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR)
[pid 7728] 02:10:20.474806 fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 7728] 02:10:20.474870 gettimeofday({1199614220, 474899}, NULL) = 0
[pid 7728] 02:10:20.474943 poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
[pid 7728] 02:10:20.475030 send(4, "\214\222\1\0\0\1\0\0\0\0\0\0\0011\0010\003168\003192\7"..., 42, MSG_NOSIGNAL) = 42
[pid 7728] 02:10:20.475158 poll([{fd=4, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
[pid 7728] 02:10:20.488319 ioctl(4, FIONREAD, [42]) = 0
[pid 7728] 02:10:20.488425 recvfrom(4, "\214\222\201\203\0\1\0\0\0\0\0\0\0011\0010\003168\0031"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("68.87.76.178")}, [16]) = 42
[pid 7728] 02:10:20.488640 close(4) = 0

There are no further communications with the dns server in the trace. (It's not real clear here, but the IP being looked up is 192.168.0.1.)

With mdns enabled, we retry several times (stracing avahi-daemon). I've annotated it with shell-style comments, since it's much longer.

# avahi-daemon gets the RESOLVE-ADDRESS command from sshd over its socket
02:23:43.930581 poll([{fd=7, events=POLLIN}, {fd=3, events=POLLIN, revents=POLLIN}, {fd=16, events=POLLIN}, {fd=15, events=POLLIN}, {fd=14, events=POLLIN}, {fd=13, events=POLLIN}, {fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=9, events=POLLIN}], 9, 2196610) = 1 02:23:43.930711 gettimeofday({1199615023, 930740}, NULL) = 0
02:23:43.930778 read(3, "RESOLVE-ADDRESS 192.168.0.1\n", 20480) = 28
# (snip)
# request #1
02:23:44.035615 sendmsg(14, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5353), sin_addr=inet_addr("224.0.0.251")}, msg_iov(1)=[{"\0\0\0\0\0\1\0\0\0\0\0\0\0011\0010\003168\003192\7in-a"..., 42}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 42
# (snip)
02:23:44.036015 poll([{fd=7, events=POLLIN}, {fd=3, events=POLLIN}, {fd=16, events=POLLIN}, {fd=15, events=POLLIN}, {fd=14, events=POLLIN, revents=POLLIN}, {fd=13, events=POLLIN}, {fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=9, events=POLLIN}], 9, 100) = 1
02:23:44.036144 gettimeofday({1199615024, 36173}, NULL) = 0
02:23:44.036212 ioctl(14, FIONREAD, [42]) = 0
02:23:44.036307 recvmsg(14, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5353), sin_addr=inet_addr("192.168.0.50")}, msg_iov(1)=[{"\0\0\0\0\0\1\0\0\0\0\0\0\0011\0010\003168\003192\7in-a"..., 42}], msg_controllen=40, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 42
# (snip)
# request #2, 1 second after first request
02:23:45.039623 sendmsg(14, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5353), sin_addr=inet_addr("224.0.0.251")}, msg_iov(1)=[{"\0\0\0\0\0\1\0\0\0\0\0\0\0011\0010\003168\003192\7in-a"..., 42}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 42
02:23:45.039807 write(8, "W", 1) = 1
02:23:45.039899 read(7, "WW", 10) = 2
02:23:45.039965 gettimeofday({1199615025, 39994}, NULL) = 0
02:23:45.040032 poll([{fd=7, events=POLLIN}, {fd=3, events=POLLIN}, {fd=16, events=POLLIN}, {fd=15, events=POLLIN}, {fd=14, events=POLLIN, revents=POLLIN}, {fd=13, events=POLLIN}, {fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=9, events=POLLIN}], 9, 100) = 1
02:23:45.040162 gettimeofday({1199615025, 40190}, NULL) = 0 02:23:45.040229 ioctl(14, FIONREAD, [42]) = 0
02:23:45.040307 recvmsg(14, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5353), sin_addr=inet_addr("192.168.0.50")}, msg_iov(1)=[{"\0\0\0\0\0\1\0\0\0\0\0\0\0011\0010\003168\003192\7in-a"..., 42}], msg_controllen=40, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 42
# (snip)
# request #2, 3 seconds after first request
02:23:47.043610 sendmsg(14, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5353), sin_addr=inet_addr("224.0.0.251")}, msg_iov(1)=[{"\0\0\0\0\0\1\0\0\0\0\0\0\0011\0010\003168\003192\7in-a"..., 42}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 42
02:23:47.043793 write(8, "W", 1) = 1
02:23:47.043860 read(7, "WW", 10) = 2 02:23:47.043923 gettimeofday({1199615027, 43952}, NULL) = 0
02:23:47.044021 poll([{fd=7, events=POLLIN}, {fd=3, events=POLLIN}, {fd=16, events=POLLIN}, {fd=15, events=POLLIN}, {fd=14, events=POLLIN, revents=POLLIN}, {fd=13, events=POLLIN}, {fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=9, events=POLLIN}], 9, 100) = 1
02:23:47.044153 gettimeofday({1199615027, 44183}, NULL) = 0 02:23:47.044221 ioctl(14, FIONREAD, [42]) = 0
02:23:47.044299 recvmsg(14, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5353), sin_addr=inet_addr("192.168.0.50")}, msg_iov(1)=[{"\0\0\0\0\0\1\0\0\0\0\0\0\0011\0010\003168\003192\7in-a"..., 42}], msg_controllen=40, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 42
# (snip)
# Finally (after 2 more seconds of waiting, we tell sshd that we got a
# "timeout". This seems wrong -- we got almost immediate responses
# from the DNS server, so "timeout" is not a reasonable response here.
# The IP in question is fundamentally unresolveable to a domain name.
# Note that the IP here was 192.168.0.1 02:23:48.931682 gettimeofday({1199615028, 931711}, NULL) = 0
02:23:48.931748 poll([{fd=7, events=POLLIN}, {fd=3, events=POLLOUT, revents=POLLOUT}, {fd=16, events=POLLIN}, {fd=15, events=POLLIN}, {fd=14, events=POLLIN}, {fd=13, events=POLLIN}, {fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=9, events=POLLIN}], 9, 2009) = 1
02:23:48.931877 gettimeofday({1199615028, 931906}, NULL) = 0
02:23:48.931944 write(3, "-15 Timeout reached\n", 20) = 20
02:23:48.935147 write(8, "W", 1) = 1
02:23:48.935590 close(3) = 0