Comment 12 for bug 1003842

Simon Kelley (simon-thekelleys) wrote : Re: Precise NM with "dns=dnsmasq" breaks systems with non-equivalent upstream nameservers

>Simon Kelley might have written dnsmasq with the assumption that all DNS servers upstream have the same view about the namespace. However, this is not how RFC sees it nor how it is set up in a majority of installations.

Can you provide an authoritative reference for that?

As far as I can see, the "internal" DNS server can give one of five different answers to a query. (There are other possible answers, including delegations, but these are the five that a stub resolver can get when it sets the RD bit in the query.)

1) A valid answer
2) A NODATA answer asserting that the domain exists, but the domain has no information for the type (A, AAAA, MX..) queried.
3) An NXDOMAIN answer asserting that the domain does not exist.
4) No answer.
5) An error return code.

1), 4) and 5) are not a problem: the next step is obvious.

The argument is about what to do in 2) and 3): we can either accept the valid reply that comes from the DNS server, or we can try again with another one. Dnsmasq does the former, and that, I assert, is the correct thing to do. I believe it's what the libc resolver does too.
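To make the five cases concrete, here is a minimal Python sketch of the decision rule described above. It is an illustration only, not dnsmasq's actual code: the names Outcome, FINAL and next_step are made up for this example, and "try-next" for cases 4) and 5) is my reading of what the "obvious" next step is.

```python
from enum import Enum


class Outcome(Enum):
    """The five outcomes a stub resolver can see, numbered as in the list."""
    ANSWER = 1      # 1) a valid answer
    NODATA = 2      # 2) name exists, no records of the queried type
    NXDOMAIN = 3    # 3) name does not exist
    NO_REPLY = 4    # 4) no answer at all (timeout)
    ERROR = 5       # 5) an error return code


# Valid replies: once one of these arrives, the lookup is over.
FINAL = {Outcome.ANSWER, Outcome.NODATA, Outcome.NXDOMAIN}


def next_step(outcome: Outcome) -> str:
    """Return 'accept' if the reply ends the lookup, 'try-next' otherwise."""
    return "accept" if outcome in FINAL else "try-next"


for outcome in Outcome:
    print(f"{outcome.value}) {outcome.name}: {next_step(outcome)}")
```

The point of contention is only whether NODATA and NXDOMAIN belong in FINAL; the sketch puts them there, which is the behaviour argued for above.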

Given the above, the only way to use an "internal" DNS server which knows about local records is to make sure it's always queried first: we can't sensibly send the query to the "external" server and then to the internal server when the external one says "don't know" since THERE IS NO VALID DON'T KNOW ANSWER. My comment about random failures due to UDP packet loss applies here, but if you want dnsmasq to work this way, there's a flag, --strict-order, which will do it.
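A toy simulation shows why the order matters. This is a sketch under stated assumptions, not a model of any real resolver: the zone contents, host names, addresses and the resolve() helper are all hypothetical, and a server that does not reply is modelled simply as None.

```python
# Hypothetical zone data: the internal server knows a local name, the
# external one does not. Both know the public name.
INTERNAL = {"wiki.corp.example": "10.0.0.5", "www.example.org": "192.0.2.10"}
EXTERNAL = {"www.example.org": "192.0.2.10"}


def query(zone, name):
    """A reachable server always gives a valid reply: an address or NXDOMAIN."""
    return zone.get(name, "NXDOMAIN")


def resolve(name, servers):
    """Ask servers in order; the first valid reply is final.

    A server given as None models one that does not reply at all, which is
    the only case in which the next server is consulted.
    """
    for zone in servers:
        if zone is None:
            continue  # no reply: move on to the next server
        return query(zone, name)  # NXDOMAIN is a valid reply, so stop here
    return "TIMEOUT"


print(resolve("wiki.corp.example", [INTERNAL, EXTERNAL]))  # 10.0.0.5
print(resolve("wiki.corp.example", [EXTERNAL, INTERNAL]))  # NXDOMAIN
print(resolve("wiki.corp.example", [None, INTERNAL]))      # 10.0.0.5
```

With the external server first, its NXDOMAIN is accepted and the internal server is never asked. The local name resolves only when the internal server comes first, or when the server ahead of it fails to reply.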

Assume --strict-order. Since we've decided that the only time we're going to use a second nameserver is when the first one doesn't reply, this affects the timeliness of answers: if you always send to the one nameserver first, the only circumstance in which you can use an answer from the second server is after the first one times out. The second server isn't very useful if using it makes all DNS queries take 2-3 seconds. On the other hand, if you arrange that all the servers are equivalent, you can keep a note of which ones are up, or even send the query to all the servers, use the first reply, and discard the rest. Dnsmasq uses both these techniques to improve resilience. If you have very flaky servers, you can even tell it to send every query to all the available servers.
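The latency cost can be put in rough numbers. The following is a back-of-the-envelope model, not a measurement: the 2000 ms timeout is an assumed figure in line with the 2-3 seconds mentioned above, and both function names are invented for the illustration.

```python
TIMEOUT_MS = 2000  # assumed per-server timeout, in milliseconds


def strict_order_latency(reply_times_ms):
    """Servers are tried one at a time, in order.

    reply_times_ms holds each server's reply time, or None if it is down.
    Each server that fails to reply costs a full timeout before the next
    one is tried.
    """
    elapsed = 0
    for t in reply_times_ms:
        if t is not None and t <= TIMEOUT_MS:
            return elapsed + t
        elapsed += TIMEOUT_MS
    return None  # nobody answered


def race_latency(reply_times_ms):
    """Query every server at once and use whichever reply arrives first."""
    alive = [t for t in reply_times_ms if t is not None and t <= TIMEOUT_MS]
    return min(alive) if alive else None


servers = [None, 30]  # first server down, second replies in 30 ms
print(strict_order_latency(servers))  # 2030
print(race_latency(servers))          # 30
```

Racing is only safe when any server's reply is as good as any other's, which is exactly the equivalence assumption under discussion.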

Executive summary: non-equivalent servers are bad, but --strict-order will make things work (for the same value of "work" as the libc resolver). Non-equivalent servers are bad, so don't encourage their use by making --strict-order the default.

HTH

Simon.