dnsmasq sometimes fails to resolve private names in networks with non-equivalent nameservers

Reported by Thomas Hood on 2012-05-24
This bug affects 37 people
Affects | Status | Importance | Assigned to | Milestone
dnsmasq (Debian) | New | Unknown | |
dnsmasq (Ubuntu) | | Medium | Unassigned |
  Precise | | Medium | Unassigned |
network-manager (Ubuntu) | | Medium | Mathieu Trudel-Lapierre |
  Precise | | Medium | Mathieu Trudel-Lapierre |

Bug Description

A number of reports already filed against network-manager seem to reflect this problem, but to make things very clear I am opening a new report. Where appropriate I will mark other reports as duplicates of this one.

Consider a pre-Precise system with the following /etc/resolv.conf:

    nameserver 192.168.0.1
    nameserver 8.8.8.8

The first address is the address of a nameserver on the LAN that can resolve both private and public domain names. The second address is the address of a nameserver on the Internet that can resolve only public names.

This setup works fine because the GNU resolver always tries the first-listed address first.

Now the administrator upgrades to Precise and instead of writing the above to resolv.conf, NetworkManager writes

    server=192.168.0.1
    server=8.8.8.8

to /var/run/nm-dns-dnsmasq.conf and "nameserver 127.0.0.1" to resolv.conf. Resolution of private domain names is now broken because dnsmasq treats the two upstream nameservers as equals and uses the faster one, which could be 8.8.8.8.
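
The difference between the two behaviours can be illustrated with a toy model. This is only a sketch, not glibc or dnsmasq code: the `FakeServer` class, the latencies, the name `intranet.example` and the addresses in the record tables are all made up for illustration, and both servers are assumed to be reachable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeServer:
    """Hypothetical stand-in for an upstream nameserver."""
    address: str
    latency_ms: float
    records: Dict[str, str] = field(default_factory=dict)

    def lookup(self, name: str) -> Optional[str]:
        # None models a negative answer (NXDOMAIN).
        return self.records.get(name)


def resolve_in_order(servers: List[FakeServer], name: str) -> Optional[str]:
    """Ask the first-listed server; it is reachable here, so its answer is final."""
    return servers[0].lookup(name)


def resolve_fastest(servers: List[FakeServer], name: str) -> Optional[str]:
    """Treat all servers as equals and use whichever answers first."""
    fastest = min(servers, key=lambda s: s.latency_ms)
    return fastest.lookup(name)


PUBLIC = {"ubuntu.com": "203.0.113.10"}  # made-up documentation-range address
lan = FakeServer("192.168.0.1", 20.0, {**PUBLIC, "intranet.example": "192.168.0.50"})
wan = FakeServer("8.8.8.8", 5.0, dict(PUBLIC))
servers = [lan, wan]

print(resolve_in_order(servers, "intranet.example"))  # 192.168.0.50
print(resolve_fastest(servers, "intranet.example"))   # None
```

In this model the private name resolves only when the first-listed server is consulted; public names resolve either way, which is why the breakage is easy to miss.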

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in network-manager (Ubuntu):
status: New → Confirmed
Thomas Hood (jdthood) on 2012-05-24
summary: - Enabling dnsmasq by default breaks systems with non-equivalent upstream
- nameservers
+ Upgrading to Precise NM with "dns=dnsmasq" breaks systems with non-
+ equivalent upstream nameservers
Scott Moser (smoser) wrote :

I think the most common case for this is a VPN, as likely after you've VPN'd in somewhere, those DNS servers have additional (local) results that possibly even differ from external results. The other case is described in bug 993794. Although, to be honest, I'm not really sure what the benefit is of DHCP servers on the same network giving 2 DNS servers with different information available. I'm not exactly sure what expected behavior would be there.

There is upstream discussion on this at http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2009q3/003295.html .

One potential solution for this is to use:
 --server=/example.com/1.2.3.4
which would send all dns lookups for 'example.com' to 1.2.3.4.

Note also per /usr/share/doc/dnsmasq-base/examples/dnsmasq.conf.example:
  # Example of routing PTR queries to nameservers: this will send all
  # address->name queries for 192.168.3/24 to nameserver 10.1.2.3
  #server=/3.168.192.in-addr.arpa/10.1.2.3
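
To make the routing rule concrete, here is a rough Python sketch of how `server=` lines of this shape could be interpreted. It is not dnsmasq's parser: the helper names `parse_server_lines` and `pick_upstreams` are hypothetical, only the single-domain form `/domain/address` is handled, and longest-suffix matching is an assumption made for the illustration.

```python
from typing import Dict, List, Tuple


def parse_server_lines(lines: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split 'server=' lines into per-domain routes and default upstreams."""
    routes: Dict[str, str] = {}
    defaults: List[str] = []
    for line in lines:
        line = line.strip()
        if not line.startswith("server="):
            continue  # skips comments and unrelated options
        value = line[len("server="):]
        if value.startswith("/"):
            # Form handled here: /domain/address
            _, domain, address = value.split("/", 2)
            routes[domain.lower()] = address
        else:
            defaults.append(value)
    return routes, defaults


def pick_upstreams(name: str, routes: Dict[str, str], defaults: List[str]) -> List[str]:
    """Return the server for the longest matching domain, else the defaults."""
    name = name.lower().rstrip(".")
    best = None
    for domain in routes:
        if name == domain or name.endswith("." + domain):
            if best is None or len(domain) > len(best):
                best = domain
    return [routes[best]] if best is not None else list(defaults)


config = [
    "server=/example.com/1.2.3.4",
    "server=/3.168.192.in-addr.arpa/10.1.2.3",
    "server=8.8.8.8",
]
routes, defaults = parse_server_lines(config)
print(pick_upstreams("host.example.com", routes, defaults))
print(pick_upstreams("7.3.168.192.in-addr.arpa", routes, defaults))
print(pick_upstreams("ubuntu.com", routes, defaults))
```

Names under example.com and PTR names for 192.168.3/24 go only to their dedicated servers; everything else goes to the default list.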

At one point in the past I had a solution for using resolvconf to manage dnsmasq on connection to a vpn using vpnc. I've described that at [2]. I'm not sure whether the code there still works or not. Perhaps a similar approach could be used by network manager.

--
[1] http://seife.kernalert.de/blog/2010/06/22/nifty-dnsmasq-trick-reverse-lookup-using-a-specific-server/
[2] http://smoser.brickies.net/git/?p=att-resolvconf.git;a=blob;f=README;h=f2eff389131f46d8bf7b6b805f4395d89187cd1d;hb=HEAD

Thomas Hood (jdthood) wrote :

> I'm not really sure what the benefit of dhcp servers on
> the same network giving 2 dns servers with different
> information available.
> I'm not exactly sure what expected behavior would be there.

It's not the best way to configure DNS on a network. However, Ubuntu users don't always have control over the networks to which they want to connect.

Apparently Windows and Ubuntu before Precise behave well under the circumstances in question, at least in the sense that they can always resolve names.

Thomas Hood (jdthood) on 2012-05-25
summary: - Upgrading to Precise NM with "dns=dnsmasq" breaks systems with non-
- equivalent upstream nameservers
+ Precise NM with "dns=dnsmasq" breaks systems with non-equivalent
+ upstream nameservers

In the past it has been noticed that dnsmasq does not try the nameservers one after the other as some resolver libraries do (including the GNU libc resolver(3)). People have asked if dnsmasq can be enhanced to exhibit the one-after-the-other behavior. But dnsmasq's author, Simon Kelley, writes (http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2011q2/005060.html):
> [T]he idea of searching a set of servers in a particular order is problematic.
>
> Assume you have two servers, one of which knows about some domains
> but the other does not. You query the "special" server first so that it can
> tell you about those domains. But DNS uses UDP, which is an unreliable
> transport, so at random, the queries to the special server might get
> lost, and then the queries will get answered from the second server, and
> randomly your extra domains get lost. Good luck diagnosing the problem.

This critique pertains to the aforementioned resolver libraries, too, of course.

From this we can infer that the networks with non-equivalent nameservers are badly configured.

Simon Kelley continues:
> Dnsmasq is written with the strong assumption that all "normal" upstream
> servers have the same view of the DNS. You can redirect queries for some
> domains to other servers like this
>
> server=/example.com/1.2.3.4
>
> and *.example.com will go to the special server and only the special
> server

He explains further at http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2009q3/003295.html

Given that such misconfigured networks exist, however, how should Ubuntu help users to deal with them?

* Should "dns=dnsmasq" be optional, not the default?
* Should there be an easy way of disabling "dns=dnsmasq"?
* Would it be possible for Ubuntu automatically to detect nonhomogeneous sets of nameservers and to turn off "dns=dnsmasq" in the event that such a set is detected?

Sergio Callegari (callegar) wrote :

1) Searching servers in order is IMHO not as problematic as the author of dnsmasq suggests. If a UDP packet gets lost, a name may fail to resolve because the resolver switches to the next nameserver. Yet it is sufficient to retry the operation to have a good chance of success, which is exactly the behavior that you get with the libc resolver.

2) An alternative would be not to search sequentially, but to keep asking the other nameservers if the first one that answers fails to resolve the name.

pdf (pdffs) wrote :

@callegar - that's all well and good, and I agree, but this is unlikely to get solved in dnsmasq (though that would be ideal).

@jdthood:
1. Would have been the sane option for an LTS release (and server installs should use traditional resolv.conf model, if that's not the case)
2. Well, commenting the line is not too bad, except that other resolvconf bugs mean that doing so actually results in no name resolution at all
3. I'd suggest that's very hard to impossible

Wolf Rogner (war-rsb) wrote :

Simon Kelley might have written dnsmasq with the assumption that all DNS servers upstream have the same view about the namespace. However, this is not how RFC sees it nor how it is set up in a majority of installations.

Consider a small installation where the main server also serves DHCP address leases and has to maintain DNS names. All names end in an internal domain name domain.intern (like MS SBS not only recommends but enforces). The DNS server is set up to forward unresolvable requests to the upstream DNS server. Clearly the upstream DNS has no clue about domain.intern addresses.

Why not set up DNS to use the internal server to serve all requests and forward what cannot be resolved? Quite simple: speed and resilience.

If the internal server cannot answer the DNS request, the client redirects the query to the next server. This eliminates the internal server as a resolution bottleneck and allows the clients to continue with a basic set of functionality in case of a server outage (planned or not).

So now you easily have two separate DNS servers, one dealing with internal requests and one with external requests.

In our case, the router binds DNS so that it can forward DNS requests. This gives us extra resilience in case of a DNS server outage with our main provider. We use the router to forward to a different DNS server.

Server down -> No mail, no fileservices, no printing services, no database BUT Internet access still works
Router down -> OK, we have a problem
Upstream DNS down -> No problem at all

Also our internal DNS server serves requests to our own external domain as well as some others. So it definitely does not have the same view as upstream DNS. None of this violates RFC definitions.

@callegar: I agree with you: If dnsmasq handles DNS resolution it MUST resolve the issue. That means either Kelley adapts his position or dnsmasq has to go.

@jdthood: As a user I wasn't asking for dnsmasq. It was chosen to improve DNS resolution, which it does not. In an LTS release this is pretty hard (it is pretty hard in any release). Good design not only suggests but requires that a technology replacing its predecessor provide a fallback in case of error.

Commenting out dns=dnsmasq in /etc/NetworkManager/NetworkManager.conf is a workaround but certainly not a solution.
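
For readers who want to script that workaround, the edit amounts to prefixing the `dns=dnsmasq` line with `#`. The sketch below works on the file contents as a string; the helper name `comment_out_dnsmasq` and the sample file contents are illustrative assumptions, and NetworkManager would presumably need to be restarted for the change to take effect.

```python
def comment_out_dnsmasq(conf_text: str) -> str:
    """Return conf_text with any active 'dns=dnsmasq' line commented out."""
    out = []
    for line in conf_text.splitlines():
        # Ignore surrounding and embedded spaces when matching the option.
        if line.strip().replace(" ", "") == "dns=dnsmasq":
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Illustrative contents only; a real NetworkManager.conf may differ.
sample = "[main]\nplugins=ifupdown,keyfile\ndns=dnsmasq\n"
print(comment_out_dnsmasq(sample), end="")
```

Running the function twice is harmless, because an already commented line no longer matches.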

Finding an automatism to resolve different DNS resolution paths would be the responsibility of the programmer in my (simplified) view.

Thomas Hood (jdthood) wrote :

Sergio in #6:
> Yet, it is sufficient to retry the operation to have a good chance of success.

How does an application know that it should retry? It has received an authoritative answer on the first try that the name does not exist.

> An alternative would be not to search sequentially, but to keep asking the other nameservers

You mean send duplicate requests out every single time?

Thomas Hood (jdthood) wrote :

Pdffs in #7:
> Well, commenting the line is not too bad, except that other resolvconf bugs mean that doing so actually results in no name resolution at all

I haven't seen any such bug reported against resolvconf. What are you talking about?

pdf (pdffs) wrote :

@jdthood - I mentioned that resolv.conf was missing in #1000244, and since it wasn't being populated correctly, so if it's not being populated, and you disable dnsmasq, that would result in no resolution...

> Simon Kelley might have written dnsmasq with the assumption that all DNS servers upstream have the same view about the
> namespace. However, this is not how RFC sees it nor how it is set up in a majority of installations.

Can you provide an authoritative reference for that?

As far as I can see, the "internal" DNS server can provide one of five different answers to a query (there are other possible answers, including delegations, but these are the five possible ones to a stub resolver which sets the RD bit in the query):

1) A valid answer
2) A NODATA answer asserting that the domain exists, but the domain has no information for the type (A, AAAA, MX..) queried.
3) An NXDOMAIN answer asserting that the domain does not exist.
4) No answer.
5) An error return code.

1), 4) and 5) are not a problem; the next step is obvious.

The argument is about what to do in 2) and 3): we can either accept the valid reply that comes from the DNS server or we can try again with another one. Dnsmasq does the former, and that, I assert, is the correct thing to do. I believe it's what the libc resolver does too.
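
The five cases and the position taken on them can be summarised in a small decision function. This is only an illustrative sketch: the `Outcome` names are hypothetical labels for the numbered cases, and mapping both "no answer" and "error" to "try another server" is an assumption about what the "obvious" next step is.

```python
from enum import Enum, auto


class Outcome(Enum):
    ANSWER = auto()    # 1) a valid answer
    NODATA = auto()    # 2) name exists, no record of the queried type
    NXDOMAIN = auto()  # 3) name does not exist
    NO_REPLY = auto()  # 4) no answer (timeout)
    ERROR = auto()     # 5) an error return code


def next_step(outcome: Outcome) -> str:
    """What a forwarder does next, following the policy described above."""
    if outcome is Outcome.ANSWER:
        return "return answer"
    if outcome in (Outcome.NODATA, Outcome.NXDOMAIN):
        # A negative reply is still a valid reply, so it is accepted as final.
        return "return negative answer"
    # Assumed: a timeout or an error means another server gets the query.
    return "try another server"


for outcome in Outcome:
    print(outcome.name, "->", next_step(outcome))
```

The disagreement in this thread is entirely about the middle branch: whether NODATA and NXDOMAIN from one server should end the lookup.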

Given the above, the only way to use an "internal" DNS server which knows about local records is to make sure it's always queried first: we can't sensibly send the query to the "external" server and then to the internal server when the external one says "don't know" since THERE IS NO VALID DON'T KNOW ANSWER. My comment about random failures due to UDP packet loss applies here, but if you want dnsmasq to work this way, there's a flag, --strict-order, which will do it.

Assume --strict-order. Since we've decided that the only time we're going to use a second nameserver is when the first one doesn't reply, this affects the timeliness of answers: if you always send to the one nameserver first, the only circumstance in which you can use an answer from the second server is after the first one times out. The second server isn't very useful if using it makes all DNS queries take 2-3 seconds. On the other hand, if you arrange that all the servers are equivalent, you can keep a note of which ones are up, or even send the query to all the servers, use the first reply, and discard the rest. Dnsmasq uses both these techniques to improve resilience. If you have very flaky servers, you can even tell it to send every query to all the available servers.
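
A back-of-the-envelope model shows the cost being described. The functions below are a sketch, not a measurement of dnsmasq: round-trip times are in seconds, `None` stands for a server that never replies, and the 2-second timeout is an assumed value in the range mentioned above.

```python
from typing import List, Optional


def strict_order_latency(rtts: List[Optional[float]], timeout: float) -> Optional[float]:
    """Time until the first reply when servers are tried one at a time, in order."""
    elapsed = 0.0
    for rtt in rtts:
        if rtt is not None and rtt <= timeout:
            return elapsed + rtt
        elapsed += timeout  # wait out the full timeout before moving on
    return None


def parallel_latency(rtts: List[Optional[float]], timeout: float) -> Optional[float]:
    """Time until the first reply when every server is asked at once."""
    replies = [r for r in rtts if r is not None and r <= timeout]
    return min(replies) if replies else None


rtts = [None, 0.25]  # first server down, second answers in 250 ms
print(strict_order_latency(rtts, timeout=2.0))  # 2.25
print(parallel_latency(rtts, timeout=2.0))      # 0.25
```

With the first server down, every strict-order query pays the full timeout before the second server is consulted, while the parallel case only pays the fastest round trip. The parallel case is only safe if the servers are equivalent.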

Executive summary: non-equivalent servers are bad, but --strict-order will make things work (for the same value of "work" as the libc resolver). Non-equivalent servers are bad, so don't encourage their use by making --strict-order the default.

HTH

Simon.

On 27/05/12 17:58, Simon Kelley wrote:
> Executive summary: non-equivalent servers are bad, but --strict-order
> will make things work, for the same value of "work" as the libc
> resolver). Non-equivalent servers are bad, so don't encourage their
> use by making --strict-order the default.

To be frank, when changing the default system resolver, expected
behavior should be the default. It's all well and good saying that
non-equivalent resolvers are 'bad' - and in the case of dnsmasq, that
might be true - but that's a value judgement that shouldn't have a place
in this scenario, since users haven't made the choice to enable dnsmasq,
and so shouldn't have to be aware of the caveats (ie - "My DNS worked
fine before upgrade").

> To be frank, when changing the default system resolver, expected
> behavior should be the default. It's all well and good saying that
> non-equivalent resolvers are 'bad' - and in the case of dnsmasq, that
> might be true - but that's a value judgement that shouldn't have a place
> in this scenario, since users haven't made the choice to enable dnsmasq,
> and so shouldn't have to be aware of the caveats (ie - "My DNS worked
> fine before upgrade").

It's a delicate trade-off, made all the more ironic by the fact that without dnsmasq, a second DNS server is practically useless (because you have to wait for the time-out on the first for each and every query). The easiest solution is just to delete the second nameserver.

All I can say is that dnsmasq provides both modes, and the default is what it is for the reasons I outlined. If Ubuntu as a distro wants to flip the default, they can. My personal opinion is that it would be a mistake, but that's just my opinion.

Question: does networkmanager's GUI expose the option to divert particular domains to a special nameserver? That's an alternative correct way to achieve layering local names over the global DNS.

Cheers,

Simon.

Sergio Callegari (callegar) wrote :

To Thomas in #9

>> Yet, it is sufficient to retry the operation to have a good chance of success.

> How does an application know that it should retry? It has received an authoritative answer on the first try that the name does not exist.

Most things that are critical to operation get retried, even automatically, no matter how authoritative the answer. It only depends on the trade-off. If the alternative is to stay off the network altogether, I guess the operation will sooner or later be retried.
With sequential action, a retry may well succeed. With the current dnsmasq behavior, a retry will likely not succeed, and if you are behind an authorization portal you are going to stay off the network forever. Anyway, this is the behavior that all the world has had with the libc resolver up to yesterday, so it shouldn't look that weird.

The alternative appears to me as saying that the way in which nameservers are used together with intranets is completely broken.

>> An alternative would be not to search sequentially, but to keep asking the other nameservers

> You mean send duplicate requests out every single time?

From what I understand, dnsmasq already asks all the servers concurrently. Meaning that if you have 3 nameservers, triple requests are sent. However, dnsmasq listens to the first answer only. While I suggest that if the first answer is "no, I do not know this host", some time is given to see if other answers come in too. Which may mean waiting a little, but not every single time. Just when you receive an answer like "I do not know this host" or when a name is really unknown to anybody. In common scenarios, maybe 3% of the time. Seems a tolerable price to pay for accessing nameservers in a non sequential way.

Sergio Callegari (callegar) wrote :

To simon, in #14.

> Question: does networkmanager's GUI expose the option to divert particular domains to a special nameserver? That's an
> alternative correct way to achieve layering local names over the global DNS.

IMHO, this cannot be used to solve all the current problems. The issue came up because many, many network access authorization portals are intranet hosts which can be resolved only by the intranet nameserver. There are tons of these around the world and everybody who goes around with a laptop encounters them all the time. When you get to an authorization portal, you may not know what the domain is. You only know what DHCP tells you.

So, either you implement some smart heuristics (e.g., if the first nameserver is on an intranet IP address, try that sequentially), or you stop trusting an authoritative "I do not know that host" and require a confirmation, or you use the sequential access that is typical of the resolv library and of MS Windows. In the context of authorization portals, non-equivalent servers exist and we have little power to change this as long as they work with MS Windows and the Mac, so IMHO we need to work around them.

It is unfortunate, but having a "sane" or "insane" behavior in some Linux distro will anyway have negligible effect on the spread of non-equivalent servers. It will not discourage or encourage the use of non-equivalent servers, as most people will anyway take the OSes that have majority market share as the de facto standard and only test against those.

Thomas Hood (jdthood) wrote :

When a decision was made to introduce dnsmasq I doubt that anyone fully realized that this would impair name resolution on systems connected to networks with nonequivalent nameservers ("bad" networks). Dnsmasq was introduced and works well most of the time. For those for whom it does not work well solutions need to be found.

Ideal would be automagical detection of and adaptation to bad networks by dnsmasq. It might work like this. On encountering NODATA or NXDOMAIN, dnsmasq reiterates the query to all nameservers listed earlier than the one that answered. If one of those nameservers returns an address then dnsmasq uses that answer and switches to strict-order mode until the next change in the nameserver address list.
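
A rough Python sketch of this proposal follows. It is not dnsmasq code and nothing like it is known to exist there: the `AdaptiveForwarder` class is hypothetical, each upstream is modelled as a function returning an address or `None` for NODATA/NXDOMAIN, the caller says which server's reply arrived first, and the first server is assumed reachable once strict-order mode is on.

```python
from typing import Callable, List, Optional

# An upstream returns an address, or None for NODATA/NXDOMAIN.
Upstream = Callable[[str], Optional[str]]


class AdaptiveForwarder:
    def __init__(self, upstreams: List[Upstream]):
        self.set_upstreams(upstreams)

    def set_upstreams(self, upstreams: List[Upstream]) -> None:
        """A change in the nameserver list resets the mode."""
        self.upstreams = upstreams
        self.strict_order = False

    def resolve(self, name: str, answered_by: int) -> Optional[str]:
        """answered_by is the index of the server whose reply came first."""
        if self.strict_order:
            return self.upstreams[0](name)
        reply = self.upstreams[answered_by](name)
        if reply is not None:
            return reply
        # Negative reply: re-ask every server listed earlier than the responder.
        for earlier in self.upstreams[:answered_by]:
            address = earlier(name)
            if address is not None:
                self.strict_order = True
                return address
        return None


lan = {"intranet.example": "192.168.0.50"}.get  # made-up private record
wan = {}.get                                    # knows no private names
fwd = AdaptiveForwarder([lan, wan])
print(fwd.resolve("intranet.example", answered_by=1), fwd.strict_order)
```

In this model the private name is resolved on the first attempt even though the public server answered first, and the forwarder stays in strict-order mode until `set_upstreams` is called again.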

And NetworkManager should, as Simon indicates, offer a way to restrict certain domain lookups to certain nameservers. A user on a bad network who has configured this correctly will avoid triggering dnsmasq's (global) strict-order fallback behavior.
--
Thomas

Thomas in #17

A heuristic for this is difficult, because you have to prove a negative. If we can assume the first nameserver has local addresses, we can never return a reply from any other nameserver until we have the reply from the first one, in case the first one has different data. Once we see different data from different nameservers, we can go to --strict-order mode, but the opposite is not true: the same answer for a particular query doesn't guarantee that the answers to future queries will always agree. There's no way to be sure that the nameservers are equivalent based on the history of returned queries. Unless we can assume that, we always need to wait for the first nameserver to reply (or a timeout) and have to stay in --strict-order mode forever.

There is one possibility, which is to assume that nameservers are equivalent, but switch to --strict-order mode if conflicting replies are seen. When a query is forwarded to all available servers, and the first reply sent back to the original requestor, keep the record of the reply (at least, a bit indicating NODATA/NXDOMAIN or a valid reply). If another reply comes in later from another nameserver which conflicts, then switch to --strict-order mode. This will not get the first queries right, but it will be triggered eventually (and it might be triggered, switching mode forever, by random server glitches).
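
As a sketch of the bookkeeping this would need (again hypothetical, not existing dnsmasq code), the class below remembers one bit per query name, namely whether the first reply was negative, and flips a mode flag when a later reply disagrees. The class and method names are made up for the illustration.

```python
from typing import Dict, Optional


class ConflictDetector:
    def __init__(self) -> None:
        self.first_was_negative: Dict[str, bool] = {}  # query name -> bit
        self.strict_order = False

    def on_reply(self, name: str, address: Optional[str]) -> bool:
        """Record a reply; return True if it should go back to the client.

        address is None for NODATA/NXDOMAIN.
        """
        negative = address is None
        if name not in self.first_was_negative:
            self.first_was_negative[name] = negative
            return True  # the first reply is forwarded immediately
        if self.first_was_negative[name] != negative:
            self.strict_order = True  # conflicting views of the DNS detected
        return False  # later replies are only compared, never forwarded


detector = ConflictDetector()
print(detector.on_reply("intranet.example", None))            # True
print(detector.on_reply("intranet.example", "192.168.0.50"))  # False
print(detector.strict_order)                                   # True
```

The example reproduces the weakness noted above: the client has already received the negative answer by the time the conflict is noticed.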

For a single-host cache, --strict-order might be the simplest fix......

Simon.

My idea for a heuristic was indeed extremely simple. In case the first name server has a non public ip address, auto switch to strict order.

On 27 May 2012, at 23:04, Simon Kelley <email address hidden> wrote:

> Thomas in #17
>
> A heuristic for this is difficult, because you have to prove a negative.
> If we can assume the first nameserver has local addresses, we can never
> return a reply from any other nameserver until we have the reply from
> the first one, in case the first one has different data. Once we see
> different data from different nameservers, we can go to --strict-order
> mode, but the opposite is not true: the same answer for a particular
> query doesn't guarantee that the answers to future queries will always
> agree. There's no way to be sure that the nameservers are equivalent
> based on the history of returned queries. Unless we can assume that, we
> always need to wait for the first nameserver to reply (or a timeout) and
> have to stay in --strict-order mode forever.
>
> There is one possibility, which is to assume that nameservers are
> equivalent, but switch to --strict-order mode if conflicting replies are
> seen. When a query is forwarded to all available servers, and the first
> reply sent back to the original requestor, keep the record of the reply
> (at least, a bit indicating NODATA/NXDOMAIN or a valid reply). If another
> reply comes in later from another nameserver which conflicts, then
> switch to --strict-order mode. This will not get the first queries
> right, but it will be triggered eventually (and it might be triggered,
> switching mode forever, by random server glitches)
>
> For a single-host cache, --strict-order might be the simplest fix......
>
> Simon.

pdf (pdffs) wrote :

On 28/05/12 08:05, Sergio Callegari wrote:
> My idea for a heuristic was indeed extremely simple. In case the first
> name server has a non public ip address, auto switch to strict order.

That may work in many scenarios, but not if the address happens to be
routable, and only until IPv6 is prevalent.

Simon in #18:

> Once we see different data from different nameservers,
> we can go to --strict-order mode, but the opposite is not
> true: the same answer for a particular query doesn't
> guarantee that the answers to future queries will always agree.
> There's no way to be sure that the nameservers are equivalent
> based on the history of returned queries. Unless we can assume
> that, we always need to wait for the first nameserver to reply
> (or a timeout) and have to stay in --strict-order mode forever.

Yes, but it's not so bad to stay in strict-order mode forever^Wuntil the list of nameserver addresses changes. The admin can take action to prevent dnsmasq from entering that mode, e.g., by configuring dnsmasq to direct certain lookups (e.g., of *.internal) to the appropriate nameservers.

> There is one possibility, which is to assume that nameservers
> are equivalent, but switch to --strict-order mode if conflicting
> replies are seen. When a query is forwarded to all available
> servers, and the first reply sent back to the original requestor,
> keep the record of the reply (at least, a bit indicating
> NODATA/NXDOMAIN or a valid reply). If another reply comes
> in later from another nameserver which conflicts, then switch
> to --strict-order mode.

Simon, your suggestion (call it "#18") differs from the suggestion in #17 in two ways. First, #18 sends the first-received reply back to the client without waiting for the results of comparison with other results whereas #17 does wait. Second, #18 switches to strict-order mode when *any* difference is found, whereas #17 proposed only looking for a particular pattern, that being: a NODATA/NXDOMAIN is received from a nameserver that is not listed first and an earlier-listed nameserver does return an address within the standard libc timeout period. In #17's defence... in #17 the client only has to wait for a reply in the case of a NODATA/NXDOMAIN from a non-first nameserver; the client does get the desired address from the earlier-listed nameserver if there is one --- even the first time; and dnsmasq only drops into strict-order mode under the circumstances when it is necessary for it to do so such that clients get needed addresses. There is no point, for example, in dropping into strict-order mode if it's the first nameserver returning NXDOMAIN and a later-listed nameserver returning an address!

What do you think about the possibility of implementing such ideas?

Thomas Hood (jdthood) wrote :

Pdf in #7:
> Well, commenting the line is not too bad, except that
> other resolvconf bugs mean that doing so actually
> results in no name resolution at all

Pdf in #11:
> I mentioned that resolv.conf was missing in #1000244,
> and since it wasn't being populated correctly, so if it's
> not being populated, and you disable dnsmasq, that
> would result in no resolution...

We aren't getting duplicates of #1000244 so it's probably not a frequently occurring problem. If resolvconf were systematically failing to create the resolv.conf symlink we'd be getting hundreds of reports about it. Based on what we know now it's most probable that #1000244 was a result of some sort of administrator error.

The version of resolvconf that was included in Precise wasn't perfect but the bugs were fairly minor and the known ones have since been fixed.

On 29/05/12 07:06, Thomas Hood wrote:
> We aren't getting duplicates of #1000244 so it's probably not a
> frequently occurring problem. If resolvconf were systematically
> failing to create the resolv.conf symlink we'd be getting hundreds of
> reports about it. Based on what we know now it's most probable that
> #1000244 was a result of some sort of administrator error. The version
> of resolvconf that was included in Precise wasn't perfect but the bugs
> were fairly minor and the known ones have since been fixed.

I'm certain that it was not an 'administration error' - I discovered
that my networking was broken within an hour of clean install. Others
at my office have complained of search domains not working, which smells
of resolvconf being broken (my understanding is that resolvconf is
responsible for populating search domains in this new resolution chain).

I also don't know how you can release a system resolver implementation
that "wasn't perfect", or even close to it (when the existing one wasn't
broken and the new implementation adds serious complexity for
questionable gain), particularly in an LTS release, but suggesting that
'the users broke it' isn't going to fix things.

In any case, I'm still subscribed to the other bug, so happy for any
continued discussion of that particular issue to happen there.

pdf: It sounds to me as if you are using the wrong GNU/Linux distribution. You demand a perfect one, whereas Ubuntu is not perfect. Perhaps you should switch to one of the perfect distributions?

If "search domains are not working" for your colleagues then please ask them to look in /etc/resolv.conf to see if the "search" line in there is correct. If it is not correct then please file an informative bug report against resolvconf. If it is correct then please file an informative bug report against network-manager (if that's what they are using).

Regarding issue #1000244, let's carry on talking about that in #1000244.

This report (#1003842) is about dnsmasq and the problem of its not being suitable for networks with nonequivalent nameservers. This is, in my opinion, an important problem. I look forward to further constructive discussion of ways to solve it.

> Simon, your suggestion (call it "#18") differs from the suggestion in #17 in two ways. First, #18 sends the first-received reply back
> to the client without waiting for the results of comparison with other results whereas #17 does wait. Second, #18 switches to
> strict-order mode when *any* difference is found, whereas #17 proposed only looking for a particular pattern, that being: a
> NODATA/NXDOMAIN is received from a nameserver that is not listed first and an earlier-listed nameserver does return an address
> within the standard libc timeout period. In #17's defence... in #17 the client only has to wait for a reply in the case of a
> NODATA/NXDOMAIN from a non-first nameserver; the client does get the desired address from the earlier-listed nameserver if
> there is one --- even the first time; and dnsmasq only drops into strict-order mode under the circumstances when it is necessary for
> it to do so such that clients get needed addresses. There is no point, for example, in dropping into strict-order mode if it's the first
> nameserver returning NXDOMAIN and a later-listed nameserver returning an address!

> What do you think about the possibility of implementing such ideas?

I think that both are implementable. I worry that #17 will make (real) NXDOMAIN/NODATA replies much slower, since there are at least two round-trips, and possibly a timeout, if a server never replies.

Cheers,
Simon.

Thomas Hood (jdthood) wrote :

Background: Using dnsmasq can give rise to resolution failures on VPNs and other nonequivalent-nameserver networks (NNNs) where early-listed nameservers have more information than later-listed ones. As Sergio (comment #15) and others point out, the failures are grave: dnsmasq might fail to resolve certain names every time. Now, in the case of a VPN we know we're on a VPN and we could solve the problem by having NM or resolvconf feed only the VPN nameserver addresses to dnsmasq (see Scott's comment #3). But this won't work for NNNs in general. Sergio suggested in #19 that some listed nameserver address being in a local IP address range be used as a trigger to switch dnsmasq to strict-order mode. This would indeed catch all the NNNs (including the VPNs), assuming that all special local nameservers have local IP addresses; but it would also result in some false positives and is subject to pdf's criticisms in comment #20. So we are trying to think of an algorithm for detecting all and only NNNs. Such an algorithm would have to be implemented in dnsmasq because only it has access to the required information.

Simon in #27:
> I worry that #17 will make (real) NXDOMAIN/NODATA replies much slower, since there are at least two round-trips, and possibly a timeout, if a server never replies.

OK, here's #17 but with immediate return as in #18.

#28: "On encountering NODATA or NXDOMAIN, dnsmasq returns the negative result immediately but also reiterates the query to all nameservers listed earlier than the one that answered. If one of those nameservers returns an address then dnsmasq switches to strict-order mode until the next change in the nameserver address list."

With this approach there is no delay in returning results; NNNs are detected quickly, yet the number of duplicated DNS queries is limited; on NNNs, dnsmasq fails to return one or a few addresses before switching modes, but thereafter does not. The disadvantages of strict-order mode can be avoided even on NNNs if "lookup routing" (i.e., the --server=/name/address feature) is properly configured. (This won't be easy to configure given that the networking environment changes a lot. Could dnsmasq be made smart enough to figure out the appropriate lookup routing on the fly? I don't see how, but maybe someone else has a good idea.)
--
Thomas

Thomas Hood (jdthood) wrote :

In addition to devising an algorithm for dnsmasq to detect all and only NNNs, the implementation of which will no doubt take a while, we should consider implementing a quick fix too, along the lines suggested by Sergio in #19. NM could be changed to do the following.

"If the nameserver address list to be fed to dnsmasq contains one or more local addresses followed by one or more non-local addresses then run dnsmasq with the --strict-order option."

I must confess that I am not sure what exactly should fall under "local addresses" here. In IPv4 I presume that these would be the familiar ranges 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, but what about IPv6? Nevertheless, I think we can safely proceed with this fix without being sure that we have exactly the right definition of local address since dnsmasq works no worse than libc in strict-order mode.
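
The rule quoted above could look roughly like this in Python. This is only a sketch under assumptions: the function names are invented, "local" is taken to mean exactly the three IPv4 ranges listed above, and IPv6 addresses are treated as non-local because their treatment is still an open question here.

```python
import ipaddress
from typing import List

# The IPv4 ranges named above. What counts as "local" for IPv6 is left
# open, so IPv6 addresses are treated as non-local in this sketch.
LOCAL_V4_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_local(address: str) -> bool:
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and any(ip in net for net in LOCAL_V4_NETWORKS)


def needs_strict_order(nameservers: List[str]) -> bool:
    """True if a local address is followed, anywhere later, by a non-local one."""
    seen_local = False
    for address in nameservers:
        if is_local(address):
            seen_local = True
        elif seen_local:
            return True
    return False
```

For the resolv.conf from the bug description, needs_strict_order(["192.168.0.1", "8.8.8.8"]) is true, whereas a list of two local addresses, or a public address listed before a local one, leaves dnsmasq in its default mode.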

Thomas Hood (jdthood) wrote :

I have marked this issue as affecting dnsmasq since we may want to implement a solution there along the lines of #28 or similar.

I have marked this issue as affecting resolvconf since we may want to implement a fix there along the lines of #29 or similar. (In the absence of NM and in the presence of dnsmasq, resolvconf also feeds a nameserver list to dnsmasq.)

summary: - Precise NM with "dns=dnsmasq" breaks systems with non-equivalent
- upstream nameservers
+ dnsmasq sometimes fails to resolve private names in networks with non-
+ equivalent nameservers

On 31/05/12 08:47, Thomas Hood wrote:
> In addition to devising an algorithm for dnsmasq to detect all and only
> NNNs, the implementation of which will no doubt take a while, we should
> consider implementing a quick fix too, along the lines suggested by
> Sergio in #19. NM could be changed to do the following.
>
> "If the nameserver address list to be fed to dnsmasq contains one or
> more local addresses followed by one or more non-local addresses then
> run dnsmasq with the --strict-order option."
>
> I must confess that I am not sure what exactly should fall under "local
> addresses" here. In IPv4 I presume that these would be the familiar
> ranges 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, but what about IPv6?

I think you're right for IPv4. For IPv6, I'm tempted to treat it as a
tabula rasa and explicitly not support NNNs, the rationale being that
NNN support is to work around historical bad practice and such bad
practice is not supported in the brave new world of IPv6. If that won't
fly, then the IPv6 equivalent would be link-local (fe80::/10),
site-local (fec0::/10) and ULAs (block fc00::/7), I think.
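
If the IPv6 case did have to be handled, the check might be sketched
like this in Python. The names are invented for illustration, and the
link-local block is taken as fe80::/10, the full range reserved for
link-local unicast in RFC 4291; the other two prefixes are the ones
given above.

```python
import ipaddress
from typing import Optional

# Candidate "local" IPv6 ranges for the purposes of this discussion.
LOCAL_V6_NETWORKS = {
    "link-local": ipaddress.ip_network("fe80::/10"),
    "site-local": ipaddress.ip_network("fec0::/10"),   # deprecated, still seen
    "unique-local": ipaddress.ip_network("fc00::/7"),
}


def local_v6_kind(address: str) -> Optional[str]:
    """Return which local range an IPv6 address falls in, or None."""
    ip = ipaddress.ip_address(address)
    if ip.version != 6:
        return None
    for kind, network in LOCAL_V6_NETWORKS.items():
        if ip in network:
            return kind
    return None
```

The three ranges do not overlap, so the order of the checks does not
matter.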

> Nevertheless, I think we can safely proceed with this fix without being
> sure that we have exactly the right definition of local address since
> dnsmasq works no worse than libc in strict-order mode.
>
> ** Also affects: dnsmasq (Ubuntu)
> Importance: Undecided
> Status: New
>
> ** Also affects: resolvconf (Ubuntu)
> Importance: Undecided
> Status: New
>

Thomas Hood (jdthood) wrote :

> I have marked this issue as affecting resolvconf
> since we may want to implement a fix there along
> the lines of #29 or similar. (In the absence of NM
> and in the presence of dnsmasq, resolvconf also
> feeds a nameserver list to dnsmasq.)

Just remembered that the resolvconf hook script that does this feeding is located in the dnsmasq package.

no longer affects: resolvconf (Ubuntu)
James Page (james-page) on 2012-05-31
Changed in dnsmasq (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Changed in network-manager (Ubuntu):
importance: Undecided → Medium
Scott Moser (smoser) on 2012-05-31
Changed in network-manager (Ubuntu Precise):
status: New → Confirmed
importance: Undecided → Medium
milestone: none → ubuntu-12.04.1
Changed in dnsmasq (Ubuntu Precise):
status: New → Confirmed
importance: Undecided → Medium
milestone: none → ubuntu-12.04.1
Thomas Hood (jdthood) wrote :

#991347 describes a case where there's a nameserver in the list that always replies very quickly with "no data". Dnsmasq currently selects this nameserver because it's quick, the result being that all names fail to be resolved. Ungood.

The measures proposed above would also improve handling of the case just described, so long as it's not the first-listed nameserver that's misbehaving, even though in the case just described a better response would be to detect the malfunction and to ignore the malfunctioning nameserver until it gets fixed. (An even better behavior would be for dnsmasq autonomously to construct a map of which servers can resolve for which domains, but this is asking a lot.)

As a "quick" fix, it might be possible to just include the DNS servers reported by DHCP twice for dnsmasq: once by itself for "global" resolution, and once with the search domain from DHCP so that local network resolution might work. I'll investigate the idea, as that would likely solve at least half of the problem cases here.
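
A rough illustration of what such a generated server list might look like, written as a small Python function. The function name and the example search domain are made up, and this is not what NM actually does today; the output uses the same server= syntax as the nm-dns-dnsmasq.conf example in the bug description.

```python
from typing import List


def dnsmasq_server_lines(nameservers: List[str],
                         search_domains: List[str]) -> List[str]:
    """List each DHCP-supplied server twice: per search domain, then globally."""
    lines = []
    # Domain-specific entries, so that local names go to the DHCP servers.
    for domain in search_domains:
        for address in nameservers:
            lines.append(f"server=/{domain}/{address}")
    # Plain entries for "global" resolution.
    for address in nameservers:
        lines.append(f"server={address}")
    return lines
```

For nameservers 192.168.0.1 and 8.8.8.8 with search domain corp.example this yields server=/corp.example/192.168.0.1, server=/corp.example/8.8.8.8, server=192.168.0.1 and server=8.8.8.8. Whether that helps depends on how dnsmasq chooses among several servers listed for the same domain.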

Changed in network-manager (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
Changed in network-manager (Ubuntu Precise):
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
status: Confirmed → Triaged
Changed in dnsmasq (Debian):
status: Unknown → New
Thomas Hood (jdthood) wrote :

Here's some background information I stumbled across.

Once upon a time NM started dnsmasq in strict-order mode but this was changed.

    https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/903854

This bug was mentioned in the discussion about domain name service changes for Precise.

    https://blueprints.launchpad.net/ubuntu/+spec/foundations-p-dns-resolving

Thomas Hood (jdthood) wrote :

Just to mention that I have run into this problem myself when I connect to work over VPN. I'm using standalone dnsmasq and not using nm-dnsmasq. Turning on strict-order fixes it.

Stéphane Graber (stgraber) wrote :

Untargeted the dnsmasq part of it from 12.04.1 as we realistically won't get a change in dnsmasq by then.

Switching back to strict-order is a bad idea for the reasons listed in bug 903854, namely, we'd lose our biggest advantage from using dnsmasq. But there should be a middle ground here where servers would usually be checked like in strict-order and any server not responding in $AMOUNT_OF_TIME is automatically skipped for later queries + a watchdog querying the server from time to time to see if it's back to life.
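
Roughly, the middle ground could be sketched like this (Python, purely illustrative; the class, its method names and the 30-second re-probe interval are invented, and timestamps are passed in explicitly so the logic is easy to follow):

```python
from typing import Dict, List, Optional


class OrderedServerPool:
    def __init__(self, servers: List[str], retry_after: float = 30.0) -> None:
        self.servers = list(servers)            # listed order is preserved
        self.retry_after = retry_after          # hypothetical probe interval (s)
        self.down_since: Dict[str, float] = {}  # server -> time marked down

    def mark_timeout(self, server: str, now: float) -> None:
        # The server did not reply within $AMOUNT_OF_TIME: skip it from now on.
        self.down_since[server] = now

    def mark_alive(self, server: str) -> None:
        self.down_since.pop(server, None)

    def due_for_probe(self, now: float) -> List[str]:
        # Servers the watchdog should query again to see if they are back.
        return [s for s in self.servers
                if s in self.down_since
                and now - self.down_since[s] >= self.retry_after]

    def pick(self) -> Optional[str]:
        # First server in listed order that is not currently marked down.
        for server in self.servers:
            if server not in self.down_since:
                return server
        # Everything is marked down: fall back to the first-listed server.
        return self.servers[0] if self.servers else None


pool = OrderedServerPool(["192.168.0.1", "8.8.8.8"], retry_after=30.0)
```

Note that this only reacts to servers that fail to respond at all; a server that answers promptly with a negative result is never marked down.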

Changed in dnsmasq (Ubuntu Precise):
milestone: ubuntu-12.04.1 → none
Thomas Hood (jdthood) wrote :

@Stéphane: The problem doesn't arise from servers not responding. It arises from servers responding with NODATA or NXDOMAIN. See my comment #28.

Changed in network-manager (Ubuntu Precise):
milestone: ubuntu-12.04.1 → ubuntu-12.04.2
Thomas Hood (jdthood) wrote :

I also have this problem when I use nm-dnsmasq and connect to work over VPN.

Although there is now a /etc/NetworkManager/dnsmasq.d directory, adding a file there with "strict-order" in it is not enough to fix the problem. That option seems to have no effect when addresses are conveyed to dnsmasq over D-Bus.

So I now work around the problem by commenting out "dns=dnsmasq" in /e/NM/NM.conf.

Thomas Hood (jdthood) wrote :

@Stéphane: Can you please give us an idea of what, if anything, you think will be done about this problem in Quantal?

Thomas Hood (jdthood) wrote :

It has been a few months since the last comment.

If no solution along the lines of those outlined earlier (see comments #28, #29, #34, #37) is forthcoming then nm-dnsmasq should simply be put back into strict-order mode, thus reversing the change made at the suggestion of bug #903854.

Stéphane wrote in #37:
> Switching back to strict-order is a bad idea for the reasons
> listed in bug 903854, namely, we'd lose our biggest
> advantage from using dnsmasq.

The biggest advantage is only a performance advantage under some circumstances. This in no way stacks up against outright failure under other circumstances — circumstances typical of many LANs. If no solution for this bug (#1003842) is forthcoming then it is time to admit that switching off strict-order was the wrong thing to do. Knowing what we know now, we should switch it back on, and only switch it off again when a solution has been found for this bug. If switching on strict-order eliminates the only advantages of using nm-dnsmasq then nm-dnsmasq itself should be switched off (as proposed at bug #1086693) until that time.

Thomas Hood (jdthood) wrote :

One thing needs to be checked, though. Reading dnsmasq(8):

 -o, --strict-order
              By default, dnsmasq will send queries to any of the
              upstream servers it knows about and tries to favour
              servers that are known to be up. Setting this flag
              forces dnsmasq to try each query with each server
              strictly in the order they appear in /etc/resolv.conf

Will switching on strict-order have the same effect now that nameserver addresses are sent over D-Bus?

Lothar (lothar-tradescape) wrote :

I experienced the problems described where I lost DNS resolution when connected to a corporate VPN.

With help from a coworker I fixed it temporarily by commenting
#dns=dnsmasq
in /etc/NetworkManager/NetworkManager.conf as recommended in bug #903854

P.S.
I lost a lot of time trying to figure out why my VPN connections were suddenly no longer working.
I hope Ubuntu finds a permanent solution that keeps private VPNs working.

Thomas Hood (jdthood) wrote :

Stéphane?

tombert (tombert.live) wrote :

I am having similar problems. In order to get DNS to work I need to restart dnsmasq after boot (manually or via script) in order to get it to resolve hostnames. DHCP works fine though.
I am on 12.10

thx

Thomas Hood (jdthood) wrote :

@tombert: Probably not the same issue, since the issue being discussed here is not fixed by restarting. Please file a new bug report against dnsmasq with a detailed description of your problem.

Steve Riley (steveriley) wrote :

I started using my employer's OpenVPN today and encountered name resolution problems. From my research, this here bug appears to be plaguing me, as well (I'm on 12.10). Commenting the line dns=dnsmasq in /etc/NetworkManager/NetworkManager.conf does fix the problem. However, _all_ DNS is routed out the VPN in this case. I rather like the idea of splitting the DNS responsibilities.

I see there's still the unresolved question of whether re-enabling --strict-order will suffice as a workaround, since 12.10 relies on DBus to populate the nameservers. Is there any extra information on this?

Thomas Hood (jdthood) wrote :

>there's still the unresolved question
> of whether re-enabling --strict-order
> will suffice as a workaround, since
> 12.10 relies on DBus to populate the
> nameservers. Is there any extra
> information on this?

Please try it and report back. :-)

(Put "strict-order" in a file in /etc/NetworkManager/dnsmasq.d/; stop network-manager; make sure all dnsmasq processes are dead; start network-manager.)

On 03/02/13 07:48, Thomas Hood wrote:
>> there's still the unresolved question
>> of whether re-enabling --strict-order
>> will suffice as a workaround, since
>> 12.10 relies on DBus to populate the
>> nameservers. Is there any extra
>> information on this?
>
> Please try it and report back. :-)
>
> (Put "strict-order" in a file in /etc/NetworkManager/dnsmasq.d/; stop
> network-manager; make sure all dnsmasq processes are dead; start
> network-manager.)
>

It doesn't work: It will always use the same server first, but the order
of servers given to the DBus interface isn't preserved internally, and
actually changes each time the DBus interface is used.

Cheers,

Simon.

Sergio Callegari (callegar) wrote :

On 04/02/2013 15:40, Simon Kelley wrote:
> On 03/02/13 07:48, Thomas Hood wrote:
>>> there's still the unresolved question
>>> of whether re-enabling --strict-order
>>> will suffice as a workaround, since
>>> 12.10 relies on DBus to populate the
>>> nameservers. Is there any extra
>>> information on this?
>> Please try it and report back. :-)
>>
>> (Put "strict-order" in a file in /etc/NetworkManager/dnsmasq.d/; stop
>> network-manager; make sure all dnsmasq processes are dead; start
>> network-manager.)
>>
> It doesn't work: It will always use the same server first, but the order
> of servers given to the DBus interface isn't preserved internally, and
> actually changes each time the DBus interface is used.
>
>
> Cheers,
>
> Simon.
Isn't it possible to change dnsmasq behavior to query the servers in any order
or in parallel and in the case the first server to reply says "I don't know"
avoid relying on that information, rather wait and see if in a reasonable time
some other server answers "I do"?

With the current behavior, whenever I need to access a captive portal, I
basically have to press the "reload page" button 50 times until for some reasons
the order in which the nameservers reply becomes the good one.

Cheers,

Sergio

On 04/02/13 15:36, Sergio Callegari wrote:
> On 04/02/2013 15:40, Simon Kelley wrote:
>> On 03/02/13 07:48, Thomas Hood wrote:
>>>> there's still the unresolved question
>>>> of whether re-enabling --strict-order
>>>> will suffice as a workaround, since
>>>> 12.10 relies on DBus to populate the
>>>> nameservers. Is there any extra
>>>> information on this?
>>> Please try it and report back. :-)
>>>
>>> (Put "strict-order" in a file in /etc/NetworkManager/dnsmasq.d/; stop
>>> network-manager; make sure all dnsmasq processes are dead; start
>>> network-manager.)
>>>
>> It doesn't work: It will always use the same server first, but the order
>> of servers given to the DBus interface isn't preserved internally, and
>> actually changes each time the DBus interface is used.
>>
>>
>> Cheers,
>>
>> Simon.
> Isn't it possible to change dnsmasq behavior to query the servers in any order
> or in parallel and in the case the first server to reply says "I don't know"
> avoid relying on that information, rather wait and see if in a reasonable time
> some other server answers "I do"?

You're far from the first person to ask that question. The answer is
that there is no possible response in the DNS protocol which means "I
don't know". NXDOMAIN or NODATA answers _don't_ mean that; they mean "I
know that this domain doesn't exist". They also make up quite a large
proportion of the DNS results returned to the average host, so that all
of those queries would suddenly take much longer.
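
To spell the distinction out, here is a small Python sketch (not dnsmasq
code; the Outcome names and function names are invented). The RCODE
numbers are the standard ones from RFC 1035, and NODATA is modelled in
simplified form as a NOERROR reply with an empty answer section.

```python
from enum import Enum

# Standard DNS response codes (RFC 1035).
NOERROR, SERVFAIL, NXDOMAIN, REFUSED = 0, 2, 3, 5


class Outcome(Enum):
    ANSWER = "answer"       # records were returned
    NEGATIVE = "negative"   # definitive "no": NXDOMAIN or NODATA
    FAILURE = "failure"     # server could not or would not answer


def classify(rcode: int, answer_count: int) -> Outcome:
    if rcode == NOERROR:
        # NOERROR with no answer records is treated as NODATA here.
        return Outcome.ANSWER if answer_count > 0 else Outcome.NEGATIVE
    if rcode == NXDOMAIN:
        return Outcome.NEGATIVE
    # SERVFAIL, REFUSED and anything else: the server gave no verdict.
    return Outcome.FAILURE


def may_try_next_server(outcome: Outcome) -> bool:
    # Only a failure leaves the question open; a negative answer is final.
    return outcome is Outcome.FAILURE
```

Nothing in this table corresponds to "I don't know", which is why a
forwarder cannot tell a public server's NXDOMAIN for a private name from
a genuine one.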

>
> With the current behavior, whenever I need to access a captive portal, I
> basically have to press the "reload page" button 50 times until for some reasons
> the order in which the nameservers reply becomes the good one.

The fundamental problem lies with the captive portal, and no good
solution which can be implemented by dnsmasq has so far been devised.

Cheers,

Simon.

Sergio Callegari (callegar) wrote :

On 04/02/2013 17:07, Simon Kelley wrote:
> On 04/02/13 15:36, Sergio Callegari wrote:
>> On 04/02/2013 15:40, Simon Kelley wrote:
>>> On 03/02/13 07:48, Thomas Hood wrote:
>>>>> there's still the unresolved question
>>>>> of whether re-enabling --strict-order
>>>>> will suffice as a workaround, since
>>>>> 12.10 relies on DBus to populate the
>>>>> nameservers. Is there any extra
>>>>> information on this?
>>>> Please try it and report back. :-)
>>>>
>>>> (Put "strict-order" in a file in /etc/NetworkManager/dnsmasq.d/; stop
>>>> network-manager; make sure all dnsmasq processes are dead; start
>>>> network-manager.)
>>>>
>>> It doesn't work: It will always use the same server first, but the order
>>> of servers given to the DBus interface isn't preserved internally, and
>>> actually changes each time the DBus interface is used.
>>>
>>>
>>> Cheers,
>>>
>>> Simon.
>> Isn't it possible to change dnsmasq behavior to query the servers in any order
>> or in parallel and in the case the first server to reply says "I don't know"
>> avoid relying on that information, rather wait and see if in a reasonable time
>> some other server answers "I do"?
> You're far from the first person to ask that question. The answer is
> that there is no possible response in the DNS protocol which means "I
> don't know". NXDOMAIN or NODATA answers _don't_ mean that; they mean "I
> know that this domain doesn't exist". They also make up quite a large
> proportion of the DNS results returned to the average host, so that all
> of those queries would suddenly take much longer.

Yes, I realize that the problem is with the setup of the intranet, that should
not add names to a domain that is known on the internet or invent a subdomain of
something that is on the internet.

But as a workaround, having a switch to activate "wait for further answers if
you get an 'it does not exist'" would be nice for those willing to pay the price
of a longer wait (or possibly even auto-activate it if a dns is detected to be
on an intranet).

Best regards,

Sergio

Thomas Hood (jdthood) wrote :

Simon in #49:
> It doesn't work [...] the order of servers given to the DBus
> interface isn't preserved internally

Aha, so the answer to my question

> Will switching on strict-order have the same effect
> now that nameserver addresses are sent over D-Bus?

(in comment #42) is "No". So switching strict-order back on is no solution. And solutions depending on strict-order including mine in #28 also won't work. Unless dnsmasq is somehow changed such that it remembers the order in which nameserver addresses come in over D-Bus so that strict-order is useful in the D-Bus case, if we want to avoid breaking name service on machines connected to NNNs then we have to disable dnsmasq by default; or disable it initially and only enable it when we know that we aren't on a NNN.

(NNN = nonequivalent-nameserver network. As discussed in comment #5, such networks are not properly configured. But as observed several times, there are many NNNs out there. Which is why *many* people have been commenting out "dns=dnsmasq".)

There is another problem with NM-dnsmasq (bug #1072899). Some VPNs have multiple nameservers. NM uses dnsmasq to direct VPN domain name queries to the *first* one. But then, if the first one goes down, the second one is not tried. Once again, for the sake of speed enhancement in the favorable case, users suffer radical name service failure in the unfavorable case. This is not a good deal, IMHO. NM-dnsmasq should be disabled by default until these problems are solved.

On 04/02/13 22:05, Thomas Hood wrote:
> Simon in #49:
>> It doesn't work [...] the order of servers given to the DBus
>> interface isn't preserved internally
>
> Aha, so the answer to my question
>
>> Will switching on strict-order have the same effect
>> now that nameserver addresses are sent over D-Bus?
>
> (in comment #42) is "No". So switching strict-order back on is no
> solution. And solutions depending on strict-order including mine in #28
> also won't work. Unless dnsmasq is somehow changed such that it
> remembers the order in which nameserver addresses come in over D-Bus so
> that strict-order is useful in the D-Bus case, if we want to avoid
> breaking name service on machines connected to NNNs then we have to
> disable dnsmasq by default; or disable it initially and only enable it
> when we know that we aren't on a NNN.

Note that setting --strict-order is pretty much equivalent to telling
dnsmasq to use only the first nameserver, so you can very easily provide
the same behaviour - only pass the first nameserver to dnsmasq. Maybe
provide a button in NM that does this - "press here if you're in a
captive portal".

>
> (NNN = nonequivalent-nameserver network. As discussed in comment #5,
> such networks are not properly configured. But as observed several
> times, there are many NNNs out there. Which is why *many* people have
> been commenting out "dns=dnsmasq".)
>
> There is another problem with NM-dnsmasq (bug #1072899). Some VPNs have
> multiple nameservers. NM uses dnsmasq to direct VPN domain name queries
> to the *first* one. But then, if the first one goes down, the second one
> is not tried. Once again, for the sake of speed enhancement in the
> favorable case, users suffer radical name service failure in the
> unfavorable case. This is not a good deal, IMHO. NM-dnsmasq should be
> disabled by default until these problems are solved.

That's a different problem, and could be solved. Ironically, I think the
problem arises because for nameservers associated with particular
domains, the equivalent of --strict-order is always in play.

Cheers,

Simon.

>

Belay my previous comment about 1072899, it looks like network manager
is losing the second server before it ever gets to dnsmasq. Not a
dnsmasq problem.

Simon.

Thomas Hood (jdthood) wrote :

Hi Simon.

Before I forget to ask: can you please update dnsmasq(8) to include under "--strict-order" a description of what happens when nameserver addresses are passed in via D-Bus instead of via a file?

You wrote,
> you can very easily provide the same behaviour - only pass the first nameserver to dnsmasq

Because NM doesn't use dnsmasq to cache, if NM were to give dnsmasq only one address then I guess the only service that dnsmasq would still provide is that of name-to-server mapping.

And it turns out that the way NM currently uses dnsmasq to do this is seriously flawed. So I conclude that it's better for NM not to use dnsmasq at all until these problems are solved.

> [That NM only supplies one nameserver address per domain name]
> is a different problem, and could be solved.

From the man page it's not completely clear how to solve it. Can you confirm (1) that it's possible to give multiple server options as follows

    server=/google.com/1.2.3.4
    server=/google.com/5.6.7.8

and that the result will be that 1.2.3.4 and 5.6.7.8 will be treated equally for the purpose of resolving names in domain google.com? (2) And likewise via D-Bus?

(3) What effect does strict-order have on this?

> Ironically, I think the
> problem arises because for nameservers associated with particular
> domains, the equivalent of --strict-order is always in play.

What you say here suggests that my proposition #1 above is false. If #1 is false then it seems that in order to fix

Thomas Hood (jdthood) wrote :

[...cont'd after "in order to fix"...] bug #1072899, dnsmasq will have to be enhanced such that proposition #1 is true. But we can discuss the details of that in bug #1072899.

<parenthesis>
There is a close analogy between the problem here (bug #1003842) and a problem we have with avahi. Avahi resolves names in the domain ".local". Networks should not use this TLD, but many do and at least in the past Microsoft actually recommended doing so. When users connect to such networks with avahi enabled the result is malfunction. Upstream puristically says[*] "If you come across a network where .local is a unicast DNS domain, please contact the local administrator and ask him to move his DNS zone to a different domain. If this is not possible, we recommend not to use Avahi in such a network at all." In practice avahi attempts to detect "bad" networks and disables itself if it thinks it is on a bad network, subject unfortunately both to false positives (bug #327362) and false negatives (bug #80900).

We aren't yet doing even that well. We say that networks ought to have equivalent nameservers and we make no attempt to detect networks that have non-equivalent nameservers, of which there are very many.

[*]http://avahi.org/wiki/AvahiAndUnicastDotLocal
</parenthesis>

On 06/02/13 08:59, Thomas Hood wrote:
> Hi Simon.
>
> Before I forget to ask: can you please update dnsmasq(8) to include
> under "--strict-order" a description of what happens when nameserver
> addresses are passed in via D-Bus instead of via a file?
>
> You wrote,
>> you can very easily provide the same behaviour - only pass the first nameserver to dnsmasq
>
> Because NM doesn't use dnsmasq to cache, if NM were to give dnsmasq only
> one address then I guess the only service that dnsmasq would still
> provide is that of name-to-server mapping.
>
> And it turns out that the way NM currently uses dnsmasq to do this is
> seriously flawed. So I conclude that it's better for NM not to use
> dnsmasq at all until these problems are solved.
>
>> [That NM only supplies one nameserver address per domain name]
>> is a different problem, and could be solved.
>
> From the man page it's not completely clear how to solve it. Can you
> confirm (1) that it's possible to give multiple server options as
> follows
>
> server=/google.com/1.2.3.4
> server=/google.com/5.6.7.8
>
> and that the result will be that 1.2.3.4 and 5.6.7.8 will be treated
> equally for the purpose of resolving names in domain google.com? (2) And
> likewise via D-Bus?
>
> (3) What effect does strict-order have on this?
>
>> Ironically, I think the
>> problem arises because for nameservers associated with particular
>> domains, the equivalent of --strict-order is always in play.
>
> What you say here suggests that my proposition #1 above is false. If #1
> is false then it seems that in order to fix
>

proposition #1 is true, as is #2: you can configure the same thing via DBus.

Consider

server=1.1.1.1
server=2.2.2.2
server=/google.com/3.3.3.3
server=/google.com/4.4.4.4

Queries not sent to *.google.com will behave in the normal dnsmasq
manner, sent non-deterministically to 1.1.1.1 and/or 2.2.2.2 in a way
which tries to favour the fastest/most up server.

Queries sent to *.google.com will be sent to 3.3.3.3 or 4.4.4.4 in the same
way as if strict order was set, ie, to 3.3.3.3 first, and only to
4.4.4.4 if 3.3.3.3 returns a SERVFAIL or REFUSED error, or doesn't reply
at all.

This should be changed, but the code which implements it is gnarly and
old, and won't stand more tinkering; it needs rewriting. I've not found
the time, as of yet.
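
As a rough model of the behaviour described above (a Python sketch, not
the actual dnsmasq code; the function names are invented and only the
single-domain form of the server= option is parsed):

```python
from typing import Dict, List, Tuple


def parse_server_lines(lines: List[str]) -> Dict[str, List[str]]:
    """Map domain -> servers in listed order; "" holds the default servers."""
    table: Dict[str, List[str]] = {}
    for line in lines:
        value = line.split("=", 1)[1]
        if value.startswith("/"):
            # "/google.com/3.3.3.3" -> domain "google.com", address "3.3.3.3"
            _, domain, address = value.split("/", 2)
        else:
            domain, address = "", value
        table.setdefault(domain, []).append(address)
    return table


def candidates(table: Dict[str, List[str]], name: str) -> Tuple[List[str], bool]:
    """Return (servers, ordered) for a query name.

    ordered=True means: try the servers strictly in listed order, moving on
    only after SERVFAIL, REFUSED or no reply. ordered=False means the usual
    preference for whichever server is fastest or known to be up.
    """
    labels = name.rstrip(".").lower().split(".")
    # Try the longest matching domain suffix first.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in table:
            return table[suffix], True
    return table.get("", []), False


table = parse_server_lines([
    "server=1.1.1.1",
    "server=2.2.2.2",
    "server=/google.com/3.3.3.3",
    "server=/google.com/4.4.4.4",
])
```

Here candidates(table, "www.google.com") gives 3.3.3.3 then 4.4.4.4 with
ordered set, while any other name gets 1.1.1.1 and 2.2.2.2 unordered.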

Cheers,

Simon.

On 06/02/13 09:18, Thomas Hood wrote:
> [...cont'd after "in order to fix"...] bug #1072899, dnsmasq will
> have to be enhanced such that proposition #1 is true. But we can
> discuss the details of that in bug #1072899.
>
> <parenthesis> There is a close analogy between the problem here (bug
> #1003842) and a problem we have with avahi. Avahi resolves names in
> the domain ".local". Networks should not use this TLD, but many do
> and at least in the past Microsoft actually recommended doing so.
> When users connect to such networks with avahi enabled the result is
> malfunction. Upstream puristically says[*] "If you come across a
> network where .local is a unicast DNS domain, please contact the
> local administrator and ask him to move his DNS zone to a different
> domain. If this is not possible, we recommend not to use Avahi in
> such a network at all." In practice avahi attempts to detect "bad"
> networks and disables itself if it thinks it is on a bad network,
> subject unfortunately both to false positives (bug #327362) and false
> negatives (bug #80900).
>
> We aren't yet doing even that well. We say that networks ought to
> have equivalent nameservers and we make no attempt to detect networks
> that have non-equivalent nameservers, of which there are very many.
>
> [*]http://avahi.org/wiki/AvahiAndUnicastDotLocal </parenthesis>
>

Detecting non-equivalent servers is hard. I'm very much in favour of doing
it, if a way can be found.

Simon.

Thomas Hood (jdthood) wrote :

Simon wrote:
> Consider
[...]
> server=/google.com/3.3.3.3
> server=/google.com/4.4.4.4
[...]
> Queries sent to *.google.com will be sent to 3.3.3.3 or 4.4.4.4 in the
> same way as if strict order was set, ie, to 3.3.3.3 first, and only to
> 4.4.4.4 if 3.3.3.3 returns a SERVFAIL or REFUSED error, or doesn't
> reply at all.
>
> This should be changed, but the code which implements it is gnarly
> and old, and won't stand more tinkering; it needs rewriting. I've
> not found the time, as of yet.

That doesn't sound as if it's urgently needed for anything we are talking about here.

What we do need is for strict-order to work when addresses are provided over D-Bus. (That this requires work: see #49. That this is needed: see below.)

>> We say that networks ought to
>> have equivalent nameservers and we make no attempt to detect networks
>> that have non-equivalent nameservers, of which there are very many.
>
> Detecting non-equivalent servers is hard. I'm very much in favour of
> doing it, if a way can be found.

Well, let's look at the ideas that have been put forward so far.

Solution #1. Disable NM-dnsmasq by default. This is the only solution we have right now.

Other ideas that probably need more thought...

Solution #2. Enhance dnsmasq such that it can be given an ordered list of nameservers via D-Bus and can process this list in strict-order fashion. Then do every lookup in strict-order fashion, but detect offline nameservers and omit them temporarily from the list. (This is my interpretation of Stéphane's suggestion in #37.)

Solution #3. Enhance dnsmasq such that it can be given an ordered list of nameservers via D-Bus and can process this list in strict-order fashion. Then do a given lookup in strict-order fashion if
    * the lookup is being routed to a specific nameserver due to a "server" option;
    * the name is in one of the search domains returned by DHCP (as suggested by M T-L in #34);
    * the name is not in any of the recognized TLDs; or
    * we have detected nameserver nonequivalence since the last time the list of nameservers changed. The detection mechanism is as described in #28: on encountering NODATA or NXDOMAIN, dnsmasq returns the negative result immediately but also reiterates the query to all nameservers listed earlier than the one that answered. If one of those nameservers later returns an address then nameserver nonequivalence has been detected. (This combines several earlier suggestions.)
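
The four conditions of Solution #3 can be sketched as a single predicate. This is illustrative Python only: the function and parameter names are invented, the set of recognized TLDs is whatever the caller supplies, and the nonequivalence flag is assumed to be maintained elsewhere by the #28 mechanism.

```python
from typing import Iterable, Set


def in_domain(name: str, domain: str) -> bool:
    """True if name equals domain or is a subdomain of it."""
    name = name.rstrip(".").lower()
    domain = domain.strip(".").lower()
    return name == domain or name.endswith("." + domain)


def use_strict_order(name: str,
                     routed_domains: Iterable[str],
                     search_domains: Iterable[str],
                     known_tlds: Set[str],
                     nonequivalence_detected: bool) -> bool:
    # Nonequivalence was seen since the nameserver list last changed.
    if nonequivalence_detected:
        return True
    # The lookup is routed to specific servers by a "server" option.
    if any(in_domain(name, d) for d in routed_domains):
        return True
    # The name lies in one of the search domains returned by DHCP.
    if any(in_domain(name, d) for d in search_domains):
        return True
    # The name is not in any recognized TLD.
    tld = name.rstrip(".").lower().rsplit(".", 1)[-1]
    return tld not in known_tlds
```

With known_tlds = {"com", "org", "net"}, a lookup of www.example.com would use the default mode, whereas host.corp (unrecognized TLD) or a name inside a DHCP search domain would be done in strict-order fashion.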

Thomas Hood (jdthood) wrote :

Earlier there was some dispute about what the RFCs say about multiple nameservers.

I found the following RFC which does have something to say about these issues.

    http://www.zoneedit.com/doc/rfc/rfc2182.txt

Here are a couple of passages...

Request for Comments: 2182
Category: Best Current Practice

Selection and Operation of Secondary DNS Servers

Abstract

   The Domain Name System requires that multiple servers exist for every
   delegated domain (zone). This document discusses the selection of
   secondary servers for DNS zones. Both the physical and topological
   location of each server are material considerations when selecting
   secondary servers. The number of servers appropriate for a zone is
   also discussed, and some general secondary server maintenance issues
   considered.

[...]

   With multiple servers, usually one server will be the primary server,
   and others will be secondary servers. Note that while some unusual
   configurations use multiple primary servers, that can result in data
   inconsistencies, and is not advisable.

   The distinction between primary and secondary servers is relevant
   only to the servers for the zone concerned, to the rest of the DNS
   there are simply multiple servers. All are treated equally at first
   instance, even by the parent server that delegates the zone.
   Resolvers often measure the performance of the various servers,
   choose the "best", for some definition of best, and prefer that one
   for most queries. That is automatic, and not considered here.

[...]

Thomas Hood (jdthood) wrote :

The target milestone should be adjusted, I guess.

Steve Langasek (vorlon) on 2013-02-07
Changed in network-manager (Ubuntu Precise):
milestone: ubuntu-12.04.2 → ubuntu-12.04.3
Vladimir (vladimir-kozlov) wrote :

The same problem persists in 14.04. My DHCP server pushes two DNS servers: a primary (10.0.0.3), located inside the local network, and a secondary (10.0.2.1), located in the DMZ.

The primary server's zone includes records for some servers that are accessible only from the local network.

Periodically (maybe after a lease renewal?) a computer running 14.04 cannot resolve local names whose records are absent from the secondary server's zone.

The workaround is to create an internal view for the secondary server's zone so that computers on the local network can resolve such local names.
