lxc-net dnsmasq --strict-order breaks dns for lxc non-recursive nameserver

Bug #1205086 reported by Sidnei da Silva
This bug affects 2 people
Affects            Status   Importance  Assigned to  Milestone
libvirt (Ubuntu)   Expired  Low         Unassigned
lxc (Ubuntu)       Expired  Low         Unassigned

Bug Description

In my setup I have a non-recursive name server that gets pushed as part of a vpn setup. This name server only resolves addresses that are part of the vpn. It gets pushed to the top of the resolv.conf file by /etc/openvpn/update-resolv-conf.

Since the dnsmasq instance set up by lxc-net is started with --strict-order, name resolution in containers fails for anything outside the vpn: queries hit the first name server, which does not resolve any addresses outside of the vpn.

Removing the --strict-order option in /etc/init/lxc-net.conf and restarting makes name resolution in the containers work.

Sidnei da Silva (sidnei) wrote :

The vpn server is running a dnsmasq instance with the following settings:

"""
addn-hosts=/etc/hosts.openvpn-server
addn-hosts=/etc/hosts.openvpn-clients
no-hosts
dns-forward-max=0
no-resolv
"""

In the vpn server configs, it is pushing its own IP as a dns server:

"""
push "dhcp-option DNS 10.88.0.1"
push "dhcp-option DOMAIN vpn.ubuntone.info"
"""

In the client configs, it's using the stock update-resolv-conf openvpn script to update resolvconf:

"""
up /etc/openvpn/update-resolv-conf
down /etc/openvpn/update-resolv-conf
"""

The end result is that the vpn client resolv.conf contains the following:

"""
$ cat /etc/resolv.conf
nameserver 10.88.0.1
nameserver 127.0.1.1
search vpn.ubuntone.info
"""

Since the lxc dnsmasq doesn't specify which resolver to use and is started with --strict-order, it queries 10.88.0.1 first. That name server is set up with no-resolv, so the query gets refused, and dnsmasq does not move on to the next one (127.0.1.1).
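To make the ordering concrete, here is a minimal sketch that lists the nameservers in the order the resolv.conf above presents them. It only illustrates file order; it is not how dnsmasq itself reads the file, and the function name `nameservers_in_order` is made up for this example.

```python
# Illustrative only: list nameservers in the order resolv.conf presents them.
# The sample text is the resolv.conf quoted in this comment.
RESOLV_CONF = """\
nameserver 10.88.0.1
nameserver 127.0.1.1
search vpn.ubuntone.info
"""


def nameservers_in_order(text):
    """Return nameserver addresses in file order, skipping comments and other keywords."""
    servers = []
    for line in text.splitlines():
        stripped = line.strip()
        # resolv.conf comments start with '#' or ';'
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


servers = nameservers_in_order(RESOLV_CONF)
print(servers)  # ['10.88.0.1', '127.0.1.1']
```

With --strict-order, the first entry of that list (the vpn-only server) is the one that gets asked first.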

Serge Hallyn (serge-hallyn) wrote :

Does removing --strict-order work for your containers in all cases, or only some of the time? Looking through the dnsmasq manpage,

1. I'd expected --strict-order to mean that if the first nameserver doesn't know the answer, we try the second one. Apparently it only falls back if the first one is down altogether?

2. Given the actual behavior of (1), the default (neither --strict-order nor --all-servers) should just choose a name server at random. I would expect it to sometimes choose 10.88.0.1, and if that server is up and says "I don't know that host", I'd expect the fallback to be the SAME as with --strict-order. Which means I would have *expected* dnsmasq to try the next one, but in fact per your findings it should (randomly, half the time) simply fail to resolve.

Serge Hallyn (serge-hallyn) wrote :

So really it looks like the dnsmasq-2.47_no_nxdomain_until_end.patch in the dnsmasq source is, in MY humble opinion, what we'd need. Both for this, and for bugs 1003842 and 1163147.

Serge Hallyn (serge-hallyn) wrote :

Looking more through comments in https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1003842, it appears to me that the most uncontroversial fix, rather than removing --strict-order in lxc's dnsmasq, would be for dnsmasq on your host to have something like:

server=/vpn.ubuntuone.info/10.88.0.1

and not have 10.88.0.1 in your resolv.conf at all.
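As a rough illustration of what a per-domain server= line is meant to achieve, the sketch below picks an upstream by domain suffix. It is a toy model, not dnsmasq's implementation; the fallback address 192.0.2.53 and the function name `pick_upstream` are hypothetical, chosen only for the example.

```python
# Toy model of per-domain upstream selection, in the spirit of a
# "server=/domain/ip" line. Not dnsmasq's actual code.
DOMAIN_SERVERS = {"vpn.ubuntuone.info": "10.88.0.1"}
DEFAULT_SERVER = "192.0.2.53"  # hypothetical general-purpose resolver


def pick_upstream(name, domain_servers=DOMAIN_SERVERS, default=DEFAULT_SERVER):
    """Return the upstream for `name`: the longest matching domain wins, else the default."""
    name = name.rstrip(".").lower()
    best = None
    for domain in domain_servers:
        # Match the domain itself or any name underneath it (label boundary required).
        if name == domain or name.endswith("." + domain):
            if best is None or len(domain) > len(best):
                best = domain
    return domain_servers[best] if best is not None else default


print(pick_upstream("host.vpn.ubuntuone.info"))  # 10.88.0.1
print(pick_upstream("example.com"))              # 192.0.2.53
```

Under this model the vpn-only server never sees queries for names outside its domain, so the order of entries in resolv.conf stops mattering for them.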

Mind you I do hate suggesting changes to otherwise-working setups to work around what can appear to be problems elsewhere. But going by http://www.zoneedit.com/doc/rfc/rfc2182.txt that appears to be the "correct" thing to do.

Serge Hallyn (serge-hallyn) wrote :

Marking incomplete pending info (from submitter or from me) on comment #2.

Changed in libvirt (Ubuntu):
importance: Undecided → Low
Changed in lxc (Ubuntu):
importance: Undecided → Low
Changed in libvirt (Ubuntu):
status: New → Incomplete
Changed in lxc (Ubuntu):
status: New → Incomplete
Sidnei da Silva (sidnei) wrote :

Tested on a cloud instance which doesn't have a local dnsmasq; it ended up with the following config:

$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.88.0.1
nameserver 10.55.60.1
search vpn.ubuntone.info canonistack

Removing --strict-order seems to solve the problem consistently, every single time.

Launchpad Janitor (janitor) wrote :

[Expired for lxc (Ubuntu) because there has been no activity for 60 days.]

Changed in lxc (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for libvirt (Ubuntu) because there has been no activity for 60 days.]

Changed in libvirt (Ubuntu):
status: Incomplete → Expired
Natalia Bidart (nataliabidart) wrote :

Reopening because this is still an issue in saucy and lxc.

Changed in libvirt (Ubuntu):
status: Expired → New
Changed in lxc (Ubuntu):
status: Expired → New
Serge Hallyn (serge-hallyn) wrote :

@sidnei - apologies, I had missed your response, after which the bug autoexpired.

Hoping to get a comment from stgraber or cyphermox.

Simon Davy (bloodearnest) wrote :

This is an issue in trusty, and affects squid-deb-proxy as well.

Serge Hallyn (serge-hallyn) wrote :

There is concern that removing strict-order would at least double the dns traffic for most users, and is not the proper fix.

From irc logs (#ubuntu-server, feb 24)

<stgraber> If the reporter is using a desktop machine, the real fix is to use NetworkManager which will properly setup dnsmasq to only use the VPN dns server for requests relevant to it
<stgraber> hallyn: ok, so just did some tests. The problem there is clearly that the remote dns server is misconfigured. Trying with mine, I get NXDOMAIN for an invalid domain from a recursive server (as I should) but SERVFAIL for a domain outside the scope of a non-recursive server.
<stgraber> SERVFAIL causes dnsmasq to query the next server, NXDOMAIN doesn't.
<stgraber> SERVFAIL is nsd's response when non-recursive. REFUSED is bind's response when non-recursive. Both work with dnsmasq.
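The rule described in the IRC excerpt above can be sketched as a toy model: servers are tried in order, SERVFAIL and REFUSED fall through to the next server, and any other response (including NXDOMAIN) is treated as final. This is only a model of the behavior as described in this thread, not dnsmasq's code; the server functions and the `vpn.example` domain are hypothetical.

```python
# Toy model of the fallback rule described in the IRC excerpt:
# SERVFAIL/REFUSED -> try the next server; anything else is final.
FALL_THROUGH = {"SERVFAIL", "REFUSED"}


def resolve_in_order(name, servers):
    """servers: list of (address, answer_fn); answer_fn(name) returns an rcode string.

    Returns (final_rcode, addresses_tried).
    """
    rcode = "SERVFAIL"  # result if the server list is empty
    tried = []
    for address, answer in servers:
        rcode = answer(name)
        tried.append(address)
        if rcode not in FALL_THROUGH:
            break
    return rcode, tried


# Hypothetical upstreams for the example.
def recursive(name):
    return "NOERROR" if name == "example.com" else "NXDOMAIN"


def vpn_refusing(name):  # non-recursive server that answers REFUSED out of scope
    return "NOERROR" if name.endswith(".vpn.example") else "REFUSED"


def vpn_nxdomain(name):  # non-recursive server that answers NXDOMAIN out of scope
    return "NOERROR" if name.endswith(".vpn.example") else "NXDOMAIN"


print(resolve_in_order("example.com", [("10.88.0.1", vpn_refusing), ("127.0.1.1", recursive)]))
# ('NOERROR', ['10.88.0.1', '127.0.1.1'])
print(resolve_in_order("example.com", [("10.88.0.1", vpn_nxdomain), ("127.0.1.1", recursive)]))
# ('NXDOMAIN', ['10.88.0.1'])
```

In the second call the first server's NXDOMAIN ends the lookup, which matches the failure mode reported for the containers.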

Serge Hallyn (serge-hallyn) wrote :

Stéphane,

given your comments pasted into #12, would you recommend calling this bug wontfix?

Changed in libvirt (Ubuntu):
status: New → Incomplete
Changed in lxc (Ubuntu):
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for lxc (Ubuntu) because there has been no activity for 60 days.]

Changed in lxc (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for libvirt (Ubuntu) because there has been no activity for 60 days.]

Changed in libvirt (Ubuntu):
status: Incomplete → Expired