Bug #959037 “NM-controlled dnsmasq prevents other DNS servers fr...” : Precise (12.04) : Bugs : network-manager package : Ubuntu

Mathieu Trudel-Lapierre (cyphermox) wrote on 2012-03-19:

#1

Well, that's already partly done. dnsmasq will fail to start when bind is running, as it should, based on whether port 53 is already in use.

As another option, you may also wish to switch dns=dnsmasq to dns=bind to use bind directly as a resolver. There are other reasons to have dnsmasq and/or bind installed, so even checking for existence isn't the right way to cover this.

Mathieu Trudel-Lapierre (cyphermox) wrote on 2012-03-19:

#2

I don't think we'll cover this particular use case for Precise. I understand your requirement and how the need to change the settings in /etc/NetworkManager/NetworkManager.conf isn't great, but it's a one-time thing and isn't something we can safely do as part of the install processes for dnsmasq or bind. Then, for the reasons above other options aren't available.

There's another possibility to make this easier by making sure Bind always starts before NetworkManager, but most cases will not actually see bind and NetworkManager installed on the same system; and fixing this would require migrating bind from a sysvinit script to a new upstart job.

I'm keeping the task open as it's absolutely a valid request, we just won't have time to focus on fixing this for the Precise release. (Sorry)

Changed in network-manager (Ubuntu):
status:	New → Triaged
importance:	Undecided → Low

Alkis Georgopoulos (alkisg) wrote on 2012-03-19:

#3

> I don't think we'll cover this particular use case for Precise.

Excuse me, but how is installing bind9 or dnsmasq a "particular use case"?
I'm talking about the default installation, not some corner case...

> most cases will not actually see bind and NetworkManager installed on the same system

We have 250 schools here that use NetworkManager and dnsmasq as the DNS server, are there any stats that show that this is actually rare?
And, actually more rare than the split VPN need that the local resolver addresses?

Since the local resolver implementation seems a bit immature and needs to break two packages in order to work, one of them in main, wouldn't it be better if it was postponed and not be applied in an LTS release until it's more cooperative?

Kind regards,
Alkis Georgopoulos

Mathieu Trudel-Lapierre (cyphermox) wrote on 2012-03-20:

#4

I think I've been unclear. Using NetworkManager with *bind* is a relatively unusual use case. dnsmasq with NetworkManager for resolution is what we're aiming for *by default*, and that's also part of the default install. Everything has been put in place so that split VPN and such are correctly addressed with NetworkManager spawning dnsmasq as necessary, which is what dns=dnsmasq achieves.

I'm not sure in this case what you mean by breaks two packages. There's a lot of benefits to having a local resolver other than the libc one (split DNS, faster and more efficient resolution, etc.).

I do feel we've tested this well, thoroughly, and that it's very cooperative and efficient. Please, tell me more about your setup so we can make sure we cater for this use case before release.

Mathieu Trudel-Lapierre (cyphermox) wrote on 2012-03-20:

#5

What I mean here is that default installs normally don't involve installing a local DNS server, except perhaps as a caching resolver. The caching resolver use case is covered by spawning dnsmasq from NetworkManager; the local DNS server isn't. We do think that there are relatively few such installs of a server that depends on NetworkManager running; and that's definitely not the default setup for Ubuntu Server (where NetworkManager isn't installed by default).

Alkis Georgopoulos (alkisg) wrote on 2012-03-20:

#6

> Please, tell me more about your setup so we can make sure we cater for this use case before release.

1) Install precise-desktop-i386.iso to some-pc.
2) Install dnsmasq. Fails to start. OK, annoying but let's see if the problem goes away after reboot.
3) Reboot. Try to `dig @some-pc ubuntu.com` from *another* PC.

Here's the problem. It *sometimes* works. The "caching resolver" implementation introduced a race condition.
So if the nm-spawned dnsmasq starts first, then the dnsmasq package is broken, and doesn't fulfill its stated goal to "provide DNS to a small network" out of the box and without manual editing of nm conffiles.
If the real dnsmasq starts first, then the "caching resolver" is broken instead.

Because of time constraints, I think that checking if [ -d /etc/dnsmasq.d ] before spawning dnsmasq from nm would satisfy most dnsmasq users. I don't think there are many users that want to keep the nm-spawned dnsmasq when they install the real one. Maybe something similar can be done for bind too.
In the future, maybe the "caching resolver" implementation can start using /etc/dnsmasq.d itself, along with the KVM-spawned instances too, so that people only have one dnsmasq instance instead of multiple ones?

(The reason we're using the desktop iso instead of the server one, is that we need a desktop environment in our servers for our LTSP thin clients, and because teachers work on our servers, they're not headless).

Alkis Georgopoulos (alkisg) wrote on 2012-03-20:

#7

Another idea would be to create a "spawn-local-resolver" sysvinit or upstart job that lists dnsmasq and bind in its dependencies, so that it always starts after any known DNS servers, ensuring that no race conditions occur for the :53 port checking.

Alkis Georgopoulos (alkisg) wrote on 2012-03-20:

#8

And yet another idea would be to make a package out of the local resolver configuration, and declare that it Breaks: dnsmasq, bind9.
That way anyone installing dnsmasq or bind9 would get rid of the local resolver package and its conflicting configuration.

Mathieu Trudel-Lapierre (cyphermox) wrote on 2012-03-20:

#9

If you're installing dnsmasq on top of the standard desktop install, why is it such an issue to edit the NetworkManager configuration to cater it to your needs? Wouldn't it make sense in this case to go a step further and make sure the network connection is set up in /etc/network/interfaces rather than NM, to ensure you don't suddenly get a different IP address from DHCP?

I don't think adding complexity by creating new virtual packages for configurations is a sensible thing to do; and setting up a special upstart job to spawn a local resolver won't work (NM spawns it itself, using a custom configuration on purpose).

Since NM relies on dnsmasq-base for the standalone binary rather than the 'dnsmasq' package itself; I guess a workable solution would be to check for /etc/default/dnsmasq and not spawn dnsmasq if the value of ENABLED is 1. Working on top of that for later releases we might then be able to try speaking to a running instance via DBus in such cases to pass server changes to it.
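
The check proposed above could look roughly like the following. This is only a sketch of the idea, not NetworkManager code: the function name dnsmasq_enabled is hypothetical, and it assumes /etc/default/dnsmasq is a simple shell-style file with plain KEY=value lines and # comments.

```python
def dnsmasq_enabled(defaults_text):
    """Return True if a shell-style defaults file sets ENABLED=1.

    Hypothetical helper: the last ENABLED= assignment wins, as it
    would when the file is sourced by a shell script.
    """
    enabled = False
    for raw_line in defaults_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if line.startswith("ENABLED="):
            value = line[len("ENABLED="):].strip().strip("\"'")
            enabled = value == "1"
    return enabled


if __name__ == "__main__":
    sample = "# comment line\nENABLED=1\n"
    # A caller would skip spawning its own dnsmasq when this is True.
    print(dnsmasq_enabled(sample))
```

A caller would read the file if it exists and treat a missing file as "not enabled".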

Setting to Triaged; we've got a way to possibly deal with this use case...

Changed in network-manager (Ubuntu):
importance:	Low → Medium

Mathieu Trudel-Lapierre (cyphermox) wrote on 2012-03-20:

#10

Does it help any if the daemon dnsmasq is configured to only listen on the interface meant for the ltsp clients, if there's a specific interface for this?

Mathieu Trudel-Lapierre (cyphermox) wrote on 2012-03-20:

#11

There are probably other, far simpler (and safer) workarounds. What's your dnsmasq configuration like?

Upstream mentions some configurations at the dnsmasq level that are very relevant for this particular case:

in /etc/dnsmasq.conf:

#except-interface=
# Or which to listen on by address (remember to include 127.0.0.1 if
# you use this.)
#listen-address=

The problem is that listen-address probably shouldn't contain 127.0.0.1 if dnsmasq is meant to be used to resolve things for ltsp clients; also, except-interface=lo may be a good idea here to avoid listening on the loopback interface. That way both instances should start fine.

Alkis Georgopoulos (alkisg) wrote on 2012-03-21:

#12

Hi Mathieu,

> If you're installing dnsmasq on top of the standard desktop install, why is it such an issue to edit the NetworkManager configuration to cater it to your needs?
> except-interface=lo may be a good idea here to avoid listening on the loopback interface

It's not about me; it's that the default dnsmasq/bind installations are now broken on desktop installations.
For the needs of our schools here in every LTS release we're making repositories with custom packages for automated installation + configuration, so the nm configuration editing is just a sed away, much less trouble than even reporting the bug in the first place.

> Wouldn't it make sense in this case to go a step further and make sure the network connection is set up in /etc/network/interfaces rather than NM, to ensure you don't suddenly get a different IP address from DHCP?

No, network manager supports static IPs (even though we don't always need them even on LTSP servers) and doing it without /etc/network/interfaces allows teachers to see the network status from the nm applet.

> and setting up a special upstart job to spawn a local resolver won't work (NM spawns it itself, using a custom configuration on purpose).

Right, that's why I'm saying that the local resolver implementation is immature, it doesn't integrate with the rest of the distro, but it breaks other packages by launching a DNS server from hardcoded C code instead of a regular sysvinit/upstart script like all the other daemons.

> I guess a workable solution would be to check for /etc/default/dnsmasq and not spawn dnsmasq if the value of ENABLED is 1.

That would indeed be workable, please do implement it.

> listen-address probably shouldn't contain 127.0.0.1 if dnsmasq is meant to be used to resolve things for ltsp clients

Thin client sessions run on the server, and would be resolved from the nm-spawned dnsmasq instance without caching, while LTSP fat client sessions would be resolved from the normal dnsmasq instance with caching.
Having one DNS server for half of the clients and another for the other half is bound to cause confusion and problems.

Anyway, I think I've made my point; if it's too difficult to do for Precise, just postpone it until the next release. To work around the problem for Greek schools I'll make an ltsp-server-dnsmasq package and sed the nm configuration in its postinst.

Cheers,
Alkis

Mathieu Trudel-Lapierre (cyphermox) wrote on 2012-03-21:

#13

The parsing of /etc/default/dnsmasq won't fly.

Please, do post your dnsmasq configuration so we can try to figure out the right way to integrate this with the current setup.

As for the set of resolvers on the network, that's not exactly the "plan": all systems used to have the libc resolver. Now any system that runs NetworkManager will also be running a local dnsmasq instance since that handles a bunch of issues (more than three servers, split DNS, broken IPv6 DNS, etc) far better than libc. Then they can easily speak to a network DNS server if necessary or resolve directly to the internet.

I don't understand how your systems are set up, and I think that's where the confusion comes from. What I'm expecting is that the LTSP server also runs a dnsmasq daemon to provide resolving to all the LTSP clients; with none of the clients running dnsmasq "locally". Isn't that the case?

I do think there are simpler ways to fix this than doing a sed of the nm configuration.

Alkis Georgopoulos (alkisg) wrote on 2012-03-21:

#14

> Please, do post your dnsmasq configuration so we can try to figure out the right way to integrate this with the current setup.

Just assume the default dnsmasq configuration, any other settings we have there are completely unrelated to this problem.
When one installs dnsmasq, it's supposed to start listening on 0.0.0.0:53, without manually editing any configuration files at all, i.e. with the stock /etc/dnsmasq.conf.
Now with the local resolver listening on 127.0.0.1:53, dnsmasq complains that the port is in use and fails to start.
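
The conflict described here can be reproduced without dnsmasq at all. The following is a minimal sketch under stated assumptions: it runs on Linux, uses a kernel-chosen unprivileged port instead of 53 (which would need root), and only exercises UDP. The helper name try_bind is made up for the illustration.

```python
import errno
import socket


def try_bind(address, port):
    """Bind a UDP socket to (address, port).

    Returns the bound socket, or None if the address is already in use.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((address, port))
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise
    return sock


# Stand-in for the NM-spawned resolver: loopback only, kernel picks the port.
resolver = try_bind("127.0.0.1", 0)
port = resolver.getsockname()[1]

# Stand-in for standalone dnsmasq with a wildcard bind on the same port.
standalone = try_bind("0.0.0.0", port)
wildcard_blocked = standalone is None
print("wildcard bind blocked:", wildcard_blocked)

resolver.close()
if standalone is not None:
    standalone.close()
```

The wildcard address overlaps every local address, including 127.0.0.1, so whichever process binds first wins. That is also why the outcome depends on start order.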

> Now any system that runs NetworkManager will also be running a local dnsmasq

Let's step back a bit and talk about that. You're launching a DNS server without using a sysvinit or upstart job. So you're bypassing update-rc.d, policy-rc.d, upstart .override files, package Conflicts:, Provides: etc, all the standard framework for managing services.
Why wouldn't it be more reasonable to start the local resolver service normally like all the other daemons?
Even make a package out of it, and declare that it Conflicts: bind9, dnsmasq, so that people installing those automatically get rid of the local resolver and its conflicting configuration?
If you assume that "network-manager contains a hardcoded DNS server", then the network-manager package itself should conflict with other DNS servers... But that shouldn't be the case, people should be allowed to install any DNS server they want alongside network-manager, and that could be done seamlessly and without editing any configuration files at all if:
network-manager recommended the local-resolver package,
and the local-resolver package conflicted with the other dns server packages.

Then, when I install dnsmasq over the desktop installation, the local-resolver package would be automatically uninstalled, and I wouldn't have to edit any configuration file at all to resolve the conflict, it would be resolved by the package manager.

> I don't understand how your systems are set up, and I think that's where the confusion comes from. What I'm expecting is that the LTSP server also runs a dnsmasq daemon to provide resolving to all the LTSP clients; with none of the clients running dnsmasq "locally".

The problem isn't LTSP specific, it applies to anyone that wants to use dnsmasq as a DNS server for his local network.
But yes, for LTSP labs that use dnsmasq, it is exactly as you described it. Now, LTSP clients are all diskless and netbooted, but of two kinds: thin and fat clients. Imagine thin clients like XDMCP clients, i.e. many users working remotely on the same server. So those would be using the local resolver, and miss the caching feature and the speed up that it offers.
Imagine fat clients like regular machines that have nameserver=the LTSP server in their resolv.conf. In the solution you proposed above, those would be using the real dnsmasq instance, with caching and everything.

Mathieu Trudel-Lapierre (cyphermox) wrote on 2012-03-21:

#15

Then at this point the issue is that dnsmasq is shipped with a default configuration that while it's technically "correct"; binds on all interfaces and should normally be modified by the admin to suit the needs of their network. That configuration will break with NM making use of dnsmasq-base as a local resolver; and most likely also bombs with qemu/kvm virtual machines.

I want to make this easy for people in your situation, but having a system-wide instance isn't going to work. Not only is it way too complex for what we're trying to achieve (let alone confusing to users to see packages get removed by metapackages), but you always risk that someone modifying the system-wide config meant for use with NetworkManager then causes totally unwanted behavior when NetworkManager tries to add nameservers to the configuration. That's without counting that this still doesn't fix the issue of resolving for virtual machines, which you'll almost certainly want to resolve separately from anything else (and to think of it, installing virt-manager and virtual machine on your setup probably breaks just as bad as NM).

I've been trying hard to offer solutions and I've proposed configuration changes to the shipped config which cover the issue nicely for your case. If you don't want to apply these changes, that's fine; you're obviously free to implement a fix however you see fit :)

For precise +1 there may be a way to move dnsmasq initialization in NM to use 127.0.1.1, and allow this in dnsmasq with upstream's help, but that's not even going to solve this particular issue.

Reducing the priority since we won't look at this until Precise+1 and there aren't many reports about such issues.

Changed in network-manager (Ubuntu):
importance:	Medium → Low

Marco Menardi (mmenaz) wrote on 2012-04-05:

#16

I run ltsp also, and even if I remove NM completely, I think that Alkis's setup is interesting and would love to be able to use it also in the near future, so this "breakage" will affect me too.
As a general consideration, I find it scary that installing a package can bring such problems "just because we think that usually is not used often". I really want GNU/Linux to keep being a predictable system and apt packaging a very good one, so please consider fixing this issue before release.
Thanks in advance

Asmo Koskinen (asmok) wrote on 2012-04-05:

#17

Me, too. Fix this one. '#dns=dnsmasq' is an ugly hack, not for real humans who run an ltsp server at school.

Here is my bug report:

https://bugs.launchpad.net/ubuntu/+source/ltsp/+bug/955785

Best Regards Asmo Koskinen.

Mathieu Trudel-Lapierre (cyphermox) wrote on 2012-04-05:

#18

Please read the whole thread and see the various other workarounds provided; granted, the default shipped configuration for dnsmasq doesn't play well with NetworkManager, but it's easy to adjust to your particular needs and work around this issue, which also only happens if the system acting as a server locally runs both dnsmasq and NetworkManager.

We've clearly identified that having dnsmasq bind to particular interfaces is an easy way to work around this and is a very good idea anyway. Please make sure your dnsmasq configuration sets interface= to the interface on which it should listen, and possibly also uncomment bind-interfaces in /etc/dnsmasq.conf. At that point the changes to /etc/NetworkManager/NetworkManager.conf won't be required.

This isn't just a simple fix for this; the default shipped configuration for dnsmasq is just as "guilty" as network-manager for assuming it should bind on all addresses and all interfaces.

Alkis Georgopoulos (alkisg) wrote on 2012-04-06:

#19

> This isn't just a simple fix for this; the default shipped configuration for dnsmasq is just as "guilty" as network-manager for assuming it should bind on all addresses and all interfaces.

I disagree; most system services bind to all addresses and interfaces by default (sshd, cupsd, bind, dnsmasq, dhcp, tftp, nbd, inetd, rpc...). And I do want DNS services for my thin client sessions running on the server, so I do want dnsmasq listening in all addresses.

Alkis Georgopoulos (alkisg) wrote on 2012-04-09:

#20

Mathieu, some help please?

After my ltsp-pnp package comments out dns=dnsmasq in /etc/NetworkManager/NetworkManager.conf, it runs
invoke-rc.d dnsmasq restart from its postinst,
but that fails as the nm-spawned dnsmasq instance is still listening on port 53.

And if I kill it before starting the normal dnsmasq, that leaves the DNS configuration broken...

How can I tell resolv.conf and network-manager to reload their configurations?
Is it necessary to restart the network-manager service? And if it is, is that enough? I'd hate to have to tell the users that they need to restart their servers... :(

Thanks!

Mathieu Trudel-Lapierre (cyphermox) wrote on 2012-04-11:

#21

You need to restart network-manager after changing the configuration value.

It's unfortunate that the configuration needs to be changed, but it's needed. I sympathize with your use case, but there is sufficient benefit in using NM together with dnsmasq and resolvconf to solve other DNS resolution issues to inconvenience those who use dnsmasq separately as a standalone daemon (to have to change the config to suit their needs).

We won't be fixing this for Precise, but I've started discussion with dnsmasq upstream to possibly deal differently with the binding and allow running instances on other IP addresses (such as 127.0.1.1 or so). It's still going to need sufficient amounts of work to fix dnsmasq's method of binding to interfaces and how NM starts and interfaces with dnsmasq (though I already have patches for NM, but they're useless without the fixes in dnsmasq). At this point though, the simplest way to deal with this remains to edit interface= to map to the relevant external interfaces (eth0, wlan0, etc.) and let the NM-spawned instance get started on lo.

Alkis Georgopoulos (alkisg) wrote on 2012-04-11:

#22

> At this point though, the simplest way to deal with this remains to edit interface= to map to the relevant external interfaces (eth0, wlan0, etc.) and let the NM-spawned instance get started on lo.

We can't do that; we need DNS caching for thin client sessions which run on the server with DNS=127.0.0.1. We need to completely disable the nm dnsmasq spawning.

> You need to restart network-manager after changing the configuration value.

Thank you, I think that's too much to do from a postinst so I'll probably document it as part of the installation process.

For the record, I think that the proper way to solve the problem is from libc itself. Ask Simon to allow calling dnsmasq like a library, or communicate with it via a socket, whatever's needed, but no :53 port hooking, this is reserved for real DNS servers, not for helpers for libc shortcomings.

Thanks again for all the feedback,
Alkis

Mathieu Trudel-Lapierre (cyphermox) wrote on 2012-04-11:

#23

caching> That's one good reason where this is currently failing. The NM instance won't cache. That's disabled on purpose, but we'll re-enable for 12.10 or later once we can have per-user caches and something secure.

library> unfortunately, that won't help. library use, with not being able to keep state (e.g. have I tried this server yet? did it respond?) is one of the issues we're fixing with dnsmasq, which can't be tackled by a library.

Using dnsmasq via dbus is a likely good way to fix this, but there are countless possible issues with assuming that the centrally running instance of dnsmasq is the one you also want to use for resolving your own stuff, and to update with information from DHCP.

Alkis Georgopoulos (alkisg) wrote on 2012-04-13:

#24

Since this won't be fixed for Precise from the network-manager side, the dnsmasq package now is broken by default in desktop installations.
So I've added the dnsmasq package in the "Affects:" list, to make it easier for people to locate the cause of the problem so that fewer duplicate bug reports are filed (it's an LTS release, I suppose many people will be bitten by it in the next 5 years).

Also, even though it's not the correct place to solve the problem, the dnsmasq.postinst could be temporarily modified to disable the local resolver. I can propose a patch for it if the maintainer is interested.

Mathieu Trudel-Lapierre (cyphermox) wrote on 2012-04-20:

#25

That wouldn't be the right process though. The configuration itself shipped by default should be patched, that can be done with a simple patch to the dnsmasq package.

Alkis Georgopoulos (alkisg) wrote on 2012-04-23:

#26

> The configuration itself shipped by default should be patched

If you mean something like:
except-interface=lo
bind-interfaces

...I just tested them and they do allow both dnsmasq instances to run.

But of course those settings won't be acceptable to most dnsmasq users, as listening on "lo" is usually desired too (local DNS cache; DHCP/TFTP for VMs etc). So I don't think that crippling the default dnsmasq functionality is a good way to solve this problem. DNS clients shouldn't hook port 53; it's reserved for DNS servers.

Launchpad Janitor (janitor) wrote on 2012-04-26:

#27

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in dnsmasq (Ubuntu):
status:	New → Confirmed

Bert Voegele (bertvoegele-deactivatedaccount) wrote on 2012-04-26:

#28

Just as a short reminder, there are more DNS resolvers/servers available as packages out there than just bind and dnsmasq, e.g. djbdns and its derivatives. Until I removed the annoying dns=dnsmasq line in /e/N/Nconf, NM disconnected the WLAN after a couple of minutes, throwing an error about dnsmasq not being able to bind to 127.0.0.1.
I'm puzzled about the default inclusion of dnsmasq as a local resolver for standard users. If a connection is to be shared, it might be useful to bind dnsmasq to the shared iface to provide DHCP and DNS, like it's done with libvirt-bin.

Thomas Hood (jdthood) on 2012-06-01

summary:

- Don't start local resolver if a DNS server is installed
+ Standalone dnsmasq is not compatible out of the box with NM+dnsmasq

Alkis Georgopoulos (alkisg) on 2012-06-06

summary:

- Standalone dnsmasq is not compatible out of the box with NM+dnsmasq
+ Don't start local resolver if a DNS server is installed

Alkis Georgopoulos (alkisg) wrote on 2012-06-06:

#29

@jdthood: the "Standalone dnsmasq is not compatible out of the box with NM+dnsmasq" title hints that the problem is caused by the dnsmasq package, i.e. that it should be crippled and not listen on "lo" by default in order to coexist with the local resolver implementation.

I don't think this is the case, I don't think the dnsmasq package does anything wrong; I just cross-linked the bug report in case other people hit the problem and try to find it in the dnsmasq bug page.

The problem should be fixed from the network-manager side.

Otherwise, similar bug reports should be filed against all other DNS server packages, not just dnsmasq. But I really think that people do want their DNS servers to listen on "lo" by default. They wouldn't want to break that just to help the local resolver implementation.

Mathieu Trudel-Lapierre (cyphermox) wrote on 2012-06-06:

#30

Listening on lo is fine; and blocking other DNS servers from being started isn't. I think we're in violent agreement there. The problem is how to fix this.

I'm not saying dnsmasq should be crippled, but that it should special-case lo and not just listen on 0.0.0.0, because a wildcard bind claims port 53 on every address and so blocks any other process that might legitimately want to listen on port 53.

That's pretty much how the solution is shaping up to be: when listening on all interfaces, listen on each interface separately, binding to the IP address attached to the interface (or via any other means). We should then be able to have dnsmasq listen on 127.0.1.1:53 to satisfy the need for a local resolver.
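
The socket behaviour described above can be reproduced with a minimal Python sketch. It is only an illustration, not code from dnsmasq or NetworkManager: the helper name `can_coexist` is made up, UDP is used for brevity, and a kernel-chosen unprivileged port stands in for port 53 so that no root access is needed.

```python
import socket


def can_coexist(first_addr: str, second_addr: str) -> bool:
    """Bind UDP sockets to first_addr, then second_addr, on the same port.

    Returns True if both binds succeed, False if the second one is refused.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the kernel for a free port; it stands in for port 53.
        first.bind((first_addr, 0))
        port = first.getsockname()[1]
        try:
            second.bind((second_addr, port))
        except OSError:
            # Typically "address already in use", or an address that is
            # not configured on this host.
            return False
        return True
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    # A wildcard bind followed by a bind to a specific loopback address.
    print("0.0.0.0 then 127.0.0.1:", can_coexist("0.0.0.0", "127.0.0.1"))
    # Two distinct loopback addresses, as in the proposed split.
    print("127.0.0.1 then 127.0.1.1:", can_coexist("127.0.0.1", "127.0.1.1"))
```

The second call depends on 127.0.1.1 being usable on the host, which is normally the case on Linux because the whole 127.0.0.0/8 range is routed to lo.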

Revision history for this message

Thomas Hood (jdthood) wrote on 2012-06-06:

#31

@Alkis: Your title "Dont..." is not a description of a problem.

Revision history for this message

Alkis Georgopoulos (alkisg) wrote on 2012-06-06: Re: Local resolver prohibits DNS servers from running

#32

@Thomas: cool, I hope this one's better.

summary:

- Don't start local resolver if a DNS server is installed
+ Local resolver prohibits DNS servers from running

Revision history for this message

Thomas Hood (jdthood) wrote on 2012-06-06:

#33

I just re-read the whole discussion and thought it would be useful (for me, at least) to summarize it.

The original bug report was that NM+dnsmasq and standalone dnsmasq are incompatible because they have overlapping network socket address ranges, 0.0.0.0:53 and 127.0.0.1:53.

One solution is for the administrator to comment out "dns=dnsmasq" in /etc/NetworkManager/NetworkManager.conf.
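
As a rough illustration of that workaround, the following Python sketch comments out the line in the text of a configuration file. The function name and the sample file contents are assumptions made for the example; on a real system the result would have to be written back to /etc/NetworkManager/NetworkManager.conf as root, and NetworkManager would presumably need a restart to pick up the change.

```python
def comment_out_dns_dnsmasq(conf_text: str) -> str:
    """Return conf_text with any active 'dns=dnsmasq' line commented out."""
    out = []
    for line in conf_text.splitlines():
        # Compare with all whitespace removed so "dns = dnsmasq" matches too.
        if "".join(line.split()) == "dns=dnsmasq":
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative contents only; a real file may hold further settings.
    sample = "[main]\ndns=dnsmasq\n"
    print(comment_out_dns_dnsmasq(sample), end="")
```

Lines that are already commented out are left alone, so running the function twice gives the same result as running it once.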

Another solution is as described by the submitter's title: "[Hey NetworkManager,] Don't start local resolver if a DNS server is installed".

Another solution favored by Mathieu is for the NM-enslaved dnsmasq and the standalone dnsmasq to use disjoint network socket address ranges.

Early on, Mathieu said that solving this problem would not be a top priority because not many users want to combine the DNS server role (running bind or dnsmasq) with the DNS client role (running NM+dnsmasq).

Alkis argued that the incompatibility is a serious bug that should be prevented using package dependencies or eliminated automatically by maintainer scripts or other means. The administrator shouldn't have to search the web to figure out how to make the dnsmasq package work. Troublesome is the fact that standalone dnsmasq sometimes works, sometimes doesn't, in the presence of NM+dnsmasq.

Along the way Alkis levelled some fundamental criticisms against the design of NM+dnsmasq.

I think that there is a clash of civilizations here: the Debian way (modular components that just work together in any combination allowed by package dependencies) versus the RedHat way (big daemons with limited options that own subsystems).

Revision history for this message

Thomas Hood (jdthood) wrote on 2012-06-06:

#34

Alkis: Why do you need the dnsmasq package at all? You want NM and dnsmasq. Why not just use the NM-enslaved dnsmasq?

If the latter doesn't meet your needs, could it be adapted somehow to meet your needs?

Assuming that there are good reasons for using NM and standalone dnsmasq, I'd be inclined to agree with Alkis (if I understood him correctly) that a good solution would be to put the NM-dnsmasq integration stuff into a package and make this conflict with the standalone dnsmasq package.

Revision history for this message

Thomas Hood (jdthood) wrote on 2012-06-06:

#35

Hmm, I wasn't very clear. What I meant in my questions above (#34) was this. If NM+dnsmasq is the best solution for name service for the local host, isn't it also a better solution than NM-together-with-standalone-dnsmasq for remote hosts? If so then another solution approach is to enhance NM so that its enslaved dnsmasq listens on non-loopback addresses too. Once this is implemented the network-manager package could be made to Conflict with the dnsmasq package.

Revision history for this message

Alkis Georgopoulos (alkisg) wrote on 2012-06-06:

#36

Thomas, that was a very good summary at comment #33!

> Why do you need the dnsmasq package at all? You want NM and dnsmasq. Why not just use the NM-enslaved dnsmasq?

The NM-enslaved dnsmasq uses hardcoded options (in C) that provide extremely limited functionality.
* It doesn't listen on ethX (--listen-address=127.0.0.1). So we can't use our servers as DNS servers for our local network PCs, i.e. it's completely useless for LANs.
* It doesn't cache requests (--cache-size=0). No caching ==> no DNS queries speedup. This again is very significant for LANs as there are many concurrent users.
* Finally, we also need the DHCP and TFTP functionality of dnsmasq, so even if NM+dnsmasq included a real DNS server, we'd have to run another dnsmasq instance (without a DNS service in that case) for its 2 other services.

> a good solution would be to put the NM-dnsmasq integration stuff into a package and make this conflict with the standalone dnsmasq package.

I completely agree, and to also conflict with bind9 and any other DNS server packages.

Revision history for this message

Thomas Hood (jdthood) wrote on 2012-06-07:

#37

What lies behind the problem being discussed here is the simple fact that there exists no single adequate network configuration utility for GNU/Linux. I am most familiar with Debian. From Debian we inherit ifupdown which was designed for static configuration. Debian developers have known for more than ten years that ifupdown needed to be replaced, but have never managed to come up with a replacement. From RedHat we get NetworkManager which was never intended to be a general network configurer but in the absence of any alternative continues to be enhanced with new features. Considerable effort has obviously been spent in Ubuntu just to get NM to coexist with other networking packages. It still doesn't fully cooperate with them (see #47379 for another example) and will probably never be well integrated with them.

So we are still forced to choose between two network configuration approaches, NM-oriented in the desktop version and ifupdown-oriented in the server version. Each one has its limitations. If you try to combine the two, as you (Alkis) want to do, then you are confronted with these limitations. You are lucky that all you have to do is comment out one line in a configuration file to get things to work!

We can continue playing around with the existing tools so that they work better in particular use cases but what we really need is a properly designed network configuration utility to supersede both ifupdown and NM.

I am vaguely aware of the Wicd project. Must go read up on that.

Revision history for this message

Thomas Hood (jdthood) wrote on 2012-06-07:

#38

* Some thinking about[0][1], if not much coding of[2], a successor to ifupdown was done in the netconf project[3] led by Debian Developer martin krafft[4][5].

[0]http://people.debian.org/~madduck/talks/netconf_fosdem_2007.02.25/slides.s5.html
[1]http://lists.alioth.debian.org/pipermail/netconf-devel/
[2]http://lists.alioth.debian.org/pipermail/netconf-commits/
[3]https://alioth.debian.org/projects/netconf/
[4]madduck AT debian.org
[5]http://people.debian.org/~madduck/

* One small step toward harmonizing desktop network configuration and server network configuration was taken with the introduction of resolvconf in both versions of 12.04. But there again, NM integrates bare-minimally with resolvconf; NM doesn't let resolvconf prioritize nameserver information according to interface-order(5) but sends resolvconf one big lump of nameserver information called "NetworkManager".

* If Ubuntu doesn't switch to wicd or netconf or something else then another possibility to be explored is to break up NM into components that can be better integrated with other parts of the distro. This is, of course, rather difficult without cooperation from upstream.

Revision history for this message

Thomas Hood (jdthood) wrote on 2012-06-07: Re: NM-controlled dnsmasq prevents other DNS servers from running

#39

Based on comment #28, marked as affecting djbdns.

summary:

- Local resolver prohibits DNS servers from running
+ NM-controlled dnsmasq prevents other DNS servers from running

Thomas Hood (jdthood) on 2012-06-08

summary:

- NM-controlled dnsmasq prevents other DNS servers from running
+ NM-controlled dnsmasq prevents other DNS servers from running, yet
+ network-manager doesn't Conflict with their packages

Revision history for this message

Thomas Hood (jdthood) wrote on 2012-06-08: Re: NM-controlled dnsmasq prevents other DNS servers from running, yet network-manager doesn't Conflict with their packages

#40

But enough dreaming. Given the world as it is, the immediate challenge is to make NM+dnsmasq compatible with standalone nameservers. (Otherwise network-manager should Conflict with those nameservers' packages.)

Solutions mentioned earlier:
* Tell the administrator to comment out "dns=dnsmasq" in /etc/NetworkManager/NetworkManager.conf after installing dnsmasq or another DNS server package.
* Change NM so that it acts as if "dns=dnsmasq" is absent if a DNS server package is installed.
* Change standalone dnsmasq such that it doesn't listen on 0.0.0.0:53, doesn't listen on 127.0.1.1:53 and change NM so that its dnsmasq listens only on 127.0.1.1:53.

Here's a new idea.

* Enhance the resolver(3) so that nameservers can be specified in resolv.conf using the <address>:<port> notation
* Change NM such that it causes its slave dnsmasq to listen on another (than 53) port number P and sends "nameserver 127.0.0.1:P" to resolvconf.
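
To make the new idea concrete, here is a small Python sketch of how a resolver could read such entries. The `<address>:<port>` notation is the hypothetical extension proposed above, not something the standard resolver accepts today, and the function name `parse_nameservers` is invented for the example. Only IPv4 addresses are given the port syntax here, since a bare IPv6 address already contains colons.

```python
from typing import List, Tuple

DEFAULT_DNS_PORT = 53


def parse_nameservers(resolv_conf_text: str) -> List[Tuple[str, int]]:
    """Collect (address, port) pairs from 'nameserver' lines.

    Accepts the hypothetical 'nameserver A.B.C.D:PORT' form and falls back
    to port 53 for plain addresses.
    """
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "nameserver":
            continue
        address, sep, port = fields[1].rpartition(":")
        is_ipv4_with_port = (
            sep == ":" and port.isdigit()
            and "." in address and ":" not in address
        )
        if is_ipv4_with_port:
            servers.append((address, int(port)))
        else:
            # Plain IPv4 or IPv6 address: keep it whole, use the default port.
            servers.append((fields[1], DEFAULT_DNS_PORT))
    return servers


if __name__ == "__main__":
    text = "nameserver 127.0.0.1:5353\nnameserver 192.168.1.1\n"
    for address, port in parse_nameservers(text):
        print(address, port)
```

With something like this in place, the NM-controlled dnsmasq could sit on a high port while a standalone server keeps port 53.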

Thomas Hood (jdthood) on 2012-06-14

summary:

- NM-controlled dnsmasq prevents other DNS servers from running, yet
- network-manager doesn't Conflict with their packages
+ NM-controlled dnsmasq prevents other DNS servers from starting

Thomas Hood (jdthood) on 2012-06-20

Changed in pdnsd (Ubuntu):
status:	New → Invalid

Thomas Hood (jdthood) on 2012-06-20

Changed in pdns-recursor (Ubuntu):
status:	New → Invalid

Thomas Hood (jdthood) on 2012-07-04

Changed in dnsmasq (Ubuntu):
status:	Confirmed → Invalid
status:	Invalid → Confirmed

Mathieu Trudel-Lapierre (cyphermox) on 2012-07-10

Changed in pdns-recursor (Ubuntu Precise):
status:	New → Invalid
Changed in pdnsd (Ubuntu Precise):
status:	New → Invalid
Changed in network-manager (Ubuntu Precise):
status:	New → Triaged
Changed in dnsmasq (Ubuntu Precise):
status:	New → Confirmed
Changed in network-manager (Ubuntu Precise):
importance:	Undecided → Low

Launchpad Janitor (janitor) on 2012-07-16

Changed in network-manager (Ubuntu):
status:	Triaged → Fix Released

Launchpad Janitor (janitor) on 2012-08-26

Changed in djbdns (Ubuntu Precise):
status:	New → Confirmed
Changed in djbdns (Ubuntu):
status:	New → Confirmed

Mathieu Trudel-Lapierre (cyphermox) on 2012-09-14

Changed in network-manager (Ubuntu Precise):
assignee:	nobody → Mathieu Trudel-Lapierre (mathieu-tl)
Changed in dnsmasq (Ubuntu):
status:	Confirmed → Fix Released
Changed in dnsmasq (Ubuntu Precise):
assignee:	nobody → Mathieu Trudel-Lapierre (mathieu-tl)
importance:	Undecided → High
status:	Confirmed → Triaged
Changed in network-manager (Ubuntu Precise):
importance:	Low → High

Revision history for this message

Robin Battey (zanfur) wrote on 2012-10-09:

#116

> Are you sure? I am only aware of named.conf's "listen-on { IP_ADDRESS; }". If there is a feature such as you describe then presumably named binds ALL:53 and then filters according to the addresses on the specified interfaces.

Nope, I just verified, you're quite correct.  I hadn't heard of it either, but upon (mis)reading comments above I presumed without verifying.  Bad on me.

> A question about the NSS plugin idea. Will this work only for software that uses glibc? What about alternative resolver libraries?

Anything that uses the gethostbyname(3) call uses the NSS chain.  That means essentially everything that isn't a resolver itself uses nsswitch.conf.  DNS resolver libraries won't use NSS by design, because they are the resolvers themselves that are *used* by NSS.  This is why there are no names in their respective configuration files, save for what they're serving (remote addresses are specified by address).  If any DNS resolver itself reads nsswitch.conf, it's doing something Very Wrong.

The idea of NSS is that the DNS resolvers aren't *supposed* to use it.  They are the exporters of NSS services, not the consumers.  I don't know of any of them that use NSS for their own resolution; they are just one link in the NSS chain that is used by the (libc) name resolver libraries.  When you hit the DNS service itself, you really *don't* want it to start the NSS chain over, because that would just lead to a loop.

My proposal for using NSS in place of NetworkManager's dnsmasq is to create a new NSS plugin and place it earlier in the NSS chain than the standard DNS resolver.  For instance, a line like so:

hosts:          files mdns4_minimal [NOTFOUND=return] network_manager [NOTFOUND=return] dns mdns4

This is straight from my Precise install, with the addition of the "network_manager [NOTFOUND=return]" stanza.  It says that first you check /etc/hosts (that's "files"), then a subset of avahi ("mdns4_minimal [NOTFOUND=return]"), then your NM plugin "network_manager [NOTFOUND=return]", plain old DNS ("dns"), then avahi again ("mdns4").

It would not conflict with any other NSS plugin, because they are all tried in turn until a match is found. If you place it directly in front of the DNS resolver plugin in nsswitch.conf, it will be used before the standard DNS lookup, allowing you to do all the fancy connection-specific magic you need to do, while returning "Try Next" for anything non-connection specific, thus allowing the normal DNS resolver plugin (which reads resolv.conf) to do things as normal.  This is *instead* of hooking in at resolv.conf, as you do now.  People can install any resolver they want, and it works as designed.  This lets you listen on high-numbered ports as well, *and* lets you have per-user dnsmasq instances (per user vpns?), while still running Bind or a normal dnsmasq instance on *:53.

Right now, the dnsmasq for NM basically hijacks resolv.conf, which means it's hooking into the DNS NSS plugin's resolution (it's the plugin that reads resolv.conf, not the applications, using code in libc).  This is causing conflicts, because in order to use resolv.conf, you need to be running on port 53 -- and it would take re-writing parts of the DNS NSS plugin (or libc!) to change this.  But, you don't need to do that at all.  Just insert the NM NSS plugin *before* the DNS NSS plugin, and you can do all the fancy things you want, without ever breaking any DNS resolution at all. If you don't have anything special to do, return "notfound" and DNS will do its thing. Alternatively, you can *replace* the DNS NSS library with your own (add yours to nsswitch and remove the dns one), and do all processing in there, which will likely involve querying the local dnsmasq instance directly without even bothering with resolv.conf.

Really, the Name Service Switch subsystem is the system designed to handle Switching between multiple Name Service providers.  That's where such things need to be.  See documentation:

http://www.gnu.org/software/libc/manual/html_node/Name-Service-Switch.html
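
The way such a hosts line is walked can be sketched in Python. This is a simplified model of the glibc rules, not glibc code: the `network_manager` service is the hypothetical plugin from the proposal, the stub plugins and the 192.0.2.10 answer are invented, and negated statuses such as `[!UNAVAIL=return]` are not handled. One detail the model makes visible: with `[NOTFOUND=return]` attached, a NOTFOUND from the plugin ends the lookup, so the "Try Next" case would have to be reported with a status whose action is still "continue", such as UNAVAIL.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Reactions that apply when a service has no [STATUS=action] override.
DEFAULT_ACTIONS = {"SUCCESS": "return", "NOTFOUND": "continue",
                   "UNAVAIL": "continue", "TRYAGAIN": "continue"}

Service = Tuple[str, Dict[str, str]]
Plugin = Callable[[str], Tuple[str, Optional[str]]]

LINE = ("hosts: files mdns4_minimal [NOTFOUND=return] "
        "network_manager [NOTFOUND=return] dns mdns4")


def parse_hosts_line(line: str) -> List[Service]:
    """Split an nsswitch.conf 'hosts:' line into (service, actions) pairs."""
    spec = line.split(":", 1)[1]
    chain: List[Service] = []
    for token in re.findall(r"\[[^\]]*\]|\S+", spec):
        if token.startswith("["):
            # A bracketed item modifies the service named just before it.
            for item in token[1:-1].split():
                status, action = item.split("=")
                chain[-1][1][status.upper()] = action.lower()
        else:
            chain.append((token, dict(DEFAULT_ACTIONS)))
    return chain


def lookup(name: str, chain: List[Service],
           plugins: Dict[str, Plugin]) -> Optional[str]:
    """Try each service in order until one triggers a 'return' action."""
    for service, actions in chain:
        status, result = plugins[service](name)
        if actions[status] == "return":
            return result if status == "SUCCESS" else None
    return None


def miss(status: str) -> Plugin:
    """Build a stub plugin that always answers with the given status."""
    return lambda name: (status, None)


def resolve_with_nm_status(nm_status: str) -> Optional[str]:
    """Run the chain with the hypothetical NM plugin reporting nm_status."""
    plugins: Dict[str, Plugin] = {
        "files": miss("NOTFOUND"),
        # Assumed to decline names outside .local rather than report NOTFOUND.
        "mdns4_minimal": miss("UNAVAIL"),
        "network_manager": miss(nm_status),
        "dns": lambda name: ("SUCCESS", "192.0.2.10"),
        "mdns4": miss("NOTFOUND"),
    }
    return lookup("www.example.com", parse_hosts_line(LINE), plugins)


if __name__ == "__main__":
    print("NM plugin says UNAVAIL: ", resolve_with_nm_status("UNAVAIL"))
    print("NM plugin says NOTFOUND:", resolve_with_nm_status("NOTFOUND"))
```

In this model the plain "dns" service is only reached when the plugin reports UNAVAIL (or when the `[NOTFOUND=return]` stanza after it is dropped).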

Revision history for this message

Svartalf (frank-earlconsult) wrote on 2012-10-16:

#117

This is a bad idea as it's been implemented, guys- there's tons of local installations that use internal DNS (My CenturyLink router or my day-job's setup, for example...) that this flatly breaks out of box. You've got to do a bunch of manual interventions for MANY corporate desktop and home desktop situations. It doesn't honor lookups against the local, specified by DHCP, DNS servers- it goes out to the DNS roots and goes from there. Works FINE for JUST surfing the 'net. It's an EPIC FAIL for normal, typical DNS use right now because there's no honoring any internal only DNS entries with it as it is out of box.

It's nice that you're trying to make it easier for VPN, etc. but in the corporate desktop story, you're using OpenVPN, PPTP, or something like Sonicwall's solution. This means it's going to re-direct DNS on you ANYHOW, defeating the nice thing you're attempting here. If you think you're changing their minds, think again.

As it stands, I'm going off to cripple this less than well thought out design decision so that things MIGHT work better on my setups. I suggest thinking through *ALL* prospective use-cases of things before implementing something like this in the future- it really, really ticks people off when it doesn't work like it's supposed to.

Revision history for this message

Thomas Hood (jdthood) wrote on 2012-10-16:

#118

@Svartalf: Can you please describe in more technical detail what fails to work on the machines in question, and share with us what you know about the causes of these malfunctions? Once we have some idea what you're talking about we can help you further.

You wrote:
> there's tons of local installations that use internal DNS

What do you mean by "internal DNS"?

> It doesn't honor lookups against the local, specified by DHCP, DNS servers [...]

Ubuntu 12.04 *does* use DNS nameserver addresses provided by DHCP. Can you please explain what you are talking about here?

> OpenVPN, PPTP, or something like Sonicwall's solution [is] going to re-direct DNS on you ANYHOW
> If you think you're changing their minds, think again.

Ubuntu software works properly in Ubuntu 12.04 (except where it doesn't --- see the BTS). Third party software may fail to work properly, but it's up to the third party to fix that.

Third parties who think they can dictate how free host operating systems work can go fly a kite. Just my personal view.

Revision history for this message

John Hupp (john.hupp) wrote on 2012-11-21:

#119

I don't know how my case enters this discussion, but it is certainly connected to the current default installation wherein network-manager starts an instance of dnsmasq to act as a DHCP, DNS and TFTP server.

I was troubleshooting an LTSP-PNP client boot problem under Lubuntu Quantal. I installed with a single NIC per https://help.ubuntu.com/community/UbuntuLTSP/ltsp-pnp.

The problem is that the LTSP client, after successfully getting DHCP assignments, fails to download the pxelinux boot image. It reports "PXE-E32: TFTP open timeout."

To be more specific on the DHCP assignments, it identifies my hardware router as the DHCP server and the default gateway. It identifies the LTSP server as proxy and boot server.

I can also run this on the server itself to get a similar failure:
$ cd /tmp
$ tftp 192.168.1.102 -v -m binary -c get /ltsp/i386/pxelinux.0
mode set to octet
Connected to 192.168.1.102 (192.168.1.102), port 69
getting from 192.168.1.102:/var/lib/tftpboot/ltsp/i386/pxelinux.0 to pxelinux.0 [octet]
Transfer timed out.

A CRITICAL NOTE: This is using the default network-manager configuration of the network interface (using the default DHCP configuration, and the connection is "Available to all users").

If I merely configure the network interface (again for DHCP) via /etc/network/interfaces, the TFTP error disappears and the LTSP client boots.

But it introduces a new problem on both server and client: DNS resolution fails.

I can fix the DNS resolution problem by creating /etc/resolvconf/resolv.conf.d/tail with contents:
nameserver (my nameserver 1)
nameserver (my nameserver 2)

But trying to identify and perhaps work around the problem with network-manager and dnsmasq, I undid the changes to /etc/network/interfaces and deleted /etc/resolvconf/resolv.conf.d/tail.

It turns out that if I merely
$ sudo service dnsmasq restart
then the LTSP client will boot normally.

Hunting for some diagnostic information, I ran this command before and after restarting dnsmasq:
$ sudo netstat -nap | grep dnsmasq

Relevant output before restarting:
udp 0 0 127.0.0.1:69 0.0.0.0:* 887/dnsmasq

After restarting:
udp 0 0 127.0.0.1:69 0.0.0.0:* 1967/dnsmasq
udp 0 0 192.168.1.102:69 0.0.0.0:* 1967/dnsmasq
(where 192.168.1.102 is the server IP)

So dnsmasq is not binding to my server IP during boot.

If I remove /etc/dnsmasq.d/network-manager (which issues the sole dnsmasq directive to bind all the interfaces instead of listening on 0.0.0.0) and restart the server it allows the client to boot normally.
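
The before/after comparison can be automated with a short Python sketch that picks the dnsmasq listeners out of `netstat -nap` style output. The function name is made up, and the column layout is assumed to match the lines quoted above (protocol, two queue counters, local address, foreign address, PID/program).

```python
from typing import List


def dnsmasq_udp_listeners(netstat_output: str, port: int) -> List[str]:
    """Return the local addresses on which dnsmasq holds the given UDP port."""
    addresses = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign pid/program
        if len(fields) < 6 or fields[0] != "udp":
            continue
        if not fields[-1].endswith("/dnsmasq"):
            continue
        host, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            addresses.append(host)
    return addresses


# The output lines reported in the comment above.
BEFORE = "udp 0 0 127.0.0.1:69 0.0.0.0:* 887/dnsmasq"
AFTER = ("udp 0 0 127.0.0.1:69 0.0.0.0:* 1967/dnsmasq\n"
         "udp 0 0 192.168.1.102:69 0.0.0.0:* 1967/dnsmasq")

if __name__ == "__main__":
    server_ip = "192.168.1.102"
    for label, output in (("before", BEFORE), ("after", AFTER)):
        listeners = dnsmasq_udp_listeners(output, 69)
        print(label, listeners, "serves LAN:", server_ip in listeners)
```

A check like this, run shortly after boot, would show whether the TFTP socket on the server address is missing without waiting for a client to time out.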
