dnsmasq temporarily breaks DNS resolution when starting for the first time

Bug #1247803 reported by Philip Potter
This bug affects 8 people
Affects            Status      Importance   Assigned to   Milestone
dnsmasq (Debian)   New         Undecided    Unassigned
dnsmasq (Ubuntu)   Confirmed   Undecided    Unassigned

Bug Description

The first time that dnsmasq is started, DNS resolution is broken for a few seconds. You can see this on initial installation:

root@phil-test-1:~# apt-get install dnsmasq ; dig github.com
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  dnsmasq
0 upgraded, 1 newly installed, 0 to remove and 23 not upgraded.
Need to get 0 B/15.1 kB of archives.
After this operation, 111 kB of additional disk space will be used.
Selecting previously unselected package dnsmasq.
(Reading database ... 92556 files and directories currently installed.)
Unpacking dnsmasq (from .../dnsmasq_2.59-4ubuntu0.1_all.deb) ...
Processing triggers for ureadahead ...
Setting up dnsmasq (2.59-4ubuntu0.1) ...
 * Starting DNS forwarder and DHCP server dnsmasq [ OK ]

; <<>> DiG 9.8.1-P1 <<>> github.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 56221
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;github.com. IN A

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Nov 4 11:29:16 2013
;; MSG SIZE rcvd: 28

Or you can recreate the problem on an existing installation by removing /var/run/dnsmasq/resolv.conf:

root@phil-test-1:~# service dnsmasq stop
 * Stopping DNS forwarder and DHCP server dnsmasq [ OK ]
root@phil-test-1:~# rm /var/run/dnsmasq/resolv.conf
root@phil-test-1:~# service dnsmasq start; dig github.com
 * Starting DNS forwarder and DHCP server dnsmasq [ OK ]

; <<>> DiG 9.8.1-P1 <<>> github.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 10196
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;github.com. IN A

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Nov 4 11:31:21 2013
;; MSG SIZE rcvd: 28

The REFUSED status shows that DNS resolution failed in both cases. I expect that once `apt-get install dnsmasq` or `service dnsmasq start` has returned successfully, and resolvconf has dnsmasq registered as the sole resolver for lo.dnsmasq, dnsmasq is ready to respond to DNS requests. The REFUSED response from dig is therefore the opposite of what I expect.

In both cases, resolution works again after a few seconds, once resolvconf generates the /var/run/dnsmasq/resolv.conf file and dnsmasq polls for and finds it.

Even though the window is short (syslog reports about 4 seconds of unavailability), this causes me pain because I do a lot of automated installations using Puppet: immediately after dnsmasq is installed, any other package installation or apt-get update run fails.
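
As a stopgap for automated runs, one option is to poll until resolution works again before continuing. The following is only a sketch of that idea, not a fix for the bug: the function names `dns_probe` and `wait_for_dns` and the `WAIT_INTERVAL` variable are made up for illustration, and it assumes `getent` is available and that a failed lookup makes it exit non-zero.

```shell
# Hypothetical probe: succeeds once the system resolver can look up the name.
dns_probe() {
    getent hosts "$1" >/dev/null 2>&1
}

# Poll until the probe succeeds or the attempts run out.
# WAIT_INTERVAL is an illustrative knob (seconds between attempts, default 1).
wait_for_dns() {
    name="$1"
    tries="${2:-10}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        if dns_probe "$name"; then
            return 0
        fi
        i=$((i + 1))
        sleep "${WAIT_INTERVAL:-1}"
    done
    return 1
}
```

It could be used as `apt-get install -y dnsmasq && wait_for_dns github.com 10 && apt-get update`, so that later steps only start once a lookup has succeeded.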

I believe the problem is that the init.d script assumes that /var/run/dnsmasq/resolv.conf is already in place, but it may not be, because nothing has caused resolvconf to refresh itself since /etc/resolvconf/update.d/dnsmasq was put in place. One solution would be to have the init.d script run resolvconf -u if it decides to use /var/run/dnsmasq/resolv.conf. I am happy to submit a patch based on this.
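
Roughly, the init script change could look like the sketch below. This is not the actual initscript code: the helper name `refresh_upstream_list` and the `RESOLVCONF_BIN` variable are assumptions for illustration, and the real script's variable names may differ.

```shell
# Hypothetical helper for the start path of the init script.
# RESOLVCONF_BIN is an illustrative override; it defaults to the real command.
RESOLVCONF_BIN="${RESOLVCONF_BIN:-resolvconf}"

refresh_upstream_list() {
    # $1 is the resolv file dnsmasq is about to be pointed at.
    # Only act when that is the file generated by the resolvconf hook.
    [ "$1" = "/var/run/dnsmasq/resolv.conf" ] || return 0
    # Do nothing if resolvconf is not installed.
    command -v "$RESOLVCONF_BIN" >/dev/null 2>&1 || return 0
    # Run the update scripts so the file exists before the daemon starts.
    "$RESOLVCONF_BIN" -u
}
```

The call would go just before the daemon is launched, e.g. `refresh_upstream_list "$RESOLV_CONF"`, where `RESOLV_CONF` stands for whatever variable the script uses for the chosen file.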

This was on Ubuntu 12.04.3 LTS, using dnsmasq 2.59-4ubuntu0.1

Thomas Hood (jdthood) wrote :

When the dnsmasq package is installed its postinst starts the dnsmasq daemon via the initscript. Dnsmasq initially reads what is most probably an empty file from /var/run/dnsmasq/resolv.conf and so initially can't resolve names. (The file is probably empty because it is generated by /etc/resolvconf/update.d/dnsmasq which is included in the dnsmasq package.) Then the dnsmasq initscript tells resolvconf that dnsmasq is listening at 127.0.0.1. In response to this, resolvconf runs the aforementioned hook script /etc/resolvconf/update.d/dnsmasq which writes a new /var/run/dnsmasq/resolv.conf containing information about other nameservers. The dnsmasq binary notices that the latter file has changed and re-reads it. Meanwhile resolvconf updates /etc/resolv.conf to contain "nameserver 127.0.0.1" so that the resolver will talk to dnsmasq.

It's this "meanwhile" that is the problem. Resolvconf may update resolv.conf to point to dnsmasq before dnsmasq is ready to resolve names on the basis of the information just written to /var/run/dnsmasq/resolv.conf.
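
To see which side of the race you are on, it is enough to check whether the file dnsmasq reads lists any upstream servers yet. A minimal sketch, assuming the file uses the usual `nameserver <address>` lines; the function name `has_upstreams` is made up for illustration.

```shell
# Succeeds if the given resolv file exists and lists at least one nameserver.
has_upstreams() {
    [ -r "$1" ] && grep -q '^nameserver[[:space:]]' "$1"
}

# Report the state of a resolv file (hypothetical reporting wrapper).
report_upstreams() {
    if has_upstreams "$1"; then
        echo "$1: upstream nameservers present"
    else
        echo "$1: missing or empty, lookups will fail for now"
    fi
}
```

Running `report_upstreams /var/run/dnsmasq/resolv.conf` right after `service dnsmasq start` should show the second message during the window described above.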

In other words, you're right. :)

I think that the postinst should be enhanced such that if /etc/resolvconf/update.d/dnsmasq has appeared or changed on install or upgrade then it (the postinst) does "resolvconf -u" before starting dnsmasq. The postinst should refrain from doing the "resolvconf -u" if IGNORE_RESOLVCONF is set in /etc/default/dnsmasq.

Changed in dnsmasq (Ubuntu):
status: New → Confirmed
Philip Potter (philip-g-potter) wrote :

I agree that the postinst is a better place than the init script to run "resolvconf -u".

I'm not sure that it should be conditional on IGNORE_RESOLVCONF though - given that the update script will be run next time anything touches resolvconf, what's to be gained by not running it in the postinst? And why stop there? Why not also make it conditional on ENABLED=0?

I've created a branch with an unconditional "resolvconf -u" in the postinst. I'm new to launchpad and bazaar so I'm not sure what the next step is -- do I propose a merge? Do I attach a patch to this ticket?

Thomas Hood (jdthood) wrote :

Hmm, good questions. /me thinks.

The (small) gain is that we omit an unneeded update run prior to the update run that occurs shortly afterwards when the dnsmasq initscript calls resolvconf.

When other things touch resolvconf the update run can't be omitted.

We don't want to skip the update run when ENABLED=0, because in that case the initscript itself does not instigate an update run. If no update run is instigated either in the postinst or in the initscript, and the admin later sets ENABLED=1 and IGNORE_RESOLVCONF=no and does "/etc/init.d/dnsmasq start", and nothing else has instigated an update run in the meantime, then dnsmasq starts with an out-of-date /var/run/dnsmasq/resolv.conf, which is what we are trying to avoid. So when ENABLED=0 the update run must be done in the postinst even if IGNORE_RESOLVCONF is set.

The code should thus look like this:

    #
    # If ENABLED=0 then the initscript does not call resolvconf, so we do an
    # update run here in order to ensure that /var/run/dnsmasq/resolv.conf
    # is up to date should dnsmasq later be started (with ENABLED=1).
    #
    # If ENABLED=1 then the initscript will call resolvconf and thus instigate an
    # update run, thus updating /var/run/dnsmasq/resolv.conf; but, unless
    # IGNORE_RESOLVCONF is "yes", we have to do an update run here so that
    # /var/run/dnsmasq/resolv.conf is valid before dnsmasq starts.
    #
    if [ "$ENABLED" = "0" ] || [ "$IGNORE_RESOLVCONF" != yes ] ; then
        resolvconf -u
    fi
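
To make the effect of that conditional easier to see, here is a small sketch that wraps the same test in a function and prints the decision for each combination of settings. The function name `needs_update_run` is made up for illustration, and the sketch only prints what would happen; it does not call resolvconf.

```shell
# Same test as the proposed postinst conditional, as a function.
# $1 is the ENABLED value, $2 is the IGNORE_RESOLVCONF value.
needs_update_run() {
    [ "$1" = "0" ] || [ "$2" != yes ]
}

# Print the decision for all four combinations.
for enabled in 0 1; do
    for ignore in yes no; do
        if needs_update_run "$enabled" "$ignore"; then
            verdict="run resolvconf -u"
        else
            verdict="skip"
        fi
        echo "ENABLED=$enabled IGNORE_RESOLVCONF=$ignore -> $verdict"
    done
done
```

Only ENABLED=1 with IGNORE_RESOLVCONF=yes skips the update run; the other three combinations perform it.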

Simon Kelley (simon-thekelleys) wrote : Re: [Bug 1247803] Re: dnsmasq temporarily breaks DNS resolution when starting for the first time

On 09/11/13 19:07, Philip Potter wrote:
> I agree that the postinst is a better place than the init script to run
> "resolvconf -u".
>
> I'm not sure that it should be conditional on IGNORE_RESOLVCONF though -
> given that the update script will be run next time anything touches
> resolvconf, what's to be gained by not running it in the postinst? And
> why stop there? Why not also make it conditional on ENABLED=0?
>
> I've created a branch with an unconditional "resolvconf -u" in the
> postinst. I'm new to launchpad and bazaar so I'm not sure what the next
> step is -- do I propose a merge? Do I attach a patch to this ticket?
>

Once this has gone through Ubuntu processes, please send me a diff and
I'll propagate it to the Debian package.

Cheers,

Simon.

Philip Potter (philip-g-potter) wrote :

I don't really understand why we need to add a conditional at all. It's always safe to run resolvconf -u one time too many; but running it one time too few will introduce subtle bugs (like this one).

The proposed conditional only suppresses running resolvconf -u if ENABLED=1 and resolvconf is not being used (ie IGNORE_RESOLVCONF=yes). Is this really such a common case that adding complexity to get the minor optimization of not running resolvconf -u is worth it?

Also, doesn't your argument about ENABLED=0 later being changed to ENABLED=1 also apply to IGNORE_RESOLVCONF=yes later being changed to IGNORE_RESOLVCONF=no?

I'm still in favour of just unconditionally running resolvconf -u, as my branch does.

Thomas Hood (jdthood) wrote :

We certainly don't want to run "resolvconf -u" too few times. That is the bug.

It causes no logical malfunction to run "resolvconf -u" too many times, but doing so is not efficient. When a resolvconf update occurs then all the scripts in /etc/resolvconf/update.d/ get run. If a "heavy" update script is present (one that copies files, reconfigures things and/or restarts services, etc.) then the update can take a significant amount of time. In that case it's bad to do an extra, unnecessary update. And I think it's ugly to do two updates in a row if one is sufficient.

> The proposed conditional only suppresses running resolvconf -u
> if ENABLED=1 and resolvconf is not being used (ie IGNORE_RESOLVCONF=yes).
> Is this really such a common case that adding complexity to get the minor
> optimization of not running resolvconf -u is worth it?

You have a point there. Perhaps it is, perhaps it is not worth the added code complexity in the postinst. The maintainer will be the judge.

> Also, doesn't your argument about ENABLED=0 later being changed
> to ENABLED=1 also apply to IGNORE_RESOLVCONF=yes later being
> changed to IGNORE_RESOLVCONF=no?

No, I don't think it applies. In the ENABLED=0 case, resolvconf doesn't get run by the dnsmasq initscript. So unless the postinst does "resolvconf -u" there is nothing to ensure that if dnsmasq is later restarted with ENABLED=1 and IGNORE_RESOLVCONF=no then /var/run/dnsmasq/resolv.conf will have been written. In the ENABLED=1 IGNORE_RESOLVCONF=yes case resolvconf does get run in the dnsmasq initscript and /var/run/dnsmasq/resolv.conf gets written (even though it won't be used). So if dnsmasq is later restarted with ENABLED=1 and IGNORE_RESOLVCONF=no then /var/run/dnsmasq/resolv.conf is ready.

Philip Potter (philip-g-potter) wrote :

Thanks for your reply, Thomas. I now agree that it is the maintainer's choice as to which of the proposed implementations should be used. I've made a separate branch with the more complex conditional and attached it to this ticket.

What's the next step? I'm new to the whole Ubuntu package maintenance process. I see Simon Kelley is listed as the maintainer -- is it up to him to choose one of the proposed solutions?

Also, I've based my patches off of the precise branch, because that's the distro I use most frequently. Is that sensible? Should I have based it off of trusty instead and backported later? Should I attach a patch file instead?

David Medberry (med) wrote :

This likely needs to go into an SRU for Precise and get fixed in Trusty and Utopic.

Fabian M. Borschel (onibox) wrote :

Any updates here?

Alexander (4-m7il-s) wrote :

I've had this problem on Trusty (14.04).

After installing Xenial (16.04) I cannot reproduce this issue. That's not 100% evidence that it's gone, but it looks like it might be.

It could be that the move from upstart to systemd solved this automatically.
