lxcbr0 fails to come up when dnsmasq is installed

Bug #928524 reported by Serge Hallyn on 2012-02-07
This bug affects 12 people
Affects                    Importance  Assigned to
dnsmasq (Ubuntu)           Undecided   Unassigned
  Precise                  Undecided   Unassigned
libvirt (Ubuntu)           High        Unassigned
  Precise                  Undecided   Unassigned
lxc (Ubuntu)               High        Unassigned
  Precise                  Undecided   Unassigned
network-manager (Ubuntu)   Undecided   Unassigned
  Precise                  Undecided   Unassigned

Bug Description

============ SRU justification ===========
Impact: lxc (by way of lxc-net) fails to start when dnsmasq is installed
Development fix: install a dnsmasq.d file to prevent the system-wide dnsmasq
from binding to lxcbr0
Stable fix: same as Development fix
Test case:
 sudo apt-get purge lxc
 sudo apt-get -y install dnsmasq
 sudo apt-get -y install lxc
Regression potential: There should be none, since we are simply telling the
system-wide dnsmasq (if any) not to bind to the lxcbr0 which our own dnsmasq
instance will bind to.
==========================================

If dnsmasq is installed and running with its default configuration, it will bind to all interfaces. As a result lxc can't run its own dnsmasq against lxcbr0 (the same applies to libvirt and virbr0).

At the least this should be documented in lxc. Preferably we'd find a way to have dnsmasq be less of a bully.
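
One way to confirm this diagnosis is to look for a listener holding the wildcard address on port 53. The following is a minimal sketch, assuming iproute2's ss is available; the helper name has_wildcard_dns is hypothetical, and it only inspects the text it is given on stdin:

```shell
#!/bin/sh
# Sketch only: has_wildcard_dns is a hypothetical helper, not part of lxc
# or dnsmasq. It reads `ss -lntu`-style lines on stdin and succeeds if any
# listener is bound to the wildcard address on port 53.
has_wildcard_dns() {
    awk '
        {
            # Scan every field so the check does not depend on column order.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^(0\.0\.0\.0|\*|\[::\]):53$/) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Usage on a live system:
#   ss -lntu | has_wildcard_dns && echo "a daemon holds the wildcard address on port 53"
```

If the check succeeds while lxcbr0 has no DNS listener of its own, the system-wide dnsmasq is a likely culprit, though any other DNS server bound to the wildcard address would match as well.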

Changed in lxc (Ubuntu):
status: New → Confirmed
importance: Undecided → Low
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in dnsmasq (Ubuntu):
status: New → Confirmed
Serge Hallyn (serge-hallyn) wrote :

A recommendation in comments 7 and 8 of bug 925511 seems like the best way to solve this for both lxc and libvirt. However, I don't think we'll be able to do this in time for precise. The libvirt patch will have to be sent and vetted upstream, both lxc and libvirt will need to switch from depending on dnsmasq-base to dnsmasq, and this will require extensive testing.

The current situation can be fragile, but at least it is understood. I'd like to address this as soon as p+1 opens up.

Changed in libvirt (Ubuntu):
status: New → Confirmed
importance: Undecided → Low
Jean-Baptiste Lallement (jibel) wrote :

Now that we rely on dnsmasq by default for name resolution, the importance is higher: because the bridge comes up before dnsmasq, dnsmasq fails to start and the system can't do name resolution.

Laine Stump (laine) wrote :

Here is the entry in the libvirt wiki describing this problem and the solution:

   http://wiki.libvirt.org/page/Virtual_network_%27default%27_has_not_been_started

In short, the systemwide dnsmasq instance should have "bind-interfaces" added to its configuration, and the particular interfaces you want it to use should be listed, otherwise dnsmasq will attempt to bind to all existing interfaces at the time it is started, and to any new interfaces that are created after that time.
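
To see whether a system-wide configuration already follows this advice, one could scan it for an uncommented bind-interfaces line. This is a minimal sketch; the helper name uses_bind_interfaces is hypothetical, and the file paths in the usage comment are the usual Ubuntu locations rather than something this report specifies:

```shell
#!/bin/sh
# Sketch only: uses_bind_interfaces is a hypothetical helper.
# It succeeds if any of the given files contains an uncommented
# "bind-interfaces" line (optionally followed by a trailing comment).
# Pass at least one file; with no arguments grep would read stdin.
uses_bind_interfaces() {
    grep -Eqs '^[[:space:]]*bind-interfaces[[:space:]]*(#.*)?$' "$@"
}

# Usage (paths are assumptions about a typical install):
#   uses_bind_interfaces /etc/dnsmasq.conf /etc/dnsmasq.d/* \
#       || echo "system-wide dnsmasq is not in bind-interfaces mode"
```

The check says nothing about which interfaces are listed; it only reports whether the mode itself is enabled.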

Changed in lxc (Ubuntu):
importance: Low → High
Changed in libvirt (Ubuntu):
importance: Low → High
Changed in dnsmasq (Ubuntu):
status: Confirmed → Invalid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxc - 0.8.0~rc1-4ubuntu10

---------------
lxc (0.8.0~rc1-4ubuntu10) quantal; urgency=low

  [ Serge Hallyn ]
  * 0084-lxc-ubuntu-drop-duplicate-code.patch: drop some duplicate code from
    the ubuntu template. (LP: #1004118)
  * 0085-pivot-dir: use a directory other than /mnt to put the pivot_root
    old dir into (LP: #986385)

  [ Stéphane Graber ]
  * Ship /etc/dnsmasq.d/lxc to configure an eventual system wide
    dnsmasq daemon not to listen on the LXC bridge interface. (LP: #928524)
  * Drop rm calls from postrm for apparmor rules, these were in the purge
    target so didn't really serve any purpose.
 -- Stephane Graber <email address hidden> Tue, 29 May 2012 16:56:25 -0400

Changed in lxc (Ubuntu):
status: Confirmed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 0.9.12-0ubuntu3

---------------
libvirt (0.9.12-0ubuntu3) quantal; urgency=low

  * install apport hook as right name - libvirt-bin is the binary package,
    the source package name is libvirt. (LP: #1007405)
  * install /etc/dnsmasq.d/libvirt to configure system wide dnsmasq to not
    listen on the libvirt bridge. (Following Stéphane's lxc example)
    (LP: #928524) (LP: #231060)
    - postinst: restart dnsmasq; postrm: remove dnsmasq.d/libvirt file and
      restart dnsmasq; rules, libvirt-bin.dirs and libvirt-bin.install:
      install new debian/libvirt-bin.dnsmasq file.
 -- Serge Hallyn <email address hidden> Fri, 01 Jun 2012 09:36:58 -0500

Changed in libvirt (Ubuntu):
status: Confirmed → Fix Released
description: updated

Hello Serge, or anyone else affected,

Accepted lxc into precise-proposed. The package will build now and be available in a few hours.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in lxc (Ubuntu Precise):
status: New → Fix Committed
tags: added: verification-needed
description: updated
Thomas Hood (jdthood) wrote :

From the SRU justification:
> Regression potential: There should be none, since we
> are simply telling the system-wide dnsmasq (if any) not
> to bind to the lxcbr0 which our own dnsmasq instance
> will bind to.

There is more risk of trouble than that. With this change, after lxc is installed, dnsmasq will operate in bind-interfaces mode whereas it might have been operating in unbound mode before. In bind-interfaces mode dnsmasq will not notice new interfaces, etc.

Stéphane Graber (stgraber) wrote :

That's correct, though currently when installing dnsmasq with lxc, it has a 50% chance of not starting at all (the other 50% being lxc not starting at all).

My initial idea was to set bind-interfaces by default in dnsmasq and SRU that, as its default behaviour is breaking many setups and is extremely annoying.

Although I can see some people complaining about bind-interfaces, it's going to improve reliability for currently broken setups and not affect non-broken setups (as we're pushing this change as part of the lxc package, not dnsmasq).

Thomas Hood (jdthood) wrote :

Hi Stéphane,

Changing the default of dnsmasq to bind-interfaces wouldn't have been a very good solution because some people run dnsmasq without installing those other packages and rely upon the "unbound" mode. The implemented solution is better because the cases of dnsmasq being forced into bind-interfaces mode will be fewer. I guess the only risk of breakage is in cases like the following. Someone is using dnsmasq and requires unbound mode, has installed lxc but disabled it. She upgrades (getting a new lxc in the process) and finds that dnsmasq no longer works as expected. I'm certainly not saying that this is a showstopper, just that risk of malheur isn't nonexistent.

Stéphane Graber (stgraber) wrote :

Indeed, that's the only scenario that'd be a problem, but to work around that issue, I added a mention of /etc/dnsmasq.d/ in /etc/default/lxc, which is where you'd disable lxc's own dnsmasq.

So in theory anyone disabling LXC's own dnsmasq daemon will see the note that they should also update dnsmasq.d.

It'd certainly have been better to get that change done pre-release but considering the number of people hitting it, I still think it's worth the risk as an SRU.

On 18/06/12 18:11, Thomas Hood wrote:
> Hi Stéphane,
>
> Changing the default of dnsmasq to bind-interfaces wouldn't have been a
> very good solution because some people run dnsmasq without installing
> those other packages and rely upon the "unbound" mode. The implemented
> solution is better because the cases of dnsmasq being forced into bind-
> interfaces mode will be fewer. I guess the only risk of breakage is in
> cases like the following. Someone is using dnsmasq and requires unbound
> mode, has installed lxc but disabled it. She upgrades (getting a new lxc
> in the process) and finds that dnsmasq no longer works as expected. I'm
> certainly not saying that this is a showstopper, just that risk of
> malheur isn't nonexistent.
>

I'm wondering about adding a _third_ mode, which has a desirable
mixture of the properties of the current two (--bind-interfaces and NOT
--bind-interfaces). Essentially, dnsmasq would bind the addresses of
individual interfaces rather than the wildcard address, making it less
of a bully for other dnsmasq instances or DNS servers, but it would use
netlink to track the creation of new interfaces or the addition of new
addresses to existing interfaces, and automatically bind them as
required. This mode is inherently Linux-specific, since it needs netlink
to work.

You could either just use it as the default, or as a less problematic
alternative to --bind-interfaces to be dropped into the system dnsmasq
by networkmanager.

Simon.
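
The tracking half of this idea can be illustrated outside dnsmasq. The sketch below turns address events into bind/unbind decisions; it is only an illustration of the proposal, not dnsmasq code. The helper name track_addresses is hypothetical, and the input line shapes are an assumption about what `ip -o monitor address` prints:

```shell
#!/bin/sh
# Sketch only: track_addresses is a hypothetical helper illustrating the
# proposed mode. It reads address events on stdin and prints one decision
# per event. Assumed line shapes (as printed by `ip -o monitor address`):
#   "4: lxcbr0    inet 10.0.3.1/24 brd 10.0.3.255 scope global lxcbr0"
#   "Deleted 4: lxcbr0    inet 10.0.3.1/24 scope global lxcbr0"
track_addresses() {
    awk '
        {
            action = "bind"
            if ($1 == "Deleted") action = "unbind"
            # The address follows the "inet" or "inet6" keyword.
            for (i = 1; i < NF; i++)
                if ($i == "inet" || $i == "inet6") {
                    split($(i + 1), parts, "/")   # drop the prefix length
                    print action, parts[1]
                    break
                }
        }
    '
}

# Usage on a live Linux system:
#   ip -o monitor address | track_addresses
```

A daemon working this way would open or close a per-address socket for each decision instead of holding the wildcard address.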

Stéphane Graber (stgraber) wrote :

Confirmed to work fine here.

tags: added: verification-done
removed: verification-needed
Changed in dnsmasq (Ubuntu Precise):
status: New → Invalid
Stéphane Graber (stgraber) wrote :

Adding a task for network-manager. I believe a similar fix would help quite a few desktop users.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxc - 0.7.5-3ubuntu59

---------------
lxc (0.7.5-3ubuntu59) precise-proposed; urgency=low

  [ Serge Hallyn ]
  * 0085-pivot-dir: use a directory other than /mnt to put the pivot_root
    old dir into (LP: #986385)
  * 0086-lxc-unshare-zero-args: fix lxc-unshare segfaulting when no command
    is given (LP: #1011603)
  * 0087-lxc-ls-dash: fix lxc-ls for containers whose names start with a
    dash (LP: #1006332)
  * 0088-ubuntu-template-flock: don't fail when flock is busy, just wait,
    so concurrent lxc-creates don't break. (LP: #1007483)
  * debian/rules, debian/lxc.apport: install apport hook (LP: #1011644)

  [ Stéphane Graber ]
  * Ship /etc/dnsmasq.d/lxc to configure an eventual system wide
    dnsmasq daemon not to listen on the LXC bridge interface. (LP: #928524)
 -- Serge Hallyn <email address hidden> Mon, 11 Jun 2012 19:56:30 -0500

Changed in lxc (Ubuntu Precise):
status: Fix Committed → Fix Released
Thomas Hood (jdthood) on 2012-06-26
Changed in network-manager (Ubuntu Precise):
status: New → Invalid
Changed in network-manager (Ubuntu):
status: New → Invalid
Stéphane Graber (stgraber) wrote :

Reverting Thomas' change as he didn't give a rationale for it and isn't the network-manager maintainer.

Changed in network-manager (Ubuntu):
status: Invalid → New
Changed in network-manager (Ubuntu Precise):
status: Invalid → New
Thomas Hood (jdthood) wrote :

My apologies. I thought it was fixed in lxc and nothing needed to be done for n-m. What further changes do you plan for n-m?

Stéphane Graber (stgraber) wrote :

Exactly the same as lxc and libvirt.

That's to have any package using dnsmasq-base also ship /etc/dnsmasq.d/<package name> that contains:
bind-interfaces
except-interface=<interface>

For libvirt, that's for virbr0, for lxc that's for lxcbr0 and for network-manager that'd be the loopback.

This simply prevents dnsmasq from binding with any interface that's managed by another tool, avoiding the current conflicts we have at install and boot time.
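
The fragment described above can be sketched as a small generator. This is a sketch under stated assumptions, not the packaging code: the helper name write_dnsmasq_fragment is hypothetical, and the target directory is a parameter so it can be tried without root (on a real system it would be /etc/dnsmasq.d):

```shell
#!/bin/sh
# Sketch only: write_dnsmasq_fragment is a hypothetical helper, not what
# the lxc or libvirt packages actually run. It writes the two-line
# fragment for one package/interface pair into conf_dir.
write_dnsmasq_fragment() {
    conf_dir=$1 package=$2 iface=$3
    mkdir -p "$conf_dir" || return 1
    printf 'bind-interfaces\nexcept-interface=%s\n' "$iface" \
        > "$conf_dir/$package"
}

# Usage, mirroring the pairs named above:
#   write_dnsmasq_fragment /etc/dnsmasq.d lxc lxcbr0
#   write_dnsmasq_fragment /etc/dnsmasq.d libvirt virbr0
```

The system-wide dnsmasq has to be restarted before it picks up a new fragment, which is why the libvirt changelog entries in this report restart it from postinst and postrm.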

Thomas Hood (jdthood) wrote :

Ah, I understand.

In #959037 various ways of resolving (no pun intended) the conflict between standalone dnsmasq and nm-dnsmasq have been discussed and in recent comments we were working on the idea of moving nm-dnsmasq to another loopback address, say 127.0.0.2, which allows standalone dnsmasq to run alongside nm-dnsmasq in bind-interfaces mode without "except-interface=lo"; thus standalone dnsmasq can provide name and other services on lo at 127.0.0.1 as well as on external interfaces.

Stéphane Graber (stgraber) wrote :

That's not completely correct actually (haven't looked at the bug though).

dnsmasq by default binds on 0.0.0.0, which will include 127.0.0.2, so even if Network Manager moves to using 127.0.0.2, which I believe is a good idea, it should still ship a dnsmasq.d config file containing "bind-interfaces" so that dnsmasq only binds 127.0.0.1 on the loopback interface instead of all the IPs.

Thomas Hood (jdthood) wrote :

That's right.

Thomas Hood (jdthood) wrote :

@Stéphane: network-manager should also include a file in dnsmasq.d/ with "bind-interfaces". Can we track that need in bug #959037 and close this report (#928524) as far as it affects network-manager?

Hello Serge, or anyone else affected,

Accepted libvirt into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/libvirt/0.9.8-2ubuntu17.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in libvirt (Ubuntu Precise):
status: New → Fix Committed
tags: removed: verification-done
tags: added: verification-needed

Thomas, Stéphane;
Yes, let's track the change for network-manager in bug #959037 -- marking the task for n-m here as Invalid.

Changed in network-manager (Ubuntu):
status: New → Invalid
Changed in network-manager (Ubuntu Precise):
status: New → Invalid
Serge Hallyn (serge-hallyn) wrote :

Fix verified on precise.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 0.9.8-2ubuntu17.2

---------------
libvirt (0.9.8-2ubuntu17.2) precise-proposed; urgency=low

  * debian/libvirt-bin.install, debian/rules: name the apport file
    source_libvirt.py, not source_libvirt-bin.py. (LP: #1007405)
  * install /etc/dnsmasq.d/libvirt to configure system wide dnsmasq to not
    listen on the libvirt bridge. (Following Stéphane's lxc example)
    (LP: #928524) (LP: #231060)
    - postinst: restart dnsmasq; postrm: remove dnsmasq.d/libvirt file and
      restart dnsmasq; rules, libvirt-bin.dirs and libvirt-bin.install:
      install new debian/libvirt-bin.dnsmasq file.
  * Warn user about bad pc-0.12 machine type, and help user transition.
    (LP: #1001625)
    - qemu-warn-on-pc-0.12.patch: When defining or starting a VM which uses the
      pc-0.12 machine type, warn in libvirtd.log.
    - debian/libvirt-migrate-qemu-machinetype: automatically migrate QEMU VMs
      to newest machine type. This is not done automatically as there will
      be some users who have good reason to stay with pc-0.12.
 -- Serge Hallyn <email address hidden> Mon, 11 Jun 2012 21:52:02 -0500

Changed in libvirt (Ubuntu Precise):
status: Fix Committed → Fix Released