Bridge still not created if bind9 is on

Bug #1367495 reported by Dan Kegel
This bug affects 3 people
Affects: lxc (Ubuntu)
Status: Invalid
Importance: Low
Assigned to: Unassigned

Bug Description

This is probably a dup of bug 1240757, created just so I could upload the data requested in that bug report.

On a fresh, probably vanilla, ubuntu 14.04 server, I tried using ubuntu 14.04's default lxc.
It created containers fine, but they failed to start, complaining
  lxc-start: failed to attach 'veth9HNUS9' to the bridge 'lxcbr0' : No such device
/var/log/upstart/lxc-net.log says
  dnsmasq: failed to create listening socket for 10.0.3.1: Address already in use

This didn't happen on a desktop 14.04 instance, where bind9 is not installed by default.
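
One way to see which daemon already holds the DNS port is to filter the listener table. The sketch below assumes the usual `netstat -lnp` column layout (local address in field 4, `PID/Program` as the last field); `dns_listeners` is a hypothetical helper name, not something shipped with lxc.

```shell
# Hypothetical helper: read `netstat -lnp` output on stdin and print
# "local-address pid/program" for every socket bound to port 53.
# Assumes the local address is field 4 and PID/Program is the last field.
dns_listeners() {
    awk '$4 ~ /:53$/ { print $4, $NF }'
}

# Usage (as root, so the PID/Program column is populated):
#   netstat -lnp | dns_listeners
```

If bind9 shows up on `0.0.0.0:53` or on `10.0.3.1:53`, that would be consistent with the "Address already in use" message from dnsmasq.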

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: lxc 1.0.5-0ubuntu0.1
ProcVersionSignature: Ubuntu 3.13.0-35.62-generic 3.13.11.6
Uname: Linux 3.13.0-35-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.3
Architecture: amd64
Date: Tue Sep 9 16:39:34 2014
ProcEnviron:
 LANGUAGE=en_US:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: lxc
UpgradeStatus: No upgrade log present (probably fresh install)
defaults.conf:
 lxc.network.type = veth
 lxc.network.link = lxcbr0
 lxc.network.flags = up
 lxc.network.hwaddr = 00:16:3e:xx:xx:xx

Dan Kegel (dank) wrote :
Serge Hallyn (serge-hallyn) wrote :

Thanks for reporting this bug. I'm not sure it's actually a dup of 1240757, because the syslog seems to indicate that dnsmasq is offering addresses over lxcbr0.

Could you do 'lxc-start -n container0 -l trace -o debug.out' and attach debug.out here?

Changed in lxc (Ubuntu):
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for lxc (Ubuntu) because there has been no activity for 60 days.]

Changed in lxc (Ubuntu):
status: Incomplete → Expired
Changed in lxc (Ubuntu):
status: Expired → Confirmed
Serge Hallyn (serge-hallyn) wrote :

Hi, thanks for re-opening this bug.

Could you please show the results of:

ifconfig -a
brctl show
ps -ef

on the host?

Changed in lxc (Ubuntu):
importance: Undecided → High
Yonatan Yehezkeally (yonatany) wrote :
Yonatan Yehezkeally (yonatany) wrote :
Yonatan Yehezkeally (yonatany) wrote :
Serge Hallyn (serge-hallyn) wrote :

That all looks fine. Could you please try

sudo start lxc-net

and show any console output, as well as attach /var/log/upstart/lxc-net.log

Yonatan Yehezkeally (yonatany) wrote :

Certainly. The service was already running, so I stopped and restarted it for good measure.

    root@srv-Ub1404:~# start lxc-net
    start: Job is already running: lxc-net
    root@srv-Ub1404:~# stop lxc-net
    lxc-net stop/waiting
    root@srv-Ub1404:~# start lxc-net
    lxc-net start/running

I'm attaching `/var/log/upstart/lxc-net.log`

Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1367495] Re: Bridge still not created if bind9 is on

Thanks - these results don't quite make sense. What do the following show:

sudo netstat -nr
sudo netstat -lnp | grep 10.0
traceroute 10.0.3.1
dpkg -l | grep dnsmasq
dpkg -l | grep lxc

Yonatan Yehezkeally (yonatany) wrote :

`sudo netstat -nr` returns:
    Kernel IP routing table
    Destination Gateway Genmask Flags MSS Window irtt Iface
    0.0.0.0 <my-net>.1 0.0.0.0 UG 0 0 0 eth0
    <my-net>.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0

`sudo netstat -lnp | grep 10.0` returns nothing.

`traceroute 10.0.3.1` goes up my ISP's network, but after 8 hops there it only returns `* * *` for hops 9-30. If I interpret that correctly, 10.0.3.1 isn't anywhere on my local network (and of course I can't get into other people's so there's no point in trying for more hops).

`dpkg -l | grep dnsmasq` returns:
    ii dnsmasq 2.68-1 all Small caching DNS proxy and DHCP/TFTP server
    ii dnsmasq-base 2.68-1 amd64 Small caching DNS proxy and DHCP/TFTP server

And, finally, `dpkg -l | grep lxc` returns:
    ii liblxc1 1.0.6-0ubuntu0.1 amd64 Linux Containers userspace tools (library)
    ii lxc 1.0.6-0ubuntu0.1 amd64 Linux Containers userspace tools
    ii lxc-templates 1.0.6-0ubuntu0.1 amd64 Linux Containers userspace tools (templates)
    ii python3-lxc 1.0.6-0ubuntu0.1 amd64 Linux Containers userspace tools (Python 3.x bindings)

Serge Hallyn (serge-hallyn) wrote :

Note: I don't think this is a dup of bug 1240757, as you had originally speculated, because you do not seem to have any bind9 or dnsmasq listening on 10.0.3.1.

Please attach your /etc/default/lxc-net file. Your lxc-net.log file showed an error opening /etc/lxc/dnsmasq.conf, but the line which would set that configuration file is commented out in the shipped /etc/default/lxc-net.

Changed in lxc (Ubuntu):
status: Confirmed → Incomplete
Yonatan Yehezkeally (yonatany) wrote :

Well, I can't attest to Dan Kegel's setting, but this is my copy of `/etc/default/lxc-net`. Indeed, that line isn't commented out, and I don't have a `dnsmasq.conf` in `/etc/lxc`.
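
A minimal sketch of how this condition could be detected: look for an uncommented setting in the defaults file and check that its target exists. The variable name `LXC_DHCP_CONFILE` is an assumption based on the stock Ubuntu `/etc/default/lxc-net` (the thread only refers to "that line"), and `check_dhcp_confile` is a hypothetical helper.

```shell
# Hypothetical helper: report an uncommented LXC_DHCP_CONFILE setting whose
# target file does not exist. The variable name is assumed from the stock
# Ubuntu /etc/default/lxc-net; adjust it if your file differs.
check_dhcp_confile() {
    defaults_file=${1:-/etc/default/lxc-net}
    # Take the last uncommented assignment and strip optional quotes.
    conf=$(sed -n 's/^[[:space:]]*LXC_DHCP_CONFILE=//p' "$defaults_file" | tail -n 1 | tr -d "\"'")
    if [ -z "$conf" ]; then
        echo "LXC_DHCP_CONFILE not set"
    elif [ -e "$conf" ]; then
        echo "ok: $conf"
    else
        echo "missing: $conf"
        return 1
    fi
}
```

Running `check_dhcp_confile` with no argument inspects `/etc/default/lxc-net`; a "missing" result would match the situation described here, where the line is active but `/etc/lxc/dnsmasq.conf` does not exist.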

Yonatan Yehezkeally (yonatany) wrote :

This is odd. On advice from Flockport community I attempted to manually configure the bridge (its basic elements, at least) by entering
    sudo brctl addbr lxcbr0
    sudo ifconfig lxcbr0 10.0.3.1 netmask 255.255.255.0 up

This went just fine, with `ifconfig` reporting (beyond those interfaces already present in the attached file):
    lxcbr0 Link encap:Ethernet HWaddr 8e:8c:5c:47:3b:bd
              inet addr:10.0.3.1 Bcast:10.0.3.255 Mask:255.255.255.0
              inet6 addr: fe80::8c8c:5cff:fe47:3bbd/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:35 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 B) TX bytes:3632 (3.6 KB)

Note in particular the `inet addr`, which is the same one `lxc-net.log` claimed couldn't be used. However, recycling `lxc-net` fails, either with:
    server@srv-Ub1404:~$ sudo restart lxc-net
    restart: Job failed to restart
Or
    server@srv-Ub1404:~$ sudo stop lxc-net
    stop: Job failed while stopping
    server@srv-Ub1404:~$ sudo start lxc-net
    lxc-net start/running

In both cases, the bridge is gone from the output of `ifconfig` as a result. `/var/log/upstart/lxc-net.log` is exactly as it is in the attached version (complaining about `dnsmasq.conf`, which is understandable, but also about 10.0.3.1 already being in use).

Serge Hallyn (serge-hallyn) wrote :

Perhaps the "already in use" message being printed is a bug in the error reporting on not finding the dnsmasq.conf file?

Please comment out that line in /etc/default/lxc-net, and let us know whether that helps.

Yonatan Yehezkeally (yonatany) wrote :

Well, it does help `lxc-net` create the bridge (shown on `brctl show` and `ifconfig`), but apparently not to configure it appropriately. `lxc-net` is still not listed when I run `service --status-all`, and attempting to start a container returns:
    server@srv-Ub1404:~$ lxc-start -n u1
    Quota reached
    lxc_container: failed to create the configured network
    lxc_container: failed to spawn 'u1'
    lxc_container: The container failed to start.
    lxc_container: Additional information can be obtained by setting the --logfile and --logpriority options.

The log for `lxc-net` (attached) still contains the same errors (odd; why would it look for `dnsmasq.conf`?), but now it also contains something about a bad IPTABLES rule (I'm not sure how to check which one).

Yonatan Yehezkeally (yonatany) wrote :

Wait, 'lxc-net' does show as `running` when I run `initctl list`. I guess that's what I get for not checking if the manuals I've been using are 100% compatible with my distro.
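
Since `lxc-net` is an Upstart job on 14.04, its state appears in `initctl list` rather than in `service --status-all`. A small sketch for pulling one job's state out of that listing; `job_state` is a hypothetical helper, and it assumes the simple `name goal/state[, process PID]` line format seen above (instance jobs with a parenthesised instance name are not handled).

```shell
# Hypothetical helper: read `initctl list` output on stdin and print the
# goal/state (for example "start/running") of the named job.
job_state() {
    awk -v job="$1" '$1 == job { sub(/,$/, "", $2); print $2 }'
}

# Usage:
#   initctl list | job_state lxc-net
```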

Regardless, it still gives those errors, and starting containers fails with that seemingly network-related message. Attempting to start with the `-l trace -o debug.out` flags doesn't give me any new information, but perhaps I'm missing something there. In particular, I don't see what 'quota' it is that was reached; could that be relevant?

Serge Hallyn (serge-hallyn) wrote :

Ok, so we're now at a different bug. Could you please again show what

sudo brctl show
ifconfig -a

show, and the container configuration file? You're starting the container
as root? (If not, please show your /etc/lxc/lxc-usernet file contents.)

Yonatan Yehezkeally (yonatany) wrote :

Aha! It was an unprivileged container (I used to get the same error messages with privileged and unprivileged containers), and it turns out in one of my purge - install cycles I forgot to add "$USER veth lxcbr0 2" to `/etc/lxc/lxc-usernet`. I did that (it's now the only line there after the first commented title line), and now I can start the container! (I rechecked: privileged containers also start.)
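
The "Quota reached" message can be checked against the file directly. A minimal sketch, assuming the `user type bridge count` line format shown in the entry above; `usernet_quota` is a hypothetical helper and it ignores `@group` entries.

```shell
# Hypothetical helper: print the veth quota that an lxc-usernet style file
# grants to a user on a bridge, or 0 if no matching line exists.
# Arguments: user, bridge, optional file (defaults to /etc/lxc/lxc-usernet).
usernet_quota() {
    awk -v user="$1" -v bridge="$2" '
        /^[[:space:]]*#/ { next }                 # skip comment lines
        $1 == user && $2 == "veth" && $3 == bridge { print $4; found = 1; exit }
        END { if (!found) print 0 }
    ' "${3:-/etc/lxc/lxc-usernet}"
}

# Usage:
#   usernet_quota "$USER" lxcbr0
```

A result of 0 for the user starting an unprivileged container would be consistent with the failure seen earlier.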

In both cases, however, the network is unreachable from within the container, as evidenced by the response `connect: Network is unreachable` when I ping my host and/or router. I guess that's not surprising considering that `lxc-net` still returns all of these error messages.

However, that's truly a separate issue now, so it's worth summing up that the uncommented line in `/etc/default/lxc-net` (which was installed that way, for some reason) was the cause of this bug on my machine. I'm not sure what to do regarding the new issue (I haven't googled about it yet), so I'll include what you've asked for just in case you spot an easy fix. (BTW, I don't know if this belongs on this medium, but you have my sincere thanks for your help. It was invaluable.)

    server@srv-Ub1404:~$ sudo brctl show
    bridge name bridge id STP enabled interfaces
    lxcbr0 8000.fedfa607a667 no vethA4H3U6

    server@srv-Ub1404:~$ ifconfig -a
    eth0 Link encap:Ethernet HWaddr 00:1b:fc:8e:95:ba
              inet addr:192.168.2.10 Bcast:192.168.2.255 Mask:255.255.255.0
              inet6 addr: fe80::21b:fcff:fe8e:95ba/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
              RX packets:7417201 errors:0 dropped:0 overruns:0 frame:0
              TX packets:1758168 errors:0 dropped:0 overruns:0 carrier:1
              collisions:0 txqueuelen:1000
              RX bytes:10608089410 (10.6 GB) TX bytes:1646751692 (1.6 GB)
    lo Link encap:Local Loopback
              inet addr:127.0.0.1 Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING MTU:65536 Metric:1
              RX packets:42 errors:0 dropped:0 overruns:0 frame:0
              TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:2867 (2.8 KB) TX bytes:2867 (2.8 KB)
    lxcbr0 Link encap:Ethernet HWaddr fe:df:a6:07:a6:67
              inet addr:10.0.3.1 Bcast:10.0.3.255 Mask:255.255.255.0
              inet6 addr: fe80::98d6:4cff:fe27:2ea2/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
              RX packets:74 errors:0 dropped:0 overruns:0 frame:0
              TX packets:53 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:19844 (19.8 KB) TX bytes:7380 (7.3 KB)
    vethA4H3U6 Link encap:Ethernet HWaddr fe:df:a6:07:a6:67
              inet6 addr: fe80::fcdf:a6ff:fe07:a667/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
              RX packets:47 errors:0 dropped:0 overruns:0 frame:0
              TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:13...


Serge Hallyn (serge-hallyn) wrote :

Thanks for the update. So I'm still really curious how that defaults file got so confused in the first place... It also may have something to do with why you're still having trouble.

If dnsmasq is still badly installed then that would explain your remaining issue.

What does:

sudo netstat -nr
ifconfig -a
cat /etc/

show in the container, and what does

ps -ef | grep dnsmasq

show on the host?

If the container's eth0 does not have an IP address, then what happens if you do the following?

sudo ifconfig eth0 10.0.3.10 netmask 255.255.255.0
sudo route add -net default dev eth0
ping 10.0.3.1

Yonatan Yehezkeally (yonatany) wrote :

Okay. FYI I went ahead and defined my own bridge to let containers communicate directly with my router's dhcp server, but I've created and started a new privileged container `pr1` and made sure it uses `lxcbr0` and not my `br0`. Still, it's possible I'll have to tell `lxcbr0` to plug into `br0` rather than `eth0`, as `eth0` doesn't get an IP anymore on my machine, and if so I'll need some help doing that. Anyhow:

From within container `pr1`:
    root@pr1:/# netstat -nr
    Kernel IP routing table
    Destination Gateway Genmask Flags MSS Window irtt Iface
    root@pr1:/# ifconfig -a
    eth0 Link encap:Ethernet HWaddr 00:16:3e:b2:93:2b
              inet6 addr: fe80::216:3eff:feb2:932b/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
              RX packets:10 errors:0 dropped:0 overruns:0 frame:0
              TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:1177 (1.1 KB) TX bytes:7656 (7.6 KB)

    lo Link encap:Local Loopback
              inet addr:127.0.0.1 Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING MTU:65536 Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
    root@pr1:/# cat /etc/
    cat: /etc/: Is a directory

But then after following your other directions:
    root@pr1:/# ifconfig eth0 10.0.3.10 netmask 255.255.255.0
    root@pr1:/# sudo route add -net default dev eth0
    root@pr1:/# ping 10.0.3.1
    PING 10.0.3.1 (10.0.3.1) 56(84) bytes of data.
    64 bytes from 10.0.3.1: icmp_seq=1 ttl=64 time=0.102 ms
    64 bytes from 10.0.3.1: icmp_seq=2 ttl=64 time=0.032 ms
    64 bytes from 10.0.3.1: icmp_seq=3 ttl=64 time=0.049 ms

Even better, if instead I do
    root@pr1:/# sudo route add -host 10.0.3.1 dev eth0
    root@pr1:/# route add -net default gw 10.0.3.1 dev eth0
then I can now ping to my LAN from within the container, and also ping out to 8.8.8.8. At this point I have:
    root@pr1:/# netstat -rn
    Kernel IP routing table
    Destination Gateway Genmask Flags MSS Window irtt Iface
    0.0.0.0 10.0.3.1 0.0.0.0 UG 0 0 0 eth0
    10.0.3.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
    10.0.3.1 0.0.0.0 255.255.255.255 UH 0 0 0 eth0
    root@pr1:/# ifconfig -a
    eth0 Link encap:Ethernet HWaddr 00:16:3e:b2:93:2b
              inet addr:10.0.3.10 Bcast:10.0.3.255 Mask:255.255.255.0
              inet6 addr: fe80::216:3eff:feb2:932b/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
              RX packets:30 errors:0 dropped:0 overruns:0 frame:0
              TX packets:80 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:3579 (3.5 KB) TX bytes:19728 (19.7 KB)
    lo Link encap:Local Loopback
              inet addr:127.0.0.1 Mask:255.0.0.0
      ...


Serge Hallyn (serge-hallyn) wrote :

> Finally, on my host I get:
> root@srv-Ub1404:~# ps -ef | grep dnsmasq
> lxc-dns+ 11309 1 0 Jan15 ? 00:00:00 dnsmasq -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/run/lxc/dnsmasq.pid --conf-file= --listen-address 10.0.3.1 --dhcp-range 10.0.1.2,10.0.1.254 --dhcp-lease-max=253 --dhcp-no-override --except-interface=lo --interface=lxcbr0 --dhcp-leasefile=/var/lib/misc/dnsmasq.lxcbr0.leases --dhcp-authoritative

so your dnsmasq is listening on 10.0.3.1 but serving 10.0.1.2..10.0.1.254. Is that by design, or should it be serving 10.0.3.2..10.0.3.254?
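
The mismatch can be checked mechanically by comparing network prefixes. A sketch under the assumption that the bridge uses the 255.255.255.0 netmask shown in the earlier `ifconfig` output; `ip_to_int` and `same_subnet` are hypothetical helper names.

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Hypothetical helper: succeed if address $1 is in the same network as
# reference address $2 under netmask $3.
same_subnet() {
    addr=$(ip_to_int "$1"); ref=$(ip_to_int "$2"); mask=$(ip_to_int "$3")
    [ $(( addr & mask )) -eq $(( ref & mask )) ]
}

# Usage:
#   same_subnet 10.0.1.2 10.0.3.1 255.255.255.0 || echo "DHCP range is off the bridge subnet"
```

With the values from the `ps` output above, the start of the DHCP range (10.0.1.2) does not share a /24 with the listen address (10.0.3.1), so leases handed out from that range would not be reachable through lxcbr0 as configured.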

Since you're using a non-standard setup anyway, you should be able to
get dns in your containers by putting

nameserver 10.0.3.1

into /etc/resolv.conf, or else

nameserver 8.8.8.8

Yonatan Yehezkeally (yonatany) wrote :

Not by design, no. Any of that, really, since I wouldn't have known where to go to make these alterations if I wanted to (well, now that you pointed that out, I should be able to reverse-engineer it and see if it at least assigns IP addresses to containers automatically that way). It's a pretty new installation of Ubuntu server; I can't think of any non-standard thing I did on that machine that might've caused all this mess.

By the way, adding `nameserver 8.8.8.8` to /etc/resolv.conf made no difference. Putting it in /etc/resolvconf/resolv.conf.d/base and running `resolvconf -u` resulted in requests hanging for a few seconds before returning the same message, that the address could not be resolved. All the while I can ping 8.8.8.8 just fine.

Launchpad Janitor (janitor) wrote :

[Expired for lxc (Ubuntu) because there has been no activity for 60 days.]

Changed in lxc (Ubuntu):
status: Incomplete → Expired
Alexander (zukoff-f) wrote :

bind9 and dnsmasq both try to listen on the same port over IPv6 on lxcbr0:
lxcbr0 Link encap:Ethernet HWaddr xxx
          inet addr:10.0.3.1 Bcast:0.0.0.0 Mask:255.255.255.0
          inet6 addr: fe80::dc81:d8ff:feea:14b0/64 Scope:Link

tcp6 0 0 fe80::dc81:d8ff:feea:53 :::* LISTEN 9988/dnsmasq

The problem is in bind9 itself: it occupies the port before dnsmasq can, because of this setting:
        listen-on-v6 { any; };

My solution was to explicitly set the IPv6 addresses that bind9 should listen on.
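
A quick check for the wildcard setting, as a sketch: `listens_on_any_v6` is a hypothetical helper, and the default path `/etc/bind/named.conf.options` is an assumption about where the Ubuntu bind9 package keeps its options block.

```shell
# Hypothetical helper: succeed if the given BIND options file contains an
# uncommented `listen-on-v6 { any; };` line.
listens_on_any_v6() {
    grep -Eq '^[[:space:]]*listen-on-v6[[:space:]]*[{][[:space:]]*any;[[:space:]]*[}];' \
        "${1:-/etc/bind/named.conf.options}"
}

# Usage:
#   listens_on_any_v6 && echo "bind9 will bind every IPv6 address, including lxcbr0's"
```

Replacing `any;` with an explicit address list (for example `::1;` plus the host's own addresses) is the kind of change described above; which addresses to list depends on the host.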

Changed in lxc (Ubuntu):
importance: High → Low
status: Expired → Triaged
Robie Basak (racb) wrote :

I don't really understand the bug task on bind9 packaging here. If you install the bind9 package, then surely you expect it to listen on IPv6 ports by default? If you don't want this, then presumably you need to reconfigure bind9?

Setting the bind9 package task Incomplete because as far as I can see this is expected behaviour from the bind9 package.

Changed in bind9 (Ubuntu):
status: New → Incomplete
Serge Hallyn (serge-hallyn) wrote :

Actually I was thinking this was bug 1240757.

@zukoff-f: your comment #25 would be more appropriate there. That's where we should discuss how to have bind9, dnsmasq, and lxc cooperate.

This bug seems to have evidence from several different causes. yoniyo0's was a bad lxc-usernet file. I'm not sure what Dan Kegel's actually was. I'm going to mark this bug invalid for lxc and not affecting bind9, as I don't know that we can sanely identify the actual original problem.

As for bug 1240757, perhaps our stock bind9 packaging should simply exclude anything beginning with 'lxc', or only lxcbr0 (since users may define a lxcbr2 and want bind9 to listen on it).

no longer affects: bind9 (Ubuntu)
Changed in lxc (Ubuntu):
status: Triaged → Invalid