vivid container's networking.service fails on boot with signal=PIPE

Bug #1452601 reported by Chris West on 2015-05-07
This bug affects 5 people
Affects: lxc (Ubuntu)
Importance: High
Assigned to: Unassigned

Bug Description

When starting a Vivid container, it fails to get an IP address. It believes networking.service was successful, but actually it dies with SIGPIPE. Restarting networking.service gets an IP, as expected.

Starting networking used to work with pre-vivid containers. I'm reasonably sure this fails 100% of the time. Limited user container, very standard setup (no unnecessary config; cgmanager and lxcfs installed), btrfs filesystem but not btrfs-backed (as it's limited user), ...

root@vivid:/# systemctl status networking.service
● networking.service - LSB: Raise network interfaces.
   Loaded: loaded (/etc/init.d/networking)
  Drop-In: /run/systemd/generator/networking.service.d
           └─50-insserv.conf-$network.conf
        /lib/systemd/system/networking.service.d
           └─systemd.conf
   Active: active (exited) since Thu 2015-05-07 07:54:48 UTC; 9s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 459 ExecStart=/etc/init.d/networking start (code=killed, signal=PIPE)

root@vivid:/# systemctl restart networking.service
root@vivid:/# systemctl status networking.service
● networking.service - LSB: Raise network interfaces.
   Loaded: loaded (/etc/init.d/networking)
  Drop-In: /run/systemd/generator/networking.service.d
           └─50-insserv.conf-$network.conf
        /lib/systemd/system/networking.service.d
           └─systemd.conf
   Active: active (running) since Thu 2015-05-07 07:56:38 UTC; 2s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 992 ExecStop=/etc/init.d/networking stop (code=exited, status=0/SUCCESS)
  Process: 1033 ExecStart=/etc/init.d/networking start (code=exited, status=0/SUCCESS)
   CGroup: /user.slice/user-1000.slice/session-c2.scope/lxc/vivid/system.slice/networking.service
           ├─1096 dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
           ├─1106 /bin/sh /etc/network/if-up.d/ntpdate
           ├─1109 lockfile-touch /var/lock/ntpdate-ifup
           ├─1125 /bin/sh /etc/network/if-up.d/ntpdate
           ├─1128 lockfile-create /var/lock/ntpdate-ifup
           └─1146 /usr/sbin/ntpdate -s ntp.ubuntu.com

root@vivid:/# ip a
...
22: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
...
    inet 10.0.3.102/24 brd 10.0.3.255 scope global eth0
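
The tell-tale sign in the first status output is `code=killed, signal=PIPE` on the ExecStart line while the unit still reports `active (exited)`. A minimal sketch for spotting that from a script, assuming the status text has the same layout as the output above; `execstart_kill_signal` is a hypothetical helper name, not part of systemd or lxc:

```shell
# Hypothetical helper (not part of systemd or lxc): reads `systemctl status`
# text on stdin and prints the signal name if the ExecStart process was
# killed by a signal. Prints nothing when ExecStart exited normally.
execstart_kill_signal() {
    sed -n 's/.*ExecStart=.*(code=killed, signal=\([A-Z0-9]*\)).*/\1/p'
}
```

Inside the container it could be fed with `systemctl status networking.service | execstart_kill_signal`; an empty result would mean no ExecStart line reported a kill signal.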

ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: lxc 1.1.2-0ubuntu3
ProcVersionSignature: Ubuntu 3.19.0-16.16-generic 3.19.3
Uname: Linux 3.19.0-16-generic x86_64
NonfreeKernelModules: nvidia
ApportVersion: 2.17.2-0ubuntu1
Architecture: amd64
Date: Thu May 7 08:53:02 2015
SourcePackage: lxc
UpgradeStatus: No upgrade log present (probably fresh install)
defaults.conf:
 lxc.network.type = veth
 lxc.network.link = lxcbr0
 lxc.network.flags = up
 lxc.network.hwaddr = 00:16:3e:xx:xx:xx

Chris West (faux) wrote :
Serge Hallyn (serge-hallyn) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in lxc (Ubuntu):
status: New → Confirmed
Changed in lxc (Ubuntu):
importance: Undecided → High
Chris West (faux) wrote :

No, that proposed work around seems to make no difference.

Joseph Bisch (josephbisch) wrote :

This bug affects gitian-builder when used with lxc. Gitian-builder is the reproducible build tool used by Bitcoin Core and other projects. The relevant gitian-builder bug is https://github.com/devrandom/gitian-builder/issues/95.

I have confirmed this bug with a 15.04 chroot. As you can see from the gitian-builder issue, GitHub user @droark also experienced the issue when using 15.04.

The relevant output from my 15.04 chroot:

root@debian:~/gitian-builder# make-clean-vm --suite trusty --arch amd64
sudo: unable to resolve host debian
lxc-execute: conf.c: instantiate_veth: 2660 failed to attach 'vethU5U6JE' to the bridge 'br0': Operation not permitted
lxc-execute: conf.c: lxc_create_network: 2943 failed to create netdev
lxc-execute: start.c: lxc_spawn: 914 failed to create the network
lxc-execute: start.c: __lxc_start: 1164 failed to spawn 'gitian'

rrva (ragnar-rova) wrote :

For me, it was a matter of creating an empty /etc/dnsmasq.conf so that lxc-net.service could start, thus creating lxcbr0

David Favor (davidfavor) wrote :

IMHO, systemd seems to be the root of all evil.

touch /etc/dnsmasq.conf has no effect.

net4-dev# systemctl restart lxc-net
net4-dev# systemctl status lxc-net
● lxc-net.service - LXC network bridge setup
   Loaded: loaded (/lib/systemd/system/lxc-net.service; enabled; vendor preset: enabled)
   Active: active (exited) since Fri 2015-08-28 18:20:16 CDT; 9s ago
  Process: 15219 ExecStop=/usr/lib/x86_64-linux-gnu/lxc/lxc-net stop (code=exited, status=0/SUCCESS)
  Process: 15223 ExecStart=/usr/lib/x86_64-linux-gnu/lxc/lxc-net start (code=exited, status=0/SUCCESS)
 Main PID: 15223 (code=exited, status=0/SUCCESS)

Shows lxc-net reported status of success + dnsmasq is never run.

There's just so much wrong here...

The entire /usr/lib/x86_64-linux-gnu/lxc/lxc-net script seems to require a rethink.

The problem is that whoever wrote this imagined the Linux runtime environment always works sensibly (no edge conditions).

Many times it doesn't.

A simple example: if the OOM (Out of Memory) killer runs + kills dnsmasq, then much of the "state" (files/directories) lxc-net depends on is out of sync, because there's no test for dnsmasq actually running. There's only a test for the lxcbr0 state files existing. This is an insufficient approach.

I think the solution is to rewrite stop() to handle any edge condition, so lxc-net can recover from normal errors without manual intervention.

I'll take a stab at a rewrite + if my code isn't too embarrassing, I'll post it.
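
A liveness test for dnsmasq, as opposed to a test for state files only, could look roughly like this. It is a sketch, not the lxc-net code: `dnsmasq_alive` is a hypothetical helper, and the pid-file location passed to it is whatever path the dnsmasq instance was started with (an assumption to verify against the script).

```shell
# Hypothetical helper: succeeds only if the pid file exists, holds a numeric
# pid, and a process with that pid is currently alive.
# Note: kill -0 needs permission to signal the target, so run this as root
# when checking a root-owned dnsmasq.
dnsmasq_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile") || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
    esac
    kill -0 "$pid" 2>/dev/null
}
```

This only shows that some process owns the recorded pid; it does not prove that the process is dnsmasq, so a stricter check would also compare the process name.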

David Favor (davidfavor) wrote :

Okay, a simple fix that seems to work (at least in my case), is to comment out the first two guards in the stop function.

So...

#[ "x$USE_LXC_BRIDGE" = "xtrue" ] || { exit 0; }
#[ -f "${varrun}/network_up" ] || { exit 0; }

Once these are skipped, the code seems to work.

David Favor (davidfavor) wrote :

Oh... and - touch /etc/dnsmasq.conf - is still required, so this is another bug to be fixed.

David Favor (davidfavor) wrote :

Oh, I see...

I was using OOM as an example. In this case, it appears the missing /etc/dnsmasq.conf file might be the culprit.

So first start of lxc-net, dnsmasq never starts.

After that, it will never start, unless a hard reboot is done.

Christian Mayer (mifix) wrote :

Removing the 2 lines suggested by davidfavor actually fixed my problem, after restarting lxc-net.

(/etc/dnsmasq.conf was never empty for me)

Serge Hallyn (serge-hallyn) wrote :

1. Do you need to remove both lines, or is only removing the second line sufficient?

2. Does anyone still get this on wily? My wily laptop and vms have no problem.

David Favor (davidfavor) wrote :

I had to remove both lines, in my case.

And this is likely a less than correct fix. I'm under the gun to get a client machine I'm hosting set up to run many of their sites in LXC containers, so the time I invested looking at this fix was minimal.

Likely the entire stop function is best revisited, to allow functioning across dnsmasq failing to start or dying.

David Favor (davidfavor) wrote :

Just setting up a fresh Vivid LXC host machine + I was incorrect above.

There's no requirement for /etc/dnsmasq.conf to exist for lxc-net to start correctly.

There is a requirement though for /etc/resolv.conf to reference at least one valid name server.
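
A quick way to check that precondition on the host, as a sketch: `has_nameserver` is a hypothetical helper that only tests whether the file contains a `nameserver` line with an argument. It does not verify that the server actually answers.

```shell
# Hypothetical helper: succeeds if the given resolv.conf-style file
# (default: /etc/resolv.conf) has at least one "nameserver <something>" line.
has_nameserver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "${1:-/etc/resolv.conf}"
}
```

For example, `has_nameserver || echo "no nameserver configured"` could be run before restarting lxc-net.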

Serge Hallyn (serge-hallyn) wrote :

I just created a vivid container on a vivid host, and got:

root@v1:~# systemctl status networking.service
● networking.service - LSB: Raise network interfaces.
   Loaded: loaded (/etc/init.d/networking)
  Drop-In: /run/systemd/generator/networking.service.d
           └─50-insserv.conf-$network.conf
        /lib/systemd/system/networking.service.d
           └─systemd.conf
   Active: active (running) since Mon 2015-09-07 16:27:41 UTC; 45s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 422 ExecStart=/etc/init.d/networking start (code=exited, status=0/SUCCESS)
   CGroup: /lxc/v1/system.slice/networking.service
           └─591 dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
Sep 07 16:27:41 v1 dhclient[485]: DHCPREQUEST of 10.0.3.150 on eth0 to 255.255.255.255 port 67 (xid=0x77b9ff3a)
Sep 07 16:27:41 v1 networking[422]: DHCPREQUEST of 10.0.3.150 on eth0 to 255.255.255.255 port 67 (xid=0x77b9ff3a)
Sep 07 16:27:41 v1 networking[422]: DHCPOFFER of 10.0.3.150 from 10.0.3.1
Sep 07 16:27:41 v1 dhclient[485]: DHCPOFFER of 10.0.3.150 from 10.0.3.1
Sep 07 16:27:41 v1 dhclient[485]: DHCPACK of 10.0.3.150 from 10.0.3.1
Sep 07 16:27:41 v1 networking[422]: DHCPACK of 10.0.3.150 from 10.0.3.1
Sep 07 16:27:41 v1 networking[422]: bound to 10.0.3.150 -- renewal in 1680 seconds.
Sep 07 16:27:41 v1 networking[422]: ...done.
Sep 07 16:27:41 v1 systemd[1]: Started LSB: Raise network interfaces..
Sep 07 16:28:02 v1 ntpdate[966]: Can't adjust the time of day: Operation not permitted

So to me this appears to be fix-released.

Can you please show exactly what is still failing for you?

Kevin Dalley (nereocystis) wrote :

I upgraded to Wily, and the problem has shown up again. I haven't managed to fix it yet under Wily, though I could fix it under Vivid.

Serge Hallyn (serge-hallyn) wrote :

@Kevin,

could you please give some more details? In particular, release of both host and container, where exactly it fails, and the relevant journalctl output.

Kevin Dalley (nereocystis) wrote :

Host is:

Linux awabi 4.2.0-16-generic #19-Ubuntu SMP Thu Oct 8 15:35:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

container is:

Linux escalebuild 3.16.0-51-generic #69~14.04.1-Ubuntu SMP Wed Oct 7 15:32:41 UTC 2015 x86_64 GNU/Linux

here is output from my attempt to start:

kevin@awabi:~$ sudo lxc-start -n escale_build --logfile /tmp/lxc-log --logpriority 3
lxc-start: lxc_start.c: main: 344 The container failed to start.
lxc-start: lxc_start.c: main: 346 To get more details, run the container in foreground mode.
lxc-start: lxc_start.c: main: 348 Additional information can be obtained by setting the --logfile and --logpriority options.
kevin@awabi:~$ less /tmp/lxc-log
kevin@awabi:~$ sudo lxc-start -n escale_build --logfile /tmp/lxc-log --logpriority 3 -F
lxc-start: conf.c: instantiate_veth: 2621 failed to attach 'vethJ6CTXM' to the bridge 'lxcbr0': Operation not permitted
lxc-start: conf.c: lxc_create_network: 2904 failed to create netdev
lxc-start: start.c: lxc_spawn: 920 failed to create the network
lxc-start: start.c: __lxc_start: 1172 failed to spawn 'escale_build'
lxc-start: lxc_start.c: main: 344 The container failed to start.

Serge Hallyn (serge-hallyn) wrote :

Hi,

you're actually getting EPERM, which means lxcbr0 exists. Please show the output of:

sudo lxc-start -n escale_build -F -l trace -o /dev/stdout

sudo brctl show
sudo ifconfig -a
sudo journalctl -u lxc-net
sudo systemd-detect-virt

Kevin Dalley (nereocystis) wrote :

Thanks.

kevin@awabi:~$ sudo lxc-start -n escale_build -F -l trace -o /dev/stdout
      lxc-start 1446650865.960 INFO lxc_start_ui - lxc_start.c:main:264 - using rcfile /var/lib/lxc/escale_build/config
      lxc-start 1446650865.961 WARN lxc_confile - confile.c:config_pivotdir:1801 - lxc.pivotdir is ignored. It will soon become an error.
      lxc-start 1446650865.961 WARN lxc_cgmanager - cgmanager.c:cgm_get:993 - do_cgm_get exited with error
      lxc-start 1446650865.961 INFO lxc_lsm - lsm/lsm.c:lsm_init:48 - LSM security driver AppArmor
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:324 - processing: .reject_force_umount # comment this to allow umount -f; not recommended.
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:426 - Adding native rule for reject_force_umount action 0
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:do_resolve_add_rule:216 - Setting seccomp rule to reject force umounts

      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:429 - Adding compat rule for reject_force_umount action 0
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:do_resolve_add_rule:216 - Setting seccomp rule to reject force umounts

      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:324 - processing: .[all].
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:324 - processing: .kexec_load errno 1.
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:426 - Adding native rule for kexec_load action 327681
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:429 - Adding compat rule for kexec_load action 327681
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:324 - processing: .open_by_handle_at errno 1.
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:426 - Adding native rule for open_by_handle_at action 327681
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:429 - Adding compat rule for open_by_handle_at action 327681
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:324 - processing: .init_module errno 1.
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:426 - Adding native rule for init_module action 327681
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:429 - Adding compat rule for init_module action 327681
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:324 - processing: .finit_module errno 1.
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:426 - Adding native rule for finit_module action 327681
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:429 - Adding compat rule for finit_module action 327681
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:324 - processing: .delete_module errno 1.
      lxc-start 1446650865.961 INFO lxc_seccomp - seccomp.c:parse_config_v2:426 - Adding native rule for d...


Serge Hallyn (serge-hallyn) wrote :

Thanks, what about

sudo brctl show
sudo ifconfig -a
sudo journalctl -u lxc-net
sudo systemd-detect-virt

Kevin Dalley (nereocystis) wrote :

Thanks.

kevin@awabi:~$ sudo brctl show
bridge name bridge id STP enabled interfaces
virbr0 8000.52540063031d yes virbr0-nic
kevin@awabi:~$ sudo ifconfig -a
lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:410820 errors:0 dropped:0 overruns:0 frame:0
          TX packets:410820 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:43000462 (43.0 MB) TX bytes:43000462 (43.0 MB)

virbr0 Link encap:Ethernet HWaddr 52:54:00:63:03:1d
          inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

virbr0-nic Link encap:Ethernet HWaddr 52:54:00:63:03:1d
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

wlan0 Link encap:Ethernet HWaddr 34:68:95:ee:9f:bb
          inet addr:10.23.28.210 Bcast:10.23.28.255 Mask:255.255.255.0
          inet6 addr: fdca:f995:220a:8005:3668:95ff:feee:9fbb/64 Scope:Global
          inet6 addr: fe80::3668:95ff:feee:9fbb/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1280 Metric:1
          RX packets:8661210 errors:0 dropped:0 overruns:0 frame:5987785
          TX packets:13518413 errors:10494 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4112935761 (4.1 GB) TX bytes:1210222618 (1.2 GB)
          Interrupt:19

kevin@awabi:~$ sudo journalctl -u lxc-net
-- No entries --
kevin@awabi:~$ sudo systemd-detect-virt
none

Serge Hallyn (serge-hallyn) wrote :

Ok, so the error msg is simply misleading - it says 'permission denied', but the bridge does not exist.

Can you please show:

sudo /usr/lib/x86_64-linux-gnu/lxc/lxc-net stop
sudo /usr/lib/x86_64-linux-gnu/lxc/lxc-net start
sudo brctl show

and see if your container now starts?

Please also paste /etc/default/lxc-net.
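
Since the "Operation not permitted" message can appear even when the bridge is missing, it may be worth checking for the bridge directly before chasing permissions. A minimal sketch, assuming a Linux host with sysfs mounted; `bridge_exists` is a hypothetical helper, and its optional second argument exists only so the check can be pointed at a different sysfs root:

```shell
# Hypothetical helper: succeeds if a bridge device with the given name exists.
# A bridge device exposes a "bridge" directory under its sysfs entry.
bridge_exists() {
    name=$1
    sysroot=${2:-/sys}
    [ -d "$sysroot/class/net/$name/bridge" ]
}
```

For example, `bridge_exists lxcbr0 || echo "lxcbr0 is missing"`.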

Kevin Dalley (nereocystis) wrote :

Thanks again.

kevin@awabi:~/tmp$ sudo /usr/lib/x86_64-linux-gnu/lxc/lxc-net stop
kevin@awabi:~/tmp$ sudo /usr/lib/x86_64-linux-gnu/lxc/lxc-net start

dnsmasq: failed to create listening socket for 10.0.3.1: Cannot assign requested address
Failed to setup lxc-net.
kevin@awabi:~/tmp$ sudo brctl show
bridge name bridge id STP enabled interfaces
virbr0 8000.52540063031d yes virbr0-nic

No luck with starting it.

Here is /etc/default/lxc-net

# This file is auto-generated by lxc.postinst if it does not
# exist. Customizations will not be overridden.
# Leave USE_LXC_BRIDGE as "true" if you want to use lxcbr0 for your
# containers. Set to "false" if you'll use virbr0 or another existing
# bridge, or mavlan to your host's NIC.
USE_LXC_BRIDGE="true"
# USE_LXC_BRIDGE="false"

# If you change the LXC_BRIDGE to something other than lxcbr0, then
# you will also need to update your /etc/lxc/default.conf as well as the
# configuration (/var/lib/lxc/<container>/config) for any containers
# already created using the default config to reflect the new bridge
# name.
# If you have the dnsmasq daemon installed, you'll also have to update
# /etc/dnsmasq.d/lxc and restart the system wide dnsmasq daemon.
LXC_BRIDGE="lxcbr0"
LXC_ADDR="10.0.3.1"
LXC_NETMASK="255.255.255.0"
LXC_NETWORK="10.0.3.0/24"
LXC_DHCP_RANGE="10.0.3.2,10.0.3.254"
LXC_DHCP_MAX="253"
# Uncomment the next line if you'd like to use a conf-file for the lxcbr0
# dnsmasq. For instance, you can use 'dhcp-host=mail1,10.0.3.100' to have
# container 'mail1' always get ip address 10.0.3.100.
#LXC_DHCP_CONFILE=/etc/lxc/dnsmasq.conf

# Uncomment the next line if you want lxcbr0's dnsmasq to resolve the .lxc
# domain. You can then add "server=/lxc/10.0.3.1' (or your actual )
# to /etc/dnsmasq.conf, after which 'container1.lxc' will resolve on your
# host.
#LXC_DOMAIN="lxc"

Serge Hallyn (serge-hallyn) wrote :

You're still getting

 dnsmasq: failed to create listening socket for 10.0.3.1: Cannot assign requested address

What does

 sudo netstat -lap| grep LISTEN

show?

Kevin Dalley (nereocystis) wrote :

kevin@awabi:~/notes/security$ sudo netstat -lap| grep LISTEN
[sudo] password for kevin:
tcp 0 0 usscc-mvetter.co:domain *:* LISTEN 1192/named
tcp 0 0 192.168.43.174:domain *:* LISTEN 1192/named
tcp 0 0 192.168.122.1:domain *:* LISTEN 32446/dnsmasq
tcp 0 0 awabi:domain *:* LISTEN 3337/dnsmasq
tcp 0 0 localhost:domain *:* LISTEN 1192/named
tcp 0 0 *:ssh *:* LISTEN 1198/sshd
tcp 0 0 localhost:ipp *:* LISTEN 4504/cupsd
tcp 0 0 localhost:postgresql *:* LISTEN 1188/postgres
tcp 0 0 localhost:afs3-callback *:* LISTEN 10935/nxnode.bin
tcp 0 0 *:4505 *:* LISTEN 2455/python
tcp 0 0 localhost:953 *:* LISTEN 1192/named
tcp 0 0 *:4506 *:* LISTEN 2533/python
tcp 0 0 *:db-lsp *:* LISTEN 10312/dropbox
tcp 0 0 *:iscsi-target *:* LISTEN 1211/tgtd
tcp 0 0 *:35357 *:* LISTEN 4300/python
tcp 0 0 *:microsoft-ds *:* LISTEN 2963/smbd
tcp 0 0 *:19455 *:* LISTEN 16987/cli
tcp 0 0 localhost:17600 *:* LISTEN 10312/dropbox
tcp 0 0 *:4000 *:* LISTEN 3558/nxd
tcp 0 0 localhost:12001 *:* LISTEN 10935/nxnode.bin
tcp 0 0 *:zabbix-agent *:* LISTEN 4572/zabbix_agentd
tcp 0 0 localhost:20130 *:* LISTEN 2794/nxserver.bin
tcp 0 0 localhost:17603 *:* LISTEN 10312/dropbox
tcp 0 0 localhost:gpsd *:* LISTEN 1/init
tcp 0 0 *:5000 *:* LISTEN 4300/python
tcp 0 0 *:25672 *:* LISTEN 1237/beam.smp
tcp 0 0 *:8200 *:* LISTEN 2356/minidlnad
tcp 0 0 localhost:25001 *:* LISTEN 11005/nxclient.bin
tcp 0 0 localhost:27017 *:* LISTEN 991/mongod
tcp 0 0 localhost:mysql *:* LISTEN 2078/mysqld
tcp 0 0 *:netbios-ssn *:* LISTEN 2963/smbd
tcp 0 0 localhost:45708 *:* LISTEN 16987/cli
t...

Serge Hallyn (serge-hallyn) wrote :

D'oh! thanks for that info. You are running bind9, which is causing
the conflict. To work around this, you can tell bind9 to not listen
on 10.0.3.1 - see https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1240757
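
To pick the DNS listeners out of a long `netstat -lap` listing like the one above, something along these lines may help. It is a sketch: `dns_listeners` is a hypothetical helper, and it assumes the column layout shown above, where the port is printed either as `domain` or as `53`. Only TCP sockets in the LISTEN state are matched, so UDP sockets are not covered.

```shell
# Hypothetical helper: reads `netstat -lap` output on stdin and prints the
# unique PID/program entries that are listening on the DNS port over TCP.
dns_listeners() {
    awk '$6 == "LISTEN" && $4 ~ /:(domain|53)$/ { print $7 }' | sort -u
}
```

Usage would be `sudo netstat -lap | dns_listeners`; more than one program in the result suggests competing DNS servers on the host.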

Kevin Dalley (nereocystis) wrote :

Thanks for your help.

Unfortunately, I didn't have any luck with this either.

I added

 listen-on { ! 10.0.3.1; };

just before the final "}" line.
restarted bind9, and I have the same error message.

Perhaps I'm missing something.

Kevin Dalley (nereocystis) wrote :

I have now added

 listen-on-v6 { none; };
restarted bind9, and again nothing:

kevin@awabi:~/src$ sudo lxc-start -n escale_build --logfile /tmp/lxc-log --logpriority 3 -F
lxc-start: conf.c: instantiate_veth: 2621 failed to attach 'vethQCSVBP' to the bridge 'lxcbr0': Operation not permitted
lxc-start: conf.c: lxc_create_network: 2904 failed to create netdev
lxc-start: start.c: lxc_spawn: 920 failed to create the network
lxc-start: start.c: __lxc_start: 1172 failed to spawn 'escale_build'
lxc-start: lxc_start.c: main: 344 The container failed to start.
lxc-start: lxc_start.c: main: 348 Additional information can be obtained by setting the --logfile and --logpriority options.

I don't desperately need bind9, if that is really the problem.

Quoting Kevin Dalley (<email address hidden>):
> *** This bug is a duplicate of bug 1240757 ***
> https://bugs.launchpad.net/bugs/1240757
>
> I have now added
>
> listen-on-v6 { none; };
> restarted bind9, and again nothing:

Just to be sure - you did also retry stop+start of lxc-net after restarting
bind9, right?

Kevin Dalley (nereocystis) wrote :

I had missed restarting lxc-net.

Unfortunately, that didn't make any difference.

kevin@awabi:~$ sudo lxc-start -n escale_build --logfile /tmp/lxc-log --logpriority 3 -F
lxc-start: conf.c: instantiate_veth: 2621 failed to attach 'vethKW0GNU' to the bridge 'lxcbr0': Operation not permitted
lxc-start: conf.c: lxc_create_network: 2904 failed to create netdev
lxc-start: start.c: lxc_spawn: 920 failed to create the network
lxc-start: start.c: __lxc_start: 1172 failed to spawn 'escale_build'
lxc-start: lxc_start.c: main: 344 The container failed to start.
lxc-start: lxc_start.c: main: 348 Additional information can be obtained by setting the --logfile and --logpriority options.

Serge Hallyn (serge-hallyn) wrote :

What does

 sudo /usr/lib/x86_64-linux-gnu/lxc/lxc-net stop
 sudo /usr/lib/x86_64-linux-gnu/lxc/lxc-net start
 sudo netstat -lap| grep LISTEN

show now that you've updated bind9's configuration?

Kevin Dalley (nereocystis) wrote :

And suddenly, it has started working.

I don't know why.

I did a reboot
sudo systemctl status lxc-net

The above command suddenly had results which looked familiar, in a good way.

Before this, I had unsuccessfully tried a few of the sample lxc.

Perhaps I updated something which fixed the problems.

Thanks for your help.

I wish I knew exactly what I did.

Serge Hallyn (serge-hallyn) wrote :

@faux

do you still have this issue?

Kevin Dalley (nereocystis) wrote :

I suggest that README.Debian (or Ubuntu) mention the workarounds regarding bind9 and ipv6.

It might save some time in debugging lxc problems.

Serge Hallyn (serge-hallyn) wrote :

How about a comment in the /etc/default/lxc-net file?

I guess the question is - when it broke for you, which files did you first look at to try to fix it?

Kevin Dalley (nereocystis) wrote :

What I should do is try to back out some of my changes and determine whether I can duplicate the problem.

Then I can determine which changes matter and which ones don't.

I won't have time for a few days, but I will do my best.

Chris West (faux) wrote :

I haven't seen this issue for ages, using primarily sid and wily guests.

Trying again on my desktop, which definitely used to have the issue, I can't reproduce it, using:

2015-11-15 11:29:28 status installed liblxc1:amd64 1.1.5-0ubuntu0.15.10.2
2015-11-15 11:29:29 status installed python3-lxc:amd64 1.1.5-0ubuntu0.15.10.2
2015-11-15 11:29:30 status installed lxc:amd64 1.1.5-0ubuntu0.15.10.2

.. from wily-proposed.
