lxc-start fails after upgrade to raring

Bug #1100877 reported by Martin Albisetti
This bug affects 4 people
Affects: lxc (Ubuntu)
Status: Expired
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

After the upgrade to raring, my lxc containers no longer start:

beuno@beuno-desktop:~/canonical/ubuntuone$ sudo lxc-start -n u1-servers-precise
lxc-start: failed to attach 'vethYpmRvz' to the bridge 'lxcbr0' : No such device
lxc-start: failed to create netdev
lxc-start: failed to create the network
lxc-start: failed to spawn 'u1-servers-precise'
lxc-start: No such file or directory - failed to remove cgroup '/sys/fs/cgroup/cpuset//lxc/u1-servers-precise'

Upon inspection, the folder /sys/fs/cgroup/cpuset/lxc/ does not exist.

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: lxc 0.8.0~rc1-4ubuntu48
ProcVersionSignature: Ubuntu 3.8.0-0.4-generic 3.8.0-rc3
Uname: Linux 3.8.0-0-generic x86_64
ApportVersion: 2.8-0ubuntu1
Architecture: amd64
Date: Thu Jan 17 14:32:17 2013
EcryptfsInUse: Yes
InstallationDate: Installed on 2012-07-31 (169 days ago)
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Release amd64 (20120425)
MarkForUpload: True
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: lxc
UpgradeStatus: Upgraded to raring on 2012-10-03 (105 days ago)
lxcsyslog:

Martin Albisetti (beuno)
description: updated
Stéphane Graber (stgraber) wrote :

The error here is the absence of lxcbr0 on the system; the rest is just LXC trying to clean up.
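A quick way to confirm this diagnosis is to ask the kernel whether the bridge exists at all. The sketch below relies on the same /sys/class/net lookup that the lxc-net upstart job quoted later in this report uses. The function names `bridge_exists` and `report_bridge`, and the optional second argument (an alternative sysfs root for dry runs), are illustrative assumptions and not part of LXC.

```shell
#!/bin/sh
# bridge_exists NAME [SYSFS_NET_ROOT]
# Returns 0 if NAME is a network device that the kernel exposes as a bridge.
# SYSFS_NET_ROOT defaults to /sys/class/net; the override is a hypothetical
# convenience for testing, not an LXC option.
bridge_exists() {
    name=$1
    root=${2:-/sys/class/net}
    # Bridge devices carry a "bridge" subdirectory in sysfs.
    [ -d "$root/$name/bridge" ]
}

# report_bridge NAME [SYSFS_NET_ROOT]: print a one-line verdict.
report_bridge() {
    if bridge_exists "$@"; then
        echo "$1: present"
    else
        echo "$1: missing (lxc-start cannot attach the veth to it)"
    fi
}
```

Running `report_bridge lxcbr0` on an affected host would be expected to print the "missing" line, matching the "No such device" error above.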

Serge Hallyn (serge-hallyn) wrote :

Thanks for reporting this bug. At first I thought it was a dup of bug 1099155, but if that were the case then lxcbr0 would exist; it simply would not have an address.

Could you please show the results of

ifconfig -a
brctl show
ls -l /var/log/upstart/lxc*
cat /etc/default/lxc
cat /etc/init/lxc-net.conf
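For anyone else hitting this, the five checks above can be gathered in one pass. This is only a sketch: the helper names `run_labelled` and `collect_lxc_net_info` are made up for illustration, and the commands are exactly the ones requested above, with nothing added.

```shell
#!/bin/sh
# run_labelled CMD [ARGS...]
# Print the command line as a header, then its output. Failures are noted
# instead of aborting, so one missing tool does not hide the other results.
run_labelled() {
    echo "### $*"
    "$@" 2>&1 || echo "(exit status $?: command failed or is not installed)"
    echo
}

# collect_lxc_net_info: the five checks requested above, in order.
collect_lxc_net_info() {
    run_labelled ifconfig -a
    run_labelled brctl show
    # The glob needs a shell to expand it, hence sh -c.
    run_labelled sh -c 'ls -l /var/log/upstart/lxc*'
    run_labelled cat /etc/default/lxc
    run_labelled cat /etc/init/lxc-net.conf
}
```

Calling `collect_lxc_net_info` and redirecting its output to a file gives a single attachment for the bug report.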

Changed in lxc (Ubuntu):
importance: Undecided → High
status: New → Incomplete
Martin Albisetti (beuno) wrote :

beuno@beuno-desktop:~$ brctl show
bridge name bridge id STP enabled interfaces
beuno@beuno-desktop:~$ ls -l /var/log/upstart/lxc*
-rw-r----- 1 root root 92 Jan 16 23:41 /var/log/upstart/lxc-net.log
beuno@beuno-desktop:~$ cat /etc/default/lxc
# MIRROR to be used by ubuntu template at container creation:
# Leaving it undefined is fine
#MIRROR="http://archive.ubuntu.com/ubuntu"
# or
#MIRROR="http://<host-ip-addr>:3142/archive.ubuntu.com/ubuntu"

# LXC_AUTO - whether or not to start containers symlinked under
# /etc/lxc/auto
LXC_AUTO="true"

# Leave USE_LXC_BRIDGE as "true" if you want to use lxcbr0 for your
# containers. Set to "false" if you'll use virbr0 or another existing
# bridge, or mavlan to your host's NIC.
USE_LXC_BRIDGE="true"

# If you change the LXC_BRIDGE to something other than lxcbr0, then
# you will also need to update your /etc/lxc/lxc.conf as well as the
# configuration (/var/lib/lxc/<container>/config) for any containers
# already created using the default config to reflect the new bridge
# name.
# If you have the dnsmasq daemon installed, you'll also have to update
# /etc/dnsmasq.d/lxc and restart the system wide dnsmasq daemon.
LXC_BRIDGE="lxcbr0"
LXC_ADDR="10.0.3.1"
LXC_NETMASK="255.255.255.0"
LXC_NETWORK="10.0.3.0/24"
LXC_DHCP_RANGE="10.0.3.2,10.0.3.254"
LXC_DHCP_MAX="253"

LXC_SHUTDOWN_TIMEOUT=120
beuno@beuno-desktop:~$ cat /etc/init/lxc-net.conf
description "lxc network"
author "Serge Hallyn <email address hidden>"

start on starting lxc
stop on stopped lxc

env USE_LXC_BRIDGE="false"
env LXC_BRIDGE="lxcbr0"
env LXC_ADDR="10.0.3.1"
env LXC_NETMASK="255.255.255.0"
env LXC_NETWORK="10.0.3.0/24"
env LXC_DHCP_RANGE="10.0.3.2,10.0.3.254"
env LXC_DHCP_MAX="253"
env varrun="/var/run/lxc"

pre-start script
 [ -f /etc/default/lxc ] && . /etc/default/lxc

 [ "x$USE_LXC_BRIDGE" = "xtrue" ] || { stop; exit 0; }

 cleanup() {
  # dnsmasq failed to start, clean up the bridge
  iptables -t nat -D POSTROUTING -s ${LXC_NETWORK} ! -d ${LXC_NETWORK} -j MASQUERADE || true
  ifconfig ${LXC_BRIDGE} down || true
  brctl delbr ${LXC_BRIDGE} || true
 }

 if [ -d /sys/class/net/${LXC_BRIDGE} ]; then
  if [ ! -f ${varrun}/network_up ]; then
   # bridge exists, but we didn't start it
   stop;
  fi
  exit 0;
 fi

 # set up the lxc network
 echo 1 > /proc/sys/net/ipv4/ip_forward
 mkdir -p ${varrun}
 brctl addbr ${LXC_BRIDGE}
 ifconfig ${LXC_BRIDGE} ${LXC_ADDR} netmask ${LXC_NETMASK} up
 iptables -t nat -A POSTROUTING -s ${LXC_NETWORK} ! -d ${LXC_NETWORK} -j MASQUERADE
 dnsmasq -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=${varrun}/dnsmasq.pid --conf-file= --listen-address ${LXC_ADDR} --dhcp-range ${LXC_DHCP_RANGE} --dhcp-lease-max=${LXC_DHCP_MAX} --dhcp-no-override --except-interface=lo --interface=${LXC_BRIDGE} --dhcp-leasefile=/var/lib/misc/dnsmasq.${LXC_BRIDGE}.leases --dhcp-authoritative || cleanup
 touch ${varrun}/network_up
end script

post-stop script
 [ -f /etc/default/lxc ] && . /etc/default/lxc
 [ -f "${varrun}/network_up" ] || exit 0;
 # if $LXC_BRIDGE has attached interfaces, don't shut it down
 ls /sys/class/net/${LXC_BRIDGE}/brif/* > /dev/null 2>&1 && exit 0;

 if [ -d /sys/class/net/${LXC_BRIDGE} ]; then
  ifconf...
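
Note on the job file above: its env stanza sets USE_LXC_BRIDGE to "false", but the pre-start script sources /etc/default/lxc before testing the variable, so the "true" from that file is what takes effect. A minimal sketch of that precedence follows; the function name `effective_bridge_setting` and its optional file argument are hypothetical, and the logic copies only the first two lines of the pre-start script.

```shell
#!/bin/sh
# effective_bridge_setting [DEFAULTS_FILE]
# Mimic the start of the pre-start script: begin with the job's built-in
# defaults, then let the defaults file override them.
# The function body is a subshell, so the caller's variables stay untouched.
effective_bridge_setting() (
    defaults=${1:-/etc/default/lxc}
    USE_LXC_BRIDGE="false"
    LXC_BRIDGE="lxcbr0"
    [ -f "$defaults" ] && . "$defaults"
    if [ "x$USE_LXC_BRIDGE" = "xtrue" ]; then
        echo "lxc-net will create $LXC_BRIDGE"
    else
        echo "lxc-net will stop without creating a bridge"
    fi
)
```

With the /etc/default/lxc shown earlier, this reports that the job will try to create lxcbr0, so the missing bridge has to come from a later step failing.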


Changed in lxc (Ubuntu):
status: Incomplete → New
Martin Albisetti (beuno) wrote :

Also, I have a completely different machine that I upgraded, exact same problem.

Serge Hallyn (serge-hallyn) wrote :

Thanks for the info - could you show the contents of /var/log/upstart/lxc-net.log?

Martin Albisetti (beuno) wrote :

beuno@beuno-laptop:~$ sudo cat /var/log/upstart/lxc-net.log

dnsmasq: failed to create listening socket for 10.0.3.1: Cannot assign requested address
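
"Cannot assign requested address" means dnsmasq tried to bind to 10.0.3.1 while no local interface carried that address. One way to check this by hand is to look for the address in the output of `ip -o -4 addr show`. The helper below is a sketch; the name `iface_for_addr` is an assumption, and it only parses text given on stdin.

```shell
#!/bin/sh
# iface_for_addr ADDR
# Read `ip -o -4 addr show` output on stdin and print the interface that
# carries ADDR. Returns 1 if no interface has it.
iface_for_addr() {
    awk -v want="$1" '
        $3 == "inet" {
            split($4, parts, "/")      # drop the /prefix length
            if (parts[1] == want) { print $2; found = 1 }
        }
        END { exit found ? 0 : 1 }
    '
}
```

Usage: `ip -o -4 addr show | iface_for_addr 10.0.3.1 || echo "10.0.3.1 is not configured"`.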

Stéphane Graber (stgraber) wrote :

Sorry for the confusion, after closer inspection, it's indeed a duplicate of 1099155.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in lxc (Ubuntu):
status: New → Confirmed
Ross Patterson (rossp) wrote :

Not a duplicate, see #3

Serge Hallyn (serge-hallyn) wrote :

@Ross, thanks for the bump.

Whoever is still seeing this, can you please respond with

1. output of 'ps -ef | grep dnsmasq'
2. output of 'sudo netstat -anp | grep :53'
3. contents of /etc/dnsmasq.d/* and /etc/dnsmasq.conf
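
One caveat for step 2: `grep :53` also matches ports such as 5353 or 53152, so the result can be noisy. A stricter filter, sketched here under the assumption of the usual `netstat -anp` column layout (protocol first, local address fourth, PID/program last), keeps only sockets whose local port is exactly 53. The name `listeners_on_53` is illustrative.

```shell
#!/bin/sh
# listeners_on_53
# Read `netstat -anp` output on stdin and print protocol, local address and
# PID/program for every TCP or UDP socket whose local port is exactly 53.
listeners_on_53() {
    awk '$1 ~ /^(tcp|udp)/ && $4 ~ /:53$/ { print $1, $4, $NF }'
}
```

Usage: `sudo netstat -anp | listeners_on_53`.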

Changed in lxc (Ubuntu):
status: Confirmed → Incomplete
Arne-Christian Blystad (arne-christian) wrote :

Hello,

This bug just happened to me. I was running a local juju environment (which uses LXC) when my computer froze. I used SysRq + REISUB to reboot it, and afterwards my juju-local environment was unable to start. I have since destroyed the environment and all containers and rebooted, but now I get the exact same error as #7.

#11:

> ps -ef | grep dnsmasq
nobody 3229 1337 0 00:22 ? 00:00:00 /usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts --bind-interfaces --pid-file=/var/run/NetworkManager/dnsmasq.pid --listen-address=127.0.1.1 --conf-file=/var/run/NetworkManager/dnsmasq.conf --cache-size=0 --proxy-dnssec --enable-dbus=org.freedesktop.NetworkManager.dnsmasq --conf-dir=/etc/NetworkManager/dnsmasq.d
root 5452 4582 0 00:39 pts/3 00:00:00 grep dnsmasq
-------------------------------
sudo netstat -anp | grep :53
tcp 0 0 172.16.182.1:53 0.0.0.0:* LISTEN 1402/named
tcp 0 0 192.168.171.1:53 0.0.0.0:* LISTEN 1402/named
tcp 0 0 172.17.42.1:53 0.0.0.0:* LISTEN 1402/named
tcp 0 0 192.168.1.137:53 0.0.0.0:* LISTEN 1402/named
tcp 0 0 127.0.1.1:53 0.0.0.0:* LISTEN 3229/dnsmasq
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN 1402/named
tcp 1 0 192.168.1.137:53152 173.194.70.18:80 CLOSE_WAIT 3646/chrome
tcp 0 0 192.168.1.137:16088 83.227.238.201:53716 ESTABLISHED 4469/spotify
tcp6 0 0 :::53 :::* LISTEN 1402/named
udp 0 0 0.0.0.0:5353 0.0.0.0:* 1136/avahi-daemon:
udp 0 0 172.16.182.1:53 0.0.0.0:* 1402/named
udp 0 0 192.168.171.1:53 0.0.0.0:* 1402/named
udp 0 0 172.17.42.1:53 0.0.0.0:* 1402/named
udp 0 0 192.168.1.137:53 0.0.0.0:* 1402/named
udp 0 0 127.0.1.1:53 0.0.0.0:* 3229/dnsmasq
udp 0 0 127.0.0.1:53 0.0.0.0:* 1402/named
udp6 0 0 :::5353 :::* 1136/avahi-daemon:
udp6 0 0 :::53 :::* 1402/named
----------------------------------------------
Contents of /etc/dnsmasq.d/*:

cat /etc/dnsmasq.d/*
# Tell any system-wide dnsmasq instance to make sure to bind to interfaces
# instead of listening on 0.0.0.0
# WARNING: changes to this file will get lost if lxc is removed.
bind-interfaces
except-interface=lxcbr0
# Tell any system-wide dnsmasq instance to make sure to bind to interfaces
# instead of listening on 0.0.0.0
# WARNING: changes to this file will get lost if network-manager is removed.
bind-interfaces

/etc/dnsmasq.conf does not exist.

It is worth noting tha...


Arne-Christian Blystad (arne-christian) wrote :

"Quickfix/simplefix" that worked for me:

apt-get purge lxc*

reboot

apt-get install lxc

Not the ideal solution, but it worked.

Serge Hallyn (serge-hallyn) wrote :

@Arne,

in your particular case, the problem is that bind9 is installed and has bound port 53 on lxcbr0.

The topic of bind9 has come up before, and I can't recall offhand whether we had done anything about it yet.

Serge Hallyn (serge-hallyn) wrote :

Arne's bug is a duplicate of bug 1240757. We never decided there what to do about it.

@Martin, do you still see this bug? If not, then I'll let Arne steal the bug and mark it a dup of bug 1240757.

Stéphane Graber (stgraber) wrote :

FWIW, the original error message is usually caused by one of:
 - Server using ksplice which breaks the veth driver, requires a reboot
 - Machine with kernel module version mismatch causing the veth driver to fail to load
 - Machine with a kernel lacking veth support or missing the .ko or depmod entry

In all of those cases, there isn't much LXC can do, and the following should fail just as much: "sudo ip link add dev veth".
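
The second and third causes can be narrowed down by looking at the module index for the running kernel. This is a rough sketch with assumed names (`veth_status` and its optional arguments); it reads only files under /lib/modules, so it cannot detect a ksplice-related breakage or a module that is present but fails to load.

```shell
#!/bin/sh
# veth_status [KERNEL_RELEASE] [MODULES_ROOT]
# Classify why veth might be unavailable, based only on files under
# MODULES_ROOT (default /lib/modules). Both arguments are optional; the
# second is a hypothetical override for testing.
veth_status() {
    release=${1:-$(uname -r)}
    dir=${2:-/lib/modules}/$release
    if [ ! -d "$dir" ]; then
        echo "no module directory for running kernel $release (version mismatch?)"
    elif grep -qs '/veth\.ko' "$dir/modules.builtin"; then
        echo "veth is built into kernel $release"
    elif grep -qs '/veth\.ko' "$dir/modules.dep"; then
        echo "veth module is indexed for kernel $release"
    else
        echo "veth not found in modules.dep for kernel $release"
    fi
}
```

If the first message appears after a kernel upgrade, a reboot into the new kernel is the likely remedy.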

Launchpad Janitor (janitor) wrote :

[Expired for lxc (Ubuntu) because there has been no activity for 60 days.]

Changed in lxc (Ubuntu):
status: Incomplete → Expired
rowez (info-rowez) wrote :

I get the error on this system (output of `lsb_release -a`):

Distributor ID: Ubuntu
Description: Ubuntu 14.04.2 LTS
Release: 14.04
Codename: trusty

When doing:

sudo lxc-start --name con1
lxc-start: conf.c: instantiate_veth: 2978 failed to attach 'vethJ7BNX2' to the bridge 'lxcbr0': No such device
lxc-start: conf.c: lxc_create_network: 3261 failed to create netdev
lxc-start: start.c: lxc_spawn: 826 failed to create the network
lxc-start: start.c: __lxc_start: 1080 failed to spawn 'con1'
lxc-start: lxc_start.c: main: 342 The container failed to start.
lxc-start: lxc_start.c: main: 346 Additional information can be obtained by setting the --logfile and --logpriority options.

The simple fix from #13 is not an option: it would remove too many packages!

virbr0 is running, so I set USE_LXC_BRIDGE="false", which gives the same error!

/etc/dnsmasq.conf is missing, but I am not installing dnsmasq: it would remove maas-dns!

cat /etc/lxc/default.conf
lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx

Read more in the attachments!

rowez (info-rowez) wrote :

Follow-up to #18

Sorry, I forgot the attachments :)

Steve Beattie (sbeattie)
tags: removed: apparmor
Serge Hallyn (serge-hallyn) wrote :

@info-rowez,

If you set USE_LXC_BRIDGE="false", then you cannot use lxcbr0 as your lxc.network.link. Update your container configuration and /etc/lxc/default.conf.
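
To find configurations that still point at a bridge that does not exist, something like the following sketch may help. The function names `network_links` and `missing_links` and the optional sysfs root argument are assumptions for illustration; the config key is the `lxc.network.link` shown in #18.

```shell
#!/bin/sh
# network_links CONFIG
# Print every value assigned to lxc.network.link in an LXC config file.
network_links() {
    sed -n 's/^[[:space:]]*lxc\.network\.link[[:space:]]*=[[:space:]]*//p' "$1"
}

# missing_links CONFIG [SYSFS_NET_ROOT]
# Print the links named in CONFIG that do not exist as network devices.
# SYSFS_NET_ROOT defaults to /sys/class/net (override is for testing only).
missing_links() {
    root=${2:-/sys/class/net}
    network_links "$1" | while read -r link; do
        [ -e "$root/$link" ] || echo "$link"
    done
}
```

Usage: run `missing_links /etc/lxc/default.conf` and the same for each /var/lib/lxc/<container>/config; any name printed is a bridge the container will fail to attach to.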

rowez (info-rowez) wrote :

@serge-hallyn

It works now! Nothing changed! USE_LXC_BRIDGE was and is set to true. I think I forgot to reboot after the upgrade :(

Thanks for the fast reply!
