lxc-net fails to start properly after system crash: lock file
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxc (Ubuntu) | Invalid | Undecided | Unassigned |
Bug Description
The presence of "/var/lock/lxc-net" causes "service lxc-net start" to claim success but actually just do nothing useful.
When the system goes down hard, /var/lock/lxc-net is not removed, fair enough. This means that systems require manual intervention after booting.
The steps below reproduce the problem by hand (killing processes and creating the lock file), but it happens after every single hard reboot:
faux@alohura:~% sudo service lxc-net stop
lxc-net stop/waiting
## the presence of other dnsmasqs makes this all confusing to me, so let's just kill them anyway, even if they were started by NetworkManager
faux@alohura:~% sudo killall dnsmasq
faux@alohura:~% sudo killall dnsmasq
dnsmasq: no process found
## simulate the lock-file being left over from a hard reboot
faux@alohura:~% sudo touch /var/lock/lxc-net
faux@alohura:~% sudo service lxc-net start
lxc-net start/running
## we haven't bothered to start dnsmasq (or create the bridge interface or..)
faux@alohura:~% ps aux | fgrep dnsmasq
faux 10592 0.0 0.0 13680 2064 pts/4 S+ 09:58 0:00 grep -F dnsmasq
## so containers won't start
faux@alohura:~% lxc-start -n new
lxc-start: lxc_start.c: main: 337 The container failed to start.
lxc-start: lxc_start.c: main: 339 To get more details, run the container in foreground mode.
lxc-start: lxc_start.c: main: 341 Additional information can be obtained by setting the --logfile and --logpriority options.
faux@alohura:~% lxc-start -F -n new
Error attaching veth494WIK to lxcbr0
Quota reached
lxc-start: start.c: lxc_spawn: 930 failed to create the configured network
lxc-start: start.c: __lxc_start: 1087 failed to spawn 'new'
lxc-start: lxc_start.c: main: 337 The container failed to start.
lxc-start: lxc_start.c: main: 341 Additional information can be obtained by setting the --logfile and --logpriority options.
faux@alohura:~%
The error message from lxc-start ("Quota reached") is very poor, too: it gives no hint that lxc-net never set up lxcbr0.
This can be worked around by blowing away the lockfile, then restarting lxc-net.
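The workaround above could be scripted roughly as follows. This is only a sketch: `clear_stale_lock` is a hypothetical helper, not something shipped with lxc, and it assumes that a missing `lxcbr0` bridge (checked via `/sys/class/net`) means the lock file is left over from an unclean shutdown. The lock path and bridge name are the ones from this report.

```shell
#!/bin/sh
# Hypothetical helper, not part of the lxc package.
# Removes the lxc-net lock file only when the bridge it is supposed to
# guard does not exist, on the assumption that the lock is then stale.
clear_stale_lock() {
    lock="${1:-/var/lock/lxc-net}"   # lock path from this report
    bridge="${2:-lxcbr0}"            # bridge name from defaults.conf

    # Nothing to do when there is no lock file.
    [ -e "$lock" ] || return 0

    # A live bridge suggests lxc-net really is running; leave the lock alone.
    if [ -d "/sys/class/net/$bridge" ]; then
        echo "$bridge exists, keeping $lock" >&2
        return 1
    fi

    rm -f -- "$lock"
}
```

Run as root, a successful `clear_stale_lock` would then be followed by `service lxc-net restart`, as in the manual workaround.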
ProblemType: Bug
DistroRelease: Ubuntu 14.10
Package: lxc 1.1.0~alpha2-
ProcVersionSign
Uname: Linux 3.16.0-24-generic x86_64
ApportVersion: 2.14.7-0ubuntu8
Architecture: amd64
Date: Tue Nov 11 09:54:59 2014
InstallationDate: Installed on 2014-04-16 (209 days ago)
InstallationMedia:
KernLog:
ProcEnviron:
SHELL=/bin/bash
TERM=xterm
PATH=(custom, no user)
LANG=en_GB.UTF-8
XDG_RUNTIME_
SourcePackage: lxc
UpgradeStatus: No upgrade log present (probably fresh install)
defaults.conf:
lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx
lxcsyslog:
Quoting Chris West (<email address hidden>):
> Public bug reported:
>
> The presence of "/var/lock/lxc-net" causes "service lxc-net start" to
> claim success but actually just do nothing useful.
>
> When the system goes down hard, /var/lock/lxc-net is not removed, fair
/var/lock should be a tmpfs. This sounds like a local misconfiguration.
Can you show what /var/lock looks like?
cat /proc/self/mountinfo
df -h /var/lock
ls -ld /var/lock
df -h /run/lock
status: incomplete
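For anyone reading the mountinfo output requested above, the filesystem type of a mount point can be picked out with a small filter. This is a sketch only: `mount_fstype` is a hypothetical name, and it matches the mount point exactly, so on systems where /var/lock is a symlink to /run/lock the path to look up would be /run/lock.

```shell
#!/bin/sh
# Hypothetical helper: print the filesystem type of the mount at exactly
# mount point $1, reading mountinfo-formatted lines from stdin.
mount_fstype() {
    awk -v mp="$1" '
        $5 == mp {
            # Field 5 is the mount point. Optional fields start at field 7
            # and end at a lone "-"; the filesystem type follows it.
            for (i = 7; i <= NF; i++) {
                if ($i == "-") { fstype = $(i + 1); break }
            }
        }
        # If several mounts share the mount point, the last one listed wins.
        END { if (fstype != "") print fstype; else exit 1 }
    '
}
```

For example, `mount_fstype /run/lock < /proc/self/mountinfo` should print `tmpfs` on a system configured the way the reply expects, and exits non-zero if nothing is mounted at that path.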