lxc-net fails to start properly after system crash: lock file

Bug #1391452 reported by Chris West
This bug affects 1 person
Affects        Status    Importance   Assigned to   Milestone
lxc (Ubuntu)   Invalid   Undecided    Unassigned

Bug Description

The presence of "/var/lock/lxc-net" causes "service lxc-net start" to claim success but actually just do nothing useful.

When the system goes down hard, /var/lock/lxc-net is not removed, which is fair enough, but it means that systems require manual intervention after booting.

You can reproduce the problem by crashing some processes and fiddling with lock-files, but this happens at every single hard reboot:

faux@alohura:~% sudo service lxc-net stop
lxc-net stop/waiting

## the presence of other dnsmasqs makes this all confusing to me, so let's just kill them anyway, even if they were started by NetworkManager

faux@alohura:~% sudo killall dnsmasq
faux@alohura:~% sudo killall dnsmasq
dnsmasq: no process found

## simulate the lock-file being left over from a hard reboot

faux@alohura:~% sudo touch /var/lock/lxc-net

faux@alohura:~% sudo service lxc-net start
lxc-net start/running

## we haven't bothered to start dnsmasq (or create the bridge interface or..)

faux@alohura:~% ps aux | fgrep dnsmasq
faux 10592 0.0 0.0 13680 2064 pts/4 S+ 09:58 0:00 grep -F dnsmasq

## so containers won't start

faux@alohura:~% lxc-start -n new
lxc-start: lxc_start.c: main: 337 The container failed to start.
lxc-start: lxc_start.c: main: 339 To get more details, run the container in foreground mode.
lxc-start: lxc_start.c: main: 341 Additional information can be obtained by setting the --logfile and --logpriority options.

faux@alohura:~% lxc-start -F -n new
Error attaching veth494WIK to lxcbr0
Quota reached
lxc-start: start.c: lxc_spawn: 930 failed to create the configured network
lxc-start: start.c: __lxc_start: 1087 failed to spawn 'new'
lxc-start: lxc_start.c: main: 337 The container failed to start.
lxc-start: lxc_start.c: main: 341 Additional information can be obtained by setting the --logfile and --logpriority options.

faux@alohura:~%

The error message from lxc-start is very poor, too.

This can be worked around by blowing away the lockfile, then restarting lxc-net.
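
A minimal sketch of that workaround, assuming a POSIX shell and root privileges. The helper name `clear_stale_lock` is hypothetical and not part of lxc; only the lock path and the `service lxc-net` command come from this report.

```shell
# Hypothetical helper (not part of lxc): remove a leftover lock file.
# Returns 0 if a file was removed, 1 if there was nothing to remove.
clear_stale_lock() {
    lock="${1:-/var/lock/lxc-net}"
    if [ -e "$lock" ]; then
        rm -f -- "$lock" && echo "removed stale lock: $lock"
    else
        echo "no lock file at $lock"
        return 1
    fi
}
```

Run as root, `clear_stale_lock && service lxc-net restart` would then bring the bridge and dnsmasq back. The helper does not check whether lxc-net is genuinely running, so it is only appropriate right after a crash.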

ProblemType: Bug
DistroRelease: Ubuntu 14.10
Package: lxc 1.1.0~alpha2-0ubuntu3
ProcVersionSignature: Ubuntu 3.16.0-24.32-generic 3.16.4
Uname: Linux 3.16.0-24-generic x86_64
ApportVersion: 2.14.7-0ubuntu8
Architecture: amd64
Date: Tue Nov 11 09:54:59 2014
InstallationDate: Installed on 2014-04-16 (209 days ago)
InstallationMedia:

KernLog:

ProcEnviron:
 SHELL=/bin/bash
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 XDG_RUNTIME_DIR=<set>
SourcePackage: lxc
UpgradeStatus: No upgrade log present (probably fresh install)
defaults.conf:
 lxc.network.type = veth
 lxc.network.link = lxcbr0
 lxc.network.flags = up
 lxc.network.hwaddr = 00:16:3e:xx:xx:xx
lxcsyslog:

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1391452] [NEW] lxc-net fails to start properly after system crash: lock file

Quoting Chris West (<email address hidden>):
> Public bug reported:
>
> The presence of "/var/lock/lxc-net" causes "service lxc-net start" to
> claim success but actually just do nothing useful.
>
> When the system goes down hard, /var/lock/lxc-net is not removed, fair

/var/lock should be a tmpfs. This sounds like a local misconfiguration.
Can you show what /var/lock looks like?

cat /proc/self/mountinfo
df -h /var/lock
ls -ld /var/lock
df -h /run/lock

 status: incomplete

Changed in lxc (Ubuntu):
status: New → Incomplete
Chris West (faux) wrote :

Good spot, thanks: /var/lock is a real directory on /, not a symlink to /run/lock.

These machines are provisioned from OVH.com templates. I have raised a support request with them to see if they are aware of this or are doing anything strange on purpose.
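
One way to test for this on other machines, sketched under the assumption that `/var/lock` is expected to be a symlink resolving to `/run/lock`. The function name `is_symlink_to` is hypothetical, and `readlink -f` is assumed to be available (it is in GNU coreutils).

```shell
# Hypothetical helper: succeed only if $1 is a symlink that resolves
# to the same location as $2.
is_symlink_to() {
    [ -L "$1" ] && [ "$(readlink -f -- "$1")" = "$(readlink -f -- "$2")" ]
}
```

For example, `is_symlink_to /var/lock /run/lock || echo "/var/lock is not a symlink to /run/lock"` would print the warning on the machine described here.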

--

$ cat /proc/self/mountinfo | fgrep lock
27 20 0:19 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=5120k

$ df -h /var/lock
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-uuid/56f53efc-[..] 20G 3.2G 15G 18% /

$ ls -ld /var/lock
drwxrwxrwt 2 root root 4096 Nov 12 10:42 /var/lock

$ df -h /run/lock
Filesystem Size Used Avail Use% Mounted on
none 5.0M 0 5.0M 0% /run/lock
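
The mountinfo check above can also be scripted. This is a rough sketch, assuming the mountinfo layout documented in proc(5), where the mount point is the fifth field and the filesystem type follows a lone "-" separator. The function name `mount_fstype` is made up for illustration.

```shell
# Hypothetical helper: print the filesystem type of the mount whose
# mount point is exactly $1, reading mountinfo-format lines from $2
# (default: /proc/self/mountinfo). Prints nothing if $1 is not a mount point.
mount_fstype() {
    awk -v target="$1" '
        $5 == target {
            # Optional fields end at a lone "-"; the fstype follows it.
            for (i = 6; i <= NF; i++) {
                if ($i == "-") { print $(i + 1); exit }
            }
        }
    ' "${2:-/proc/self/mountinfo}"
}
```

Against the output shown here, `mount_fstype /run/lock` would report a tmpfs, while `mount_fstype /var/lock` would print nothing because /var/lock is not a separate mount.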

Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1391452] Re: lxc-net fails to start properly after system crash: lock file

Thank you for the update.

 status: invalid

Changed in lxc (Ubuntu):
status: Incomplete → Invalid