lxc fails to create containers concurrently

Bug #1007483 reported by Jean-Baptiste Lallement on 2012-06-01
This bug affects 1 person
Affects           Status         Importance   Assigned to    Milestone
lxc (Ubuntu)      Fix Released   Low          Serge Hallyn
Precise           Fix Released   Undecided    Unassigned

Bug Description

========== SRU justification ==============
1. Impact: parallel creations of containers with the same template
will result in all but one failing. They should instead run in
parallel without racing in their critical section.
2. Development fix: don't add '-n' to the flock arguments, which causes
flock to fail instead of waiting.
3. Stable fix: same as development fix.
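4. Test case:
 for i in $(seq 10 12); do
  screen -d -m sudo lxc-create -t ubuntu -n p$i
 done
 # wait until pidof no longer finds any bash process
 ret=0
 while [ $ret -eq 0 ]; do
  sleep 1
  pidof bash > /dev/null 2>&1
  ret=$?
 done

 # wait until no lxc-create is left (-x also matches shell scripts)
 while pidof -x lxc-create > /dev/null 2>&1; do
  sleep 1
 done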
 lxc-ls
 # make sure p10, p11, and p12 exist
5. Regression potential: none
==============================

When multiple containers (template ubuntu) are created simultaneously, only the first container is created successfully; the others fail with "failed to execute template 'ubuntu'".

This is a very common scenario in automated tests.

TEST CASE:
Run the following script with sudo
"""
#!/bin/sh
MAX=3
echo "Destroying existing containers"
for x in $(seq 1 $MAX); do
    sudo lxc-destroy -f -n test-lxc-$x &
done

sleep 10
sudo lxc-list

echo "Creating $MAX ubuntu containers"
for x in $(seq 1 $MAX); do
    echo -n "Creating $x ..."
    sudo lxc-create -t ubuntu -n test-lxc-$x &
    echo "done"
done

echo "Waiting 30s for test to finish"
sleep 30
sudo lxc-list
"""

ACTUAL RESULT
=================================
$ sudo sh ./lxc-create-concurrent
Destroying existing containers
'test-lxc-1' does not exist
'test-lxc-2' does not exist
'test-lxc-3' does not exist
RUNNING

FROZEN

STOPPED

Creating 3 ubuntu containers
Creating 1 ...done
Creating 2 ...done
Creating 3 ...done
Waiting 30s for test to finish

No config file specified, using the default config

No config file specified, using the default config

No config file specified, using the default config
debootstrap is /usr/sbin/debootstrap
debootstrap is /usr/sbin/debootstrap
Checking cache download in /var/cache/lxc/quantal/rootfs-amd64 ...
Copy /var/cache/lxc/quantal/rootfs-amd64 to /var/lib/lxc/test-lxc-1/rootfs ...
Copying rootfs to /var/lib/lxc/test-lxc-1/rootfs ...
failed to execute template 'ubuntu'
debootstrap is /usr/sbin/debootstrap
failed to execute template 'ubuntu'
aborted
aborted

##
# The default user is 'ubuntu' with password 'ubuntu'!
# Use the 'sudo' command to run tasks as root in the container.
##

'ubuntu' template installed
'test-lxc-1' created
RUNNING

FROZEN

STOPPED
  test-lxc-1
=================================

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: lxc 0.8.0~rc1-4ubuntu10
ProcVersionSignature: Ubuntu 3.4.0-3.8-generic 3.4.0
Uname: Linux 3.4.0-3-generic x86_64
ApportVersion: 2.1.1-0ubuntu1
Architecture: amd64
Date: Fri Jun 1 17:32:06 2012
ProcEnviron:
 TERM=xterm
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: lxc
UpgradeStatus: Upgraded to quantal on 2012-01-31 (121 days ago)

Serge Hallyn (serge-hallyn) wrote :

The download_ubuntu function does 'flock -n -x 200'. Removing the -n should make the parallel lxc-creates block rather than exit with failure.
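The difference between the two flock modes can be seen in isolation with a small sketch. This is not the template's code: the temporary lock file, the ".ready" marker file and the use of fd 9 instead of 200 (so that it also runs under shells that only accept single-digit descriptors in redirections) are assumptions made for the demo.

```shell
#!/bin/sh
# Demo only: hypothetical lock file and marker, not the paths used by the template.
lockfile=$(mktemp)
ready="$lockfile.ready"

# Background holder: take the exclusive lock and keep it for a few seconds.
(
    flock -x 9
    : > "$ready"        # signal that the lock is now held
    sleep 3
) 9>"$lockfile" &
holder=$!

# Wait until the holder really owns the lock.
while [ ! -e "$ready" ]; do
    sleep 1
done

# With -n, flock gives up immediately because the lock is busy.
if ( flock -n -x 9 ) 9>"$lockfile"; then
    nonblocking_result=acquired
else
    nonblocking_result=busy
fi

# Without -n, flock waits for the holder to finish and then succeeds.
if ( flock -x 9 ) 9>"$lockfile"; then
    blocking_result=acquired
else
    blocking_result=busy
fi

wait "$holder"
rm -f "$lockfile" "$ready"
echo "flock -n -x: $nonblocking_result"
echo "flock -x:    $blocking_result"
```

The first attempt corresponds to what a second concurrent lxc-create runs into today; the second attempt is the behaviour after dropping -n.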

Clint Byrum (clint-fewbar) wrote :

This is happening because populating the cache in /var/cache/lxc/... is not protected by a lock file or random temp names. The former approach would be preferable: if we tell the system to create 3 at a time, we don't want 3 concurrent debootstraps, we want 1, with the other 2 processes waiting for it to finish.
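A rough sketch of the lock-file approach, to make the "1 debootstrap, 2 waiters" idea concrete. Everything here is hypothetical: populate_cache, the build.log counter and the temporary directory are made up for the illustration, and the sleep stands in for the real debootstrap run.

```shell
#!/bin/sh
# populate_cache DIR (hypothetical helper): fill DIR/rootfs at most once,
# even when several callers start at the same time.
populate_cache() {
    cache=$1
    mkdir -p "$cache"
    (
        flock -x 9                          # block until any other populator is done
        if [ ! -d "$cache/rootfs" ]; then   # re-check once we own the lock
            sleep 1                         # stand-in for the slow debootstrap step
            mkdir "$cache/rootfs"
            echo "built" >> "$cache/build.log"
        fi
    ) 9>"$cache/lock"
}

# Three concurrent callers, as in the test case with MAX=3.
demo_cache=$(mktemp -d)
for i in 1 2 3; do
    populate_cache "$demo_cache" &
done
wait

builds=$(wc -l < "$demo_cache/build.log")
echo "cache built $builds time(s)"
rm -rf "$demo_cache"
```

The check for an existing rootfs happens only after the lock is held, so the two waiters find the cache already populated and skip the expensive step instead of repeating it.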

Clint Byrum (clint-fewbar) wrote :

Doh, just behind you Serge. Ok, that makes sense. :)

Changed in lxc (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Serge Hallyn (serge-hallyn) wrote :

Note that lxc-create doesn't fare well without a terminal anyway, apparently due to its use of apt-get update. The scriptlet shown in the description will therefore fail regardless.

Changed in lxc (Ubuntu):
status: Triaged → Confirmed
importance: Medium → Low
Changed in lxc (Ubuntu):
assignee: nobody → Serge Hallyn (serge-hallyn)
status: Confirmed → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxc - 0.8.0~rc1-4ubuntu13

---------------
lxc (0.8.0~rc1-4ubuntu13) quantal; urgency=low

  * 0086-lxc-unshare-zero-args: fix lxc-unshare segfaulting when no command
    is given (LP: #1011603)
  * 0087-lxc-ls-dash: fix lxc-ls for containers whose names start with a
    dash (LP: #1006332)
  * 0088-ubuntu-template-flock: don't fail when flock is busy, just wait,
    so concurrent lxc-creates don't break. (LP: #1007483)
  * 0089-lxc-netstat-exec: fix lxc-netstat errors (LP: #1011739)
 -- Serge Hallyn <email address hidden> Mon, 11 Jun 2012 15:46:25 +0000

Changed in lxc (Ubuntu):
status: In Progress → Fix Released
description: updated

Hello Jean-Baptiste, or anyone else affected,

Accepted lxc into precise-proposed. The package will build now and be available in a few hours. Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users. If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in lxc (Ubuntu Precise):
status: New → Fix Committed
tags: added: verification-needed
Stéphane Graber (stgraber) wrote :

Confirmed that all 3 containers are properly getting created now.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxc - 0.7.5-3ubuntu59

---------------
lxc (0.7.5-3ubuntu59) precise-proposed; urgency=low

  [ Serge Hallyn ]
  * 0085-pivot-dir: use a directory other than /mnt to put the pivot_root
    old dir into (LP: #986385)
  * 0086-lxc-unshare-zero-args: fix lxc-unshare segfaulting when no command
    is given (LP: #1011603)
  * 0087-lxc-ls-dash: fix lxc-ls for containers whose names start with a
    dash (LP: #1006332)
  * 0088-ubuntu-template-flock: don't fail when flock is busy, just wait,
    so concurrent lxc-creates don't break. (LP: #1007483)
  * debian/rules, debian/lxc.apport: install apport hook (LP: #1011644)

  [ Stéphane Graber ]
  * Ship /etc/dnsmasq.d/lxc to configure an eventual system wide
    dnsmasq daemon not to listen on the LXC bridge interface. (LP: #928524)
 -- Serge Hallyn <email address hidden> Mon, 11 Jun 2012 19:56:30 -0500

Changed in lxc (Ubuntu Precise):
status: Fix Committed → Fix Released