autopkgtest fails sometimes with adt-virt-lxc

Bug #1348749 reported by Scott Moser
This bug affects 1 person
Affects               Status        Importance  Assigned to  Milestone
autopkgtest (Ubuntu)  Fix Released  Medium      Unassigned

Bug Description

# Set up variables and install packages.
# Note: haveged is only there to keep gpg --gen-key from blocking on entropy.
$ release=utopic; cname="source-$release";
$ sudo apt-get install --assume-yes --quiet lxc autopkgtest haveged

# set up container
$ sudo lxc-create --template=ubuntu-cloud "--name=$cname" -- \
   --stream=daily --release=$release

# get some .dsc to build
$ apt-get source python-boto
$ dsc=$(echo *.dsc)

$ dpkg-query --show lxc autopkgtest
autopkgtest 3.3
lxc 1.1.0~alpha1-0ubuntu3

# Try to build the .dsc. Sometimes this works fine; sometimes it fails.
# Here is an example of failed output (run with --debug):
$ adt-run --debug --output-dir=out.d "$dsc" --- lxc --ephemeral --sudo "$cname"
adt-run: DBG: Parsed options: Namespace(apt_pocket=[], copy=[], gainroot=None, gnupghome='~/.autopkgtest/gpg', logfile=None, output_dir='out.d', set_lang='C.UTF-8', setup_commands=[], shell=False, shell_fail=False, summary=None, timeout_build=100000, timeout_copy=300, timeout_factor=1.0, timeout_install=3000, timeout_short=100, timeout_test=10000, user=None, verbosity=2)
adt-run: DBG: Remaining arguments: ['python-boto_2.20.1-2ubuntu2.dsc']
adt-run: DBG: Interpreted actions: ['--source', 'python-boto_2.20.1-2ubuntu2.dsc']
adt-run: DBG: Virt runner arguments: ['lxc', '--ephemeral', '--sudo', 'source-utopic']
adt-run: DBG: / tmp(specified) rmtree out.d
adt-run: DBG: testbed init
adt-run [18:04:20]: version 3.3
adt-run: DBG: $ vserver: adt-virt-lxc --ephemeral --sudo source-utopic
adt-run: DBG: got reply from testbed: ok
adt-run: DBG: testbed open, scratch=None
adt-run: DBG: sending command to testbed: open
<VirtSubproc>: failure: (down) ['mkdir', '-m', '777', '/tmp/adt-virt-lxc.shared.8p2iecfr/downtmp'] failed (exit status 2)

Sometimes it succeeds. In that case, the lines after "sending command to
testbed: open" look something like this:

adt-run: DBG: got reply from testbed: ok /tmp/adt-virt-lxc.shared.vq82rc_3/downtmp
adt-run: DBG: sending command to testbed: print-execute-command
adt-run: DBG: got reply from testbed: ok sudo,-E,lxc-attach,-n,adt-virt-lxc-cwlszg,--,env,-i,sh,-ac,.%20/etc/environment%202%3E/dev/null%3B%20.%20/etc/default/locale%202%3E/dev/null%3B%20exec%20%22%24%40%22,--
adt-run: DBG: sending command to testbed: capabilities
...
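Because the failure only shows up on some runs, a single run says little either way. A minimal sketch for repeating the reproduction and counting failures follows; count_failures is a hypothetical helper written for this report, not part of autopkgtest.

```sh
# Hypothetical helper (not part of autopkgtest): run a command N times and
# print how many runs exited non-zero. Each run's output is discarded.
# POSIX sh has no 'local', so n, fails and i are ordinary globals.
count_failures() {
    n=$1
    shift
    fails=0
    i=0
    while [ "$i" -lt "$n" ]; do
        if ! "$@" >/dev/null 2>&1; then
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    echo "$fails"
}
```

For example, `count_failures 20 adt-run --output-dir=out.d "$dsc" --- lxc --ephemeral --sudo "$cname"` prints how many of 20 runs failed. Since output is thrown away, rerun with --debug to see the log of a failing case.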

ProblemType: Bug
DistroRelease: Ubuntu 14.10
Package: autopkgtest 3.3
ProcVersionSignature: User Name 3.16.0-5.10-generic 3.16.0-rc6
Uname: Linux 3.16.0-5-generic x86_64
ApportVersion: 2.14.4-0ubuntu2
Architecture: amd64
Date: Fri Jul 25 17:39:18 2014
Ec2AMI: ami-00000039
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: nova
Ec2InstanceType: m1.small
Ec2Kernel: aki-00000002
Ec2Ramdisk: ari-00000002
PackageArchitecture: all
ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: autopkgtest
UpgradeStatus: No upgrade log present (probably fresh install)

Scott Moser (smoser) wrote :

/usr/bin/adt-virt-lxc --help says:
 -e, --ephemeral Use ephemeral overlays instead of cloning (much faster, but
                   might cause errors in some corner cases)

So I suspect this is really a race condition caused by overlayfs.

Scott Moser (smoser) wrote :

I've debugged this, and the exit code returned is '2'.
That's the same exit code you get from:
 sh -ac '. /nonexistent; true'; echo $?

So what's happening, I think, is that adt-virt-lxc is running:

  . /etc/environment 2>/dev/null; . /etc/default/locale 2>/dev/null; exec "$@"
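The same prefix can be read out of the print-execute-command reply quoted in the bug description. Judging from that log, the reply is a comma-separated argument list with each argument percent-encoded. A minimal sketch to decode it; decode_exec_reply is a hypothetical helper, and it only handles the escapes that occur in that reply (%20, %22, %24, %3B, %3E, %40), so it is not a general percent-decoder.

```sh
# Hypothetical helper: decode an adt-virt "print-execute-command" reply.
# Assumption (from the log): arguments are comma-separated and each one is
# percent-encoded. Only the escapes that occur in that reply are handled;
# this is not a general percent-decoder.
decode_exec_reply() {
    # One argument per output line, then undo the percent-encoding.
    printf '%s\n' "$1" | tr ',' '\n' | sed \
        -e 's/%20/ /g' -e 's/%22/"/g' -e 's/%24/$/g' \
        -e 's/%3[Bb]/;/g' -e 's/%3[Ee]/>/g' -e 's/%40/@/g'
}
```

Passing it the part of the reply after `ok ` prints one argument per line; the argument following `-ac` decodes to the `. /etc/environment ...; exec "$@"` prefix above.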

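In the cloud images, /etc/default/locale is created by cloud-init on boot, so
we were hitting a race condition: if the command ran before cloud-init had
written the file, it tried to source a nonexistent file. In a non-interactive
sh, '.' on a missing file is a fatal error, so the 2>/dev/null only hides the
message and the shell still exits with status 2.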

Scott Moser (smoser) wrote :
Changed in autopkgtest (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Scott Moser (smoser) wrote :

I'm not sure where I'd send this for proper upstream, so I've attached an upstream patch here.

tags: added: patch
Martin Pitt (pitti) wrote :

> I'm not sure where I'd send this for proper upstream

Here is fine, thanks! I'll come up with a test case and integrate this on Monday.

Martin Pitt (pitti) wrote :
Changed in autopkgtest (Ubuntu):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package autopkgtest - 3.3.1

---------------
autopkgtest (3.3.1) unstable; urgency=medium

  * Accept comma separators in Tests:, Restrictions:, and Features: fields;
    this is consistent with Depends: and avoids skipping tests. (LP: #1347958)
  * Generalize implementation of getting an interactive shell in testbeds.
    Most runners now don't need to provide a hook_shell() any more.
  * Export install environment to interactive testbed shells, so that
    unpacked test depends in the temp dir are accessible. (LP: #1339103)
  * lxc: Don't fail on a nonexisting /etc/environment or /etc/default/locale.
    (LP: #1348749)
  * adb ssh script: Add --apt-update option to run "apt-get update" (with
    temporarily switching to r/w), to run tests on older development images.
    (LP: #1342838)
  * Drop the very short and unnecessary install timeouts from the NullRunner
    tests.

 -- Martin Pitt <email address hidden> Tue, 29 Jul 2014 13:45:36 +0200

Changed in autopkgtest (Ubuntu):
status: Fix Committed → Fix Released