lock contention running apt in bootstrap

Bug #1384259 reported by Aaron Bentley
26
This bug affects 4 people
Affects Status Importance Assigned to Milestone
juju-core
Fix Released
High
Andrew Wilkins
1.22
Fix Released
High
Andrew Wilkins

Bug Description

Juju bootstrap has a race condition where apt cannot acquire a lock:

Started by user admin
Building on master
[workspace] $ /bin/bash /tmp/hudson3325586582201465768.sh
++ rm /var/lib/jenkins/jobs/functional-backup-restore/workspace/artifacts /var/lib/jenkins/jobs/functional-backup-restore/workspace/buildvars.bash /var/lib/jenkins/jobs/functional-backup-restore/workspace/extracted-bin /var/lib/jenkins/jobs/functional-backup-restore/workspace/juju-core_1.21-alpha2-0ubuntu1~12.04.1~juju1_amd64.deb /var/lib/jenkins/jobs/functional-backup-restore/workspace/juju-local_1.21-alpha2-0ubuntu1~12.04.1~juju1_all.deb -rf
++ mkdir -p /var/lib/jenkins/jobs/functional-backup-restore/workspace/artifacts
++ touch /var/lib/jenkins/jobs/functional-backup-restore/workspace/artifacts/empty
++ afact=lastSuccessfulBuild/artifact
+++ lsb_release -sr
++ RELEASE=12.04
+++ dpkg --print-architecture
++ ARCH=amd64
++ [[ -n 1950 ]]
++ wget -q http://juju-ci.vapour.ws:8080/job/build-revision/lastSuccessfulBuild/artifact/buildvars.bash
++ source buildvars.bash
+++ export BRANCH=gitbranch:master:github.com/juju/juju
+++ BRANCH=gitbranch:master:github.com/juju/juju
+++ export REVISION_ID=53d191730604a468098c9a73696ac11403f9efc8
+++ REVISION_ID=53d191730604a468098c9a73696ac11403f9efc8
+++ export VERSION=1.21-alpha2
+++ VERSION=1.21-alpha2
++ PACKAGES_JOB=publish-revision
++ JUJU_LOCAL_DEB=juju-local_1.21-alpha2-0ubuntu1~12.04.1~juju1_all.deb
++ JUJU_CORE_DEB=juju-core_1.21-alpha2-0ubuntu1~12.04.1~juju1_amd64.deb
+++ head -c8
+++ echo 53d191730604a468098c9a73696ac11403f9efc8
++ rev=53d19173
++ echo 'Testing gitbranch:master:github.com/juju/juju 53d19173 on test-function-hp-a'
Testing gitbranch:master:github.com/juju/juju 53d19173 on test-function-hp-a
+++ sed 's,[+],%2B,'
+++ echo juju-local_1.21-alpha2-0ubuntu1~12.04.1~juju1_all.deb
++ ENCODED_LOCAL_DEB=juju-local_1.21-alpha2-0ubuntu1~12.04.1~juju1_all.deb
+++ sed 's,[+],%2B,'
+++ echo juju-core_1.21-alpha2-0ubuntu1~12.04.1~juju1_amd64.deb
++ ENCODED_CORE_DEB=juju-core_1.21-alpha2-0ubuntu1~12.04.1~juju1_amd64.deb
++ wget -q http://juju-ci.vapour.ws:8080/job/publish-revision/lastSuccessfulBuild/artifact/juju-local_1.21-alpha2-0ubuntu1~12.04.1~juju1_all.deb
++ wget -q http://juju-ci.vapour.ws:8080/job/publish-revision/lastSuccessfulBuild/artifact/juju-core_1.21-alpha2-0ubuntu1~12.04.1~juju1_amd64.deb
++ dpkg-deb -x /var/lib/jenkins/jobs/functional-backup-restore/workspace/juju-core_1.21-alpha2-0ubuntu1~12.04.1~juju1_amd64.deb extracted-bin
+++++ find extracted-bin -name juju
++++ dirname extracted-bin/usr/lib/juju-1.21-alpha2/bin/juju
+++ readlink -f extracted-bin/usr/lib/juju-1.21-alpha2/bin
++ export NEW_JUJU_BIN=/mnt/jenkinshome/jobs/functional-backup-restore/workspace/extracted-bin/usr/lib/juju-1.21-alpha2/bin
++ NEW_JUJU_BIN=/mnt/jenkinshome/jobs/functional-backup-restore/workspace/extracted-bin/usr/lib/juju-1.21-alpha2/bin
++ '[' test-function-hp-a == manual ']'
++ jenv=/var/lib/jenkins/cloud-city/environments/test-function-hp-a.jenv
++ [[ -e /var/lib/jenkins/cloud-city/environments/test-function-hp-a.jenv ]]
++ destroy-environment test-function-hp-a
juju --show-log destroy-environment test-function-hp-a --force -y
2014-10-22 11:56:32 INFO juju.cmd supercommand.go:37 running juju [1.20.10-precise-amd64 gc]
2014-10-22 11:56:32 INFO juju.provider.openstack provider.go:204 opening environment "test-function-hp-a"
2014-10-22 11:56:32 WARNING juju.environs.config config.go:968 unknown config field "prefer-ipv6"
2014-10-22 11:56:32 WARNING juju.environs.config config.go:968 unknown config field "uuid"
2014-10-22 11:56:32 INFO juju.provider.common destroy.go:15 destroying environment "test-function-hp-a"
2014-10-22 11:56:53 INFO juju.cmd supercommand.go:329 command finished
++ [[ -e /var/lib/jenkins/cloud-city/environments/test-function-hp-a.jenv ]]
++ jenv=/var/lib/jenkins/cloud-city/environments/functional-backup-restore.jenv
++ [[ -e /var/lib/jenkins/cloud-city/environments/functional-backup-restore.jenv ]]
+++ find /var/lib/jenkins/cloud-city/environments -name '$JOB_NAME*.jenv'
++ azure_jenvs=
+ assess_recovery.py --backup --charm-prefix=local:precise/ /mnt/jenkinshome/jobs/functional-backup-restore/workspace/extracted-bin/usr/lib/juju-1.21-alpha2/bin test-function-hp-a
juju --show-log bootstrap -e test-function-hp-a --constraints mem=2G
2014-10-22 11:56:54 INFO juju.cmd supercommand.go:37 running juju [1.21-alpha2-precise-amd64 gc]
2014-10-22 11:56:55 INFO juju.provider.openstack provider.go:248 opening environment "test-function-hp-a"
2014-10-22 11:56:56 INFO juju.network network.go:102 setting prefer-ipv6 to false
2014-10-22 11:56:56 INFO juju.utils http.go:59 hostname SSL verification enabled
2014-10-22 11:56:56 INFO juju.utils http.go:59 hostname SSL verification enabled
2014-10-22 11:56:56 INFO juju.utils http.go:59 hostname SSL verification enabled
Bootstrapping environment "test-function-hp-a"
2014-10-22 11:56:57 INFO juju.environs.tools tools.go:86 reading tools with major.minor version 1.21
2014-10-22 11:56:57 INFO juju.utils http.go:59 hostname SSL verification enabled
2014-10-22 11:56:57 INFO juju.utils http.go:59 hostname SSL verification enabled
2014-10-22 11:56:58 INFO juju.utils http.go:59 hostname SSL verification enabled
2014-10-22 11:56:58 INFO juju.utils http.go:59 hostname SSL verification enabled
Starting new instance for initial state server
Launching instance
2014-10-22 11:57:02 INFO juju.utils http.go:59 hostname SSL verification enabled
2014-10-22 11:57:02 INFO juju.utils http.go:59 hostname SSL verification enabled
2014-10-22 11:57:02 INFO juju.utils http.go:59 hostname SSL verification enabled
2014-10-22 11:57:03 INFO juju.provider.openstack provider.go:1037 allocated public IP 15.125.100.3
2014-10-22 11:57:10 INFO juju.provider.openstack provider.go:1085 started instance "ca167f9c-fa28-40f0-a65d-6d3f1a1ef7db"
2014-10-22 11:57:23 INFO juju.provider.openstack provider.go:1095 assigned public IP 15.125.100.3 to "ca167f9c-fa28-40f0-a65d-6d3f1a1ef7db"
 - ca167f9c-fa28-40f0-a65d-6d3f1a1ef7db
2014-10-22 11:57:25 INFO juju.environs.bootstrap bootstrap.go:178 newest version: 1.21-alpha2
2014-10-22 11:57:25 INFO juju.environs.bootstrap bootstrap.go:206 picked bootstrap tools version: 1.21-alpha2
Installing Juju agent on bootstrap instance
Waiting for address
2014-10-22 11:57:26 INFO juju.provider.openstack provider.go:453 instance ca167f9c-fa28-40f0-a65d-6d3f1a1ef7db has floating IP address: 15.125.100.3
Attempting to connect to 15.125.100.3:22
Attempting to connect to 10.0.0.135:22
2014-10-22 11:57:36 INFO juju.provider.openstack provider.go:453 instance ca167f9c-fa28-40f0-a65d-6d3f1a1ef7db has floating IP address: 15.125.100.3
2014-10-22 11:57:46 INFO juju.provider.openstack provider.go:453 instance ca167f9c-fa28-40f0-a65d-6d3f1a1ef7db has floating IP address: 15.125.100.3
2014-10-22 11:57:56 INFO juju.provider.openstack provider.go:453 instance ca167f9c-fa28-40f0-a65d-6d3f1a1ef7db has floating IP address: 15.125.100.3
2014-10-22 11:58:06 INFO juju.provider.openstack provider.go:453 instance ca167f9c-fa28-40f0-a65d-6d3f1a1ef7db has floating IP address: 15.125.100.3
2014-10-22 11:58:16 INFO juju.provider.openstack provider.go:453 instance ca167f9c-fa28-40f0-a65d-6d3f1a1ef7db has floating IP address: 15.125.100.3
2014-10-22 11:58:26 INFO juju.provider.openstack provider.go:453 instance ca167f9c-fa28-40f0-a65d-6d3f1a1ef7db has floating IP address: 15.125.100.3
2014-10-22 11:58:36 INFO juju.provider.openstack provider.go:453 instance ca167f9c-fa28-40f0-a65d-6d3f1a1ef7db has floating IP address: 15.125.100.3
2014-10-22 11:58:46 INFO juju.provider.openstack provider.go:453 instance ca167f9c-fa28-40f0-a65d-6d3f1a1ef7db has floating IP address: 15.125.100.3
2014-10-22 11:58:56 INFO juju.provider.openstack provider.go:453 instance ca167f9c-fa28-40f0-a65d-6d3f1a1ef7db has floating IP address: 15.125.100.3
Warning: Permanently added '15.125.100.3' (ECDSA) to the list of known hosts.
sudo: unable to resolve host juju-test-function-hp-3-machine-0
Logging to /var/log/cloud-init-output.log on remote host
Running apt-get update
Cloud-init v. 0.7.5 running 'init-local' at Wed, 22 Oct 2014 11:56:26 +0000. Up 9.57 seconds.
Cloud-init v. 0.7.5 running 'init' at Wed, 22 Oct 2014 11:56:30 +0000. Up 14.27 seconds.
ci-info: +++++++++++++++++++++++++Net device info++++++++++++++++++++++++++
ci-info: +--------+------+------------+---------------+-------------------+
ci-info: | Device | Up | Address | Mask | Hw-Address |
ci-info: +--------+------+------------+---------------+-------------------+
ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | . |
ci-info: | eth0 | True | 10.0.0.136 | 255.255.255.0 | fa:16:3e:3b:6a:c7 |
ci-info: +--------+------+------------+---------------+-------------------+
ci-info: ++++++++++++++++++++++++++++++Route info++++++++++++++++++++++++++++++
ci-info: +-------+-------------+----------+---------------+-----------+-------+
ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
ci-info: +-------+-------------+----------+---------------+-----------+-------+
ci-info: | 0 | 0.0.0.0 | 10.0.0.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 1 | 10.0.0.0 | 0.0.0.0 | 255.255.255.0 | eth0 | U |
ci-info: +-------+-------------+----------+---------------+-----------+-------+
Oct 22 11:56:42 ubuntu pollinate[743]: ERROR: Network communication failed [60]\n11:56:42.888121 * Hostname was NOT found in DNS cache
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed

  0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 011:56:43.013939 * Trying 91.189.94.53...

  0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 011:56:43.148600 * Connected to entropy.ubuntu.com (91.189.94.53) port 443 (#0)
11:56:43.150648 * successfully set certificate verify locations:
11:56:43.150705 * CAfile: /etc/pollinate/entropy.ubuntu.com.pem
  CApath: /dev/null
11:56:43.150881 * SSLv3, TLS handshake, Client hello (1):
11:56:43.150963 } [data not shown]
11:56:43.285614 * SSLv3, TLS handshake, Server hello (2):
11:56:43.285672 { [data not shown]
11:56:43.299533 * SSLv3, TLS handshake, CERT (11):
11:56:43.299590 { [data not shown]
11:56:43.300162 * SSLv3, TLS alert, Server hello (2):
11:56:43.300216 } [data not shown]
11:56:43.300350 * SSL certificate problem: self signed certificate in certificate chain

  0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
11:56:43.300581 * Closing connection 0
curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.
2014-10-22 11:56:42,853 - util.py[WARNING]: Running seed_random (<module 'cloudinit.config.cc_seed_random' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_seed_random.pyc'>) failed
Generating public/private rsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_rsa_key.
Your public key has been saved in /etc/ssh/ssh_host_rsa_key.pub.
The key fingerprint is:
bc:d6:71:8e:e2:4b:70:15:f1:36:3a:67:0a:40:52:8c root@juju-test-function-hp-3-machine-0
The key's randomart image is:
+--[ RSA 2048]----+
| .+o o. |
| Eo. o |
| . . + |
| o . o . |
| . S + + |
| o + X |
| = + . |
| + . |
| o. |
+-----------------+
Generating public/private dsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_dsa_key.
Your public key has been saved in /etc/ssh/ssh_host_dsa_key.pub.
The key fingerprint is:
98:c1:2b:cc:58:21:4e:21:62:20:bc:10:78:82:66:cd root@juju-test-function-hp-3-machine-0
The key's randomart image is:
+--[ DSA 1024]----+
|@o*.. |
|OB.E o |
|+oo . o |
| . = = |
| . + + S |
| . |
| |
| |
| |
+-----------------+
Generating public/private ecdsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_ecdsa_key.
Your public key has been saved in /etc/ssh/ssh_host_ecdsa_key.pub.
The key fingerprint is:
dc:2c:d4:b1:85:c4:38:70:b3:c2:fe:e3:66:31:f4:52 root@juju-test-function-hp-3-machine-0
The key's randomart image is:
+--[ECDSA 256]---+
| ..o+o.. |
| . .o+o+ |
| o o.o |
| . +.oE |
| ..Soo |
| .+.. |
| o+ |
| .o. |
| o. |
+-----------------+
Cloud-init v. 0.7.5 running 'modules:config' at Wed, 22 Oct 2014 11:56:48 +0000. Up 30.67 seconds.
Generating locales...
  en_US.UTF-8... up-to-date
Generation complete.
Cloud-init v. 0.7.5 running 'modules:final' at Wed, 22 Oct 2014 11:56:54 +0000. Up 36.57 seconds.
+ install -D -m 644 /dev/null /var/lib/juju/nonce.txt
+ printf %s\n user-admin:bootstrap
Cloud-init v. 0.7.5 finished at Wed, 22 Oct 2014 11:56:55 +0000. Datasource DataSourceOpenStack [net,ver=2]. Up 37.66 seconds
Ign http://security.ubuntu.com trusty-security InRelease
Get:1 http://security.ubuntu.com trusty-security Release.gpg [933 B]
Get:2 http://security.ubuntu.com trusty-security Release [59.7 kB]
Ign http://az1.clouds.archive.ubuntu.com trusty InRelease
Get:3 http://security.ubuntu.com trusty-security/main Sources [47.9 kB]
Ign http://az1.clouds.archive.ubuntu.com trusty-updates InRelease
Get:4 http://security.ubuntu.com trusty-security/universe Sources [11.2 kB]
Get:5 http://az1.clouds.archive.ubuntu.com trusty Release.gpg [933 B]
Get:6 http://security.ubuntu.com trusty-security/main amd64 Packages [149 kB]
Get:7 http://az1.clouds.archive.ubuntu.com trusty-updates Release.gpg [933 B]
Get:8 http://az1.clouds.archive.ubuntu.com trusty Release [58.5 kB]
Get:9 http://security.ubuntu.com trusty-security/universe amd64 Packages [50.4 kB]
Get:10 http://security.ubuntu.com trusty-security/main Translation-en [72.9 kB]
Get:11 http://security.ubuntu.com trusty-security/universe Translation-en [29.7 kB]
Get:12 http://az1.clouds.archive.ubuntu.com trusty-updates Release [59.7 kB]
Get:13 http://az1.clouds.archive.ubuntu.com trusty/main Sources [1,064 kB]
Get:14 http://az1.clouds.archive.ubuntu.com trusty/universe Sources [6,399 kB]
Get:15 http://az1.clouds.archive.ubuntu.com trusty/main amd64 Packages [1,350 kB]
E: Could not get lock /var/lib/apt/lists/lock - open (11: Resource temporarily unavailable)
E: Unable to lock directory /var/lib/apt/lists/
2014-10-22 11:59:06 ERROR juju.cmd supercommand.go:323 failed to bootstrap environment: subprocess encountered error code 1

EXCEPTION CAUGHT:

Command '('juju', '--show-log', 'bootstrap', '-e', 'test-function-hp-a', '--constraints', 'mem=2G')' returned non-zero exit status 1

FAIL
Build step 'Execute shell' marked build as failure
Archiving artifacts
Description set: gitbranch:master:github.com/juju/juju 53d19173
[BFA] Scanning build for known causes...

[BFA] Done. 0s
Finished: FAILURE

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: none → 1.22
Larry Michel (lmic)
tags: added: oil
Revision history for this message
Ian Booth (wallyworld) wrote :

The version of Juju was 1.21 alpha2. Has this been seen again on a more recent beta?

Revision history for this message
Aaron Bentley (abentley) wrote :

Yes. We saw it in build 2047, which was a build of 1.21beta2

Revision history for this message
Aaron Bentley (abentley) wrote :

Most recently, we saw it in http://juju-ci.vapour.ws:8080/job/hp-deploy-trusty-amd64/2058/console, a build of 1.21.0

Revision history for this message
Larry Michel (lmic) wrote :

We have been seeing this with 1.20.x in oil. The latest recreates are with 1.20.13.

Revision history for this message
Larry Michel (lmic) wrote :

This one is critical for us. We are seeing it with increased frequency.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.22 → 1.23
Ian Booth (wallyworld)
Changed in juju-core:
milestone: 1.23 → 1.22
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.22-alpha1 → 1.23
no longer affects: juju-core/1.21
Revision history for this message
Ian Booth (wallyworld) wrote :

It seems that there's another process running on the node when Juju is trying to run its bootstrapping scripts, and this other process is locking apt.

Do we have any idea what this other process is? At the moment, it's hard to see a good solution to this problem. Maybe knowing what else is trying to use apt would help.

summary: - race condition running apt in bootstrap
+ lock contention running apt in bootstrap
Revision history for this message
Kapil Thangavelu (hazmat) wrote : Re: [Bug 1384259] [NEW] race condition running apt in bootstrap

My guess it's cloud init

On Wednesday, January 14, 2015, Ian Booth <email address hidden> wrote:

> It seems that there's another process running on the node when Juju is
> trying to run its bootstrapping scripts, and this other process is
> locking apt.
>
> Do we have any idea what this other process is? At the moment, it's hard
> to see a good solution to this problem. Maybe knowing what else is
> trying to use apt would help.
>
> ** Summary changed:
>
> - race condition running apt in bootstrap
> + lock contention running apt in bootstrap
>
> --
> You received this bug notification because you are subscribed to juju-
> core.
> https://bugs.launchpad.net/bugs/1384259
>
> Title:
> lock contention running apt in bootstrap
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/juju-core/+bug/1384259/+subscriptions
>

Revision history for this message
Ian Booth (wallyworld) wrote :

Looking at the logs pasted into the bug though, cloud-init seems to have finished before Juju starts running its scripts:

...
Cloud-init v. 0.7.5 running 'modules:config' at Wed, 22 Oct 2014 11:56:48 +0000. Up 30.67 seconds.
Generating locales...
  en_US.UTF-8... up-to-date
Generation complete.
Cloud-init v. 0.7.5 running 'modules:final' at Wed, 22 Oct 2014 11:56:54 +0000. Up 36.57 seconds.
+ install -D -m 644 /dev/null /var/lib/juju/nonce.txt
+ printf %s\n user-admin:bootstrap
Cloud-init v. 0.7.5 finished at Wed, 22 Oct 2014 11:56:55 +0000.
DataSourceOpenStack [net,ver=2]. Up 37.66 seconds
Ign http://security.ubuntu.com trusty-security InRelease
Get:1 http://security.ubuntu.com trusty-security Release.gpg [933 B]
...

Andrew Wilkins (axwalk)
Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
Andrew Wilkins (axwalk)
Changed in juju-core:
status: In Progress → Fix Committed
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Great to see this is fix committed - what was the root cause? Is there an easy way to find the commit that fixed this?

Revision history for this message
Ian Booth (wallyworld) wrote :

We took a sledge hammer approach - each apt call made by Juju as part of the bootstrap script execution was wrapped in a retry. We did not discover exactly what other process running on the machine was also using apt, but the solution implemented seems to be the common approach used elsewhere.

Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.23 → 1.23-beta1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.