cloud-init sometimes fails on dpkg lock due to concurrent apt-daily.service execution

Bug #1693361 reported by Jim Browne on 2017-05-24
This bug affects 4 people
Affects              Status     Importance  Assigned to  Milestone
APT                  Confirmed  Unknown     -            -
cloud-init           -          Medium      Unassigned   -
apt (Ubuntu)         -          Undecided   Unassigned   -
cloud-init (Ubuntu)  -          Medium      Unassigned   -
  Xenial             -          Medium      Unassigned   -
  Yakkety            -          Medium      Unassigned   -
  Zesty              -          Medium      Unassigned   -
  Artful             -          Medium      Unassigned   -

Bug Description

=== Begin SRU Template ===
[Impact]
A cloud-config that contains packages to install (see below) or
'package_upgrade' will run 'apt-get update'. That can sometimes fail as a
result of contention with the apt-daily.service that updates that information.

Cloud-config showing the problem is just like:

  $ cat my.yaml
  #cloud-config
  packages: ['hello']

[Test Case]
lxc-proposed-snapshot is available at
  https://git.launchpad.net/~smoser/cloud-init/+git/sru-info/tree/bin/lxc-proposed-snapshot
It publishes an image to lxd with proposed enabled and cloud-init upgraded.

a.) launch an instance with proposed version of cloud-init and some user-data.
   This is platform independent. The test case demonstrates lxd.
   $ printf "%s\n%s\n%s\n" "#cloud-config" "packages: ['hello']" \
       "package_upgrade: true" > config.yaml
   $ release=xenial
   $ ref=proposed-$release
   $ ./lxc-proposed-snapshot --proposed --publish $release $ref;

b.) start the instance
   $ name=$release-1693361
   $ lxc launch $ref "--config=user.user-data=$(cat config.yaml)" $name
   $ sleep 1
   $ lxc exec $name -- tail -f /var/log/cloud-init.log /var/log/cloud-init-output.log
   # watch this boot.

c.) Look for evidence of systemd failure
   $ journalctl -o short-precise | grep -i break
   $ journalctl -o short-precise | grep -i order
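
The two greps in step c can be folded into one helper that is easier to script. This is only a sketch: the function name count_ordering_problems is hypothetical, and it matches the same loose patterns as the commands above, so unrelated lines containing "break" or "order" are counted too.

```shell
#!/bin/sh
# Count journal lines that hint at systemd detecting or breaking an
# ordering cycle. Reads journal text on stdin and prints the number of
# matching lines (0 when nothing matches).
count_ordering_problems() {
    # Case-insensitive match on the same two words as the greps in step c.
    # grep -c exits non-zero on zero matches, so mask that exit status.
    grep -i -c -E 'break|order' || true
}
```

Possible usage inside the instance: journalctl -o short-precise | count_ordering_problems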

[Regression Potential]
Regression chance here is low. It's possible that ordering loops
could occur. When that does happen, journalctl will mention it. Unfortunately,
in such cases systemd somewhat randomly picks a service to kill, so behavior
is somewhat undefined.

[Other Info]
Upstream commit at
  https://git.launchpad.net/cloud-init/commit/?id=11121fe4

=== End SRU Template ===

apt-daily is now a systemd service rather than being invoked by cron.daily. If one builds a custom AMI, it is possible that apt-daily.timer will fire during boot. This can happen while cloud-init is running, and if cloud-init loses the race, its invocation of apt (e.g. through "packages:" in the config) will fail.

There is a lot of discussion online about this change to apt-daily (e.g. unattended upgrades happening during business hours, delaying boot, etc.) and discussion of potential systemd changes regarding timers firing during boot (cf. https://github.com/systemd/systemd/issues/5659).

While it would be better to solve this in apt itself, I suggest that cloud-init be defensive when calling apt and implement some retry mechanism.

Various instances of people running into this issue:

https://github.com/chef/bento/issues/609
https://clusterhq.atlassian.net/browse/FLOC-4486
https://github.com/boxcutter/ubuntu/issues/73
https://unix.stackexchange.com/questions/315502/how-to-disable-apt-daily-service-on-ubuntu-cloud-vm-image

On Wed, May 24, 2017 at 09:10:37PM -0000, Jim Browne wrote:
> While it would be better to solve this in apt itself, I suggest that
> cloud-init be defensive when calling apt and implement some retry
> mechanism.

I would suggest instead that cloud-init should declare itself
Before=apt-daily.service / apt-daily.timer, so that cloud-init takes
precedence over apt-daily on first boot.
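
As a rough illustration of that ordering, a local drop-in could be written like this. It is a sketch under assumptions, not the packaged fix: the function name, the drop-in file name and the choice of cloud-final.service as the unit to order are illustrative (the changelog later in this report does name cloud-final.service). Only apt-daily.service is listed; ordering against the timer is left out because it may risk the ordering loops described under Regression Potential.

```shell
#!/bin/sh
# Write a systemd drop-in that orders cloud-final.service before
# apt-daily.service.
# $1: directory to write the drop-in into. On a real system this would
#     typically be /etc/systemd/system/cloud-final.service.d (assumption).
write_apt_ordering_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    # The file name is arbitrary; systemd reads any *.conf in the directory.
    cat > "$dropin_dir/10-before-apt-daily.conf" <<'EOF'
[Unit]
Before=apt-daily.service
EOF
}
```

After writing the file into the real drop-in directory, "systemctl daemon-reload" makes systemd pick it up.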

Jim Browne (jbrowne) wrote :

My concern is another apt dependent task being added somewhere else in systemd that winds up triggering during boot. IMO it's better to be generically defensive about the use of apt, but others certainly have more context and information than I do.
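
For reference, the kind of generic retry being proposed might look like the following. It is a minimal sketch, not cloud-init code: the function name retry, the attempt count and the fixed delay are all hypothetical choices.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempt limit is reached.
# $1: maximum number of attempts
# $2: seconds to sleep between attempts
# remaining arguments: the command to run
retry() {
    max=$1; delay=$2; shift 2
    attempt=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

Possible usage: retry 5 10 apt-get -y install hello

A retry like this only papers over the contention; it does not tell a lock failure apart from any other apt error.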

Scott Moser (smoser) wrote :

I suspect that Steve's suggestion should fix this mostly for cloud-init.
Apt does of course have a general locking problem that really does need addressing.

We've all seen workarounds/retries at all sorts of levels to address the problems that
a.) you basically have to run 'apt-get update' before you run 'apt-get install' (bug 1429285), which results in the over-usage of that fairly heavy resource.

b.) if another process is running 'apt-get install' or 'apt-get remove' when you attempt, you will fail with the lock contention.

These things should be solved in apt, not worked around in yet another process that uses it.

Chris White (cwprogram) wrote :

Some research on this indicates:

* `/etc/init.d/rc` is set to run services in parallel via `startpar`
* However there is a portion with concurrency disabled in the same file
* Assuming `cloud-init` was added as part of the non-concurrent part of the file would this prevent the issue?
* `aptdcon` along with `aptd` appears to allow various `apt-get` operations in a queue like system. Unfortunately I can't tell what happens when a standard `apt-get` package install happens while `aptd` is doing its thing. Not only that but it would increase dependencies on cloud images.

Changed in cloud-init:
importance: Undecided → High
Scott Moser (smoser) on 2017-06-12
Changed in cloud-init:
status: New → Confirmed
importance: High → Medium
Changed in cloud-init (Ubuntu Xenial):
status: New → Confirmed
Changed in cloud-init (Ubuntu Yakkety):
status: New → Confirmed
Changed in cloud-init (Ubuntu Zesty):
status: New → Confirmed
Changed in cloud-init (Ubuntu Artful):
status: New → Confirmed
Changed in cloud-init (Ubuntu Xenial):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Yakkety):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Zesty):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Artful):
importance: Undecided → High
importance: High → Medium
no longer affects: apt (Ubuntu Zesty)
no longer affects: apt (Ubuntu Yakkety)
no longer affects: apt (Ubuntu Xenial)
no longer affects: apt (Ubuntu Artful)
Julian Andres Klode (juliank) wrote :

We eventually want wait locking in apt, but I don't think it really solves all issues, especially in scripts with multiple apt invocations. Which is why apt-daily got an additional flock lock for the upcoming SRUs. (see artful).

Feel free to wait on the same lock and probably add some ordering against apt-daily and apt-daily-upgrade services.
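
Waiting on a shared flock from a script could be sketched as below. The helper name with_lock is hypothetical, and the lock path shown in the usage line (/var/lib/apt/daily_lock) is an assumption about where apt-daily keeps its lock; check the apt.systemd.daily script on the target release before relying on it. It uses flock(1) from util-linux.

```shell
#!/bin/sh
# Run a command while holding an exclusive flock on a lock file, waiting
# up to a timeout for any current holder to release it.
# $1: lock file path
# $2: timeout in seconds
# remaining arguments: the command to run under the lock
with_lock() {
    lockfile=$1; timeout=$2; shift 2
    # flock exits non-zero if the lock cannot be taken within the timeout;
    # otherwise it exits with the status of the command it ran.
    flock -w "$timeout" "$lockfile" "$@"
}
```

Possible usage: with_lock /var/lib/apt/daily_lock 3600 apt-get update

Several apt invocations can be grouped under one lock by passing a single "sh -c '...'" command, which addresses the multiple-invocation case mentioned above.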

Changed in apt:
status: Unknown → New
Changed in apt:
status: New → Confirmed
Changed in cloud-init (Ubuntu Artful):
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-197-gebc9ecbc-0ubuntu1

---------------
cloud-init (0.7.9-197-gebc9ecbc-0ubuntu1) artful; urgency=medium

  * debian/control: add build dependency python3-jsonschema (LP: #1695318)
  * New upstream snapshot.
    - Azure: Add network-config, Refactor net layer to handle duplicate macs.
      [Ryan Harper]
    - Tests: Simplify the check on ssh-import-id [Joshua Powers]
    - tests: update ntp tests after sntp added [Joshua Powers]
    - FreeBSD: Make freebsd a variant, fix unittests and
      tools/build-on-freebsd.
    - FreeBSD: fix test failure
    - FreeBSD: replace ifdown/ifup with "ifconfig down" and "ifconfig up".
      [Hongjiang Zhang] (LP: #1697815)
    - FreeBSD: fix cdrom mounting failure if /mnt/cdrom/secure did not exist.
      [Hongjiang Zhang] (LP: #1696295)
    - main: Don't use templater to format the welcome message
      [Andrew Jorgensen]
    - docs: Automatically generate module docs form schema if present.
      [Chad Smith]
    - debian: fix path comment in /etc/hosts template.
      [Jens Sandmann] (LP: #1606406)
    - suse: add hostname and fully qualified domain to template.
      [Jens Sandmann]
    - write_file(s): Print permissions as octal, not decimal [Andrew Jorgensen]
    - ci deps: Add --test-distro to read-dependencies to install all deps
      [Chad Smith]
    - tools/run-centos: cleanups and move to using read-dependencies
    - pkg build ci: Add make ci-deps-<distro> target to install pkgs
      [Chad Smith]
    - systemd: make cloud-final.service run before apt daily services.
      (LP: #1693361)
    - selinux: Allow restorecon to be non-fatal. [Ryan Harper] (LP: #1686751)
    - net: Allow netinfo subprocesses to return 0 or 1.
      [Ryan Harper] (LP: #1686751)
    - net: Allow for NetworkManager configuration [Ryan McCabe] (LP: #1693251)
    - Use distro release version to determine if we use systemd in redhat spec
      [Ryan Harper]
    - net: normalize data in network_state object
    - Integration Testing: tox env, pyxld 2.2.3, and revamp framework
      [Wesley Wiedenmeier]
    - Chef: Update omnibus url to chef.io, minor doc changes. [JJ Asghar]
    - tools: add centos scripts to build and test [Joshua Powers]
    - Drop cheetah python module as it is not needed by trunk [Ryan Harper]
    - rhel/centos spec cleanups.
    - cloud.cfg: move to a template. setup.py changes along the way.
    - Makefile: add deb-src and srpm targets. use PYVER more places.
    - makefile: fix python 2/3 detection in the Makefile [Chad Smith]
    - snap: Removing snapcraft plug line [Joshua Powers] (LP: #1695333)
    - RHEL/CentOS: Fix default routes for IPv4/IPv6 configuration.
      [Andreas Karis] (LP: #1696176)
    - test: Fix pyflakes complaint of unused import.
      [Joshua Powers] (LP: #1695918)
    - NoCloud: support seed of nocloud from smbios information
      [Vladimir Pouzanov] (LP: #1691772)
    - net: when selecting a network device, use natural sort order
      [Marc-Aurèle Brothier]
    - fix typos and remove whitespace in various docs [Stephan Telling]
    - systemd: Fix typo in comment in cloud-init.target. [Chen-Han Hsiao]
    - Tests: Skip jso...


Changed in cloud-init (Ubuntu Artful):
status: Fix Committed → Fix Released
Scott Moser (smoser) on 2017-06-28
description: updated
Steve Langasek (vorlon) on 2017-06-29
description: updated

Hello Jim, or anyone else affected,

Accepted cloud-init into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-153-g16a7302f-0ubuntu1~17.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-zesty to verification-done-zesty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-zesty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Zesty):
status: Confirmed → Fix Committed
tags: added: verification-needed verification-needed-zesty
Steve Langasek (vorlon) wrote :

Hello Jim, or anyone else affected,

Accepted cloud-init into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-153-g16a7302f-0ubuntu1~16.10.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-yakkety to verification-done-yakkety. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-yakkety. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Yakkety):
status: Confirmed → Fix Committed
tags: added: verification-needed-yakkety
Changed in cloud-init (Ubuntu Xenial):
status: Confirmed → Fix Committed
tags: added: verification-needed-xenial
Steve Langasek (vorlon) wrote :

Hello Jim, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-153-g16a7302f-0ubuntu1~16.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Scott Moser (smoser) wrote :

$ for rel in xenial yakkety zesty; do lxc-proposed-snapshot --proposed $rel proposed-$rel --publish || break; done

$ for rel in xenial yakkety zesty; do lxc launch proposed-$rel "--config=user.user-data=$(cat config.yaml)" test-$rel || break; done

$ sleep 2m

$ for rel in xenial yakkety zesty; do
    mkdir $rel && (
      cd $rel &&
      lxc exec test-$rel -- journalctl -o short-precise > journal.log &&
      lxc exec test-$rel -- dpkg-query --show cloud-init > cloud-init-dpkg.txt &&
      lxc file pull test-$rel/var/log/cloud-init.log cloud-init.log &&
      lxc file pull test-$rel/var/log/cloud-init-output.log cloud-init-output.log
    ) || break
  done

$ for rel in xenial yakkety zesty; do tar -czf /tmp/1693361-$rel.tar.gz $rel; done

tags: added: verification-done-xenial verification-done-yakkety verification-done-zesty
removed: verification-needed verification-needed-xenial verification-needed-yakkety verification-needed-zesty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-153-g16a7302f-0ubuntu1~16.04.2

---------------
cloud-init (0.7.9-153-g16a7302f-0ubuntu1~16.04.2) xenial-proposed; urgency=medium

  * debian/patches/ds-identify-behavior-xenial.patch: refresh patch.
  * cherry-pick 5fb49bac: azure: identify platform by well known value
    in chassis asset (LP: #1693939)
  * cherry-pick 003c6678: net: remove systemd link file writing from eni
    renderer
  * cherry-pick 1cd4323b: azure: remove accidental duplicate line in
    merge.
  * cherry-pick ebc9ecbc: Azure: Add network-config, Refactor net layer
    to handle duplicate macs. (LP: #1690430)
  * cherry-pick 11121fe4: systemd: make cloud-final.service run before
    apt daily (LP: #1693361)

 -- Scott Moser <email address hidden> Wed, 28 Jun 2017 17:17:18 -0400

Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-153-g16a7302f-0ubuntu1~17.04.2

---------------
cloud-init (0.7.9-153-g16a7302f-0ubuntu1~17.04.2) zesty-proposed; urgency=medium

  * cherry-pick 5fb49bac: azure: identify platform by well known value
    in chassis asset (LP: #1693939)
  * cherry-pick 003c6678: net: remove systemd link file writing from eni
    renderer
  * cherry-pick 1cd4323b: azure: remove accidental duplicate line in
    merge.
  * cherry-pick ebc9ecbc: Azure: Add network-config, Refactor net layer
    to handle duplicate macs. (LP: #1690430)
  * cherry-pick 11121fe4: systemd: make cloud-final.service run before
    apt daily (LP: #1693361)

 -- Scott Moser <email address hidden> Wed, 28 Jun 2017 17:20:51 -0400

Changed in cloud-init (Ubuntu Zesty):
status: Fix Committed → Fix Released
Steve Langasek (vorlon) on 2017-07-26
Changed in cloud-init (Ubuntu Yakkety):
status: Fix Committed → Won't Fix

This bug is believed to be fixed in cloud-init in 17.1. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Confirmed → Fix Released
Julian Andres Klode (juliank) wrote :

Nothing actionable here for apt, so I'll close this. We should consider making frontend locking more flexible for scripts using apt, though, so scripts can hold the lock all the time and drive apt.

Changed in apt (Ubuntu):
status: New → Invalid