cloud-init sometimes fails on dpkg lock due to concurrent apt-daily.service execution
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
APT | Fix Released | Unknown | |
cloud-init | Fix Released | Medium | Unassigned |
apt (Ubuntu) | Invalid | Undecided | Unassigned |
cloud-init (Ubuntu) | Fix Released | Medium | Unassigned |
Xenial | Fix Released | Medium | Unassigned |
Yakkety | Won't Fix | Medium | Unassigned |
Zesty | Fix Released | Medium | Unassigned |
Artful | Fix Released | Medium | Unassigned |
Bug Description
=== Begin SRU Template ===
[Impact]
A cloud-config that contains packages to install (see below) or
'package_upgrade' will run 'apt-get update'. That can sometimes fail as a
result of contention with the apt-daily.service that updates that information.
Cloud-config showing the problem is just like:
$ cat my.yaml
#cloud-config
packages: ['hello']
[Test Case]
lxc-proposed-
https:/
It publishes an image to lxd with proposed enabled and cloud-init upgraded.
a.) launch an instance with proposed version of cloud-init and some user-data.
This is platform independent. The test case demonstrates lxd.
$ printf "%s\n%s\n%s\n" "#cloud-config" "packages: ['hello']" \
$ release=xenial
$ ref=proposed-
$ ./lxc-proposed-
b.) start the instance
$ name=$release-
$ lxc launch my-xenial "--config=
$ sleep 1
$ lxc exec $name -- tail -f /var/log/
# watch this boot.
c.) Look for evidence of systemd failure
journalctl -o short-precise | grep -i break
journalctl -o short-precise | grep -i order
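For an automated run of step c.), the two greps can be folded into one check that fails when anything matches. This is only a sketch: the function name check_no_ordering_problem is hypothetical, it reuses the "break" and "order" keywords from the commands above, and because those keywords are broad, any match still needs a manual look.

```shell
# Hypothetical helper: read journal text on stdin and look for the same
# "break" / "order" keywords as the manual greps in step c.
# Returns 0 when nothing matches, 1 (and prints the matching lines) otherwise.
check_no_ordering_problem() {
    local matches
    # grep exits non-zero when nothing matches; that is the good case here.
    matches=$(grep -i -E 'break|order' || true)
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches"
        return 1
    fi
    return 0
}

# Example (inside the instance):
#   journalctl -o short-precise | check_no_ordering_problem
```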
[Regression Potential]
Regression chance here is low. It's possible that ordering loops
could occur. When that does happen, journalctl will mention it. Unfortunately,
in such cases systemd somewhat randomly picks a service to kill, so behavior
is somewhat undefined.
[Other Info]
Upstream commit at
https:/
=== End SRU Template ===
apt-daily is now a systemd service rather than being invoked by cron.daily. If one builds a custom AMI it is possible that the apt-daily.timer will fire during boot. This can fire at the same time cloud-init is running and if cloud-init loses the race the invocation of apt (e.g. use of "packages:" in the config) will fail.
There is a lot of discussion online about this change to apt-daily (e.g. unattended upgrades happening during business hours, delaying boot, etc.) and discussion of potential systemd changes regarding timers firing during boot (c.f. https:/
While it would be better to solve this in apt itself, I suggest that cloud-init be defensive when calling apt and implement some retry mechanism.
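As an illustration of the retry idea (not of anything cloud-init actually does), a user script could wrap the apt call as below. The function name retry, the attempt count, and the delay are all hypothetical, and the sketch assumes that losing the lock race shows up as a non-zero exit status from the wrapped command.

```shell
# Hypothetical helper: run a command, retrying on failure.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    local max_attempts="$1" delay="$2" attempt=1 rc=0
    shift 2
    while true; do
        rc=0
        "$@" || rc=$?          # capture the exit status without aborting
        if [ "$rc" -eq 0 ]; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: giving up after $attempt attempts (exit $rc): $*" >&2
            return "$rc"
        fi
        echo "retry: attempt $attempt failed (exit $rc), sleeping ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example (values are arbitrary):
#   retry 5 10 apt-get update
```

A fixed delay keeps the sketch short; a retry loop only narrows the race window and does not remove the contention itself.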
Various instances of people running into this issue:
https:/
https:/
https:/
https:/
Related branches
- Joshua Powers (community): Approve
- Server Team CI bot: Needs Fixing (continuous-integration)
- Ryan Harper: Approve
-
Diff: 2029 lines (+1979/-2), 7 files modified
debian/changelog (+12/-2)
debian/patches/cpick-003c6678-net-remove-systemd-link-file-writing-from-eni-renderer (+95/-0)
debian/patches/cpick-11121fe4-systemd-make-cloud-final.service-run-before-apt-daily (+33/-0)
debian/patches/cpick-1cd4323b-azure-remove-accidental-duplicate-line-in-merge (+22/-0)
debian/patches/cpick-5fb49bac-azure-identify-platform-by-well-known-value-in-chassis (+338/-0)
debian/patches/cpick-ebc9ecbc-Azure-Add-network-config-Refactor-net-layer-to-handle (+1474/-0)
debian/patches/series (+5/-0)
- Scott Moser: Approve
- Server Team CI bot: Approve (continuous-integration)
- Steve Langasek (community): Approve
-
Diff: 12 lines (+1/-0), 1 file modified
systemd/cloud-final.service (+1/-0)
Changed in cloud-init: | |
importance: | Undecided → High |
Changed in cloud-init: | |
status: | New → Confirmed |
importance: | High → Medium |
Changed in cloud-init (Ubuntu Xenial): | |
status: | New → Confirmed |
Changed in cloud-init (Ubuntu Yakkety): | |
status: | New → Confirmed |
Changed in cloud-init (Ubuntu Zesty): | |
status: | New → Confirmed |
Changed in cloud-init (Ubuntu Artful): | |
status: | New → Confirmed |
Changed in cloud-init (Ubuntu Xenial): | |
importance: | Undecided → Medium |
Changed in cloud-init (Ubuntu Yakkety): | |
importance: | Undecided → Medium |
Changed in cloud-init (Ubuntu Zesty): | |
importance: | Undecided → Medium |
Changed in cloud-init (Ubuntu Artful): | |
importance: | Undecided → High |
importance: | High → Medium |
no longer affects: | apt (Ubuntu Zesty) |
no longer affects: | apt (Ubuntu Yakkety) |
no longer affects: | apt (Ubuntu Xenial) |
no longer affects: | apt (Ubuntu Artful) |
Changed in apt: | |
status: | Unknown → New |
Changed in apt: | |
status: | New → Confirmed |
Changed in cloud-init (Ubuntu Artful): | |
status: | Confirmed → Fix Committed |
description: | updated |
description: | updated |
Changed in cloud-init (Ubuntu Yakkety): | |
status: | Fix Committed → Won't Fix |
Changed in apt: | |
status: | Confirmed → Fix Released |
On Wed, May 24, 2017 at 09:10:37PM -0000, Jim Browne wrote:
> While it would be better to solve this in apt itself, I suggest that
> cloud-init be defensive when calling apt and implement some retry
> mechanism.
I would suggest instead that cloud-init should declare itself
Before=apt-daily.service / apt-daily.timer, so that cloud-init takes
precedence over apt-daily on first boot.
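For anyone wanting to try that ordering locally before a fixed package is available, it could be expressed as a systemd drop-in, roughly as sketched here. The function name, the drop-in file name 10-before-apt-daily.conf, and the choice to list only apt-daily.service are assumptions for illustration; the related branch above changes systemd/cloud-final.service itself. On a real system the root directory would be /etc/systemd/system (as root), followed by `systemctl daemon-reload`.

```shell
# Hypothetical helper: write a drop-in that orders cloud-final.service
# before apt-daily.service.
# Usage: write_before_apt_daily_dropin ROOT_DIR
# Prints the path of the file it wrote.
write_before_apt_daily_dropin() {
    local root="$1"
    local dir="$root/cloud-final.service.d"
    local file="$dir/10-before-apt-daily.conf"   # file name is an assumption
    mkdir -p "$dir"
    cat > "$file" <<'EOF'
[Unit]
Before=apt-daily.service
EOF
    echo "$file"
}

# Example, written to a scratch directory rather than the live system:
#   write_before_apt_daily_dropin "$(mktemp -d)"
```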