cloud-init sometimes fails on dpkg lock due to concurrent apt-daily.service execution
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| APT | Fix Released | Unknown | | |
| cloud-init | Fix Released | Medium | Unassigned | |
| apt (Ubuntu) | Invalid | Undecided | Unassigned | |
| cloud-init (Ubuntu) | Fix Released | Medium | Unassigned | |
| Xenial | Fix Released | Medium | Unassigned | |
| Yakkety | Won't Fix | Medium | Unassigned | |
| Zesty | Fix Released | Medium | Unassigned | |
| Artful | Fix Released | Medium | Unassigned | |
Bug Description
=== Begin SRU Template ===
[Impact]
A cloud-config that contains packages to install (see below) or
'package_upgrade' will run 'apt-get update'. That can sometimes fail as a
result of contention with the apt-daily.service that updates that information.
Cloud-config showing the problem is just like:
$ cat my.yaml
#cloud-config
packages: ['hello']
[Test Case]
lxc-proposed-
https:/
It publishes an image to lxd with proposed enabled and cloud-init upgraded.
a.) launch an instance with proposed version of cloud-init and some user-data.
This is platform independent. The test case demonstrates lxd.
$ printf "%s\n%s\n%s\n" "#cloud-config" "packages: ['hello']" \
$ release=xenial
$ ref=proposed-
$ ./lxc-proposed-
b.) start the instance
$ name=$release-
$ lxc launch my-xenial "--config=
$ sleep 1
$ lxc exec $name -- tail -f /var/log/
# watch this boot.
c.) Look for evidence of systemd failure
journalctl -o short-precise | grep -i break
journalctl -o short-precise | grep -i order
[Regression Potential]
Regression chance here is low. It's possible that ordering loops
could occur. When that does happen, journalctl will mention it. Unfortunately,
in such cases systemd somewhat randomly picks a service to kill, so behavior
is somewhat undefined.
[Other Info]
Upstream commit at
https:/
=== End SRU Template ===
apt-daily is now a systemd service rather than being invoked by cron.daily. If one builds a custom AMI it is possible that the apt-daily.timer will fire during boot. This can fire at the same time cloud-init is running and if cloud-init loses the race the invocation of apt (e.g. use of "packages:" in the config) will fail.
There is a lot of discussion online about this change to apt-daily (e.g. unattended upgrades happening during business hours, delaying boot, etc.) and discussion of potential systemd changes regarding timers firing during boot (c.f. https:/
While it would be better to solve this in apt itself, I suggest that cloud-init be defensive when calling apt and implement some retry mechanism.
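The retry idea suggested here could look roughly like the shell helper below. This is only a sketch: the `retry` function, its attempt-count and delay arguments, and the example values are hypothetical and are not taken from cloud-init. The fix listed under Related branches appears to change systemd unit ordering rather than add retries.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not cloud-init code).
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have been made.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while true; do
        # Success: stop retrying immediately.
        if "$@"; then
            return 0
        fi
        # Out of attempts: report on stderr and fail.
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        # Wait before trying again, e.g. for the dpkg lock to be released.
        sleep "$delay"
    done
}
```

A caller might then run `retry 5 10 apt-get update` (values chosen arbitrarily) so that a transient lock held by apt-daily.service does not fail the whole run. A fixed delay keeps the sketch short; a real implementation would probably also want an overall timeout.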
Various instances of people running into this issue:
https:/
https:/
https:/
https:/
Related branches
- Joshua Powers (community): Approve
- Server Team CI bot: Needs Fixing (continuous-integration)
- Ryan Harper: Approve
- Diff: 2029 lines (+1979/-2), 7 files modified
debian/changelog (+12/-2)
debian/patches/cpick-003c6678-net-remove-systemd-link-file-writing-from-eni-renderer (+95/-0)
debian/patches/cpick-11121fe4-systemd-make-cloud-final.service-run-before-apt-daily (+33/-0)
debian/patches/cpick-1cd4323b-azure-remove-accidental-duplicate-line-in-merge (+22/-0)
debian/patches/cpick-5fb49bac-azure-identify-platform-by-well-known-value-in-chassis (+338/-0)
debian/patches/cpick-ebc9ecbc-Azure-Add-network-config-Refactor-net-layer-to-handle (+1474/-0)
debian/patches/series (+5/-0)
- Scott Moser: Approve
- Server Team CI bot: Approve (continuous-integration)
- Steve Langasek (community): Approve
- Diff: 12 lines (+1/-0), 1 file modified
systemd/cloud-final.service (+1/-0)
| Changed in cloud-init: | |
| importance: | Undecided → High |
| Changed in cloud-init: | |
| status: | New → Confirmed |
| importance: | High → Medium |
| Changed in cloud-init (Ubuntu Xenial): | |
| status: | New → Confirmed |
| Changed in cloud-init (Ubuntu Yakkety): | |
| status: | New → Confirmed |
| Changed in cloud-init (Ubuntu Zesty): | |
| status: | New → Confirmed |
| Changed in cloud-init (Ubuntu Artful): | |
| status: | New → Confirmed |
| Changed in cloud-init (Ubuntu Xenial): | |
| importance: | Undecided → Medium |
| Changed in cloud-init (Ubuntu Yakkety): | |
| importance: | Undecided → Medium |
| Changed in cloud-init (Ubuntu Zesty): | |
| importance: | Undecided → Medium |
| Changed in cloud-init (Ubuntu Artful): | |
| importance: | Undecided → High |
| importance: | High → Medium |
| no longer affects: | apt (Ubuntu Zesty) |
| no longer affects: | apt (Ubuntu Yakkety) |
| no longer affects: | apt (Ubuntu Xenial) |
| no longer affects: | apt (Ubuntu Artful) |
| Changed in apt: | |
| status: | Unknown → New |
| Changed in apt: | |
| status: | New → Confirmed |
| Changed in cloud-init (Ubuntu Artful): | |
| status: | Confirmed → Fix Committed |
| description: | updated |
| description: | updated |
| Changed in cloud-init (Ubuntu Yakkety): | |
| status: | Fix Committed → Won't Fix |
| Changed in apt: | |
| status: | Confirmed → Fix Released |

On Wed, May 24, 2017 at 09:10:37PM -0000, Jim Browne wrote:
> While it would be better to solve this in apt itself, I suggest that
> cloud-init be defensive when calling apt and implement some retry
> mechanism.
I would suggest instead that cloud-init should declare itself
Before=apt-daily.service / apt-daily.timer, so that cloud-init takes
precedence over apt-daily on first boot.
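
For anyone wanting to experiment with the Before= ordering on an image that does not yet carry the fix, one possible approach is a systemd drop-in. The sketch below is an assumption-laden illustration: the function name, the drop-in file name, and the choice of cloud-final.service as the unit (taken from the file named in the related branch diff) are mine, and the exact line in the released patch may differ.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: write_before_apt_daily_dropin DROPIN_DIR
# Writes a drop-in that orders the unit owning DROPIN_DIR before the
# apt-daily service and timer.
write_before_apt_daily_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    # The file name 10-before-apt-daily.conf is arbitrary.
    cat > "$dropin_dir/10-before-apt-daily.conf" <<'EOF'
[Unit]
Before=apt-daily.service apt-daily.timer
EOF
}
```

On a real system this would be called as root with `/etc/systemd/system/cloud-final.service.d` as the directory, followed by `systemctl daemon-reload`. As the SRU template notes, any resulting ordering loop should show up in journalctl.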