cloud-init sometimes fails on dpkg lock due to concurrent apt-daily-upgrade.service execution

Bug #1711428 reported by Jim Browne
This bug affects 2 people
Affects        Status         Importance   Assigned to   Milestone
cloud-init     Fix Released   Undecided    Unassigned
apt (Ubuntu)   Expired        High         Unassigned

Bug Description

This is the same problem as https://bugs.launchpad.net/cloud-init/+bug/1693361, but with a different APT invoking service. In this case it is apt-daily-upgrade.service.

So, I guess add apt-daily-upgrade.service to the Before line in /lib/systemd/system/cloud-final.service alongside apt-daily.service.
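
A minimal sketch of that suggestion, written as a drop-in rather than an edit to the packaged unit file. The function name and the drop-in file name are hypothetical; Before= is a list-type setting, so a drop-in adds to the existing line instead of replacing it.

```shell
# Hypothetical helper: write a drop-in that orders cloud-final.service
# before apt-daily-upgrade.service. The target directory is an argument
# so the function can be tried out against a scratch directory first.
write_cloud_final_dropin() {
    dir="${1:-/etc/systemd/system/cloud-final.service.d}"
    mkdir -p "$dir"
    # File name is an assumption; any *.conf name in this directory works.
    cat > "$dir/apt-daily-upgrade-order.conf" <<'EOF'
[Unit]
Before=apt-daily-upgrade.service
EOF
}
```

Run as root with no argument, this writes under /etc/systemd/system; a "systemctl daemon-reload" is needed afterwards for systemd to pick it up.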

Or wait for an APT fix. Or retry APT commands when executing "packages:".
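
The retry idea can be sketched as a small shell function for APT commands that you run yourself, for example from a provisioning script. This is an illustration only, not how cloud-init handles "packages:"; the function name, attempt count, and delay are assumptions.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "E: giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

A call such as "retry 5 30 apt-get install -y SOMEPACKAGE" (placeholder package name) would then ride out a short-lived dpkg lock instead of failing on the first attempt.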

Reporting this to save someone else trouble, but I think we'll be rolling back to Trusty at this point. Hopefully the B LTS will have an alternative to systemd.

Revision history for this message
Steve Langasek (vorlon) wrote :

The apt-daily-upgrade service already declares itself to be 'After=apt-daily.service'. So there should be strict sequencing here of cloud-final.service -> apt-daily.service -> apt-daily-upgrade.service. If that's not happening, this warrants analysis to understand why so that it can be fixed in the apt package directly rather than adding more relationships to the cloud-init unit.

Changed in cloud-init:
status: New → Incomplete
Changed in apt (Ubuntu):
status: New → Incomplete
importance: Undecided → High
Revision history for this message
Jim Browne (jbrowne) wrote :

Hmmm. I confirm what you see.

I think this problem is arising because we have Packer run (due to LP#1693361):
    systemctl disable apt-daily.service
    systemctl disable apt-daily.timer

when building the AMI.

I guess if apt-daily.service is disabled the Before is not transitive from cloud-init to apt-daily-upgrade via apt-daily? Not seeing anything definitive in the docs after a quick glance.

Revision history for this message
Julian Andres Klode (juliank) wrote :

Note that B having After=A does not imply that the two never run at the same time. If B is already running, A will be started as well. That's the reason why we added a lock file in apt:

if [ "$1" = "lock_is_held" ]; then
    shift
else
    # Maintain a lock on fd 3, so we can't run the script twice at the same
    # time.
    eval $(apt-config shell StateDir Dir::State/d)
    exec 3>${StateDir}/daily_lock
    if ! flock -w 3600 3; then
        echo "E: Could not acquire lock" >&2
        exit 1
    fi

    # We hold the lock. Rerun this script as a child process, which
    # can run without propagating an extra fd to all of its children.
    "$0" lock_is_held "$@" 3>&-
    exit $?
fi
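
The same flock pattern can be wrapped in a function for scripts of your own that must not overlap with each other. This is a sketch, assuming the flock utility from util-linux; the function name is hypothetical and the lock file is whatever path the caller picks, not apt's daily_lock.

```shell
# with_lock LOCKFILE TIMEOUT CMD...: run CMD while holding an exclusive
# lock on LOCKFILE, waiting up to TIMEOUT seconds to acquire it.
with_lock() {
    lockfile=$1
    timeout=$2
    shift 2
    (
        # fd 9 is opened on the lock file by the redirection below.
        if ! flock -w "$timeout" 9; then
            echo "E: Could not acquire lock" >&2
            exit 1
        fi
        "$@"
    ) 9>"$lockfile"
}
```

Unlike the apt script above, this version lets the command inherit the lock fd, which is simpler but means a long-lived child keeps the lock held.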

Revision history for this message
Steve Langasek (vorlon) wrote :

On Thu, Aug 17, 2017 at 07:20:20PM -0000, Jim Browne wrote:
> Hmmm. I confirm what you see.

> I think this problem is arising because we have Packer run (due to LP#1693361):
> systemctl disable apt-daily.service
> systemctl disable apt-daily.timer

> when building the AMI.

> I guess if apt-daily.service is disabled the Before is not transitive
> from cloud-init to apt-daily-upgrade via apt-daily? Not seeing anything
> definitive in the docs after a quick glance.

Correct. Before/After only establishes ordering between enabled services
and is ignored for services that are not enabled.

Fixing this to enforce that apt-daily.service and apt-daily-upgrade.service
are both always enabled/disabled together would be done by adding
'Requires=apt-daily.service' to apt-daily-upgrade.service. But I think that
is too strict, because one may legitimately choose to disable only one of
the two services for some reason.

So I think this is something that you will want to fix in your image
mastering scripts.
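
One reading of this advice is to treat the apt units as a group in the image build, so that apt-daily-upgrade is never left active while apt-daily is disabled. A minimal sketch, assuming the four unit names below exist on the image; the function name and the SYSTEMCTL override (handy for a dry run) are hypothetical.

```shell
# Disable the apt-daily and apt-daily-upgrade timers and services together.
# SYSTEMCTL can be set to "echo" to print the commands instead of running them.
disable_apt_daily_units() {
    for unit in apt-daily.timer apt-daily.service \
                apt-daily-upgrade.timer apt-daily-upgrade.service; do
        "${SYSTEMCTL:-systemctl}" disable "$unit" || return 1
    done
}
```

In a Packer shell provisioner this would replace the two separate "systemctl disable" lines quoted above.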

Revision history for this message
Jim Browne (jbrowne) wrote :

> So I think this is something that you will want to fix in your image
> mastering scripts.

I agree and am fine with this being marked INVALID.

However, is juliank's note about the implementation of After= a concern w.r.t. how LP#1693361 was resolved?

Revision history for this message
Steve Langasek (vorlon) wrote :

On Thu, Aug 17, 2017 at 07:59:00PM -0000, Jim Browne wrote:
> However, is juliank's note about the implementation of After= a concern
> w.r.t. how LP#1693361 was resolved?

I don't think so. He's correct that the ordering doesn't guarantee that we
don't hit a conflict if the units wrap around and collide in the opposite
direction. But I believe that locking code is part of the fix that landed.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for apt (Ubuntu) because there has been no activity for 60 days.]

Changed in apt (Ubuntu):
status: Incomplete → Expired
Revision history for this message
James Falcon (falcojr) wrote :
Changed in cloud-init:
status: Incomplete → Fix Released