apt fails during bootstrap on aws

Bug #1259180 reported by Curtis Hovey
This bug affects 1 person
Affects     Status         Importance   Assigned to      Milestone
juju-core   Fix Released   High         Andrew Wilkins

Bug Description

We see a rise in bootstrap failures when deploying to AWS. This issue was first noticed with synchronous bootstrap. We have seen it both locally and in CI. We see Git being installed, then the instance starts shutting down.

http://162.213.35.54:8080/job/aws-upgrade-deploy/127/console

Curtis Hovey (sinzui) wrote :

This is the oldest log we could find from a manual test that reproduces the problem, though it is not very informative.

Changed in juju-core:
milestone: none → 1.17.0
Andrew Wilkins (axwalk) wrote :

I've reproduced on Azure, looking into it now.

Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
Andrew Wilkins (axwalk) wrote :

One thing is clear, at least: if any of the commands in the SSH session fail, I think we should cat or tail cloud-init-output.log.

Andrew Wilkins (axwalk) wrote :

It took a while, but I reproduced it again on Azure with Juju modified not to destroy the instance. /var/log/cloud-init-output.log has this at the end: http://paste.ubuntu.com/6549372/

Of particular note are the first and last lines. I can't confirm whether it's the same thing on EC2, as I can't reproduce it there. Will dig into this error.

Andrew Wilkins (axwalk) wrote :

A manual apt-get update fixed it. Need to understand why this happened in the first place.

Dave Cheney (dave-cheney) wrote : Re: [Bug 1259180] Re: apt fails during bootstrap on aws

https://code.launchpad.net/~dave-cheney/juju-core/165-log-handleboostrap-error/+merge/197984

This will land in a few minutes; hopefully it will explain what happened.

Andrew Wilkins (axwalk) wrote :

So, I have a hunch that it's because cloud-init hasn't finished (or is still in the process of) setting up the apt repository mirror when we SSH in and run apt-get update. I think we'll need an initial step in the synchronous bootstrap phase to wait for cloud-init to finish its business before proceeding.

Andrew Wilkins (axwalk) wrote :

To confirm my hunch: the log I pasted earlier bears it out, since otherwise it would say "azure.archive...."
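A hypothetical helper for checking this from a sources list: pull the host out of each `deb`/`deb-src` line and see whether any of them starts with the cloud mirror's prefix. `archiveHosts`, `usesMirror`, and the sample hostnames are invented for the sketch; only the "azure.archive" prefix comes from the comment above.

```go
package main

import (
	"bufio"
	"fmt"
	"net/url"
	"strings"
)

// archiveHosts returns the hostname of every "deb" or "deb-src" line in an
// apt sources list. Comments, blank lines, and unparsable URIs are skipped.
func archiveHosts(sourcesList string) []string {
	var hosts []string
	sc := bufio.NewScanner(strings.NewReader(sourcesList))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || (fields[0] != "deb" && fields[0] != "deb-src") {
			continue
		}
		rest := fields[1:]
		// Skip an optional "[options]" block, which may span several fields.
		if strings.HasPrefix(rest[0], "[") {
			for len(rest) > 0 && !strings.HasSuffix(rest[0], "]") {
				rest = rest[1:]
			}
			if len(rest) > 0 {
				rest = rest[1:] // drop the field that closes the block
			}
		}
		if len(rest) == 0 {
			continue
		}
		u, err := url.Parse(rest[0])
		if err != nil || u.Host == "" {
			continue
		}
		hosts = append(hosts, u.Hostname())
	}
	return hosts
}

// usesMirror reports whether any archive host starts with hostPrefix.
func usesMirror(sourcesList, hostPrefix string) bool {
	for _, h := range archiveHosts(sourcesList) {
		if strings.HasPrefix(h, hostPrefix) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical sources.list before cloud-init has rewritten it.
	before := "deb http://archive.ubuntu.com/ubuntu precise main\n"
	fmt.Println(usesMirror(before, "azure.archive")) // false
}
```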

Dave Cheney (dave-cheney) wrote :

Ahh, so they are deadlocking on the apt lock?

John A Meinel (jameinel) wrote :

On 2013-12-10 9:50, Andrew Wilkins wrote:
> Just confirming my hunch: the log I pasted before confirms that,
> since otherwise it'd say "azure.archive...."
>

Why is synchronous bootstrap doing apt-get update, given that cloud-init
already does that? Shouldn't we just let cloud-init do its thing (setting
everything up) and then connect to the service when it's done?

Did synchronous bootstrap stop having cloud-init set everything up, so
that we now do it ourselves?

John
=:->

Andrew Wilkins (axwalk) wrote :

> Ahh, so they are deadlocking on the apt lock?

No, the lock isn't part of it. cloud-init isn't interacting with apt, just modifying the source lists on disk. We happen to come in and use apt before it's done that.

> Did synchronous bootstrap stop having cloud-init set up everything and we do it ourselves?

Yes, we now run apt-get update/upgrade in the synchronous phase. Part of the reason for synchronous bootstrap was so that we could see which parts fail. cloud-init just adds authorized_keys and performs its default actions (like setting the mirrors).

I have a fix I'm testing now.

Andrew Wilkins (axwalk) wrote :

I think we should get cloud-init to set up the apt sources, at least. I've made a change to have the synchronous phase wait for cloud-init to complete, but I've now noticed that we're not using the cloud mirror for the cloud archive. The only wrinkle is the manual provider, where we do want to add the sources in the "synchronous phase" (the only phase).
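One way the manual wrinkle might be handled, as a sketch: choose the pre-apt commands by provider type. The boot-finished path is an assumption about cloud-init, and `configureSources` is a hypothetical placeholder, since the actual commands for adding the sources aren't spelled out here.

```go
package main

import "fmt"

// bootFinished is the marker cloud-init is assumed to write when done.
const bootFinished = "/var/lib/cloud/instance/boot-finished"

// preAptCommands (hypothetical) returns what the synchronous phase runs
// before apt-get. On cloud providers cloud-init owns the apt sources, so we
// only wait for it. On the manual provider the synchronous phase is the only
// phase, so it adds the sources itself via configureSources.
func preAptCommands(manual bool, configureSources string) []string {
	if manual {
		return []string{configureSources}
	}
	return []string{
		fmt.Sprintf("until [ -f %s ]; do sleep 1; done", bootFinished),
	}
}

func main() {
	fmt.Println(preAptCommands(false, ""))
	fmt.Println(preAptCommands(true, "configure-apt-sources")) // hypothetical command name
}
```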

Andrew Wilkins (axwalk) wrote :

Actually... that looks like it's existing behaviour. I'll need to revert tools to confirm. If that's the case, I'll open a new bug for that.

Andrew Wilkins (axwalk) wrote :

Okay, no change needed; that's existing behaviour. Kinda crappy though: mongodb-server takes ages to install, and could be sped up by having a local mirror.

Andrew Wilkins (axwalk)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui) wrote :

I downgraded this bug to High since the issue was only in trunk and only affected testing. This issue can probably be considered Fix Released.

Changed in juju-core:
importance: Critical → High
Curtis Hovey (sinzui)
tags: added: ci
Changed in juju-core:
status: Fix Committed → Fix Released