juju bootstrap hangs in slow environments

Bug #1316185 reported by Axel-f
This bug affects 4 people
Affects    Status        Importance  Assigned to     Milestone
juju-core  Fix Released  High        Andrew Wilkins
1.20       Fix Released  High        Andrew Wilkins

Bug Description

When bootstrapping juju on the Azure cloud using:
 juju bootstrap -e azure
the process hangs indefinitely after apt-get upgrade. Usually the ssh connection times out after a few hours and the bootstrap is reverted. As a workaround, I can put the process into the background and use the environment anyway: juju status displays correctly and everything seems fine (the bootstrap logs show nothing unusual and no errors). I asked "jose" on IRC and he can reproduce the bug. The output:

$ juju bootstrap -e azure
Launching instance
 - juju-azure-test-dm03gvgu6s
Waiting for address
Attempting to connect to 10.0.0.4:22
Attempting to connect to juju-azure-test-dm03gvgu6s.cloudapp.net:22
Logging to /var/log/cloud-init-output.log on remote host
Running apt-get update
Running apt-get upgrade
^Z

Versions:
juju 1.18.2-trusty-amd64
Ubuntu 14.04 3.13.0-24-generic
Description: Ubuntu 14.04 LTS
Release: 14.04
Codename: trusty

affects: juju → juju-core
Curtis Hovey (sinzui) wrote :

I cannot reproduce this. Can you try again with
    juju --debug bootstrap -e azure
to gather more information.

Which region are you using?

Changed in juju-core:
status: New → Triaged
status: Triaged → Incomplete
Axel-f (axel-f) wrote :

Here you can find the output (I removed certain parts like certificates and passwords):

juju --debug bootstrap -e azure-axel 2>&1 | tee bootstrap-azure-debug: http://sprunge.us/bTSX
/var/log/cloud-init-output.log: http://sprunge.us/QgPD
/var/log/cloud-init.log: http://sprunge.us/RQVZ

Configuration env:
  azure-axel:
        type: azure
        location: North Europe
        management-subscription-id: 3b60470f-<censored>
        management-certificate-path: /home/codecwatch/aangel_azure.pem
        storage-account-name: juju0axel

Curtis Hovey (sinzui) wrote :

Can you try bootstrapping again in a different location (with a new storage account in the same location)? Something like this:
    location: West US
    storage-account-name: juju0axel0us0west

Juju cannot bootstrap when the storage-account is in a different region than the location. I think West US is the only region that reliably works.

Curtis Hovey (sinzui) wrote :

Attached is the bootstrap debug log.

Curtis Hovey (sinzui) wrote :

Attached is the cloud-init-output log.

Andrew Wilkins (axwalk) wrote :

bac and I have both reproduced this on separate accounts.

My environments.yaml had "image-stream: daily" before; I took it out and this happened. I don't know yet whether that's a coincidence, so I'm going to bootstrap again with it back in to confirm.

Changed in juju-core:
status: Incomplete → Triaged
importance: Undecided → Critical
milestone: none → 1.20.0
Andrew Wilkins (axwalk) wrote :

Putting "image-stream: daily" back in fixed it for me.

Andrew Wilkins (axwalk) wrote :

So I guess this is not critical since there's a workaround and there's no regression. I'll lower to High, but I think we should get onto this sooner rather than later.

Changed in juju-core:
importance: Critical → High
Andrew Wilkins (axwalk)
Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.20.0 → 1.21-alpha1
Andrew Wilkins (axwalk) wrote :

I've changed the title to reflect the actual problem, which has nothing specifically to do with Azure itself. The problem is that the "apt-get upgrade" command is taking a long time (6m+), and the ssh server times out the session due to inactivity. The client doesn't notice that the session has been lost.

I've got a fix in the works that sets the ServerAliveInterval option. I'll also investigate ways to speed up "apt-get upgrade", and possibly make the upgrade optional.
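
For illustration, client-side keepalives of this kind could look like the following on the command line. This is only a sketch, not the actual juju-core change: the helper name ssh_keepalive_opts, the 30-second interval, and the count of 3 are assumptions chosen for the example. ServerAliveInterval and ServerAliveCountMax are standard OpenSSH client options.

```shell
#!/bin/sh
# Hypothetical helper (not part of juju): print ssh options that enable
# client-side keepalives, so that a long, silent remote command such as
# "apt-get upgrade" does not leave the session idle, and so that the
# client notices when the connection has been lost.
#   $1: seconds between keepalive probes (assumed default: 30)
#   $2: unanswered probes tolerated before ssh disconnects (assumed default: 3)
ssh_keepalive_opts() {
    interval="${1:-30}"
    count_max="${2:-3}"
    printf '%s' "-o ServerAliveInterval=${interval} -o ServerAliveCountMax=${count_max}"
}

# Example invocation (left commented out; the "ubuntu" login is an assumption,
# the hostname is the one from the bootstrap output in the report):
# ssh $(ssh_keepalive_opts 30 3) ubuntu@juju-azure-test-dm03gvgu6s.cloudapp.net 'sudo apt-get -y upgrade'
```

With these values the client sends a probe after 30 seconds of silence and gives up after three unanswered probes, so a dead session is detected after roughly 90 seconds instead of hanging for hours.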

summary: - juju bootstrap hangs on Azure
+ juju bootstrap hangs in slow environments
Andrew Wilkins (axwalk) wrote :

I've also logged lp:1335822, which describes an option for improving Azure machine provisioning times.

Curtis Hovey (sinzui)
summary: - juju bootstrap hangs in slow environments
+ juju bootstrap hangs with daily image-stream
Andrew Wilkins (axwalk)
summary: - juju bootstrap hangs with daily image-stream
+ juju bootstrap hangs with "released" image-stream in slow environments
Curtis Hovey (sinzui)
summary: - juju bootstrap hangs with "released" image-stream in slow environments
+ juju bootstrap hangs in slow environment
summary: - juju bootstrap hangs in slow environment
+ juju bootstrap hangs in slow environments
Ian Booth (wallyworld)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released