Cannot bring up hosted model machines in azure

Bug #1612836 reported by Curtis Hovey
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: Critical
Assigned to: Andrew Wilkins

Bug Description

As seen in this example
    http://reports.vapour.ws/releases/4249/job/azure-arm-deploy/attempt/810#highlight

CI can bootstrap in Azure, but none of the machines in the hosted model come up. The rate-limit issue is gone, and we have cleaned up all the resources in our subscription. Every Azure test can bootstrap and tear down, but none can deploy services.

Changed in juju-core:
assignee: nobody → Andrew Wilkins (axwalk)
Andrew Wilkins (axwalk)
Changed in juju-core:
status: Triaged → In Progress
Andrew Wilkins (axwalk) wrote :

I don't know what the deal is yet, but in the logs for the workload machines I'm seeing auth failures. Looking at the logs on the controller, the workload machine appears to be trying to authenticate with another machine's nonce.

Andrew Wilkins (axwalk) wrote :

Nope, that's not it. The nonces are formatted in a misleading way, but the correct one is being used.

Andrew Wilkins (axwalk) wrote :

OK, I think I understand what's going on.

Due to a recent change in the Azure SDK, creating a VM now waits for the VM's "provisioningState" to reach "Succeeded". This exposes a latent race condition in the provisioner. The provisioner does not record the machine's nonce until it thinks the machine has been provisioned. So if the machine attempts an API login before the provisioner considers it ready, the machine gets an auth error back and uninstalls itself.

It may be that we can hack the Azure code to stop waiting for the provisioningState in this case, but I think we would be better off fixing the race condition.

Andrew Wilkins (axwalk) wrote :

Fixing the race is probably going to be a fairly large amount of work, so I'm working on a band-aid solution while I come up with a proposal.

Andrew Wilkins (axwalk)
Changed in juju-core:
status: In Progress → Fix Committed
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta16 → none
milestone: none → 2.0-beta16
Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released