juju gives up on bootstrapping with 'bootstrap instance started but did not change to Deployed state'

Bug #1415961 reported by Jason Hobbs
This bug affects 4 people

Affects       Status        Importance  Assigned to        Milestone
juju-core     Fix Released  High        James Tunnicliffe
  1.22        Fix Released  High        Ian Booth

Bug Description

This is with 1.22-beta1-0ubuntu1~12.04.1~juju1 and looks like a regression.

When I try to bootstrap my MAAS environment, it gives up after a few minutes, even though my timeout is set to 1800 seconds.

jenkins@juju-oil-machine-15:~$ time juju bootstrap -e ci-oil-slave8
Bootstrapping environment "ci-oil-slave8"
Starting new instance for initial state server
Launching instance
WARNING no architecture was specified, acquiring an arbitrary node
WARNING no architecture was specified, acquiring an arbitrary node
 - /MAAS/api/1.0/nodes/node-6491e170-ae10-11e3-9074-00163efc5068/
ERROR failed to bootstrap environment: bootstrap instance started but did not change to Deployed state: instance "/MAAS/api/1.0/nodes/node-6491e170-ae10-11e3-9074-00163efc5068/" is started but not deployed

real 4m22.398s
user 0m1.196s
sys 0m0.088s

The node continues installing via MAAS; it takes about 7 minutes from when the bootstrap started to the installation completing.

I have also seen bootstraps finish successfully; I'm not sure why it sometimes works and sometimes fails.

description: updated
Jason Hobbs (jason-hobbs) wrote :

Also, I can't destroy the environment after this happens:

jenkins@juju-oil-machine-15:~$ juju destroy-environment --force maas
WARNING! this command will destroy the "maas" environment (type: maas)
This includes all machines, services, data and other resources.

Continue [y/N]? y
ERROR cannot release nodes: gomaasapi: got error back from server: 500 INTERNAL SERVER ERROR ('NoneType' object has no attribute 'has_perm')

Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.23
tags: added: bootstrap maas-provider
Jason Hobbs (jason-hobbs) wrote :

Another failure, again after exactly 4m22s. Weird.

jenkins@juju-oil-machine-15:~$ time juju bootstrap -e ci-oil-slave8
Bootstrapping environment "ci-oil-slave8"
Starting new instance for initial state server
Launching instance
WARNING no architecture was specified, acquiring an arbitrary node
WARNING no architecture was specified, acquiring an arbitrary node
 - /MAAS/api/1.0/nodes/node-94afcdc8-aea0-11e3-9074-00163efc5068/
ERROR failed to bootstrap environment: bootstrap instance started but did not change to Deployed state: instance "/MAAS/api/1.0/nodes/node-94afcdc8-aea0-11e3-9074-00163efc5068/" is started but not deployed

real 4m22.135s
user 0m1.576s
sys 0m0.112s

Diogo Matsubara (matsubara) wrote :

FWIW, I got the same error with juju 1.22-beta1-trusty-amd64 but couldn't reproduce it with 1.20.13-trusty-amd64.

Horacio Durán (hduran-8) wrote :

This error appears to have been introduced on 2015-01-14 by https://github.com/juju/juju/commit/899f14b4, which adds a call to env.waitForNodeDeployment(result.Instance.Id()). If I read it correctly, this waits for at most 50 checks spaced up to 5 seconds apart, which amounts to a little less than 5 minutes (50 × 5 s = 250 s).
So if the actual deployment takes about 7 minutes, the bootstrap is not going to succeed.
It seems that against a MAAS API older than 1.7 the bootstrap will succeed, according to the commit description:
"""
When starting bootstrap node, wait for node to be reported as Deployed.
The API used is new in MAAS 1.7. When used with older versions of MAAS, a 400 is returned. In these cases, the error is ignored and the node is assumed to be deployed.
"""

I will see how it can be fixed.

Ian Booth (wallyworld) wrote :

The timeout value used when waiting for the node to transition to deployed is now read from the "bootstrap-timeout" value in the environment configuration. The default for MAAS is 1800s.

Dimiter Naydenov (dimitern) wrote :

We need to port Ian's fix from https://github.com/juju/juju/pull/1503 to 1.23 (trunk) as well. I'm having the same issue when trying to deploy any MAAS node without using the fast installer. I'll ask James to put it on his list.

Changed in juju-core:
assignee: nobody → James Tunnicliffe (dooferlad)
Dimiter Naydenov (dimitern) wrote :

This bug is most likely the reason why http://juju-ci.vapour.ws:8080/job/maas-1_8-upgrade-trusty-amd64/65/console and similar CI jobs occasionally fail with 1.21.1 (the fix has not been backported, not even to 1.21.3).

Changed in juju-core:
status: Triaged → In Progress
Dimiter Naydenov (dimitern) wrote :

Fix for trunk set to land: https://github.com/juju/juju/pull/1700

Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.23 → 1.23-beta1