Juju doesn't retry hard enough when destroying MAAS environments

Bug #1384001 reported by Julian Edwards
Affects              Status        Importance  Assigned to     Milestone
Go MAAS API Library  Fix Released  Critical    Raphaël Badin
MAAS                 Fix Released  Critical    Raphaël Badin
juju-core            Fix Released  Critical    Horacio Durán

Bug Description

MAAS can return transient errors when destroying an environment, because it disallows concurrent power operations on a node. Juju fails with the error below if you bootstrap and then press Ctrl-C a short time later.

Juju should retry a few times; the operation will succeed eventually.

ubuntu@maas:~$ juju bootstrap
Launching instance
WARNING picked arbitrary tools &{1.20.9-precise-amd64 https://streams.canonical.com/juju/tools/releases/juju-1.20.9-precise-amd64.tgz fdb390f8dfec42cab6cdfbac6f2755b460dbf078c21df62f927facafc1f9c889 8111092}
 - /MAAS/api/1.0/nodes/node-4a7deb78-b8a9-11e3-a845-e4115b13819f/
Waiting for address
Attempting to connect to node7.maas:22
Attempting to connect to node7.maas:22
Attempting to connect to 10.0.0.200:22
^CInterrupt signalled: waiting for bootstrap to exit
ERROR bootstrap failed: interrupted
Stopping instance...
ERROR cannot stop failed bootstrap instance "/MAAS/api/1.0/nodes/node-4a7deb78-b8a9-11e3-a845-e4115b13819f/": gomaasapi: got error back from server: 500 INTERNAL SERVER ERROR (Unable to change power state to 'off' for node node7.master: another action is already in progress for that node.)
Bootstrap failed, destroying environment
ERROR Bootstrap failed, and the environment could not be destroyed: gomaasapi: got error back from server: 500 INTERNAL SERVER ERROR (Unable to change power state to 'off' for node node7.master: another action is already in progress for that node.)
ERROR interrupted
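
The retry behaviour requested above could look roughly like the Go sketch below. It is not the fix that landed in Juju or gomaasapi; `errBusy`, `retry`, `flakyStop` and `noSleep` are hypothetical names, and the attempt count and delay are arbitrary. The sketch retries only while the operation reports the transient "another action is already in progress" condition and gives up immediately on any other error.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errBusy stands in for the transient MAAS failure seen in the log above.
// The name is hypothetical, not part of gomaasapi.
var errBusy = errors.New("another action is already in progress")

// retry calls op up to attempts times. It returns as soon as op succeeds or
// fails with a non-transient error, and sleeps for delay between transient
// failures. sleep is injected so callers can avoid real waiting.
func retry(attempts int, delay time.Duration, sleep func(time.Duration), op func() error) error {
	if attempts < 1 {
		attempts = 1 // always try at least once
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = op()
		if err == nil || !errors.Is(err, errBusy) {
			return err
		}
		if i < attempts-1 {
			sleep(delay)
		}
	}
	return err
}

// flakyStop returns a fake "stop instance" operation that fails with errBusy
// for the first `failures` calls and then succeeds, plus a call counter.
func flakyStop(failures int) (func() error, *int) {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errBusy
		}
		return nil
	}, &calls
}

// noSleep is a stand-in for time.Sleep that returns immediately.
func noSleep(time.Duration) {}

func main() {
	stop, calls := flakyStop(2)
	err := retry(5, 10*time.Millisecond, time.Sleep, stop)
	fmt.Println(err, *calls) // prints "<nil> 3"
}
```

A fixed delay keeps the sketch short; a real client would probably prefer a delay suggested by the server where one is available.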

Ian Booth (wallyworld)
Changed in juju-core:
status: New → Triaged
importance: Undecided → Medium
Julian Edwards (julian-edwards) wrote :

Adding a MAAS task because MAAS should not be returning a 500, but rather a retryable error code, e.g. 503 Service Unavailable.

Changed in maas:
status: New → Triaged
importance: Undecided → High
milestone: none → next
Julian Edwards (julian-edwards) wrote :

MAAS should set a Retry-After header if it returns a 503.
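
For illustration, a client could read that header along the lines of the Go sketch below. This is an assumption about how a caller might honour it, not the code that landed in gomaasapi; `retryDelay` and `headerWith` are hypothetical names. Only the delay-in-seconds form of Retry-After is handled here (the header may also carry an HTTP date), and a caller-supplied fallback is used when the value is missing or unparsable.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryDelay reports whether a response with the given status should be
// retried and, if so, how long to wait first. Only 503 Service Unavailable
// is treated as retryable in this sketch.
func retryDelay(status int, header http.Header, fallback time.Duration) (time.Duration, bool) {
	if status != http.StatusServiceUnavailable {
		return 0, false
	}
	secs, err := strconv.Atoi(header.Get("Retry-After"))
	if err != nil || secs < 0 {
		// Header missing, negative, or in HTTP-date form: use the fallback.
		return fallback, true
	}
	return time.Duration(secs) * time.Second, true
}

// headerWith builds a header set carrying the given Retry-After value.
func headerWith(retryAfter string) http.Header {
	h := http.Header{}
	h.Set("Retry-After", retryAfter)
	return h
}

func main() {
	d, ok := retryDelay(503, headerWith("5"), time.Second)
	fmt.Println(d, ok) // prints "5s true"

	d, ok = retryDelay(500, headerWith("5"), time.Second)
	fmt.Println(d, ok) // prints "0s false"
}
```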

summary: - Doesn't retry hard enough when destroying environments
+ Juju doesn't retry hard enough when destroying MAAS environments
Curtis Hovey (sinzui)
tags: added: maas-provider
tags: added: destroy-environment
Christian Reis (kiko) wrote :

See also bug 1381619.

Changed in maas:
status: Triaged → In Progress
assignee: nobody → Julian Edwards (julian-edwards)
Andreas Hasenack (ahasenack) wrote :

Something similar happened with the Landscape cloud installer. I started a deployment and decided to abort it as soon as the bootstrap node was up. Landscape then issued a juju destroy-environment call, which failed with this error:

ERROR destroying environment: gomaasapi: got error back from server: 500 INTERNAL SERVER ERROR (Unable to change power state to 'off' for node shawmut: another action is already in progress for that node.)

Landscape then tried again with --force, which got the same error, and gave up.

The end result is that the nodes stayed allocated in MAAS, effectively "leaked" because of this error. They are all even shown as "deployed" now.

I will attach some logs and a screenshot.

Changed in maas:
status: In Progress → Fix Committed
Andreas Hasenack (ahasenack) wrote :

I don't understand: what's preventing MAAS from issuing a power off right after a power on? I can do that in a server's console just fine.

tags: added: cloud-installer
Curtis Hovey (sinzui)
Changed in juju-core:
importance: Medium → High
milestone: none → next-stable
Christian Reis (kiko)
Changed in maas:
status: Fix Committed → In Progress
assignee: Julian Edwards (julian-edwards) → Raphaël Badin (rvb)
Changed in juju-core:
assignee: nobody → Raphaël Badin (rvb)
status: Triaged → In Progress
Changed in maas:
importance: High → Critical
milestone: next → 1.7.0
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: next-stable → 1.21-alpha3
importance: High → Critical
Christian Reis (kiko)
Changed in juju-core:
status: In Progress → New
Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
Julian Edwards (julian-edwards) wrote : Re: [Bug 1384001] Re: Juju doesn't retry hard enough when destroying MAAS environments

On Monday 27 Oct 2014 15:39:54 you wrote:
> ** Changed in: maas
> Status: Fix Committed => In Progress
>
> ** Changed in: maas
> Assignee: Julian Edwards (julian-edwards) => Raphaël Badin (rvb)

Huh?

Curtis Hovey (sinzui)
no longer affects: juju-core/1.20
Raphaël Badin (rvb)
Changed in maas:
status: In Progress → Fix Committed
Raphaël Badin (rvb) wrote :

The change to gomaasapi has just landed (revision 58). Now we need Juju to update its dependencies.tsv file to refer to the new revision.

Changed in gomaasapi:
assignee: nobody → Raphaël Badin (rvb)
importance: Undecided → Critical
status: New → Fix Committed
Changed in juju-core:
assignee: Raphaël Badin (rvb) → nobody
Changed in juju-core:
status: Triaged → Opinion
status: Opinion → Triaged
Martin Packman (gz)
Changed in juju-core:
assignee: nobody → Horacio Durán (hduran-8)
status: Triaged → In Progress
Changed in juju-core:
status: In Progress → Fix Committed
Ian Booth (wallyworld)
Changed in juju-core:
status: Fix Committed → In Progress
assignee: Horacio Durán (hduran-8) → Ian Booth (wallyworld)
status: In Progress → Fix Committed
assignee: Ian Booth (wallyworld) → Horacio Durán (hduran-8)
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
Changed in maas:
status: Fix Committed → Fix Released
Changed in gomaasapi:
status: Fix Committed → Fix Released
Adam Conrad (adconrad) wrote : Please test proposed package

Hello Julian, or anyone else affected,

Accepted maas into utopic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/maas/1.7.5+bzr3369-0ubuntu1~14.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
Andres Rodriguez (andreserl) wrote :

This issue has been verified to work both on upgrade and fresh install, and has been QA'd. Marking verification-done.

tags: added: verification-done
removed: verification-needed