[2.x] maas machines create fails when node can't be reached via ipmi

Bug #1702751 reported by Jason Hobbs on 2017-07-06
This bug affects 1 person
Affects            Importance  Assigned to
MAAS               Medium      Newell Jensen
MAAS 2.3 series    Medium      Newell Jensen

Bug Description

When I create a machine using the CLI, I get an "Authorization Error: 'Nonce already used: xxxx'" failure if the machine can't be reached via IPMI.

The machine goes into commissioning state and tries for about 30 seconds to contact the node. The call blocks waiting for that and then returns an error:

http://paste.ubuntu.com/25034200/

I really don't want to wait for commissioning to start; I just want to create the machine and get the machine info back in JSON format. The creation still succeeded despite the error.

There are no tracebacks in any of the logs. The only thing I see is:
2017-07-06 19:54:59 regiond: [info] 10.245.12.148 POST /MAAS/api/2.0/machines/ HTTP/1.1 --> 401 UNAUTHORIZED (referrer: -; agent: Python-httplib2/0.9.1 (gzip))

And in maas.log:
Jul 6 19:58:21 drexel maas.api: [info] geodude: Enlisted new machine
Jul 6 19:58:21 drexel maas.node: [info] geodude: Status transition from NEW to COMMISSIONING
Jul 6 19:58:52 drexel maas.node: [info] geodude: Status transition from COMMISSIONING to NEW
Jul 6 19:58:52 drexel maas.node: [error] geodude: Could not start node for commissioning: No rack controllers can access the BMC of node: geodude

This is with maas 2.2.0
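The 401 in the regiond log, combined with the fact that the machine was still created, is consistent with a replayed request. The sketch below is a hypothetical model, not MAAS or python-oauth code. It assumes the CLI's HTTP client (Python-httplib2, per the agent string) resent the original signed POST unchanged after the ~30 second wait, which this report does not confirm. The names `NonceStore`, `sign_request` and `send` are illustrative only. The replay rule itself follows OAuth 1.0 (RFC 5849, section 3.3), under which a server may reject a nonce/timestamp/token combination it has already seen.

```python
import time
import uuid


class NonceStore:
    """Hypothetical server-side replay check in the spirit of OAuth 1.0
    (RFC 5849, section 3.3): each (consumer, token, timestamp, nonce)
    combination is accepted at most once."""

    def __init__(self):
        self._seen = set()

    def check(self, consumer_key, token, timestamp, nonce):
        key = (consumer_key, token, timestamp, nonce)
        if key in self._seen:
            # Rejected during authentication, before any handler runs, so
            # the caller sees only the nonce error, not the BMC problem.
            return "401 Authorization Error: 'Nonce already used: %s'" % nonce
        self._seen.add(key)
        return "200 OK"


def sign_request():
    """Fresh OAuth parameters; a correct client generates these per request."""
    return {"oauth_timestamp": str(int(time.time())),
            "oauth_nonce": uuid.uuid4().hex}


def send(store, headers):
    # Placeholder consumer key and token; only the replay check is modelled.
    return store.check("consumer", "token",
                       headers["oauth_timestamp"], headers["oauth_nonce"])


store = NonceStore()

# First attempt is accepted, so the machine is created even if the client
# gives up waiting while the region tries to reach the BMC.
headers = sign_request()
first = send(store, headers)

# Assumed retry path: the same signed headers are resent unchanged.
replayed = send(store, headers)

# A re-signed request carries a new nonce, so it is not treated as a replay.
resigned = send(store, sign_request())

print(first)     # 200 OK
print(replayed)  # 401 Authorization Error: 'Nonce already used: <nonce>'
print(resigned)  # 200 OK
```

Note that re-signing on retry would only hide the misleading message: because the create POST is not idempotent, a re-signed retry would be a second create attempt rather than a harmless repeat.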


Jason Hobbs (jason-hobbs) wrote :

This is like bug 1600328, except I get a Nonce error.

tags: added: cdo-qa
description: updated
Changed in maas:
importance: Undecided → Medium
milestone: none → 2.3.0
status: New → Triaged
summary: - maas machines create fails when node can't be reached via ipmi
+ [2.x] maas machines create fails when node can't be reached via ipmi
description: updated
Newell Jensen (newell-jensen) wrote :

I am not able to reproduce this consistently. I saw the same error once, when running the same command as the one in the pastebin. After that, though, I don't see the error and things seem to work correctly (when creating machines, etc.).

tags: added: foundations-engine
tags: added: internal
tags: removed: foundations-engine
Changed in maas:
milestone: 2.3.0 → 2.3.x
Jason Hobbs (jason-hobbs) wrote :

We hit this onsite today. It's fine that machine creation fails, but the error message "nonce already used" is unacceptable: it took us a while to figure out what was really happening. We need an error like "BMC can't be reached".

tags: added: cpe-onsite foundations-engine
Andres Rodriguez (andreserl) wrote :

I'm not sure we should fail to add a machine just because we cannot power manage it. That said, when this was first reported, Newell was unable to reproduce it, so I wonder whether this has anything to do with the rack controller not being fully connected, or a similar case?

Changed in maas:
milestone: 2.3.x → 2.4.x
Newell Jensen (newell-jensen) wrote :

Jason,

Did you also get an error like:

Jul 6 19:58:52 drexel maas.node: [error] geodude: Could not start node for commissioning: No rack controllers can access the BMC of node: geodude

as you did in the bug report?

I am curious whether, when this nonce error occurs, maas.log shows that no rack controllers can access the BMC of the node, as that would obviously be a better error to surface.

Changed in maas:
status: Triaged → In Progress
assignee: nobody → Newell Jensen (newell-jensen)

Newell, I'm not sure. I didn't get to look into the logs; I was watching someone else type and couldn't access the system. Once we found the problem, we just continued with other work.

On Wed, Nov 29, 2017 at 11:56 PM, Newell Jensen <email address hidden> wrote:

> Jason,
>
> Did you also get an error like:
>
> Jul 6 19:58:52 drexel maas.node: [error] geodude: Could not start node
> for commissioning: No rack controllers can access the BMC of node:
> geodude
>
> as you did in the bug report?
>
> I am curious if, when this nonce error occurs, if the maas.log shows
> that no rack controllers can access the BMC of the node, as that would
> obviously be a better error to surface.

Changed in maas:
status: In Progress → Incomplete