maas machines create fails when node can't be reached via ipmi

Bug #1702751 reported by Jason Hobbs
This bug affects 1 person
Affects: MAAS
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: none

Bug Description

When I create a machine using the CLI, I get an "Authorization Error: 'Nonce already used: xxxx'" failure if the machine can't be reached via IPMI.

The machine goes into commissioning state and tries for about 30 seconds to contact the node. The call blocks waiting for that and then returns an error:

http://paste.ubuntu.com/25034200/

I really don't want to wait for commissioning to start - I just want to create the machine and get the machine info back in json format. The creation still succeeded.

There are no tracebacks in any of the logs. The only thing I see is:
2017-07-06 19:54:59 regiond: [info] 10.245.12.148 POST /MAAS/api/2.0/machines/ HTTP/1.1 --> 401 UNAUTHORIZED (referrer: -; agent: Python-httplib2/0.9.1 (gzip))

And in maas.log:
Jul 6 19:58:21 drexel maas.api: [info] geodude: Enlisted new machine
Jul 6 19:58:21 drexel maas.node: [info] geodude: Status transition from NEW to COMMISSIONING
Jul 6 19:58:52 drexel maas.node: [info] geodude: Status transition from COMMISSIONING to NEW
Jul 6 19:58:52 drexel maas.node: [error] geodude: Could not start node for commissioning: No rack controllers can access the BMC of node: geodude
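
Since the CLI only reports the nonce error, the real cause has to be dug out of maas.log. A minimal sketch of how that check might be scripted, assuming the log lines follow the layout quoted above; the names `LINE_RE`, `BMC_MARKER`, and `find_bmc_failures` are hypothetical helpers for illustration, not part of MAAS:

```python
import re

# Layout assumed from the maas.log lines quoted in this report:
# "<Mon> <day> <HH:MM:SS> <host> <logger>: [<level>] <node>: <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<logger>[\w.]+):\s+\[(?P<level>\w+)\]\s+"
    r"(?P<node>[^:]+):\s+(?P<message>.*)$"
)

# Text taken verbatim from the error line in this report.
BMC_MARKER = "No rack controllers can access the BMC"


def find_bmc_failures(lines):
    """Return (timestamp, node) pairs for error lines reporting an unreachable BMC."""
    failures = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match["level"] == "error" and BMC_MARKER in match["message"]:
            failures.append((match["stamp"], match["node"]))
    return failures


# The four maas.log lines from this report, used as sample input.
SAMPLE = [
    "Jul 6 19:58:21 drexel maas.api: [info] geodude: Enlisted new machine",
    "Jul 6 19:58:21 drexel maas.node: [info] geodude: Status transition from NEW to COMMISSIONING",
    "Jul 6 19:58:52 drexel maas.node: [info] geodude: Status transition from COMMISSIONING to NEW",
    "Jul 6 19:58:52 drexel maas.node: [error] geodude: Could not start node for commissioning: "
    "No rack controllers can access the BMC of node: geodude",
]

if __name__ == "__main__":
    for stamp, node in find_bmc_failures(SAMPLE):
        print(f"{stamp}: BMC unreachable for {node}")
```

This only scans the log after the fact; it does not change what the CLI returns.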

This is with maas 2.2.0


Jason Hobbs (jason-hobbs) wrote :

This is like bug 1600328, except I get a Nonce error.

tags: added: cdo-qa
description: updated
Changed in maas:
importance: Undecided → Medium
milestone: none → 2.3.0
status: New → Triaged
summary: - maas machines create fails when node can't be reached via ipmi
+ [2.x] maas machines create fails when node can't be reached via ipmi
description: updated
Newell Jensen (newell-jensen) wrote : Re: [2.x] maas machines create fails when node can't be reached via ipmi

I am not able to reproduce this. I did see the same error when running the same command as the one in the pastebin. After that, though, I no longer see the error, and things seem to be working correctly (when creating machines, etc.).

tags: added: foundations-engine
tags: added: internal
tags: removed: foundations-engine
Changed in maas:
milestone: 2.3.0 → 2.3.x
Jason Hobbs (jason-hobbs) wrote :

We hit this onsite today. It's fine that machine creation fails but the error message "nonce already used" is unacceptable. It took us a while to figure out what was really happening. We need an error like "BMC can't be reached".

tags: added: cpe-onsite foundations-engine
Andres Rodriguez (andreserl) wrote :

I'm not quite sure we should fail adding a machine if we cannot power manage it. That said, when this was first reported, Newell was unable to reproduce it, so I wonder if this has anything to do with the rack not being fully connected, or a similar case?

Changed in maas:
milestone: 2.3.x → 2.4.x
Newell Jensen (newell-jensen) wrote :

Jason,

Did you also get an error like:

Jul 6 19:58:52 drexel maas.node: [error] geodude: Could not start node for commissioning: No rack controllers can access the BMC of node: geodude

as you did in the bug report?

I am curious whether, when this nonce error occurs, maas.log shows that no rack controllers can access the BMC of the node, as that would obviously be a better error to surface.

Changed in maas:
status: Triaged → In Progress
assignee: nobody → Newell Jensen (newell-jensen)
Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1702751] Re: [2.x] maas machines create fails when node can't be reached via ipmi

Newell, I'm not sure. I didn't get to look into the logs, I was watching
someone else type and couldn't access the system. Once we found the
problem we just continued with other work.


Changed in maas:
status: In Progress → Incomplete
Björn Tillenius (bjornt) wrote :

This bug was originally for MAAS 2.2 and a lot has changed since then. Have you seen this bug with 2.7 (or maybe 2.6)? If so, please attach the logs, so that we can take a closer look at what's going on.

no longer affects: maas/2.3
Changed in maas:
status: Incomplete → New
importance: Medium → Undecided
assignee: Newell Jensen (newell-jensen) → nobody
milestone: 2.4.x → none
summary: - [2.x] maas machines create fails when node can't be reached via ipmi
+ maas machines create fails when node can't be reached via ipmi
Changed in maas:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired