Comment 8 for bug 1768870

Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1768870] Re: node failed commissioning - HTTP Error 400: {'boot_interface': ["Must be one of the node's interfaces."]}

Yes, we add a node and then delete it right afterwards. We've been
doing that for a long time. This failure just started showing up with
2.3.3.

As a reminder, here's how we add nodes in FCE:

We don't have MAC addresses for the nodes, and we don't have direct IPMI
access to the nodes either. To make this work, we have MAAS do all the
heavy lifting.

The steps are as follows:

1) We check whether the nodes have already been enlisted.

2) We add the node to MAAS with correct IPMI credentials and a fake MAC
address (MAAS requires a MAC address).

3) MAAS, prior to returning from the API call to add the machine, issues
the IPMI commands required to PXE boot the machine. It handles this
regardless of the machine's current state.

4) Immediately upon return from the add machine API call, we issue
another API call to delete the machine from MAAS. MAAS does not issue
any power commands in response to this, so the machine continues to
PXE boot, and will show up in MAAS as a 'New' node once enlistment
completes.

5) We poll MAAS for nodes in 'New' state, looking for a machine to match
our IPMI power address. When we find it, we set the proper hostname and
zone on it, and start commissioning.

6) We poll to ensure commissioning completes successfully.
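
For illustration, the polling in step 5 could look roughly like the
sketch below. This is not the actual FCE code. The `list_machines`
callable and the `status_name` / `power_address` dictionary keys are
assumptions made for the example; in practice the power address would
have to be fetched from MAAS and flattened into each record by the
caller.

```python
import time
from typing import Callable, Iterable, Optional


def find_new_node(machines: Iterable[dict], power_address: str) -> Optional[dict]:
    """Return the first machine in 'New' state with a matching power address.

    Each machine is a plain dict; the key names are hypothetical.
    """
    for machine in machines:
        if (machine.get("status_name") == "New"
                and machine.get("power_address") == power_address):
            return machine
    return None


def wait_for_enlistment(
    list_machines: Callable[[], Iterable[dict]],
    power_address: str,
    timeout: float = 600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until a 'New' node with the given power address appears.

    list_machines is a caller-supplied function (hypothetical) that
    returns the current machine records. Raises TimeoutError if no
    matching node shows up before the timeout expires.
    """
    deadline = clock() + timeout
    while True:
        node = find_new_node(list_machines(), power_address)
        if node is not None:
            return node
        if clock() >= deadline:
            raise TimeoutError(
                "no 'New' node with power address %s" % power_address
            )
        sleep(interval)
```

Once a node is returned, the caller would set the hostname and zone on
it and start commissioning, as described in steps 5 and 6. Passing
`sleep` and `clock` in as parameters only serves to make the loop easy
to test without real delays.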

If bug 1707216 were fixed, we could just add the node and MAAS
would handle the rest.

On Thu, May 3, 2018 at 11:42 AM, Andres Rodriguez
<email address hidden> wrote:
> On the other hand, around the same time I also see this:
>
>
> May 3 06:28:38 leafeon maas.interface: [info] eno1 (physical) on leafeon: New MAC, IP binding observed: 14:02:ec:41:d7:38, 10.244.40.170
> May 3 06:28:40 leafeon maas.node: [error] juju-1: Marking node failed: Commissioning failed, cloud-init reported a failure (refer to the event log for more information)
> May 3 06:28:40 leafeon maas.node: [info] juju-1: Status transition from COMMISSIONING to FAILED_COMMISSIONING
> May 3 06:28:41 leafeon maas.interface: [info] eno1 (physical) on leafeon: New MAC, IP binding observed: 14:02:ec:42:28:70, 10.244.40.171
> May 3 06:28:41 leafeon maas.interface: [info] eno1 (physical) on leafeon: New MAC, IP binding observed: 14:02:ec:41:d7:44, 10.244.40.172
> May 3 06:28:42 leafeon maas.node: [info] landscape-3: Status transition from TESTING to READY
> May 3 06:28:52 leafeon maas.node: [info] kibana-3: Storage layout was set to flat.
> May 3 06:28:52 leafeon maas.node: [info] kibana-3: Status transition from COMMISSIONING to TESTING
> May 3 06:28:55 leafeon maas.node: [error] grafana-1: Marking node failed: Commissioning failed, cloud-init reported a failure (refer to the event log for more information)
> May 3 06:28:55 leafeon maas.node: [info] grafana-1: Status transition from COMMISSIONING to FAILED_COMMISSIONING
>
>
> With this specific message:
>
> May 3 06:28:40 leafeon maas.node: [error] juju-1: Marking node failed:
> Commissioning failed, cloud-init reported a failure (refer to the event
> log for more information)
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1768870
>
> Title:
> node failed commissioning - HTTP Error 400: {'boot_interface': ["Must
> be one of the node's interfaces."]}
>
> Status in MAAS:
> New
>
> Bug description:
> Several pod VMs failed to commission on a deploy of FCB.
>
> In rsyslog output, I see errors like this:
>
> May 3 06:29:00 nagios-1 cloud-init[1057]: request to
> http://10.244.40.33/MAAS/metadata//2012-03-01/ failed. sleeping 1.:
> HTTP Error 400: BAD REQUEST
>
> In regiond.log, there is this traceback that appears to be associated
> with the error:
>
> http://paste.ubuntu.com/p/3CBVCfRVzG/
>
> Some other VMs in this deploy successfully commissioned.
>
> This is with 2.3.3-6492-ge999a54-0ubuntu1~16.04.1.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maas/+bug/1768870/+subscriptions