[2.x, UI] Add machine doesn’t highlight the importance of adding the PXE interface

Bug #1756016 reported by james beedy on 2018-03-15
This bug affects 1 person
Affects: MAAS
Importance: Medium
Assigned to: Unassigned

Bug Description

When adding a machine via the WebUI, the form doesn't highlight the importance of adding the PXE interface as one of the interfaces.

If MAAS doesn't know the PXE interface, it won't be able to commission the machine, and the machine will fail to enlist.

[Original report]

MAAS version: 2.3.0 (6434-gd354690-snap)

Unexpected behavior occurs when the maas-url is on a different network than the IPMI.

My nodes' IPMI interfaces exist on one network; MAAS exists on another network. These two networks are routable via their respective gateways, but manually adding a node followed by node commissioning comes to a hard stop when the message
"Already registered on 10.10.0.103:5240/MAAS/api/2.0/machines; skipping enlistment" shows on the console and the node immediately powers down. The node status is left as powered off and "commissioning" in the MAAS UI.

If a node is enlisted via PXE boot, enlistment fails: the node seems to get only about 50% enlisted, with partial node details and no BMC details. If you then manually add the BMC details, you can commission the node and get stuck in the broken commissioning loop described above.

From my perspective, MAAS should either A) block the user from commissioning a machine that will not properly commission, B) support operations where the IPMI is in a different address space than maas-url, or C) at least warn the user that placing maas-url and IPMI in different networks will break things.

$ for i in /var/snap/maas/common/log/*; do echo $i; sudo cat $i | pastebinit; done
/var/snap/maas/common/log/dhcpd.log
http://paste.ubuntu.com/p/NRZg2XYRpm/
/var/snap/maas/common/log/named.log
http://paste.ubuntu.com/p/W2KGPMHX8Q/
/var/snap/maas/common/log/ntp.log
http://paste.ubuntu.com/p/yWh4pDVnP4/
/var/snap/maas/common/log/ntpstats
cat: /var/snap/maas/common/log/ntpstats: Is a directory
You are trying to send an empty document, exiting.
/var/snap/maas/common/log/proxy
cat: /var/snap/maas/common/log/proxy: Is a directory
You are trying to send an empty document, exiting.
/var/snap/maas/common/log/proxy.log
http://paste.ubuntu.com/p/FNBPB8WQrW/
/var/snap/maas/common/log/rackd.log
http://paste.ubuntu.com/p/xK8sPzrjF8/
/var/snap/maas/common/log/regiond.log
http://paste.ubuntu.com/p/t3GDStwK2G/
/var/snap/maas/common/log/supervisor.log
http://paste.ubuntu.com/p/BWD9VgRJ56/
/var/snap/maas/common/log/tgt.log
http://paste.ubuntu.com/p/kzq75hcf8H/

james beedy (jamesbeedy) on 2018-03-15
description: updated
Andres Rodriguez (andreserl) wrote :

The description of the bug here actually doesn’t match the error at all. If you are seeing

Already registered on 10.10.0.103:5240/MAAS/api/2.0/machines; skipping enlistment

When you are commissioning a machine, that means that the machine failed to PXE boot into the ephemeral environment and fell back to the default environment. This really has nothing to do with the BMC being in a completely different network.

What I would suggest is that when you start the commissioning process, you tail the MAAS logs and watch the machine's console to figure out why it is not PXE booting into the ephemeral environment for commissioning.

That said, the error basically means it tried to add a new machine and failed because the MAC already exists in MAAS.
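
When tailing the logs as suggested, one signal worth looking for is a TFTP request for pxelinux.cfg/default, which in this setup leads to the enlistment environment. The following is a minimal sketch, not part of MAAS, that scans rackd.log lines for MACs that fell through to the default config. It assumes TFTP log lines of the form "<date> <time> provisioningserver.rackdservices.tftp: [info] <path> requested by <MAC>"; the function name `macs_falling_back_to_default` is made up for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed rackd.log TFTP line format:
#   ... provisioningserver.rackdservices.tftp: [info] <path> requested by <MAC>
TFTP_LINE = re.compile(
    r"provisioningserver\.rackdservices\.tftp: \[info\] "
    r"(?P<path>\S+) requested by (?P<mac>[0-9a-fA-F:]{17})"
)


def macs_falling_back_to_default(lines: Iterable[str]) -> List[str]:
    """Return the MACs that requested pxelinux.cfg/default (hypothetical helper)."""
    requested: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = TFTP_LINE.search(line)
        if match:
            requested[match.group("mac").lower()].append(match.group("path"))
    # A request for the 'default' config suggests no MAC-specific config was served.
    return sorted(
        mac for mac, paths in requested.items() if "pxelinux.cfg/default" in paths
    )
```

Feeding it the lines of /var/snap/maas/common/log/rackd.log (for example via `open(path)`) would list the MACs to compare against the interfaces registered for the machine.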

Changed in maas:
status: New → Incomplete
Andres Rodriguez (andreserl) wrote :

Looking at rackd.log here, I see this:

2018-03-15 01:11:19 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/01-00-0a-f7-39-b4-d0 requested by 00:0a:f7:39:b4:d0
2018-03-15 01:11:19 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0A0A6403 requested by 00:0a:f7:39:b4:d0
2018-03-15 01:11:19 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0A0A640 requested by 00:0a:f7:39:b4:d0
2018-03-15 01:11:19 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0A0A64 requested by 00:0a:f7:39:b4:d0
2018-03-15 01:11:19 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0A0A6 requested by 00:0a:f7:39:b4:d0
2018-03-15 01:11:19 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0A0A requested by 00:0a:f7:39:b4:d0
2018-03-15 01:11:19 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0A0 requested by 00:0a:f7:39:b4:d0
2018-03-15 01:11:19 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0A requested by 00:0a:f7:39:b4:d0
2018-03-15 01:11:19 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0 requested by 00:0a:f7:39:b4:d0
2018-03-15 01:11:19 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/default requested by 00:0a:f7:39:b4:d0
2018-03-15 01:11:19 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/default requested by 00:0a:f7:39:b4:d0
2018-03-15 01:11:19 provisioningserver.rackdservices.tftp: [info] ubuntu/amd64/ga-16.04/xenial/daily/boot-kernel requested by 00:0a:f7:39:b4:d0
2018-03-15 01:11:21 provisioningserver.rackdservices.tftp: [info] ubuntu/amd64/ga-16.04/xenial/daily/boot-initrd requested by 00:0a:f7:39:b4:d0

This to me means that the machine attempted to PXE boot using the interface with MAC 00:0a:f7:39:b4:d0. MAAS didn't find a machine with that MAC, and the machine continued the PXE process until it requested the 'default' config.
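
The order of requests in the log follows the usual PXELINUX config search: first a file named after the ARP hardware type and MAC, then the client IPv4 address in uppercase hex with one digit dropped at a time, and finally 'default'. A rough sketch that reproduces this order is shown here. The function name is hypothetical, the UUID-based lookup that PXELINUX can also perform is omitted, and the IP 10.10.100.3 is simply the decoded form of 0A0A6403 from the log.

```python
import ipaddress
from typing import List


def pxelinux_config_candidates(mac: str, ip: str) -> List[str]:
    """List config paths in the order PXELINUX requests them (UUID step omitted)."""
    # "01" is the ARP hardware type for Ethernet; the MAC is lowercase, dash-separated.
    candidates = ["pxelinux.cfg/01-" + mac.lower().replace(":", "-")]
    # The IPv4 address as 8 uppercase hex digits, then progressively shorter prefixes.
    hex_ip = f"{int(ipaddress.IPv4Address(ip)):08X}"
    for length in range(len(hex_ip), 0, -1):
        candidates.append("pxelinux.cfg/" + hex_ip[:length])
    # Last resort: the shared 'default' config.
    candidates.append("pxelinux.cfg/default")
    return candidates


if __name__ == "__main__":
    for path in pxelinux_config_candidates("00:0a:f7:39:b4:d0", "10.10.100.3"):
        print(path)
```

A machine that reaches the last entry received none of the more specific configs, which matches the fall-through seen in the log.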

Since the MAC is not known and no config was given for it, the machine got the 'default' config, which is for enlistment. During the enlistment process, it attempted to register a machine, but it failed because there's another machine already registered in MAAS that has this same MAC address.

Did you add a machine in MAAS via the API (or WebUI) without specifying *all* its MAC addresses?

james beedy (jamesbeedy) on 2018-03-15
description: updated
james beedy (jamesbeedy) wrote :

@andreserl the user had previously entered incorrect BMC details, which caused those errors, but that isn't related to the main issue

description: updated
james beedy (jamesbeedy) wrote :

I'll have him chime in if possible

james beedy (jamesbeedy) wrote :

ohhhh, he didn't specify *all* of the mac addresses. That must be the key!

Andres Rodriguez (andreserl) wrote :

James,

Your bug title and description do not match. There's no relationship between the BMC and a machine when the machine requests the PXE process, so it doesn't really matter in what network the BMC is connected, as the BMC doesn't do the PXE process.

From the information you have provided, it is completely clear that:

1. Machine PXE booted using specific MAC, and it requested the config from MAAS.
2. The machine didn't receive a config for the MAC it used to PXE boot from MAAS.
3. The machine continued to request the 'default' config, which is for the enlistment environment.
4. The machine booted into enlistment environment, tried to register itself, but failed because one or more MAC addresses (other than the one used for PXE) are already known to MAAS.

So to me, it looks like a machine is already registered in MAAS with some of the MAC addresses that belong to the machine, but not with the MAC address the machine is using to PXE boot. Since MAAS doesn't know that MAC address, it doesn't know it needs to PXE boot the machine for commissioning.

MAAS requires that *at least* the PXE MAC address/interface is provided when manually adding a node.
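
The four steps above can be condensed into a toy decision model. This is only a sketch of the behavior described in this thread, not MAAS code: the function `boot_outcome`, the machine name `node-a`, and the second MAC ending in `d1` are all hypothetical.

```python
from typing import Dict, Set


def boot_outcome(
    registered: Dict[str, Set[str]], pxe_mac: str, machine_macs: Set[str]
) -> str:
    """Model what happens when a machine PXE boots (illustrative only).

    registered maps a machine name to the MACs entered for it in MAAS.
    """
    known = {mac.lower() for macs in registered.values() for mac in macs}
    if pxe_mac.lower() in known:
        # MAAS recognizes the PXE MAC and can serve the commissioning config.
        return "commissioning"
    # Unknown PXE MAC: the machine falls through to the 'default' (enlistment) config.
    if known & {mac.lower() for mac in machine_macs}:
        # Enlistment tries to register MACs that already belong to a machine.
        return "enlistment-rejected"
    return "enlisted"


if __name__ == "__main__":
    pxe = "00:0a:f7:39:b4:d0"    # PXE MAC from the rackd.log excerpt
    other = "00:0a:f7:39:b4:d1"  # hypothetical second NIC on the same machine
    # Machine added manually with only the non-PXE MAC: the failure in this report.
    print(boot_outcome({"node-a": {other}}, pxe, {pxe, other}))
    # Machine added with the PXE MAC: commissioning can proceed.
    print(boot_outcome({"node-a": {pxe}}, pxe, {pxe, other}))
```

In this model, the "Already registered ... skipping enlistment" message corresponds to the "enlistment-rejected" outcome.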

Can you confirm how the machine you are attempting to commission was added to MAAS?

@andreserl in one case the machine was manually added.

I am acting as a proxy for a buddy who spent the night at the data center trying to get things up and had some issues.

Possibly he didn't want to chime in here as I had directed ... just wanted to try and do it all himself.

I'll have the opportunity to get my hands on these systems more today and I'll report back with my findings.

1. Machine PXE booted using specific MAC, and it requested the config from MAAS.
True - great.

2. The machine didn't receive a config for the MAC it used to PXE boot from MAAS.
No idea, the BMC details get filled in.

3. The machine continued to request the 'default' config, which is for the enlistment environment.
True - this seems like incorrect behavior, no idea why the machine was doing this.

4. The machine booted into enlistment environment, tried to register itself, but failed because one or more MAC addresses (other than the one used for PXE) are already known to MAAS.
True - The user specified a different MAC when adding the machine manually.

summary: - enlisting and commissioning breaks when a node impi interface is on a
- different (but routable) network from the MAAS endpoint
+ unexpected behavior when impi interface is on a different (but routable)
+ network from the MAAS endpoint
Daniel Amaya (damaya1982) wrote :

The issue here was that the MAC address used for BMC was correct, but the MAC address used for PXE booting was not correct. When I set the MAC address for PXE booting to the correct interface, enlistment worked perfectly fine.

I would say, as a potential enhancement, it would be nice for the "Add machine" interface to indicate that the MAC address under the machine heading is used for PXE and therefore must be the MAC of the PXE interface. Even just adding something to the documentation indicating what the MAC address is supposed to be would be helpful.

Changed in maas:
status: Incomplete → Triaged
importance: Undecided → Medium
milestone: none → 2.4.0beta1
summary: - unexpected behavior when impi interface is on a different (but routable)
- network from the MAAS endpoint
+ [2.x, UI] Add machine doesn’t provide the importance of adding the PXE
+ interface
summary: - [2.x, UI] Add machine doesn’t provide the importance of adding the PXE
+ [2.x, UI] Add machine doesn’t highlight the importance of adding the PXE
interface
description: updated
Changed in maas:
milestone: 2.4.0beta1 → 2.4.0beta2
Changed in maas:
milestone: 2.4.0beta2 → 2.4.0beta3
Changed in maas:
milestone: 2.4.0beta3 → 2.4.0beta4
Changed in maas:
milestone: 2.4.0beta4 → 2.4.x