MAAS creates extraneous fabrics when using multiple interfaces and VLANs

Bug #1537257 reported by Mike Deats
This bug affects 1 person
Affects     Status      Importance   Assigned to   Milestone
MAAS        Invalid     Undecided    Unassigned
MAAS 1.9    Won't Fix   High         Unassigned
MAAS 2.0    Invalid     Undecided    Unassigned

Bug Description

When MAAS first creates the cluster controller, it auto-detects the existing network configuration and creates the appropriate subnets, VLANs, and fabrics. This seems to work fine with a single physical interface with VLANs, but when the controller has two interfaces with multiple VLANs, MAAS creates a separate fabric for each VLAN on the secondary interface.

For example, my controller has the following layout; the issue can be reproduced by setting up a similar configuration and then installing MAAS normally (a configuration sketch follows the list):
- 4 NICs, bonded in pairs to create bond0 and bond1
- bond0 is set with a static IP in the untagged VLAN for PXE boot, and then has 4 additional VLAN interfaces. Call them bond0.100, bond0.101, bond0.102, and bond0.103
- bond1 has no IP, but has 4 VLAN interfaces. Call them bond1.200, bond1.201, bond1.202, and bond1.203
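
A minimal ifupdown sketch of that kind of layout, assuming eth0-eth3 as the bonded NICs and placeholder addresses rather than the reporter's actual values, might look roughly like this:

    # Hypothetical /etc/network/interfaces excerpt -- illustrative only
    auto bond0
    iface bond0 inet static
        address 192.0.2.2            # static IP on the untagged (PXE) VLAN
        netmask 255.255.255.0
        bond-slaves eth0 eth1
        bond-mode active-backup

    auto bond0.100
    iface bond0.100 inet manual      # tagged VLAN 100 on bond0
        vlan-raw-device bond0
    # ... bond0.101 through bond0.103 are defined the same way

    auto bond1
    iface bond1 inet manual          # no IP address on bond1
        bond-slaves eth2 eth3
        bond-mode active-backup

    auto bond1.200
    iface bond1.200 inet manual      # tagged VLAN 200 on bond1
        vlan-raw-device bond1
    # ... bond1.201 through bond1.203 are defined the same way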

After installing MAAS 1.9, the cluster controller auto-detects everything correctly and creates fabric0 containing the untagged VLAN and all of the VLANs attached to bond0. But it then creates a separate fabric for each of the VLANs on bond1, each containing an untagged VLAN plus the single VLAN it pulled from the interface. The end result is 5 fabrics:

fabric0: untagged, VLAN100, VLAN101, VLAN102, VLAN103
fabric1: untagged, VLAN200
...
fabric4: untagged, VLAN203

I was able to fix this using the MAAS CLI, but it seems to me that MAAS should create a single fabric for each physical port (or bonded port) it finds. I should have seen something like this after installing MAAS:

fabric0: untagged, VLAN100, VLAN101, VLAN102, VLAN103
fabric1: untagged, VLAN200, VLAN201, VLAN202, VLAN203

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Hi Mike,

What you are seeing is actually intended. Each cluster controller will create one fabric per connected interface, because there is no way for MAAS to know whether the interfaces are on the same fabric or on different ones.

Initially, MAAS did what you suggest (everything in the same fabric), but we then received feedback asking for the opposite, which is the current behavior.

While this is a great suggestion, we won't be addressing it for now. However, we'd like to keep the bug open to document user scenarios, so we can revisit this later and decide whether the fabric creation should change.

Changed in maas:
status: New → Opinion
importance: Undecided → Wishlist
milestone: none → 2.0.0
Revision history for this message
Mike Pontillo (mpontillo) wrote :

I tend to agree with the submitter on this one. If we can tell that a physical port is split up into virtual interfaces (tagged VLANs) we should be smart enough to create a single fabric. I'm not certain we have enough information to do so (it might require a heuristic), but we should consider it.
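
As a rough illustration of such a heuristic (a sketch with hypothetical names, not MAAS code), discovered interfaces could be grouped by their parent device, so every tagged VLAN interface lands in the same fabric as the physical or bonded port that carries it:

    # Sketch only -- hypothetical helper, not the MAAS implementation.
    from collections import defaultdict

    def group_interfaces_by_parent(interface_names):
        """Group names such as 'bond1.200' under their parent device.

        Each group would correspond to one fabric: the parent (untagged)
        interface plus all of its tagged VLAN sub-interfaces.
        """
        fabrics = defaultdict(set)
        for name in interface_names:
            # 'bond1.200' -> parent 'bond1', VID 200; 'bond0' -> untagged (VID 0).
            parent, _, vid = name.partition('.')
            fabrics[parent].add(int(vid) if vid else 0)
        return dict(fabrics)

    # The reporter's layout (bond1 itself has no address, so only its tagged
    # VLANs appear) would yield two groups -- two fabrics -- instead of five:
    names = ['bond0', 'bond0.100', 'bond0.101', 'bond0.102', 'bond0.103',
             'bond1.200', 'bond1.201', 'bond1.202', 'bond1.203']
    print(group_interfaces_by_parent(names))
    # -> {'bond0': {0, 100, 101, 102, 103}, 'bond1': {200, 201, 202, 203}}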

Revision history for this message
Mike Deats (mikedeats) wrote :

Ok, but then why don't you get 8 fabrics when you have 8 VLANs? I get 1 fabric for all the VLANs on bond0, and then 4 separate fabrics for each VLAN on bond1. VLAN interfaces have to be attached to a physical interface, so I think all VLANs should be grouped together into a fabric based on physical interface.

Revision history for this message
Mike Pontillo (mpontillo) wrote :

There is code in src/maasserver/forms.py around line ~1700 (save() in NodeGroupInterfaceForm) which is supposed to make this determination.

Either we have a bug, or the information we're receiving about the interfaces is incomplete.

I plan to check the unit tests to make sure they cover this scenario.

Revision history for this message
Mike Pontillo (mpontillo) wrote :

I wrote a unit test here, which passes:

https://code.launchpad.net/~mpontillo/maas/bug-1537257-triage-1.10/+merge/283885

However, if I comment out the definition of "bond1" in the new unit test, I'm seeing the behavior you describe.

Theory: we create the interfaces in separate fabrics because we cannot find the backing (untagged) VLAN for "bond1" in MAAS, and therefore we cannot associate those interfaces with the parent fabric.
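
To make that concrete, here is a sketch of the suspected flow (hypothetical names and structure, not the actual forms.py code): if the fabric lookup keys off the parent's untagged VLAN, an addressless bond never gets registered, and each of its tagged VLANs falls back to creating a brand-new fabric:

    # Sketch of the suspected logic -- hypothetical, not MAAS code.
    def fabric_for_tagged_interface(name, untagged_vlan_fabrics, create_fabric):
        """Pick a fabric for a tagged interface such as 'bond1.200'.

        untagged_vlan_fabrics maps a parent interface name (e.g. 'bond0')
        to the fabric of its untagged VLAN.  Only parents that carried an
        IP address were discovered, so 'bond1' is missing from the mapping.
        """
        parent, _, _vid = name.partition('.')
        fabric = untagged_vlan_fabrics.get(parent)
        if fabric is None:
            # The parent was never discovered (it has no IP address), so the
            # tagged VLAN cannot be associated with an existing fabric and a
            # new fabric is created for it instead.
            fabric = create_fabric()
        return fabric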

Potential workaround: assign an IP address to bond1 before installing the MAAS clusterd.

Changed in maas:
milestone: 2.0.0 → none
status: Opinion → Invalid
importance: Wishlist → Undecided
Revision history for this message
Mike Pontillo (mpontillo) wrote :

I can confirm that this is a bug on MAAS 1.9 (and MAAS 1.10). I'm marking this "Won't Fix" because, honestly, I don't think we have the time to fix it - and several workarounds are available. (Patches are welcome, though. Any fix must be done on 1.10/xenial, then backported to 1.9/trusty.)

The fix would need to change discover_networks() in src/provisioningserver/network.py so that it communicates networks that do not have IP addresses to the regiond, in addition to the networks that do. (Furthermore, we might need to call and parse "/sbin/ip link" in addition to the current "/sbin/ip addr" to gather networking information.)
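
As an illustration of the "/sbin/ip link" half of that idea (a sketch only, not the actual discover_networks() change), interface names could be enumerated regardless of whether they carry an address:

    # Sketch only -- hypothetical helper, not MAAS code.
    import subprocess

    def list_link_names():
        """Return all interface names, including ones with no IP address."""
        output = subprocess.check_output(
            ['/sbin/ip', '-o', 'link', 'show']).decode('utf-8')
        names = []
        for line in output.splitlines():
            # Each line looks like: '3: bond1: <BROADCAST,...> mtu 1500 ...';
            # VLAN interfaces appear as 'bond1.200@bond1'.
            name = line.split(': ', 2)[1].split('@')[0]
            names.append(name)
        return names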

I don't *think* this will be an issue on trunk (MAAS 2.0) going forward, because the design of the clusterd is going to change radically. We should double-check this when we're far enough along.

Changed in maas:
milestone: none → next
Revision history for this message
Mike Pontillo (mpontillo) wrote :

We just discussed this and we'll try to get this fixed before we release 1.9.2. Meanwhile, one of the following workarounds can be used:

(1) Assign an IP address to bond1 before installing maas-clusterd
(2) Move the VLANs to the correct fabric using the CLI after installing MAAS

no longer affects: maas/1.10
Revision history for this message
Andres Rodriguez (andreserl) wrote :

We believe this is no longer an issue in the latest releases of MAAS (2.3), which introduced beacons to address it.

Changed in maas:
milestone: next → none