[2.x] MAAS rack controller cannot auto discover interfaces on whitebox switches

Bug #1630681 reported by Luke Williams
This bug affects 1 person
Affects: MAAS
Status: Invalid
Importance: High
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I am upgrading whitebox switches to 16.04 and MAAS 2.0 and running ICOS as a snap on these devices. MAAS, Ubuntu, and the ICOS snaps all work; however, when I create a routing device in ICOS, it is not auto-discovered by MAAS. In MAAS 1.9, it would auto-detect these interfaces if I created them before installing MAAS, and I could manually create them if I created them after installing MAAS. I have tried both of these on MAAS 2.0 but cannot figure out how to manually create interfaces on the rack controller.

I have included all the logs from /var/log/maas, but don't see anything in there about even trying to discover these interfaces.

The interfaces do show up in ifconfig and I can manage them with ifconfig.

This seems like a regression since in 1.9 this did work.

Tags: switch
Luke Williams (wililupy) wrote :

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                            Version                        Architecture Description
+++-===============================-==============================-============-=================================================
ii  maas                            2.0.0+bzr5189-0ubuntu1~16.04.1 all          "Metal as a Service" is a physical cloud and IPAM
ii  maas-cli                        2.0.0+bzr5189-0ubuntu1~16.04.1 all          MAAS client and command-line interface
un  maas-cluster-controller         <none>                         <none>       (no description available)
ii  maas-common                     2.0.0+bzr5189-0ubuntu1~16.04.1 all          MAAS server common files
ii  maas-dhcp                       2.0.0+bzr5189-0ubuntu1~16.04.1 all          MAAS DHCP server
ii  maas-dns                        2.0.0+bzr5189-0ubuntu1~16.04.1 all          MAAS DNS server
ii  maas-proxy                      2.0.0+bzr5189-0ubuntu1~16.04.1 all          MAAS Caching Proxy
ii  maas-rack-controller            2.0.0+bzr5189-0ubuntu1~16.04.1 all          Rack Controller for MAAS
ii  maas-region-api                 2.0.0+bzr5189-0ubuntu1~16.04.1 all          Region controller API service for MAAS
ii  maas-region-controller          2.0.0+bzr5189-0ubuntu1~16.04.1 all          Region Controller for MAAS
un  maas-region-controller-min      <none>                         <none>       (no description available)
un  python-django-maas              <none>                         <none>       (no description available)
un  python-maas-client              <none>                         <none>       (no description available)
un  python-maas-provisioningserver  <none>                         <none>       (no description available)
ii  python3-django-maas             2.0.0+bzr5189-0ubuntu1~16.04.1 all          MAAS server Django web framework (Python 3)
ii  python3-maas-client             2.0.0+bzr5189-0ubuntu1~16.04.1 all          MAAS python API client (Python 3)
ii  python3-maas-provisioningserver 2.0.0+bzr5189-0ubuntu1~16.04.1 all          MAAS server provisioning libraries (Python 3)

Mike Pontillo (mpontillo) wrote :

In MAAS 2.0 we now auto-detect all your Ethernet interfaces and present those to the MAAS region. It's likely that in this special configuration, your interfaces are not being detected correctly.

Do you know which kernel driver is managing your Ethernet interfaces? If the driver does not report that an interface is backed by physical hardware (that is, no /sys/class/net/<ifname>/device is present), that is the first reason we might not detect it.

To triage this issue, we'll need the output of the following commands:

systemd-detect-virt -c
sudo maas-rack support-dump --networking
sudo cat /proc/net/vlan/config
/sbin/ip addr
brctl show
ls -la /sys/class/net/*/device
for type in $(ls /sys/class/net/*/type); do echo $type: $(cat $type); done

Changed in maas:
status: New → Incomplete
importance: Undecided → High
summary: - MAAS 2.0 cannot auto discover interfaces on whitebox switches
+ [2.0] MAAS 2.0 cannot auto discover interfaces on whitebox switches
Luke Williams (wililupy) wrote : Re: [2.0] MAAS 2.0 cannot auto discover interfaces on whitebox switches

systemd-detect-virt -c
none

The output of maas-rack support-dump --networking is in maas-rack-support-dump.txt

Luke Williams (wililupy) wrote :

sudo cat /proc/net/vlan/config
cat: /proc/net/vlan/config: No such file or directory

/sbin/ip addr
output is in ip_addr.txt

Luke Williams (wililupy) wrote :

brctl show
-bash: brctl: command not found

A note on brctl: on this switch, the bonding built into the kernel, the modules, and the ICOS software handle bonding, bridging, and aggregation.

ls -la /sys/class/net/*/device
lrwxrwxrwx 1 root root 0 Oct 18 2034 /sys/class/net/eth0/device -> ../../../0000:00:14.0

However, /sys/class/net/ shows all of the interfaces.

for type in $(ls /sys/class/net/*/type); do echo $type: $(cat $type); done

See net_class_type.txt

The kernel drivers that manage the interfaces are custom-built modules from Broadcom. They run on 4.4.0-21 and cannot be updated because they were built against those headers specifically.

I can access the switch from the FPTI and configure the interfaces, and they work at the switching level. However, I cannot access them from inside MAAS. In previous versions I could manually enter the devices and everything worked; I would like to be able to do this again. From my first output, it looks like MAAS can see them; it just doesn't have a device mapping, as you said.

Let me know if you need any other information.

Changed in maas:
milestone: none → 2.1.0
Luke Williams (wililupy) wrote :

I installed MAAS 2.0 on the system without the ICOS snap, running everything natively, and I still get the same results: the interfaces show up in the dump and have IP addresses, but MAAS does not show them in the web UI and I cannot set up DHCP for those networks.

I'm going to try to install 2.1 and see what results I get. I know when I tested in the past, I had the same results, but I was running in a snap. I'm going to try outside of the snap and natively installed like in my tests with 1.9.

Luke Williams (wililupy) wrote :

I upgraded to 2.1 and get the same results. I also ran maas-rack support-dump --network, with the same outcome. This time, however, I added the VLANs for handling externally connected devices (rt_v200 and rt_v102), which currently show as down because nothing is connected to the ports that are members of those VLANs.

Let me know if you need anything else.

Mike Pontillo (mpontillo) wrote :

After re-reading the test data, it looks like 'eth0' is the only interface that the kernel believes is actually backed by real, physical hardware. If that's the case, the following might be a workaround:

https://paste.ubuntu.com/23310645/

You might encounter one other possible issue, since all those physical interfaces are reporting the same MAC address. We addressed this for the "bridge inherits physical interface MAC" scenario, but I am not positive the code will handle it well if we try to register a few dozen physical interfaces with identical MACs.
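
To see how many interfaces share a MAC address on a given switch, a quick check along the following lines may help. This is only a sketch: it assumes the standard sysfs layout (/sys/class/net/<ifname>/address), and the function names are hypothetical helpers, not part of MAAS.

```python
import os
from collections import defaultdict


def group_interfaces_by_mac(sysfs_net="/sys/class/net"):
    """Map each MAC address to the sorted list of interfaces reporting it."""
    by_mac = defaultdict(list)
    for ifname in sorted(os.listdir(sysfs_net)):
        address_path = os.path.join(sysfs_net, ifname, "address")
        try:
            with open(address_path) as f:
                mac = f.read().strip().lower()
        except OSError:
            # Not an interface directory (e.g. bonding_masters) or unreadable.
            continue
        if mac:
            by_mac[mac].append(ifname)
    return dict(by_mac)


def duplicated_macs(by_mac):
    """Keep only the MAC addresses shared by more than one interface."""
    return {mac: names for mac, names in by_mac.items() if len(names) > 1}
```

Calling duplicated_macs(group_interfaces_by_mac()) on the switch would list each shared MAC together with the interfaces that report it.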

Mike Pontillo (mpontillo) wrote :

One more note on this. The root cause is that the Broadcom ASIC's interfaces in the whitebox switch don't match our assumptions about what a physical interface looks like. This is important because MAAS, when running on metal (or in a VM), filters out "virtual" ethernet interfaces, such as the interfaces that are dedicated to a container. When MAAS is running in a container, the opposite is true: we need to treat virtual interfaces as "real" Ethernet interfaces, or nothing will show up!

All that was fine until this particular Ethernet driver came along. So in order to resolve this issue, one of two things needs to happen:

(1)(a) The Broadcom driver needs to expose a 'device' node in sysfs for each interface; for example, under: /sys/class/net/<interface>/device. MAAS currently uses the fact that a link exists here to determine that an interface is "physical" (i.e. backed by hardware). This could be considered a bug that we could escalate to Broadcom.***

(1)(b) If they aren't willing to add a 'device' node to their interface entries in sysfs, then they could possibly provide an alternate way for us to determine (by looking at /proc or /sys) that the interface is backed by physical hardware.

(2) If (1) isn't possible, we need to find an alternative way ourselves to know that this is indeed a 'physical' interface. One simple heuristic, based on the data you provided, would be to check if the interface has an underscore in its name, and if so, include it. But I would rather have a more definitive way to tell.

*** If (1)(a) happens, no change is needed in MAAS.
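
The check in (1)(a) and the fallback heuristic in (2) can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the MAAS implementation: the function names and the allow_underscore_names flag are hypothetical, and the underscore test is only the heuristic floated above.

```python
import os


def is_physical(ifname, sysfs_net="/sys/class/net"):
    """True if sysfs exposes a 'device' entry for this interface."""
    return os.path.exists(os.path.join(sysfs_net, ifname, "device"))


def select_interfaces(sysfs_net="/sys/class/net", allow_underscore_names=False):
    """Return interfaces that look physical, optionally using the name heuristic."""
    selected = []
    for ifname in sorted(os.listdir(sysfs_net)):
        # Skip plain files such as bonding_masters; interfaces are directories.
        if not os.path.isdir(os.path.join(sysfs_net, ifname)):
            continue
        if is_physical(ifname, sysfs_net):
            selected.append(ifname)
        elif allow_underscore_names and "_" in ifname:
            # Hypothetical fallback: accept names such as rt_v102.
            selected.append(ifname)
    return selected
```

With the data in this report, the default call would presumably return only eth0, while allow_underscore_names=True would also pick up names such as rt_v200 and rt_v102.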

Changed in maas:
milestone: 2.1.0 → 2.1.1
Changed in maas:
milestone: 2.1.1 → 2.1.2
Changed in maas:
milestone: 2.1.2 → 2.1.3
Mike Pontillo (mpontillo) wrote :

Good news: it looks like on a switch with this driver loaded, we can at least determine the list of Ethernet interfaces managed by the driver by looking at /proc/bcm/knet/link, which has contents such as:

Software link status:
  Ethernet88 down
  Ethernet76 down
  Ethernet68 down
  Ethernet36 down
  Ethernet44 down
  Ethernet96 down
  Ethernet116 down
  Ethernet120 down
  Ethernet32 down
  Ethernet108 down
  Ethernet84 down
  Ethernet12 down
  Ethernet64 down
  Ethernet92 down
  Ethernet20 down
  Ethernet100 down
  Ethernet112 down
  Ethernet4 down
  Ethernet60 down
  Ethernet0 down
  Ethernet80 down
  Ethernet40 down
  Ethernet104 down
  Ethernet56 down
  Ethernet16 down
  Ethernet24 down
  Ethernet8 down
  Ethernet52 down
  Ethernet72 down
  Ethernet48 down
  Ethernet28 down
  Ethernet124 down
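
Assuming the file always has the layout shown above (a header line followed by "<name> <state>" rows), a minimal Python sketch for turning it into a mapping could look like this. parse_knet_link is a hypothetical helper name, and since only the "down" state appears in this report, the code simply passes the state string through unchanged.

```python
def parse_knet_link(text):
    """Return {interface_name: link_state} parsed from /proc/bcm/knet/link text."""
    links = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly two fields; the "Software link status:"
        # header has three and blank lines have none, so both are skipped.
        if len(fields) == 2:
            name, state = fields
            links[name] = state
    return links
```

On a switch, the input would come from reading /proc/bcm/knet/link, and the keys of the result would be the interface names managed by the driver.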

Changed in maas:
milestone: 2.1.3 → 2.2.0
Mike Pontillo (mpontillo) wrote :

Note that this bug will hereafter be related to discovering switch interfaces when installing the rack controller.

Discovering interfaces at commissioning time (before a switch ASIC has been configured with a front panel layout, etc.) is more difficult and not in scope for this bug.

summary: - [2.0] MAAS 2.0 cannot auto discover interfaces on whitebox switches
+ [2.x] MAAS rack controller cannot auto discover interfaces on whitebox switches
Changed in maas:
milestone: 2.2.0 → 2.2.x
Changed in maas:
status: Incomplete → Triaged
Changed in maas:
milestone: 2.2.x → 2.3.0
tags: added: switch
Changed in maas:
milestone: 2.3.0 → 2.3.x
Adam Collard (adam-collard) wrote :

This bug has not seen any activity in the last 6 months, so it is being automatically closed.

If you are still experiencing this issue, please feel free to re-open.

MAAS Team

Changed in maas:
status: Triaged → Invalid