[2.0] VLAN interfaces of secondary rack-controller are not reported

Bug #1563701 reported by Matthew Rees on 2016-03-30
Affects: MAAS
Importance: Critical
Assigned to: Unassigned

Bug Description

I am testing MAAS 2.0 and the HA ability of the rack-controllers. I have two VMs running up-to-date Xenial and the latest MAAS packages from ppa:maas/next. Both have the same network configuration (apart from unique IP addresses) and are able to communicate with each other over every interface.

The interfaces on both servers are as follows:
ens3: used for PXE/provisioning nodes, untagged VLAN
ens4: external connectivity, untagged VLAN
ens5: root interface for VLANs, unconfigured
ens5.10: VLAN interface
ens5.11: VLAN interface
ens5.12: VLAN interface
ens5.13: VLAN interface
ens5.14: VLAN interface
ens5.15: VLAN interface
ens5.16: VLAN interface
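
For readers less familiar with the naming above: the VLAN interfaces follow the common Linux `<parent>.<vlan-id>` convention, so `ens5.10` is VLAN 10 on top of `ens5`. A minimal Python sketch of that mapping follows. The helper name `split_vlan_name` is hypothetical and is not part of MAAS, and it assumes the interface name encodes the tag, which Linux does not guarantee.

```python
def split_vlan_name(name):
    """Split an interface name like 'ens5.10' into (parent, vlan_id).

    Returns (name, None) when the name does not look like '<parent>.<digits>'.
    Assumption: the name encodes the VLAN tag; this is a convention only.
    """
    parent, sep, tag = name.partition(".")
    if sep and tag.isdigit():
        return parent, int(tag)
    return name, None


# The interfaces listed in this report.
interfaces = ["ens3", "ens4", "ens5"] + ["ens5.%d" % vid for vid in range(10, 17)]

# Keep only the VLAN interfaces, as (parent, vlan_id) pairs.
vlan_pairs = [split_vlan_name(n) for n in interfaces if split_vlan_name(n)[1] is not None]
```

Here `vlan_pairs` holds one entry per tagged interface (VLANs 10 through 16), all with `ens5` as the parent.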

Here are my installation steps:
1. Install "maas" package on maas-server1
2. Run dpkg-reconfigure on "maas-region-controller" and set PXE network address to "10.2.0.2" (IP of maas-server1 on PXE interface)
3. Run dpkg-reconfigure on "maas-rack-controller" and set API URL to "http://192.168.35.43:5240/MAAS" (IP of maas-server1 on external interface)
4. Create a user and log in to web UI

At this point the interfaces listed for maas-server1 under "Controllers" are correctly identified.

5. Install "maas-rack-controller" package on maas-server2
6. Run dpkg-reconfigure on "maas-rack-controller" and set API URL to "http://192.168.35.43:5240/MAAS" (IP of maas-server1 on external interface)

At this point maas-server2 is registered and identified in the web UI, however only the following interfaces are listed for it:
ens3
ens4
ens5

None of the VLAN interfaces defined on ens5 are listed.

I have tried to create them manually using "maas <login> interfaces create-vlan <system_id_of_maas-server2> vlan=<maas_vlan_id_of_vlan10> parent=<maas_interface_id_of_ens5>" but get a response of "Not Found".

I have attached maas logs, dpkg output and maas-rack support-dump output from both servers.

Thanks!


Matthew Rees (matthew-rees) wrote :
Changed in maas:
importance: Undecided → Critical
milestone: none → 2.0.0
status: New → Triaged
Mike Pontillo (mpontillo) wrote :

This is pretty strange. I found the following traceback in your logs, happening over and over again:

https://paste.ubuntu.com/15622602/

Did you add the VLAN interfaces after the rack initially registered? It looks like every time the rack re-registers, it fails to update the interfaces. Yet since there are already interfaces in the database, it makes me think it must have succeeded at least once in the past. And we always send the full list of interfaces. (So they were either offline at the time, or not configured yet?)

Mike Pontillo (mpontillo) wrote :

(Or is it possible that at one time the interfaces existed, but did not have IP addresses?)

Matthew Rees (matthew-rees) wrote :

This was a new OS installation where the interfaces were up, online and able to communicate on each interface between server1 and server2 before I began the MAAS installation.

That said, they are VMs, and as a sanity check I will create new block devices and try the installation and setup again, just in case anything was inherited from a previous install. (AFAIK Ubuntu formats a disk when it is told to use the entire disk during installation, but I'll check regardless.)

Matthew Rees (matthew-rees) wrote :

I have finished testing on a fresh install on new VM block devices and the issue persists (latest Xenial daily ISO and maas/next ppa).

The interfaces are configured and brought online with a reboot, and MAAS is installed only after a reboot and confirmation that *all* interfaces can communicate with each other (not just VLAN interfaces).

Mike Pontillo (mpontillo) wrote :

Interesting. It's very strange that you see the physical interfaces appear but not the VLAN interfaces. That should all be done in a transaction, so that is very unexpected.

Can you post the full version of MAAS you are using? We had some code that landed in revision 4842 that we think may help with this issue.

We also have a beta version that should be landing later today; it would be good to re-test with that once it is available.

Mike Pontillo (mpontillo) wrote :

Also, just curious: which hypervisor are you using, what type of virtual NICs are you using, and do you know how they are connected to the underlying physical network? (I'm probably going to need to set up an equivalent test bed, so it would be good to ensure that it's a close match.)

Matthew Rees (matthew-rees) wrote :

Version: 2.0.0~alpha4+bzr4843-0ubuntu1

KVM hypervisor using VirtIO for NICs with the following network configuration:

ens3 (guest) bridged to em1 (host): untagged vlan for MAAS PXE
ens4 (guest) bridged to em2 (host): untagged vlan for external/Internet access
ens5 (guest) bridged to br0 (host): Bridge with a host bond as a member, no untagged access, tagged VLANs 10-16
ens5.10 (guest) bridged to br0.10 (host): tagged VLAN 10
ens5.11 (guest) bridged to br0.11 (host): tagged VLAN 11
ens5.12 (guest) bridged to br0.12 (host): tagged VLAN 12
ens5.13 (guest) bridged to br0.13 (host): tagged VLAN 13
ens5.14 (guest) bridged to br0.14 (host): tagged VLAN 14
ens5.15 (guest) bridged to br0.15 (host): tagged VLAN 15
ens5.16 (guest) bridged to br0.16 (host): tagged VLAN 16

These sysctl values are also enforced on the host:

net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
net.bridge.bridge-nf-filter-pppoe-tagged = 0
net.bridge.bridge-nf-filter-vlan-tagged = 0

This host setup might seem a little strange or over the top, but it enables a guest configuration that resembles what we can expect to see in production, i.e.:

untagged interface for MAAS PXE
untagged interface for external/Internet access (the specifics of this interface are largely irrelevant though)
tagged interfaces for OpenStack MAAS/Juju spaces

It bears reiterating that the first/primary MAAS server *does* recognise all the correct interfaces. It is only the secondary MAAS server that seems not to, even though both have the same config and run on the same physical host with the same VM configuration.
Let me know if you need

Mike Pontillo (mpontillo) wrote :

After scratching my head on this for quite a while, I finally figured it out: when VLAN interfaces are created, we neglected to link them to the controller. So their uniqueness was determined by one less attribute. (Which perfectly explains the behavior you were seeing!)
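
To illustrate why a missing attribute would produce exactly this symptom, here is a toy Python model. It is not MAAS code: the `Iface` tuple, the dictionary "database" and the function names are all hypothetical, and the real lookup in MAAS involves more fields. The sketch only assumes a get-or-create style registration whose uniqueness key either omits or includes the controller.

```python
from collections import namedtuple

# Hypothetical record; real MAAS interfaces carry many more fields.
Iface = namedtuple("Iface", "controller name")


def register(db, key_fn, iface):
    """Get-or-create: store iface only if no entry with the same key exists."""
    key = key_fn(iface)
    if key not in db:
        db[key] = iface
    return db[key]


def buggy_key(iface):
    # The controller is left out, i.e. uniqueness uses "one less attribute".
    return (iface.name,)


def fixed_key(iface):
    # The controller is part of the key, so each rack gets its own rows.
    return (iface.controller, iface.name)


def simulate(key_fn):
    """Register the same VLAN interface names from two controllers in order."""
    db = {}
    for controller in ("maas-server1", "maas-server2"):
        for name in ("ens5.10", "ens5.11"):
            register(db, key_fn, Iface(controller, name))
    return db


def visible_interfaces(db, controller):
    """Names of the stored interfaces that belong to the given controller."""
    return sorted(i.name for i in db.values() if i.controller == controller)
```

In this model, `visible_interfaces(simulate(buggy_key), "maas-server2")` is empty, because the rows created for maas-server1 already satisfy the lookup, while `simulate(fixed_key)` gives both controllers their own VLAN interfaces.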

I'm working on a branch that fixes this, but I don't want to rush the fix in because we've uncovered a lot more problems with that code.

Matthew Rees (matthew-rees) wrote :

That's great news, thanks Mike. I look forward to testing a (possible) fix!

Mike Pontillo (mpontillo) wrote :

Sounds good. I changed my mind and decided to try to land the minor fix for this specific issue first, since the other issues with this code are more difficult to solve. Looking forward to your additional testing!

Matthew Rees (matthew-rees) wrote :

Thanks Mike, I've just tested with latest daily Xenial with MAAS 2.0.0~beta1+bzr4873-0ubuntu1 + the change and it is looking good.

The interfaces are recognised and the VLANs are showing that they are served by both MAAS servers (not to mention I've noticed a few other bugs fixed by the beta1 release).

Thanks again!

Mike Pontillo (mpontillo) wrote :

One note about the fix for this issue: the previous behavior created interfaces which were not linked to a node. You may need to either do a fresh install, or remove them by hand, such as:

$ sudo maas-region shell
>>> from maasserver.models import Interface
>>> for iface in Interface.objects.filter(node__isnull=True): iface.delete()

Changed in maas:
status: Triaged → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
Thiago Martins (martinx) wrote :

Just a quick note... MAAS does not deal with "vlanXXX" interfaces, nor with the eno1.XXXX style.

I can not run PXE on top of a tagged network.
