default DNS IP for node from unexpected interface (not always first interface or same network)

Bug #1776604 reported by Trent Lloyd on 2018-06-13
This bug affects 1 person
Affects    Importance    Assigned to
MAAS       High          Unassigned
2.4        High          Unassigned

Bug Description

The default IP returned for the DNS name of a node (e.g. maas-node06.maas) does not always belong to the first interface shown in the web interface (or API list) of network interfaces, nor does it otherwise default to the 'expected' interface.

The chosen network is also not consistent between nodes that have slightly differing interface configurations, so different nodes end up with default IP addresses on different networks depending on the order of their network configuration.

In a typical deployment, the first interface in the list will be the one that is entered into DNS as $NODE.$DOMAIN. However, when using bridge or bonding interfaces, that behavior appears to change unexpectedly.

According to the code in maasserver/models/staticipaddress.py:480#StaticIPAddressManager.get_hostname_ip_mapping, it will return the interface that was created first. This likely explains the unexpected behavior: if you bond or bridge your primary interface, the new bridge/bond is created 'last', so a secondary interface without a bond/bridge is assigned the primary IP in DNS.

When creating bridges and bonds through the MAAS web interface (on 2.3.3), the bond or bridge interface usually replaces the primary underlying interface it was created from in the list order shown on the web interface, despite being created later.

Example:
before: ens3, ens4, ens5, ens6 (in order)
after: br-bond0(ens3, ens5), ens4, ens6

In this case, the web interface actually shows br-bond0 in the list first, but the DNS entry defaults to ens4.
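The mismatch in this example can be reproduced with a small model. This is only a sketch, not MAAS code: it assumes that the DNS default goes to the earliest-created interface that carries an IP, and that lower ids mean earlier creation. The `Iface` type and its fields are hypothetical; the ids are the ones from the websocket listing in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Iface:
    name: str
    created_id: int  # assumption: a lower id means the interface was created earlier


# Interfaces that carry an IP after bonding ens3+ens5 and bridging the bond,
# in the order the web interface displays them.
displayed = [
    Iface("br-bond0", 1332),
    Iface("ens4", 914),
    Iface("ens6", 916),
]


def dns_default_by_creation(ifaces):
    """Pick the interface that was created first (lowest id)."""
    return min(ifaces, key=lambda i: i.created_id).name


def first_displayed(ifaces):
    """Pick the interface a user would expect: the first one listed."""
    return ifaces[0].name


print(first_displayed(displayed), dns_default_by_creation(displayed))
```

Under these assumptions the two rules disagree: the display order puts br-bond0 first, while creation order selects ens4.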

As previously mentioned, get_hostname_ip_mapping is ordered based on which interface was created first. However, the interface order shown in the web interface (and possibly used for other purposes) appears to be based on some other order.

At first I thought it might use maasserver/models/interface.py:244#InterfaceQueriesMixin.all_interfaces_parents_first, which specifically returns interfaces with no parents (e.g. bonds, bridges, or interfaces that are not a member of such) first. However, that appears to be used only by the curtin yaml generator. Still, the interface order I see returned appears to use a similar method. Looking at the websocket update for the node, I see this order:
1331: bond0
1332: br-bond0
913: ens3
914: ens4
915: ens5
916: ens6

I stopped digging into the code there, but I am sure someone more familiar with it could quickly pinpoint the code used to sort this list and why the order is as it is.
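As a quick check on the websocket order listed above, the sketch below compares it against two candidate sort keys. It only inspects the six (id, name) pairs quoted in this report. A match shows that the data is consistent with a key, not that MAAS actually sorts that way.

```python
# (id, name) pairs in the order seen in the websocket update.
websocket_order = [
    (1331, "bond0"),
    (1332, "br-bond0"),
    (913, "ens3"),
    (914, "ens4"),
    (915, "ens5"),
    (916, "ens6"),
]

by_id = sorted(websocket_order, key=lambda pair: pair[0])
by_name = sorted(websocket_order, key=lambda pair: pair[1])

matches_id = by_id == websocket_order
matches_name = by_name == websocket_order

print("sorted by id:", matches_id)
print("sorted by name:", matches_name)
```

The listed order is not id (creation) order, but it does coincide with a plain sort by interface name. A parents-first scheme could produce the same list, so this narrows the search rather than settling it.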

 = User Story / Use Case =

The most common situation in which this is seen right now is where the primary interface has some combination of a bond or bridge configured against it. This results in a secondary (likely internal) interface receiving the default IP.

Secondly, some nodes (e.g. a nova-compute OpenStack node or LXD host) may require a bridge on the primary interface while other nodes do not, resulting in an inconsistent subnet being used as the default for different nodes.

This is standard in many Ubuntu OpenStack deployments right now.

 = Suggested Fix =

These suggestions may need to be split into multiple bugs depending on how the issue is solved. I will await triage before splitting it into 2 bugs, as it may be decided to simply implement a better method instead of 'fixing' the current order to be consistent.

 - At a minimum, the order of the list shown in the web interface should be consistent with the order used to select the default DNS IP. Beware, however, since this appears to be dynamically generated right now, changing the current order may unexpectedly change DNS in existing environments.

 - Ideally the default IP would be chosen more deterministically between multiple nodes with slightly different network configurations (for example, it is common that some nodes need a bridge for the primary interface and others do not). A possible alternative would be to select them in vlan/fabric order, as that would then be consistent between multiple nodes, or to allow a specific vlan/fabric order to be configured.
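The second suggestion could look roughly like the sketch below. It is an illustration of the idea only, not an existing MAAS feature: `pick_default_ip`, the fabric names and the IP addresses are all hypothetical.

```python
def pick_default_ip(addresses, fabric_priority):
    """Return the IP on the most-preferred fabric, or None if none match.

    addresses: list of (fabric_name, ip) pairs for one node, in any order.
    fabric_priority: fabric names, most preferred first (hypothetical setting).
    """
    rank = {fabric: n for n, fabric in enumerate(fabric_priority)}
    known = [(rank[fabric], ip) for fabric, ip in addresses if fabric in rank]
    if not known:
        return None
    # Lowest rank wins; the interface order on the node plays no role.
    return min(known, key=lambda pair: pair[0])[1]


priority = ["fabric-public", "fabric-internal"]

# Two nodes whose interfaces are listed in different orders.
node_a = [("fabric-internal", "10.0.0.5"), ("fabric-public", "192.0.2.10")]
node_b = [("fabric-public", "192.0.2.11"), ("fabric-internal", "10.0.0.6")]

print(pick_default_ip(node_a, priority))
print(pick_default_ip(node_b, priority))
```

Because the choice depends only on the fabric, both nodes get a default IP on the same network regardless of whether a bridge or bond was added on either of them.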

 = Workaround =

It appears you can update the DNS selection using the following MAAS CLI commands.

Their syntax/parameters appear to be undocumented and a little unexpected (since the result for ip_addresses is a dictionary), but this appears to be the intended usage based on the code.

I do not believe it is currently possible to do this through the web interface. (Generally speaking, the DNS portion of the web interface is not currently well developed, but that is outside the scope of this bug report.)

$ maas NAME dnsresources read # locate the numeric ID of the DNS entry for the host you need to update
$ maas NAME dnsresource update ID ip_addresses=[IP]
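Locating the numeric ID by eye gets tedious with many hosts. The sketch below shows one way to look it up from the output of the read command. It assumes the output is a JSON list of objects with "id" and "fqdn" keys; that shape, the helper name `find_dnsresource_id` and the sample data are assumptions, not taken from MAAS documentation.

```python
import json


def find_dnsresource_id(read_output, fqdn):
    """Return the id of the DNS resource whose fqdn matches, or None."""
    for record in json.loads(read_output):
        if record.get("fqdn") == fqdn:
            return record["id"]
    return None


# Made-up sample standing in for the output of the read command.
sample = json.dumps([
    {"id": 7, "fqdn": "maas-node05.maas", "ip_addresses": [{"ip": "10.0.0.5"}]},
    {"id": 12, "fqdn": "maas-node06.maas", "ip_addresses": [{"ip": "10.0.0.6"}]},
])

print(find_dnsresource_id(sample, "maas-node06.maas"))
```

The returned value would then be used as ID in the update command.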

Trent Lloyd (lathiat) on 2018-06-13
tags: added: sts
Changed in maas:
status: New → Confirmed
Changed in maas:
status: Confirmed → New
Changed in maas:
status: New → Incomplete
Andres Rodriguez (andreserl) wrote :

Trent,

IIRC, the DNS entry defaults to the PXE interface. If the PXE interface is part of a bond, then the default DNS entry is created against the bond, and the same goes for a bridge. I have tested this with:

a. ens1 - non-PXE iface
b. ens2 - non-PXE iface
c. ens3 - PXE iface

1. I deployed the machine and confirmed that the default DNS entry was given to (c) above.
2. I created a bond with both ens1/ens2 (bond0), and the default DNS entry was given to ens3.
3. I then created a bond between ens1/ens3 (bond0), and the default DNS entry was created against bond0.
4. I then created a bridge against ens3 (br0), and the default DNS entry was created against br0 (which sits on the PXE interface).
5. I then created a bridge against ens1 (br0), and the default DNS entry was created against ens3 (which is the PXE interface).

So, from my testing it is working as expected, where the PXE interface is the interface where the default DNS entry is generated.
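The rule these five tests exercise can be written down as a short sketch: start at the PXE interface and follow any bond or bridge built on top of it. This models the expected behavior only and is not how MAAS implements it. `dns_interface_for` and the `stacked_on` mapping (member interface to the bond/bridge built on it) are hypothetical.

```python
def dns_interface_for(pxe_iface, stacked_on):
    """Follow bonds/bridges upward from the PXE interface.

    stacked_on maps an interface name to the bond or bridge built on top of it.
    """
    current = pxe_iface
    seen = {current}
    while current in stacked_on:
        current = stacked_on[current]
        if current in seen:
            raise ValueError("cycle in interface stacking")
        seen.add(current)
    return current


# Tests 1-5 above, with ens3 as the PXE interface.
print(dns_interface_for("ens3", {}))                                  # plain
print(dns_interface_for("ens3", {"ens1": "bond0", "ens2": "bond0"}))  # bond elsewhere
print(dns_interface_for("ens3", {"ens1": "bond0", "ens3": "bond0"}))  # bond on PXE
print(dns_interface_for("ens3", {"ens3": "br0"}))                     # bridge on PXE
print(dns_interface_for("ens3", {"ens1": "br0"}))                     # bridge elsewhere

# A stack two levels deep, which the five tests above do not cover.
nested = {"ens3": "bond0", "ens5": "bond0", "bond0": "br-bond0"}
print(dns_interface_for("ens3", nested))
```

Under this rule the two-level case resolves to br-bond0, so a lookup that only follows one level of stacking would be one possible source of a different result.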

That said, could you provide extra information, confirm whether the DNS entry in your case is also being created against the PXE interface, and add anything else you think would be relevant?

Thanks!

Andres Rodriguez (andreserl) wrote :

FWIW, a couple of things worth noting:

1. I did tests both ways, in which ens3 had the largest DB id as well as the smallest DB id, and in both situations it would always get the default DNS entry.
2. All interfaces were set to Auto Assign to ensure they get an IP from MAAS and a DNS entry.

On the other hand, I did a couple of experiments to highlight what MAAS does, so I did this:

a. ens0 (non-PXE interface) was set to 'Auto-Assign'
b. ens3 (PXE interface) was set to 'DHCP'

The results were as expected:

1. Even though ens3 is the PXE interface, it was set to DHCP, which meant we don't really know the IP this interface has.
2. On deployment, ens0 would get the default DNS entry, because we know its IP, as it was assigned by MAAS.
3. Once the machine PXE booted off MAAS over ens3, it got an IP from MAAS' DHCP; as such, the default DNS entry was updated to point at ens3's IP address, and ens0 was rolled back to ens0.<hostname>.<domain>, which is expected behavior.
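The fallback described in these three results can be sketched as follows. This is a model of the observed behavior under stated assumptions, not MAAS code: `default_dns_ip`, the `known_ips` mapping and the addresses are hypothetical.

```python
def default_dns_ip(pxe_iface, known_ips, iface_order):
    """Prefer the PXE interface's IP; otherwise fall back to a known IP.

    known_ips: interface name -> IP, only for interfaces whose IP is known.
    iface_order: fallback order to try when the PXE interface has no known IP.
    """
    if pxe_iface in known_ips:
        return known_ips[pxe_iface]
    for name in iface_order:
        if name in known_ips:
            return known_ips[name]
    return None


order = ["ens0", "ens3"]

# At deployment: ens3 is on DHCP, so only ens0's auto-assigned IP is known.
at_deploy = {"ens0": "10.0.0.10"}
# After PXE boot: the DHCP lease for ens3 has been observed.
after_boot = {"ens0": "10.0.0.10", "ens3": "10.0.1.20"}

print(default_dns_ip("ens3", at_deploy, order))
print(default_dns_ip("ens3", after_boot, order))
```

The default entry moves from ens0's address to ens3's address once the lease is known, matching results 2 and 3.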

Trent Lloyd (lathiat) wrote :

One thing you didn't try that I did try (as this was the actual customer scenario) was a bridge of a bond. Maybe that changes the behavior?

I'll try to re-test and post the results, though I just upgraded my MAAS to 2.4 and am not sure whether anything changed there. So I'll try to reproduce my original test first and then provide more precise instructions on exactly what I did and the results.

Andres Rodriguez (andreserl) wrote :

I did a quick test, and it does seem that this changes it. It seems that when a bridge on a bond is created, MAAS picks another interface for the default DNS domain, even though the underlying physical interface for the bond/bridge is indeed the PXE one.

I think I'll need to run a few more tests to be sure, as the last test was with fake ifaces, so the deployment never finished. Either way, I'm going to target this to 2.5 & 2.4.

Changed in maas:
milestone: none → 2.5.0
importance: Undecided → High
tags: added: internal