default DNS IP for node from unexpected interface (not always first interface or same network)

Bug #1776604 reported by Trent Lloyd
This bug affects 3 people
Affects        Status          Importance   Assigned to     Milestone
MAAS           Fix Released    High         Mike Pontillo
  2.3          Fix Committed   High         Mike Pontillo
  2.4          Fix Released    High         Mike Pontillo

Bug Description

The default IP returned for a node's DNS name (e.g. maas-node06.maas) is not always that of the first interface shown in the web interface (or API list) of network interfaces, nor does it otherwise default to the 'expected' interface.

It is also not consistently on the same network across nodes with slightly differing interface configurations, so different nodes end up with default IP addresses on different networks depending on the order of their network configuration.

In a typical deployment, the first interface in the list is the one entered into DNS as $NODE.$DOMAIN. However, when bridge or bond interfaces are used, that behavior appears to change unexpectedly.

According to the code in maasserver/models/staticipaddress.py:480#StaticIPAddressManager.get_hostname_ip_mapping, it returns the interface that was created first. This likely explains the unexpected behavior: you may bond or bridge your primary interface, but the new bond/bridge is then created 'last', so a secondary interface without a bond/bridge is assigned the primary IP in DNS.

When creating bridges and bonds through the MAAS web interface (on 2.3.3), the bond or bridge interface usually takes the place of the primary underlying interface it was created from in the list order shown in the web interface, despite being created later.

Example:
before: ens3, ens4, ens5, ens6 (in order)
after: br-bond0(ens3, ens5), ens4, ens6

In this case, the web interface actually shows br-bond0 in the list first, but the DNS entry defaults to ens4.
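A simplified Python model (not MAAS code) of the "first created wins" behavior described above reproduces this result. It assumes that creation order equals database id, borrows the ids from the websocket listing quoted further down, and uses made-up IP addresses; `Interface` and `pick_by_creation_order` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interface:
    id: int            # database primary key; higher means created later
    name: str
    ip: Optional[str]  # None when no address is assigned to this interface


def pick_by_creation_order(interfaces: List[Interface]) -> Optional[str]:
    """Hypothetical model: the lowest-id (first created) interface with an IP wins."""
    with_ip = [i for i in interfaces if i.ip is not None]
    if not with_ip:
        return None
    return min(with_ip, key=lambda i: i.id).name


# After bonding/bridging, ens3 and ens5 carry no address of their own;
# the address lives on br-bond0, which was created last.
after = [
    Interface(913, "ens3", None),
    Interface(914, "ens4", "10.0.1.4"),
    Interface(915, "ens5", None),
    Interface(916, "ens6", "10.0.2.6"),
    Interface(1331, "bond0", None),
    Interface(1332, "br-bond0", "10.0.0.2"),
]
print(pick_by_creation_order(after))  # ens4
```

Because br-bond0 has the highest id, any interface that kept its own address (here ens4) outranks it, which matches the DNS entry defaulting to ens4.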

As previously mentioned, get_hostname_ip_mapping orders by which interface was created first. However, the interface order shown in the web interface (and possibly used elsewhere) appears to be based on some other order.

At first I thought it might use maasserver/models/interface.py:244#InterfaceQueriesMixin.all_interfaces_parents_first, which specifically returns interfaces with no parents (e.g. bonds, bridges, or interfaces that are not a member of such) first; however, that appears to be used only by the curtin YAML generator. Still, the interface order I see returned appears to follow a similar method. Looking at the websocket update for the node, I see this order:
1331: bond0
1332: br-bond0
913: ens3
914: ens4
915: ens5
916: ens6

I stopped digging into the code there, but I am sure someone more familiar with it could quickly pinpoint the code that sorts this list and explain why the order is as it is.

 = User Story / Use Case =

The most common situation in which this is seen right now is where the primary interface has a bond, a bridge, or both configured on it. This results in a secondary (likely internal) interface receiving the default IP.

Secondly, some nodes (e.g. a nova-compute OpenStack node or an LXD host) may require a bridge on the primary interface while other nodes do not, resulting in a different subnet being used as the default on different nodes.

This is standard in many Ubuntu OpenStack deployments right now.

 = Suggested Fix =

These suggestions may need to be split into multiple bugs depending on how the issue is solved. I will wait for triage before splitting this into two bugs, as it may be decided to simply implement a better method rather than 'fixing' the current order to be consistent.

 - At a minimum, the order of the list shown in the web interface should be consistent with the order used to select the default DNS IP. Beware, however, since this appears to be dynamically generated right now, changing the current order may unexpectedly change DNS in existing environments.

 - Ideally the default IP would be chosen more deterministically across multiple nodes with slightly different network configurations (for example, it is common for some nodes to need a bridge on the primary interface while others do not). A possible alternative would be to select addresses in vlan/fabric order, as that would be consistent across nodes, or to allow a specific vlan/fabric order to be configured.
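The fabric-order suggestion above could be sketched roughly as follows. This is only an illustration of the idea under stated assumptions, not an existing MAAS feature: the fabric names, `FABRIC_PRIORITY`, and `pick_by_fabric` are all hypothetical, and ties are broken by interface name so that creation order never matters.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical operator-configured preference; earlier entries win.
FABRIC_PRIORITY = ("fabric-oam", "fabric-internal", "fabric-storage")

Address = Tuple[str, str, str]  # (interface name, fabric name, IP)


def pick_by_fabric(addresses: List[Address],
                   priority: Sequence[str] = FABRIC_PRIORITY) -> Optional[str]:
    """Return the IP on the most-preferred fabric.

    Fabrics missing from `priority` sort after all listed ones; ties are
    broken by interface name so the choice never depends on creation order.
    """
    if not addresses:
        return None
    rank = {fabric: n for n, fabric in enumerate(priority)}
    best = min(addresses, key=lambda a: (rank.get(a[1], len(priority)), a[0]))
    return best[2]


# A compute node with a bridge and a storage node without one still agree.
compute = [("br-ens3", "fabric-oam", "10.0.0.11"), ("ens4", "fabric-internal", "10.12.2.11")]
storage = [("ens4", "fabric-internal", "10.12.2.12"), ("ens3", "fabric-oam", "10.0.0.12")]
print(pick_by_fabric(compute), pick_by_fabric(storage))  # 10.0.0.11 10.0.0.12
```

Both nodes land on the same fabric regardless of whether the primary interface is bridged, which is the consistency property the suggestion asks for.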

 = Workaround =

It appears you can update the DNS selection using the following MAAS CLI commands.

Their syntax and parameters appear to be undocumented and a little unexpected (since the result for ip_addresses is a dictionary), but this appears to be the intended usage based on the code.

I do not believe it is currently possible to do this through the web interface (though, generally speaking, the DNS portion of the web interface is not yet well developed; that is outside the scope of this bug report).

$ maas NAME dnsresources read # locate the numeric ID of the DNS entry for the host you need to update
$ maas NAME dnsresource update ID ip_addresses=[IP]
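If this needs to be scripted, a small helper can pick the ID out of the read output and build the update command. This is a sketch that assumes the read command prints a JSON list whose entries carry `id` and `fqdn` fields; the profile name `admin`, the sample JSON, and the helper names are hypothetical.

```python
import json
import shlex
from typing import List


def find_dnsresource_id(read_output: str, fqdn: str) -> int:
    """Return the id of the DNS resource whose fqdn matches (assumed JSON shape)."""
    for record in json.loads(read_output):
        if record.get("fqdn") == fqdn:
            return record["id"]
    raise LookupError(f"no DNS resource named {fqdn!r}")


def build_update_argv(profile: str, resource_id: int, ip: str) -> List[str]:
    """Argument vector for the update command shown in the workaround."""
    return ["maas", profile, "dnsresource", "update", str(resource_id),
            f"ip_addresses={ip}"]


sample = '[{"id": 7, "fqdn": "maas-node06.maas"}, {"id": 9, "fqdn": "other.maas"}]'
argv = build_update_argv("admin", find_dnsresource_id(sample, "maas-node06.maas"), "10.0.0.6")
print(shlex.join(argv))  # maas admin dnsresource update 7 ip_addresses=10.0.0.6
```

The resulting list can be passed to `subprocess.run(argv, check=True)`; keeping it as a list avoids shell quoting issues with the `ip_addresses=` argument.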

Tags: sts internal


Trent Lloyd (lathiat)
tags: added: sts
Changed in maas:
status: New → Confirmed
Changed in maas:
status: Confirmed → New
Changed in maas:
status: New → Incomplete
Andres Rodriguez (andreserl) wrote :

Trent,

IIRC, the DNS entry defaults to the PXE interface. If the PXE interface is part of a bond, the default DNS entry is created against the bond, and the same applies to a bridge. I have tested this with:

a. ens1 - non-PXE iface
b. ens2 - non-PXE iface
c. ens3 - PXE iface

1. I deployed the machine and confirmed that the default DNS entry was given to (c) above.
2. I created a bond of ens1 and ens2 (bond0), and the default DNS entry was given to ens3.
3. I then created a bond of ens1 and ens3 (bond0), and the default DNS entry was created against bond0.
4. I then created a bridge on ens3 (br0), and the default DNS entry was created against br0 (which is on the PXE interface).
5. I then created a bridge on ens1 (br0), and the default DNS entry was created against ens3 (which is the PXE interface).

So, from my testing it is working as expected, where the PXE interface is the interface where the default DNS entry is generated.

That said, could you check whether, in your case, the DNS entry is also being created against the PXE interface, and provide any other information you think is relevant?

Thanks!

Andres Rodriguez (andreserl) wrote :

FWIW, a couple things worth noting:

1. I tested both ways, with ens3 having the largest DB id and with it having the smallest DB id, and in both cases it got the default DNS entry.
2. All interfaces were set to Auto Assign to ensure they get an IP from MAAS and a DNS entry.

On the other hand, I ran a couple of experiments to highlight what it does:

a. ens0 (non-PXE interface) was set to 'Auto-Assign'
b. ens3 (PXE interface) was set to 'DHCP'

The results were as expected:

1. Even though ens3 is the PXE interface, it was set to DHCP, which means we don't really know the IP this interface has.
2. On deployment, ens0 got the default DNS entry, because we know its IP, as it was assigned by MAAS.
3. Once the machine PXE booted off MAAS over ens3 (which it could do because it got an IP from MAAS' DHCP), the default DNS entry was updated to point at ens3's IP address, and ens0 was rolled back to ens0.<hostname>.<domain>, which is the expected behavior.
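The naming behavior in these results can be modeled in a few lines. This is a simplified sketch, not MAAS's implementation: `dns_records` is a hypothetical helper, `None` stands for an address MAAS does not know (DHCP with no observed lease), and the fallback choice when the boot interface has no known address is an arbitrary assumption made for the sketch.

```python
from typing import Dict, Optional


def dns_records(hostname: str, domain: str, boot_iface: str,
                known_ips: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Hypothetical model of the naming described above.

    known_ips maps interface name -> IP, or None when MAAS does not know the
    address (e.g. a DHCP interface with no observed lease yet).
    """
    usable = {name: ip for name, ip in known_ips.items() if ip is not None}
    if not usable:
        return {}
    # Prefer the boot (PXE) interface once its address is known; otherwise fall
    # back to the alphabetically first usable interface (arbitrary for this sketch).
    default = boot_iface if boot_iface in usable else min(usable)
    records = {f"{hostname}.{domain}": usable[default]}
    for name, ip in sorted(usable.items()):
        if name != default:
            records[f"{name}.{hostname}.{domain}"] = ip
    return records


# Before the PXE boot, ens3 (DHCP) has no known address.
print(dns_records("node", "maas", "ens3", {"ens0": "10.0.0.5", "ens3": None}))
# {'node.maas': '10.0.0.5'}
# After MAAS' DHCP hands ens3 an address, it takes over the bare hostname.
print(dns_records("node", "maas", "ens3", {"ens0": "10.0.0.5", "ens3": "192.168.100.50"}))
# {'node.maas': '192.168.100.50', 'ens0.node.maas': '10.0.0.5'}
```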

Trent Lloyd (lathiat) wrote :

One thing you didn't try that I did try (as this was the actual customer scenario) was a bridge of a bond. Maybe that changes the behavior?

I'll re-test and post the results, though I just upgraded my MAAS to 2.4 and I'm not sure whether anything changed in 2.4. So I'll first try to reproduce my original test and then provide more precise instructions on exactly what I did and the results.

Andres Rodriguez (andreserl) wrote :

I did a quick test, and this does seem to change it. When a bridge on a bond is created, MAAS seems to pick another interface for the default DNS entry, even though the underlying physical interface of the bond/bridge is indeed the PXE one.

I think I'll need to run a few more tests to be sure, as the last test was with fake ifaces, so the deployment never finished. Either way, I'm going to target this to 2.5 & 2.4.

Changed in maas:
milestone: none → 2.5.0
importance: Undecided → High
tags: added: internal
Trent Lloyd (lathiat) wrote :

I have a customer using 2.3 who needs to update the DNS record for their machine to point at the correct (primary) interface in this scenario. Unfortunately, in 2.3 it seems you cannot update the existing DNS record as you can in 2.4: maas dnsresources read does not return any of the machine hostname records, so you can't update them, and if you create a record alongside it, DNS returns both IPs instead of replacing the auto-generated record.

Is there some other way to work around this issue on 2.3? The customer is on Xenial, and 2.4 does not appear to be available for Xenial, so upgrading to 2.4 is not feasible for them. Thus we need a fix or workaround for 2.3.

Andres Rodriguez (andreserl) wrote : Re: [Bug 1776604] Re: default DNS IP for node from unexpected interface (not always first interface or same network)

Hi Trent,

MAAS 2.3 -> 2.4 has not changed anything with regard to DNS, so updating
the record should be entirely possible. Could you please provide a list
of commands with example outputs?

On Mon, Jul 9, 2018 at 11:59 PM, Trent Lloyd <email address hidden>
wrote:

--
Andres Rodriguez (RoAkSoAx)
Ubuntu Server Developer
MSc. Telecom & Networking
Systems Engineer

Mike Pontillo (mpontillo) wrote :

I think the difference in behavior between a bonded interface and a bridged bond can be explained by the following section of SQL:

    ORDER BY
        node.hostname,
        is_boot DESC,
        family(staticip.ip),
        CASE
            WHEN interface.type = 'bond' AND
                parent.id = node.boot_interface_id THEN 1
            WHEN interface.type = 'physical' AND
                interface.id = node.boot_interface_id THEN 2
            WHEN interface.type = 'bond' THEN 3
            WHEN interface.type = 'physical' THEN 4
            WHEN interface.type = 'vlan' THEN 5
            WHEN interface.type = 'alias' THEN 6
            WHEN interface.type = 'unknown' THEN 7
            ELSE 8
        END

This query was originally written before bridges were introduced in MAAS, and does not appear to consider them at all.
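To see why a bridge on a bond loses, the CASE ranking can be rendered in Python. This sketch models only the CASE key plus interface id as a final tie-breaker (an assumption based on the "created first" observation in the description); it leaves out the hostname, is_boot, and address-family keys, and `Iface`, `case_rank`, and `pick` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Iface:
    id: int
    name: str
    type: str                    # 'physical', 'bond', 'bridge', 'vlan', ...
    parent_ids: Tuple[int, ...]  # direct parents only, as in the query
    has_ip: bool


RANKS = {"bond": 3, "physical": 4, "vlan": 5, "alias": 6, "unknown": 7}


def case_rank(iface: Iface, boot_id: int) -> int:
    """Python rendering of the CASE expression; 'bridge' is not listed, so it gets 8."""
    if iface.type == "bond" and boot_id in iface.parent_ids:
        return 1
    if iface.type == "physical" and iface.id == boot_id:
        return 2
    return RANKS.get(iface.type, 8)


def pick(ifaces: List[Iface], boot_id: int) -> Optional[str]:
    """Lowest (rank, id) among interfaces with an address."""
    candidates = [i for i in ifaces if i.has_ip]
    if not candidates:
        return None
    return min(candidates, key=lambda i: (case_rank(i, boot_id), i.id)).name


physical = [Iface(913, "ens3", "physical", (), False),   # boot/PXE interface
            Iface(914, "ens4", "physical", (), True),
            Iface(915, "ens5", "physical", (), False)]
bond_only = physical + [Iface(1331, "bond0", "bond", (913, 915), True)]
bridged = physical + [Iface(1331, "bond0", "bond", (913, 915), False),
                      Iface(1332, "br-bond0", "bridge", (1331,), True)]
print(pick(bond_only, 913), pick(bridged, 913))  # bond0 ens4
```

With a plain bond, the bond's direct parent is the boot interface, so it ranks 1. Once the address moves to the bridge, the bridge falls through to ELSE 8 and any physical interface with an address (rank 4) wins.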

It looks like we might require another join here to check the parent-of-a-parent: for a bridge on a bond, the boot interface is a member of the bond, i.e. the bridge's grandparent, so checking whether the bridge's direct parent (the bond) is the boot interface won't find it. The query currently checks only direct parents.
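The idea of looking past direct parents can be sketched as an ancestor walk. This illustrates the approach only; it is not the patch from the merge proposal, and `stacked_on`, `pick_default`, and the dict-based topology are hypothetical. The fallback simply keeps whatever order the caller supplies.

```python
from typing import Dict, List, Optional, Tuple

Parents = Dict[str, Tuple[str, ...]]  # interface name -> direct parent names


def stacked_on(parents: Parents, name: str, target: str) -> bool:
    """True if `target` is `name` itself or any ancestor of it, at any depth."""
    stack, seen = [name], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return False


def pick_default(parents: Parents, boot: str, with_ip: List[str]) -> Optional[str]:
    """Prefer an addressed interface stacked on the boot interface; otherwise
    keep the first entry of `with_ip` (whatever order the caller supplies)."""
    for name in with_ip:
        if stacked_on(parents, name, boot):
            return name
    return with_ip[0] if with_ip else None


topology = {"ens3": (), "ens4": (), "ens5": (),
            "bond0": ("ens3", "ens5"), "br-bond0": ("bond0",)}
print(pick_default(topology, "ens3", ["ens4", "br-bond0"]))  # br-bond0
```

Walking the full ancestry means bridges, bonds, and any further stacking are handled the same way, instead of adding one join per level of nesting.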

Mike Pontillo (mpontillo) wrote :

@lathiat, are you able to test the patch in this merge proposal to see if it fixes the issue for you?

https://code.launchpad.net/~mpontillo/maas/+git/maas/+merge/349384

I think this is the fix, and it's passing our tests, but I feel that it's risky to change this code, so I'd prefer someone else to verify the patch, too.

Changed in maas:
status: Incomplete → Triaged
assignee: nobody → Mike Pontillo (mpontillo)
Changed in maas:
status: Triaged → Fix Committed
Hua Zhang (zhhuabj) wrote :

@Trent, @Andres,

I could not reproduce the problem using maas=2.3.2-6485-ge93e044-0ubuntu1~16.04.1. My test result [1] shows that the PXE IP 192.168.100.205 is still on the bond/bridge br-bond0. Can you share your detailed steps so I can see what I am missing? Thanks.

[1] https://imgur.com/a/Mp4ilC1

Trent Lloyd (lathiat) wrote :

Joshua: did you look up the DNS name? I don't see the results of that in your screenshot.
The problem is purely with the DNS name; e.g. run "ping compute.maas".
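Checking which address the node name resolves to can also be scripted with the standard library resolver. This is a sketch: `points_at` is a hypothetical helper, it only looks at IPv4 results from `socket.gethostbyname_ex`, and it treats any resolution failure as "not pointing at the expected IP".

```python
import socket
from typing import List


def resolved_addresses(name: str) -> List[str]:
    """IPv4 addresses returned by the system resolver for `name`."""
    _canonical, _aliases, addresses = socket.gethostbyname_ex(name)
    return addresses


def points_at(name: str, expected_ip: str) -> bool:
    """True when `expected_ip` is among the addresses `name` resolves to."""
    try:
        return expected_ip in resolved_addresses(name)
    except OSError:  # socket.gaierror and socket.herror both subclass OSError
        return False
```

For example, `points_at("compute.maas", "192.168.100.205")` run on a host that uses MAAS DNS would report whether the default record still points at the PXE address.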

Hua Zhang (zhhuabj) wrote :

@Trent,

My test result shows that DNS for the bond/bridge is also normal; please see the following output. Thanks.

ubuntu@maas:~$ ping -c 1 compute.maas
PING compute.maas (192.168.100.205) 56(84) bytes of data.
64 bytes from 192-168-100-205.maas (192.168.100.205): icmp_seq=1 ttl=64 time=0.368 ms

ubuntu@maas:~$ nslookup compute.maas
Server: 192.168.100.3
Address: 192.168.100.3#53
Name: compute.maas
Address: 192.168.100.205

This is the other info for this test - https://pastebin.canonical.com/p/SRrVw84rnn/

Changed in maas:
milestone: 2.5.0 → 2.5.0alpha1
Hua Zhang (zhhuabj) wrote :

I have successfully reproduced the problem after restarting the MAAS server: the DNS IP changed from 192.168.100.205 to 10.12.2.3.

ubuntu@maas:~$ nslookup compute
Server: 192.168.100.3
Address: 192.168.100.3#53
Name: compute.maas
Address: 10.12.2.3

I will test the patch tomorrow and update with the result soon.

Changed in maas:
status: Fix Committed → Fix Released
Hua Zhang (zhhuabj) wrote :

I tested the patch and it works well: https://imgur.com/a/gRumssB

Andres Rodriguez (andreserl) wrote :

Awesome! Thank you for all the testing Hua!

On Tue, Jul 31, 2018 at 8:01 PM Hua Zhang <email address hidden>
wrote:

> Test the fix patch, it works well - https://imgur.com/a/gRumssB
>
> --
> You received this bug notification because you are subscribed to MAAS.
> https://bugs.launchpad.net/bugs/1776604
>
> Title:
> default DNS IP for node from unexpected interface (not always first
> interface or same network)
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maas/+bug/1776604/+subscriptions
>
> Launchpad-Notification-Type: bug
> Launchpad-Bug: product=maas; milestone=2.5.0alpha1; status=Fix Released;
> importance=High; <email address hidden>;
> Launchpad-Bug: product=maas; productseries=2.3; milestone=2.3.x;
> status=Fix Committed; importance=High; assignee=
> <email address hidden>;
> Launchpad-Bug: product=maas; productseries=2.4; milestone=2.4.1;
> status=Fix Committed; importance=High; assignee=
> <email address hidden>;
> Launchpad-Bug-Tags: internal sts
> Launchpad-Bug-Information-Type: Public
> Launchpad-Bug-Private: no
> Launchpad-Bug-Security-Vulnerability: no
> Launchpad-Bug-Commenters: andreserl lathiat mpontillo zhhuabj
> Launchpad-Bug-Reporter: Trent Lloyd (lathiat)
> Launchpad-Bug-Modifier: Hua Zhang (zhhuabj)
> Launchpad-Message-Rationale: Subscriber (MAAS)
> Launchpad-Message-For: andreserl
>
--
Andres Rodriguez (RoAkSoAx)
Ubuntu Server Developer
MSc. Telecom & Networking
Systems Engineer

Edward Hope-Morley (hopem) wrote :

Looks like this has been released to 2.3 (I see the patch in 2.3.5-6511-gf466fdb-0ubuntu1).
