MAAS configures nodes with incorrect DNS server addresses when using multiple IP addresses

Bug #1922891 reported by Stefan Fleischmann
This bug affects 2 people
Affects: MAAS
Status: Fix Released
Importance: Undecided
Assigned to: Christian Grabowski

Bug Description

MAAS version 2.9.2-9164-g.ac176b5c4 from snap channel 2.9/stable, but I think this issue has existed for a longer time. We have two subnets configured in MAAS:

 172.17.100.0/23 (eth0, default node network)
 172.17.102.0/24 (eth1, management network for IPMI etc.)

Both are untagged subnets on fabric-0 and MAAS provides DHCP. Because of that the MAAS server has two IP addresses, one on each subnet, let's say 172.17.100.3 and 172.17.102.3. When I deploy a node with a static IP address (which is only on the 172.17.100.0/23 subnet), both 172.17.100.3 and 172.17.102.3 end up as nameservers in /etc/netplan/50-cloud-init.yaml.
I recently set up two additional rack controllers with the same setup (one IP on 172.17.100 and one on 172.17.102), and that makes things even worse because now there are six nameserver addresses, of which three are not reachable for machines on the 172.17.100 subnet. Systemd circumvents the problem, but on other setups where there is an actual limit of 3 nameservers this might lead to DNS being completely broken.

This is not a problem when DHCP is used. I checked the DHCP response and it only contains the addresses on the 172.17.100 subnet.
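
To make the expected behaviour concrete, here is a minimal Python sketch (not MAAS code) that keeps only the nameserver candidates lying inside the node's own subnet. The `.4` and `.5` addresses for the additional rack controllers are hypothetical; the report only names 172.17.100.3 and 172.17.102.3. The function name is likewise made up for illustration.

```python
import ipaddress


def nameservers_in_subnet(candidates, node_subnet):
    """Return the candidates that fall inside node_subnet, preserving order."""
    net = ipaddress.ip_network(node_subnet)
    return [ip for ip in candidates if ipaddress.ip_address(ip) in net]


candidates = [
    "172.17.100.3", "172.17.102.3",  # MAAS server (addresses from the report)
    "172.17.100.4", "172.17.102.4",  # hypothetical rack controller 1
    "172.17.100.5", "172.17.102.5",  # hypothetical rack controller 2
]

# Only the three addresses on 172.17.100.0/23 remain.
print(nameservers_in_subnet(candidates, "172.17.100.0/23"))
```

Being in the same subnet is used here as a stand-in for "reachable"; a routed setup could legitimately reach nameservers on other subnets.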

Stefan Fleischmann (sfleischmann) wrote :

PS: this might look like a duplicate of https://bugs.launchpad.net/maas/+bug/1744454 at first. But note that in my case MAAS has IP addresses on two different subnets, and it blends nameserver information together instead of keeping it separate for each subnet.

I just noticed that this isn't a problem with DHCP, but actually only happens when setting a static IP address in the node config and deploying. Then all six IP addresses end up in /etc/netplan/50-cloud-init.yaml, but via DHCP I only get the ones on 172.17.100 as it should be. So something must go wrong when MAAS composes the config for cloud-init I guess.

description: updated
Changed in maas:
assignee: nobody → Christian Grabowski (cgrabowski)
Changed in maas:
status: New → Triaged
Christian Grabowski (cgrabowski) wrote :

The issue here is that the boot interface is being selected for the machine's default gateway. MAAS has an order of preferred default gateway interfaces, where the boot interface is preferred over other physical interfaces. There is work being planned to expose selection of a specific interface for the default gateway via the API, but in the meantime, this can be overridden in the UI under the actions of a particular interface in the machine's network section.

Stefan Fleischmann (sfleischmann) wrote :

Sorry I don't follow. What does the default gateway have to do with the list of nameservers?

In our case the boot interface and the configured interface are the same, and they both are on 172.17.100.0/23. Still the nameservers from 172.17.102.0/24 end up in the netplan config.

Also under actions of the network interface I only find
 * mark as disconnected
 * add alias or VLAN
 * edit physical
 * remove physical

Christian Grabowski (cgrabowski) wrote :

Netplan will configure a list of nameservers per interface, as well as the defaults which end up rendered in /etc/resolv.conf. MAAS configures netplan's default nameservers as the nameservers configured for the default gateway's interface. When the default gateway is not explicitly set, after bond interfaces and bridge interfaces, the boot interface is the preferred default.
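
The preference order described above can be sketched roughly as follows. This is an illustration of the stated rule only, not the actual MAAS implementation; the `Interface` class, its `kind` labels, and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str
    kind: str  # "bond", "bridge" or "physical" (hypothetical labels)


def pick_default_interface(interfaces, boot_name, explicit=None):
    """Pick the interface whose nameservers become the netplan defaults."""
    if explicit is not None:
        return explicit  # an explicitly set default gateway wins

    def rank(iface):
        # Lower rank is preferred: bond, then bridge, then the boot interface.
        if iface.kind == "bond":
            return 0
        if iface.kind == "bridge":
            return 1
        if iface.name == boot_name:
            return 2
        return 3

    return min(interfaces, key=rank).name


nics = [Interface("eth1", "physical"), Interface("eth0", "physical")]
print(pick_default_interface(nics, boot_name="eth0"))
```

With only physical interfaces present, the boot interface is chosen; adding a bond or bridge to the list would take precedence over it.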

A POST to /MAAS/api/2.0/nodes/{system_id}/interfaces/{id}/?op=set_default_gateway will configure the provided interface's default link's subnet as the default gateway. The associated nameservers will then be the default nameservers.

Stefan Fleischmann (sfleischmann) wrote :

I tried this POST, but that didn't make a difference. Did you mean that it will work like that in the future? I still don't get what you said before, that it is possible to set this via the UI. I don't see any function like that in the UI.

In the subnet configuration for the 102 network we even have set
 "Allow DNS resolution: Disallowed"

Yet MAAS configures deployed nodes with these nameservers, how does that make sense?

From what you said it sounds like MAAS composes a list of default nameservers which includes all IP addresses that the MAAS server has on any managed subnet. That assumes that all these subnets are connected, and also ignores the config setting above.

Changed in maas:
milestone: none → 3.0.0-rc1
status: Triaged → Fix Committed
Christian Grabowski (cgrabowski) wrote :

Yes. By default there is a list, and the first interface from that list is used as the default gateway. Nameservers associated with that interface will be set in netplan for inclusion in /etc/resolv.conf. You can change which interface is the default gateway, and therefore which nameservers netplan assigns in /etc/resolv.conf, by explicitly setting it on the machine.

Stefan Fleischmann (sfleischmann) wrote :

I think now we're getting closer to the real problem here. I also just realized that I provided wrong information in my initial bug report. MAAS accesses both subnets over the _same_ interface, so I should have written

172.17.100.0/23 (eth0, default node network)
172.17.102.0/24 (eth0, management network for IPMI etc.)

So I suppose MAAS cannot keep alias addresses on the same interface apart? Normally I would have set up an alias interface, but Netplan doesn't support that afaik.

Changed in maas:
status: Fix Committed → Fix Released
Christian Grabowski (cgrabowski) wrote :

You can link multiple subnets to an interface in MAAS. The POST /MAAS/api/2.0/nodes/{system_id}/interfaces/{id}/?op=link_subnet can be called on the same interface multiple times. Then, to address the original issue, if you look at the API docs for POST /MAAS/api/2.0/nodes/{system_id}/interfaces/{id}/?op=set_default_gateway, a request for an interface with multiple subnets must also supply the link ID.
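
As a rough sketch, the set_default_gateway request could be assembled like this in Python. The host name, system ID, interface ID and link ID are placeholders, and the form field name `link_id` is an assumption that should be checked against the MAAS API docs. Authentication and the actual HTTP call are left out; the sketch only builds the URL and body.

```python
from urllib.parse import urlencode


def set_default_gateway_request(base_url, system_id, interface_id, link_id=None):
    """Build the URL and form body for the set_default_gateway POST."""
    url = (
        f"{base_url.rstrip('/')}/api/2.0/nodes/{system_id}"
        f"/interfaces/{interface_id}/?op=set_default_gateway"
    )
    # link_id (assumed field name) selects one link when the interface
    # has several subnets linked to it.
    body = urlencode({"link_id": link_id}) if link_id is not None else ""
    return url, body


# All values below are placeholders for illustration.
url, body = set_default_gateway_request(
    "http://maas.example:5240/MAAS", "abc123", 42, link_id=7
)
print(url)
print(body)
```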

Igor Gnip (igorgnip) wrote :

In the diff, I see that only a test was added/modified. What were the changes to prevent adding unreachable DNS server IPs? This happens in multiple situations:

- Multiple subnets: the server has an IP from subnet A, but DNS servers are added equally from subnets A, B and C if they are on the same VLAN.
- Additionally, region controller IPs are also added as DNS servers.
- The sorting order is fairly undefined, although there is some sorting.
- End result: it is a dice roll which 4 IPs get allocated and whether any of the first three of those 4 IPs is actually reachable (MAAS rack, correct VLAN, correct subnet).

Final cutoff: from a list of X nameservers, a maximum of 3 will end up in resolv.conf.
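
The effect of the cutoff can be illustrated with a small Python sketch. The ordering and the `.4`/`.5` addresses below are hypothetical, chosen to show the worst case; this is not how MAAS or the resolver is implemented.

```python
import ipaddress

MAX_NAMESERVERS = 3  # the cutoff mentioned above


def usable_after_cutoff(ordered, node_subnet, limit=MAX_NAMESERVERS):
    """Return the entries that survive the cutoff and lie in node_subnet."""
    net = ipaddress.ip_network(node_subnet)
    kept = ordered[:limit]  # everything past the limit is dropped
    return [ip for ip in kept if ipaddress.ip_address(ip) in net]


# Hypothetical unlucky ordering: the wrong subnet sorts first.
unlucky = ["172.17.102.3", "172.17.102.4", "172.17.102.5", "172.17.100.3"]
print(usable_after_cutoff(unlucky, "172.17.100.0/23"))
```

With this ordering, the only address on the node's subnet sits in fourth place and is dropped by the cutoff, so none of the remaining nameservers is on the node's subnet.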
