MAAS

DNS Servers not set as expected

Bug #1881133 reported by Nick Niehoff on 2020-05-28

This bug affects 2 people

Affects		Status	Importance	Assigned to	Milestone
	MAAS	Fix Released	High	Lee Trager	MAAS 2.8.3rc1

Bug Description

According to [1], "All machine communication with MAAS is proxied through rack controllers, including... DNS"; however, this does not appear to be the case. Some example scenarios:

In all scenarios, if DNS servers are specified as part of the subnet those servers override the MAAS provided servers.

Scenario 1:
Region controller in subnet A, 2 rack controllers in subnet B, machine deployed in subnet B. If no DNS servers are specified on the subnet, MAAS provides the 2 rack controllers and the region controller as DNS servers.

Scenario 2:
Region controller in subnet A, 2 rack controllers in subnet B, MAAS configured to relay DHCP for subnet C to rack controllers in subnet B, machine deployed in subnet C. MAAS provides the region controller as the DNS server.

I suspect that with multiple region controllers, MAAS would provide each of them as a DNS server in these scenarios. Based on the documentation, I would not expect the region controller to be included in the list of provided DNS servers. In fact, in some environments communication with the region controllers is restricted, so DNS on the region may not be reachable. The relayed-DHCP scenario is trickier, but based solely on the docs I would expect MAAS to configure the rack controller(s) in subnet B as the DNS servers.

[1] https://maas.io/docs/maas-communcation

Tags:

Related branches

~ltrager/maas:lp1881133_2.8

Merged into maas:2.8

Lee Trager (community): Approve on 2020-08-15

~ltrager/maas:lp1881133

Merged into maas:master

Adam Collard (community): Approve on 2020-08-14

MAAS Lander: Approve on 2020-08-13

~ltrager/maas:lp1881133

Superseded for merging into maas:master

MAAS Lander: Pending requested 2020-07-25

MAAS Maintainers: Pending requested 2020-07-25

Victor Tapia (vtapia) on 2020-05-28

tags:

added: sts


Victor Tapia (vtapia) wrote on 2020-06-03:

I ran a test environment that consists of two subnets, 192.168.200.0/24 and 192.168.122.0/24, with regiond at 192.168.122.209 and rackd at 192.168.200.1, installed on different machines.

I noticed that depending on the IP assignment method, static or dynamic, the DNS server list changes in the deployed machines:

- dhcpd.conf looks like this for both nodes:

ubuntu@maas-region:~$ sudo cat /var/lib/maas/dhcpd.conf | grep name-s
option domain-name-servers 192.168.122.209;

ubuntu@maas-rackd:/var/log/maas$ sudo grep name-server /var/lib/maas/dhcpd.conf
option domain-name-servers 192.168.200.1, 192.168.122.209;

Note that for some reason, the region IP appears as a DNS server in the rack DHCP config.

- If I deploy a machine with a static IP in the rack subnet, the preseed is built incorrectly, as it replaces the rack DNS entry with the region IP. From the curtin config:

            nameservers:
                addresses:
                - 192.168.122.209

The issue seems to come from get_maas_facing_server_host() (in src/maasserver/server_address.py): it returns the IP from rack_controller.url, which points to the region, instead of an IP belonging to the rack.

- The difference between the two lists seems to come from the fact that dhcp.py extends the server list using get_dns_server_addresses_for_rack(), which retrieves the rack IPs and adds them to the list.
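To make the divergence concrete, here is a minimal model of the two code paths as described above. It is a sketch under assumptions: `maas_facing_host`, `preseed_nameservers` and `dhcp_nameservers` are hypothetical stand-ins, not MAAS functions, and the rack URL is illustrative. The only behavior taken from the thread is that the preseed path uses the host of `rack_controller.url` (which names the region) while the DHCP path also adds the rack's own IPs.

```python
from __future__ import annotations

from urllib.parse import urlparse


def maas_facing_host(rack_url: str) -> str:
    """Host part of the URL a rack controller uses to reach the region.

    Models the behaviour described for get_maas_facing_server_host(): the
    answer comes from the rack's configured URL, so it names the region.
    """
    host = urlparse(rack_url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {rack_url!r}")
    return host


def preseed_nameservers(rack_url: str) -> list[str]:
    # Static/preseed path (hypothetical model): only the region-facing host.
    return [maas_facing_host(rack_url)]


def dhcp_nameservers(rack_url: str, rack_ips: list[str]) -> list[str]:
    # DHCP path (hypothetical model): the rack's own IPs, then the
    # region-facing host, matching the order seen in the rack's dhcpd.conf.
    servers = list(rack_ips)
    region = maas_facing_host(rack_url)
    if region not in servers:
        servers.append(region)
    return servers


rack_url = "http://192.168.122.209:5240/MAAS"  # illustrative rack -> region URL
print(preseed_nameservers(rack_url))                  # ['192.168.122.209']
print(dhcp_nameservers(rack_url, ["192.168.200.1"]))  # ['192.168.200.1', '192.168.122.209']
```

The two outputs reproduce the mismatch seen above: the static deployment only gets the region, while the DHCP lease lists the rack first.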


Victor Tapia (vtapia) wrote on 2020-06-04:

0001-Use-rack-and-not-region-IP-addresses-for-DNS-when-possible.patch (2.6 KiB, text/plain)

I prepared a patch that I believe solves the issue by using the region IP only if there are no rack IPs available for DNS. My test environment, as described above, consists of 1 region+rack controller and 1 remote rack (plus a bunch of custom servers in the separate rack subnet). After applying the patch, the results look like this:

1. DHCP

ubuntu@maas-rackd:~$ sudo cat /var/lib/maas/dhcpd.conf | grep name-ser
option domain-name-servers 192.168.200.1, 1.1.1.1, 8.8.8.8, 8.8.4.4;

ubuntu@maas-region:~$ sudo cat /var/lib/maas/dhcpd.conf | grep name-ser
option domain-name-servers 192.168.122.209;

2. Static/Preseed, deployed on the rack subnet

      nameservers:
        addresses:
        - 192.168.200.1
        - 1.1.1.1
        - 8.8.8.8
        - 8.8.4.4

Now the results are consistent between dynamic and static assignments, and DNS is relayed via the racks, as the documentation says.
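The selection rule behind those results can be sketched in a few lines. This is not the attached patch; `nameservers_for_machine` is a hypothetical helper, and it assumes that 1.1.1.1, 8.8.8.8 and 8.8.4.4 are the custom DNS servers configured on the rack subnet.

```python
from __future__ import annotations


def nameservers_for_machine(
    rack_ips: list[str], region_ips: list[str], subnet_dns: list[str]
) -> list[str]:
    """Rack IPs first; region IPs only when no rack IP is known.

    rack_ips:   rack controller IPs reachable from the machine's subnet
    region_ips: region controller IPs (fallback only)
    subnet_dns: DNS servers configured on the subnet (assumed to be the
                "custom servers" from the test environment)
    """
    maas_dns = rack_ips if rack_ips else region_ips
    result: list[str] = []
    for ip in [*maas_dns, *subnet_dns]:
        if ip not in result:  # preserve order, drop duplicates
            result.append(ip)
    return result


print(nameservers_for_machine(
    ["192.168.200.1"], ["192.168.122.209"], ["1.1.1.1", "8.8.8.8", "8.8.4.4"]
))  # ['192.168.200.1', '1.1.1.1', '8.8.8.8', '8.8.4.4']
print(nameservers_for_machine([], ["192.168.122.209"], []))  # ['192.168.122.209']
```

Because the same function would feed both dhcpd.conf and the curtin preseed, DHCP and static deployments end up with the same list.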

Adam Collard (adam-collard) on 2020-06-10

Changed in maas:
assignee:	nobody → Lee Trager (ltrager)


Lee Trager (ltrager) wrote on 2020-07-08:

I'm having trouble understanding this bug and the patch for it. Both scenario 1 and 2 seem correct.

Scenario 1:
If no DNS servers are specified to MAAS, but MAAS knows the machine will have access to a certain rack controller, then using that rack controller as the DNS server is correct. The region is listed as the upstream DNS server. This way the machine has DNS even though the subnet it is on doesn't have DNS configured. This is required, as MAAS uses DNS when deploying.

Scenario 2:
The region controller is being listed here because MAAS doesn't know what DNS server to use besides what is being used as the upstream DNS server.

You can specify what DNS servers are used for a subnet in the UI or API if you don't want the region included.

The issue with the patch is it forces the rack controller to always be used. This will disable HA because it only returns the IP for the current rack controller, not any rack controller running on the subnet. It will also break users specifying their own DNS servers.

Changed in maas:
status:	New → Incomplete


Nick Niehoff (nniehoff) wrote on 2020-07-08:

For security reasons, deployed machines may only be allowed to communicate with the rack controllers and not the region controllers. According to the documentation, ALL traffic is proxied through the rack controllers, which seems to account for that security requirement. However, in Scenario 2, when the rack controller lives in a different subnet from the deployed machine, the machine is configured with the region controller as its only DNS server. Based on the documentation, it should point at the rack controller(s) (at least first; the region as a fallback is fine). If this is not the intended behavior, I would suggest updating the documentation to reflect how MAAS intends to configure DNS in the various cases. MAAS should know which rack controller(s) are available from the fact that DHCP was relayed from their subnet; it should be a safe assumption that the deployed machine can at least reach DNS on the same rack controller it received its DHCP lease from. HA could be achieved by having multiple rack controllers. I'll let Victor speak to his patch.
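Nick's suggestion amounts to a small lookup: if the machine's subnet has no rack controller of its own, follow the DHCP relay to the subnet whose racks answered. The sketch below uses a hypothetical data model (the `Subnet` class, its `relay_to` field and the rack IPs are illustrative and differ from MAAS's real models); the CIDRs are the subnet B and C ranges Nick gives later in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Subnet:
    cidr: str
    # IPs that rack controllers have directly on this subnet.
    rack_ips: list[str] = field(default_factory=list)
    # Subnet whose rack controllers answer relayed DHCP for this one
    # (hypothetical field, not how MAAS stores relay configuration).
    relay_to: Subnet | None = None


def racks_serving(subnet: Subnet) -> list[str]:
    """Rack IPs that handed out the machine's lease, so likely reachable for DNS."""
    if subnet.rack_ips:
        return list(subnet.rack_ips)
    if subnet.relay_to is not None:
        return list(subnet.relay_to.rack_ips)
    return []


subnet_b = Subnet("10.12.15.0/24", rack_ips=["10.12.15.2", "10.12.15.3"])
subnet_c = Subnet("10.12.19.0/24", relay_to=subnet_b)
print(racks_serving(subnet_c))  # ['10.12.15.2', '10.12.15.3']
```

Returning every rack on the relay target, rather than one, is what gives the HA behavior Nick mentions.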


Lee Trager (ltrager) wrote on 2020-07-08:

What I suspect is happening is that MAAS doesn't know which networks subnet C has access to. In this case MAAS asks all the rack controllers it knows about to try to send the BMC commands that boot the machine. The machine boots and runs DHCP, a rack controller in subnet B responds, and the deployment runs, but because MAAS doesn't know which networks subnet C has access to, it just uses the region controller.

You can confirm this using the MAAS shell:

$ sudo maas-region shell
Python 3.6.9 (default, Apr 18 2020, 01:56:04)
Type "copyright", "credits" or "license" for more information.

IPython 5.5.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.

In [1]: from maasserver.models import Subnet

In [2]: Subnet.objects.get_best_subnet_for_ip('192.168.1.1')

In [3]: Subnet.objects.get_best_subnet_for_ip('10.0.0.100')
Out[3]: <Subnet: 10.0.0.0/24:10.0.0.0/24(vid=0)>

Are rack controllers defined for subnet C?

I suspect that if you configured the machine to use DHCP, the DNS from the rack would be used. Another way to fix this would be to define which DNS servers subnet C should use.
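For reference, the lookup in the shell session above behaves like a most-specific-match search. The sketch below assumes that `get_best_subnet_for_ip` picks the longest matching prefix among the known subnets and returns nothing when none matches (which is what the empty `In [2]` result suggests); `best_subnet_for_ip` is a hypothetical standalone function, not MAAS code.

```python
from __future__ import annotations

from ipaddress import ip_address, ip_network


def best_subnet_for_ip(ip: str, cidrs: list[str]) -> str | None:
    """Most specific CIDR in `cidrs` containing `ip`, or None if none does.

    Mismatched families (IPv4 address vs IPv6 network) simply don't match.
    """
    addr = ip_address(ip)
    matches = [net for net in map(ip_network, cidrs) if addr in net]
    if not matches:
        return None
    return str(max(matches, key=lambda net: net.prefixlen))


known = ["10.0.0.0/8", "10.0.0.0/24"]
print(best_subnet_for_ip("192.168.1.1", known))  # None
print(best_subnet_for_ip("10.0.0.100", known))   # 10.0.0.0/24
```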


Nick Niehoff (nniehoff) wrote on 2020-07-09:

In my test environment:

>>> Subnet.objects.get_best_subnet_for_ip('10.12.19.7')
<Subnet: Subnet-Test2:10.12.19.0/24(vid=19)>
>>> Subnet.objects.get_best_subnet_for_ip('10.12.15.7')
<Subnet: Subnet-OSMgmt:10.12.15.0/24(vid=15)>

Subnet-Test2:10.12.19.0/24 is what I have referred to in this case as subnet C
Subnet-OSMgmt:10.12.15.0/24 is what I have referred to in this case as subnet B

From the CLI, "maas admin subnets read" shows the same primary and secondary rack controllers for both subnets.

The workaround we found was to explicitly set the DNS servers on subnet C to point to the rack controllers, but I expected subnet C to use the rack controllers for DNS by default.

Lee Trager (ltrager) on 2020-07-10

Changed in maas:
status:	Incomplete → Triaged


Lee Trager (ltrager) wrote on 2020-07-10:

I think I've come up with a patch that should fix this. The code that determined the DNS servers did so by looking at which URLs each controller is configured to use. I've changed it to use any IP address a rack controller has on the same VLAN as the machine.

Please try the related branch and let me know if it fixes this.
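The change Lee describes (rack interface IPs on the machine's VLAN instead of controller URLs) can be sketched as follows. This is an illustration of the idea, not the branch itself: `Interface`, `RackController`, `dns_ips_for_vlan` and the VLAN IDs and addresses are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    ip: str
    vlan_id: int  # hypothetical VLAN identifier


@dataclass
class RackController:
    name: str
    interfaces: list[Interface]


def dns_ips_for_vlan(racks: list[RackController], machine_vlan_id: int) -> list[str]:
    """Every IP any rack controller has on the machine's VLAN.

    Controller URLs play no part here, and all racks on the VLAN are
    listed rather than only the one handling the current request.
    """
    ips: list[str] = []
    for rack in racks:
        for iface in rack.interfaces:
            if iface.vlan_id == machine_vlan_id and iface.ip not in ips:
                ips.append(iface.ip)
    return ips


racks = [
    RackController("rack1", [Interface("10.12.15.2", 15), Interface("192.168.122.10", 1)]),
    RackController("rack2", [Interface("10.12.15.3", 15)]),
]
print(dns_ips_for_vlan(racks, 15))  # ['10.12.15.2', '10.12.15.3']
```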

Changed in maas:
status:	Triaged → In Progress
importance:	Undecided → High


David Negreira (dnegreira) wrote on 2020-07-10:

I tested this, and I do see that the DNS is set to the IP of the correct rack controller, but the DNS servers configured on the subnet itself are not set.


Victor Tapia (vtapia) wrote on 2020-07-10:

During the initial tests, I noticed that DHCP and Static IPs received different DNS servers on deploy. My original patch tried to unify that configuration and remove the regiond subnet IPs when the deployed machine was set in an external subnet, using its rackd IPs as nameservers.

With that in mind, I noticed an issue with my original patch, so I went with a simpler one[1] that shows the following results in 2.7:

https://pastebin.ubuntu.com/p/ZKx4D3QQcQ/

I have also tested Lee's patch in 2.8, and it seems that only one controller IP is shown per subnet in HA scenarios:

https://pastebin.ubuntu.com/p/td5Tmf2mPY/

I believe that both DHCP and Static IP deployments should provide a consistent set of nameservers consisting of just the subnet IPs and the custom servers set by the user. There might be some value to include the regiond IP in the list as a fallback, but that would introduce a timeout when used in certain scenarios where the server is not reachable.

[1] Patch: https://pastebin.ubuntu.com/p/KgxpTsrRsm/


David Negreira (dnegreira) wrote on 2020-07-10:

#10

To clarify my comment #8 a bit:
I tested Lee's patch on the latest 2.8.

When I deploy a machine whose interface IP is configured with auto-assign, it is OK: the netplan configuration on the machine has the correct DNS, namely the rack controller of that specific subnet plus the custom DNS set on that same VLAN.

When I deploy a machine whose interface IP is configured with DHCP, it is not OK.

The configured DNS servers are the custom DNS I set on the subnet, the rack controller on that same subnet, and the IP of the other controller, which this machine should not have access to and which should not be configured.
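A quick way to spot the problem David describes is to flag nameservers that are neither on one of the machine's subnets nor explicitly configured as custom DNS. This is a hypothetical check, not part of MAAS, and it treats subnet membership as a rough proxy for reachability, which routed setups can violate. The addresses are made up to mirror the DHCP case.

```python
from __future__ import annotations

from ipaddress import ip_address, ip_network


def unexpected_nameservers(
    nameservers: list[str], machine_cidrs: list[str], custom_dns: list[str]
) -> list[str]:
    """Nameservers that are neither on one of the machine's subnets nor
    explicitly configured as custom DNS servers for the subnet.

    Subnet membership is only a proxy for reachability; routed setups may
    legitimately reach servers outside these subnets.
    """
    nets = [ip_network(cidr) for cidr in machine_cidrs]
    allowed = set(custom_dns)
    return [
        ns
        for ns in nameservers
        if ns not in allowed and not any(ip_address(ns) in net for net in nets)
    ]


# Hypothetical values shaped like the DHCP case described above.
print(unexpected_nameservers(
    ["8.8.8.8", "10.12.15.2", "192.168.122.209"],
    machine_cidrs=["10.12.15.0/24"],
    custom_dns=["8.8.8.8"],
))  # ['192.168.122.209']
```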


Lee Trager (ltrager) wrote on 2020-07-25:

#11

I've updated the related branch so the region IP will only be used in dhcpd.conf generation if no other DNS servers are found. I think that should solve this bug along with the changes I made to preseed. Could you please verify?
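The dhcpd.conf side of that rule can be sketched as a small renderer. The output format is copied from the dhcpd.conf excerpts earlier in this thread; `domain_name_servers_option` is a hypothetical helper, not the code in the branch.

```python
from __future__ import annotations


def domain_name_servers_option(dns_servers: list[str], region_ips: list[str]) -> str:
    """Render a dhcpd.conf `option domain-name-servers` line.

    Region IPs are used only when no other DNS server was found.
    """
    servers = list(dns_servers) or list(region_ips)
    if not servers:
        return ""  # nothing to advertise, so emit no option at all
    return "option domain-name-servers " + ", ".join(servers) + ";"


print(domain_name_servers_option(
    ["192.168.200.1", "1.1.1.1", "8.8.8.8", "8.8.4.4"], ["192.168.122.209"]
))  # option domain-name-servers 192.168.200.1, 1.1.1.1, 8.8.8.8, 8.8.4.4;
print(domain_name_servers_option([], ["192.168.122.209"]))
# option domain-name-servers 192.168.122.209;
```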


Victor Tapia (vtapia) wrote on 2020-07-27:

#12

Hi Lee,

Let's confirm with dnegreira or nniehoff, but your patches look good to me, thanks!
For completeness, here are the results I get with your latest patches and the test environment I used earlier:

https://pastebin.ubuntu.com/p/7vmNCjHDfD/