DNS Servers not set as expected

Bug #1881133 reported by Nick Niehoff
This bug affects 2 people
Affects | Status       | Importance | Assigned to | Milestone
MAAS    | Fix Released | High       | Lee Trager  |

Bug Description

According to [1], "All machine communication with MAAS is proxied through rack controllers, including... DNS"; however, this does not appear to be the case. Some example scenarios:

In all scenarios, if DNS servers are specified as part of the subnet those servers override the MAAS provided servers.

Scenario 1:
Region controller in subnet A, 2 rack controllers in subnet B, machine deployed in subnet B. If no DNS servers are specified in the subnet, MAAS provides the 2 rack controllers and the region controller as DNS servers.

Scenario 2:
Region controller in subnet A, 2 rack controllers in subnet B, MAAS configured to relay DHCP for subnet C to rack controllers in subnet B, machine deployed in subnet C. MAAS provides the region controller as the DNS server.

I suspect that if I had multiple region controllers, MAAS would provide each of them as DNS servers in these scenarios. Based on the documentation, I would not expect the region controller to be included in the list of provided DNS servers. In fact, in some environments communication with the region controllers is restricted, so their DNS may not be reachable. Relayed DHCP makes this tricky, but based solely on the docs I would expect MAAS to configure the rack controller(s) in subnet B as the DNS servers.

[1] https://maas.io/docs/maas-communcation

Tags: sts


Victor Tapia (vtapia)
tags: added: sts
Victor Tapia (vtapia) wrote :

I ran a test environment that consists of two subnets, 192.168.200.0/24 and 192.168.122.0/24, with regiond at 192.168.122.209 and rackd at 192.168.200.1, installed on different machines.

I noticed that the DNS server list on the deployed machines changes depending on the IP assignment method, static or dynamic:

- dhcpd.conf looks like this for both nodes:

ubuntu@maas-region:~$ sudo cat /var/lib/maas/dhcpd.conf | grep name-s
           option domain-name-servers 192.168.122.209;

ubuntu@maas-rackd:/var/log/maas$ sudo grep name-server /var/lib/maas/dhcpd.conf
           option domain-name-servers 192.168.200.1, 192.168.122.209;

Note that for some reason, the region IP appears as a DNS server in the rack DHCP config.

- If I deploy a machine with a static IP in the rack subnet, the preseed is built incorrectly: it replaces the rack DNS entry with the region IP. From the curtin config:

            nameservers:
                addresses:
                - 192.168.122.209

The issue seems to come from get_maas_facing_server_host() (in src/maasserver/server_address.py): it returns the IP from rack_controller.url, which points to the region, instead of the rack's own IP.

- The difference in the lists seems to come from the fact that dhcp.py extends the server list using get_dns_server_addresses_for_rack(), which effectively retrieves the rack IPs and adds them to the list.
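A rough model of the pre-fix behaviour described in the bullets above may help. It is a sketch under stated assumptions, not MAAS code: `static_nameservers` and `dhcp_nameservers` are hypothetical stand-ins, and `http://192.168.122.209:5240/MAAS` is an assumed value for `rack_controller.url` in this environment. The static (preseed) path only sees the host in the rack controller's URL, while the DHCP path also adds the rack's own IPs, which is enough to reproduce the mismatch.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical model of the pre-fix behaviour; not MAAS source code.
# Assumption: rack_controller.url points at the region, e.g. this URL.
RACK_CONTROLLER_URL = "http://192.168.122.209:5240/MAAS"
RACK_IPS = ["192.168.200.1"]  # IPs the rack itself has on the machine's subnet


def url_host(url: str) -> str:
    """Return the host part of a controller URL (what the static path sees)."""
    return urlparse(url).hostname


def static_nameservers(url: str) -> list[str]:
    # Static/preseed path: only the host from the URL, i.e. the region IP.
    return [url_host(url)]


def dhcp_nameservers(url: str, rack_ips: list[str]) -> list[str]:
    # DHCP path: the rack IPs are included as well, so both controllers appear.
    # Order chosen to match the rack's dhcpd.conf line quoted above; the real
    # ordering logic in dhcp.py is not shown in this report.
    servers = list(rack_ips)
    host = url_host(url)
    if host not in servers:
        servers.append(host)
    return servers


if __name__ == "__main__":
    print("static:", static_nameservers(RACK_CONTROLLER_URL))
    print("dhcp:  ", dhcp_nameservers(RACK_CONTROLLER_URL, RACK_IPS))
```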

Victor Tapia (vtapia) wrote :

I prepared a patch that I believe solves the issue by using the region IP only if there are no rack IPs for DNS. My test environment, as defined above, consists of 1 region+rack and 1 remote rack (plus a bunch of custom servers in the separate rack subnet), and after applying the patch the results look like this:

1. DHCP

ubuntu@maas-rackd:~$ sudo cat /var/lib/maas/dhcpd.conf | grep name-ser
           option domain-name-servers 192.168.200.1, 1.1.1.1, 8.8.8.8, 8.8.4.4;

ubuntu@maas-region:~$ sudo cat /var/lib/maas/dhcpd.conf | grep name-ser
           option domain-name-servers 192.168.122.209;

2. Static/Preseed, deployed on the rack subnet

      nameservers:
        addresses:
        - 192.168.200.1
        - 1.1.1.1
        - 8.8.8.8
        - 8.8.4.4

Now the results are consistent between dynamic and static assignments, and DNS is relayed via the racks, as the documentation says.

Changed in maas:
assignee: nobody → Lee Trager (ltrager)
Lee Trager (ltrager) wrote :

I'm having trouble understanding this bug and the patch for it. Both scenarios 1 and 2 seem correct.

Scenario 1:
If no DNS servers are specified to MAAS, but MAAS knows the machine will have access to a certain rack controller, using that rack controller as the DNS server is correct. The region is listed as the upstream DNS server. This way the machine has DNS even though the subnet the machine is on doesn't have DNS configured. This is required because MAAS uses DNS when deploying.

Scenario 2:
The region controller is being listed here because MAAS doesn't know what DNS server to use besides what is being used as the upstream DNS server.

You can specify what DNS servers are used for a subnet in the UI or API if you don't want the region included.

The issue with the patch is that it forces the rack controller to always be used. This will disable HA because it only returns the IP for the current rack controller, not for every rack controller running on the subnet. It will also break users specifying their own DNS servers.
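To make the HA concern concrete, here is a small sketch with made-up data. `RACKS_ON_SUBNET` is a hypothetical mapping from rack controller name to its IPs on the machine's subnet (192.168.200.2 is invented for illustration), and neither function is MAAS code. Returning only the current rack's IP leaves machines with a single nameserver, while collecting IPs from every rack serving the subnet keeps the secondary rack available as a fallback.

```python
from __future__ import annotations

# Hypothetical data: a primary and a secondary rack controller on one subnet.
RACKS_ON_SUBNET = {
    "rack-primary": ["192.168.200.1"],
    "rack-secondary": ["192.168.200.2"],
}


def current_rack_only(current: str, racks: dict[str, list[str]]) -> list[str]:
    # The behaviour the concern describes: only the rack generating the config.
    return list(racks.get(current, []))


def all_racks_on_subnet(racks: dict[str, list[str]]) -> list[str]:
    # HA-friendly alternative: every rack serving the subnet, deduplicated,
    # in a stable order (sorted by rack name).
    servers: list[str] = []
    for name in sorted(racks):
        for ip in racks[name]:
            if ip not in servers:
                servers.append(ip)
    return servers


if __name__ == "__main__":
    print(current_rack_only("rack-primary", RACKS_ON_SUBNET))
    print(all_racks_on_subnet(RACKS_ON_SUBNET))
```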

Changed in maas:
status: New → Incomplete
Nick Niehoff (nniehoff) wrote :

For security reasons, deployed machines may only be allowed to communicate with the rack controllers and not the region controllers. According to the documentation, ALL traffic is proxied through the rack controllers, which seems to account for the security requirement. However, in Scenario 2, when the rack controller lives in a different subnet from the deployed machine, the machine gets configured with the region controller as its only DNS server. Based on the documentation, it should point at the rack controller(s) (at least first; the region as a fallback is fine). If this is not the case, I would suggest updating the documentation to reflect how MAAS intends to configure DNS in a variety of cases. MAAS should know which rack controller(s) are available, since DHCP was relayed from their subnet; it should be a safe assumption that the deployed machine can at least reach DNS on the same rack controller it received a DHCP lease from. HA could be achieved by having multiple rack controllers. I'll let Victor speak to his patch.

Lee Trager (ltrager) wrote :

What I suspect is happening is that MAAS doesn't know what networks subnet C has access to. In this case MAAS asks all rack controllers it knows about to try to send the BMC commands to boot the machine. The machine boots, runs DHCP, a rack controller in subnet B responds, and the deployment runs, but because MAAS doesn't know what networks subnet C has access to, it just uses the region controller.

You can confirm this using the MAAS shell:

$ sudo maas-region shell
Python 3.6.9 (default, Apr 18 2020, 01:56:04)
Type "copyright", "credits" or "license" for more information.

IPython 5.5.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.

In [1]: from maasserver.models import Subnet

In [2]: Subnet.objects.get_best_subnet_for_ip('192.168.1.1')

In [3]: Subnet.objects.get_best_subnet_for_ip('10.0.0.100')
Out[3]: <Subnet: 10.0.0.0/24:10.0.0.0/24(vid=0)>

Are rack controllers defined for subnet C?

I suspect that if you configured the machine to use DHCP, the DNS from the rack would be used. Another way to fix this would be to define which DNS servers subnet C should use.
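For readers without a MAAS shell at hand, here is a rough stdlib analogue of the lookup in the session above. It assumes that "best subnet" means the most specific (longest-prefix) subnet containing the address, and it returns None when nothing matches, mirroring the empty result for 192.168.1.1. This is an assumption about the semantics, not MAAS's implementation; `best_subnet_for_ip` is a hypothetical name, and the 10.0.0.0/16 entry is invented so that the longest-prefix choice is visible.

```python
import ipaddress

# Hypothetical subnet table; only 10.0.0.0/24 comes from the session above.
SUBNETS = [ipaddress.ip_network(cidr) for cidr in ("10.0.0.0/16", "10.0.0.0/24")]


def best_subnet_for_ip(ip, subnets=SUBNETS):
    """Return the longest-prefix subnet containing ip, or None if none does."""
    addr = ipaddress.ip_address(ip)
    # Membership is False across IPv4/IPv6, so mixed tables are safe.
    matches = [net for net in subnets if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda net: net.prefixlen)


if __name__ == "__main__":
    for ip in ("192.168.1.1", "10.0.0.100", "10.0.5.1"):
        print(ip, "->", best_subnet_for_ip(ip))
```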

Nick Niehoff (nniehoff) wrote :

In my test environment:

>>> Subnet.objects.get_best_subnet_for_ip('10.12.19.7')
<Subnet: Subnet-Test2:10.12.19.0/24(vid=19)>
>>> Subnet.objects.get_best_subnet_for_ip('10.12.15.7')
<Subnet: Subnet-OSMgmt:10.12.15.0/24(vid=15)>

Subnet-Test2:10.12.19.0/24 is what I have referred to in this case as subnet C
Subnet-OSMgmt:10.12.15.0/24 is what I have referred to in this case as subnet B

From the CLI, "maas admin subnets read" shows the same primary and secondary rack controllers for both subnets.

The workaround we found was to explicitly set the DNS servers on subnet C to point to the rack controllers. But I expected it to point to the rack controllers by default.

Lee Trager (ltrager)
Changed in maas:
status: Incomplete → Triaged
Lee Trager (ltrager) wrote :

I think I've come up with a patch that should fix this. The code that determined the DNS servers did so by looking at which URLs each controller is configured to use. I've changed this to use any IP address a rack controller has on the same VLAN as the machine.

Please try the related branch and let me know if it fixes this.
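A minimal sketch of the selection rule described in this comment, assuming each rack controller's addresses are known as (IP, VLAN id) pairs. `RackInterface`, `dns_servers_for_vlan` and the topology below (including the VLAN ids) are hypothetical illustrations, not the code in the related branch. Every rack controller with an address on the machine's VLAN contributes that address, so two racks on the same VLAN both end up in the list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RackInterface:
    rack: str     # rack controller hostname (hypothetical)
    ip: str       # address configured on the interface
    vlan_id: int  # VLAN the interface is attached to


def dns_servers_for_vlan(interfaces: list[RackInterface], machine_vlan: int) -> list[str]:
    """Collect every rack IP on the machine's VLAN, keeping first-seen order."""
    servers: list[str] = []
    for iface in interfaces:
        if iface.vlan_id == machine_vlan and iface.ip not in servers:
            servers.append(iface.ip)
    return servers


# Hypothetical topology: two racks on VLAN 200, one of them also on VLAN 122.
INTERFACES = [
    RackInterface("rack-a", "192.168.200.1", 200),
    RackInterface("rack-a", "192.168.122.10", 122),
    RackInterface("rack-b", "192.168.200.2", 200),
]

if __name__ == "__main__":
    print(dns_servers_for_vlan(INTERFACES, 200))
```

An empty result, for example for a VLAN no rack is attached to, is the case where some other source of DNS servers would be needed.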

Changed in maas:
status: Triaged → In Progress
importance: Undecided → High
David Negreira (dnegreira) wrote :

I tested this, and I do see that the DNS is set to the IP of the correct rack controller, but the DNS servers configured on the subnet itself are not set.

Victor Tapia (vtapia) wrote :

During the initial tests, I noticed that DHCP and static IPs received different DNS servers on deploy. My original patch tried to unify that configuration and to remove the regiond subnet IPs when the deployed machine was on an external subnet, using its rackd IPs as nameservers instead.

With that in mind, I found an issue with my original patch, so I went with a simpler one [1] that shows the following results in 2.7:

https://pastebin.ubuntu.com/p/ZKx4D3QQcQ/

I have also tested Lee's patch in 2.8, and it seems that only one controller IP is shown per subnet in HA scenarios:

https://pastebin.ubuntu.com/p/td5Tmf2mPY/

I believe that both DHCP and static IP deployments should provide a consistent set of nameservers consisting of just the subnet IPs and the custom servers set by the user. There might be some value in including the regiond IP in the list as a fallback, but that would introduce a timeout in scenarios where the server is not reachable.

[1] Patch: https://pastebin.ubuntu.com/p/KgxpTsrRsm/
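One way to get the consistency argued for here is to compute the nameserver list once and render both the DHCP and the static outputs from it. The sketch below assumes the subnet's rack IPs and the user's custom servers are already known; `nameservers`, `render_dhcpd_option` and `render_curtin_nameservers` are hypothetical helpers, not MAAS functions. The region IP is only appended when `include_region_fallback=True`, which defaults to off because of the timeout concern raised in this comment.

```python
def nameservers(rack_ips, custom, region_ip=None, include_region_fallback=False):
    """Rack IPs on the subnet first, then user-set servers, without duplicates."""
    servers = []
    for ip in [*rack_ips, *custom]:
        if ip not in servers:
            servers.append(ip)
    # Optional, off by default: an unreachable region would add a timeout.
    if include_region_fallback and region_ip and region_ip not in servers:
        servers.append(region_ip)
    return servers


def render_dhcpd_option(servers):
    # Same shape as the dhcpd.conf lines quoted in this report.
    return "option domain-name-servers " + ", ".join(servers) + ";"


def render_curtin_nameservers(servers):
    # Same shape as the curtin "nameservers: addresses:" fragment.
    return {"nameservers": {"addresses": list(servers)}}


if __name__ == "__main__":
    servers = nameservers(["192.168.200.1"], ["1.1.1.1", "8.8.8.8", "8.8.4.4"])
    print(render_dhcpd_option(servers))
    # option domain-name-servers 192.168.200.1, 1.1.1.1, 8.8.8.8, 8.8.4.4;
    print(render_curtin_nameservers(servers))
```

Because both renderers consume the same list, a DHCP-configured machine and a statically configured one cannot drift apart.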

David Negreira (dnegreira) wrote :

To clarify my comment #8 a bit:
I tested Lee's patch on the latest 2.8.

When I deploy a machine whose interface IP mode is set to auto-assign, it is OK: the netplan configuration on the machine has the correct DNS, namely the rack controller of that subnet plus the custom DNS set on the same VLAN.

When I deploy a machine whose interface IP mode is set to DHCP, it is not OK.

The configured DNS servers are the custom DNS I set on the subnet, the rack controller on that same subnet, and also the IP of the other controller, which this machine should not have access to and which should not be configured.
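A quick way to check a deployed machine's resolver list against the expectation in this comment (the rack IP on the machine's subnet plus the subnet's custom DNS) is to flag any address that is neither. The helper name and the sample addresses are hypothetical; only the rule comes from the comment. It treats any address inside the machine's subnet as acceptable, which is a simplifying assumption.

```python
import ipaddress


def unexpected_nameservers(configured, machine_subnet, custom):
    """Return configured nameservers that are neither custom nor on machine_subnet."""
    net = ipaddress.ip_network(machine_subnet)
    allowed_custom = set(custom)
    return [
        ip
        for ip in configured
        if ip not in allowed_custom and ipaddress.ip_address(ip) not in net
    ]


if __name__ == "__main__":
    # Hypothetical resolver list resembling the DHCP case described above.
    configured = ["1.1.1.1", "192.168.200.1", "192.168.122.209"]
    print(unexpected_nameservers(configured, "192.168.200.0/24", ["1.1.1.1"]))
```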

Lee Trager (ltrager) wrote :

I've updated the related branch so the region IP will only be used in dhcpd.conf generation if no other DNS servers are found. Together with the changes I made to the preseed, I think that should solve this bug. Could you please verify?
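A minimal sketch of the rule as this comment describes it: the region IP goes into dhcpd.conf only when no other DNS servers (rack IPs on the machine's VLAN or servers configured on the subnet) are found. `dhcpd_dns_servers` and its arguments are hypothetical and are not taken from the branch.

```python
def dhcpd_dns_servers(rack_ips, subnet_dns, region_ips):
    """Pick dhcpd.conf DNS servers; use the region only if nothing else exists."""
    servers = []
    for ip in [*rack_ips, *subnet_dns]:
        if ip not in servers:
            servers.append(ip)
    # Region controllers are a last resort, e.g. when no rack serves the VLAN
    # and the user configured no DNS servers on the subnet.
    return servers if servers else list(region_ips)


if __name__ == "__main__":
    region = ["192.168.122.209"]
    print(dhcpd_dns_servers(["192.168.200.1"], ["1.1.1.1"], region))
    print(dhcpd_dns_servers([], [], region))
```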

Victor Tapia (vtapia) wrote :

Hi Lee,

Let's confirm with dnegreira or nniehoff, but your patches look good to me, thanks!
For completeness, here are the results I get with your latest patches and the test environment I used earlier:

https://pastebin.ubuntu.com/p/7vmNCjHDfD/

David Negreira (dnegreira) wrote :

I can confirm that they look good to me as well, thanks for the patches!

Changed in maas:
milestone: none → next
status: In Progress → Fix Committed
Changed in maas:
milestone: next → 2.8.x
Lee Trager (ltrager)
Changed in maas:
milestone: 2.8.x → 2.8.3
Changed in maas:
status: Fix Committed → Fix Released