[2.x] DNS records are created with wrong/unpredictable subnet IP

Bug #1823183 reported by Florian Guitton
This bug affects 2 people
Affects: MAAS
Status: Fix Released
Importance: Medium
Assigned to: Björn Tillenius

Bug Description

Hello everybody,
I hope your day is bright!

When setting up new nodes with a 'complex' set of interfaces, there does not seem to be a way to influence which interface gets picked first for DNS record creation.

Say for example that a MAAS setup for domain metal.example.com has a node 'abc' with the following interfaces:
bond0 (enp2s0f0/enp2s0f2) -> VLAN untagged -> Unconfigured
bond0.20 -> VLAN 20 -> Auto assign (172.20.0.0/16)
bond0.30 -> VLAN 30 -> Auto assign (172.30.0.0/16)
bond0.40.br -> Bridge on VLAN 40 -> Auto assign (172.40.0.0/16)
enp4s0f0 -> PXE on VLAN 80 -> Auto assign (10.80.0.0/16)

We then use Juju to commission the machine for deployment.
The Juju controller is on VLAN 30 (172.30.0.0/16) and is configured to resolve DNS via MAAS.
We use the following command:
> juju add-machine abc.metal.example.com

When the machine gets deployed, a set of DNS records gets created for this host:
abc.metal.example.com -> A -> 172.30.x.x
bond0.20.abc.metal.example.com -> A -> 172.20.x.x
bond0.30.abc.metal.example.com -> A -> 172.30.x.x
bond0.40.br.abc.metal.example.com -> A -> 172.40.x.x
enp4s0f0.abc.metal.example.com -> A -> 10.80.x.x

In this case the Juju model connects to abc.metal.example.com, and installation and deployment succeed over VLAN 30.

But we then try to add a second machine 'xyz' to the Juju model:
> juju add-machine xyz.metal.example.com

This time MAAS deploys the node (configured similarly to the first one) but creates the following DNS records instead:
xyz.metal.example.com -> A -> 10.80.x.x
bond0.20.xyz.metal.example.com -> A -> 172.20.x.x
bond0.30.xyz.metal.example.com -> A -> 172.30.x.x
bond0.40.br.xyz.metal.example.com -> A -> 172.40.x.x
enp4s0f0.xyz.metal.example.com -> A -> 10.80.x.x

Now Juju is trying to reach xyz.metal.example.com and the machine status remains pending as communication is not possible on the resolved IP.

In effect MAAS has created a "node record" (one not prefixed by an interface name) that landed on a different subnet. This makes it hard to predict which IP will come back.

There should be a way to tell MAAS to prefer a particular subnet when creating DNS records for a deploying node. Is there such a feature in place that I have missed? Can the API be used to change this behaviour in some way today?
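
The requested behaviour can be sketched in a few lines of plain Python. This is only an illustration of the selection rule, not a MAAS API: the function name `pick_node_address`, its parameters, and the host addresses are all hypothetical and made up to match the subnets listed above.

```python
import ipaddress


def pick_node_address(interface_ips, preferred):
    """Return the first interface IP inside the preferred subnet, else None.

    interface_ips: dict mapping interface name -> IP address string.
    preferred: CIDR string such as "172.30.0.0/16".
    Hypothetical helper for illustration; MAAS does not expose this.
    """
    network = ipaddress.ip_network(preferred)
    # Iterate in sorted order so the result does not depend on the
    # order in which the interfaces were configured.
    for name in sorted(interface_ips):
        if ipaddress.ip_address(interface_ips[name]) in network:
            return interface_ips[name]
    return None


# Made-up addresses for node 'xyz', one per subnet from the description.
xyz = {
    "bond0.20": "172.20.0.12",
    "bond0.30": "172.30.0.12",
    "bond0.40.br": "172.40.0.12",
    "enp4s0f0": "10.80.0.12",
}
print(pick_node_address(xyz, "172.30.0.0/16"))  # 172.30.0.12
```

With a rule like this, xyz.metal.example.com would always land on the Juju controller's subnet regardless of which interface was configured first.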

As always, thank you very much for your time !

Best wishes,

Andres Rodriguez (andreserl) wrote :

Hi Florian,

In your scenario, which interface is the PXE interface on each machine?

Changed in maas:
status: New → Incomplete
Florian Guitton (f-guitton) wrote :

Hello Andres !

In this particular setup the PXE interface is enp4s0f0 (VLAN 80/10.80.x.x).

description: updated
Rony Zeidan (ronynov) wrote :

Hi,
I am experiencing the same behavior as Florian; the only difference is that I am using static IPs to configure the interfaces before deploying them from Juju. One thing I noticed is that whichever VLAN or interface I configure first becomes the first record created in the DNS domain; ultimately the other records get created as well (bondx.x.host.domain).
But when I ping from host1 to host2, it is always the first record created that is returned by the DNS.
For example, when I create my static assignments:
On host1, if I create them in the following order:

bond0.400 > 10.150.0.1
bond0.401 > 10.150.1.1
bond0.402 > 10.150.2.1

the record host1.maas > A > 10.150.0.1 is created
On host2
bond0.402 > 10.150.2.2
bond0.400 > 10.150.0.2
bond0.401 > 10.150.1.2

the record host2.maas > A > 10.150.2.2 is created

So:
pinging host2 from host1 resolves to 10.150.2.2
pinging host1 from host2 resolves to 10.150.1.1

Once the machines are deployed, the records bondx.x.host1.maas are added. This is causing some issues for our deployment.
Best Regards

Changed in maas:
milestone: none → next
Björn Tillenius (bjornt) wrote :

I fixed a bug related to this, which was exactly what you described here: xyz.metal.example.com got a nondeterministic IP. In fact, it would cycle through the list of possible IPs and point to one of them at a time.

I've changed it so that the xyz.metal.example.com record now points to all the possible IPs. This means that Juju should be smart enough to try connecting to each of them, so it should succeed.
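
As a rough illustration of what "try all of them" means on the client side, here is a minimal Python sketch. It is not Juju's actual code; the function name `connect_any` and the timeout value are assumptions made for the example.

```python
import socket


def connect_any(host, port, timeout=3.0):
    """Try every address the name resolves to; return the first socket that connects.

    Illustrative helper only, not part of Juju or MAAS.
    """
    last_error = None
    # getaddrinfo returns one entry per address behind the name.
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # Unreachable subnet, refused connection, timeout: try the next address.
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError(f"no addresses found for {host}")


# Example (not executed here): connect_any("xyz.metal.example.com", 22)
```

Python's standard `socket.create_connection` performs a similar loop over the `getaddrinfo` results, so a client built on it gets this behaviour without extra code. A client that only uses the first address returned would still hit the original problem.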

Will that be enough for your use case?

Changed in maas:
status: Incomplete → Fix Committed
importance: Undecided → Medium
assignee: nobody → Björn Tillenius (bjornt)
milestone: next → 2.7.0alpha1
Florian Guitton (f-guitton) wrote :

Hello Björn !

This sounds like a wonderful thing! Thank you very much for your time!
I see this is expected with 2.7.0; I shall give it a try once it is released.
As soon as that is done I will be able to confirm back here that it works for us.

I would be interested to understand how that could lead to some changes in how MAAS commissions machines too.
Along the same lines as the Juju problem, right now upon deploying, MAAS boots the node on the PXE interface and the node then tries to connect to the controller over the PXE network, as we can see in the logs:

Finalizing /tmp/tmp9g2x_drm/target
finish: cmd-install/stage-hook/builtin/cmd-hook: SUCCESS: curtin command hook
--2019-09-25 09:44:22-- http://10-80-0-0--16.maas-internal:5248/MAAS/metadata/latest/by-id/e3dr7s/
Resolving 10-80-0-0--16.maas-internal (10-80-0-0--16.maas-internal)... 10.80.0.54, 10.80.0.55
Connecting to 10-80-0-0--16.maas-internal (10-80-0-0--16.maas-internal)|10.80.0.54|:5248... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2 [text/plain]
Saving to: ‘/dev/null’
0K 100% 224K=0s
2019-09-25 09:44:23 (224 KB/s) - ‘/dev/null’ saved [2/2]
curtin: Installation finished.

This is no good because once again it requires the machine to have an interface on the PXE network at the second installation phase, and by proxy at all times. From a security standpoint, that makes every machine ever commissioned reachable from every other one via the PXE network, unless some network magic gets done afterwards. Could MAAS not use the new DNS record for the controllers when resolving during deployment? This would allow welcome finer control of the network configuration.
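
For anyone reading similar logs: the host name in the log above looks like it is derived from the subnet CIDR, with 10.80.0.0/16 becoming 10-80-0-0--16.maas-internal. The Python sketch below reproduces that mapping so the subnet used during deployment can be identified. It is inferred from this single log line only; the function name `internal_name_for_subnet` is hypothetical, and the rule is only checked here for IPv4.

```python
import ipaddress


def internal_name_for_subnet(cidr, domain="maas-internal"):
    """Build the internal name that appears to correspond to an IPv4 subnet.

    Assumption inferred from one log line: dots become "-" and the
    "/" before the prefix length becomes "--".
    """
    network = ipaddress.ip_network(cidr)
    label = str(network).replace(".", "-").replace("/", "--")
    return f"{label}.{domain}"


print(internal_name_for_subnet("10.80.0.0/16"))  # 10-80-0-0--16.maas-internal
```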

Thank you again for looking at this !
Best wishes,

Changed in maas:
status: Fix Committed → Fix Released