[2.2] DHCP ntp-server setting can be misconfigured with an IP of a different fabric/vlan

Bug #1695083 reported by Andres Rodriguez on 2017-06-01
This bug affects 2 people
Affects   Status   Importance   Assigned to     Milestone
MAAS               Critical     Mike Pontillo
2.2                Critical     Mike Pontillo

Bug Description

The MAAS rack discovered a new bridge interface (lxdbr0). After discovering this bridge, MAAS created a new Fabric/VLAN/Subnet (e.g. fabric-5/untagged/10.10.10.0/24).

That said, I was providing DHCP from fabric-3/untagged, which has two subnets: 10.90.90.0/24 and 192.168.100.0/24.

After restarting the rack controller, the "ntp-servers" option in the DHCP configuration was updated to point to the IP address of lxdbr0. You can see this in the generated configuration:

shared-network vlan-5002 {
    subnet 10.90.90.0 netmask 255.255.255.0 {

           ignore-client-uids true;
           next-server 10.90.90.1;
           option subnet-mask 255.255.255.0;
           option broadcast-address 10.90.90.255;
           option domain-name-servers 10.90.90.1;
           option domain-name "maas";
           option routers 10.90.90.254;
           option ntp-servers 10.10.10.1;

    }
    subnet 192.168.100.0 netmask 255.255.255.0 {

           ignore-client-uids true;
           next-server 192.168.100.5;
           option subnet-mask 255.255.255.0;
           option broadcast-address 192.168.100.255;
           option domain-name-servers 10.90.90.1;
           option domain-name "maas";
           option ntp-servers 10.10.10.1;
    }
}
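A quick way to spot this condition is to check whether each "ntp-servers" address falls inside one of the subnets declared in the same configuration. The following is a minimal sketch, not part of MAAS: the function name and regular expressions are made up for illustration, the embedded configuration is a trimmed copy of the one above, and the parsing only handles simple snippets like it. An address outside every subnet is not necessarily unreachable (it may be routable), so treat a hit as a hint rather than proof.

```python
import ipaddress
import re

# Trimmed copy of the generated configuration from this report.
DHCPD_CONF = """
shared-network vlan-5002 {
    subnet 10.90.90.0 netmask 255.255.255.0 {
           option routers 10.90.90.254;
           option ntp-servers 10.10.10.1;
    }
    subnet 192.168.100.0 netmask 255.255.255.0 {
           option ntp-servers 10.10.10.1;
    }
}
"""

# Hypothetical helper patterns; they only cover the simple layout above.
SUBNET_RE = re.compile(r"subnet\s+(\S+)\s+netmask\s+(\S+)\s*\{")
NTP_RE = re.compile(r"option\s+ntp-servers\s+([^;]+);")


def find_foreign_ntp_servers(conf):
    """Return NTP server IPs that fall outside every declared subnet."""
    subnets = [
        ipaddress.ip_network(f"{addr}/{mask}")
        for addr, mask in SUBNET_RE.findall(conf)
    ]
    servers = set()
    for value in NTP_RE.findall(conf):
        # The option may list several comma-separated addresses.
        for item in value.split(","):
            servers.add(ipaddress.ip_address(item.strip()))
    return sorted(
        str(ip) for ip in servers
        if not any(ip in net for net in subnets)
    )


print(find_foreign_ntp_servers(DHCPD_CONF))  # prints ['10.10.10.1']
```

Here 10.10.10.1 (the lxdbr0 address) is reported because it belongs to neither 10.90.90.0/24 nor 192.168.100.0/24.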

To revert the change, two things were tried:

Removing 'lxdbr0'

1. Removed 'lxdbr0'.
2. Restarted maas-rackd.
3. The ntp-servers option was updated accordingly.

Moving the newly created VLAN to a different L2 space

1. Added a new L2 space (test).
2. Moved fabric-5/untagged to the 'test' space.
3. Restarted maas-rackd.

This updated the dhcpd configuration to use the correct NTP server.

That said, this is what seems to be happening:

1. MAAS is using the IP of a different subnet (in a different fabric/vlan) for NTP because it is in the same space.
2. There is no trigger to update the config automatically when a space changes.
3. Removing an interface should trigger an update as well.


description: updated
Changed in maas:
milestone: none → 2.2.1
importance: Undecided → Critical
status: New → Triaged
milestone: 2.2.1 → 2.3.0
BenLake (me-benlake) wrote :

I think this happened to me as well, but I'm not as clear on the timing of the new fabric coming up and when the NTP server IPs on deployment changed. Also, the NTP server IP chosen was not that of the new fabric, so my issue may not be quite the same as the OP's. However, the IP chosen, whenever it was chosen, was non-functional. I have no way to specify which fabric/subnet/address should be used when cascading to deployed nodes.

In any case, I don't really care for most of the behind-the-scenes automagical discovery. Anything that is changing a config should be confirmed (even if it is a yet-to-be config). So having processes discover things is handy, but nothing should take effect until confirmed (my $0.02).

Andres Rodriguez (andreserl) wrote :

Yes, this is the same issue you had this morning. The bug reflects how we reproduced it on our side, but that doesn't necessarily mean it is the only way. Effectively, it can happen in any number of ways.

summary: - [2.2] NTP misconfigured after the Rack discovered a new 'lxdbr0'
- interface
+ [2.2] DHCP ntp-server setting can be misconfigured with an IP of a
+ different fabric/vlan
Mike Pontillo (mpontillo) wrote :

The fix for this is actually trickier than it looks, because in order to do what I consider to be the "best" fix, we would need to change the RPC schema. That is because NTP servers are specified on a per-rack-controller basis, not a per-shared-subnet basis.

So I think the best compromise for now is, when selecting the NTP server to provide for a rack controller, prefer subnets on VLANs with DHCP enabled. I've proposed a branch that does that[1].
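The preference described above can be sketched roughly as follows. This is an illustration only, not the code from the proposed branch: the `RackAddress` record and `pick_ntp_server` function are hypothetical, and the sample addresses are taken from the bug description (10.90.90.1 and 192.168.100.5 appear there as next-server values, 10.10.10.1 is the lxdbr0 address).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RackAddress:
    """Hypothetical record: one IP address configured on a rack controller."""
    ip: str
    vlan: str
    dhcp_on: bool  # whether DHCP is enabled on this address's VLAN


def pick_ntp_server(addresses: List[RackAddress]) -> Optional[str]:
    """Pick one address, preferring those on DHCP-enabled VLANs."""
    if not addresses:
        return None
    # False sorts before True, so "not dhcp_on" ranks DHCP-enabled VLANs first.
    # min() returns the first minimal element, so input order breaks ties.
    best = min(addresses, key=lambda a: not a.dhcp_on)
    return best.ip


addresses = [
    RackAddress("10.10.10.1", "fabric-5/untagged", dhcp_on=False),  # lxdbr0
    RackAddress("10.90.90.1", "fabric-3/untagged", dhcp_on=True),
    RackAddress("192.168.100.5", "fabric-3/untagged", dhcp_on=True),
]
print(pick_ntp_server(addresses))  # prints 10.90.90.1
```

Note that this still yields a single address per rack controller, so every subnet served by that rack receives the same NTP server; it only makes a poor choice such as the lxdbr0 address less likely.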

The other bug here is that if spaces are changed, the "best routable NTP server" calculation changes, but the database triggers don't notice the changed spaces and don't recalculate the DHCP configuration. I feel that with the "choose DHCP-enabled VLANs first" change, this edge case is less likely to occur and should be handled as a separate bug.

[1]: https://code.launchpad.net/~mpontillo/maas/ntp-issues--bug-1695083/+merge/325037

Changed in maas:
status: Triaged → Fix Committed
Mike Pontillo (mpontillo) wrote :

By the way, the other thing I determined was that NTP configuration in DHCP has nothing to do with spaces. It simply selects an IP address on the rack controller and uses that for DHCP.

Spaces are used by the internal NTP configuration in order to figure out the appropriate peer NTP servers, not by DHCP. So the trigger on spaces doesn't apply to this bug.

Mike Pontillo (mpontillo) wrote :

Finally, I've filed bug #1695937 to capture a related issue that can occur if the rack manages multiple VLANs and/or subnets, and those subnets are not mutually reachable.

Changed in maas:
assignee: nobody → Mike Pontillo (mpontillo)
Changed in maas:
milestone: 2.3.0 → 2.3.0alpha1
Changed in maas:
status: Fix Committed → Fix Released