No next-server option in dhcpd.conf == problems in VLANs with multiple subnets
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
MAAS | Expired | Medium | Unassigned | | |
Bug Description
Let's say we have the following VLAN interface on a rack controller:
$ ip -4 addr show dev eth1.621
5: eth1.621@eth1: <BROADCAST,
inet 172.16.0.1/24 brd 172.16.0.255 scope global eth1.621
valid_lft forever preferred_lft forever
inet 192.168.0.1/24 brd 192.168.0.255 scope global eth1.621
valid_lft forever preferred_lft forever
Subnet 192.168.0.0/24 has a reserved dynamic range [192.168.0.2 – 192.168.0.254]; subnet 172.16.0.0/24 does not:
$ cat /var/lib/
... snip ...
shared-network vlan-5011 {
subnet 172.16.0.0 netmask 255.255.255.0 {
option subnet-mask 255.255.255.0;
option broadcast-address 172.16.0.255;
option domain-name-servers 192.168.74.100;
option domain-name "maas";
option routers 172.16.0.1;
option ntp-servers 10.0.3.1;
#
# Subnet DHCP snippets
#
# No DHCP snippets defined for subnet
}
subnet 192.168.0.0 netmask 255.255.255.0 {
option subnet-mask 255.255.255.0;
option broadcast-address 192.168.0.255;
option domain-name-servers 192.168.74.100;
option domain-name "maas";
option routers 192.168.0.1;
option ntp-servers 10.0.3.1;
#
# Subnet DHCP snippets
#
# No DHCP snippets defined for subnet
pool {
range 192.168.0.2 192.168.0.254;
}
}
}
... snip ...
Now the funny part:
1. A server boots from the network for enlisting and requests an IP address
2. 172.16.0.1 responds, offering 192.168.0.2 as the IP address and 172.16.0.1 as the next-server
3. The server accepts and requests pxelinux.0 from 172.16.0.1
4. *192.168.0.1* responds from source port X
5. The server tries to reply to *172.16.0.1:X* and gets ICMP Destination unreachable (Port unreachable)
(See attached PCAP dump for more details)
I see several problems here:
1. The DHCP configuration is built without next-server by default, which makes the DHCP service send a suboptimal next-server option.
2. The TFTP service does not reply from the IP address on which it received the request.
3. The server is stubbornly fixed on next-server and does not take into account where responses actually come from.
I ended up creating a DHCP snippet on the 192.168.0.0/24 subnet that defines next-server as 192.168.0.1, but I was really surprised that I could not find anyone else with this issue in the bug list, which makes me feel I am doing something wrong (or unusual).
dpkg -l '*maas*'|cat
Desired=
| Status=
|/ Err?=(none)
||/ Name Version Architecture Description
+++-===
ii maas 2.1.2+bzr5555-
ii maas-cli 2.1.2+bzr5555-
un maas-cluster-
ii maas-common 2.1.2+bzr5555-
ii maas-dhcp 2.1.2+bzr5555-
ii maas-dns 2.1.2+bzr5555-
ii maas-proxy 2.1.2+bzr5555-
ii maas-rack-
ii maas-region-api 2.1.2+bzr5555-
ii maas-region-
un maas-region-
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-
ii python3-django-maas 2.1.2+bzr5555-
ii python3-maas-client 2.1.2+bzr5555-
ii python3-
Changed in maas:
milestone: 2.3.0 → 2.3.x
I think this goes into the general bucket of "guessing which IP or interface to offer is hard". We're currently fixing some similar bugs, related to internal bridges on rack or region controllers (lxdbr0, virbr0 etc) but your use case is even more interesting. How would you pseudo-code an appropriate algorithm to select addresses, in your case?
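One possible answer to the question above, as a rough Python sketch rather than a proposal for the actual MAAS code: pick as next-server the rack controller address that lies in the same subnet as the address being offered, and return nothing when there is no such address so the caller can fall back to its current behaviour. The function name `pick_next_server` and its input format are assumptions made for this illustration.

```python
import ipaddress
from typing import Iterable, Optional


def pick_next_server(offered_ip: str, rack_addresses: Iterable[str]) -> Optional[str]:
    """Return the rack controller address in the same subnet as offered_ip.

    rack_addresses holds the addresses configured on the interface in
    CIDR form, e.g. "192.168.0.1/24" (hypothetical input format).
    Returns None when no configured address shares a subnet with the offer.
    """
    offered = ipaddress.ip_address(offered_ip)
    for cidr in rack_addresses:
        iface = ipaddress.ip_interface(cidr)
        if offered in iface.network:
            return str(iface.ip)
    return None


# Addresses from the eth1.621 interface in the bug description.
addresses = ["172.16.0.1/24", "192.168.0.1/24"]
print(pick_next_server("192.168.0.2", addresses))
```

For the lease in the description (192.168.0.2) this selects 192.168.0.1, which matches the next-server value set by hand in the DHCP snippet workaround.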