No next-server option in dhcpd.conf == problems in VLANs with multiple subnets

Bug #1651680 reported by Marius Žalinauskas
This bug affects 1 person
Affects: MAAS
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

Let's say we have the following VLAN interface on a rack controller:

$ ip -4 addr show dev eth1.621
5: eth1.621@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 172.16.0.1/24 brd 172.16.0.255 scope global eth1.621
       valid_lft forever preferred_lft forever
    inet 192.168.0.1/24 brd 192.168.0.255 scope global eth1.621
       valid_lft forever preferred_lft forever

Subnet 192.168.0.0/24 has a reserved dynamic range [192.168.0.2 – 192.168.0.254]; subnet 172.16.0.0/24 does not:

$ cat /var/lib/maas/dhcpd.conf
... snip ...
shared-network vlan-5011 {
    subnet 172.16.0.0 netmask 255.255.255.0 {
           ignore-client-uids true;
           option subnet-mask 255.255.255.0;
           option broadcast-address 172.16.0.255;
           option domain-name-servers 192.168.74.100;
           option domain-name "maas";
           option routers 172.16.0.1;
           option ntp-servers 10.0.3.1;

           default-lease-time 600;
           max-lease-time 600;
           #
           # Subnet DHCP snippets
           #
           # No DHCP snippets defined for subnet
    }
    subnet 192.168.0.0 netmask 255.255.255.0 {
           ignore-client-uids true;
           option subnet-mask 255.255.255.0;
           option broadcast-address 192.168.0.255;
           option domain-name-servers 192.168.74.100;
           option domain-name "maas";
           option routers 192.168.0.1;
           option ntp-servers 10.0.3.1;

           default-lease-time 600;
           max-lease-time 600;
           #
           # Subnet DHCP snippets
           #
           # No DHCP snippets defined for subnet
           pool {
              range 192.168.0.2 192.168.0.254;
           }
    }
}
... snip ...

Now the funny part:

1. A server boots from the network for enlistment and asks for an IP address.
2. 172.16.0.1 responds, offering 192.168.0.2 as the IP address and 172.16.0.1 as the next-server.
3. The server accepts and requests pxelinux.0 from 172.16.0.1.
4. *192.168.0.1* responds from source port X.
5. The server tries to answer to *172.16.0.1:X* and gets ICMP Destination unreachable (Port unreachable).

(See attached PCAP dump for more details)
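
The mismatch in steps 4 and 5 can be modelled in a few lines of Python. This is only an illustrative sketch, not MAAS or TFTP code: the `Endpoint` type, both helper functions, and the port number 40000 (standing in for "port X") are hypothetical, and it assumes the TFTP transfer socket is reachable only at the address and port the reply was sent from.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    ip: str
    port: int


def next_client_packet_target(next_server_ip, reply_source):
    """A client pinned to next-server keeps that IP and only adopts the
    source port of the reply (hypothetical model of step 5)."""
    return Endpoint(next_server_ip, reply_source.port)


def is_delivered(target, bound_sockets):
    """True if some socket is bound at exactly this address and port."""
    return target in bound_sockets


NEXT_SERVER = "172.16.0.1"   # value offered in the DHCP reply (step 2)
TRANSFER_PORT = 40000        # hypothetical stand-in for "port X"

# Step 4: the TFTP reply arrives from the other address on the same interface.
reply_source = Endpoint("192.168.0.1", TRANSFER_PORT)
# Assumption: the transfer socket only exists at the reply's source endpoint.
bound = {reply_source}

# Step 5: the client answers to next-server:X, where nothing is listening.
target = next_client_packet_target(NEXT_SERVER, reply_source)
print(target, "delivered" if is_delivered(target, bound) else "port unreachable")
```

Under this model the exchange succeeds as soon as next-server and the TFTP reply's source address agree, which is what the snippet workaround described further down achieves.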

I see several problems here:

1. The DHCP configuration is built without next-server by default, which makes the DHCP service send a suboptimal next-server option.
2. The TFTP service does not reply from the IP address on which it received the request.
3. The server is stubbornly fixed on next-server and does not take into account where the responses actually come from.

I ended up creating a DHCP snippet on the 192.168.0.0/24 subnet that defines next-server as 192.168.0.1, but I was really surprised that I could not find anyone else with this issue in the bug list, which makes me feel I am doing something wrong (or unusual).

$ dpkg -l '*maas*' | cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                            Version                        Architecture Description
+++-===============================-==============================-============-=================================================
ii  maas                            2.1.2+bzr5555-0ubuntu1~16.04.1 all          "Metal as a Service" is a physical cloud and IPAM
ii  maas-cli                        2.1.2+bzr5555-0ubuntu1~16.04.1 all          MAAS client and command-line interface
un  maas-cluster-controller         <none>                         <none>       (no description available)
ii  maas-common                     2.1.2+bzr5555-0ubuntu1~16.04.1 all          MAAS server common files
ii  maas-dhcp                       2.1.2+bzr5555-0ubuntu1~16.04.1 all          MAAS DHCP server
ii  maas-dns                        2.1.2+bzr5555-0ubuntu1~16.04.1 all          MAAS DNS server
ii  maas-proxy                      2.1.2+bzr5555-0ubuntu1~16.04.1 all          MAAS Caching Proxy
ii  maas-rack-controller            2.1.2+bzr5555-0ubuntu1~16.04.1 all          Rack Controller for MAAS
ii  maas-region-api                 2.1.2+bzr5555-0ubuntu1~16.04.1 all          Region controller API service for MAAS
ii  maas-region-controller          2.1.2+bzr5555-0ubuntu1~16.04.1 all          Region Controller for MAAS
un  maas-region-controller-min      <none>                         <none>       (no description available)
un  python-django-maas              <none>                         <none>       (no description available)
un  python-maas-client              <none>                         <none>       (no description available)
un  python-maas-provisioningserver  <none>                         <none>       (no description available)
ii  python3-django-maas             2.1.2+bzr5555-0ubuntu1~16.04.1 all          MAAS server Django web framework (Python 3)
ii  python3-maas-client             2.1.2+bzr5555-0ubuntu1~16.04.1 all          MAAS python API client (Python 3)
ii  python3-maas-provisioningserver 2.1.2+bzr5555-0ubuntu1~16.04.1 all          MAAS server provisioning libraries (Python 3)

Mark Shuttleworth (sabdfl) wrote :

I think this goes into the general bucket of "guessing which IP or interface to offer is hard". We're currently fixing some similar bugs, related to internal bridges on rack or region controllers (lxdbr0, virbr0 etc) but your use case is even more interesting. How would you pseudo-code an appropriate algorithm to select addresses, in your case?
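
One possible answer to that question, sketched in Python under stated assumptions: for each subnet in the shared network, offer as next-server the rack controller address that lies inside that same subnet. The function name `pick_next_server` and its inputs are hypothetical and are not part of MAAS; the addresses are the ones from the bug description.

```python
import ipaddress


def pick_next_server(interface_addrs, subnet_cidr):
    """Return the first interface address that lies inside subnet_cidr,
    or None if the interface has no address in that subnet."""
    subnet = ipaddress.ip_network(subnet_cidr)
    for addr in interface_addrs:
        ip = ipaddress.ip_interface(addr).ip
        if ip in subnet:
            return str(ip)
    return None


# Addresses configured on eth1.621 in the report.
addrs = ["172.16.0.1/24", "192.168.0.1/24"]

print(pick_next_server(addrs, "192.168.0.0/24"))  # 192.168.0.1
print(pick_next_server(addrs, "172.16.0.0/24"))   # 172.16.0.1
```

This only covers clients on a directly attached subnet; relayed DHCP requests from subnets the rack controller has no address in would return None here and need a different strategy, such as a route lookup.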

Mike Pontillo (mpontillo) wrote :

Off the top of my head, if I'm on the rack controller and I want to pick a sane "own IP" for a particular destination network (or host), I would do a route lookup. So the way I might do it in Python is something like this:

http://paste.ubuntu.com/25015405/

The above script takes advantage of a subtle feature of the socket layer: you can bind() and connect() a UDP socket without actually sending any traffic, and then use getsockname() on the socket to find out which source address was selected by the route lookup.

This would probably also work for things like offering up other rack-based addresses, such as web service endpoints, iSCSI, or NTP. (Though it gets harder if I'm on the region controller making a similar decision, since I don't have direct access to that information.)

Anyway, if you save the contents of that pastebin as 'lookup.py', then 'chmod +x lookup.py' and then do:

    ./lookup.py <destination-ip> ...

It should return a result like:

    <destination-ip> via <source-ip>

Marius, if you could run that on your MAAS rack controller and then let me know if that script returns acceptable source IPs for each destination, that would be much appreciated.
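
For readers who cannot reach the pastebin, the technique described above can be approximated as follows. This is a reconstruction from the description in this comment, not the original script: the function name `source_ip_for` and the choice of UDP port 9 are assumptions, and it only calls connect() (no explicit bind()), which is enough to trigger the kernel's route lookup.

```python
#!/usr/bin/env python3
import socket
import sys


def source_ip_for(destination, port=9):
    """Return the local address the kernel would use to reach destination.

    connect() on a UDP socket sends no packets; it only makes the kernel
    perform a route lookup and fix the socket's local address.
    """
    family = socket.AF_INET6 if ":" in destination else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.connect((destination, port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    # Usage: ./lookup.py <destination-ip> ...
    for dest in sys.argv[1:]:
        try:
            print("%s via %s" % (dest, source_ip_for(dest)))
        except OSError as error:
            # Raised, for example, when there is no route to the destination.
            print("%s: %s" % (dest, error))
```

Because the lookup uses the kernel routing table, it also gives an answer for destinations that are not on a directly attached subnet.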

Changed in maas:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 2.3.0

Marius Žalinauskas (marius-zalinauskas) wrote :

Sorry for the delay.

Mike, your script returns acceptable source IPs on every rack controller I have on hand (even those with a more complex setup than in my first example).

Every rack controller runs MAAS 2.2.1 (custom-patched to use advanced networking and partitioning features on CentOS 6/7, but the patch does not interfere with this particular issue in any way).

Mike Pontillo (mpontillo) wrote :

For the record, the utility function to determine the appropriate source address [in this case, for the next-server] has landed in MAAS 2.3 [1], so we're in a better position to fix this bug now.

[1]: https://git.launchpad.net/maas/commit/?id=c2aed3017ef73b5af23c72d40c9c0c0fc1cf475f

Changed in maas:
milestone: 2.3.0 → 2.3.x

Jerzy Husakowski (jhusakowski) wrote :

Is this still an issue in a more recent version of MAAS (3.2 or later)? IP address handling has been changed since this issue was submitted.

Changed in maas:
milestone: 2.3.x → none
status: Triaged → Incomplete

Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired