[1.9] wrong subnet in DHCP answer when multiple networks are present

Bug #1521618 reported by Zoltan Arnold Nagy on 2015-12-01
This bug affects 5 people
Affects            Importance  Assigned to
MAAS               High        Blake Rouse
1.9                High        Trent Lloyd
isc-dhcp (Ubuntu)  Undecided   Unassigned

Bug Description

So I have three interfaces with three non-overlapping subnets defined in my MAAS cluster controller.

The idea is that there is a provisioning network (10.6.0.0/16) to do the actual provisioning, and once the node gets deployed it uses a different network (because the provisioning network is only 1x1Gbit while the production network is bonded (LACP) 10Gbit).

However, when I boot up a fresh, new node to add to MAAS, it gets the following DHCP reply:

ip=10.6.239.3:10.6.250.250:9.4.113.254:255.255.255.0

So instead of picking up the /16 subnet correctly for the 10.6.239.3 IP, it picks up the /24 from the network where it gets its default gateway from.

Is this a bug, or is my understanding of how MAAS should behave when there are multiple networks flawed?

Here is my /var/lib/maas/dhcpd.conf:

subnet 9.4.113.0 netmask 255.255.255.0 {
       if option arch = 00:0E {
          filename "pxelinux.0";
          option path-prefix "ppc64el/";
       } elsif option arch = 00:07 {
          filename "bootx64.efi";
       } elsif option arch = 00:0B {
          filename "grubaa64.efi";
       } elsif option arch = 00:0C {
          filename "bootppc64.bin";
       } else {
          filename "pxelinux.0";
       }
       interface "eth0";
       ignore-client-uids true;
       option subnet-mask 255.255.255.0;
       option broadcast-address 9.4.113.255;
       option domain-name-servers 9.4.113.251;
       option domain-name "i.zc2.ibm.com";
       option routers 9.4.113.254;
       option ntp-servers ntp.ubuntu.com;
       range dynamic-bootp 9.4.113.150 9.4.113.190;
       class "PXE" {
          match if substring (option vendor-class-identifier, 0, 3) = "PXE";
          default-lease-time 30;
          max-lease-time 30;
       }
}
subnet 10.6.0.0 netmask 255.255.0.0 {
       if option arch = 00:0E {
          filename "pxelinux.0";
          option path-prefix "ppc64el/";
       } elsif option arch = 00:07 {
          filename "bootx64.efi";
       } elsif option arch = 00:0B {
          filename "grubaa64.efi";
       } elsif option arch = 00:0C {
          filename "bootppc64.bin";
       } else {
          filename "pxelinux.0";
       }
       interface "eth1";
       ignore-client-uids true;
       option subnet-mask 255.255.0.0;
       option broadcast-address 10.6.255.255;
       option domain-name-servers 9.4.113.251;
       option domain-name "i.zc2.ibm.com";
       option ntp-servers ntp.ubuntu.com;
       range dynamic-bootp 10.6.239.0 10.6.239.239;
       class "PXE" {
          match if substring (option vendor-class-identifier, 0, 3) = "PXE";
          default-lease-time 30;
          max-lease-time 30;
       }
}

Here is "subnets read":

[
    {
        "dns_servers": [],
        "name": "9.4.113.0/24",
        "space": "space-0",
        "vlan": {
            "name": "untagged",
            "resource_uri": "/MAAS/api/1.0/vlans/0/",
            "fabric": "fabric-0",
            "vid": 0,
            "id": 0
        },
        "gateway_ip": "9.4.113.254",
        "cidr": "9.4.113.0/24",
        "id": 1,
        "resource_uri": "/MAAS/api/1.0/subnets/1/"
    },
    {
        "dns_servers": [],
        "name": "10.7.0.0/16",
        "space": "space-0",
        "vlan": {
            "name": "untagged",
            "resource_uri": "/MAAS/api/1.0/vlans/5001/",
            "fabric": "fabric-1",
            "vid": 0,
            "id": 5001
        },
        "gateway_ip": null,
        "cidr": "10.7.0.0/16",
        "id": 2,
        "resource_uri": "/MAAS/api/1.0/subnets/2/"
    },
    {
        "dns_servers": [],
        "name": "10.6.0.0/16",
        "space": "space-0",
        "vlan": {
            "name": "untagged",
            "resource_uri": "/MAAS/api/1.0/vlans/5002/",
            "fabric": "fabric-2",
            "vid": 0,
            "id": 5002
        },
        "gateway_ip": null,
        "cidr": "10.6.0.0/16",
        "id": 3,
        "resource_uri": "/MAAS/api/1.0/subnets/3/"
    }
]

Running 1.9.0~rc2+bzr4509-0ubuntu1~trusty1.

Mike Pontillo (mpontillo) wrote :

Thanks for taking the time to report a bug in MAAS.

Can you give us more information on this line in your bug report:

ip=10.6.239.3:10.6.250.250:9.4.113.254:255.255.255.0

Is this from a packet capture? (if so, on which interface?) I'm trying to figure out what this means. In it you have:

10.6.239.3 <-- seems to be an IP address within "range dynamic-bootp 10.6.239.0 10.6.239.239"
10.6.250.250 <-- seems to be another IP address within your /16
9.4.113.254 <-- router
255.255.255.0 <-- /24

It's interesting that dhcpd would behave in this way; it seems to be picking up the "option routers" from a subnet it didn't select. It looks like this may be a dhcpd bug...

Mike Pontillo (mpontillo) wrote :

Would it be possible for you to capture the DHCP packets (for example, using Wireshark) and attach them to the bug? Thanks!

Changed in maas:
status: New → Incomplete
LaMont Jones (lamont) wrote :

Can we also get the output of the following on the cluster controller: ip addr list; ip route list

thanks

Zoltan Arnold Nagy (zoltan) wrote :

I will be able to do the tcpdumps a bit later; in the meantime, here are the requested outputs:

root@maas:~# ip addr list; ip route list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:bf:29:c4 brd ff:ff:ff:ff:ff:ff
    inet 9.4.113.251/24 brd 9.4.113.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:febf:29c4/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:bf:a4:48 brd ff:ff:ff:ff:ff:ff
    inet 10.6.250.250/16 brd 10.6.255.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:febf:a448/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:bf:f1:71 brd ff:ff:ff:ff:ff:ff
    inet 10.7.250.250/16 brd 10.7.255.255 scope global eth2
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:febf:f171/64 scope link
       valid_lft forever preferred_lft forever
default via 9.4.113.254 dev eth0
9.4.113.0/24 dev eth0 proto kernel scope link src 9.4.113.251
10.6.0.0/16 dev eth1 proto kernel scope link src 10.6.250.250
10.7.0.0/16 dev eth2 proto kernel scope link src 10.7.250.250
root@maas:~#

Zoltan Arnold Nagy (zoltan) wrote :

Attaching the screenshot from the console of the server.

Zoltan Arnold Nagy (zoltan) wrote :

Adding the packet capture. The DHCP reply clearly has the wrong subnet; after that I see it starts transferring the file over TFTP, so I'm not sure why it gets stuck.

Zoltan Arnold Nagy (zoltan) wrote :

Mike, I think the syntax of that line is the following:

ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:<dns0-ip>:<dns1-ip>

(from the kernel's documentation)
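
To make the field layout concrete, here is a minimal Python sketch that splits the reported `ip=` value into the fields listed above and checks which addresses the client would treat as on-link. The helper names (`parse_ip_param`, `on_link`) are hypothetical and not part of MAAS or the kernel. The result suggests, without confirming, one possible consequence of the wrong netmask: with 255.255.255.0 the client would consider its own boot server 10.6.250.250 to be off-link.

```python
import ipaddress

# Field order taken from the kernel "ip=" syntax quoted above.
FIELDS = ("client_ip", "server_ip", "gw_ip", "netmask", "hostname",
          "device", "autoconf", "dns0_ip", "dns1_ip")


def parse_ip_param(param):
    """Split an 'ip=...' kernel parameter into named fields (hypothetical helper)."""
    value = param[len("ip="):] if param.startswith("ip=") else param
    # zip() stops at the shorter sequence, so trailing fields may be absent.
    return dict(zip(FIELDS, value.split(":")))


def on_link(client_ip, netmask, other_ip):
    """Return True if other_ip lies in the client's subnet for the given netmask."""
    network = ipaddress.ip_network(f"{client_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(other_ip) in network


reply = parse_ip_param("ip=10.6.239.3:10.6.250.250:9.4.113.254:255.255.255.0")

# With the netmask that was actually sent (/24), the server looks off-link.
server_on_link_as_sent = on_link(reply["client_ip"], reply["netmask"], reply["server_ip"])
# With the expected /16 netmask, the server is on-link.
server_on_link_with_16 = on_link(reply["client_ip"], "255.255.0.0", reply["server_ip"])
# The gateway belongs to the other subnet, so it is off-link either way.
gateway_on_link = on_link(reply["client_ip"], reply["netmask"], reply["gw_ip"])

print(reply)
print(server_on_link_as_sent, server_on_link_with_16, gateway_on_link)
```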

Mike Pontillo (mpontillo) wrote :

Ah, thanks for all the information. (I didn't realize you were pasting the kernel parameter.)

After researching this problem, I think this definitely looks like a bug in isc-dhcp. (The other possibility is that MAAS is configuring dhcpd incorrectly in this situation, but so far it looks like our configuration is correct and dhcpd is interpreting it incorrectly.)

I noticed that on Trusty we are using 4.2.4-7ubuntu12.3, while on Xenial we are using 4.3.1-5ubuntu4. The latest "Extended support" version from ISC seems to be 4.3.3.[1]

To move forward, we'll need to further triage this to see if the bug occurs on other versions of dhcpd.

Meanwhile, I think you should be able to work around this issue by changing your hosts' network configuration after commissioning. You can configure MAAS to disable the boot interface upon deployment, so that your provisioning network will only be used for the initial PXE boot.

[1]: https://www.isc.org/downloads/

Changed in maas:
status: Incomplete → Triaged
importance: Undecided → Medium
Mike Pontillo (mpontillo) wrote :

As an aside, Zoltan, I'm curious what symptoms you see due to this issue. Since the gateway is off-link, it should not be reachable from the provisioning network. What errors occur in your environment due to this bug?

LaMont Jones (lamont) wrote :

I'm having trouble reproducing this...

What version of isc-dhcp-server is running here?

Does it change anything if you (manually) reorder the 2 subnets in dhcpd.conf and restart the dhcp server?

Mike Pontillo (mpontillo) wrote :

Ah, disregard my previous question. When I re-read this bug I missed the part where you said "So instead of picking up the /16 subnet correctly for the 10.6.239.3 IP, it picks up the /24 from the network where it gets it's default gateway from."

Changed in isc-dhcp (Ubuntu):
status: New → Confirmed
Zoltan Arnold Nagy (zoltan) wrote :

In the meantime I had to change the environment to get on with the project, so I don't have this setup anymore, but I could try to reproduce it in a VMware environment later this week if you guys can't.

This is on trusty running isc-dhcp-server 4.2.4-7ubuntu12.3, with maas-dhcp 1.9.0~rc2+bzr4509-0ubuntu1~trusty1

Zoltan Arnold Nagy (zoltan) wrote :

Mike, the symptom was that the PXE boot just hung, as can be seen from the packet capture. No idea why actually, but the first fishy thing was the wrong DHCP reply :)

Changed in maas:
importance: Medium → Critical
milestone: none → 2.0.0
Blake Rouse (blake-rouse) wrote :

Okay, so what needs to be done to fix this issue is to order the subnets in the generated configuration so that the ones without gateways come first and the ones with gateways follow. This causes isc-dhcp to work correctly. This might be an issue with shared-network, because ordering the subnets can only go so far before the order of the shared-networks breaks it. That is to say, if there are 2 shared networks, each with 2 subnets, and one subnet in each has a gateway defined, then one of the 4 will have an issue, as the order in that case cannot be enforced.
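
The ordering idea can be sketched in a few lines of Python. This is only an illustration of the rule "subnets without a gateway first", using the `cidr` and `gateway_ip` values from the "subnets read" output earlier in this report; the function name `order_for_dhcpd` is hypothetical and this is not the MAAS implementation.

```python
# Subnet data reduced to the two fields that matter for ordering,
# copied from the "subnets read" output in the bug description.
subnets = [
    {"cidr": "9.4.113.0/24", "gateway_ip": "9.4.113.254"},
    {"cidr": "10.7.0.0/16", "gateway_ip": None},
    {"cidr": "10.6.0.0/16", "gateway_ip": None},
]


def order_for_dhcpd(subnets):
    """Return subnets with gateway-less entries first (hypothetical helper).

    sorted() is stable and False sorts before True, so subnets keep their
    relative order within each group.
    """
    return sorted(subnets, key=lambda s: s["gateway_ip"] is not None)


ordered = [s["cidr"] for s in order_for_dhcpd(subnets)]
print(ordered)
```

As the comment itself points out, this ordering cannot be enforced once several shared-networks each mix subnets with and without gateways, so it is at best a partial workaround.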

Note for next-server:
You also need to set next-server on each subnet to make sure that the client can communicate back to the rack controller on that subnet. If not, the PXE client will select the IP from which it received the DHCP response, which can be on a different subnet.

I think it might be best to say you can only provide DHCP to subnets that a rack controller has an IP address in. Without it the machines will fail to PXE boot. Or we don't enforce it, which is legal and might be a network configuration that the administrator wants, but might be surprising to other users of MAAS.

Blake Rouse (blake-rouse) wrote :

Okay, I have played around with this even more and it's just not going to work without fixing isc-dhcp itself. Here is an example configuration, where the gateway is set in one shared-network and not in the other. When the machine boots it gets the gateway from the other shared-network.

http://paste.ubuntu.com/15273844/

Now if you re-order the configuration and place the one without the gateway first, it works as expected. But this will not work in general, as the ordering can never be correct in the case described in my previous comment.

Paolo de Rosa (paolo-de-rosa) wrote :

We also hit this bug with DT, and we noticed that moving the conditional options (those below) into the global section solves the problem, so we worked around it by changing the dhcp.conf template.

if option arch = 00:0E {
   filename "pxelinux.0";
   option path-prefix "ppc64el/";
} elsif option arch = 00:07 {
   filename "bootx64.efi";
} elsif option arch = 00:0B {
   filename "grubaa64.efi";
} elsif option arch = 00:0C {
   filename "bootppc64.bin";
} else {
   filename "pxelinux.0";
}
class "PXE" {
   match if substring (option vendor-class-identifier, 0, 3) = "PXE";
   default-lease-time 30;
   max-lease-time 30;
}

Trent Lloyd (lathiat) wrote :

Full patch and explanation here:
https://code.launchpad.net/~lathiat/maas/1.9-lp1521618-dhcp-incorrect-router

With multiple subnets, PXE DHCP clients would receive the router of the first
subnet in the configuration file regardless of which subnet they actually
received a lease from.

This happens because match sections such as "class" (used here to give PXE
clients a smaller lease time) and "host" (not currently used, but is the
example given in the below e-mail) are not actually scoped within the subnet{}
declaration as you would expect. They are actually defined in the global scope
for all subnets, but also inherit options from the subnet{} declaration they
were defined in such as "option routers".

This strange behavior is explained in the following email:
https://lists.isc.org/pipermail/dhcp-users/2011-September/014001.html

We fix this by moving the conditional declarations out into the global scope,
instead of repeating them for each subnet. This way they do not inherit any
options from the subnet scope.
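
A rough Python sketch of what a generator following this approach might look like: the conditionals and the "PXE" class are emitted once at global scope, and each subnet{} block carries only its own options. The function names and the reduced set of subnet options are illustrative assumptions, not the code from the linked branch; the addresses come from the dhcpd.conf in the bug description.

```python
# Emitted once, outside any subnet{} block, so the class cannot inherit
# "option routers" from a subnet. "option arch" is assumed to be declared
# elsewhere in the real configuration, as in the original file.
GLOBAL_SECTION = """\
if option arch = 00:0E {
   filename "pxelinux.0";
   option path-prefix "ppc64el/";
} elsif option arch = 00:07 {
   filename "bootx64.efi";
} elsif option arch = 00:0B {
   filename "grubaa64.efi";
} elsif option arch = 00:0C {
   filename "bootppc64.bin";
} else {
   filename "pxelinux.0";
}
class "PXE" {
   match if substring (option vendor-class-identifier, 0, 3) = "PXE";
   default-lease-time 30;
   max-lease-time 30;
}
"""

# Values copied from the reporter's dhcpd.conf; only a subset of options is kept.
SUBNETS = [
    {"ip": "9.4.113.0", "mask": "255.255.255.0", "interface": "eth0",
     "router": "9.4.113.254", "range": ("9.4.113.150", "9.4.113.190")},
    {"ip": "10.6.0.0", "mask": "255.255.0.0", "interface": "eth1",
     "router": None, "range": ("10.6.239.0", "10.6.239.239")},
]


def render_subnet(subnet):
    """Render one subnet{} block without conditionals or classes (hypothetical)."""
    lines = [
        f'subnet {subnet["ip"]} netmask {subnet["mask"]} {{',
        f'   interface "{subnet["interface"]}";',
        f'   option subnet-mask {subnet["mask"]};',
    ]
    if subnet["router"]:
        # Only subnets that actually have a gateway get "option routers".
        lines.append(f'   option routers {subnet["router"]};')
    low, high = subnet["range"]
    lines.append(f"   range dynamic-bootp {low} {high};")
    lines.append("}")
    return "\n".join(lines)


def render_config(subnets):
    """Global section first, then one block per subnet."""
    return GLOBAL_SECTION + "\n".join(render_subnet(s) for s in subnets) + "\n"


config = render_config(SUBNETS)
print(config)
```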

james beedy (jamesbeedy) wrote :

Has a fix for this been implemented?

This still seems to exist in MAAS-2.0 beta6+bzr5017

Changed in maas:
status: Triaged → In Progress
importance: Critical → High
assignee: nobody → Blake Rouse (blake-rouse)
summary: - wrong subnet in DHCP answer when multiple networks are present
+ [1.9] wrong subnet in DHCP answer when multiple networks are present
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released