DHCP Agent should not release DHCP lease when client ID is not set on port

Bug #1806770 reported by Arjun Baindur
This bug affects 3 people
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Brian Haley

Bug Description

The DHCP agent enforces the client ID, which is part of the DHCP extra options, very strictly. If a VM advertises a client ID, the DHCP agent will automatically release its lease whenever *any* other port is updated/deleted. This happens even if no client ID is set on the port.

When reload_allocations() is called, the DHCP agent parses the current leases file and the hosts file, and gets the list of all the ports in the network from the DB, computing 3 different sets. The set from the leases file (v4_leases) will have some client ID. The set from the port DB will have None. As a result, the set subtraction does not filter out the entry, and the port's DHCP lease is released again every time the VM has renewed its lease and any other port in the network is deleted:

https://github.com/openstack/neutron/blob/stable/pike/neutron/agent/linux/dhcp.py#L850

        v4_leases = set()
        for (k, v) in cur_leases.items():
            # IPv4 leases have a MAC, IPv6 ones do not, so we must ignore
            if netaddr.IPAddress(k).version == constants.IP_VERSION_4:
                # treat '*' as None, see note in _read_leases_file_leases()
                client_id = v['client_id']
                if client_id == '*':
                    client_id = None
                v4_leases.add((k, v['iaid'], client_id))

        new_leases = set()
        for port in self.network.ports:
            client_id = self._get_client_id(port)
            for alloc in port.fixed_ips:
                new_leases.add((alloc.ip_address, port.mac_address, client_id))

        # If an entry is in the leases or host file(s), but doesn't have
        # a fixed IP on a corresponding neutron port, consider it stale.
        entries_to_release = (v4_leases | old_leases) - new_leases
        if not entries_to_release:
            return

In one observed example of a released lease, the entries looked like this:

new_leases (from port DB)
(u'10.81.96.186', u'fa:16:3e:eb:a1:13', None)
old_leases (from hosts file)
('10.81.96.186', 'fa:16:3e:eb:a1:13', None)
v4_leases (from leases file - updated by dnsmasq when VM requests)
('10.81.96.186', 'fa:16:3e:eb:a1:13', '01:fa:16:3e:eb:a1:13')

Because the client ID differs, the set subtraction did not filter that (IP, MAC) pair out of entries_to_release. The client_id in the v4_leases entry came from a Windows VM, which has a bug that prevents it from disabling the client ID. entries_to_release in fact had some 50+ entries like that, causing a storm of DHCPRELEASE.
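
The set arithmetic can be reproduced in isolation. The following is a minimal sketch using the three tuples from the observed example; it is not the agent's code, and the variable names simply mirror the excerpt above.

```python
# Tuples are (ip, mac, client_id), copied from the observed example.
new_leases = {('10.81.96.186', 'fa:16:3e:eb:a1:13', None)}   # port DB
old_leases = {('10.81.96.186', 'fa:16:3e:eb:a1:13', None)}   # hosts file
v4_leases = {                                                # leases file
    ('10.81.96.186', 'fa:16:3e:eb:a1:13', '01:fa:16:3e:eb:a1:13'),
}

# Same expression as in reload_allocations(): tuples are compared as a
# whole, so a differing client_id makes the leases-file entry "stale".
entries_to_release = (v4_leases | old_leases) - new_leases
print(entries_to_release)
```

The hosts-file entry is subtracted away because it matches the port DB tuple exactly, while the leases-file entry survives only because its third element is not None.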

This can cause issues: when the VM later asks to renew its lease as the expiry period approaches (I think about halfway through), dnsmasq sends a DHCP NAK, the lease is renegotiated, and existing network connections can be disrupted. It also causes the DHCP agent to do unnecessary work, releasing a ton of leases when it technically shouldn't.
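
On the "about halfway" timing: RFC 2131 gives default renewal (T1) and rebinding (T2) times of 0.5 and 0.875 of the lease duration, unless the server overrides them. A small sketch of that arithmetic follows; the function name is made up for illustration, and the 86400-second lease is only an assumed example value, not something taken from this report.

```python
def renewal_times(lease_seconds):
    """Return (T1, T2) in seconds using the RFC 2131 default fractions."""
    t1 = 0.5 * lease_seconds    # client starts trying to renew with its server
    t2 = 0.875 * lease_seconds  # client starts trying to rebind with any server
    return t1, t2

# Assumed example lease of one day.
print(renewal_times(86400))  # (43200.0, 75600.0)
```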

Setting the client ID in the port's DHCP extra opts is not a good solution:

1. In some cases, like Windows VMs, the client ID is advertised as the MAC by default. In fact, there is a Windows bug which prevents you from even turning this off: https://support.microsoft.com/en-us/help/3004537/dhcp-client-always-includes-option-61-in-the-dhcp-request-in-windows-8

Linux VMs don't have this on by default, when I checked, but it may be enabled in some templates without users knowing.

2. End users will usually just be deploying a VM, with the port being auto created by Nova. They don't know or need to know about advanced networking concepts like DHCP client IDs.

3. We can't expect everyone to modify their existing app templates, or end users to make API calls, to update ports every time they deploy a VM.

So, client ID should only be enforced, and leases released, if it's actually set on the port DB's DHCP extra Opts. In that case it means someone knows what they are doing, and we want to check for a mismatch.

If it's None, I suspect in 99.9999% of cases the operator does not know or care about the client ID field.

Arjun Baindur (abaindur)
Changed in neutron:
assignee: nobody → Arjun Baindur (abaindur)
Revision history for this message
Arjun Baindur (abaindur) wrote :

Testing out a fix where we parse the entries_to_release set, and filter out entries where a client ID is set, but the same (IP, MAC) combo in the port DB set (new_leases) has a client ID of None.
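
The filtering idea described in this comment can be sketched roughly as follows. This is an illustration only, not the patch under review: the helper name drop_unenforced_client_ids is hypothetical, and the sets are assumed to hold (ip, mac, client_id) tuples as in the bug description.

```python
def drop_unenforced_client_ids(entries_to_release, new_leases):
    """Return the entries that should still be released.

    An entry is dropped when it carries a client ID but the port DB has
    the same (ip, mac) pair with no client ID configured (None).
    """
    # (ip, mac) pairs for which the port does not enforce a client ID.
    unenforced = {(ip, mac) for ip, mac, client_id in new_leases
                  if client_id is None}
    return {(ip, mac, client_id)
            for ip, mac, client_id in entries_to_release
            if not (client_id is not None and (ip, mac) in unenforced)}


new_leases = {
    ('10.81.96.186', 'fa:16:3e:eb:a1:13', None),      # no client ID on port
    ('10.81.96.190', 'fa:16:3e:00:00:02', 'abc'),     # client ID enforced
}
entries_to_release = {
    # Lease with a client ID, port has None: should be kept alive.
    ('10.81.96.186', 'fa:16:3e:eb:a1:13', '01:fa:16:3e:eb:a1:13'),
    # Port no longer exists: genuinely stale.
    ('10.81.96.187', 'fa:16:3e:00:00:01', None),
    # Port sets a client ID and the lease disagrees: real mismatch.
    ('10.81.96.190', 'fa:16:3e:00:00:02', 'xyz'),
}
print(sorted(drop_unenforced_client_ids(entries_to_release, new_leases),
             key=lambda entry: entry[0]))
```

Only the first entry is filtered out; the deleted port and the real client ID mismatch are still released. The addresses other than 10.81.96.186 are invented for the example.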

Arjun Baindur (abaindur)
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/622634

Changed in neutron:
status: New → In Progress
Revision history for this message
Bence Romsics (bence-romsics) wrote :

Thank you for your bug report.

I'm trying to reproduce the error, but without success so far. This is what I'm doing:

Prepared an image with hardcoded root password to be used later via serial console: bionic-server-cloudimg-amd64.img

 openstack image create u1804 --container-format bare --disk-format qcow2 --public --file ~/bionic-server-cloudimg-amd64.img

Created two servers: vm0 is an Ubuntu 18.04 instance, so I have a DHCP client that can send a client ID. vm1 is just there so I can delete it to trigger reload_allocations() in the dhcp-agent. I assume port deletion on the same network should trigger reload_allocations(), but I have not verified it yet.

 openstack server create vm0 --flavor ds512M --image u1804 --nic net-id=private --wait
 openstack server create vm1 --flavor cirros256 --image cirros-0.3.5-x86_64-disk --nic net-id=private --wait

Access vm0 over its libvirt serial console:

 sudo virsh console $( openstack server show vm0 -f value -c OS-EXT-SRV-ATTR:instance_name )

Inside vm0 configure client ID for the DHCP client:

 vim /etc/dhcp/dhclient.conf
 add line: send dhcp-client-identifier "CLIENT-FOO";

On the dhcp-agent's host start snooping the dnsmasq interface for DHCP traffic:

 sudo ip netns exec qdhcp-"$( openstack network show private -f value -c id )" ip a # pick the interface name
 sudo ip netns exec qdhcp-"$( openstack network show private -f value -c id )" tcpdump -vvv -s0 -i tap4204ce00-e0 | less

Again inside vm0 release and re-acquire the lease to prove I see the relevant DHCP traffic in tcpdump:

 dhclient -v -r
 dhclient -v ens2 -cf /etc/dhcp/dhclient.conf

I do see the DHCP packets. I see the client ID in the DHCP RELEASE, DISCOVER and REQUEST.

But then when I delete vm1 (openstack server delete vm1) I don't see a single DHCPRELEASE. I was expecting to see the storm you described. What am I missing?

Changed in neutron:
status: In Progress → Incomplete
Revision history for this message
Arjun Baindur (abaindur) wrote :

You won't see the DHCP release packets, because the agent invokes the dhcp_release utility within the namespace, so the packet never ingresses or egresses the qdhcp namespace. It's received by dnsmasq. You can confirm this by:

1. Observing /var/log/messages: you will see a dnsmasq log entry for DHCPRELEASE for both the IP of the VM you deleted and the VM that has a lease with a client ID.

2. Checking the dnsmasq leases file for the network (to find its location, run ps ax | grep <netuuid>; it will be one of the args to the dnsmasq process). Confirm that prior to deletion an entry for vm0 is present, with a client ID. After deleting the other VM, dnsmasq will remove this entry from the leases file. You should no longer see anything if you grep for its IP.

3. Adding logs of your own to the DHCP agent in the _release_unused_leases() function (enabling debug might be sufficient). You can confirm the DHCP agent is exec'ing dhcp_release for the untouched VM's IP.

Also, you only have 1 DHCP server, right? On a setup with multiple DHCP servers per network, only 1 server will actually have the lease.
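
The leases-file check in step 2 can also be scripted. The sketch below assumes the usual dnsmasq IPv4 lease line layout (expiry time, MAC, IP, hostname, client ID, with '*' meaning no client ID); the helper name find_lease and the sample lines, including the hostnames, are made up for illustration.

```python
def find_lease(lines, ip_address):
    """Return (mac, client_id) for ip_address, or None if no lease is listed."""
    for line in lines:
        fields = line.split()
        # Assumed IPv4 layout: expiry, MAC, IP, hostname, client ID.
        if len(fields) >= 5 and fields[2] == ip_address:
            client_id = None if fields[4] == '*' else fields[4]
            return fields[1], client_id
    return None


# Hypothetical leases-file contents before the other VM is deleted.
sample = [
    "1544043600 fa:16:3e:eb:a1:13 10.81.96.186 host-10-81-96-186 "
    "01:fa:16:3e:eb:a1:13",
    "1544043600 fa:16:3e:aa:bb:cc 10.81.96.187 host-10-81-96-187 *",
]
print(find_lease(sample, '10.81.96.186'))
print(find_lease(sample, '10.81.96.200'))
```

Against a real file, the lines would come from open(path) with the path taken from the dnsmasq process arguments. Running the check before and after deleting the other VM should show the entry for the untouched VM disappearing.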

Changed in neutron:
status: Incomplete → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/623066

Revision history for this message
Arjun Baindur (abaindur) wrote :

Please note there are 2 different fixes above. I just uploaded a slightly different diff, which has a new review. Only one should be merged:

https://review.openstack.org/#/c/623066/
https://review.openstack.org/#/c/622634/

Revision history for this message
Bence Romsics (bence-romsics) wrote :

Arjun, thank you for the extra points. I was able to quickly verify points (1) and (2). Now that the bug has been reproduced, and since you already provided a nice analysis, I'm setting this to triaged and medium importance because IIUC we may have a short loss of network connection because of this problem.

Changed in neutron:
status: In Progress → Triaged
importance: Undecided → Medium
Changed in neutron:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Arjun Baindur (<email address hidden>) on branch: master
Review: https://review.openstack.org/622634
Reason: Going to go with the other diff, since it's slightly better. It checks the other sets after the leases have been diff'd, rather than every lease with a client ID, so it will potentially process fewer elements.

https://review.openstack.org/#/c/623066/

Changed in neutron:
assignee: Arjun Baindur (abaindur) → Brian Haley (brian-haley)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/623066
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f2111e035424bf714099966ad724e9a4bd604c18
Submitter: Zuul
Branch: master

commit f2111e035424bf714099966ad724e9a4bd604c18
Author: Arjun Baindur <email address hidden>
Date: Wed Dec 5 12:43:05 2018 -0800

    Do not release DHCP lease when no client ID is set on port

    The DHCP agent has a really strict enforcement of client ID, which
    is part of the DHCP extra options. If a VM advertises a client ID,
    DHCP agent will automatically release its lease whenever *any* other
    port is updated/deleted, even if no client ID is set on the port,
    because it thinks the client ID has changed.

    When reload_allocations() is called, the DHCP agent parses the leases
    and hosts files, and gets the list of all the ports in the network from the
    DB, computing 3 different sets. The set from the leases file (v4_leases)
    could have a client ID, but the set from the port DB and hosts file will
    have None.

    As a result, the set subtraction does not filter out the entry,
    and all ports that have an active lease with a client ID are released.

    The Client ID should only be enforced and leases released
    if it's actually set in the port DB's DHCP extra Opts.
    In that case it means someone knows what they are doing,
    and we want to check for a mismatch. If the client ID on a port is
    empty, it should not be treated like an unused lease.

    We can't expect end users that just create VMs with auto created ports
    to know/care about DHCP client IDs, then manually update ports or
    change app templates.

    In some cases, like Windows VMs, the client ID is advertised as the MAC by default.
    In fact, there is a Windows bug which prevents you from even turning this off:
    https://support.microsoft.com/en-us/help/3004537/dhcp-client-always-includes-option-61-in-the-dhcp-request-in-windows-8

    Linux VMs don't have this on by default, but it may be enabled
    in some templates unknown to users.

    Change-Id: I8021f740bd78e654915337bd3287b45b2c422e95
    Closes-Bug: #1806770

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/643443

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/643444

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/643445

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/643444
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=54dfbd94a6b753af2d0b5babe8d927c1fb901ab5
Submitter: Zuul
Branch: stable/queens

commit 54dfbd94a6b753af2d0b5babe8d927c1fb901ab5
Author: Arjun Baindur <email address hidden>
Date: Wed Dec 5 12:43:05 2018 -0800

    Do not release DHCP lease when no client ID is set on port

    Change-Id: I8021f740bd78e654915337bd3287b45b2c422e95
    Closes-Bug: #1806770
    (cherry picked from commit f2111e035424bf714099966ad724e9a4bd604c18)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/643443
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ce037876a7598bd270e26bca96955c38036b2206
Submitter: Zuul
Branch: stable/rocky

commit ce037876a7598bd270e26bca96955c38036b2206
Author: Arjun Baindur <email address hidden>
Date: Wed Dec 5 12:43:05 2018 -0800

    Do not release DHCP lease when no client ID is set on port

    Change-Id: I8021f740bd78e654915337bd3287b45b2c422e95
    Closes-Bug: #1806770
    (cherry picked from commit f2111e035424bf714099966ad724e9a4bd604c18)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.0.0rc1

This issue was fixed in the openstack/neutron 14.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/643445
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9961fa068b78ba91393bd886ebbd6390cee24443
Submitter: Zuul
Branch: stable/pike

commit 9961fa068b78ba91393bd886ebbd6390cee24443
Author: Arjun Baindur <email address hidden>
Date: Wed Dec 5 12:43:05 2018 -0800

    Do not release DHCP lease when no client ID is set on port

    Change-Id: I8021f740bd78e654915337bd3287b45b2c422e95
    Closes-Bug: #1806770
    (cherry picked from commit f2111e035424bf714099966ad724e9a4bd604c18)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.7

This issue was fixed in the openstack/neutron 11.0.7 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.3

This issue was fixed in the openstack/neutron 13.0.3 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.6

This issue was fixed in the openstack/neutron 12.0.6 release.
