Activity log for bug #1806770

Date Who What changed Old value New value Message
2018-12-04 21:50:02 Arjun Baindur bug added bug
2018-12-04 21:50:15 Arjun Baindur neutron: assignee Arjun Baindur (abaindur)
2018-12-04 22:03:43 Arjun Baindur description

Old value:

DHCP agent has a really strict enforcement of client ID, which is part of the DHCP extra options. If a VM advertises a client ID, DHCP agent will automatically release its lease whenever *any* other port is updated/deleted. This happens even if no client ID is set on the port.

When reload_allocations() is called, DHCP agent parses the current leases file and the hosts file, and gets the list of all the ports in the network from the DB, computing 3 different sets. The set from the leases file (v4_leases) will have some client ID. The set from the port DB will have None. As a result, the set subtraction does not filter out the entry, and the port's DHCP lease is constantly released whenever the VM renews its lease and any other port in the network is deleted:

https://github.com/openstack/neutron/blob/stable/pike/neutron/agent/linux/dhcp.py#L850

        v4_leases = set()
        for (k, v) in cur_leases.items():
            # IPv4 leases have a MAC, IPv6 ones do not, so we must ignore
            if netaddr.IPAddress(k).version == constants.IP_VERSION_4:
                # treat '*' as None, see note in _read_leases_file_leases()
                client_id = v['client_id']
                if client_id is '*':
                    client_id = None
                v4_leases.add((k, v['iaid'], client_id))

        new_leases = set()
        for port in self.network.ports:
            client_id = self._get_client_id(port)
            for alloc in port.fixed_ips:
                new_leases.add((alloc.ip_address, port.mac_address, client_id))

        # If an entry is in the leases or host file(s), but doesn't have
        # a fixed IP on a corresponding neutron port, consider it stale.
        entries_to_release = (v4_leases | old_leases) - new_leases
        if not entries_to_release:
            return

I believe that this client ID enforcement and release of the lease should only occur if a client ID is set in the port's DHCP extra opts and there is a mismatch. Otherwise, ignore whatever client ID the VM advertises.

This can cause issues: when the VM later asks to renew its lease as the expiry period approaches (I think about halfway through), dnsmasq sends a DHCP NAK, the lease is renegotiated, and existing network connections can get disrupted. It also just causes DHCP agent to do unnecessary work, releasing a ton of leases when it technically shouldn't.

Setting the client ID in the port's DHCP extra opts is not a good solution:

1. In some cases, like Windows VMs, the client ID is advertised as the MAC by default. In fact, there is a Windows bug which prevents you from even turning this off: https://support.microsoft.com/en-us/help/3004537/dhcp-client-always-includes-option-61-in-the-dhcp-request-in-windows-8
Linux VMs don't have this on by default, when I checked, but it may be enabled in some templates unknown to users.

2. End users will usually just be deploying a VM, with the port being auto-created by Nova. They don't know or need to know about advanced networking concepts like DHCP client IDs.

3. We can't expect everyone to modify their existing app templates, or end users to make API calls, to update ports every time they deploy a VM.

So, client ID should only be enforced, and leases released, if it's actually set on the port. In that case it means someone knows what they are doing, and we want to check for a mismatch. If it's None, I suspect in 99.9999% of cases the operator does not know or care about the client ID field.

New value:

DHCP agent has a really strict enforcement of client ID, which is part of the DHCP extra options. If a VM advertises a client ID, DHCP agent will automatically release its lease whenever *any* other port is updated/deleted. This happens even if no client ID is set on the port.

When reload_allocations() is called, DHCP agent parses the current leases file and the hosts file, and gets the list of all the ports in the network from the DB, computing 3 different sets. The set from the leases file (v4_leases) will have some client ID. The set from the port DB will have None. As a result, the set subtraction does not filter out the entry, and the port's DHCP lease is constantly released whenever the VM renews its lease and any other port in the network is deleted:

https://github.com/openstack/neutron/blob/stable/pike/neutron/agent/linux/dhcp.py#L850

        v4_leases = set()
        for (k, v) in cur_leases.items():
            # IPv4 leases have a MAC, IPv6 ones do not, so we must ignore
            if netaddr.IPAddress(k).version == constants.IP_VERSION_4:
                # treat '*' as None, see note in _read_leases_file_leases()
                client_id = v['client_id']
                if client_id is '*':
                    client_id = None
                v4_leases.add((k, v['iaid'], client_id))

        new_leases = set()
        for port in self.network.ports:
            client_id = self._get_client_id(port)
            for alloc in port.fixed_ips:
                new_leases.add((alloc.ip_address, port.mac_address, client_id))

        # If an entry is in the leases or host file(s), but doesn't have
        # a fixed IP on a corresponding neutron port, consider it stale.
        entries_to_release = (v4_leases | old_leases) - new_leases
        if not entries_to_release:
            return

In one example of a released lease, the entries looked like this:

    new_leases (u'10.81.96.186', u'fa:16:3e:eb:a1:13', None)
    old_leases ('10.81.96.186', 'fa:16:3e:eb:a1:13', None)
    v4_leases ('10.81.96.186', 'fa:16:3e:eb:a1:13', '01:fa:16:3e:eb:a1:13')

Therefore entries_to_release did not have that IP and MAC filtered out. The client_id in the v4_leases entry was coming from a Windows VM, which has a bug that prevents disabling the client ID. entries_to_release in fact had some 50+ entries like that, causing a storm of DHCPRELEASE.

I believe that this client ID enforcement and release of the lease should only occur if a client ID is set in the port's DHCP extra opts and there is a mismatch. Otherwise, ignore whatever client ID the VM advertises.

This can cause issues: when the VM later asks to renew its lease as the expiry period approaches (I think about halfway through), dnsmasq sends a DHCP NAK, the lease is renegotiated, and existing network connections can get disrupted. It also just causes DHCP agent to do unnecessary work, releasing a ton of leases when it technically shouldn't.

Setting the client ID in the port's DHCP extra opts is not a good solution:

1. In some cases, like Windows VMs, the client ID is advertised as the MAC by default. In fact, there is a Windows bug which prevents you from even turning this off: https://support.microsoft.com/en-us/help/3004537/dhcp-client-always-includes-option-61-in-the-dhcp-request-in-windows-8
Linux VMs don't have this on by default, when I checked, but it may be enabled in some templates unknown to users.

2. End users will usually just be deploying a VM, with the port being auto-created by Nova. They don't know or need to know about advanced networking concepts like DHCP client IDs.

3. We can't expect everyone to modify their existing app templates, or end users to make API calls, to update ports every time they deploy a VM.

So, client ID should only be enforced, and leases released, if it's actually set on the port. In that case it means someone knows what they are doing, and we want to check for a mismatch. If it's None, I suspect in 99.9999% of cases the operator does not know or care about the client ID field.
2018-12-04 22:05:17 Arjun Baindur description

Old value:

DHCP agent has a really strict enforcement of client ID, which is part of the DHCP extra options. If a VM advertises a client ID, DHCP agent will automatically release its lease whenever *any* other port is updated/deleted. This happens even if no client ID is set on the port.

When reload_allocations() is called, DHCP agent parses the current leases file and the hosts file, and gets the list of all the ports in the network from the DB, computing 3 different sets. The set from the leases file (v4_leases) will have some client ID. The set from the port DB will have None. As a result, the set subtraction does not filter out the entry, and the port's DHCP lease is constantly released whenever the VM renews its lease and any other port in the network is deleted:

https://github.com/openstack/neutron/blob/stable/pike/neutron/agent/linux/dhcp.py#L850

        v4_leases = set()
        for (k, v) in cur_leases.items():
            # IPv4 leases have a MAC, IPv6 ones do not, so we must ignore
            if netaddr.IPAddress(k).version == constants.IP_VERSION_4:
                # treat '*' as None, see note in _read_leases_file_leases()
                client_id = v['client_id']
                if client_id is '*':
                    client_id = None
                v4_leases.add((k, v['iaid'], client_id))

        new_leases = set()
        for port in self.network.ports:
            client_id = self._get_client_id(port)
            for alloc in port.fixed_ips:
                new_leases.add((alloc.ip_address, port.mac_address, client_id))

        # If an entry is in the leases or host file(s), but doesn't have
        # a fixed IP on a corresponding neutron port, consider it stale.
        entries_to_release = (v4_leases | old_leases) - new_leases
        if not entries_to_release:
            return

In one example of a released lease, the entries looked like this:

    new_leases (u'10.81.96.186', u'fa:16:3e:eb:a1:13', None)
    old_leases ('10.81.96.186', 'fa:16:3e:eb:a1:13', None)
    v4_leases ('10.81.96.186', 'fa:16:3e:eb:a1:13', '01:fa:16:3e:eb:a1:13')

Therefore entries_to_release did not have that IP and MAC filtered out. The client_id in the v4_leases entry was coming from a Windows VM, which has a bug that prevents disabling the client ID. entries_to_release in fact had some 50+ entries like that, causing a storm of DHCPRELEASE.

I believe that this client ID enforcement and release of the lease should only occur if a client ID is set in the port's DHCP extra opts and there is a mismatch. Otherwise, ignore whatever client ID the VM advertises.

This can cause issues: when the VM later asks to renew its lease as the expiry period approaches (I think about halfway through), dnsmasq sends a DHCP NAK, the lease is renegotiated, and existing network connections can get disrupted. It also just causes DHCP agent to do unnecessary work, releasing a ton of leases when it technically shouldn't.

Setting the client ID in the port's DHCP extra opts is not a good solution:

1. In some cases, like Windows VMs, the client ID is advertised as the MAC by default. In fact, there is a Windows bug which prevents you from even turning this off: https://support.microsoft.com/en-us/help/3004537/dhcp-client-always-includes-option-61-in-the-dhcp-request-in-windows-8
Linux VMs don't have this on by default, when I checked, but it may be enabled in some templates unknown to users.

2. End users will usually just be deploying a VM, with the port being auto-created by Nova. They don't know or need to know about advanced networking concepts like DHCP client IDs.

3. We can't expect everyone to modify their existing app templates, or end users to make API calls, to update ports every time they deploy a VM.

So, client ID should only be enforced, and leases released, if it's actually set on the port. In that case it means someone knows what they are doing, and we want to check for a mismatch. If it's None, I suspect in 99.9999% of cases the operator does not know or care about the client ID field.

New value:

DHCP agent has a really strict enforcement of client ID, which is part of the DHCP extra options. If a VM advertises a client ID, DHCP agent will automatically release its lease whenever *any* other port is updated/deleted. This happens even if no client ID is set on the port.

When reload_allocations() is called, DHCP agent parses the current leases file and the hosts file, and gets the list of all the ports in the network from the DB, computing 3 different sets. The set from the leases file (v4_leases) will have some client ID. The set from the port DB will have None. As a result, the set subtraction does not filter out the entry, and the port's DHCP lease is constantly released whenever the VM renews its lease and any other port in the network is deleted:

https://github.com/openstack/neutron/blob/stable/pike/neutron/agent/linux/dhcp.py#L850

        v4_leases = set()
        for (k, v) in cur_leases.items():
            # IPv4 leases have a MAC, IPv6 ones do not, so we must ignore
            if netaddr.IPAddress(k).version == constants.IP_VERSION_4:
                # treat '*' as None, see note in _read_leases_file_leases()
                client_id = v['client_id']
                if client_id is '*':
                    client_id = None
                v4_leases.add((k, v['iaid'], client_id))

        new_leases = set()
        for port in self.network.ports:
            client_id = self._get_client_id(port)
            for alloc in port.fixed_ips:
                new_leases.add((alloc.ip_address, port.mac_address, client_id))

        # If an entry is in the leases or host file(s), but doesn't have
        # a fixed IP on a corresponding neutron port, consider it stale.
        entries_to_release = (v4_leases | old_leases) - new_leases
        if not entries_to_release:
            return

In one example of a released lease, the entries looked like this:

    new_leases (u'10.81.96.186', u'fa:16:3e:eb:a1:13', None)
    old_leases ('10.81.96.186', 'fa:16:3e:eb:a1:13', None)
    v4_leases ('10.81.96.186', 'fa:16:3e:eb:a1:13', '01:fa:16:3e:eb:a1:13')

Therefore entries_to_release did not have that IP and MAC filtered out. The client_id in the v4_leases entry was coming from a Windows VM, which has a bug that prevents disabling the client ID. entries_to_release in fact had some 50+ entries like that, causing a storm of DHCPRELEASE.

This can cause issues: when the VM later asks to renew its lease as the expiry period approaches (I think about halfway through), dnsmasq sends a DHCP NAK, the lease is renegotiated, and existing network connections can get disrupted. It also just causes DHCP agent to do unnecessary work, releasing a ton of leases when it technically shouldn't.

Setting the client ID in the port's DHCP extra opts is not a good solution:

1. In some cases, like Windows VMs, the client ID is advertised as the MAC by default. In fact, there is a Windows bug which prevents you from even turning this off: https://support.microsoft.com/en-us/help/3004537/dhcp-client-always-includes-option-61-in-the-dhcp-request-in-windows-8
Linux VMs don't have this on by default, when I checked, but it may be enabled in some templates unknown to users.

2. End users will usually just be deploying a VM, with the port being auto-created by Nova. They don't know or need to know about advanced networking concepts like DHCP client IDs.

3. We can't expect everyone to modify their existing app templates, or end users to make API calls, to update ports every time they deploy a VM.

So, client ID should only be enforced, and leases released, if it's actually set in the port DB's DHCP extra opts. In that case it means someone knows what they are doing, and we want to check for a mismatch. If it's None, I suspect in 99.9999% of cases the operator does not know or care about the client ID field.
2018-12-04 22:11:50 Arjun Baindur description

Old value:

DHCP agent has a really strict enforcement of client ID, which is part of the DHCP extra options. If a VM advertises a client ID, DHCP agent will automatically release its lease whenever *any* other port is updated/deleted. This happens even if no client ID is set on the port.

When reload_allocations() is called, DHCP agent parses the current leases file and the hosts file, and gets the list of all the ports in the network from the DB, computing 3 different sets. The set from the leases file (v4_leases) will have some client ID. The set from the port DB will have None. As a result, the set subtraction does not filter out the entry, and the port's DHCP lease is constantly released whenever the VM renews its lease and any other port in the network is deleted:

https://github.com/openstack/neutron/blob/stable/pike/neutron/agent/linux/dhcp.py#L850

        v4_leases = set()
        for (k, v) in cur_leases.items():
            # IPv4 leases have a MAC, IPv6 ones do not, so we must ignore
            if netaddr.IPAddress(k).version == constants.IP_VERSION_4:
                # treat '*' as None, see note in _read_leases_file_leases()
                client_id = v['client_id']
                if client_id is '*':
                    client_id = None
                v4_leases.add((k, v['iaid'], client_id))

        new_leases = set()
        for port in self.network.ports:
            client_id = self._get_client_id(port)
            for alloc in port.fixed_ips:
                new_leases.add((alloc.ip_address, port.mac_address, client_id))

        # If an entry is in the leases or host file(s), but doesn't have
        # a fixed IP on a corresponding neutron port, consider it stale.
        entries_to_release = (v4_leases | old_leases) - new_leases
        if not entries_to_release:
            return

In one example of a released lease, the entries looked like this:

    new_leases (u'10.81.96.186', u'fa:16:3e:eb:a1:13', None)
    old_leases ('10.81.96.186', 'fa:16:3e:eb:a1:13', None)
    v4_leases ('10.81.96.186', 'fa:16:3e:eb:a1:13', '01:fa:16:3e:eb:a1:13')

Therefore entries_to_release did not have that IP and MAC filtered out. The client_id in the v4_leases entry was coming from a Windows VM, which has a bug that prevents disabling the client ID. entries_to_release in fact had some 50+ entries like that, causing a storm of DHCPRELEASE.

This can cause issues: when the VM later asks to renew its lease as the expiry period approaches (I think about halfway through), dnsmasq sends a DHCP NAK, the lease is renegotiated, and existing network connections can get disrupted. It also just causes DHCP agent to do unnecessary work, releasing a ton of leases when it technically shouldn't.

Setting the client ID in the port's DHCP extra opts is not a good solution:

1. In some cases, like Windows VMs, the client ID is advertised as the MAC by default. In fact, there is a Windows bug which prevents you from even turning this off: https://support.microsoft.com/en-us/help/3004537/dhcp-client-always-includes-option-61-in-the-dhcp-request-in-windows-8
Linux VMs don't have this on by default, when I checked, but it may be enabled in some templates unknown to users.

2. End users will usually just be deploying a VM, with the port being auto-created by Nova. They don't know or need to know about advanced networking concepts like DHCP client IDs.

3. We can't expect everyone to modify their existing app templates, or end users to make API calls, to update ports every time they deploy a VM.

So, client ID should only be enforced, and leases released, if it's actually set in the port DB's DHCP extra opts. In that case it means someone knows what they are doing, and we want to check for a mismatch. If it's None, I suspect in 99.9999% of cases the operator does not know or care about the client ID field.

New value:

DHCP agent has a really strict enforcement of client ID, which is part of the DHCP extra options. If a VM advertises a client ID, DHCP agent will automatically release its lease whenever *any* other port is updated/deleted. This happens even if no client ID is set on the port.

When reload_allocations() is called, DHCP agent parses the current leases file and the hosts file, and gets the list of all the ports in the network from the DB, computing 3 different sets. The set from the leases file (v4_leases) will have some client ID. The set from the port DB will have None. As a result, the set subtraction does not filter out the entry, and the port's DHCP lease is constantly released whenever the VM renews its lease and any other port in the network is deleted:

https://github.com/openstack/neutron/blob/stable/pike/neutron/agent/linux/dhcp.py#L850

        v4_leases = set()
        for (k, v) in cur_leases.items():
            # IPv4 leases have a MAC, IPv6 ones do not, so we must ignore
            if netaddr.IPAddress(k).version == constants.IP_VERSION_4:
                # treat '*' as None, see note in _read_leases_file_leases()
                client_id = v['client_id']
                if client_id is '*':
                    client_id = None
                v4_leases.add((k, v['iaid'], client_id))

        new_leases = set()
        for port in self.network.ports:
            client_id = self._get_client_id(port)
            for alloc in port.fixed_ips:
                new_leases.add((alloc.ip_address, port.mac_address, client_id))

        # If an entry is in the leases or host file(s), but doesn't have
        # a fixed IP on a corresponding neutron port, consider it stale.
        entries_to_release = (v4_leases | old_leases) - new_leases
        if not entries_to_release:
            return

In one example of a released lease, the entries looked like this:

    new_leases (from port DB)
    (u'10.81.96.186', u'fa:16:3e:eb:a1:13', None)

    old_leases (from hosts file)
    ('10.81.96.186', 'fa:16:3e:eb:a1:13', None)

    v4_leases (from leases file - updated by dnsmasq when VM requests)
    ('10.81.96.186', 'fa:16:3e:eb:a1:13', '01:fa:16:3e:eb:a1:13')

Therefore entries_to_release did not have that IP and MAC filtered out. The client_id in the v4_leases entry was coming from a Windows VM, which has a bug that prevents disabling the client ID. entries_to_release in fact had some 50+ entries like that, causing a storm of DHCPRELEASE.

This can cause issues: when the VM later asks to renew its lease as the expiry period approaches (I think about halfway through), dnsmasq sends a DHCP NAK, the lease is renegotiated, and existing network connections can get disrupted. It also just causes DHCP agent to do unnecessary work, releasing a ton of leases when it technically shouldn't.

Setting the client ID in the port's DHCP extra opts is not a good solution:

1. In some cases, like Windows VMs, the client ID is advertised as the MAC by default. In fact, there is a Windows bug which prevents you from even turning this off: https://support.microsoft.com/en-us/help/3004537/dhcp-client-always-includes-option-61-in-the-dhcp-request-in-windows-8
Linux VMs don't have this on by default, when I checked, but it may be enabled in some templates unknown to users.

2. End users will usually just be deploying a VM, with the port being auto-created by Nova. They don't know or need to know about advanced networking concepts like DHCP client IDs.

3. We can't expect everyone to modify their existing app templates, or end users to make API calls, to update ports every time they deploy a VM.

So, client ID should only be enforced, and leases released, if it's actually set in the port DB's DHCP extra opts. In that case it means someone knows what they are doing, and we want to check for a mismatch. If it's None, I suspect in 99.9999% of cases the operator does not know or care about the client ID field.
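
The mismatch described in the entry above can be reproduced with plain Python sets, using the example tuples quoted in the description. This is only a sketch: the helper stale_entries() is hypothetical and is not the neutron code or the fix that was eventually merged. It merely illustrates the proposed rule of comparing the client ID only when the port actually sets one.

```python
# Tuples are (ip_address, mac_address, client_id), taken from the example
# in the bug description.
new_leases = {("10.81.96.186", "fa:16:3e:eb:a1:13", None)}  # from port DB
old_leases = {("10.81.96.186", "fa:16:3e:eb:a1:13", None)}  # from hosts file
v4_leases = {
    ("10.81.96.186", "fa:16:3e:eb:a1:13", "01:fa:16:3e:eb:a1:13"),
}  # from leases file

# Behaviour quoted in the report: tuples are compared as a whole, so the
# leases-file entry (which carries a client ID) is never subtracted out.
entries_to_release = (v4_leases | old_leases) - new_leases
print(entries_to_release)  # still contains the v4_leases tuple


def stale_entries(current, wanted):
    """Hypothetical helper, not neutron code.

    An entry is stale if its (ip, mac) pair has no matching port, or if
    the port sets a client ID and the entry's client ID differs from it.
    A port client ID of None means "do not enforce".
    """
    wanted_by_ip_mac = {(ip, mac): cid for ip, mac, cid in wanted}
    stale = set()
    for ip, mac, cid in current:
        if (ip, mac) not in wanted_by_ip_mac:
            stale.add((ip, mac, cid))
            continue
        wanted_cid = wanted_by_ip_mac[(ip, mac)]
        if wanted_cid is not None and wanted_cid != cid:
            stale.add((ip, mac, cid))
    return stale


# With the relaxed rule, nothing is released for this VM.
print(stale_entries(v4_leases | old_leases, new_leases))
```

With the relaxed comparison, a port that sets no client ID keeps its lease regardless of what the VM advertises, while a port that does set one is still checked for a mismatch.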
2018-12-05 02:00:56 OpenStack Infra neutron: status New In Progress
2018-12-05 14:42:13 Bence Romsics neutron: status In Progress Incomplete
2018-12-05 20:10:07 OpenStack Infra neutron: status Incomplete In Progress
2018-12-06 15:14:59 Bence Romsics neutron: status In Progress Triaged
2018-12-06 15:15:08 Bence Romsics neutron: importance Undecided Medium
2018-12-07 22:04:36 OpenStack Infra neutron: status Triaged In Progress
2018-12-10 16:55:39 Blake Covarrubias bug added subscriber Blake Covarrubias
2019-01-18 18:59:23 OpenStack Infra neutron: assignee Arjun Baindur (abaindur) Brian Haley (brian-haley)
2019-03-14 19:08:21 OpenStack Infra neutron: status In Progress Fix Released
2019-03-21 20:00:29 OpenStack Infra tags l3-ipam-dhcp in-stable-queens l3-ipam-dhcp
2019-03-22 23:23:57 OpenStack Infra tags in-stable-queens l3-ipam-dhcp in-stable-queens in-stable-rocky l3-ipam-dhcp
2019-03-26 04:02:25 OpenStack Infra tags in-stable-queens in-stable-rocky l3-ipam-dhcp in-stable-pike in-stable-queens in-stable-rocky l3-ipam-dhcp