Very often in DHCP agent we have ""Duplicate IP addresses found, DHCP cache is out of sync"

Bug #1828423 reported by Slawek Kaplonski
22
This bug affects 4 people
Affects Status Importance Assigned to Milestone
neutron
Fix Released
Medium
Slawek Kaplonski

Bug Description

As described in comment https://bugs.launchpad.net/neutron/+bug/1760047/comments/24 it looks that after patch https://review.opendev.org/#/c/626830/ we have a lot of messages like "Duplicate IP addresses found, DHCP cache is out of sync" in neutron dhcp agent's logs.

To reproduce this it is enough to create new network, subnet and port in this network. It will then be allocated to dhcp agent and in logs there will be such warning.

Changed in neutron:
importance: Undecided → Low
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :
Download full text (4.4 KiB)

Hi:

I tried to (in this order):
- Create a network
- Create a subnet

I've tested a bit adding more log information. I've seen that, when the event "network create" is processed [1], tries to retrieve the network information, NOT from the cache but from the RPC client [2]. Then in [3], this new information (network, subnet and port) is added to the cache (debug log added at the beginning of [4]):

WARNING neutron.agent.dhcp.agent [-] RAH - put network: _ns_name=qdhcp-a46c2769-fd1c-49f5-889f-6f019b727a48, admin_state_up=True, availability_zone_hints=[], availability_zones=['nova'], created_at=2019-05-20T11:20:38Z, description=, id=a46c2769-fd1c-49f5-889f-6f019b727a48, ipv4_address_scope=None, ipv6_address_scope=None, mtu=1500, name=net9, non_local_subnets=[], port_security_enabled=True, ports=[{'id': '3ba19af8-26fd-481e-bc60-2e294c74c7d4', 'name': '', 'network_id': 'a46c2769-fd1c-49f5-889f-6f019b727a48', 'tenant_id': '159b38ee91dc46a59a40536b26cc4233', 'mac_address': 'fa:16:3e:07:1f:a8', 'admin_state_up': True, 'status': 'ACTIVE', 'fixed_ips': [{'subnet_id': '38695b5e-ad25-4993-a851-8e1e18b053cf', 'ip_address': '10.103.0.2', 'subnet': {'id': '38695b5e-ad25-4993-a851-8e1e18b053cf', 'name': 'subnet3_a', 'tenant_id': '159b38ee91dc46a59a40536b26cc4233', 'network_id': 'a46c2769-fd1c-49f5-889f-6f019b727a48', 'ip_version': 4, 'subnetpool_id': None, 'enable_dhcp': True, 'ipv6_ra_mode': None, 'ipv6_address_mode': None, 'gateway_ip': '10.103.0.1', 'cidr': '10.103.0.0/24', 'allocation_pools': [{'start': '10.103.0.2', 'end': '10.103.0.254'}], 'host_routes': [], 'dns_nameservers': [], 'shared': False, 'description': '', 'service_types': [], 'tags': [], 'created_at': '2019-05-20T11:20:45Z', 'updated_at': '2019-05-20T11:20:45Z', 'revision_number': 0, 'project_id': '159b38ee91dc46a59a40536b26cc4233'}}], 'device_id': 'dhcp95cf1ac9-7fbf-5be6-bc93-e3471f4e5b33-a46c2769-fd1c-49f5-889f-6f019b727a48', 'device_owner': 'network:dhcp', 'allowed_address_pairs': [], 'extra_dhcp_opts': [], 'security_groups': [], 'description': '', 'binding:vnic_type': 'normal', 'binding:profile': {}, 'binding:host_id': 'osdev18', 'binding:vif_type': 'ovs', 'binding:vif_details': {'port_filter': True, 'ovs_hybrid_plug': False, 'datapath_type': 'system', 'bridge_name': 'br-int'}, 'port_security_enabled': False, 'qos_policy_id': None, 'resource_request': None, 'tags': [], 'created_at': '2019-05-20T11:20:46Z', 'updated_at': '2019-05-20T11:20:49Z', 'revision_number': 4, 'project_id': '159b38ee91dc46a59a40536b26cc4233'}], project_id=159b38ee91dc46a59a40536b26cc4233, provider:network_type=vlan, provider:physical_network=default, provider:segmentation_id=1072, qos_policy_id=None, revision_number=3, router:external=False, shared=False, status=ACTIVE, subnets=[{'id': '38695b5e-ad25-4993-a851-8e1e18b053cf', 'name': 'subnet3_a', 'tenant_id': '159b38ee91dc46a59a40536b26cc4233', 'network_id': 'a46c2769-fd1c-49f5-889f-6f019b727a48', 'ip_version': 4, 'subnetpool_id': None, 'enable_dhcp': True, 'ipv6_ra_mode': None, 'ipv6_address_mode': None, 'gateway_ip': '10.103.0.1', 'cidr': '10.103.0.0/24', 'allocation_pools': [{'start': '10.103.0.2', 'end': '10.103.0.254'}], 'host_routes': [], 'dns_nameservers': []...

Read more...

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Rodolfo, thx for detailed analysis of this issue.
I agree that we should handle correctly ports added to the cache when the network is created" as You said. But isn't it what https://review.opendev.org/#/c/658307/ is doing? What else You would like to do here instead of just better checking if cache is out of sync?

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hi Slaweq:

The matter of [1] is to point out the unneeded number of times we need to resync the DHCP agent. You took the second option I commented in c#2. Maybe I prefer the first one, avoiding to introduce data for other resources (port) when a network is created.

For the records: the present bug can be solved in [2].

[1] https://bugs.launchpad.net/neutron/+bug/1760047/comments/24
[2] https://review.opendev.org/#/c/658307

Revision history for this message
Fred S (fsbiz) wrote :
Download full text (9.1 KiB)

This fix helps and removes the semaphore that was present in the prior code but it is not fixing the root cause of the problem unfortunately.

Multiple port update messages are serialized to a Priority Queue. The messages are then drained serially without the code yielding. The end result still looks similar (somewhat better than without the fix).

The below is an output from my /var/log/dnsmasq.log when 8 baremetal nodes are provisioned simultaneously.
As can be seen dnsmasq could not process DHCP packets from 23:36:00 to 23:37:33 (over 93 seconds). This leads to the PXE boots on some of the baremetals timing out.

Oct 7 23:36:00 stg-sc-control01 dnsmasq-dhcp[2328509]: DHCPACK(ns-7f4ce6ff-2a) 10.33.27.101 6c:b3:11:4f:87:40 host-10-33-27-101
Oct 7 23:36:26 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/host
Oct 7 23:36:26 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/opts
Oct 7 23:36:29 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/host
Oct 7 23:36:29 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/opts
Oct 7 23:36:32 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/host
Oct 7 23:36:32 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/opts
Oct 7 23:36:34 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/host
Oct 7 23:36:34 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/opts
Oct 7 23:36:36 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/host
Oct 7 23:36:36 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/opts
Oct 7 23:36:38 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/host
Oct 7 23:36:38 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/opts
Oct 7 23:36:40 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/host
Oct 7 23:36:40 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/opts
Oct 7 23:36:42 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/host
Oct 7 23:36:42 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/opts
Oct 7 23:36:44 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/host
Oct 7 23:36:44 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/opts
Oct 7 23:36:46 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/host
Oct 7 23:36:46 stg-sc-control01 dnsmasq-dhcp[2328509]: read /var/...

Read more...

Changed in neutron:
importance: Low → Medium
Revision history for this message
Oleg Bondarev (obondarev) wrote :

Similar to my comment https://bugs.launchpad.net/neutron/+bug/1862315/comments/3 - if IP is reused quickly, we may get such "Duplicate IP addresses" messages due to lower priority of port delete events than port create events.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/707077

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/707077
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a0bb5763b277d7360554fd64813fdce01244d2fa
Submitter: Zuul
Branch: master

commit a0bb5763b277d7360554fd64813fdce01244d2fa
Author: Oleg Bondarev <email address hidden>
Date: Tue Feb 11 12:18:09 2020 +0400

    dhcp-agent: equalize port create_low/update/delete priority

    Low port delete priority may lead to duplicate entries in network
    cache if IPs are reused frequently.
    Also can't find a strict reason why it should be of lower priority.

    Change-Id: I55f858d50e636eb9091570b256380330b9ce9cb3
    Related-bug: #1862315
    Related-bug: #1828423

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/708122

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/708123

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/708124

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/708125

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/708125
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=de29b49b132b84f07329780958b503edbd5c77f6
Submitter: Zuul
Branch: stable/queens

commit de29b49b132b84f07329780958b503edbd5c77f6
Author: Oleg Bondarev <email address hidden>
Date: Tue Feb 11 12:18:09 2020 +0400

    dhcp-agent: equalize port create_low/update/delete priority

    Low port delete priority may lead to duplicate entries in network
    cache if IPs are reused frequently.
    Also can't find a strict reason why it should be of lower priority.

    Change-Id: I55f858d50e636eb9091570b256380330b9ce9cb3
    Related-bug: #1862315
    Related-bug: #1828423
    (cherry picked from commit a0bb5763b277d7360554fd64813fdce01244d2fa)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/708124
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9499467caedfd2b30d2514f15b41cbde6e8a9fe6
Submitter: Zuul
Branch: stable/rocky

commit 9499467caedfd2b30d2514f15b41cbde6e8a9fe6
Author: Oleg Bondarev <email address hidden>
Date: Tue Feb 11 12:18:09 2020 +0400

    dhcp-agent: equalize port create_low/update/delete priority

    Low port delete priority may lead to duplicate entries in network
    cache if IPs are reused frequently.
    Also can't find a strict reason why it should be of lower priority.

    Change-Id: I55f858d50e636eb9091570b256380330b9ce9cb3
    Related-bug: #1862315
    Related-bug: #1828423
    (cherry picked from commit a0bb5763b277d7360554fd64813fdce01244d2fa)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/708122
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e894904a7d8dbb0e01d9c4d18b524de4b436ead3
Submitter: Zuul
Branch: stable/train

commit e894904a7d8dbb0e01d9c4d18b524de4b436ead3
Author: Oleg Bondarev <email address hidden>
Date: Tue Feb 11 12:18:09 2020 +0400

    dhcp-agent: equalize port create_low/update/delete priority

    Low port delete priority may lead to duplicate entries in network
    cache if IPs are reused frequently.
    Also can't find a strict reason why it should be of lower priority.

    Change-Id: I55f858d50e636eb9091570b256380330b9ce9cb3
    Related-bug: #1862315
    Related-bug: #1828423
    (cherry picked from commit a0bb5763b277d7360554fd64813fdce01244d2fa)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/708123
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5a0c3e1fddd9863ff4d515e153ec67c0c3bbcb22
Submitter: Zuul
Branch: stable/stein

commit 5a0c3e1fddd9863ff4d515e153ec67c0c3bbcb22
Author: Oleg Bondarev <email address hidden>
Date: Tue Feb 11 12:18:09 2020 +0400

    dhcp-agent: equalize port create_low/update/delete priority

    Low port delete priority may lead to duplicate entries in network
    cache if IPs are reused frequently.
    Also can't find a strict reason why it should be of lower priority.

    Change-Id: I55f858d50e636eb9091570b256380330b9ce9cb3
    Related-bug: #1862315
    Related-bug: #1828423
    (cherry picked from commit a0bb5763b277d7360554fd64813fdce01244d2fa)

tags: added: in-stable-stein
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Can we close this bug as resolved now? Or do we need anything else regarding to it?

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I think that with https://review.opendev.org/c/openstack/neutron/+/773160 we can close this one as it should be fixed now.

Changed in neutron:
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.