protect for dhcp agent cache out of sync with neutron server

Bug #1645835 reported by zjf
This bug affects 3 people
Affects: neutron
Status: Fix Released
Importance: Undecided
Assigned to: Antonio Ojea
Milestone: (none)

Bug Description

If the DHCP agent's port cache is out of sync with the neutron server, a new VM may not get its IP. When the DHCP agent executes the port_create_end method, the port's IP should be checked before being used.

The scenario might be:
1) Create a VM; the neutron server notifies the DHCP agent with port_create_end.
2) Destroy this VM; the neutron server notifies the DHCP agent with port_delete_end.
   If the port_delete_end RPC message is lost at this step, the DHCP agent cache will be out of sync with the neutron server. The port will still exist in the DHCP agent cache, and dnsmasq will still have the port's mac-host-ip record in its host file.
3) Create a new VM; the neutron server may allocate the IP that is still cached in the DHCP agent. This VM will not get an IP when it starts, because dnsmasq has two records for this IP.
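
The three steps above can be replayed with a small stand-alone model. This is only a sketch: the dictionary-based cache, the helper names and the "mac,hostname,ip" line layout are assumptions made for illustration and are not taken from the neutron code base.

```python
from collections import Counter

# Hypothetical model of the agent-side port cache; this is not neutron code.


def render_host_lines(cache):
    """Render one 'mac,hostname,ip' line per cached port (illustrative layout)."""
    return [
        f"{port['mac']},host-{port['ip'].replace('.', '-')},{port['ip']}"
        for port in cache.values()
    ]


def duplicated_ips(cache):
    """Return the IPs that appear on more than one cached port."""
    counts = Counter(port["ip"] for port in cache.values())
    return sorted(ip for ip, seen in counts.items() if seen > 1)


cache = {}

# Step 1: port_create_end for the first VM arrives and the port is cached.
cache["port-a"] = {"mac": "fa:16:3e:00:00:01", "ip": "10.0.0.5"}

# Step 2: the VM is destroyed, but port_delete_end is lost,
# so the agent never removes "port-a" from its cache.

# Step 3: the server reuses 10.0.0.5 for a new port and sends port_create_end.
cache["port-b"] = {"mac": "fa:16:3e:00:00:02", "ip": "10.0.0.5"}

host_lines = render_host_lines(cache)
dupes = duplicated_ips(cache)
```

After step 3 the model holds two host records for the same address, which is the state in which the new VM fails to obtain its lease.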

tags: added: l3-ipam-dhcp
zjf (zjf)
summary: - dhcp agent dosen't update port cache after database recovery
+ protect for dhcp agent cache out of sync with neutron server
zjf (zjf)
description: updated
Ihar Hrachyshka (ihar-hrachyshka) wrote :

I am not sure if the scenario is real with the message bus we use. Do you see it happening in real life?

tags: added: needs-attention
zjf (zjf) wrote :

RPC message loss has occurred before. In this scenario, losing an RPC message is just one way for the DHCP agent port cache to get out of sync with the neutron server. I think the DHCP agent should protect itself when its port cache is out of sync with the neutron server.

ZongmingZuo (windtalkers) wrote :

Half a month ago I ran into this problem and restarted the DHCP agent to work around it. However, I don't think the DHCP agent cache needs to provide this port protection; you can instead choose not to use the no-ping option in the dnsmasq configuration.

Antonio Ojea (aojea)
Changed in neutron:
assignee: nobody → Antonio Ojea (itsuugo)
Antonio Ojea (aojea) wrote :

I have a similar problem, and indeed restarting the DHCP agent solves it, but that is not a solution you can use routinely in production environments.

The problem I'm facing is that some applications are constantly creating neutron ports with deviceowner:floatingip in the same subnet. That port creation fails, and analyzing the RPC traffic we can see that the DHCP agent only receives the "port_create_end" RPC cast but never receives a "port_delete_end".

After some time, the applications start to create duplicate IP assignments that are stored in the DHCP cache but not in neutron. These duplicate IPs are written to the dnsmasq host files, causing the behavior explained in https://bugs.launchpad.net/neutron/+bug/1732456

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/626641

Changed in neutron:
status: New → In Progress
Changed in neutron:
assignee: Antonio Ojea (itsuugo) → Darragh O'Reilly (darragh-oreilly)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/629676

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/629677

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/629678

Changed in neutron:
assignee: Darragh O'Reilly (darragh-oreilly) → Antonio Ojea (itsuugo)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/626641
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ffbf65126fcd25ca242e5512b6e30cb7a1b4ef37
Submitter: Zuul
Branch: master

commit ffbf65126fcd25ca242e5512b6e30cb7a1b4ef37
Author: aojeagarcia <email address hidden>
Date: Thu Dec 20 19:49:56 2018 +0100

    protect DHCP agent cache out of sync

    If DHCP agent port cache is out of sync with neutron server, dnsmasq
    entries are wrong and VMs may not acquire an IP because of duplicate
    entries.

    When DHCP agent executes port_create_end method, port's
    IP should be checked before being used, if there are duplicate IP
    addresses in the same network in the cache we should resync.

    Co-Authored-By: <email address hidden>
    Closes-Bug: #1645835

    Change-Id: Icc555050283420fddfb90bb67e02bc303e989e27
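
The check described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the merged patch: the function names, the list-based cache and the `schedule_resync` callback are hypothetical, and only the "fixed_ips" / "ip_address" port layout is assumed from the usual neutron port dictionary.

```python
def find_ip_conflict(cached_ports, new_port):
    """Return a cached port that shares an IP with new_port, or None."""
    new_ips = {fixed["ip_address"] for fixed in new_port["fixed_ips"]}
    for cached in cached_ports:
        if cached["id"] == new_port["id"]:
            # The same port being re-announced is not a conflict.
            continue
        cached_ips = {fixed["ip_address"] for fixed in cached["fixed_ips"]}
        if new_ips & cached_ips:
            return cached
    return None


def handle_port_create_end(cached_ports, new_port, schedule_resync):
    """Cache new_port unless its IP is already held by a different cached port.

    schedule_resync is a hypothetical callback standing in for whatever
    mechanism the agent uses to rebuild its cache from the server.
    """
    conflict = find_ip_conflict(cached_ports, new_port)
    if conflict is not None:
        schedule_resync(f"duplicate IP with cached port {conflict['id']}")
        return False
    cached_ports.append(new_port)
    return True
```

Under these assumptions, a stale cached port holding the same address makes the handler request a resync instead of adding a second record for that IP.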

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/632085

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/629677
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c5a1214ca656ea872406a35d40763c69d93e37b9
Submitter: Zuul
Branch: stable/pike

commit c5a1214ca656ea872406a35d40763c69d93e37b9
Author: aojeagarcia <email address hidden>
Date: Thu Dec 20 19:49:56 2018 +0100

    protect DHCP agent cache out of sync

    If DHCP agent port cache is out of sync with neutron server, dnsmasq
    entries are wrong and VMs may not acquire an IP because of duplicate
    entries.

    When DHCP agent executes port_create_end method, port's
    IP should be checked before being used, if there are duplicate IP
    addresses in the same network in the cache we should resync.

    Co-Authored-By: <email address hidden>
    Closes-Bug: #1645835

    Change-Id: Icc555050283420fddfb90bb67e02bc303e989e27

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ocata)

Reviewed: https://review.openstack.org/632085
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a5737eb531771771478e62fc04443f90a63e8e58
Submitter: Zuul
Branch: stable/ocata

commit a5737eb531771771478e62fc04443f90a63e8e58
Author: aojeagarcia <email address hidden>
Date: Thu Dec 20 19:49:56 2018 +0100

    protect DHCP agent cache out of sync

    If DHCP agent port cache is out of sync with neutron server, dnsmasq
    entries are wrong and VMs may not acquire an IP because of duplicate
    entries.

    When DHCP agent executes port_create_end method, port's
    IP should be checked before being used, if there are duplicate IP
    addresses in the same network in the cache we should resync.

    Co-Authored-By: <email address hidden>
    Closes-Bug: #1645835

    Change-Id: Icc555050283420fddfb90bb67e02bc303e989e27
    (cherry picked from commit ffbf65126fcd25ca242e5512b6e30cb7a1b4ef37)

tags: added: in-stable-ocata
tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/629676
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8294bcf92e7a27f92173379afd39cac3c42864f6
Submitter: Zuul
Branch: stable/rocky

commit 8294bcf92e7a27f92173379afd39cac3c42864f6
Author: aojeagarcia <email address hidden>
Date: Thu Dec 20 19:49:56 2018 +0100

    protect DHCP agent cache out of sync

    If DHCP agent port cache is out of sync with neutron server, dnsmasq
    entries are wrong and VMs may not acquire an IP because of duplicate
    entries.

    When DHCP agent executes port_create_end method, port's
    IP should be checked before being used, if there are duplicate IP
    addresses in the same network in the cache we should resync.

    Co-Authored-By: <email address hidden>
    Closes-Bug: #1645835

    Change-Id: Icc555050283420fddfb90bb67e02bc303e989e27
    (cherry picked from commit ffbf65126fcd25ca242e5512b6e30cb7a1b4ef37)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/629678
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=88528d191f79896afc96fe2d679c6214d1768cec
Submitter: Zuul
Branch: stable/queens

commit 88528d191f79896afc96fe2d679c6214d1768cec
Author: aojeagarcia <email address hidden>
Date: Thu Dec 20 19:49:56 2018 +0100

    protect DHCP agent cache out of sync

    If DHCP agent port cache is out of sync with neutron server, dnsmasq
    entries are wrong and VMs may not acquire an IP because of duplicate
    entries.

    When DHCP agent executes port_create_end method, port's
    IP should be checked before being used, if there are duplicate IP
    addresses in the same network in the cache we should resync.

    Co-Authored-By: <email address hidden>
    Closes-Bug: #1645835

    Change-Id: Icc555050283420fddfb90bb67e02bc303e989e27
    (cherry picked from commit ffbf65126fcd25ca242e5512b6e30cb7a1b4ef37)

tags: added: in-stable-queens
tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.0.0b2

This issue was fixed in the openstack/neutron 14.0.0.0b2 development milestone.

tags: removed: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.7

This issue was fixed in the openstack/neutron 11.0.7 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.3

This issue was fixed in the openstack/neutron 13.0.3 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.6

This issue was fixed in the openstack/neutron 12.0.6 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron ocata-eol

This issue was fixed in the openstack/neutron ocata-eol release.
