L3 DVR ARP population gets incorrect MAC address in some cases

Bug #1869887 reported by Slawek Kaplonski
This bug affects 3 people
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Slawek Kaplonski

Bug Description

The L3 DVR router sets permanent ARP entries in the qrouter namespace for all ports plugged into the subnets that are connected to the router.
In most cases this is fine, but because it uses the MAC address stored in the Neutron DB for each port (which is fine in general), it can cause connectivity problems under specific conditions.

This happens, for example, with Octavia. Octavia creates unbound ports only to allocate an IP address for its VIP in Neutron's DB, and then sets this IP address in the allowed_address_pairs of other ports, which are plugged into the Amphora VMs.
In the DVR case, however, that IP address is populated in the ARP cache with the MAC address of the unbound port itself, so it does not work when the IP is configured as an additional address on an interface with a different MAC.

Octavia is only the most common known example of this use case; we know of other users who do something similar with keepalived on their instances.

Since this additional port is always "unbound", and "unbound" means that the port is basically just an entry in the Neutron DB, I think there is no need to set it in the ARP cache. Only bound ports should be set there.
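The proposal above can be illustrated with a minimal sketch. This is not the Neutron code: the function names, the sample ports, and the simplified port dictionaries are all hypothetical, and the sketch assumes that a port's binding state is exposed under a `binding:vif_type` key with the values "unbound" and "binding_failed" marking ports that are not bound.

```python
# Hypothetical, simplified model of the ARP population step.
# Assumption: each port dict carries "binding:vif_type"; the values
# "unbound" and "binding_failed" mark ports that exist only in the DB.
NOT_BOUND = {"unbound", "binding_failed"}


def ports_for_arp_cache(ports):
    """Keep only ports that are actually bound to a host.

    A port without a "binding:vif_type" key is kept, because this sketch
    only excludes ports explicitly marked as not bound.
    """
    return [p for p in ports if p.get("binding:vif_type") not in NOT_BOUND]


def arp_entries(ports, only_bound=True):
    """Build an IP -> MAC mapping like the router's permanent ARP entries."""
    selected = ports_for_arp_cache(ports) if only_bound else ports
    entries = {}
    for port in selected:
        for fixed_ip in port["fixed_ips"]:
            entries[fixed_ip["ip_address"]] = port["mac_address"]
    return entries


ports = [
    {   # Port created only to reserve the VIP address; never bound.
        "mac_address": "fa:16:3e:00:00:01",
        "fixed_ips": [{"ip_address": "10.0.0.5"}],
        "binding:vif_type": "unbound",
    },
    {   # Port plugged into the VM that really answers for the VIP.
        "mac_address": "fa:16:3e:00:00:02",
        "fixed_ips": [{"ip_address": "10.0.0.11"}],
        "allowed_address_pairs": [{"ip_address": "10.0.0.5"}],
        "binding:vif_type": "ovs",
    },
]

before = arp_entries(ports, only_bound=False)  # VIP pinned to the wrong MAC
after = arp_entries(ports)                     # VIP left out of the cache
```

In `before`, the VIP 10.0.0.5 is pinned to the MAC of the unbound port, which no interface uses. In `after`, the VIP has no permanent entry at all, so the router is no longer forced to use the wrong MAC for it.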

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/716302

Changed in neutron:
status: Confirmed → In Progress
Oleg Bondarev (obondarev) wrote :

Are there any actual occurrences of this bug in the field? I agree that there is no need to set unbound ports in the ARP cache, but we need to estimate the possible performance degradation caused by the additional check/DB query.

Slawek Kaplonski (slaweq) wrote :

@Oleg: actually, we have this issue right now on a deployment with Octavia and DVR. That's how we found it.
IMO the performance impact will not be big. If you look at my patch, you can see that dvr_mac_db.get_ports_query_by_subnet_and_ip(context, subnet_id) already returns port bindings, so no new DB query is needed. There is some filtering of ports there, but as this is done only when plugging a new internal network into the router, I don't think it will have a big impact on performance. If performance does turn out to be impacted, maybe we should modify this code and filter out those unbound ports at the DB level?
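As a rough illustration of what filtering at the DB level could look like, here is a self-contained sketch using Python's built-in sqlite3 module. It does not use Neutron's schema or query helpers: the `ports` and `bindings` tables, their columns, and the function names are hypothetical and exist only to show the unbound ports being excluded in the query itself rather than in Python afterwards.

```python
import sqlite3

# vif_type values treated as "not bound", as named in this bug report.
NOT_BOUND = ("unbound", "binding_failed")


def make_demo_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ports (
            id TEXT PRIMARY KEY,
            subnet_id TEXT,
            ip_address TEXT,
            mac_address TEXT
        );
        CREATE TABLE bindings (
            port_id TEXT REFERENCES ports(id),
            vif_type TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO ports VALUES (?, ?, ?, ?)",
        [
            ("vip-port", "subnet-1", "10.0.0.5", "fa:16:3e:00:00:01"),
            ("vm-port", "subnet-1", "10.0.0.11", "fa:16:3e:00:00:02"),
        ],
    )
    conn.executemany(
        "INSERT INTO bindings VALUES (?, ?)",
        [("vip-port", "unbound"), ("vm-port", "ovs")],
    )
    return conn


def bound_ports_on_subnet(conn, subnet_id):
    """Return (ip, mac) pairs for bound ports only, filtered in SQL."""
    query = (
        "SELECT p.ip_address, p.mac_address "
        "FROM ports AS p JOIN bindings AS b ON b.port_id = p.id "
        "WHERE p.subnet_id = ? AND b.vif_type NOT IN (?, ?) "
        "ORDER BY p.ip_address"
    )
    return conn.execute(query, (subnet_id, *NOT_BOUND)).fetchall()


conn = make_demo_db()
rows = bound_ports_on_subnet(conn, "subnet-1")
```

Because the join on the binding information is already part of the query, adding the `NOT IN` condition does not require a second query; it only narrows the rows that are returned.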

Oleg Bondarev (obondarev) wrote :

Ah, I missed that no new DB query is needed, my bad. Thanks for the clarification!

Changed in neutron:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/716302
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=eb775458c6da57426703289c7b969caddb83d677
Submitter: Zuul
Branch: master

commit eb775458c6da57426703289c7b969caddb83d677
Author: Slawek Kaplonski <email address hidden>
Date: Tue Mar 31 05:33:06 2020 +0200

    [DVR] Don't populate unbound ports in router's ARP cache

    When user is using keepalived on their instances, he often creates
    additional port in Neutron to allocate some IP address which will
    be then used as VIP in keepalived and will be configured in
    allowed_address_pair of other ports plugged to instances with
    keepalived.
    This is e.g. Octavia's use case.

    This together with DVR caused problems with connectivity to such VIP
    as it was populated in router's arp cache with MAC address from
    Neutron db.

    As this port isn't bound, it is only Neutron db entry so there is no
    need to set it in arp cache of the router.
    This patch is doing exactly that to filter such "unbound" and
    "binding_failed" ports from the list.

    Change-Id: Ia885ce00dbb5f2968859e8d0850bc511016f0846
    Closes-Bug: #1869887

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/717315

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/717317

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/717319

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/717321

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/717315
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=141bbe5e2f07a8c0015fa177319428d66c330e2b
Submitter: Zuul
Branch: stable/train

commit 141bbe5e2f07a8c0015fa177319428d66c330e2b
Author: Slawek Kaplonski <email address hidden>
Date: Tue Mar 31 05:33:06 2020 +0200

    [DVR] Don't populate unbound ports in router's ARP cache

    Change-Id: Ia885ce00dbb5f2968859e8d0850bc511016f0846
    Closes-Bug: #1869887
    (cherry picked from commit eb775458c6da57426703289c7b969caddb83d677)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/717321
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=09aaa520f00af4498610d91a64e8b37988168312
Submitter: Zuul
Branch: stable/queens

commit 09aaa520f00af4498610d91a64e8b37988168312
Author: Slawek Kaplonski <email address hidden>
Date: Tue Mar 31 05:33:06 2020 +0200

    [DVR] Don't populate unbound ports in router's ARP cache

    Conflicts:
        neutron/tests/unit/db/test_l3_dvr_db.py

    Change-Id: Ia885ce00dbb5f2968859e8d0850bc511016f0846
    Closes-Bug: #1869887
    (cherry picked from commit eb775458c6da57426703289c7b969caddb83d677)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/717317
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=914cd7e15a996d13b539e660da58f055007b9718
Submitter: Zuul
Branch: stable/stein

commit 914cd7e15a996d13b539e660da58f055007b9718
Author: Slawek Kaplonski <email address hidden>
Date: Tue Mar 31 05:33:06 2020 +0200

    [DVR] Don't populate unbound ports in router's ARP cache

    Change-Id: Ia885ce00dbb5f2968859e8d0850bc511016f0846
    Closes-Bug: #1869887
    (cherry picked from commit eb775458c6da57426703289c7b969caddb83d677)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/717319
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=31f7c01f97cba9d4bf9648cd0f987b4025b39434
Submitter: Zuul
Branch: stable/rocky

commit 31f7c01f97cba9d4bf9648cd0f987b4025b39434
Author: Slawek Kaplonski <email address hidden>
Date: Tue Mar 31 05:33:06 2020 +0200

    [DVR] Don't populate unbound ports in router's ARP cache

    Conflicts:
        neutron/tests/unit/db/test_l3_dvr_db.py

    Change-Id: Ia885ce00dbb5f2968859e8d0850bc511016f0846
    Closes-Bug: #1869887
    (cherry picked from commit eb775458c6da57426703289c7b969caddb83d677)

tags: added: in-stable-rocky
tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron queens-eol

This issue was fixed in the openstack/neutron queens-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron rocky-eol

This issue was fixed in the openstack/neutron rocky-eol release.
