ARP entries dropped by DVR routers when the qr device is not ready or present

Bug #1501086 reported by Swaminathan Vasudevan
This bug affects 3 people
Affects: neutron
Status: Fix Released
Importance: Undecided
Assigned to: Swaminathan Vasudevan
Milestone: (none listed)

Bug Description

The ARP entries are dropped by DVR routers when the 'qr' device does not exist in the namespace.

The L3 agent updates ARP entries in two ways.
First, when an internal csnat port is created, 'dvr_local_router' adds the ARP entries by calling "set_subnet_arp_info", which in turn calls "_update_arp_entry".

Second, when an ARP update RPC message arrives at the agent from the server as "add_arp_entry" or "delete_arp_entry", which in turn calls "_update_arp_entry".

We have seen log traces showing that the ARP update message arrives before the "qr" device is ready, so those ARP messages are dropped.

We need to cache those ARP messages and update the router namespace once the "qr" device is ready.
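A minimal sketch of the caching idea, for illustration only. This is not the neutron code or the merged patch: the classes PendingArpCache and FakeRouter, their methods, and the IP/MAC values in the usage note are all hypothetical. The sketch only shows an ARP operation being held while the device is missing and replayed once the device is added.

```python
from collections import defaultdict


class PendingArpCache:
    """Hold ARP operations that arrived before the qr device existed."""

    def __init__(self):
        # device name -> list of (operation, ip, mac) in arrival order
        self._pending = defaultdict(list)

    def add(self, device, operation, ip, mac):
        self._pending[device].append((operation, ip, mac))

    def pop_all(self, device):
        # Return and forget everything queued for this device.
        return self._pending.pop(device, [])


class FakeRouter:
    """Hypothetical stand-in for a DVR local router; not neutron code."""

    def __init__(self):
        self.devices = set()   # qr devices present in the namespace
        self.arp_table = {}    # (device, ip) -> mac
        self.cache = PendingArpCache()

    def update_arp_entry(self, device, operation, ip, mac):
        # Instead of dropping the entry when the device is missing,
        # keep it until the device shows up.
        if device not in self.devices:
            self.cache.add(device, operation, ip, mac)
            return False
        self._apply(device, operation, ip, mac)
        return True

    def _apply(self, device, operation, ip, mac):
        if operation == "add":
            self.arp_table[(device, ip)] = mac
        elif operation == "delete":
            self.arp_table.pop((device, ip), None)

    def internal_network_added(self, device):
        # The device is ready now: replay cached operations in order.
        self.devices.add(device)
        for operation, ip, mac in self.cache.pop_all(device):
            self._apply(device, operation, ip, mac)
```

With this sketch, calling update_arp_entry("qr-b672ffde-cd", "add", "10.0.0.5", "fa:16:3e:00:00:01") before the device exists returns False and leaves the table empty; a later internal_network_added("qr-b672ffde-cd") applies the cached entry.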

As the message below shows, we check for the device and log a warning that it is not ready, but the ARP entries are not saved anywhere. They are dropped.

2015-09-24 18:45:30.150 WARNING neutron.agent.l3.dvr_local_router [req-0565ce3a-905d-43fa-a6f3-1a07df6c6c2b None None] Arp operation add failed for device qr-b672ffde-cd, since the device does not exist anymore. The device might have been concurrently deleted or not created yet.

As the message below shows, the internal network 'qr' device is added later.

2015-09-24 18:45:30.367 DEBUG neutron.agent.l3.router_info [req-7e5722e4-5fef-4889-9372-8cf1218522a2 None None] adding internal network: prefix(qr-), port(b672ffde-cd80-49eb-9817-58436fa8e8fd) _internal_network_added /opt/stack/new/neutron/neutron/agent/l3/router_info.py:300
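To make the timing concrete, the gap between the two log lines above can be computed from their timestamps. This is only a small sketch using the two timestamps quoted in this report; the variable names are illustrative.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamp of the "Arp operation add failed" warning.
arp_warning = datetime.strptime("2015-09-24 18:45:30.150", FMT)
# Timestamp of the "adding internal network" debug message.
device_added = datetime.strptime("2015-09-24 18:45:30.367", FMT)

# Integer division by a one-millisecond timedelta avoids float rounding.
gap_ms = (device_added - arp_warning) // timedelta(milliseconds=1)
print(f"qr device was added {gap_ms} ms after the ARP update was dropped")
```

In this trace the ARP update arrived roughly a fifth of a second before the agent added the device.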

Changed in neutron:
status: New → Confirmed
assignee: nobody → Swaminathan Vasudevan (swaminathan-vasudevan)
Changed in neutron:
status: Confirmed → In Progress
Swaminathan Vasudevan (swaminathan-vasudevan) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/228582
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d9fb3a66b4aead65fac2df08a5f34538e7af4d7b
Submitter: Jenkins
Branch: master

commit d9fb3a66b4aead65fac2df08a5f34538e7af4d7b
Author: Swaminathan Vasudevan <email address hidden>
Date: Mon Sep 28 11:43:03 2015 -0700

    Cache the ARP entries in L3 Agent for DVR

    There seems to be a timing issue between the
    ARP entries that arrive from the server to
    the agent and the internal qr-device getting
    created by the agent.
    So those unsuccessful arp entries are dropped.

    This patch makes sure that the early ARP entries
    are cached in the agent and then utilized when
    the internal device is up.

    Closes-Bug: #1501086
    Change-Id: I9ec5412f14808de73e8dd86e3d51593946d312a0

Changed in neutron:
status: In Progress → Fix Committed
Stephen Ma (stephen-ma)
tags: added: kilo-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/237769

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/kilo)

Reviewed: https://review.openstack.org/237769
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=223aafdcdd0ded6f912c6597446e90151f8f8a67
Submitter: Jenkins
Branch: stable/kilo

commit 223aafdcdd0ded6f912c6597446e90151f8f8a67
Author: Swaminathan Vasudevan <email address hidden>
Date: Mon Sep 28 11:43:03 2015 -0700

    Cache the ARP entries in L3 Agent for DVR

    There seems to be a timing issue between the
    ARP entries that arrive from the server to
    the agent and the internal qr-device getting
    created by the agent.
    So those unsuccessful arp entries are dropped.

    This patch makes sure that the early ARP entries
    are cached in the agent and then utilized when
    the internal device is up.

    Conflicts:
            neutron/agent/l3/dvr_router.py
            neutron/tests/unit/agent/l3/test_agent.py
            neutron/tests/unit/agent/l3/test_dvr_local_router.py

    Closes-Bug: #1501086
    Change-Id: I9ec5412f14808de73e8dd86e3d51593946d312a0
    (cherry picked from commit d9fb3a66b4aead65fac2df08a5f34538e7af4d7b)

tags: added: in-stable-kilo
Assaf Muller (amuller)
tags: added: liberty-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/238989

tags: removed: kilo-backport-potential
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 8.0.0.0b1

This issue was fixed in the openstack/neutron 8.0.0.0b1 development milestone.

Changed in neutron:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/238989
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0cc889f3185a2e8fe3fef45190e4fb457ce40150
Submitter: Jenkins
Branch: stable/liberty

commit 0cc889f3185a2e8fe3fef45190e4fb457ce40150
Author: Swaminathan Vasudevan <email address hidden>
Date: Mon Sep 28 11:43:03 2015 -0700

    Cache the ARP entries in L3 Agent for DVR

    There seems to be a timing issue between the
    ARP entries that arrive from the server to
    the agent and the internal qr-device getting
    created by the agent.
    So those unsuccessful arp entries are dropped.

    This patch makes sure that the early ARP entries
    are cached in the agent and then utilized when
    the internal device is up.

    Closes-Bug: #1501086
    Change-Id: I9ec5412f14808de73e8dd86e3d51593946d312a0
    (cherry picked from commit d9fb3a66b4aead65fac2df08a5f34538e7af4d7b)

tags: added: in-stable-liberty
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 7.0.3

This issue was fixed in the openstack/neutron 7.0.3 release.

tags: removed: liberty-backport-potential