[L3] snat-ns will be initialized twice for DVR+HA routers during agent restart

Bug #1850779 reported by LIU Yulong
This bug affects 2 people
Affects            Status        Importance  Assigned to  Milestone
neutron            Fix Released  Medium      LIU Yulong
neutron (Ubuntu)   Fix Released  Undecided   Unassigned
  Bionic           Fix Released  Medium      Unassigned
  Focal            Fix Released  Undecided   Unassigned
  Groovy           Fix Released  Undecided   Unassigned
  Hirsute          Fix Released  Undecided   Unassigned

Bug Description

If a DVR+HA router has an external gateway, its snat namespace is initialized twice during agent restart.
The initialization function runs many external resource processing actions [1][2], so running it twice increases the agent's start time.

[1] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/dvr_snat_ns.py#L31-L39
[2] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/namespaces.py#L91-L108
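
To illustrate why a double initialization is costly, here is a minimal toy sketch. It is not neutron code: the class ToySnatNamespace and its methods create() and create_once() are hypothetical names invented for this example, and the actual patch may well avoid the second call in a different way (for example by removing one call site). The sketch only shows that every extra call to an initializer that shells out repeats all of its external commands, and that an idempotent guard keeps the work to one pass.

```python
class ToySnatNamespace:
    """Hypothetical stand-in for a namespace object; NOT neutron code."""

    def __init__(self, router_id):
        self.name = "snat-" + router_id
        self.commands = []      # records every "external" command issued
        self._created = False

    def create(self):
        # Each call models the costly external work: creating the
        # namespace, then applying a sysctl setting inside it.
        self.commands.append(["ip", "netns", "add", self.name])
        self.commands.append(
            ["ip", "netns", "exec", self.name,
             "sysctl", "-w", "net.ipv4.ip_forward=1"]
        )
        self._created = True

    def create_once(self):
        # Idempotent guard: skip the external work if it already ran.
        if not self._created:
            self.create()


# Two code paths that both initialize the namespace repeat every command.
unguarded = ToySnatNamespace("f64dded1")
unguarded.create()
unguarded.create()

# With the guard, the second call does nothing.
guarded = ToySnatNamespace("f64dded1")
guarded.create_once()
guarded.create_once()

print(len(unguarded.commands), len(guarded.commands))
```

In the real agent each of these commands goes through rootwrap, so the repeated work is multiplied by the number of DVR+HA routers with a gateway on the node.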

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
SRU:

[Impact]
Longer l3-agent initialization time during restarts due to creation of snat namespace and setting corresponding sysctl twice.
With this fix, the initialization phase is triggered only once.

[Test Case]
* Deploy OpenStack on bionic queens (with neutron DVR L3 HA settings and debug mode on for neutron) and create a router
  (If stsstack-bundles are used, here are the commands
   ./generate-bundle.sh -s bionic -n bionicqueens --dvr-snat-l3ha --create-model --run
   ./configure
   # Configure creates a router with external gateway attached
  )
* Restart neutron-l3-agent on one of the nodes
  systemctl restart neutron-l3-agent.service

* Check /var/log/neutron/neutron-l3-agent.log and wait for the log to settle after all initialization steps
  During the initialization steps, the sysctls in [1] and [2] are configured.
  Verify whether the debug log shows the sysctl execution statements for the snat namespace twice after the restart.
  (If the fix is applied, they should be displayed only once.)

  grep -inr snat-<router-id> /var/log/neutron/neutron-l3-agent.log | grep sysctl

  Example log:
  2718:2021-04-14 05:17:20.114 10868 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-f64dded1-ef73-47b4-bcee-bb25840e9a02', 'sysctl', '-w', 'net.ipv4.ip_forward=1'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:87
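
  The grep check above can also be scripted. The following is a minimal sketch, assuming the debug lines have the same "Running command: [...]" shape as the example log line; the function names count_snat_sysctl() and repeated() are made up for this example and are not part of neutron.

```python
import re
from collections import Counter

# Matches a sysctl invocation inside a snat namespace, in the list form
# that appears in the "Running command: [...]" debug line quoted above.
SNAT_SYSCTL = re.compile(
    r"'ip', 'netns', 'exec', '(snat-[0-9a-f-]+)', 'sysctl', '-w', '([^']+)'"
)


def count_snat_sysctl(lines):
    """Count how often each (namespace, sysctl setting) pair was executed."""
    counts = Counter()
    for line in lines:
        match = SNAT_SYSCTL.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


def repeated(counts):
    """Return only the pairs that were executed more than once."""
    return {key: n for key, n in counts.items() if n > 1}


def check_log(path):
    """Read a log file and return the repeated snat sysctl executions."""
    with open(path, errors="replace") as log:
        return repeated(count_snat_sysctl(log))
```

  For example, check_log("/var/log/neutron/neutron-l3-agent.log") returns an empty dict when no snat sysctl setting was executed more than once. The count covers the whole file, so earlier restarts recorded in the same log also add to it; restrict the input to the lines written after the restart under test.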

[Where problems could occur]
No regression is expected, but if one occurs it would likely result in a longer init time and/or a failure to correctly initialize the snat namespace.

[1] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/dvr_snat_ns.py#L31-L39
[2] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/namespaces.py#L91-L108

LIU Yulong (dragon889)
Changed in neutron:
assignee: nobody → LIU Yulong (dragon889)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/692352

Changed in neutron:
status: New → In Progress
tags: added: l3-ha
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/692352
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7a9d6d26419defa148764166600bc4ac6b50c109
Submitter: Zuul
Branch: master

commit 7a9d6d26419defa148764166600bc4ac6b50c109
Author: LIU Yulong <email address hidden>
Date: Thu Oct 31 19:17:36 2019 +0800

    Do not initialize snat-ns twice

    If the DVR+HA router has external gateway, the snat-namespace will be
    initialized twice during agent restart. And that ns initialization
    function will run many external resource processing actions which will
    definitely increase the starting time of L3 agent. This patch addresses
    this issue.

    Change-Id: I7719491275fa1ebfa7e881366e5cb066e3d4185c
    Closes-Bug: #1850779

Changed in neutron:
status: In Progress → Fix Released
tags: added: l3-dvr-backlog
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 16.0.0.0b1

This issue was fixed in the openstack/neutron 16.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/709406

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/709648

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/709649

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/709650

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.opendev.org/709651

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/709406
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9d5e80e935049d08e0fcefc0c823fb67c793a51b
Submitter: Zuul
Branch: stable/queens

commit 9d5e80e935049d08e0fcefc0c823fb67c793a51b
Author: LIU Yulong <email address hidden>
Date: Thu Oct 31 19:17:36 2019 +0800

    Do not initialize snat-ns twice

    If the DVR+HA router has external gateway, the snat-namespace will be
    initialized twice during agent restart. And that ns initialization
    function will run many external resource processing actions which will
    definitely increase the starting time of L3 agent. This patch addresses
    this issue.

    Change-Id: I7719491275fa1ebfa7e881366e5cb066e3d4185c
    Closes-Bug: #1850779
    (cherry picked from commit 7a9d6d26419defa148764166600bc4ac6b50c109)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/709650
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c667e4d3b23cb8a9ea827b2594a0100974bb9b85
Submitter: Zuul
Branch: stable/rocky

commit c667e4d3b23cb8a9ea827b2594a0100974bb9b85
Author: LIU Yulong <email address hidden>
Date: Thu Oct 31 19:17:36 2019 +0800

    Do not initialize snat-ns twice

    If the DVR+HA router has external gateway, the snat-namespace will be
    initialized twice during agent restart. And that ns initialization
    function will run many external resource processing actions which will
    definitely increase the starting time of L3 agent. This patch addresses
    this issue.

    Change-Id: I7719491275fa1ebfa7e881366e5cb066e3d4185c
    Closes-Bug: #1850779
    (cherry picked from commit 7a9d6d26419defa148764166600bc4ac6b50c109)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/709649
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=01d0612a3a1b50cdadf633d5e5b9bad1cfe662fd
Submitter: Zuul
Branch: stable/stein

commit 01d0612a3a1b50cdadf633d5e5b9bad1cfe662fd
Author: LIU Yulong <email address hidden>
Date: Thu Oct 31 19:17:36 2019 +0800

    Do not initialize snat-ns twice

    If the DVR+HA router has external gateway, the snat-namespace will be
    initialized twice during agent restart. And that ns initialization
    function will run many external resource processing actions which will
    definitely increase the starting time of L3 agent. This patch addresses
    this issue.

    Change-Id: I7719491275fa1ebfa7e881366e5cb066e3d4185c
    Closes-Bug: #1850779
    (cherry picked from commit 7a9d6d26419defa148764166600bc4ac6b50c109)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.7

This issue was fixed in the openstack/neutron 13.0.7 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.opendev.org/709651
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6f3591c2af72287a38d7c9214b746405f05d5169
Submitter: Zuul
Branch: stable/pike

commit 6f3591c2af72287a38d7c9214b746405f05d5169
Author: LIU Yulong <email address hidden>
Date: Thu Oct 31 19:17:36 2019 +0800

    Do not initialize snat-ns twice

    If the DVR+HA router has external gateway, the snat-namespace will be
    initialized twice during agent restart. And that ns initialization
    function will run many external resource processing actions which will
    definitely increase the starting time of L3 agent. This patch addresses
    this issue.

    Change-Id: I7719491275fa1ebfa7e881366e5cb066e3d4185c
    Closes-Bug: #1850779
    (cherry picked from commit 7a9d6d26419defa148764166600bc4ac6b50c109)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/709648
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a56f11222a6c294190a1d90d3db421577f07663d
Submitter: Zuul
Branch: stable/train

commit a56f11222a6c294190a1d90d3db421577f07663d
Author: LIU Yulong <email address hidden>
Date: Thu Oct 31 19:17:36 2019 +0800

    Do not initialize snat-ns twice

    If the DVR+HA router has external gateway, the snat-namespace will be
    initialized twice during agent restart. And that ns initialization
    function will run many external resource processing actions which will
    definitely increase the starting time of L3 agent. This patch addresses
    this issue.

    Change-Id: I7719491275fa1ebfa7e881366e5cb066e3d4185c
    Closes-Bug: #1850779
    (cherry picked from commit 7a9d6d26419defa148764166600bc4ac6b50c109)

tags: added: in-stable-train
Revision history for this message
Hemanth Nakkina (hemanth-n) wrote :
affects: ubuntu → neutron (Ubuntu)
Changed in neutron (Ubuntu Hirsute):
status: New → Fix Released
Changed in neutron (Ubuntu Groovy):
status: New → Fix Released
Changed in neutron (Ubuntu Focal):
status: New → Fix Released
description: updated
Revision history for this message
Hemanth Nakkina (hemanth-n) wrote :

The fix is available as far back as Ubuntu UCA Rocky, but not in the Ubuntu bionic neutron packages.
I have submitted a debdiff for bionic.

Changed in neutron (Ubuntu Bionic):
status: New → Triaged
importance: Undecided → High
importance: High → Medium
Revision history for this message
Corey Bryant (corey.bryant) wrote :

A new version of neutron is now in the bionic unapproved queue awaiting SRU team review: https://launchpad.net/ubuntu/bionic/+queue?queue_state=1&queue_text=neutron

Revision history for this message
Arjun Baindur (abaindur) wrote :

I think this fix causes problems. We have multiple nodes that are DVR_SNAT mode. Snat namespace is scheduled to 1 of them.

When l3-agent is restarted on the other nodes, initialize() is now always invoked for DvrEdgeRouter, which creates the SNAT namespace prematurely. This in turn causes external_gateway_added() to later detect that this host is NOT hosting the snat router but that the namespace exists, so it removes the namespace by triggering external_gateway_removed() (dvr_edge_router --> dvr_local_router).

Problem is that the dvr_local_router code for external_gateway_removed() ends up DELETING the rfp/fpr pair and severs the qrouter connection to fip namespace (and deletes all the FIP routes in fip namespace as a result).

Prior to this bug fix, _create_snat_namespace for DvrEdgeRouter was only invoked in _create_dvr_gateway(), which was only invoked when the node was actually hosting SNAT for the router.

Even without the breaking issue of deleting the rtr_2_fip link, this fix unnecessarily creates the SNAT namespace on every host, only for it to be deleted.

FYI this is for non-HA routers

Revision history for this message
Arjun Baindur (abaindur) wrote :

I filed this for the above collateral issue: https://bugs.launchpad.net/neutron/+bug/1926531

Dan Streetman (ddstreet)
description: updated
Revision history for this message
Hemanth Nakkina (hemanth-n) wrote :

@abaindur

The description in comment #18 sounds to me more like https://bugs.launchpad.net/neutron/+bug/1894843, which is already fixed in stable/queens upstream.

Here is the commit: https://opendev.org/openstack/neutron/commit/8f3daf3f9892cd691dd52965f0fa4eaa07ac3788

Revision history for this message
Robie Basak (racb) wrote : Please test proposed package

Hello LIU, or anyone else affected,

Accepted neutron into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:12.1.1-0ubuntu7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in neutron (Ubuntu Bionic):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-bionic
Revision history for this message
Hemanth Nakkina (hemanth-n) wrote :

The test case was verified successfully on bionic-proposed: the sysctl executions happen only once during a neutron-l3-agent restart.

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for neutron has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package neutron - 2:12.1.1-0ubuntu7

---------------
neutron (2:12.1.1-0ubuntu7) bionic; urgency=medium

  * Handle OVSFWPortNotFound and OVSFWTagNotFound in ovs firewall
    - d/p/0001-Handle-OVSFWPortNotFound-and-OVSFWTagNotFound-in-ovs.patch
      (LP: #1849098).

neutron (2:12.1.1-0ubuntu6) bionic; urgency=medium

  * Do not initialize snat-ns twice (LP: #1850779)
    - d/p/0001-Do-not-initialize-snat-ns-twice.patch

neutron (2:12.1.1-0ubuntu5) bionic; urgency=medium

  * Backport fix for dvr-snat missig rfp interfaces (LP: #1894843)
    - d/p/0001-Fix-deletion-of-rfp-interfaces-when-router-is-re-ena.patch

 -- Seyeong Kim <email address hidden> Mon, 03 May 2021 17:15:28 +0900

Changed in neutron (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron pike-eol

This issue was fixed in the openstack/neutron pike-eol release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron queens-eol

This issue was fixed in the openstack/neutron queens-eol release.
