Port creation is slow due to the way networking-ovn handles ACLs

Bug #1752897 reported by Daniel Alvarez
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
networking-ovn
Fix Released
Undecided
Maciej Jozefczyk
ovsdbapp
Fix Released
Undecided
Unassigned

Bug Description

I raised the problem in OVS mailing list at [0].

Basically the port creation times grows linearly due to the fact that we're creating duplicated ACL entries in the NB database every time a port is added to a Logical Switch. Most of those ACL's are exactly the same except for the inport/outport fields which makes reference to every single port affected by that rule. Whenever some data comes from ovsdb-server, the Python IDL takes a lot of time for processing such a big amount of data (especially in JSON conversions).

As an initial optimization, I sent a patch [1] that improved a lot the processing in the OVS Python IDL as you can see in this graph [2]. However, even though this looks good enough, there's still room for improvement by cutting down the number of ACLs to 1 per SG rule per Logical Switch instead of 1 per SG rule per Port per Logical Switch, especially at scale. This will make networking-ovn much faster when creating/updating ports/security group rules and also will make ovn-northd less busy as there'll be less ACLs to process.

There's a chain of patches in Core OVN currently under review [3] that will add the required functionality there and we would need to make use of it in networking-ovn as well.

[0] https://mail.openvswitch.org/pipermail/ovs-discuss/2018-February/046149.html
[1] https://github.com/openvswitch/ovs/commit/eab138764e3aaa9f98a725a4636dd30e2a41a896
[2] http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20180227/7e4b716d/attachment-0001.png
[3] https://patchwork.ozlabs.org/project/openvswitch/list/?series=31165

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/549249
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=93737fdc85c66213166e1e3bb14519871a2efad4
Submitter: Zuul
Branch: master

commit 93737fdc85c66213166e1e3bb14519871a2efad4
Author: Daniel Alvarez <email address hidden>
Date: Fri Mar 2 15:59:15 2018 +0100

    ACL optimizations design spec

    This spec describes the current problem with ACL's, the changes being
    done in the core OVN side and the proposed solution on the OpenStack
    integration driver to solve it.

    Partial-Bug: #1752897

    Change-Id: I23c4150606ef9e51ab0af0abaad7fe3f56e5f754
    Signed-off-by: Daniel Alvarez <email address hidden>

tags: added: networking-ovn-proactive-backport-potential
tags: removed: networking-ovn-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/594136

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/594137

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/594138

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (stable/queens)

Change abandoned by Daniel Alvarez (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/594137

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/rocky)

Reviewed: https://review.openstack.org/594136
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=55c1a36a1d72a5f4b95b7fdcf647315ecdf526f6
Submitter: Zuul
Branch: stable/rocky

commit 55c1a36a1d72a5f4b95b7fdcf647315ecdf526f6
Author: Daniel Alvarez <email address hidden>
Date: Wed May 23 15:04:07 2018 +0200

    Support Port Groups in networking-ovn

    A new feature has been introduced in core OVN which allows to define
    a group of ports and assign ACLs to those. This patch is making use
    of the new feature if supported by the underlying OVS version.

    As a result we'll be modelling Neutron Security Groups as OVN Port
    Groups and we won't be adding one ACL per Security Group Rule per
    port. Instead, just add one single ACL per Security Group. This will
    also tackle the race conditions that we had for Address Sets as those
    will just be used for Remote Security Groups and will be automatically
    generated/deleted by core OVN in SB database upon Port Group creation/
    deletion.

    The major benefit of this patch is that we'll reduce the number of
    ACL's dramatically, resulting in a performance leap as discussed at:
    https://mail.openvswitch.org/pipermail/ovs-discuss/2018-February/046166.html

    This patch will address the migration of old Security Groups being
    modelled as Address Sets if the OVN schema supports the feature. This
    migration will be performed from the OvnWorker which is holding a lock
    on the IDL. This ensures that the migration happens from only one worker
    in the cloud and after it's done, all the neutron-server instances are
    ready to use Port Groups.

    NOTE: This also squashes I706199109a3b7dd5339c90b731f8cb8f04ca4f49

    Closes-Bug: #1752897
    Closes-Bug: #1790118
    Co-Authored-By: Lucas Alvares Gomes <email address hidden>
    Change-Id: I35d5ec40c666e92b92b9d664e9615c6fecde595a
    Signed-off-by: Daniel Alvarez <email address hidden>
    (cherry picked from commit f01169b405bb5080a1bc1653f79512eb0664c35d)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 5.0.1

This issue was fixed in the openstack/networking-ovn 5.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (stable/queens)

Change abandoned by Daniel Alvarez (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/594138

Revision history for this message
Daniel Alvarez (dalvarezs) wrote :

This problem is affecting stable/queens as ovn-controller will hog the CPU on 100% due to the big number of ACLs, Logical and physical flows.

Changed in networking-ovn:
assignee: nobody → Maciej Jozefczyk (maciej.jozefczyk)
status: New → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ovsdbapp (stable/queens)

Reviewed: https://review.opendev.org/681657
Committed: https://git.openstack.org/cgit/openstack/ovsdbapp/commit/?id=024c2bca858a6eb6c896ed547484b10538ea4acf
Submitter: Zuul
Branch: stable/queens

commit 024c2bca858a6eb6c896ed547484b10538ea4acf
Author: Lucas Alvares Gomes <email address hidden>
Date: Fri May 25 15:18:31 2018 +0100

    Add Port_Group commands

    This patch is adding commands to manipulate the new Port_Group table
    introduced in the OVN Northbound Database.

    Four new commands are being added:

    * pg_add() -> To add a new Port Group
    * pg_del() -> To delete a Port Group
    * pg_add_ports() -> To add a list of LSP to a Port Group
    * pg_add_acls() -> To add a list of ACL to a Port Group
    * pg_del_ports() -> To delete a list of LSP from a Port Group
    * pg_del_acls() -> To delete a list of ACL from a Port Group

    Related-Bug: #1752897

    Change-Id: I9bae9a5681501d8e8a85c25ccb6d496d5b3f8681
    (cherry picked from commit 90dfca5dfc60e48544ff25f63c3fa59cb88fc521)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/681658
Committed: https://git.openstack.org/cgit/openstack/ovsdbapp/commit/?id=3f92a421a4d7d8722540f03fc54e0632d78af6bb
Submitter: Zuul
Branch: stable/queens

commit 3f92a421a4d7d8722540f03fc54e0632d78af6bb
Author: Daniel Alvarez <email address hidden>
Date: Fri Jun 1 09:45:57 2018 +0200

    Add Port Group ACL commands

    After introducing the concept of Port Groups in OVN, ACLs can be
    applied either to a Logical Switch or to a Port Group.
    This patch is adding the commands to support adding ACLs to a Port
    Group. Also it's removing the initial implementation of adding
    standalone ACLs to a Port Group.

    Related-Bug: #1752897

    Change-Id: I7720cd7aa801fbc4be87e1c665e4549725576269
    Signed-off-by: Daniel Alvarez <email address hidden>
    (cherry picked from commit 7e980f1ae5acc0af8810e783d4f3c27c194451c8)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/681659
Committed: https://git.openstack.org/cgit/openstack/ovsdbapp/commit/?id=5e954f98e436cac9eba4ca4fd533e73a923d36fe
Submitter: Zuul
Branch: stable/queens

commit 5e954f98e436cac9eba4ca4fd533e73a923d36fe
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu May 31 14:06:33 2018 +0100

    Port Group's letfovers

    * Fix some docstrings
    * Add a pg_get() command to make the interface more complete
    * Fix may_exist for pg_add()

    Related-Bug: #1752897

    Change-Id: I596e0674a860d2dd9738799a09b66e1712953728
    (cherry picked from commit 62a619024ad332ad088108ece098d90aa5c3e159)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-ovn (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/682699

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (stable/queens)

Change abandoned by Maciej Józefczyk (<email address hidden>) on branch: stable/queens
Review: https://review.opendev.org/681562
Reason: we decided to not bump minimal version

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-ovn (stable/queens)

Reviewed: https://review.opendev.org/682699
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=88f6de96ac7a623e12a8931e2c9f80387db77855
Submitter: Zuul
Branch: stable/queens

commit 88f6de96ac7a623e12a8931e2c9f80387db77855
Author: Maciej Józefczyk <email address hidden>
Date: Tue Sep 17 10:17:48 2019 +0000

    Add Port Groups ACLs related code from ovsdbapp

    This stable/queens-only patch adds three major changes taken from
    ovsdbapp (matching branch).

    1) Add Port_Group commands
    https://review.opendev.org/#/c/681657

    2) Add Port Group ACL commands
    https://review.opendev.org/#/c/681658

    3) Port Group's letfovers
    https://review.opendev.org/#/c/681659

    Change-Id: I2a9a9dcdefe6e25bc789a6e588c585557b32c43e
    Related-Bug: #1752897

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/594138
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=8ad2ff706eb725a225b0289307ebd34e2024566d
Submitter: Zuul
Branch: stable/queens

commit 8ad2ff706eb725a225b0289307ebd34e2024566d
Author: Daniel Alvarez <email address hidden>
Date: Wed May 23 15:04:07 2018 +0200

    Support Port Groups in networking-ovn

    This changes squashes three commits:

    1) Support Port Groups in networking-ovn

    A new feature has been introduced in core OVN which allows to define
    a group of ports and assign ACLs to those. This patch is making use
    of the new feature if supported by the underlying OVS version.

    As a result we'll be modelling Neutron Security Groups as OVN Port
    Groups and we won't be adding one ACL per Security Group Rule per
    port. Instead, just add one single ACL per Security Group. This will
    also tackle the race conditions that we had for Address Sets as those
    will just be used for Remote Security Groups and will be automatically
    generated/deleted by core OVN in SB database upon Port Group creation/
    deletion.

    The major benefit of this patch is that we'll reduce the number of
    ACL's dramatically, resulting in a performance leap as discussed at:
    https://mail.openvswitch.org/pipermail/ovs-discuss/2018-February/046166.html

    This patch will address the migration of old Security Groups being
    modelled as Address Sets if the OVN schema supports the feature. This
    migration will be performed from the OvnWorker which is holding a lock
    on the IDL. This ensures that the migration happens from only one worker
    in the cloud and after it's done, all the neutron-server instances are
    ready to use Port Groups.

    2) Fix bug migrating ACLs to Port Groups

    When migrating to Port Groups, we were installing wrong ACLs
    matching on old Address Sets. This was because, during migration
    time, old Address Sets still existed in NB database and we
    were relying on their existence to figure out which one to use.

    This patch fixes it by simply picking the auto generated
    Address Set if Port Groups is supported in the OVSDB schema.

    3) Drop subnet Port Groups

    This patch is dropping the code that creates and handles the
    subnet Port Groups. Those were used to place the ACLs that
    allowed DHCP traffic to reach the responder in the OVN
    pipeline but as explained in the bug description, it's
    not needed anymore.

    Related-Bug: #1752897

    Co-Authored-By: Lucas Alvares Gomes <email address hidden>
    Change-Id: I35d5ec40c666e92b92b9d664e9615c6fecde595a
    Signed-off-by: Daniel Alvarez <email address hidden>
    (cherry picked from commit f01169b405bb5080a1bc1653f79512eb0664c35d)

Revision history for this message
Lucas Alvares Gomes (lucasagomes) wrote :

Marking as fixed after port groups support has been added to networking-ovn

Changed in networking-ovn:
status: Triaged → Fix Released
Revision history for this message
Terry Wilson (otherwiseguy) wrote :

This was fixed a while ago, marking fixed.

Changed in ovsdbapp:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.