Security group performance issue for iptables driver due to "stateless feature"

Bug #2045950 reported by Robert van Leeuwen
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Rodolfo Alonso

Bug Description

There is a huge performance issue with security groups when using the iptables implementation:
If you have a security group with, say, 500 rules, it will take minutes for the RPC server to create the port configuration.
You will see this when you restart the neutron-linuxbridge-agent on a compute node that hosts an instance using a security group with a lot of rules.
In the agent log you will see "Preparing filters for devices", and this will take minutes for a single port when the security group has a significant number of rules.

After some investigation, this seems to be the cause:

In the commit below, stateful functionality was added for the iptables implementation:

https://opendev.org/openstack/neutron/commit/cbc473e066d#diff-7d7a372d8ed39ad8489a39ff7c3f3d783235218c

However, there is a huge performance impact in the following function in
neutron/db/securitygroups_rpc_base.py:
 def security_group_info_for_ports

For EACH rule in a security group, it will do a database lookup to check what the setting is on the group:
            stateful = self._is_security_group_stateful(context,
                                                        security_group_id)
Which will call:

    def _is_security_group_stateful(self, context, sg_id):
        return sg_obj.SecurityGroup.get_sg_by_id(context, sg_id).stateful

So if you have, say, 500 rules, it will go to the database 500 times(!) to check the exact same property on the group object, which absolutely tanks performance.

I played around with caching the stateful property for the group (since it is not even changeable on a security group if there are rules present) and the function went from taking multiple minutes to about a second.
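
A minimal sketch of that kind of caching, for illustration only. `FakeSecurityGroupDB` and both helper functions are hypothetical stand-ins rather than neutron code, and the lookup counter takes the place of a real database call:

```python
class FakeSecurityGroupDB:
    """Hypothetical stand-in for the DB layer; counts how often it is hit."""

    def __init__(self, groups):
        self._groups = groups  # maps security group ID -> stateful flag
        self.lookups = 0

    def is_stateful(self, sg_id):
        self.lookups += 1
        return self._groups[sg_id]


def rules_with_stateful_uncached(db, rules):
    # One lookup per rule: 500 rules in one group means 500 lookups.
    return [dict(rule, stateful=db.is_stateful(rule["security_group_id"]))
            for rule in rules]


def rules_with_stateful_cached(db, rules):
    # Remember the flag per group, only for the duration of this call.
    cache = {}
    result = []
    for rule in rules:
        sg_id = rule["security_group_id"]
        if sg_id not in cache:
            cache[sg_id] = db.is_stateful(sg_id)
        result.append(dict(rule, stateful=cache[sg_id]))
    return result


rules = [{"id": i, "security_group_id": "sg-1"} for i in range(500)]

uncached_db = FakeSecurityGroupDB({"sg-1": True})
uncached = rules_with_stateful_uncached(uncached_db, rules)

cached_db = FakeSecurityGroupDB({"sg-1": True})
cached = rules_with_stateful_cached(cached_db, rules)

print(uncached_db.lookups, cached_db.lookups)
```

Both helpers build the same list of rules; only the number of lookups differs. The cache lives inside a single call, so it cannot go stale between RPC requests.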

Lajos Katona (lajos-katona) wrote :

Hi, thanks for reporting this issue. Could you give some more details please?
Are you seeing the same performance issue on master, or did you test only on a specific branch? I assume you are using the hybrid firewall driver with OVS, or am I wrong?

Lajos Katona (lajos-katona) wrote :

I checked and the methods you referenced are still there on master

Changed in neutron:
importance: Undecided → Medium
Robert van Leeuwen (robertvan) wrote :

Hello Lajos,

I experienced the issue in Yoga + linuxbridge ML2 + iptables firewall driver.

As you have noticed the issue is still there in Master:

https://opendev.org/openstack/neutron/src/branch/master/neutron/db/securitygroups_rpc_base.py#L218

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

This issue should affect ML2/LB only, due to the problem described: the server side RPC callback will retrieve the SG per port to populate the "stateful" flag [1].

In ML2/OVS, this issue is not present because of the cached RPC implementation. The method retrieves the SG stored locally in the OVS agent cache without sending any request to the Neutron server [2].

In order to improve the method ``SecurityGroupInfoAPIMixin.security_group_info_for_ports``, it should retrieve the SGs in a DB bulk call, grouping all the SG IDs needed and only retrieving the "stateful" flag.

Regards.

[1] https://review.opendev.org/c/openstack/neutron/+/572767/53/neutron/db/securitygroups_rpc_base.py#432
[2] https://review.opendev.org/c/openstack/neutron/+/572767/53/neutron/api/rpc/handlers/securitygroups_rpc.py#360
[3] https://review.opendev.org/c/openstack/neutron/+/572767/53/neutron/db/securitygroups_rpc_base.py#165
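
The bulk retrieval suggested above could look roughly like the following sketch. It is not the neutron implementation: it uses the standard library `sqlite3` module with an in-memory database, and the `securitygroups` table layout and the `stateful_flags` helper are assumptions made up for the example.

```python
import sqlite3


def stateful_flags(conn, sg_ids):
    """Return {sg_id: stateful} for all given IDs using a single query."""
    ids = sorted(set(sg_ids))  # group the IDs, dropping duplicates
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        # Only the two needed columns are selected, not the whole row.
        f"SELECT id, stateful FROM securitygroups WHERE id IN ({placeholders})",
        ids,
    ).fetchall()
    return {sg_id: bool(stateful) for sg_id, stateful in rows}


# Hypothetical schema and data, just enough to run the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE securitygroups (id TEXT PRIMARY KEY, stateful INTEGER)")
conn.executemany(
    "INSERT INTO securitygroups VALUES (?, ?)", [("sg-1", 1), ("sg-2", 0)])

rules = [{"id": i, "security_group_id": "sg-1" if i % 2 else "sg-2"}
         for i in range(500)]

# One query for every SG referenced by the rules, then a dictionary lookup.
flags = stateful_flags(conn, (r["security_group_id"] for r in rules))
for rule in rules:
    rule["stateful"] = flags[rule["security_group_id"]]
```

The number of queries then depends on the call, not on the number of rules: 500 rules spread over two groups still cost one `SELECT`.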

Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/903707

Changed in neutron:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/904243

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/neutron/+/904244

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/904245

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/904246

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/904247

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/904248

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/903707
Committed: https://opendev.org/openstack/neutron/commit/6b6abb9698318a0b5db09f0c4d30a47438a94643
Submitter: "Zuul (22348)"
Branch: master

commit 6b6abb9698318a0b5db09f0c4d30a47438a94643
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Dec 14 15:45:48 2023 +0000

    Improve the SG RPC callback ``security_group_info_for_ports``

    This method populates the SG rules in a dictionary. Each SG rule
    inherits the "stateful" value of the SG. Prior to this patch, each
    SG rule was issuing a database call to retrieve the SG record.

    In this patch, the SG "stateful" retrieval is done in one database
    query for all SGs. That improves the performance of this method,
    reducing the database access to a single call.

    This improvement, as commented in the LP bug, affects
    ML2/LinuxBridge. The ML2/OVS agent uses a cached RPC implementation
    that does not require any RPC call or database query.

    Closes-Bug: #2045950
    Change-Id: Iafd0419a1d1eeb25d5589edc2570ebf287450957

Changed in neutron:
status: In Progress → Fix Released
Robert van Leeuwen (robertvan) wrote :

Thx Rodolfo, this was fixed and merged super quickly! :)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/904244
Committed: https://opendev.org/openstack/neutron/commit/ac465e9ef628df56299675887eaf998cbe974d19
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit ac465e9ef628df56299675887eaf998cbe974d19
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Dec 14 15:45:48 2023 +0000

    Improve the SG RPC callback ``security_group_info_for_ports``

    Closes-Bug: #2045950
    Change-Id: Iafd0419a1d1eeb25d5589edc2570ebf287450957
    (cherry picked from commit 6b6abb9698318a0b5db09f0c4d30a47438a94643)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/904243
Committed: https://opendev.org/openstack/neutron/commit/29d5570ab38cc056ef0ff4bd399605905d8fe8c1
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 29d5570ab38cc056ef0ff4bd399605905d8fe8c1
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Dec 14 15:45:48 2023 +0000

    Improve the SG RPC callback ``security_group_info_for_ports``

    Closes-Bug: #2045950
    Change-Id: Iafd0419a1d1eeb25d5589edc2570ebf287450957
    (cherry picked from commit 6b6abb9698318a0b5db09f0c4d30a47438a94643)

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/904245
Committed: https://opendev.org/openstack/neutron/commit/bbe235779b9094b42523c2063434fdd137dc450a
Submitter: "Zuul (22348)"
Branch: stable/zed

commit bbe235779b9094b42523c2063434fdd137dc450a
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Dec 14 15:45:48 2023 +0000

    Improve the SG RPC callback ``security_group_info_for_ports``

    Closes-Bug: #2045950
    Change-Id: Iafd0419a1d1eeb25d5589edc2570ebf287450957
    (cherry picked from commit 6b6abb9698318a0b5db09f0c4d30a47438a94643)

tags: added: in-stable-yoga
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/904246
Committed: https://opendev.org/openstack/neutron/commit/7ac9edac80132577ab285484cd48f9a859153350
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 7ac9edac80132577ab285484cd48f9a859153350
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Dec 14 15:45:48 2023 +0000

    Improve the SG RPC callback ``security_group_info_for_ports``

    Conflicts:
        neutron/objects/securitygroup.py

    Closes-Bug: #2045950
    Change-Id: Iafd0419a1d1eeb25d5589edc2570ebf287450957
    (cherry picked from commit 6b6abb9698318a0b5db09f0c4d30a47438a94643)

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/904247
Committed: https://opendev.org/openstack/neutron/commit/6e26d9ed40a88e26d1201b5632f644b70de6f962
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 6e26d9ed40a88e26d1201b5632f644b70de6f962
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Dec 14 15:45:48 2023 +0000

    Improve the SG RPC callback ``security_group_info_for_ports``

    Conflicts:
        neutron/objects/securitygroup.py

    Closes-Bug: #2045950
    Change-Id: Iafd0419a1d1eeb25d5589edc2570ebf287450957
    (cherry picked from commit 6b6abb9698318a0b5db09f0c4d30a47438a94643)
    (cherry picked from commit 7ac9edac80132577ab285484cd48f9a859153350)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/904248
Committed: https://opendev.org/openstack/neutron/commit/a1495378e3a763e9c69088279789591c4b22e358
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit a1495378e3a763e9c69088279789591c4b22e358
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Dec 14 15:45:48 2023 +0000

    Improve the SG RPC callback ``security_group_info_for_ports``

    Conflicts:
        neutron/objects/securitygroup.py

    Closes-Bug: #2045950
    Change-Id: Iafd0419a1d1eeb25d5589edc2570ebf287450957
    (cherry picked from commit 6b6abb9698318a0b5db09f0c4d30a47438a94643)
    (cherry picked from commit 7ac9edac80132577ab285484cd48f9a859153350)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 24.0.0.0b1

This issue was fixed in the openstack/neutron 24.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron yoga-eom

This issue was fixed in the openstack/neutron yoga-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron wallaby-eom

This issue was fixed in the openstack/neutron wallaby-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron xena-eom

This issue was fixed in the openstack/neutron xena-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 21.2.1

This issue was fixed in the openstack/neutron 21.2.1 release.
