SecurityGroup deletion causes bulk_pull of SG rules by all the agents

Bug #2022360 reported by Guillaume Espanel
This bug affects 1 person
Affects: neutron
Status: In Progress
Importance: Medium
Assigned to: Guillaume Espanel
Milestone: (not set)

Bug Description

Deleting a security group results in each agent of the region running
a bulk_pull query for all the rules in the security group against the
neutron-rpc. This incurs a load on neutron-rpc, RabbitMQ and the database
proportional to the number of agents and the number of security group
rules and has a noticeable impact on larger infrastructures.
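
As a rough illustration of that proportionality, here is a minimal back-of-the-envelope sketch. It assumes one bulk_pull per agent, each asking for every rule of the group; the function name and the numbers are hypothetical and are not measurements from any deployment.

```python
def estimate_delete_fanout(num_agents, rules_per_group):
    """Rough model of one security group deletion.

    Assumption: every agent issues exactly one bulk_pull, and each
    bulk_pull asks for all the rules of the deleted group.
    """
    return {
        "bulk_pull_requests": num_agents,
        "rule_rows_requested": num_agents * rules_per_group,
    }


# Hypothetical figures, chosen only to show the scaling.
estimate = estimate_delete_fanout(num_agents=3000, rules_per_group=2)
print(estimate)
# {'bulk_pull_requests': 3000, 'rule_rows_requested': 6000}
```

The point is only that both figures grow with the number of agents, so a single API call fans out into region-wide RPC traffic.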

How to reproduce:

1. Create a security group.
2. Delete the security group.
3. Observe that all the neutron agents perform a bulk_pull for the deleted security group.

Tags: sg-fw
Miro Tomaska (mtomaska)
tags: added: sg-fw
Miro Tomaska (mtomaska) wrote :

Hi Guillaume,

Thank you for submitting the bug. For completeness:
- What OSP version are you using? It probably does not matter much, since bulk_pull has not changed much.
- How big is your environment? How many RPC workers do you have on each node? How many security group rules per group? Basically, more details to reproduce the performance issue.
- Agent logs could be useful as well, if handy.

Thanks!

Changed in neutron:
status: New → In Progress
Guillaume Espanel (guillaume-espanel) wrote :

Here's a proposed patch (I don't know why it doesn't appear in the bug's comments): https://review.opendev.org/c/openstack/neutron/+/883235

For the sake of completeness: we observed the problem in Stein and Xena, but the code indicates the issue exists in subsequent versions as well.

Without being too specific, our environment spans up to several thousand agents. I don't have clear numbers on how many rules per group we have, but I would hazard a guess of at least 2.

With regard to the code-path triggering the issue:

When a security group is deleted, a notification is sent to all the agents, triggering the callbacks of SecurityGroupServerAPIShim [1]. Every agent that receives the notification runs the SecurityGroupServerAPIShim._clear_child_sg_rules callback, which itself uses self.rcache.get_resources to look for SecurityGroupRules belonging to the deleted SecurityGroup [2].

Now, the first thing RemoteResourceCache.get_resources does is call _flood_cache_for_query [3], which checks its own cache to see whether it has already queried neutron-rpc for these resources and, if not, performs the unwelcome bulk_pull.

[1] https://github.com/openstack/neutron/blob/fd21c905ca9016092d48d3f4442bae6d4abb42e3/neutron/api/rpc/handlers/securitygroups_rpc.py#L247
[2] https://github.com/openstack/neutron/blob/fd21c905ca9016092d48d3f4442bae6d4abb42e3/neutron/api/rpc/handlers/securitygroups_rpc.py#L306
[3] https://github.com/openstack/neutron/blob/fd21c905ca9016092d48d3f4442bae6d4abb42e3/neutron/agent/resource_cache.py#L127
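
The code path above can be sketched with a small self-contained model. This is not neutron's code: FakeRpcServer, SimplifiedResourceCache and SimplifiedAgent are hypothetical stand-ins, and the filter shape is an assumption. It only mirrors the described behaviour, where get_resources floods the cache first and the flood issues a bulk_pull for any query it has not seen yet.

```python
from collections import defaultdict


class FakeRpcServer:
    """Hypothetical stand-in for neutron-rpc that counts bulk_pull calls."""

    def __init__(self):
        self.rules = []  # the deleted group's rules are assumed already gone
        self.bulk_pull_calls = 0

    def bulk_pull(self, rtype, filter_kwargs):
        self.bulk_pull_calls += 1
        return [r for r in self.rules
                if all(r.get(k) in v for k, v in filter_kwargs.items())]


class SimplifiedResourceCache:
    """Simplified model of the agent-side cache described above."""

    def __init__(self, server):
        self._server = server
        self._store = defaultdict(dict)  # rtype -> {id: resource}
        self._satisfied_queries = set()  # queries already sent to the server

    def _flood_cache_for_query(self, rtype, **filter_kwargs):
        query = (rtype, tuple(sorted(
            (k, tuple(v)) for k, v in filter_kwargs.items())))
        if query in self._satisfied_queries:
            return  # already asked the server once, no new RPC
        for resource in self._server.bulk_pull(rtype, filter_kwargs):
            self._store[rtype][resource["id"]] = resource
        self._satisfied_queries.add(query)

    def get_resources(self, rtype, filters):
        # The flood happens before the local lookup, even for a deleted group.
        self._flood_cache_for_query(rtype, **filters)
        return [r for r in self._store[rtype].values()
                if all(r.get(k) in v for k, v in filters.items())]


class SimplifiedAgent:
    def __init__(self, server):
        self.rcache = SimplifiedResourceCache(server)

    def on_security_group_deleted(self, sg_id):
        # Plays the role described for _clear_child_sg_rules.
        return self.rcache.get_resources(
            "SecurityGroupRule", {"security_group_id": (sg_id,)})


def simulate_delete(num_agents, sg_id="sg-1"):
    """Notify every agent of one deletion; return the bulk_pull count."""
    server = FakeRpcServer()
    for agent in [SimplifiedAgent(server) for _ in range(num_agents)]:
        agent.on_security_group_deleted(sg_id)
    return server.bulk_pull_calls
```

In this model, simulate_delete(5) results in five bulk_pull calls, one per agent, because none of the agents had queried the rules of that group before the deletion notification arrived.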

Lajos Katona (lajos-katona) wrote :

Thanks for taking care of this issue.

Changed in neutron:
assignee: nobody → Guillaume Espanel (guillaume-espanel)
Lajos Katona (lajos-katona) wrote :

Just to make it more visible, related patch:
https://review.opendev.org/c/openstack/neutron/+/883235

Changed in neutron:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 23.0.0.0b3

This issue was fixed in the openstack/neutron 23.0.0.0b3 development milestone.
