Comment 2 for bug 2022360

Guillaume Espanel (guillaume-espanel) wrote :

Here's a proposed patch (I don't know why it doesn't appear in the bug's comments): https://review.opendev.org/c/openstack/neutron/+/883235

For the sake of completeness: we observed the problem in Stein and Xena, and from reading the code the issue appears to exist in the subsequent versions as well.

Without being too specific, our environment spans up to several thousand agents. I don't have clear numbers on how many rules per group we have, but I would guess at least 2.

With regard to the code-path triggering the issue:

When a security group is deleted, a notification is sent to all the agents, which triggers the callbacks of SecurityGroupServerAPIShim [1]. Every agent that receives the notification runs the SecurityGroupServerAPIShim._clear_child_sg_rules callback, which in turn calls self.rcache.get_resources to look for the SecurityGroupRules belonging to the deleted SecurityGroup [2].

Now, the first thing RemoteResourceCache.get_resources does is call _flood_cache_for_query [3], which checks whether the cache has already queried neutron-rpc for these resources and, if it has not, performs the unwelcome bulk_pull.
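
To make the fan-out concrete, here is a minimal toy model of that code path. It is not the neutron code: FakeServer, ToyResourceCache and simulate_sg_delete are hypothetical names, and the filter shape is an assumption. It only illustrates that each agent's cache, never having queried for the deleted group's rules, falls through to its own bulk pull.

```python
class FakeServer:
    """Hypothetical stand-in for the neutron RPC server; counts bulk pulls."""

    def __init__(self, rules):
        self.rules = rules
        self.bulk_pull_calls = 0

    def bulk_pull(self, rtype, filters):
        self.bulk_pull_calls += 1
        return [r for r in self.rules
                if all(r.get(k) in v for k, v in filters.items())]


class ToyResourceCache:
    """Simplified model of a per-agent cache (not RemoteResourceCache itself)."""

    def __init__(self, server):
        self._server = server
        self._store = {}          # rtype -> {id: resource}
        self._satisfied = set()   # queries already answered by the server

    def _flood_cache_for_query(self, rtype, filters):
        key = (rtype, tuple(sorted((k, tuple(v)) for k, v in filters.items())))
        if key in self._satisfied:
            return  # this exact query was already pulled once
        for res in self._server.bulk_pull(rtype, filters):
            self._store.setdefault(rtype, {})[res["id"]] = res
        self._satisfied.add(key)

    def get_resources(self, rtype, filters):
        # First step: make sure the cache was populated for this query.
        self._flood_cache_for_query(rtype, filters)
        return [r for r in self._store.get(rtype, {}).values()
                if all(r.get(k) in v for k, v in filters.items())]


def simulate_sg_delete(n_agents, sg_id):
    """Every agent handles the delete notification by looking up child rules."""
    server = FakeServer(rules=[
        {"id": "r1", "security_group_id": sg_id},
        {"id": "r2", "security_group_id": sg_id},
    ])
    agents = [ToyResourceCache(server) for _ in range(n_agents)]
    for cache in agents:
        cache.get_resources("SecurityGroupRule",
                            {"security_group_id": (sg_id,)})
    return server.bulk_pull_calls
```

In this model, one deletion seen by n agents costs the server n bulk pulls, because each agent's cache is independent and none of them has asked for that group's rules before. Repeating the same query on a single cache does not pull again.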

[1] https://github.com/openstack/neutron/blob/fd21c905ca9016092d48d3f4442bae6d4abb42e3/neutron/api/rpc/handlers/securitygroups_rpc.py#L247
[2] https://github.com/openstack/neutron/blob/fd21c905ca9016092d48d3f4442bae6d4abb42e3/neutron/api/rpc/handlers/securitygroups_rpc.py#L306
[3] https://github.com/openstack/neutron/blob/fd21c905ca9016092d48d3f4442bae6d4abb42e3/neutron/agent/resource_cache.py#L127