os-security-groups api call creates api call explosion to neutron

Bug #1729741 reported by Robert van Leeuwen
This bug affects 1 person
Affects                   Status     Importance  Assigned to  Milestone
OpenStack Compute (nova)  Confirmed  Medium      Unassigned
Pike                      Confirmed  Medium      Unassigned

Bug Description

1) create a security group
2) create a bunch of security group rules which reference a security group instead of a CIDR e.g.
openstack security group rule create --remote-group xxxxx-1123-xxxx-xxx-xxxxx

When querying nova api /os-security-groups there will be an API call to neutron for each rule that has a remote group attached.

In the logs you will see GET /v2.0/security-groups/xxxxx-1123-xxxx-xxx-xxxxx
Rules created with a CIDR do not have this issue.

As you can imagine, with one neutron round trip per rule this quickly gets very slow.

Robert van Leeuwen (rovanleeuwen) wrote :

FYI: tested with Kilo/Mitaka & Pike

Robert van Leeuwen (rovanleeuwen) wrote :

The problem seems to be here in security_groups.py:

    def _format_security_group_rule(self, context, rule, group_rule_data=None):
        """Return a security group rule in desired API response format.

        If group_rule_data is passed in that is used rather than querying
        for it.
        """
        sg_rule = {}
        sg_rule['id'] = rule['id']
        sg_rule['parent_group_id'] = rule['parent_group_id']
        sg_rule['ip_protocol'] = rule['protocol']
        sg_rule['from_port'] = rule['from_port']
        sg_rule['to_port'] = rule['to_port']
        sg_rule['group'] = {}
        sg_rule['ip_range'] = {}
        if group_rule_data:
            sg_rule['group'] = group_rule_data
        elif rule['group_id']:
            try:
                source_group = self.security_group_api.get(
                    context, id=rule['group_id'])
            except exception.SecurityGroupNotFound:
                # NOTE(arosen): There is a possible race condition that can
                # occur here if two api calls occur concurrently: one that
                # lists the security groups and another one that deletes a
                # security group rule that has a group_id before the
                # group_id is fetched. To handle this if
                # SecurityGroupNotFound is raised we return None instead
                # of the rule and the caller should ignore the rule.
                LOG.debug("Security Group ID %s does not exist",
                          rule['group_id'])
                return
            sg_rule['group'] = {'name': source_group.get('name'),
                                'tenant_id': source_group.get('project_id')}
        else:
            sg_rule['ip_range'] = {'cidr': rule['cidr']}
        return sg_rule

As you can see, it does this:

    elif rule['group_id']:
        source_group = self.security_group_api.get(
            context, id=rule['group_id'])

For each rule with a group_id, it goes to neutron to get the group info.
It could at least cache this within a single API call.
Even better would be if we could ask neutron for the correct info in a single API call.
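
A minimal sketch of the caching idea, assuming a dict that lives only for one API call. FakeSecurityGroupAPI and format_rules are hypothetical stand-ins for illustration, not nova code; only the field names are taken from the function quoted above.

```python
class FakeSecurityGroupAPI:
    """Hypothetical stand-in for the security group API (neutron)."""

    def __init__(self, groups):
        self._groups = groups
        self.calls = 0

    def get(self, context, id):
        # Each call here would be one GET /v2.0/security-groups/{id}.
        self.calls += 1
        return self._groups[id]


def format_rules(api, context, rules):
    """Format rules, looking up each remote group at most once."""
    group_cache = {}  # scoped to this single API call, never shared
    formatted = []
    for rule in rules:
        sg_rule = {'id': rule['id'], 'group': {}, 'ip_range': {}}
        group_id = rule.get('group_id')
        if group_id:
            if group_id not in group_cache:
                group_cache[group_id] = api.get(context, id=group_id)
            source_group = group_cache[group_id]
            sg_rule['group'] = {'name': source_group.get('name'),
                                'tenant_id': source_group.get('project_id')}
        else:
            sg_rule['ip_range'] = {'cidr': rule['cidr']}
        formatted.append(sg_rule)
    return formatted


# 50 rules pointing at one remote group, plus one CIDR rule.
api = FakeSecurityGroupAPI({'sg-remote': {'name': 'web', 'project_id': 'p1'}})
rules = [{'id': n, 'group_id': 'sg-remote', 'cidr': None} for n in range(50)]
rules.append({'id': 50, 'group_id': None, 'cidr': '10.0.0.0/8'})
result = format_rules(api, None, rules)
print(api.calls)
```

With the cache in place the fake API is hit once for all 50 rules. The sketch leaves out the SecurityGroupNotFound handling that the real function needs.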

Matt Riedemann (mriedem) wrote :

I think I actually reported a similar bug quite a while ago, trying to find it.

tags: added: performance
Matt Riedemann (mriedem) wrote :

I was thinking of bug 1580621 and I did point this out in comment 1:

https://bugs.launchpad.net/nova/+bug/1580621/comments/1

Have you tested with this patch?

https://review.openstack.org/#/c/315311/

Matt Riedemann (mriedem) wrote :
tags: added: api
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/517648

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/517648
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dc658dbdcf2325cf6f27d9ae61d85b835f8410f8
Submitter: Zuul
Branch: master

commit dc658dbdcf2325cf6f27d9ae61d85b835f8410f8
Author: Matt Riedemann <email address hidden>
Date: Fri Nov 3 12:26:03 2017 -0400

    Avoid redundant security group queries in GET /servers/{id}/os-security-groups

    The GET /servers/{server_id}/os-security-groups API code can
    perform poorly if the instance has several security groups and
    each security group has several rules. This is because when processing
    the output, we loop over the groups, and loop over the rules per group,
    and then for each rule, if it has a group_id specified, we query
    the security group details (from Neutron in most cases).

    If more than one rule points at the same group_id, we're doing a redundant
    group lookup and sending more traffic to the security group API (aka Neutron)
    than needed.

    This change optimizes that single API to load the rule group details
    up front so that we only do at most one lookup per group_id.

    This could be extended to GET /os-security-groups but that API is
    deprecated so any optimization there is lower priority.

    Change-Id: Ia451429f61b15526fade6838386e562c17591d36
    Closes-Bug: #1729741
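
The "load the rule group details up front" approach described in the commit message might look roughly like the sketch below. This is not the merged patch: load_rule_group_data, FakeSecurityGroupAPI and the local SecurityGroupNotFound class are hypothetical names used only for illustration. The resulting dict entries have the shape that _format_security_group_rule accepts as group_rule_data.

```python
class SecurityGroupNotFound(Exception):
    """Local stand-in for nova's exception of the same name."""


class FakeSecurityGroupAPI:
    """Hypothetical stand-in for the security group API (Neutron)."""

    def __init__(self, groups):
        self._groups = groups
        self.calls = 0

    def get(self, context, id):
        self.calls += 1
        try:
            return self._groups[id]
        except KeyError:
            raise SecurityGroupNotFound(id)


def load_rule_group_data(api, context, groups):
    """Return {group_id: {'name', 'tenant_id'}}, one lookup per group_id."""
    group_ids = {rule['group_id']
                 for group in groups
                 for rule in group['rules']
                 if rule.get('group_id')}
    data = {}
    for group_id in sorted(group_ids):
        try:
            source = api.get(context, id=group_id)
        except SecurityGroupNotFound:
            # Group deleted concurrently; the caller skips rules that
            # reference it, as the existing code does.
            continue
        data[group_id] = {'name': source.get('name'),
                          'tenant_id': source.get('project_id')}
    return data


api = FakeSecurityGroupAPI({'a': {'name': 'web', 'project_id': 'p1'},
                            'b': {'name': 'db', 'project_id': 'p1'}})
groups = [
    {'id': 'g1', 'rules': [{'group_id': 'a'}, {'group_id': 'a'},
                           {'group_id': 'b'}]},
    {'id': 'g2', 'rules': [{'group_id': 'a'}, {'group_id': 'gone'},
                           {'group_id': None}]},
]
group_rule_data = load_rule_group_data(api, None, groups)
print(api.calls, sorted(group_rule_data))
```

Five rules reference a remote group here, but only three distinct group_ids exist, so the fake API sees three calls, including the one for the group that no longer exists.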

Changed in nova:
status: In Progress → Fix Released
Robert van Leeuwen (rovanleeuwen) wrote :

Re-opened this bug: I am not sure the fix did anything.
I applied the fix and still see the exact same thing in the neutron logs, e.g.:

2017-11-13 09:55:49,914.914 10778 INFO neutron.wsgi [req-4c18bd00-48c2-498f-92ee-824e215b83b9 c70f3fc161e04718a108cf8192d0816e d2930aef1c824f048b8e1301b3ded161 - - -] 10.41.0.33 - - [13/Nov/2017 09:55:49] "GET /v2.0/security-groups/d2770f7f-8695-48de-806b-e690189e25a8 HTTP/1.1" 200 41170 0.062934
2017-11-13 09:55:49,982.982 10778 INFO neutron.wsgi [req-f69f7d30-6121-456e-8dec-4c142965e91c c70f3fc161e04718a108cf8192d0816e d2930aef1c824f048b8e1301b3ded161 - - -] 10.41.0.33 - - [13/Nov/2017 09:55:49] "GET /v2.0/security-groups/d2770f7f-8695-48de-806b-e690189e25a8 HTTP/1.1" 200 41170 0.055435
2017-11-13 09:55:50,054.054 10778 INFO neutron.wsgi [req-4a389783-31b1-421e-a0f9-b31e419bc5df c70f3fc161e04718a108cf8192d0816e d2930aef1c824f048b8e1301b3ded161 - - -] 10.41.0.33 - - [13/Nov/2017 09:55:50] "GET /v2.0/security-groups/d2770f7f-8695-48de-806b-e690189e25a8 HTTP/1.1" 200 41170 0.058556
2017-11-13 09:55:50,119.119 10778 INFO neutron.wsgi [req-3dffc0a6-5f14-400f-8b4c-02555d2e262d c70f3fc161e04718a108cf8192d0816e d2930aef1c824f048b8e1301b3ded161 - - -] 10.41.0.33 - - [13/Nov/2017 09:55:50] "GET /v2.0/security-groups/d2770f7f-8695-48de-806b-e690189e25a8 HTTP/1.1" 200 41170 0.052321
2017-11-13 09:55:50,198.198 10778 INFO neutron.wsgi [req-007398bc-72e9-4858-8b04-f49face2304d c70f3fc161e04718a108cf8192d0816e d2930aef1c824f048b8e1301b3ded161 - - -] 10.41.0.33 - - [13/Nov/2017 09:55:50] "GET /v2.0/security-groups/d2770f7f-8695-48de-806b-e690189e25a8 HTTP/1.1" 200 41170 0.066602
2017-11-13 09:55:50,270.270 10778 INFO neutron.wsgi [req-092f3234-926b-4f2c-8626-1b3e7f1203dd c70f3fc161e04718a108cf8192d0816e d2930aef1c824f048b8e1301b3ded161 - - -] 10.41.0.33 - - [13/Nov/2017 09:55:50] "GET /v2.0/security-groups/d2770f7f-8695-48de-806b-e690189e25a8 HTTP/1.1" 200 41170 0.059341
2017-11-13 09:55:50,334.334 10778 INFO neutron.wsgi [req-20f45327-9d24-482c-b648-176e85d46a81 c70f3fc161e04718a108cf8192d0816e d2930aef1c824f048b8e1301b3ded161 - - -] 10.41.0.33 - - [13/Nov/2017 09:55:50] "GET /v2.0/security-groups/d2770f7f-8695-48de-806b-e690189e25a8 HTTP/1.1" 200 41170 0.051137
2017-11-13 09:55:50,397.397 10778 INFO neutron.wsgi [req-0fa86b48-59a0-460c-829e-ea008a442ed0 c70f3fc161e04718a108cf8192d0816e d2930aef1c824f048b8e1301b3ded161 - - -] 10.41.0.33 - - [13/Nov/2017 09:55:50] "GET /v2.0/security-groups/d2770f7f-8695-48de-806b-e690189e25a8 HTTP/1.1" 200 41170 0.050534
2017-11-13 09:55:50,459.459 10778 INFO neutron.wsgi [req-8bb7bfdc-b12a-46b3-a76d-7ba299d45a84 c70f3fc161e04718a108cf8192d0816e d2930aef1c824f048b8e1301b3ded161 - - -] 10.41.0.33 - - [13/Nov/2017 09:55:50] "GET /v2.0/security-groups/d2770f7f-8695-48de-806b-e690189e25a8 HTTP/1.1" 200 41170 0.048582
2017-11-13 09:55:50,527.527 10778 INFO neutron.wsgi [req-f19d8910-967c-4236-91d3-a59852482383 c70f3fc161e04718a108cf8192d0816e d2930aef1c824f048b8e1301b3ded161 - - -] 10.41.0.33 - - [13/Nov/2017 09:55:50] "GET /v2.0/security-groups/d2770f7f-8695-48de-806b-e690189e25a8 HTTP/1.1" 200 41170 0.054999
2017-11-13 09:55:50,596.596 10778 INFO neutron.wsgi [req-f21e1335-ddd7-49b0-9f2b-d...


Changed in nova:
status: Fix Released → Confirmed
Matt Riedemann (mriedem) wrote :

Robert, can you confirm a few things?

1. The fix I put in was just for the "GET /servers/{server_id}/os-security-groups" API, since the other APIs in the os-security-groups extension are deprecated. Are you sure that's the API you are testing with the fix applied?

2. Can you provide details on how many unique security groups are applied to a single instance in your recreate test? And how many rules are applied to each group?

If the instance has 50 unique security groups, then you're right that this fix won't do much since Nova will make a separate call to Neutron per group to get details. However, there should still only be 1 call to Neutron per group and rule, which would be an improvement.

Robert van Leeuwen (rovanleeuwen) wrote :

1) The caller is currently Terraform, which seems to call /v2/tenant-id/os-security-groups,
so I guess we will have to file a bug report there

2) I created 50 rules with the identical remote security group
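
For a sense of scale, here is a rough count of the lookups this recreate needs with and without de-duplication by group_id. count_group_lookups and the 'remote-sg' ID are made up for the illustration; this is arithmetic, not nova code.

```python
def count_group_lookups(rules, dedupe):
    """Count remote-group lookups needed to format the given rules."""
    remote_ids = [rule['group_id'] for rule in rules if rule.get('group_id')]
    if dedupe:
        return len(set(remote_ids))  # one lookup per distinct group_id
    return len(remote_ids)           # one lookup per rule


# 50 rules that all reference the same remote group (placeholder ID).
rules = [{'id': n, 'group_id': 'remote-sg'} for n in range(50)]

per_rule = count_group_lookups(rules, dedupe=False)
per_group = count_group_lookups(rules, dedupe=True)
print(per_rule, per_group)
```

That is 50 requests versus 1. At the roughly 0.05 to 0.07 s per request shown in the log above, the per-rule path adds about 3 s to a single listing.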

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b2

This issue was fixed in the openstack/nova 17.0.0.0b2 development milestone.

Matt Riedemann (mriedem) wrote :

Removing myself as assignee since I'm not actively working on this and I'm not sure if it's still an issue.

Changed in nova:
assignee: Matt Riedemann (mriedem) → nobody