Comment 9 for bug 1830679

Revision history for this message
Miguel Lavalle (minsel) wrote :

I am able to replicate this bug with a development system setup with master branch. Executing the following python script, I created 6000+ security groups in my development system (showing only relevant part):

secgroups_ids = []
projects_ids = []
secgroups_in_project = 0
created_secgroups_count = 0
created_projects = 0
try:
    while created_secgroups_count < 6000:
        if secgroups_in_project <= 0:
            project = keystone.projects.create(
                name='test-project-%s' % created_projects,
                domain=login_data['user_domain_id'])
            secgroups_in_project = random.randint(1, 7)
            projects_ids.append(project.id)
            new_project = True
            created_projects += 1
        body = {'security_group': {
                    'name': 'test-secgroup-%s' % created_secgroups_count,
                    'description': ('Test security group %s' %
                                    created_secgroups_count),
                    'project_id': project.id}}
        secgroup = neutron.create_security_group(body=body)
        secgroups_in_project -= 1
        secgroups_ids.append(secgroup['security_group']['id'])
        created_secgroups_count += 1
        if new_project:
            new_project = False
            default_secgroup = neutron.list_security_groups(
                name='default', project_id=project.id)['security_groups'][0]
            secgroups_ids.append(default_secgroup['id'])
            created_secgroups_count += 1
        print(created_secgroups_count)
    import pdb
    pdb.set_trace()
finally:
    for secgroup_id in secgroups_ids:
        neutron.delete_security_group(secgroup_id)
    for project_id in projects_ids:
        keystone.projects.delete(project_id)

After creating the security groups (and leaving above script stopped at the pdb.set_trace() line), I execute the following bash script:

source ~/demo-openrc
start_time="$(date -u +%s)"
openstack security group list
end_time="$(date -u +%s)"
elapsed="$(($end_time-$start_time))"
echo "Total of $elapsed seconds elapsed for process"

One of the executions produces the following output (all runs showed about the same elapsed time):

+--------------------------------------+---------+------------------------+----------------------------------+------+
| ID | Name | Description | Project | Tags |
+--------------------------------------+---------+------------------------+----------------------------------+------+
| 58d172f3-b1ce-4592-a954-7151f9e7708e | default | Default security group | 29b2f9ab6dc04e0ab8ab7e48c8baaa8d | [] |
+--------------------------------------+---------+------------------------+----------------------------------+------+
Total of 525 seconds elapsed for process

However, executing the following python script:

import time

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from neutron.db.models import securitygroup

engine = create_engine(
    'mysql+pymysql://root:devstack@127.0.0.1/neutron?charset=utf8',
    echo=False)
session = sessionmaker()
session.configure(bind=engine)
s = session()
start = time.time()
query = s.query(securitygroup.SecurityGroup)
secgroups = query.all()
end = time.time()
print('Number of security groups retrieved %s' % len(secgroups))
print('Elapsed time in seconds: %s' % (end - start))

I get the following output:

Number of security groups retrieved 6005
Elapsed time in seconds: 0.654874801636

This tells us that the problem is not the definition of the SQLAlchemy DB model as suggested in one of the previous comments regarding the lazy attribute of the rbac_entries relationship (https://review.opendev.org/#/c/635311/22/neutron/db/models/securitygroup.py@33), because a query that retrieves 6005 sec groups takes less than a second.

Digging at the code, I added LOG statements right before this line https://review.opendev.org/#/c/635311/22/neutron/objects/rbac_db.py@114 and right before this line https://review.opendev.org/#/c/635311/22/neutron/objects/rbac_db.py@118. Executing the above bash script, I get the following output:

+--------------------------------------+---------+------------------------+----------------------------------+------+
| ID | Name | Description | Project | Tags |
+--------------------------------------+---------+------------------------+----------------------------------+------+
| 58d172f3-b1ce-4592-a954-7151f9e7708e | default | Default security group | 29b2f9ab6dc04e0ab8ab7e48c8baaa8d | [] |
+--------------------------------------+---------+------------------------+----------------------------------+------+
Total of 504 seconds elapsed for process

While I can see the following lines in the neutron server log file:

Jun 16 23:11:51 allinone neutron-server[14624]: DEBUG neutron.objects.rbac_db [None req-909eccd0-1cdb-480c-a5be-170ef82abdeb demo demo] mlavalle {{(pid=14638) get_objects /opt/stack/neutron/neutron/objects/rbac_db.py:119}}

Jun 16 23:19:25 allinone neutron-server[14624]: DEBUG neutron.objects.rbac_db [None req-909eccd0-1cdb-480c-a5be-170ef82abdeb demo demo] mlavalle {{(pid=14638) get_objects /opt/stack/neutron/neutron/objects/rbac_db.py:124}}

Between these two lines there is an elapsed time of 454 seconds. In other words, Most of the 504 seconds shown above are spent trying to select, based on non-existent RBAC entries, the security groups retrieved from the DB