Slownesses on neutron API with many RBAC rules

Bug #1918145 reported by LEDUC Florian
42
This bug affects 8 people
Affects Status Importance Assigned to Milestone
neutron
Fix Released
Medium
Rodolfo Alonso

Bug Description

* Summary: Slownesses on neutron API with many RBAC rules

* High level description: Sharing several networks or security groups to project drastically increase API response time on some routes (/networks or /server/detail).

For quite some time we have observing that reponse times are increasing (slowly fur surely) on /networks calls. We have increased the number of Neutron workers, but in vain.

Lately, we're observing that it's getting worse (reponse time form 5 to 370 seconds). We discarded possible bottlenecks one by one (our service endpoint performance, neutron API configuration, etc).
But we have found that some calls in the DB takes a lot of time. It seems they are stuck in the mariadb database (10.3.10). So we have captured a slow queries in mysql.

An example of for /server/detail:
---------------------------------
http://paste.openstack.org/show/803334/

We can see that there are more than 2 millions of rows examinated, and around 1657 returned.

An example of for /networks:
----------------------------
http://paste.openstack.org/show/803337/
Rows_sent: 517 Rows_examined: 223519

* Pre-conditions:
Database tables size:
table:
    - networkrbacs 16928 rows
    - securitygrouprbacs 1691 rows
    - keystone.project 1713 rows

Control plane nodes are shared with some others services:
- RMQ
- mariadb
- Openstack APIs
- DHCP agents

It seems the code of those lines are based on https://github.com/openstack/neutron-lib/blob/698e4c8daa7d43018a71122ec5b0cd5b17b55141/neutron_lib/db/model_query.py#L120

* Step-by-step reproduction steps:

- Create a lot of projects (at least 1000)
- Create a SG in admin account
- Create fake networks (vlan, vxlan) with associated
- Share the SG and all networks with all projects

* Expected output: lower response time, less than 5 seconds (approximatively).

* Actual output: May lead to gateway timeout.

* Version:
  ** OpenStack version Stein releases for all components (neutron 14.2.0).
  ** CentOS 7.4 with kolla containers
  ** kolla-ansible for stein release

* Environment: We operate all services in Openstack except for Cinder.

* Perceived severity: Medium

description: updated
Hongbin Lu (hongbin.lu)
Changed in neutron:
status: New → Confirmed
importance: Undecided → Medium
tags: added: db
tags: added: loadimpact
Revision history for this message
LEDUC Florian (leducflorian) wrote :

Hi,

We're currently investigating issue on the database related to EXPLAIN or ANALYZE output. It may be due to non-optimized indexes on table.

We will let you know.

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Leduc:

I can't reproduce those queries in my env. I tried creating 300 projects, 300 networks (one per project) and sharing each network with each project (not a full 300-to-300 matrix, but I have more than 2000 rbac objects). When I try to reproduce [1], I have [2]. My query only examines 10 rows.

Some questions:
- Do you have [3] in your code?
- How much time does your system take to execute "openstack network list"?

Regards.

[1]http://paste.openstack.org/show/803337/
[2]http://paste.openstack.org/show/804174/
[3]https://review.opendev.org/c/openstack/neutron/+/720051

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I'm marking this as incomplete until we will get info requested by Rodolfo in the comment #2.

Changed in neutron:
status: Confirmed → Incomplete
Revision history for this message
LEDUC Florian (leducflorian) wrote :

Hi,

thanks for the response.

After executing mysqlcheck -o -A -p (OPTIMIZE, ANALYZE) we did not manage to decrease the load and the number of rows examined.

- Yes we have [3].
- Here's a little of context for openstack network list:

Region running Stein release:
project: 1909
rbac rules: 22241
networks: 57
security groups: 2847

REPONSE TIMES:

AS ADMIN CONTEXT:
-----------------

real 0m8.596s
user 0m1.616s
sys 0m0.184s

AS USER CONTEXT:
----------------
real 0m16.632s
user 0m1.588s
sys 0m0.175s

We have successfully migrated to Ussuri one of our region, here's some figures:

project: 770
rbac rules: 29528
networks: 94
security groups: 1071

REPONSE TIMES:

AS ADMIN CONTEXT:
-----------------

real 0m5.163s
user 0m1.611s
sys 0m0.184s

AS USER CONTEXT:
----------------
real 0m6.398s
user 0m1.602s
sys 0m0.174s

I've noticed that somes route are taking a lot of time to respond. As we did not deploy the OSprofiler in production env, we do not know is they are stuck in python, or the DB, or even elsewhere:
- openstack security group list (56s)
- openstack network rbac list (18s)

Revision history for this message
LEDUC Florian (leducflorian) wrote :

Hi,

any update regarding my last comment ?

Revision history for this message
LEDUC Florian (leducflorian) wrote :
Download full text (4.2 KiB)

Hello,

after another attempts to add several networks, we have noticed that during the performance issue we do full table SCAN.

MariaDB [neutron]> EXPLAIN SELECT networkrbacs.project_id AS networkrbacs_project_id, networkrbacs.id AS networkrbacs_id, networkrbacs.target_tenant AS networkrbacs_target_tenant, networkrbacs.action AS networkrbacs_action, networkrbacs.object_id AS networkrbacs_objec t_id, anon_1.networks_id AS anon_1_networks_id
    -> FROM (SELECT networks.id AS networks_id
    -> FROM networks LEFT OUTER JOIN networkrbacs ON networks.id = networkrbacs.object_id
    -> WHERE networkrbacs.action = 'access_as_external' AND networkrbacs.target_tenant = 'a9b0XXXXXXX' OR network rbacs.target_tenant = '*' OR networks.project_id = 'a9b0XXXXXXX' OR networkrbacs.action = 'access_as_shared' AND (networkrbacs.target_tenant = 'a9b0XXXXXXX' OR networkrbacs.target_tenant = '*')) AS anon_1 INNER JOIN networkrba cs ON anon_1.networks_id = networkrbacs.object_id
    -> ;
+------+-------------+--------------+--------+--------------------------------+-----------+---------+--------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref |rows | Extra |
+------+-------------+--------------+--------+--------------------------------+-----------+---------+--------------------------------+-------+-------------+
| 1 | SIMPLE | networkrbacs | ALL | object_id | NULL | NULL | NULL |22857 | |
| 1 | SIMPLE | networks | eq_ref | PRIMARY,ix_networks_project_id | PRIMARY | 110 | neutron.networkrbacs.object_id |1 | |
| 1 | SIMPLE | networkrbacs | ref | object_id | object_id | 110 | neutron.networkrbacs.object_id |238 | Using where |
+------+-------------+--------------+--------+--------------------------------+-----------+---------+--------------------------------+-------+-------------+

After removing some of them, it seems we're under a certain threshold and we do use the index again.

MariaDB [neutron]> EXPLAIN SELECT networkrbacs.project_id AS networkrbacs_project_id, networkrbacs.id AS networkrbacs_id, networkrbacs.target_tenant AS networkrbacs_target_tenant, networkrbacs.action AS networkrbacs_action, networkrbacs.object_id AS networkrbacs_object_id, anon_1.networks_id AS anon_1_networks_id
    -> FROM (SELECT networks.id AS networks_id
    -> FROM networks LEFT OUTER JOIN networkrbacs ON networks.id = networkrbacs.object_id
    -> WHERE networkrbacs.action = 'access_as_external' AND networkrbacs.target_tenant = 'a9b0XXXXXXX' OR networkrbacs.target_tenant = '*' OR networks.project_id = 'a9b0XXXXXXX' OR networkrbacs.action = 'access_as_shared' AND (networkrbacs.target_tenant = 'a9b0XXXXXXX' OR networkrbacs.target_tenant = '*')) AS anon_1 INNER JOIN networkrbacs ON anon_1.networks_id = networkrbacs.object_id;
+------+-------------+-------------...

Read more...

Changed in neutron:
status: Incomplete → Confirmed
Revision history for this message
Tyler Stachecki (tstachecki) wrote :

Hi, we've also run into this issue and have been running a fix for our cloud sometime now with success.

I can work on upstreaming the fix we're using for review/inclusion.

Changed in neutron:
assignee: nobody → Tyler Stachecki (tstachecki)
Revision history for this message
LEDUC Florian (leducflorian) wrote :
Download full text (6.9 KiB)

Hi Tyler,

after some searching on mariadb database tuning, we found that setting "join_cache_level = 3" on mariadb is speeding the query time by 5, but it still does a full table scan and consume 100% during the processing of the request.

Another clue is that index statistics values are not well updated in that case:

------------------------
MariaDB [nova]> select * from mysql.innodb_index_stats where table_name='networkrbacs';
+---------------+--------------+--------------------------------------------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+--------------+--------------------------------------------------+---------------------+--------------+------------+-------------+-----------------------------------+
| neutron | networkrbacs | PRIMARY | 2021-05-07 15:32:09 | n_diff_pfx01 | 22865 | 267 | id |
| neutron | networkrbacs | PRIMARY | 2021-05-07 15:32:09 | n_leaf_pages | 1 | NULL | Number of leaf pages in the index |
| neutron | networkrbacs | PRIMARY | 2021-05-07 15:32:09 | size | 289 | NULL | Number of pages in the index |
| neutron | networkrbacs | ix_networkrbacs_project_id | 2021-05-07 15:32:09 | n_diff_pfx01 | 1 | 109 | project_id |
| neutron | networkrbacs | ix_networkrbacs_project_id | 2021-05-07 15:32:09 | n_diff_pfx02 | 22865 | 109 | project_id,id |
| neutron | networkrbacs | ix_networkrbacs_project_id | 2021-05-07 15:32:09 | n_leaf_pages | 1 | NULL | Number of leaf pages in the index |
| neutron | networkrbacs | ix_networkrbacs_project_id | 2021-05-07 15:32:09 | size | 161 | NULL | Number of pages in the index |
| neutron | networkrbacs | object_id | 2021-05-07 15:32:09 | n_diff_pfx01 | 48 | 163 | object_id |
| neutron | networkrbacs | object_id | 2021-05-07 15:32:09 | n_diff_pfx02 | 22865 | 163 | object_id,id |
| neutron | networkrbacs | object_id | 2021-05-07 15:32:09 | n_leaf_pages | 1 | NULL | Number of leaf pages in the index |
| neutron | networkrbacs | object_id | 2021-05-07 15:32:09 | size | 225 | NULL | Number of pages in the index |
| neutron | networkrbacs | uniq_networkrbacs0tenant_target0object_id0action | 2021-05-07 15:32:09 | n_diff_pfx01 | 1 | 255 | action |
| ne...

Read more...

Revision history for this message
Tyler Stachecki (tstachecki) wrote :

Sorry for the delay.

In writing up our fix, I realized that the fix I deployed locally to our cloud is not comprehensive and does not cover all possible use cases for networks/RBACs. That's OK, though - I'd still like to work on a full solution and have a good grasp of the problem at hand.

The problem is as Florian describes. By default, the SQLalchemy-generated query for many of these "RBAC membership query"-type functions, such as `openstack network list`, as seen below:

SELECT
    networkrbacs.project_id AS networkrbacs_project_id,
    networkrbacs.id AS networkrbacs_id,
    networkrbacs.target_tenant AS networkrbacs_target_tenant,
    networkrbacs.action AS networkrbacs_action,
    networkrbacs.object_id AS networkrbacs_object_id,
    subnets_1.network_id AS subnets_1_network_id
FROM (
    SELECT
        networks.id AS networks_id FROM networks
    LEFT OUTER JOIN networkrbacs
        ON networks.id = networkrbacs.object_id
    WHERE
        networkrbacs.action = 'access_as_external' AND
        networkrbacs.target_tenant = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' OR
        networkrbacs.target_tenant = '*' OR
        networks.project_id = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' OR
        networkrbacs.action = 'access_as_shared' AND (
            networkrbacs.target_tenant = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' OR
            networkrbacs.target_tenant = '*'
)) AS anon_1
    INNER JOIN subnets AS subnets_1
        ON anon_1.networks_id = subnets_1.network_id
    INNER JOIN networkrbacs
        ON subnets_1.network_id = networkrbacs.object_id
ORDER BY subnets_1.network_id;

EXPLAIN-ing the query, we can see the first part of the query plan has a table scan for object_id in networkrbacs. This is why as the cloud deployment accumulates a large amount of RBACs, the performance gets very slow.

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: networkrbacs
   partitions: NULL
         type: ALL
possible_keys: object_id
          key: NULL
      key_len: NULL
          ref: NULL
         rows: *************
     filtered: 100.00
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
...
<snip>
...

The painpoint of the query is here:
    WHERE
        networkrbacs.action = 'access_as_external' AND
        ...
        networks.project_id = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' OR

If the networks.project_id part of the expression is dropped or changed such that an index is used, the query completes very quickly regardless how many RBACs are in play.

This part of the query is generated in `python3-neutronlib`, here:
https://github.com/openstack/neutron-lib/blob/master/neutron_lib/db/model_query.py#L125

I'll check if an index is present and see if adding one helps, though we also got by for our specific use case by adjusting the code slightly around this point as an intern measure.

Revision history for this message
Tyler Stachecki (tstachecki) wrote (last edit ):

Sorry, been backlogged and have not been able to get around to this.

The table scan seems to be attributable to the LEFT OUTER JOIN in the subquery. Replacing that with an INNER JOIN will result in the indexes used (or, at least it does on MySQL).

On first glance, the OUTER JOIN seems necessary as the subquery contains a "OR networks.project_id = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' OR" part of the subquery, which indicates that networks without RBACs are still not filtered out.

But, at least in the case of the outer query at hand here, we see:
    INNER JOIN subnets AS subnets_1
        ON anon_1.networks_id = subnets_1.network_id
    INNER JOIN networkrbacs
        ON subnets_1.network_id = networkrbacs.object_id

i.e., we INNER JOIN the subquery (anon_1) back to networkrbacs anyways, which would filter out the rows that the OUTER JOIN in the subquery affords. Thus, I wonder if the OUTER JOIN is truly needed or not.

Revision history for this message
Tyler Stachecki (tstachecki) wrote :

I was able to come up with one solution, which is to change the subquery from:

    SELECT
        networks.id AS networks_id FROM networks
    LEFT OUTER JOIN networkrbacs
        ON networks.id = networkrbacs.object_id
    WHERE
        networkrbacs.action = 'access_as_external' AND
        networkrbacs.target_tenant = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' OR
        networkrbacs.target_tenant = '*' OR
        networks.project_id = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' OR
        networkrbacs.action = 'access_as_shared' AND (
            networkrbacs.target_tenant = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' OR
            networkrbacs.target_tenant = '*'

to:

    SELECT
        networks.id AS networks_id FROM networks
    INNER JOIN networkrbacs
        ON networks.id = networkrbacs.object_id
    WHERE
        networkrbacs.action = 'access_as_external' AND
        networkrbacs.target_tenant = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' OR
        networkrbacs.target_tenant = '*' OR
        networkrbacs.action = 'access_as_shared' AND (
            networkrbacs.target_tenant = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' OR
            networkrbacs.target_tenant = '*'
    UNION
    SELECT
        networks.id AS networks_ID FROM networks
    WHERE
        networks.project_id = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

Thinking a little more to see if there is a better way... this feels a bit kludgy.

Revision history for this message
LEDUC Florian (leducflorian) wrote (last edit ):

Hello Tyler,

thanks for the explanation.

Tomorrow will be pushing a fix in our production to force the index to be used for the RBAC model, securitygrouprbacs is also impacted by those slowness which kill our control plane.

Any chance we could share our findings in this thread ?

Revision history for this message
Tyler Stachecki (tstachecki) wrote :

Gladly, the more feedback the merrier. I'm quite confident it will work out, as we have deployed something very similar to our production clouds for a few weeks now with success.

Which method are you using to force the index to be used?

Revision history for this message
LEDUC Florian (leducflorian) wrote :

Hi,

We used SQLalchemy hints to force the use of indexes. Here's a diff of our internal pull request. We know that we cannot submit this patch upstream today, mostly because of our lack of understanding on how neutron is built. A little bit of help from the community would be appreciated.

How did you fix it on your side ?

Regards,

Revision history for this message
Tyler Stachecki (tstachecki) wrote :

Interesting... I was unaware of the FORCE INDEX clause in MySQL. I'm not sure how much upstream will take a liking to that (the use of it would minimally have to be guarded for MySQL dialects, as things like postgres and other SQL servers don't support it AFAIK).

Similar to my last post on the subquery and changing it from a LEFT OUTER JOIN to INNER JOIN + UNION, we got along by refactoring the generated query such that the optimizer could leverage the index.

Revision history for this message
Tyler Stachecki (tstachecki) wrote (last edit ):

Attached a rough working patch (albeit, against Rocky for now) that demonstrates where my thinking is going.

Unfortunately, after a lot of thinking today, I could not come up with a solution that does not ultimately touch both neutron-lib and neutron to some degree. The crux of the problem always seems to be that I need a way to operate on the RBAC part of the query filter separately from the model side. For performance reasons, they really need to be treated separately -- that OUTER JOIN appearing in the subqueries is a killer, and splitting them up allows us to remove it.

I tried the force index patch just now, but on Percona MySQL 5.7, I still do not see the index being used and thus no performance improvements.

OTOH, this 'intrusive' patch speeds up API calls by over 2x in my lab environment with a couple thousand RBACs. The delay is mostly all in OSC/neutron-server and the database query takes msec instead of >1sec.
rocky/master: 3.927s~4.084s
preliminary RBAC query patch: 1.474s~1.629s

Revision history for this message
LEDUC Florian (leducflorian) wrote :

After several months of good performance with the FORCE INDEX patch. Here we go again with performance issues on RBAC model and it occured during a night without any intervention from us.

We will give it a try with your patch. Here's some figures about our status:

MariaDB [neutron]> select count(*) from networkrbacs;
+----------+
| count(*) |
+----------+
| 75512 |
+----------+
1 row in set (0.019 sec)

MariaDB [neutron]>

MariaDB [neutron]> select count(*) from keystone.project;
+----------+
| count(*) |
+----------+
| 3203 |
+----------+
1 row in set (0.001 sec)

MariaDB [neutron]> select count(*) from networks;
+----------+
| count(*) |
+----------+
| 103 |
+----------+
1 row in set (0.000 sec)

MariaDB [neutron]> select count(*) from networksegments;
+----------+
| count(*) |
+----------+
| 194 |
+----------+
1 row in set (0.000 sec)

Slawek / Rodolfo ? Can you have a closer on that issue if you can ?

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Leduc/Tyler:

In c#2 I asked if you had https://review.opendev.org/c/openstack/neutron/+/720051 in your environment. Please, check that.

I don't understand the explanation in c#9 and c#10 about the "networkrbacs.object_id" column. This field is a foreign key, that means it is added to the indexes list.

In order to speed up the current query we can set "networkrbacs.target_tenant" and "networkrbacs.action" as indexes. That will help on the query described in c#11. The insertion operation (when a RBAC is created) will need to calculate those new indexes but this is trivial compared to the improvement during the select queries.

I'm going to propose a WIP patch in master. However, it could be much better if you manually set those two columns "networkrbacs.target_tenant" and "networkrbacs.action" as indexes in your deployment, benchmark the queries and share the results.

Regards.

Revision history for this message
Tyler Stachecki (tstachecki) wrote :

Rodolfo, yes, we have that commit/PR -- specifically, we're now on a very recent Ussuri and still have slow queries without applying workarounds to the code.

Although c#9 and c#10 may be confusing, I can absolutely attest that working around that saved us a bunch of heartache at scale - we saw OpenStack API calls that were running for upwards of one minute or more get cut down to 100s of milliseconds.

I am tied up early this week triaging some other issues, but ack your request to add an index and report back.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/810072

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Tyler:

https://review.opendev.org/c/openstack/neutron/+/810072 is the patch. Because is a DB migration, you won't be able to execute it directly. However, this could be achieved manually in Ussuri by:
1) Creating manually the indexes in the DB.
2) Applying https://review.opendev.org/c/openstack/neutron/+/810072/1/neutron/db/rbac_db_models.py.

Regards.

Revision history for this message
LEDUC Florian (leducflorian) wrote :
Revision history for this message
LEDUC Florian (leducflorian) wrote :

Thanks guys for your quick answer. We have dumped our production database in our small lab and we will give a try to your suggestions and patch.

Revision history for this message
LEDUC Florian (leducflorian) wrote (last edit ):

Hi,

I've backported the patch provided in c#22 to Ussuri . It ended up with an error duplicate keys as follows:

------------------------------------------------------------------------------------------
(neutron-server)[neutron@ip-192-166-206-244 /]$ neutron-db-manage upgrade heads
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
  Running upgrade for neutron ...
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade d8bdf05313f4 -> ba859d649675
Traceback (most recent call last):
  File "/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
    cursor, statement, parameters, context
  File "/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
    cursor.execute(statement, parameters)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/pymysql/connections.py", line 517, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/pymysql/connections.py", line 732, in _read_query_result
    result.read()
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/pymysql/connections.py", line 1075, in read
    first_packet = self.connection._read_packet()
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/pymysql/connections.py", line 684, in _read_packet
    packet.check_error()
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.InternalError: (1061, "Duplicate key name 'ix_networkrbacs_target_tenant'")

------------------------------------------------------------------------------------------

All the indexes that were attempted to be created by the alembic migration ended up with this error, so I guess they were already there.

Sorry Rodolfo, I did not respond to you message, yes patch #720051 is present on our system.

Tyler, you mentionned an migration to ussuri ealier, did you backported your patch ? Does it work well for your need?

Revision history for this message
LEDUC Florian (leducflorian) wrote (last edit ):

Hi,

I've tested out your patch on ussuri. I've got better results:
- before patch GET /v2.0/networks -> around 40-50s
- after patch GET /v2.0/networks -> around 15s

I don't know what are the pros and cons to have this patch in our deployment. But still, thanks for sharing it with us Tyler.

Next steps, I'll the run the OS profiler in my lab to understand where all these times are spent.

Revision history for this message
Tyler Stachecki (tstachecki) wrote :

Florian,

Yes, unfortunately, the patch does not work as well on Ussuri as it did Rocky - there are some changes to the SQLAlchemy query logic or indexes somewhere. On Rocky, we saw the latency of a "normal" Neutron call instead of 15s in your example. Also, for the same reason, please be careful -- as I am not sure about the correctness of the patch as it pertains to Ussuri given that there were underlying changes.

Unfortunately, I am on PTO this week and still have not gotten a chance to try Rodolfo's changes as there were some internal cases that I was working on for my employer.

Rodolfo,

I do intend to give your patch/suggestions a whirl next week as well.

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hi Leduc:

Sorry, I don't know what patch are you talking about. Is it [1]?

In any case, [1] should not be necessary if [2] is in place. If all RBAC registers have "target_tenant" and "action" as indexes, the query speed in "query_with_hooks" should be improved because all comparisons are against indexed columns.

BTW, you can't backport this patch to Ussuri but create manually the indexes in the DB and add the definitions manually in the code. It is not possible to backport a DB migration in Neutron.

Regards.

[1]https://bugs.launchpad.net/neutron/+bug/1918145/+attachment/5501805/+files/preliminary-neutron-rbac-perf.patch
[2]https://review.opendev.org/c/openstack/neutron/+/810072

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/810072
Committed: https://opendev.org/openstack/neutron/commit/f8c879ddbf7628e9a873d6a213e4905097455a46
Submitter: "Zuul (22348)"
Branch: master

commit f8c879ddbf7628e9a873d6a213e4905097455a46
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Mon Sep 20 15:55:04 2021 +0000

    Add new indexes to RBAC DB models

    Added two new indexes to all RBAC DB models: "target_tenant" and
    "action".

    The DB models affected are "networkrbacs", "qospolicyrbacs",
    "securitygrouprbacs", "addressscoperbacs", "subnetpoolrbacs" and
    "addressgrouprbacs".

    The goal of this patch is to speed up the model query if RBAC apply to
    this object. If the object query scope is a project, [1] will be added
    to the DB query. If "action" and "target_tenant" are indexed, the exact
    match filtering will be faster.

    [1]https://github.com/openstack/neutron-lib/blob/890d62a3df3f35bb18bf1a11e79a9e97e7dd2d2c/neutron_lib/db/model_query.py#L123-L131

    Change-Id: I0a70a1a500fad52ca55006d6e2ebc1044aef0fc8
    Closes-Bug: #1918145

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
Tyler Stachecki (tstachecki) wrote :
Download full text (11.2 KiB)

Apologies, finally got around to getting some data.

In a vanilla Ussuri cluster, I ran `openstack network list` with ~1k projects and ~2k projects.
@ 1k: runtime was ~6.4 sec
@ 2k: runtime was ~17.3 sec
...

I then applied:
https://review.opendev.org/c/openstack/neutron/+/810072

*Before* the patch, sanity check on indexes:
mysql> SHOW INDEX FROM networkrbacs;
+--------------+------------+--------------------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+--------------------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| networkrbacs | 0 | PRIMARY | 1 | id | A | 1813 | NULL | NULL | | BTREE | | |
| networkrbacs | 0 | uniq_networkrbacs0tenant_target0object_id0action | 1 | action | A | 2 | NULL | NULL | | BTREE | | |
| networkrbacs | 0 | uniq_networkrbacs0tenant_target0object_id0action | 2 | object_id | A | 7 | NULL | NULL | | BTREE | | |
| networkrbacs | 0 | uniq_networkrbacs0tenant_target0object_id0action | 3 | target_tenant | A | 1877 | NULL | NULL | | BTREE | | |
| networkrbacs | 1 | object_id | 1 | object_id | A | 7 | NULL | NULL | | BTREE | | |
| networkrbacs | 1 | ix_networkrbacs_project_id | 1 | project_id | A | 1 | NULL | NULL | YES | BTREE | | |
+--------------+------------+--------------------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

mysql> SHOW INDEX FROM qospolicyrbacs;
+----------------+------------+------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| qospolicyrbacs | 0 | PRIMARY | 1 | id | A | 0 | NULL | NULL | | BTREE | | |
| qospolicyr...

Revision history for this message
Tyler Stachecki (tstachecki) wrote (last edit ):

Sorry, typo in the last update:

After the patch I see indexes ix_<TABLENAME>_target_tenant and ix_<TABLENAME>_action on all tables above

Of course not all tables have index ix_networkrbacs_...

.e.g., for networkrbacs, though --
mysql> SHOW INDEX FROM networkrbacs;
+--------------+------------+--------------------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+--------------------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| networkrbacs | 0 | PRIMARY | 1 | id | A | 1977 | NULL | NULL | | BTREE | | |
| networkrbacs | 0 | uniq_networkrbacs0tenant_target0object_id0action | 1 | action | A | 2 | NULL | NULL | | BTREE | | |
| networkrbacs | 0 | uniq_networkrbacs0tenant_target0object_id0action | 2 | object_id | A | 7 | NULL | NULL | | BTREE | | |
| networkrbacs | 0 | uniq_networkrbacs0tenant_target0object_id0action | 3 | target_tenant | A | 1998 | NULL | NULL | | BTREE | | |
| networkrbacs | 1 | object_id | 1 | object_id | A | 7 | NULL | NULL | | BTREE | | |
| networkrbacs | 1 | ix_networkrbacs_project_id | 1 | project_id | A | 1 | NULL | NULL | YES | BTREE | | |
| networkrbacs | 1 | ix_networkrbacs_target_tenant | 1 | target_tenant | A | 1998 | NULL | NULL | | BTREE | | |
| networkrbacs | 1 | ix_networkrbacs_action | 1 | action | A | 2 | NULL | NULL | | BTREE | | |

I believe I should have some time to continue chipping away at this issue this week.

Revision history for this message
Tyler Stachecki (tstachecki) wrote (last edit ):

I've come up with a patch for Ussuri that seems to work in all cases.

I'm still doing regression testing of this patch from both a performance and correctness perspective, but it seems to work on both accounts so far.

Comparing changes with the patch to upstream in my development build:
osc network list: 0m21.095s -> 0m2.608s
osc network show <name>: 0m19.178s -> 0m2.953s
osc port create --network <name>: 0m35.204s -> 0m5.635s
osc port delete <uuid>: 0m11.437s -> 0m3.863s

Revision history for this message
Tyler Stachecki (tstachecki) wrote :

Small issue with the previous patch; attaching v2.

Revision history for this message
Tyler Stachecki (tstachecki) wrote :

Some more performance data:

Before the patch, on an internal test cluster, we observed that the database would go underwater (~0% host CPU idle) when creating/deleting even 100 VMs.

With the patch, I was able to create/delete 1k+ VMs faster than I could 100 VMs before the patch. Following that, the database load dropped from 4000% CPU (pre-patch) to only 1000% after... while doing 10x the amount of work (1k VMs vs. 100).

The creation/deletion of 1k VMs probably could have gone faster yet, but the bottleneck shifted somewhere else in the control plane; the database is now no longer a limiter.

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :
Revision history for this message
Tyler Stachecki (tstachecki) wrote :

Sure, I'll do that shortly.

One blocker to getting this PR ready at present is that there are both changes to neutron and neutron-lib. There are only a few lines of changes in the former, but I'm not sure that I'll be able to break that co-dependency.

Revision history for this message
Tyler Stachecki (tstachecki) wrote :

After speaking with some peers, it was suggested that we start with a blueprint and go from there. I cannot break the co-dependency on the patch between the two projects (neutron, neutron-lib), and the changes to the model query class are non-trivial.

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Tyler:

I don't think we need a BP here. In any case, feel free to at least push patches both for n-lib and neutron (depending on the n-lib one). That will be easier to read and test.

Regards.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-lib (master)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 20.0.0.0rc1

This issue was fixed in the openstack/neutron 20.0.0.0rc1 release candidate.

Revision history for this message
Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: Tyler Stachecki (tstachecki) → nobody
tags: added: timeout-abandon
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-lib (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron-lib/+/817459
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/817462
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in neutron:
assignee: nobody → Tyler Stachecki (tstachecki)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-lib (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron-lib/+/817459
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
Sahid Orentino (sahid-ferdjaoui) wrote :

I think one of the first step that we can have is to remove the ORDER BY as it creates the temporary filesort that you have mentioned in #9.

I may missing something, an order by UUID does not bring any kind value?

A second step would be to understand why the possible key object_id is not used.

There is also another point, we can notice that we do filter per action, but I think that we do not have an index on it, maybe we could also investigate that point.

Changed in neutron:
status: Fix Released → Confirmed
Revision history for this message
Tyler Stachecki (tstachecki) wrote :

Hello,

I already have a fix for this (that we have used in production for >1 year now...), but it got a -1 in code review as it makes quite impactful changes to the code. It is unclear to me what needs to be done in order to make it a candidate for merging at this time...

https://review.opendev.org/c/openstack/neutron-lib/+/817459
https://review.opendev.org/c/openstack/neutron/+/817462

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Tyler:

The patch was abandoned. Please restore it again, add the needed tests and fix any rebase issue you can have.

About the tests, I will insist on this here and in the patches: you need to ensure this patch is not affecting the existing code (we are not introducing new bugs), you'll need to propose a Neutron patch based on the n-lib one including a set of tests checking this new code. This code is particularly sensitive and we need to be extremely careful when modifying it. I'm 100% on improving the SQL performance but not without testing.

Please, include in the n-lib patch a documentation describing what this patch is doing, with SQL examples (that will help a lot).

Regards.

Revision history for this message
Sahid Orentino (sahid-ferdjaoui) wrote (last edit ):

Hello Tyler,

Thank you I have noticed your patch and looks interesting, it comes with a different approach. My feeling is that the current queries look good but we need to help MySQL to select the right indexes

1/ Removing the ORDER BY will improve the performance on requests that return lot of rows as this will avoid that extra filesort.

2/ We can notice that the index object_id is mentioned but not used by MySQL for some reason. I have removed it and MySQL started to use a totally different set of indexes.

This reduce time on my env from 6,5 to 1,5 seconds.

Then I have noticed that removing the filter on securitygroup.project_id = ... Improved signficially the performance from 1,5 to about 0.050 seconds.

It's I guess the kind of metrics that we should have if MySQL would use the right indexes.

That is said removing the filter on project_id is not what we want, I'm currently investigate the issue on it.

@Tyler, anychance that you try that in your env? the significant change is removing index on object_id.

Thanks a lot,
s.

Revision history for this message
Tyler Stachecki (tstachecki) wrote :

Rodolfo: Thank you for the direction. It may take me some time to get to everything, but I will work on it. The patch was formerly passing all existing TCs and I tested it to the best of my knowledge outside of our particular production use cases, but I revisit and add more TCs as needed to make sure there is full coverage for the change.

Sahid: Can you clarify on (1) and (2) in the context of the patch? The ORDER BY... in the sample queries above come from the SQLalchemy generated code in the context of the existing machinery - e.g. if you look at get_collection_query(...), it is the client/user of the library code who specifies whether or not a sort is needed for the query. We cannot unconditionally elide an ORDER BY, I think.

On your notes about the non-use of the index leading to the slowness - that is the crux of the problem here. If you look at LEDUC's former comments in this bug, there was mention of FORCE INDEX fixing the problem in some cases (I was unable to replicate a fix in Percona MySQL 5.7, though).

Going back to existing use cases of the code - I do not think removing the filter for project_id is a viable solution because it breaks existing use cases of query_with_hooks. It may make it faster, yes, but the question becomes how do we fix it without breaking existing use cases? (or, am I missing something -- if you have any code/patch to share that you'd like tested, I am more than glad to test).

Revision history for this message
Sahid Orentino (sahid-ferdjaoui) wrote :

> On your notes about the non-use of the index leading to the slowness - that is the crux of the problem here. If you look at LEDUC's former comments in this bug, there was mention of FORCE INDEX fixing the problem in some cases (I was unable to replicate a fix in Percona MySQL 5.7, though).

It's basically the point, looks like if we remove the index on object_id MySQL will use a different approach to build and execute the query which looks to significantly improve performance without the need of changing everything. from 6s to 1s.

> Going back to existing use cases of the code - I do not think removing the filter for project_id is a viable solution because it breaks existing use cases of query_with_hooks. It may make it faster, yes, but the question becomes how do we fix it without breaking existing use cases? (or, am I missing something -- if you have any code/patch to share that you'd like tested, I am more than glad to test).

I'm not saying we should remove this filter. Just mentioning that removing it improves performance. From 1s to 0.05s looks like we should investigate the reason of that slowness.

Revision history for this message
Tyler Stachecki (tstachecki) wrote :

On (1) - OK, interesting observation. I will consider why the object_id index was added/what may use it. But even 1s is quite slow (vs. the 0.05s you mention... there is still a big performance crunch).

On (2) - Handwaving a bit and speaking from memory - but I think the project_id filter is used extensively in the RBAC code. Its role is to select RBACs which are applicable to the context - that being RBACs which are either global *or* which are scoped to the project(_id). Any time Neutron networks, subnets, etc. are filtered or collected, the code generally flows through this query logic which itself operates on RBACs relative to the context, which is where this "global RBAC or context/project-specific RBAC" use and thus filter against project_id comes into play. For this reason, if you query things as admin (without changing the code), you will probably not see the performance hit.

IOW, I don't think it can be removed as its use is integral to how all the Neutron code is written. This is what led to my PS which splits the logic for these two cases and does a late union of that either/or case.

Revision history for this message
Sahid Orentino (sahid-ferdjaoui) wrote :
Download full text (5.5 KiB)

An other interesting point.

I have noticed a query that takes about 50s, looks like it's to retrieve networks. With a simple change this one now takes 0.5s.

For some reasons that query returns 27840 rows, but let's not focus too much on that.

```
SELECT networks.project_id AS networks_project_id, networks.id AS networks_id, networks.name AS networks_name, networks.status AS networks_status, networks.admin_state_up AS networks_admin_state_up, networks.vlan_transparent AS networks_vlan_transparent, networks.availability_zone_hints AS networks_availability_zone_hints, networks.mtu AS networks_mtu, networks.standard_attr_id AS networks_standard_attr_id, standardattributes_1.id AS standardattributes_1_id, standardattributes_1.resource_type AS standardattributes_1_resource_type, standardattributes_1.description AS standardattributes_1_description, standardattributes_1.revision_number AS standardattributes_1_revision_number, standardattributes_1.created_at AS standardattributes_1_created_at, standardattributes_1.updated_at AS standardattributes_1_updated_at, networkdnsdomains_1.network_id AS networkdnsdomains_1_network_id, networkdnsdomains_1.dns_domain AS networkdnsdomains_1_dns_domain, externalnetworks_1.network_id AS externalnetworks_1_network_id, externalnetworks_1.is_default AS externalnetworks_1_is_default, standardattributes_2.id AS standardattributes_2_id, standardattributes_2.resource_type AS standardattributes_2_resource_type, standardattributes_2.description AS standardattributes_2_description, standardattributes_2.revision_number AS standardattributes_2_revision_number, standardattributes_2.created_at AS standardattributes_2_created_at, standardattributes_2.updated_at AS standardattributes_2_updated_at, networksegments_1.id AS networksegments_1_id, networksegments_1.network_id AS networksegments_1_network_id, networksegments_1.network_type AS networksegments_1_network_type, networksegments_1.physical_network AS networksegments_1_physical_network, networksegments_1.segmentation_id AS networksegments_1_segmentation_id, networksegments_1.is_dynamic AS networksegments_1_is_dynamic, networksegments_1.segment_index AS networksegments_1_segment_index, networksegments_1.name AS networksegments_1_name, networksegments_1.standard_attr_id AS networksegments_1_standard_attr_id, segmenthostmappings_1.segment_id AS segmenthostmappings_1_segment_id, segmenthostmappings_1.host AS segmenthostmappings_1_host, networksecuritybindings_1.network_id AS networksecuritybindings_1_network_id, networksecuritybindings_1.port_security_enabled AS networksecuritybindings_1_port_security_enabled, qos_network_policy_bindings_1.policy_id AS qos_network_policy_bindings_1_policy_id, qos_network_policy_bindings_1.network_id AS qos_network_policy_bindings_1_network_id FROM networks LEFT OUTER JOIN networkrbacs ON networks.id = networkrbacs.object_id LEFT OUTER JOIN standardattributes AS standardattributes_1 ON standardattributes_1.id = networks.standard_attr_id LEFT OUTER JOIN networkdnsdomains AS networkdnsdomains_1 ON networks.id = networkdnsdomains_1.network_id LEFT OUTER JOIN externalnetworks AS externalnetworks_1 ON networks.id = externalnetworks_1.network_id LEFT O...

Read more...

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-lib (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron-lib/+/870557

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/871113

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-lib (master)

Change abandoned by "Sahid Orentino Ferdjaoui <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron-lib/+/870557

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/872761
Committed: https://opendev.org/openstack/neutron/commit/4e71675ddbb1ad0e3002a8fe6071c357d9d30c9e
Submitter: "Zuul (22348)"
Branch: master

commit 4e71675ddbb1ad0e3002a8fe6071c357d9d30c9e
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Mon Feb 6 16:21:40 2023 +0100

    rbacs: add functionl test that asserts behavior on networks/users

    This is adding functional test to validate shared/not shared behavior
    with networks and users.

    Related-bug: #1918145
    Signed-off-by: Sahid Orentino Ferdjaoui <email address hidden>
    Change-Id: Ic1fc820f407ef9c9fa2d44ada7af86f556cc0cd6

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/c/openstack/neutron/+/871113
Committed: https://opendev.org/openstack/neutron/commit/e6de524555db92c0d232bc2f6fbc3edfa751a91f
Submitter: "Zuul (22348)"
Branch: master

commit e6de524555db92c0d232bc2f6fbc3edfa751a91f
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Thu Jan 19 17:40:11 2023 +0100

    rbacs: filter out model that are already owned by context

    Taking example of a network that have multiple rbacs. In a situation
    of selecting networks that are shared to a project.

    If we could could already match the one that are owned by the context
    we can expect les rbacs to scan.

    https://bugs.launchpad.net/neutron/+bug/1918145/comments/54

    In an environement with about 200 00 rbacs and 200 networks this
    reduce time of the request from more than 50s to less than a second.

    Related-bug: #1918145
    Signed-off-by: Sahid Orentino Ferdjaoui <email address hidden>
    Change-Id: I54808cbd4cdccfee97eb59053418f55ba57e11a6

    Signed-off-by: Sahid Orentino Ferdjaoui <email address hidden>
    Change-Id: Ib155fbb3f6b325d10e3fbea201677dc218111c17

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-lib (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron-lib/+/817459
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/884877

Changed in neutron:
assignee: Tyler Stachecki (tstachecki) → Rodolfo Alonso (rodolfo-alonso-hernandez)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-lib (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron-lib/+/884878

Revision history for this message
LEDUC Florian (leducflorian) wrote :

Regarding your last comment, we have used this approach for two ORM objects for stable/ussuri back then:

------------------------------------------------------------------------

--- a/neutron/db/models/segment.py
+++ b/neutron/db/models/segment.py
@@ -69,11 +69,9 @@ class SegmentHostMapping(model_base.BASEV2):

     # Add a relationship to the NetworkSegment model in order to instruct
     # SQLAlchemy to eagerly load this association
- # Monkey-patch : change the lazy strategy from 'subquery' to 'joined'
- # as MariaDB 10.3 don't use an optimal execution plan with 'subquery'
     network_segment = orm.relationship(
         NetworkSegment, load_on_pending=True,
         backref=orm.backref("segment_host_mapping",
- lazy='joined',
+ lazy='subquery',
                             cascade='delete'))
     revises_on_change = ('network_segment', )

------------------------------------------------------------------------
--- a/neutron/db/models/segment.py
+++ b/neutron/db/models/segment.py
@@ -49,7 +49,7 @@ class NetworkSegment(standard_attr.HasStandardAttributes,
                      nullable=True)
     network = orm.relationship(models_v2.Network,
                                backref=orm.backref("segments",
- lazy='subquery',
+ lazy='joined',
                                                    cascade='delete'))
     api_collections = [segment.SEGMENTS]

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

The model change by itself is not enough, we need to group the results by the resource ID. This is what is done in the n-lib patch. To improve the performance, we need both.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-lib (master)

Reviewed: https://review.opendev.org/c/openstack/neutron-lib/+/884878
Committed: https://opendev.org/openstack/neutron-lib/commit/829e97024c2b73dd67bfd8a04c65f03be556eec8
Submitter: "Zuul (22348)"
Branch: master

commit 829e97024c2b73dd67bfd8a04c65f03be556eec8
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun May 28 17:43:30 2023 +0200

    Add a "GROUP BY" clause on queries with RBAC entries

    As reported in the Neutron patch [1], this change introduce a
    "GROUP BY" clause on the SQL queries with RBAC entries. With [1],
    all resouces with RBAC entries ('network', 'qospolicy',
    'securitygroup', 'addressscope', 'subnetpool', 'addressgroup') will
    load the RBAC entries with "joined" strategy.

    Because of the low cardinality of the RBAC query when all the RBAC
    registers are in one single project, this patch groups the resource
    queries by the resource ID. That will reduce the results returned by
    the SQL engine to only the singular registers required.

    [1]https://review.opendev.org/c/openstack/neutron/+/884877

    Related-Bug: #1918145
    Change-Id: I800e0356714d59ba93ab6252c77be0a82f024055

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-lib (stable/2023.1)

Related fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron-lib/+/885080

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-lib (stable/zed)

Related fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron-lib/+/885081

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-lib (stable/yoga)

Related fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron-lib/+/885082

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-lib (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron-lib/+/885083

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-lib (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron-lib/+/885084

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-lib (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron-lib/+/885080
Committed: https://opendev.org/openstack/neutron-lib/commit/86ea887de64bba3d48a245884aea961b04afad0a
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 86ea887de64bba3d48a245884aea961b04afad0a
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun May 28 17:43:30 2023 +0200

    Add a "GROUP BY" clause on queries with RBAC entries

    As reported in the Neutron patch [1], this change introduce a
    "GROUP BY" clause on the SQL queries with RBAC entries. With [1],
    all resouces with RBAC entries ('network', 'qospolicy',
    'securitygroup', 'addressscope', 'subnetpool', 'addressgroup') will
    load the RBAC entries with "joined" strategy.

    Because of the low cardinality of the RBAC query when all the RBAC
    registers are in one single project, this patch groups the resource
    queries by the resource ID. That will reduce the results returned by
    the SQL engine to only the singular registers required.

    [1]https://review.opendev.org/c/openstack/neutron/+/884877

    Related-Bug: #1918145
    Change-Id: I800e0356714d59ba93ab6252c77be0a82f024055
    (cherry picked from commit 829e97024c2b73dd67bfd8a04c65f03be556eec8)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-lib (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/neutron-lib/+/885081
Committed: https://opendev.org/openstack/neutron-lib/commit/fdbece1bac3d4d42d7ced45077c1301f03fba5e6
Submitter: "Zuul (22348)"
Branch: stable/zed

commit fdbece1bac3d4d42d7ced45077c1301f03fba5e6
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun May 28 17:43:30 2023 +0200

    Add a "GROUP BY" clause on queries with RBAC entries

    As reported in the Neutron patch [1], this change introduce a
    "GROUP BY" clause on the SQL queries with RBAC entries. With [1],
    all resouces with RBAC entries ('network', 'qospolicy',
    'securitygroup', 'addressscope', 'subnetpool', 'addressgroup') will
    load the RBAC entries with "joined" strategy.

    Because of the low cardinality of the RBAC query when all the RBAC
    registers are in one single project, this patch groups the resource
    queries by the resource ID. That will reduce the results returned by
    the SQL engine to only the singular registers required.

    [1]https://review.opendev.org/c/openstack/neutron/+/884877

    Related-Bug: #1918145
    Change-Id: I800e0356714d59ba93ab6252c77be0a82f024055
    (cherry picked from commit 829e97024c2b73dd67bfd8a04c65f03be556eec8)

tags: added: in-stable-zed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-lib (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron-lib/+/885084
Committed: https://opendev.org/openstack/neutron-lib/commit/619c0fe5345af66061730ce763a1cd45c353f37c
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 619c0fe5345af66061730ce763a1cd45c353f37c
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun May 28 17:43:30 2023 +0200

    Add a "GROUP BY" clause on queries with RBAC entries

    As reported in the Neutron patch [1], this change introduce a
    "GROUP BY" clause on the SQL queries with RBAC entries. With [1],
    all resouces with RBAC entries ('network', 'qospolicy',
    'securitygroup', 'addressscope', 'subnetpool', 'addressgroup') will
    load the RBAC entries with "joined" strategy.

    Because of the low cardinality of the RBAC query when all the RBAC
    registers are in one single project, this patch groups the resource
    queries by the resource ID. That will reduce the results returned by
    the SQL engine to only the singular registers required.

    [1]https://review.opendev.org/c/openstack/neutron/+/884877

    Conflicts:
        neutron_lib/db/model_query.py

    Related-Bug: #1918145
    Change-Id: I800e0356714d59ba93ab6252c77be0a82f024055
    (cherry picked from commit 829e97024c2b73dd67bfd8a04c65f03be556eec8)
    (cherry picked from commit 4a952047823c3c330dad5475d67f77c7ee8d6419)

tags: added: in-stable-wallaby
tags: added: in-stable-yoga
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-lib (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron-lib/+/885082
Committed: https://opendev.org/openstack/neutron-lib/commit/467eafd20a229e24459ad225b458ff4e04e46fd8
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 467eafd20a229e24459ad225b458ff4e04e46fd8
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun May 28 17:43:30 2023 +0200

    Add a "GROUP BY" clause on queries with RBAC entries

    As reported in the Neutron patch [1], this change introduce a
    "GROUP BY" clause on the SQL queries with RBAC entries. With [1],
    all resouces with RBAC entries ('network', 'qospolicy',
    'securitygroup', 'addressscope', 'subnetpool', 'addressgroup') will
    load the RBAC entries with "joined" strategy.

    Because of the low cardinality of the RBAC query when all the RBAC
    registers are in one single project, this patch groups the resource
    queries by the resource ID. That will reduce the results returned by
    the SQL engine to only the singular registers required.

    [1]https://review.opendev.org/c/openstack/neutron/+/884877

    Related-Bug: #1918145
    Change-Id: I800e0356714d59ba93ab6252c77be0a82f024055
    (cherry picked from commit 829e97024c2b73dd67bfd8a04c65f03be556eec8)

tags: added: in-stable-xena
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-lib (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron-lib/+/885083
Committed: https://opendev.org/openstack/neutron-lib/commit/4a952047823c3c330dad5475d67f77c7ee8d6419
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 4a952047823c3c330dad5475d67f77c7ee8d6419
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun May 28 17:43:30 2023 +0200

    Add a "GROUP BY" clause on queries with RBAC entries

    As reported in the Neutron patch [1], this change introduce a
    "GROUP BY" clause on the SQL queries with RBAC entries. With [1],
    all resouces with RBAC entries ('network', 'qospolicy',
    'securitygroup', 'addressscope', 'subnetpool', 'addressgroup') will
    load the RBAC entries with "joined" strategy.

    Because of the low cardinality of the RBAC query when all the RBAC
    registers are in one single project, this patch groups the resource
    queries by the resource ID. That will reduce the results returned by
    the SQL engine to only the singular registers required.

    [1]https://review.opendev.org/c/openstack/neutron/+/884877

    Conflicts:
        neutron_lib/db/model_query.py

    Related-Bug: #1918145
    Change-Id: I800e0356714d59ba93ab6252c77be0a82f024055
    (cherry picked from commit 829e97024c2b73dd67bfd8a04c65f03be556eec8)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/885172

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/885173

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/885174

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/885175

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/885176

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/884877
Committed: https://opendev.org/openstack/neutron/commit/e9da29d16c474822c015996cf34e40005419146a
Submitter: "Zuul (22348)"
Branch: master

commit e9da29d16c474822c015996cf34e40005419146a
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun May 28 17:28:03 2023 +0200

    Change RBAC relationship loading method to "joined"

    This patch changes all RBAC relationship method to "joined". This change
    enforces that the RBAC associated registers are loaded along with the
    parent resource. The rationale of this change is to be able to control
    the SQL query executed; the subquery cannot be directly managed by
    Neutron.

    It is very usual to create the RBAC rules from one single project that
    is usually the adminitrator project. That means all RBAC rules will
    belong to it. Before this change, the SQL subquery performed to
    retrieve the RBAC entries was this (from a network query):

      SELECT networks.id AS networks_id
      FROM networks LEFT OUTER JOIN networkrbacs ON networks.id =
      networkrbacs.object_id
      WHERE networks.project_id = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.action = 'access_as_external'
        AND networkrbacs.target_project = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.target_project = '*'
        OR networks.project_id = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.action IN ('access_as_shared', 'access_as_readonly')
        AND (networkrbacs.target_project = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.target_project = '*');

    This SQL result has a very low cardinality; that means there are many
    duplicated registers. For example, with 10 external network, 1000
    projects and 2500 RBAC rules, this query returns 1.4 million rows.
    Instead if a "GROUP BY resource_id" (in this case network_id) clause is
    added, the number of rows is reduced to 10 (considering this project
    has a RBAC per network).

    In order to introduce this "GROUP BY" clause, this patch is changing
    the loading method. The clause is added in a neutron-lib patch [1].

    This change by itself does not improve the query performance. The
    neutron-lib patch is needed too. Although this patch does not modify
    que SQL query results, the tests added will prove that the neutron-lib
    patch does not introduce any regression.

    [1]https://review.opendev.org/c/openstack/neutron-lib/+/884878

    Closes-Bug: #1918145
    Change-Id: Ic6001bd5a57493b8befdf81a41eb0bd1c8022df3

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/885172
Committed: https://opendev.org/openstack/neutron/commit/593939911c8c360b209e0c12c302ec68eee5bbca
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 593939911c8c360b209e0c12c302ec68eee5bbca
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun May 28 17:28:03 2023 +0200

    Change RBAC relationship loading method to "joined"

    This patch changes all RBAC relationship method to "joined". This change
    enforces that the RBAC associated registers are loaded along with the
    parent resource. The rationale of this change is to be able to control
    the SQL query executed; the subquery cannot be directly managed by
    Neutron.

    It is very usual to create the RBAC rules from one single project that
    is usually the adminitrator project. That means all RBAC rules will
    belong to it. Before this change, the SQL subquery performed to
    retrieve the RBAC entries was this (from a network query):

      SELECT networks.id AS networks_id
      FROM networks LEFT OUTER JOIN networkrbacs ON networks.id =
      networkrbacs.object_id
      WHERE networks.project_id = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.action = 'access_as_external'
        AND networkrbacs.target_project = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.target_project = '*'
        OR networks.project_id = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.action IN ('access_as_shared', 'access_as_readonly')
        AND (networkrbacs.target_project = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.target_project = '*');

    This SQL result has a very low cardinality; that means there are many
    duplicated registers. For example, with 10 external network, 1000
    projects and 2500 RBAC rules, this query returns 1.4 million rows.
    Instead if a "GROUP BY resource_id" (in this case network_id) clause is
    added, the number of rows is reduced to 10 (considering this project
    has a RBAC per network).

    In order to introduce this "GROUP BY" clause, this patch is changing
    the loading method. The clause is added in a neutron-lib patch [1].

    This change by itself does not improve the query performance. The
    neutron-lib patch is needed too. Although this patch does not modify
    que SQL query results, the tests added will prove that the neutron-lib
    patch does not introduce any regression.

    [1]https://review.opendev.org/c/openstack/neutron-lib/+/884878

    Closes-Bug: #1918145
    Change-Id: Ic6001bd5a57493b8befdf81a41eb0bd1c8022df3
    (cherry picked from commit e9da29d16c474822c015996cf34e40005419146a)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/885173
Committed: https://opendev.org/openstack/neutron/commit/d3b403bfee17267dfa36184d6be27ba973f068ee
Submitter: "Zuul (22348)"
Branch: stable/zed

commit d3b403bfee17267dfa36184d6be27ba973f068ee
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun May 28 17:28:03 2023 +0200

    Change RBAC relationship loading method to "joined"

    This patch changes all RBAC relationship method to "joined". This change
    enforces that the RBAC associated registers are loaded along with the
    parent resource. The rationale of this change is to be able to control
    the SQL query executed; the subquery cannot be directly managed by
    Neutron.

    It is very usual to create the RBAC rules from one single project that
    is usually the adminitrator project. That means all RBAC rules will
    belong to it. Before this change, the SQL subquery performed to
    retrieve the RBAC entries was this (from a network query):

      SELECT networks.id AS networks_id
      FROM networks LEFT OUTER JOIN networkrbacs ON networks.id =
      networkrbacs.object_id
      WHERE networks.project_id = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.action = 'access_as_external'
        AND networkrbacs.target_project = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.target_project = '*'
        OR networks.project_id = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.action IN ('access_as_shared', 'access_as_readonly')
        AND (networkrbacs.target_project = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.target_project = '*');

    This SQL result has a very low cardinality; that means there are many
    duplicated registers. For example, with 10 external network, 1000
    projects and 2500 RBAC rules, this query returns 1.4 million rows.
    Instead if a "GROUP BY resource_id" (in this case network_id) clause is
    added, the number of rows is reduced to 10 (considering this project
    has a RBAC per network).

    In order to introduce this "GROUP BY" clause, this patch is changing
    the loading method. The clause is added in a neutron-lib patch [1].

    This change by itself does not improve the query performance. The
    neutron-lib patch is needed too. Although this patch does not modify
    que SQL query results, the tests added will prove that the neutron-lib
    patch does not introduce any regression.

    [1]https://review.opendev.org/c/openstack/neutron-lib/+/884878

    Closes-Bug: #1918145
    Change-Id: Ic6001bd5a57493b8befdf81a41eb0bd1c8022df3
    (cherry picked from commit e9da29d16c474822c015996cf34e40005419146a)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/885175
Committed: https://opendev.org/openstack/neutron/commit/4846e1939b7b22ced690b7422be2fcd06d794d10
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 4846e1939b7b22ced690b7422be2fcd06d794d10
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun May 28 17:28:03 2023 +0200

    Change RBAC relationship loading method to "joined"

    This patch changes all RBAC relationship method to "joined". This change
    enforces that the RBAC associated registers are loaded along with the
    parent resource. The rationale of this change is to be able to control
    the SQL query executed; the subquery cannot be directly managed by
    Neutron.

    It is very usual to create the RBAC rules from one single project that
    is usually the adminitrator project. That means all RBAC rules will
    belong to it. Before this change, the SQL subquery performed to
    retrieve the RBAC entries was this (from a network query):

      SELECT networks.id AS networks_id
      FROM networks LEFT OUTER JOIN networkrbacs ON networks.id =
      networkrbacs.object_id
      WHERE networks.project_id = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.action = 'access_as_external'
        AND networkrbacs.target_project = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.target_project = '*'
        OR networks.project_id = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.action IN ('access_as_shared', 'access_as_readonly')
        AND (networkrbacs.target_project = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.target_project = '*');

    This SQL result has a very low cardinality; that means there are many
    duplicated registers. For example, with 10 external network, 1000
    projects and 2500 RBAC rules, this query returns 1.4 million rows.
    Instead if a "GROUP BY resource_id" (in this case network_id) clause is
    added, the number of rows is reduced to 10 (considering this project
    has a RBAC per network).

    In order to introduce this "GROUP BY" clause, this patch is changing
    the loading method. The clause is added in a neutron-lib patch [1].

    This change by itself does not improve the query performance. The
    neutron-lib patch is needed too. Although this patch does not modify
    que SQL query results, the tests added will prove that the neutron-lib
    patch does not introduce any regression.

    [1]https://review.opendev.org/c/openstack/neutron-lib/+/884878

    Conflicts:
        neutron/tests/unit/objects/test_rbac.py

    Closes-Bug: #1918145
    Change-Id: Ic6001bd5a57493b8befdf81a41eb0bd1c8022df3
    (cherry picked from commit e9da29d16c474822c015996cf34e40005419146a)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/885176
Committed: https://opendev.org/openstack/neutron/commit/de2ad80a3925ba08f7ea765db3e9e045f973d266
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit de2ad80a3925ba08f7ea765db3e9e045f973d266
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun May 28 17:28:03 2023 +0200

    Change RBAC relationship loading method to "joined"

    This patch changes all RBAC relationship method to "joined". This change
    enforces that the RBAC associated registers are loaded along with the
    parent resource. The rationale of this change is to be able to control
    the SQL query executed; the subquery cannot be directly managed by
    Neutron.

    It is very usual to create the RBAC rules from one single project that
    is usually the adminitrator project. That means all RBAC rules will
    belong to it. Before this change, the SQL subquery performed to
    retrieve the RBAC entries was this (from a network query):

      SELECT networks.id AS networks_id
      FROM networks LEFT OUTER JOIN networkrbacs ON networks.id =
      networkrbacs.object_id
      WHERE networks.project_id = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.action = 'access_as_external'
        AND networkrbacs.target_project = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.target_project = '*'
        OR networks.project_id = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.action IN ('access_as_shared', 'access_as_readonly')
        AND (networkrbacs.target_project = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.target_project = '*');

    This SQL result has a very low cardinality; that means there are many
    duplicated registers. For example, with 10 external network, 1000
    projects and 2500 RBAC rules, this query returns 1.4 million rows.
    Instead if a "GROUP BY resource_id" (in this case network_id) clause is
    added, the number of rows is reduced to 10 (considering this project
    has a RBAC per network).

    In order to introduce this "GROUP BY" clause, this patch is changing
    the loading method. The clause is added in a neutron-lib patch [1].

    This change by itself does not improve the query performance. The
    neutron-lib patch is needed too. Although this patch does not modify
    que SQL query results, the tests added will prove that the neutron-lib
    patch does not introduce any regression.

    [1]https://review.opendev.org/c/openstack/neutron-lib/+/884878

    Conflicts:
        neutron/tests/unit/objects/test_rbac.py

    Closes-Bug: #1918145
    Change-Id: Ic6001bd5a57493b8befdf81a41eb0bd1c8022df3
    (cherry picked from commit e9da29d16c474822c015996cf34e40005419146a)
    (cherry picked from commit 4846e1939b7b22ced690b7422be2fcd06d794d10)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/885174
Committed: https://opendev.org/openstack/neutron/commit/1a66962ee338e21f9523025f06346d35ddca334d
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 1a66962ee338e21f9523025f06346d35ddca334d
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun May 28 17:28:03 2023 +0200

    Change RBAC relationship loading method to "joined"

    This patch changes all RBAC relationship method to "joined". This change
    enforces that the RBAC associated registers are loaded along with the
    parent resource. The rationale of this change is to be able to control
    the SQL query executed; the subquery cannot be directly managed by
    Neutron.

    It is very usual to create the RBAC rules from one single project that
    is usually the adminitrator project. That means all RBAC rules will
    belong to it. Before this change, the SQL subquery performed to
    retrieve the RBAC entries was this (from a network query):

      SELECT networks.id AS networks_id
      FROM networks LEFT OUTER JOIN networkrbacs ON networks.id =
      networkrbacs.object_id
      WHERE networks.project_id = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.action = 'access_as_external'
        AND networkrbacs.target_project = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.target_project = '*'
        OR networks.project_id = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.action IN ('access_as_shared', 'access_as_readonly')
        AND (networkrbacs.target_project = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.target_project = '*');

    This SQL result has a very low cardinality; that means there are many
    duplicated registers. For example, with 10 external network, 1000
    projects and 2500 RBAC rules, this query returns 1.4 million rows.
    Instead if a "GROUP BY resource_id" (in this case network_id) clause is
    added, the number of rows is reduced to 10 (considering this project
    has a RBAC per network).

    In order to introduce this "GROUP BY" clause, this patch is changing
    the loading method. The clause is added in a neutron-lib patch [1].

    This change by itself does not improve the query performance. The
    neutron-lib patch is needed too. Although this patch does not modify
    que SQL query results, the tests added will prove that the neutron-lib
    patch does not introduce any regression.

    [1]https://review.opendev.org/c/openstack/neutron-lib/+/884878

    Closes-Bug: #1918145
    Change-Id: Ic6001bd5a57493b8befdf81a41eb0bd1c8022df3
    (cherry picked from commit e9da29d16c474822c015996cf34e40005419146a)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 22.0.1

This issue was fixed in the openstack/neutron 22.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 21.1.1

This issue was fixed in the openstack/neutron 21.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 20.3.1

This issue was fixed in the openstack/neutron 20.3.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-lib (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron-lib/+/870557
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 23.0.0.0b3

This issue was fixed in the openstack/neutron 23.0.0.0b3 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-lib (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron-lib/+/902537

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/902538

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-lib (stable/ussuri)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron-lib/+/902537
Reason: This branch of the project transitioned to End of Life. Open patches need to be abandoned.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/ussuri)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/902538
Reason: This branch of the project transitioned to End of Life. Open patches need to be abandoned.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron wallaby-eom

This issue was fixed in the openstack/neutron wallaby-eom release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron xena-eom

This issue was fixed in the openstack/neutron xena-eom release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.