difference in execution time between admin/non-admin call

Bug #1975828 reported by Lajos Katona
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
neutron
Triaged
Medium
Unassigned

Bug Description

Part of https://bugs.launchpad.net/neutron/+bug/1973349 :
Another interesting thing is difference in execution time between admin/non-admin call:
(openstack) dmitriy@6BT6XT2:~$ . Documents/openrc/admin.rc
(openstack) dmitriy@6BT6XT2:~$ time openstack port list --project <project> | wc -l
2142

real 0m5,401s
user 0m1,565s
sys 0m0,086s
(openstack) dmitriy@6BT6XT2:~$ . Documents/openrc/<project>.rc
(openstack) dmitriy@6BT6XT2:~$ time openstack port list | wc -l
2142

real 2m38,101s
user 0m1,626s
sys 0m0,083s
(openstack) dmitriy@6BT6XT2:~$
(openstack) dmitriy@6BT6XT2:~$ time openstack port list --project <project> | wc -l
2142

real 1m17,029s
user 0m1,541s
sys 0m0,085s
(openstack) dmitriy@6BT6XT2:~$

So basically if provide tenant_id to query, it will be execute twice as fast.But it won't look through networks owned by tenant (which would kind of explain difference in speed).

Environment:
Neutron SHA: 97180b01837638bd0476c28bdda2340eccd649af
Backend: ovs
OS: Ubuntu 20.04
Mariadb: 10.6.5
SQLalchemy: 1.4.23
Backend: openvswitch
Plugins: router vpnaas metering neutron_dynamic_routing.services.bgp.bgp_plugin.BgpPlugin

Tags: db
Changed in neutron:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hi:

I'm comparing both queries and are very similar [1]. The only difference is in the WHERE clause, where the query with the project ID adds an unnecessary check:

WHERE
        (ports.project_id = '8f5410f0e28148ddb03f39028f2c9097'
            OR ports.network_id = networks.id
            AND networks.project_id = '8f5410f0e28148ddb03f39028f2c9097'
            AND ports.project_id IN ('8f5410f0e28148ddb03f39028f2c9097'))

I'm checking with different projects and number of ports but I don't see any difference in the response time of the server.

Regards.

[1]https://paste.opendev.org/show/b0QFVpuE8JpJ6z3TVVR4/

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hi Dimitry:

This is a LP for the second question in https://bugs.launchpad.net/neutron/+bug/1973349. I'm trying to reproduce the issue you have with the queries with and without project ID but I'm not having luck with it. Both queries perform the same in my environment. The queries, as reported in the previous comment, are very similar.

Can you paste here the SQL query [1] that you have with and without project ID? This "query" object is a ``sqlalchemy.lib.sqlalchemy.orm.query.Query`` instance that implements __str__. If you print "str(query)", you'll have the actual SQL command.

Regards.

[1]https://github.com/openstack/neutron/blob/f966ed8c1507706c01a3ac4f6f38315670ed2ba0/neutron/db/db_base_plugin_v2.py#L1605

Changed in neutron:
status: Confirmed → Incomplete
Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

I think more problem is how hooks work in sqlalchemy overall.

Because what we see, is that query you're mentioning (SELECT DISTINCT ports.id AS ports_id FROM ports, networks WHERE ports.project_id = '<project>' OR ports.network_id = networks.id AND networks.project_id = '<project>') on itself takes quite much time on it's own. And it's run 3 times at least for simple port list, which is quite suboptimal.

It would make total sense to somehow cache result of that query, not to re-run it each time as subquery as handler, but I didn't find how to make it possible with sqlalchemy without messing with codebase massively.

I could imagine that smth is wrong with database, but we hit that in several regions after upgrade. We also tried to move database to another host (and new db deployment that is not in use) and query result was quite slow there as well.

That;s the code you pointed me that's implements hook https://opendev.org/openstack/neutron/src/commit/430abde13ec58e611a8752ca579fee5110a0a61d/neutron/db/db_base_plugin_v2.py#L115-L118

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hi Dimitry:

The complexity of the query depends on how we are building the OVO, that is basically a SQL view, cherry-picking from different tables. I wouldn't blame sqlalchemy but how Neutron builds those objects (and "port" is particularly complex, if not the most).

Did you try to manually execute the queries I pasted? That would discard the DB problem.

Can you paste the query that I commented in [1]? Maybe, because of "neutron_dynamic_routing" plugin, this query has something else added.

Regards.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired
Revision history for this message
Artem Goncharov (gtema) wrote :

Time difference in the admin/user is most likely (at least this is what my "debugging session showed") caused by policies evaluation. For 10-50 records it makes not much difference, but for 1500 ports in the response policy engine is overloaded applying policies to every entry (at least it is clearly visible that there are multiple additional DB calls done). It need to be further investigated whether there is any sort of caching is present for that and/or whether it is possible to optimize the process.
It is clear, that returning 1500 ports is bad from the performance point of view at all, but if user decided to have a huge project with multiple networks in it there is no much chance convincing him to change the architecture of the application.

Changed in neutron:
status: Expired → Triaged
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.