FloatingIP list bad performance

Bug #1850639 reported by Oleg Bondarev
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Invalid
High
Oleg Bondarev

Bug Description

Faced on stable/queens but applicable to master too.
On quite a heavy loaded environment it was noticed that simple floatingip list command takes significant time (~1200 fips) while for example port list is always faster (>7000 ports).
If enable sqlalchemy debug logs there can be seen lots of:

2019-10-22 21:02:44,977.977 23957 DEBUG sqlalchemy.orm.path_registry [req-3db31d53-f6b9-408e-b8c7-bf037ef10a1b 1df8a7d5eb
b5414b9e29cf581098681c 10479799101a4fe4ada17daa105707c5 - default default] set 'memoized_setups' on path 'EntityRegistry(
(<Mapper at 0x7f1a3f064ed0; FloatingIPDNS>,))' to '{}' set /usr/lib/python2.7/dist-packages/sqlalchemy/orm/path_registry.
py:63

- which basically eats all the time of a request.

As a test I commented 'dns' field in FloatingIP DB object definition and response time reduced from 14 to 1 second. DNS extension is not configured on the environment and no external DNS is used.
Also I don't see this field used anywhere in neutron.

Interestingly Port DB object has 'dns' field either (with corresponding portdnses table in DB, all the same as done for floatingips), however DB object is not used when listing ports.

The proposal would be to remove 'dns' field from FloatingIP OVO as not used, until we find performance bottleneck.

Tags: loadimpact
tags: added: loadi
tags: added: loadimpact
removed: loadi
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/692280

Changed in neutron:
assignee: nobody → Oleg Bondarev (obondarev)
status: New → In Progress
Revision history for this message
LIU Yulong (dragon889) wrote :
Revision history for this message
LIU Yulong (dragon889) wrote :

We have a stable/queens deployment, it has 1200+ floating ips, here is the result in neutron-server:
2019-10-31 18:49:33.157 725356 INFO neutron.wsgi [None req-d1625269-5142-4f98-973f-2ba0aebbc788 570ad5d4a77b48d49670d6d0ce5ba4be 0805323d5b8a4f68b791e621a6236bc7 - default default] 12.129.209.197 "GET /v2.0/floatingips HTTP/1.1" status: 200 len: 878126 time: 8.4739108

IMO, 8.47s+ is acceptable. The CLI shows a bit long time, IMO, the data transfer and resolving data structure is time-consuming. So I'd like to this is more like a client issue, openstackclient does not support pagination for floating ip list.

# openstack help floating ip list
usage: openstack floating ip list [-h] [-f {csv,json,table,value,yaml}]
                                  [-c COLUMN] [--max-width <integer>]
                                  [--fit-width] [--print-empty] [--noindent]
                                  [--quote {all,minimal,none,nonnumeric}]
                                  [--sort-column SORT_COLUMN]
                                  [--network <network>] [--port <port>]
                                  [--fixed-ip-address <ip-address>] [--long]
                                  [--status <status>] [--project <project>]
                                  [--project-domain <project-domain>]
                                  [--router <router>]
                                  [--tags <tag>[,<tag>,...]]
                                  [--any-tags <tag>[,<tag>,...]]
                                  [--not-tags <tag>[,<tag>,...]]
                                  [--not-any-tags <tag>[,<tag>,...]]

This is the help text of openstack server list:
# openstack help server list
usage: openstack server list [-h] [-f {csv,json,table,value,yaml}] [-c COLUMN]
                             [--max-width <integer>] [--fit-width]
                             [--print-empty] [--noindent]
                             [--quote {all,minimal,none,nonnumeric}]
                             [--sort-column SORT_COLUMN]
                             [--reservation-id <reservation-id>]
                             [--ip <ip-address-regex>]
                             [--ip6 <ip-address-regex>] [--name <name-regex>]
                             [--instance-name <server-name>]
                             [--status <status>] [--flavor <flavor>]
                             [--image <image>] [--host <hostname>]
                             [--all-projects] [--project <project>]
                             [--project-domain <project-domain>]
                             [--user <user>] [--user-domain <user-domain>]
                             [--long] [-n] [--marker <server>]
                             [--limit <num-servers>] [--deleted]
                             [--changes-since <changes-since>]

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Oleg:

This parameter is also used in https://storyboard.openstack.org/#!/story/1547736. Please check https://review.opendev.org/#/c/558824/.

We should find this performance degradation without removing this field.

Regards.

Revision history for this message
Oleg Bondarev (obondarev) wrote :

I do not agree that 8.4739108 sec is acceptable if it can be less then 1 second. Moreover it shouldn't take so long if DNS extension is disabled, and floatingipdnses table has 0 records.

Revision history for this message
Oleg Bondarev (obondarev) wrote :

I'm not saying DNS feature is not used with floating IPs, this is just about 'dns' field in FloatingIP OVO not used anywhere in neutron code (unless I missed smth).

Revision history for this message
Mike Bayer (zzzeek) wrote :

hi there -

you can't determine "eats all the time of a request" from logs alone. you need cprofile outputs. please apply the recipe at https://docs.sqlalchemy.org/en/13/faq/performance.html#code-profiling and run the problematic function a few times and we will look to see where the time is spent. in particular, if you can provide a cprofile dump via pr.dump_stats(filename) where pr is the Profile object and attach here, I can walk you through what's going on. Typically neutron has issues with queries that are too complicated being constructed over and over again and in the past we have talked about applying the baked query extension https://docs.sqlalchemy.org/en/13/orm/extensions/baked.html?highlight=baked#module-sqlalchemy.ext.baked to solve some of these, however the work at https://review.opendev.org/#/c/609715/ seems to have been abandoned and was not carried over to neutron-lib. someone would still need to pick that up.

Revision history for this message
Mike Bayer (zzzeek) wrote :

though if you found just one relationship that seemed to be a large factor in the problem you likely should change the loading method used by that relationship. if its joined, set it to lazy. that kind of thing.

Revision history for this message
Oleg Bondarev (obondarev) wrote :

Ok, so it's not related to sqlalchemy, as I expected it's an issue with neutron DB object, fixed in Rocky: https://review.opendev.org/#/c/565358/

Changed in neutron:
status: In Progress → Invalid
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Oleg Bondarev (<email address hidden>) on branch: master
Review: https://review.opendev.org/692280
Reason: The issue was fixed with https://review.opendev.org/#/c/565358/

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.