Bug #1850639 “FloatingIP list bad performance” : Bugs : neutron

Lajos Katona (lajos-katona) on 2019-10-30

tags:	added: loadi
tags:	added: loadimpact removed: loadi

OpenStack Infra (hudson-openstack) wrote on 2019-10-31: Fix proposed to neutron (master)

#1

Fix proposed to branch: master
Review: https://review.opendev.org/692280

Changed in neutron:
assignee:	nobody → Oleg Bondarev (obondarev)
status:	New → In Progress

LIU Yulong (dragon889) wrote on 2019-10-31:

#2

It is used:
https://specs.openstack.org/openstack/neutron-specs/specs/liberty/external-dns-resolution.html
https://review.opendev.org/#/q/external-dns-resolution

LIU Yulong (dragon889) wrote on 2019-10-31:

#3

We have a stable/queens deployment with 1200+ floating IPs; here is the result in neutron-server:
2019-10-31 18:49:33.157 725356 INFO neutron.wsgi [None req-d1625269-5142-4f98-973f-2ba0aebbc788 570ad5d4a77b48d49670d6d0ce5ba4be 0805323d5b8a4f68b791e621a6236bc7 - default default] 12.129.209.197 "GET /v2.0/floatingips HTTP/1.1" status: 200 len: 878126 time: 8.4739108

IMO, 8.47s+ is acceptable. The CLI takes somewhat longer; IMO, the data transfer and the parsing of the data structure are time-consuming. So I'd say this is more of a client issue: openstackclient does not support pagination for floating ip list.

# openstack help floating ip list
usage: openstack floating ip list [-h] [-f {csv,json,table,value,yaml}]
                                  [-c COLUMN] [--max-width <integer>]
                                  [--fit-width] [--print-empty] [--noindent]
                                  [--quote {all,minimal,none,nonnumeric}]
                                  [--sort-column SORT_COLUMN]
                                  [--network <network>] [--port <port>]
                                  [--fixed-ip-address <ip-address>] [--long]
                                  [--status <status>] [--project <project>]
                                  [--project-domain <project-domain>]
                                  [--router <router>]
                                  [--tags <tag>[,<tag>,...]]
                                  [--any-tags <tag>[,<tag>,...]]
                                  [--not-tags <tag>[,<tag>,...]]
                                  [--not-any-tags <tag>[,<tag>,...]]

This is the help text of openstack server list:
# openstack help server list
usage: openstack server list [-h] [-f {csv,json,table,value,yaml}] [-c COLUMN]
                             [--max-width <integer>] [--fit-width]
                             [--print-empty] [--noindent]
                             [--quote {all,minimal,none,nonnumeric}]
                             [--sort-column SORT_COLUMN]
                             [--reservation-id <reservation-id>]
                             [--ip <ip-address-regex>]
                             [--ip6 <ip-address-regex>] [--name <name-regex>]
                             [--instance-name <server-name>]
                             [--status <status>] [--flavor <flavor>]
                             [--image <image>] [--host <hostname>]
                             [--all-projects] [--project <project>]
                             [--project-domain <project-domain>]
                             [--user <user>] [--user-domain <user-domain>]
                             [--long] [-n] [--marker <server>]
                             [--limit <num-servers>] [--deleted]
                             [--changes-since <changes-since>]
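
For comparison, the --marker/--limit style of listing shown for servers can be sketched on the client side as below. This is only a rough illustration, assuming the Networking API accepts `limit` and `marker` query parameters on list calls and that pagination is enabled on the server. The names `iter_pages`, `fetch_page` and `make_fake_backend` are hypothetical, and the backend is an in-memory stand-in rather than a real HTTP call to /v2.0/floatingips.

```python
def iter_pages(fetch_page, limit=100):
    """Yield items page by page; the marker is the id of the last item seen."""
    marker = None
    while True:
        page = fetch_page(limit=limit, marker=marker)
        if not page:
            return
        yield from page
        # A short page means there is nothing left to fetch.
        if len(page) < limit:
            return
        marker = page[-1]["id"]


def make_fake_backend(total):
    """Hypothetical in-memory stand-in for a paginated floating IP listing."""
    items = [{"id": "fip-%04d" % i} for i in range(total)]
    calls = []  # records (limit, marker) for each request, for inspection

    def fetch_page(limit, marker=None):
        calls.append((limit, marker))
        start = 0
        if marker is not None:
            # Resume right after the item whose id equals the marker.
            start = next(
                i for i, item in enumerate(items) if item["id"] == marker
            ) + 1
        return items[start:start + limit]

    return fetch_page, calls


if __name__ == "__main__":
    fetch, calls = make_fake_backend(1200)
    fips = list(iter_pages(fetch, limit=500))
    print(len(fips), "floating IPs fetched in", len(calls), "requests")
```

Each request then returns a bounded response instead of a single ~878 KB body, at the cost of extra round trips; whether that helps overall would still need to be measured.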

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote on 2019-10-31:

#4

Hello Oleg:

This parameter is also used in https://storyboard.openstack.org/#!/story/1547736. Please check https://review.opendev.org/#/c/558824/.

We should find a fix for this performance degradation without removing this field.

Regards.

Oleg Bondarev (obondarev) wrote on 2019-10-31:

#5

I do not agree that 8.4739108 sec is acceptable if it can be less than 1 second. Moreover, it shouldn't take so long if the DNS extension is disabled and the floatingipdnses table has 0 records.

Oleg Bondarev (obondarev) wrote on 2019-10-31:

#6

I'm not saying the DNS feature is not used with floating IPs; this is just about the 'dns' field in the FloatingIP OVO, which is not used anywhere in neutron code (unless I missed something).

Mike Bayer (zzzeek) wrote on 2019-11-01:

#7

hi there -

You can't determine "eats all the time of a request" from logs alone; you need cProfile output. Please apply the recipe at https://docs.sqlalchemy.org/en/13/faq/performance.html#code-profiling and run the problematic function a few times, and we will look to see where the time is spent. In particular, if you can provide a cProfile dump via pr.dump_stats(filename), where pr is the Profile object, and attach it here, I can walk you through what's going on.

Typically neutron has issues with queries that are too complicated being constructed over and over again, and in the past we have talked about applying the baked query extension https://docs.sqlalchemy.org/en/13/orm/extensions/baked.html?highlight=baked#module-sqlalchemy.ext.baked to solve some of these. However, the work at https://review.opendev.org/#/c/609715/ seems to have been abandoned and was not carried over to neutron-lib; someone would still need to pick that up.
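
For anyone following along, a profiling helper along those lines might look like the sketch below. It uses only the standard library (cProfile, pstats); the helper `profiled`, the stand-in function `slow_listing` and the dump file name are hypothetical and only illustrate the shape of the recipe, not neutron's actual code path.

```python
import contextlib
import cProfile
import io
import os
import pstats
import tempfile


@contextlib.contextmanager
def profiled(dump_path=None, top=15):
    """Profile the enclosed block; optionally dump raw stats to dump_path."""
    pr = cProfile.Profile()
    pr.enable()
    try:
        yield pr
    finally:
        pr.disable()
        if dump_path is not None:
            # This file is what would be attached to the bug report.
            pr.dump_stats(dump_path)
        out = io.StringIO()
        stats = pstats.Stats(pr, stream=out).sort_stats("cumulative")
        stats.print_stats(top)
        print(out.getvalue())


def slow_listing(n=1200):
    # Hypothetical stand-in for the function being investigated.
    return [{"id": i, "dns": None} for i in range(n)]


def run_demo():
    dump_path = os.path.join(tempfile.mkdtemp(), "floatingips.prof")
    with profiled(dump_path):
        # Run the problematic function a few times, as suggested.
        for _ in range(3):
            slow_listing()
    return dump_path


if __name__ == "__main__":
    print("stats written to", run_demo())
```

The dumped file can later be loaded with pstats.Stats(path) and sorted by cumulative time to see which calls dominate the request.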

Mike Bayer (zzzeek) wrote on 2019-11-01:

#8

Though if you found just one relationship that seemed to be a large factor in the problem, you likely should change the loading method used by that relationship. If it's joined, set it to lazy. That kind of thing.

Oleg Bondarev (obondarev) wrote on 2019-11-12:

#9

OK, so it's not related to SQLAlchemy; as I expected, it's an issue with the neutron DB object, fixed in Rocky: https://review.opendev.org/#/c/565358/

Changed in neutron:
status:	In Progress → Invalid

OpenStack Infra (hudson-openstack) wrote on 2019-11-13: Change abandoned on neutron (master)

#10

Change abandoned by Oleg Bondarev (<email address hidden>) on branch: master
Review: https://review.opendev.org/692280
Reason: The issue was fixed with https://review.opendev.org/#/c/565358/
