Inefficient host_status lookup when listing servers with details (regression)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | | Low | Matt Riedemann |
Stein | | Low | Matt Riedemann |
Bug Description
We have a performance regression since Stein [1] in how the host_status field is looked up when listing servers with details. The code used to rely on this method [2] to cache the host status per host while iterating over a list of instances, but now the view builder fetches it once per instance, even when many instances share the same host [3]. Granted, by default policy this only affects performance for an admin, but if I'm an admin listing 1000 servers across all tenants using "nova list --all-tenants" (which is going to use a microversion high enough to hit this), it could be a noticeable slowdown compared to before Stein.
[1] https:/
[2] https:/
[3] https:/
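The difference between the two lookup strategies can be sketched in a few lines of Python. This is not nova's code: `get_host_status`, the `(server_id, host)` tuples, and the host names are hypothetical stand-ins, and the counter only shows how many lookups each strategy performs.

```python
from collections import Counter

lookups = Counter()


def get_host_status(host):
    """Hypothetical stand-in for the expensive per-host status lookup."""
    lookups[host] += 1
    return "UP"


def statuses_per_instance(instances):
    # Behaviour described for the regression: one lookup per instance.
    return {server_id: get_host_status(host) for server_id, host in instances}


def statuses_cached_per_host(instances):
    # Behaviour described for pre-Stein: one lookup per distinct host.
    cache = {}
    result = {}
    for server_id, host in instances:
        if host not in cache:
            cache[host] = get_host_status(host)
        result[server_id] = cache[host]
    return result


# 1000 illustrative servers spread over 4 hypothetical hosts.
instances = [(f"vm-{i}", f"host{i % 4}") for i in range(1000)]

statuses_per_instance(instances)
per_instance_lookups = sum(lookups.values())

lookups.clear()
statuses_cached_per_host(instances)
cached_lookups = sum(lookups.values())

print(per_instance_lookups, cached_lookups)  # 1000 4
```

With many servers per host, the cached variant does work proportional to the number of hosts rather than the number of servers.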
Changed in nova: | |
status: | New → Triaged |
summary: |
Inefficient host_status lookup when listing servers with details + (regression) |
Changed in nova: | |
assignee: | nobody → Matt Riedemann (mriedem) |
Matt Riedemann (mriedem) wrote : | #2 |
To try and get some idea of the performance regression around this I created an 8VCPU/8GB RAM devstack with 4 fake compute nodes and set API_WORKERS=1, and disabled some services I don't need like horizon/
stack@devstack:~$ openstack server list --host devstack1 | grep test-vm | wc -l
105
stack@devstack:~$ openstack server list --host devstack2 | grep test-vm | wc -l
10
stack@devstack:~$ openstack server list --host devstack3 | grep test-vm | wc -l
15
stack@devstack:~$ openstack server list --host devstack4 | grep test-vm | wc -l
20
Listing servers with the 2.16 microversion to get the host_status field takes somewhere between 1 and 2 seconds:
http://
I'll then try to fix this bug and compare the timings with the fix.
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
status: | Triaged → In Progress |
Matt Riedemann (mriedem) wrote : | #4 |
With the patch applied, listing 150 servers as admin with microversion 2.16 shows no noticeable difference in timing. I'm going to try to get up to 500 servers and see if the difference is more noticeable.
Matt Riedemann (mriedem) wrote : | #5 |
This time with the patch and 600 test servers it takes between 3.2 and 3.7 seconds to list servers with details as admin using 2.16:
http://
Without the patch I'm not getting much different timings:
Matt Riedemann (mriedem) wrote : | #6 |
This time with 1000 servers:
$ openstack --os-compute-
1000
Without the patch here are the timings over 10 runs:
http://
The average is 5.7822576 seconds.
With the patch here are the timings over 10 runs:
http://
The average is 5.2145055 seconds, which is better but not very noticeable, especially since my scenario is unrealistic: I have hundreds of servers on a single compute node.
Based on this I'll drop the severity on this bug to low.
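For a sense of scale, the two averages reported above can be compared directly. This is only arithmetic on the numbers in this comment, assuming both are in seconds; the variable names are made up for the sketch.

```python
# Averages over 10 runs as reported above (assumed to be seconds).
avg_without_patch = 5.7822576
avg_with_patch = 5.2145055

saved = avg_without_patch - avg_with_patch
relative = saved / avg_without_patch

print(f"saved {saved:.4f}s per request ({relative:.1%} faster)")
# saved 0.5678s per request (9.8% faster)
```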
Changed in nova: | |
importance: | High → Low |
Changed in nova: | |
assignee: | Matt Riedemann (mriedem) → Eric Fried (efried) |
Changed in nova: | |
assignee: | Eric Fried (efried) → Matt Riedemann (mriedem) |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit ab7d923ae7ed4d7
Author: Matt Riedemann <email address hidden>
Date: Tue Jun 18 11:13:32 2019 -0400
Fix GET /servers/detail host_status performance regression
Change I82b11b8866ac82
host_status extended server attribute processing from an
extension to the main servers view builder. This, however,
caused a regression in the detailed listing of servers because
it didn't incorporate the caching mechanism used previously
by the extension so now for each server with details when
microversion 2.16 or greater is used (and the request passes
the policy check), we get the host status per server even if
we have multiple servers on the same host.
This moves the host_status processing out of the show() method
when listing servers with details and processes them in aggregate
similar to security groups and attached volumes.
One catch is the show() method handles instances from down cells
for us so we have to handle that separately in the new host_status
processing, but it's trivial (just don't get host_status for
instances without a host field set).
This reverts commit 0cecd2ac324dc9a
Change-Id: I8278d4ea993ed1
Closes-Bug: #1830260
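The aggregate processing described in the commit message might look roughly like the sketch below. It is an illustration under assumptions, not the actual nova change: `get_host_status`, `add_host_status`, and the dict shapes for servers and instances are all hypothetical.

```python
calls = []


def get_host_status(host):
    """Hypothetical stand-in for the real per-host status lookup."""
    calls.append(host)
    return "UP"


def add_host_status(servers, instances):
    """Attach host_status to each server dict, one lookup per distinct host.

    `servers` are response dicts keyed by "id"; `instances` are dicts with
    "uuid" and "host". Both shapes are assumptions made for this sketch.
    """
    host_by_uuid = {inst["uuid"]: inst.get("host") for inst in instances}
    cache = {}
    for server in servers:
        host = host_by_uuid.get(server["id"])
        if not host:
            # No host field set (e.g. an instance from a down cell): skip.
            continue
        if host not in cache:
            cache[host] = get_host_status(host)
        server["host_status"] = cache[host]
    return servers


instances = [
    {"uuid": "a", "host": "devstack1"},
    {"uuid": "b", "host": "devstack1"},
    {"uuid": "c", "host": None},  # stands in for a down-cell instance
]
servers = add_host_status([{"id": "a"}, {"id": "b"}, {"id": "c"}], instances)
print(servers, calls)
```

Here the two servers on `devstack1` share a single lookup, and the server without a host is returned without a `host_status` key.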
Changed in nova: | |
status: | In Progress → Fix Released |
Fix proposed to branch: stable/stein
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/stein
commit ef10d8d9a678558
Author: Matt Riedemann <email address hidden>
Date: Tue Jun 18 11:13:32 2019 -0400
Fix GET /servers/detail host_status performance regression
Change I82b11b8866ac82
host_status extended server attribute processing from an
extension to the main servers view builder. This, however,
caused a regression in the detailed listing of servers because
it didn't incorporate the caching mechanism used previously
by the extension so now for each server with details when
microversion 2.16 or greater is used (and the request passes
the policy check), we get the host status per server even if
we have multiple servers on the same host.
This moves the host_status processing out of the show() method
when listing servers with details and processes them in aggregate
similar to security groups and attached volumes.
One catch is the show() method handles instances from down cells
for us so we have to handle that separately in the new host_status
processing, but it's trivial (just don't get host_status for
instances without a host field set).
NOTE(mriedem): This backport does not revert commit
0cecd2ac324
change was only in Train.
Change-Id: I8278d4ea993ed1
Closes-Bug: #1830260
(cherry picked from commit ab7d923ae7ed4d7
This issue was fixed in the openstack/nova 19.0.2 release.
This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.
https://review.opendev.org/#/c/663502/ should probably be reverted when fixing this bug.