extended_volumes slows down the nova instance list by 40..50%
Bug #1359808 reported by
Attila Fazekas
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | Diana Clarke |
Bug Description
When listing ~4096 instances, the nova API (n-api) service shows high CPU usage (100%) because it issues an individual SELECT
for every server's block_device_
Please use a more efficient way of getting the block_device_
This is the line that initiates the individual SELECT:
https:/
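To illustrate the pattern described above, here is a minimal sketch that contrasts one SELECT per server with a batched lookup. It uses Python's built-in sqlite3 rather than nova's database layer, and the table name `bdm`, its columns, and all function names are hypothetical stand-ins. It is not the fix that was applied in nova.

```python
import sqlite3


def make_db(n_instances=100):
    """Build an in-memory table standing in for block device mappings (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bdm (instance_uuid TEXT, volume_id TEXT)")
    rows = [(f"inst-{i}", f"vol-{i}") for i in range(n_instances)]
    conn.executemany("INSERT INTO bdm VALUES (?, ?)", rows)
    return conn


def volumes_per_instance_slow(conn, instance_uuids):
    """One SELECT per instance: the query count grows with the number of servers."""
    result = {}
    for uuid in instance_uuids:
        cur = conn.execute(
            "SELECT volume_id FROM bdm WHERE instance_uuid = ?", (uuid,)
        )
        result[uuid] = [row[0] for row in cur]
    return result


def volumes_per_instance_batched(conn, instance_uuids, chunk=500):
    """One SELECT per chunk of instances, grouped in Python afterwards."""
    result = {uuid: [] for uuid in instance_uuids}
    for start in range(0, len(instance_uuids), chunk):
        part = instance_uuids[start:start + chunk]
        placeholders = ",".join("?" * len(part))
        cur = conn.execute(
            "SELECT instance_uuid, volume_id FROM bdm "
            f"WHERE instance_uuid IN ({placeholders})",
            part,
        )
        for uuid, volume_id in cur:
            result[uuid].append(volume_id)
    return result
```

With 4096 instances and the assumed chunk size of 500, the batched variant issues 9 queries instead of 4096, while returning the same mapping.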
description: | updated |
tags: | added: volumes |
Changed in nova: | |
assignee: | Dan Smith (danms) → Diana Clarke (diana-clarke) |
tags: | added: performance |
Changed in nova: | |
assignee: | Diana Clarke (diana-clarke) → Abhijeet Malawade (abhijeet-malawade) |
Changed in nova: | |
assignee: | Abhijeet Malawade (abhijeet-malawade) → nobody |
Changed in nova: | |
assignee: | nobody → Diana Clarke (diana-clarke) |
Changed in nova: | |
status: | Fix Committed → Fix Released |
The extra ~20-25% is neutron port listing, if neutron is enabled (+~9-10 sec).
The database itself is very fast. I can query the full tables in under 0.02 sec.
You can boot 4096 instances into the BUILD (scheduling) -> ERROR state even on a notebook or in a small VM by stopping n-sch.
1. $ nova quota-update --instances -1 --cores -1 --ram -1 --floating-ips -1 --fixed-ips -1 <my-tenant> # as admin
2. kill the n-sch (./rejoin-stack.sh, Ctrl+A + ", select n-sch, Ctrl+C, Ctrl+A D)
3. increase osapi_max_limit=10000 in /etc/nova/nova.conf (otherwise you just see 1000 instances, because the client does not get the next page)
restart the n-api service
similar to stopping the n-sch, but after the Ctrl+C you need to press the UP arrow and Enter, basically restarting the previously terminated service
4. create 4k instances
$ nova boot test4k --min-count 4096 --max-count 4096 --flavor 42 --image cirros-0.3.2-x86_64-uec
5. test
$ time nova list | wc -l
It looks like creating the full response is not fast even without extended_volumes, but extended_volumes contributes the most to the response time.
The above numbers were measured on a ~2GHz physical machine with neutron (icehouse).
With the n-sch killing method above (all L2 VMs in ERROR state), on a VM on a 3.4GHz host with nova network and the latest master/juno, I see ~22 sec with the unmodified extended_volumes, and 13 sec when the body of ExtendedVolumesController._extend_server is replaced with a 'return'.
With the n-sch killing approach the instances do not have all the normal attributes (like IP addresses), which may be another reason why it was faster this time.
The times include ~1 sec of client warm-up and the authentication.
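As a rough check, the two timings from the nova network run (~22 sec with extended_volumes, 13 sec without) can be turned into percentages. This is only arithmetic on the reported numbers; the variable names are illustrative, and which of the two ratios the "40..50%" in the title refers to is an assumption.

```python
with_ext = 22.0      # seconds, unmodified extended_volumes (reported above)
without_ext = 13.0   # seconds, _extend_server replaced with a 'return'

overhead = with_ext - without_ext

# Share of the total response time spent in extended_volumes.
share_of_total = overhead / with_ext * 100

# Extra time relative to the run without extended_volumes.
slowdown_vs_baseline = overhead / without_ext * 100

print(f"overhead: {overhead:.0f} sec")
print(f"share of total time: {share_of_total:.1f}%")
print(f"increase over baseline: {slowdown_vs_baseline:.1f}%")
```

Measured as a share of the total response time, the overhead in this run is about 41%, which is in line with the 40..50% range in the bug title.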