Database tuning required
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | Sean Dague |
Bug Description
I posted this on the mailing list, but I think it should also be a bug because we have some potentially serious issues. See below for details, but here are the issues (let me know if I should split them into separate bugs, etc.):
"Critical"
Table scan of fixed_ips on the network service (row per IP address?)
Use of MyISAM tables, particularly for s3_images and block_device_
Table scan of virtual_interfaces (row per instance?)
Verify that MySQL isn't doing a table scan on http://
"Naughty"
(Mostly because the tables are small)
Table scan of s3_images
Table scan of services
Table scan of networks
Low importance
(Re-fetches aren't a big deal if the queries are fast)
Row re-fetches & re-re-fetches
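One way to check the "table scan" items is to ask the database for its query plan. On MySQL, `EXPLAIN SELECT ...` reports `type: ALL` for a full table scan. The sketch below uses Python's built-in `sqlite3` module as a stand-in, because SQLite's `EXPLAIN QUERY PLAN` shows the same difference between a scan and an indexed search. The simplified `fixed_ips` schema and the `fixed_ips_address_idx` index are hypothetical and are not nova's actual schema.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable steps from SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the detail text, e.g. "SCAN fixed_ips".
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the fixed_ips table (one row per address).
conn.execute(
    "CREATE TABLE fixed_ips (id INTEGER PRIMARY KEY, address TEXT, instance_id INTEGER)"
)
conn.executemany(
    "INSERT INTO fixed_ips (address, instance_id) VALUES (?, ?)",
    [(f"10.0.{i // 256}.{i % 256}", i) for i in range(1000)],
)

lookup = "SELECT * FROM fixed_ips WHERE address = ?"
before = query_plan(conn, lookup, ("10.0.0.5",))  # no index: full table scan
conn.execute("CREATE INDEX fixed_ips_address_idx ON fixed_ips (address)")
after = query_plan(conn, lookup, ("10.0.0.5",))   # indexed: direct search
print(before)
print(after)
```

The exact wording depends on the SQLite version ("SCAN TABLE fixed_ips" in older releases, "SCAN fixed_ips" in newer ones). In every version, the plan changes from a SCAN to a SEARCH that uses the index.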
---
The performance of the metadata query with cloud-init has been causing some people problems (it's so slow cloud-init times out!), and has led to the suggestion that we need lots of caching. (My hypothesis is that we don't...)
By turning on SQL debugging in SQLAlchemy (for which I've proposed a patch for Essex: https:/
I'm focusing on the SQL statements for the metadata call.
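Here is a minimal sketch of this kind of SQL logging. It assumes SQLAlchemy is installed. Setting the standard `sqlalchemy.engine` logger to INFO makes SQLAlchemy log every statement it sends, and `create_engine(..., echo=True)` is the shorthand for the same thing. The in-memory SQLite URL and the `StatementCollector` handler are illustrative only. This is not the Essex patch referred to above.

```python
import logging

from sqlalchemy import create_engine, text


class StatementCollector(logging.Handler):
    """Keeps every message emitted on the SQLAlchemy engine logger."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


# INFO on "sqlalchemy.engine" logs each SQL statement; DEBUG also logs result rows.
engine_logger = logging.getLogger("sqlalchemy.engine")
engine_logger.setLevel(logging.INFO)
collector = StatementCollector()
engine_logger.addHandler(collector)

# An in-memory SQLite database stands in for the real nova database.
engine = create_engine("sqlite://")
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))

for message in collector.messages:
    print(message)
```

Turn the level back down to WARNING when you are done, because statement logging is verbose and slows down busy services.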
The code does this:
1) Checks the cache to see if it has the data
2) Makes a message-bus call to the network service to get the fixed_ip info from the address
3) Looks up all sorts of metadata in the database
4) Formats the reply
#1 means that the first call is slower than the others, so we need to focus on the first call.
#2 could be problematic if the message queue is overloaded or if the network service is slow to respond.
#3 could be problematic if the DB isn't working properly
#4 is hopefully not the problem.
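The four steps can be sketched as follows. The sketch shows why the first call is the one to measure: a cache hit skips the message-bus call and the database work entirely, and timing each step separately shows whether #2 or #3 dominates. Every function name here (`rpc_get_fixed_ip`, `db_lookup_metadata`, `format_reply`, `get_metadata`) is hypothetical and is not nova's code. The sleeps only simulate latency.

```python
import time


# Hypothetical stand-ins for steps 2-4; the sleeps only simulate latency.
def rpc_get_fixed_ip(address):
    time.sleep(0.05)  # step 2: message-bus call to the network service
    return {"address": address, "instance_id": 1}


def db_lookup_metadata(fixed_ip):
    time.sleep(0.02)  # step 3: database queries
    return {"instance-id": fixed_ip["instance_id"], "local-ipv4": fixed_ip["address"]}


def format_reply(metadata):
    # step 4: render the reply
    return "\n".join(f"{key}: {value}" for key, value in sorted(metadata.items()))


_cache = {}


def get_metadata(address, timings):
    """Serve metadata for one address, recording per-step durations in `timings`."""
    t0 = time.perf_counter()
    if address in _cache:  # step 1: a cache hit skips steps 2-4
        timings["cache_hit"] = time.perf_counter() - t0
        return _cache[address]
    fixed_ip = rpc_get_fixed_ip(address)
    t1 = time.perf_counter()
    metadata = db_lookup_metadata(fixed_ip)
    t2 = time.perf_counter()
    reply = format_reply(metadata)
    t3 = time.perf_counter()
    timings.update(rpc=t1 - t0, db=t2 - t1, format=t3 - t2)
    _cache[address] = reply
    return reply


first, second = {}, {}
get_metadata("10.0.0.5", first)   # cold: pays for the RPC and the DB lookups
get_metadata("10.0.0.5", second)  # warm: served from the cache
print(first)
print(second)
```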
The relevant SQL log from the API server: http://
And from the network server: http://
I've analyzed each of the SQL statements:
API
http://
http://
http://
http://
http://
Network
http://
http://
http://
http://
http://
http://
http://
http://
http://
http://
http://
We still have a bunch of MyISAM tables (at least with a devstack install):
http://
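You can script the job of finding and converting the remaining MyISAM tables. Converting them matters because MyISAM locks whole tables on writes and does not support transactions. The sketch below makes several assumptions:

- The schema is named `nova` (hypothetical; use your own database name).
- The DB-API driver uses `%s` placeholders, as MySQLdb does.

The sketch only builds the `ALTER TABLE ... ENGINE=InnoDB` statements so you can review them before running them.

```python
# Query to run against MySQL, e.g. cursor.execute(FIND_MYISAM, ("nova",)) with MySQLdb.
FIND_MYISAM = """
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = %s AND engine = 'MyISAM'
"""


def conversion_statements(rows):
    """Turn (table_name, engine) rows into ALTER TABLE statements that switch to InnoDB."""
    return [
        f"ALTER TABLE `{name}` ENGINE=InnoDB;"
        for name, engine in rows
        if engine.upper() == "MYISAM"
    ]


# Example rows shaped like the query result; the table list is illustrative only.
sample = [("s3_images", "MyISAM"), ("instances", "InnoDB")]
for statement in conversion_statements(sample):
    print(statement)
```

Changing the engine rebuilds the table, so on a large deployment these statements are best run during a maintenance window.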
As I see it, these are the issues (in sort of priority order):
Critical
Table scan of fixed_ips on the network service (row per IP address?)
Use of MyISAM tables, particularly for s3_images and block_device_
Table scan of virtual_interfaces (row per instance?)
Verify that MySQL isn't doing a table scan on http://
Naughty
(Mostly because the tables are small)
Table scan of s3_images
Table scan of services
Table scan of networks
Low importance
(Re-fetches aren't a big deal if the queries are fast)
Row re-fetches & re-re-fetches
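You can spot re-fetches by counting identical statements during one request. This sketch uses `sqlite3.Connection.set_trace_callback` from the Python standard library as a stand-in. Against a SQLAlchemy engine, the same counting idea could hang off the `before_cursor_execute` event. The `instances` table here is a simplified, hypothetical schema.

```python
import sqlite3
from collections import Counter


def trace_statements(conn):
    """Record every SQL statement the connection executes; returns a live Counter."""
    seen = Counter()
    conn.set_trace_callback(lambda sql: seen.update([sql]))
    return seen


def repeated(seen):
    """Statements executed more than once: candidates for re-fetches."""
    return {sql: n for sql, n in seen.items() if n > 1}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, hostname TEXT)")
conn.execute("INSERT INTO instances (id, hostname) VALUES (1, 'vm-1')")
conn.commit()

seen = trace_statements(conn)
# Simulate code that loads the same row twice while handling one request.
for _ in range(2):
    conn.execute("SELECT hostname FROM instances WHERE id = ?", (1,)).fetchone()

for sql, n in repeated(seen).items():
    print(n, sql)
```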
My install is nowhere near big enough for any of these to actually cause a real problem, so I'd love to get timings or a log from someone who is having a problem. Even the table scan of fixed_ips should be OK if you have enough RAM.
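The "enough RAM" point can be put in numbers. Suppose fixed_ips really holds one row per IP address (the open question in the list above). Then its size grows with the network prefix. The 200 bytes per row used below is a hypothetical figure chosen only for illustration, not a measurement.

```python
def table_size_bytes(prefix_len, bytes_per_row):
    """Rough size of a table holding one row per address in an IPv4 network of this prefix."""
    rows = 2 ** (32 - prefix_len)
    return rows * bytes_per_row


# bytes_per_row=200 is a hypothetical figure, not a measurement of nova's fixed_ips table.
for prefix in (24, 20, 16):
    size = table_size_bytes(prefix, bytes_per_row=200)
    print(f"/{prefix}: {2 ** (32 - prefix)} rows, about {size / 2**20:.2f} MiB")
```

Under that assumption, a /16 comes to 12.5 MiB, which fits comfortably in memory. The scan is still work proportional to the row count on every lookup.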
tags: added: essex-rc-potential
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
tags: removed: essex-rc-potential
Changed in nova:
assignee: Sean Dague (sdague-b) → nobody
Changed in nova:
assignee: nobody → Sean Dague (sdague-b)
Changed in nova:
status: Fix Committed → In Progress
tags: added: db
Changed in nova:
milestone: none → folsom-1
Changed in nova:
status: Fix Committed → Fix Released
Changed in nova:
milestone: folsom-1 → 2012.2
Fix proposed to branch: master (review.openstack.org/5970)
Review: https:/