missing index on virtual_interfaces can cause long queries that can cause timeouts in launching instances

Bug #1445675 reported by Mike Bayer
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Matt Riedemann

Bug Description

In a load test where a nova environment w/ networking enabled was set up to have ~250K instances, attempting to launch 50 instances would cause many to time out, with the error "Timeout while waiting on RPC response - topic: "network", RPC method: "allocate_for_instance". The tester isolated the latency here to queries against the virtual_interfaces table, which in this test is executed some 500 times, spending ~.5 seconds per query for a total of 200 seconds. An example query looks like:

SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '9774e729-7695-4e2b-a9b2-a104a4b020d0'
LIMIT 1;

Query profiling against this table /query directly proceeded as follows:

I scripted up direct DB access to get 250K rows in a blank database:

MariaDB [nova]> select count(*) from virtual_interfaces;
+----------+
| count(*) |
+----------+
| 250000 |
+----------+
1 row in set (0.09 sec)

emitting the query when the row is found, on this particular system is returning in .03 sec:

MariaDB [nova]> SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c4' LIMIT 1;
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
| created_at | updated_at | deleted_at | deleted | id | address | network_id | instance_uuid | uuid |
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
| 2014-08-12 22:22:14 | NULL | NULL | 0 | 58393 | address_58393 | 22 | 41f1b859-8c5d-4c27-a52e-3e97652dfe7a | 0a269012-cbc7-4093-9602-35f003a766c4 |
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
1 row in set (0.03 sec)

we can see that for a row not found, where it has to scan the whole table, it's 10x longer:

MariaDB [nova]> SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c5' LIMIT 1;
Empty set (0.14 sec)

There's nothing mysterious going on here as an EXPLAIN shows plainly that we are doing a full table scan:

MariaDB [nova]> EXPLAIN SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c4' LIMIT 1;
+------+-------------+--------------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | virtual_interfaces | ALL | NULL | NULL | NULL | NULL | 250170 | Using where |
+------+-------------+--------------------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)

adding an index on the uuid field via "create index vuidx on virtual_interfaces(uuid)", the EXPLAIN now shows the index used:

MariaDB [nova]> EXPLAIN SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c4' LIMIT 1;
+------+-------------+--------------------+------+---------------+-------+---------+-------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------------+------+---------------+-------+---------+-------+------+------------------------------------+
| 1 | SIMPLE | virtual_interfaces | ref | vuidx | vuidx | 111 | const | 1 | Using index condition; Using where |
+------+-------------+--------------------+------+---------------+-------+---------+-------+------+------------------------------------+
1 row in set (0.00 sec)

and we get 0.00 response time for both queries:

MariaDB [nova]> SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c5' LIMIT 1;
Empty set (0.00 sec)

MariaDB [nova]> SELECT virtual_interfaces.created_at , virtual_interfaces.updated_at , virtual_interfaces.deleted_at , virtual_interfaces.deleted , virtual_interfaces.id , virtual_interfaces.address , virtual_interfaces.network_id , virtual_interfaces.instance_uuid , virtual_interfaces.uuid FROM virtual_interfaces WHERE virtual_interfaces.deleted = 0 AND virtual_interfaces.uuid = '0a269012-cbc7-4093-9602-35f003a766c4' LIMIT 1;
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
| created_at | updated_at | deleted_at | deleted | id | address | network_id | instance_uuid | uuid |
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
| 2014-08-12 22:22:14 | NULL | NULL | 0 | 58393 | address_58393 | 22 | 41f1b859-8c5d-4c27-a52e-3e97652dfe7a | 0a269012-cbc7-4093-9602-35f003a766c4 |
+---------------------+------------+------------+---------+-------+---------------+------------+--------------------------------------+--------------------------------------+
1 row in set (0.00 sec)

whether or not the index includes "deleted" doesn't really matter. If we're searching for UUIDs, we get that UUID row first, then the "deleted=0" is checked, not a big deal.

For an immediate fix, I propose to add the aforementioned index to the virtual_interfaces.uuid column.

Changed in nova:
assignee: nobody → Mike Bayer (zzzeek)
status: New → In Progress
Revision history for this message
Mike Bayer (zzzeek) wrote :
Revision history for this message
John Garbutt (johngarbutt) wrote :

adding kilo-rc-potential tag so we can add this into the RC, if it makes it into master in time.

tags: added: kilo-rc-potential
Changed in nova:
importance: Undecided → Medium
Thierry Carrez (ttx)
tags: removed: kilo-rc-potential
Changed in nova:
assignee: Mike Bayer (zzzeek) → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/154243
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7aa4b75da9722496c4e166b44cc04973fd441f52
Submitter: Jenkins
Branch: master

commit 7aa4b75da9722496c4e166b44cc04973fd441f52
Author: Michael Bayer <email address hidden>
Date: Thu Jun 11 08:11:44 2015 -0700

    Add an index to virtual_interfaces.uuid

    Performance issues have been reported such
    that the query performed by virtual_interface_get_by_uuid
    becomes prohibitively slow after a large (~250K) number of instances
    have been created, and an attempt is made to launch instances
    in batch. An index added to uuid shows a dramatic improvement
    to this query in EXPLAIN testing.

    This is the master branch which adds the migration as #294.

    Change-Id: I912e5a01ac63f2a7ab5533e9b6ff8738af819d67
    Closes-bug: #1445675

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/197116

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/197122

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/juno)

Change abandoned by Michael Bayer (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/197122
Reason: we've addressed this downstream

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/kilo)

Change abandoned by Michael Bayer (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/197116
Reason: we've addressed this downstream

Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-1 → 12.0.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.