Comment 0 for bug 2048396

Revision history for this message
Christian Rohmann (christian-rohmann) wrote : backups db table lacks indexes to speed up common queries

I observed strangely long MariaDB duration for queries to the backups table which is used by cinder-backup. This is very likely due to the fact that there no indexes helping to make common queries efficient:

```
EXPLAIN SELECT * FROM backups WHERE project_id="myproject";
+------+-------------+---------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | backups | ALL | NULL | NULL | NULL | NULL | 431890 | Using where |
+------+-------------+---------+------+---------------+------+---------+------+--------+-------------+

EXPLAIN SELECT * FROM backups WHERE volume_id="myproject" AND deleted=0;
+------+-------------+---------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | backups | ALL | NULL | NULL | NULL | NULL | 431890 | Using where |
+------+-------------+---------+------+---------------+------+---------+------+--------+-------------+
```

```
SHOW indexes FROM backups;
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Ignored |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| backups | 0 | PRIMARY | 1 | id | A | 431646 | NULL | NULL | | BTREE | | | NO |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+

```

As you can see I have >500k entries in this table, but most of them are (marked as) deleted:

```
SELECT deleted, count(deleted) FROM backups GROUP BY deleted;
+---------+----------------+
| deleted | count(deleted) |
+---------+----------------+
| 0 | 35689 |
| 1 | 482261 |
+---------+----------------+

```

I know about the `cinder-manage db purge` [2] tool. But on larger clouds a few (ten)thousand rows of deleted volumes / backups / ... do accumulate before they are purged away.

And so very common queries for e.g. backups for a certain project_id or volume_id will end up as sequential scans using where.

You might argue that the table is simple and likely cached. But it's not about a single query for a list of backups for a certain project. It's rather about tooling making more intensive use of the API being slowed down unnecessarily.

The volumes table in comparison certainly has some indexes:

```
SHOW indexes FROM volumes;
+---------+------------+--------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Ignored |
+---------+------------+--------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| volumes | 0 | PRIMARY | 1 | id | A | 78598 | NULL | NULL | | BTREE | | | NO |
| volumes | 1 | ix_volumes_consistencygroup_id | 1 | consistencygroup_id | A | 2 | NULL | NULL | YES | BTREE | | | NO |
| volumes | 1 | ix_volumes_group_id | 1 | group_id | A | 2 | NULL | NULL | YES | BTREE | | | NO |
| volumes | 1 | volumes_service_uuid_idx | 1 | service_uuid | A | 2 | NULL | NULL | YES | BTREE | | | NO |
| volumes | 1 | volumes_service_uuid_idx | 2 | deleted | A | 4 | NULL | NULL | YES | BTREE | | | NO |
+---------+------------+--------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
```

and volumes_service_uuid_idx also covering the deleted column, One might argue that NOT having the deleted column as first in the compound index [1] makes looking up non-deleted volumes without service_uuid inefficient as the index will not be used:

```
EXPLAIN * FROM volumes WHERE deleted=0;
+------+-------------+---------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | volumes | ALL | NULL | NULL | NULL | NULL | 78598 | Using where |
+------+-------------+---------+------+---------------+------+---------+------+-------+-------------+
```

Would it not make sense to have indexes on volume_id, project_id and also the deleted column to cover the most common selections?

[1] https://mariadb.com/kb/en/compound-composite-indexes/
[2] https://docs.openstack.org/cinder/latest/cli/cinder-manage.html