outerjoins used as a result of plugin architecture are inefficient
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron |
Invalid
|
Undecided
|
Unassigned |
Bug Description
Hi there -
I'm posting this as a bug sort of as a means to locate who best to talk about a. how critical these queries are and b. what other approaches would be feasible (I'm zzzeek on IRC).
We're talking here about the plugin architecture in neutron/
What the use of the hook has in common in these cases is that a LEFT OUTER JOIN is applied to the Query early on, in anticipation of either the filter_hook or result_filters being applied to the query, but only *possibly*, and then even within those hooks as supplied, again only *possibly*. It's these two "*possiblies*" that leads to the use of LEFT OUTER JOIN - this extra table is present in the query's FROM clause, but if we decide we don't need to filter on it, its OK! it's just a left outer join. And even, in the case of external_net_db.py, maybe we even add a criteria "WHERE <extra model id> IS NULL", that is doing a "not contains" off of this left outer join.
The result is that we can get a query like this:
SELECT a.* FROM a LEFT OUTER JOIN b ON a.id=b.aid WHERE b.id IS NOT NULL
this can happen for example if using External_
The database will usually have a much easier time if this query is expressed correctly:
SELECT a.* FROM a INNER JOIN b ON a.id=b.aid
the reason this bugs me is because the SQL output is being compromised as a result of how the plugin system is organized here. Preferable would be a system where the plugins are either organized into fewer functions that perform all the checking at once, or if the plugin system had more granularity to know that it needs to apply an optional JOIN or not.
There's a lot of ways I could propose reorganizing this but I wanted to talk to someone on IRC to make sure that no external projects are using these hooks, and to get some other background.
Overall long term I seek to consolidate the use of model_query into oslo.db, so I'm looking to take in all of its variants into a common form.
some background on how this query difference can be a big deal in many cases (more often than not), this schema has 10000 rows in A and around 1M in B, linked to a subset of rows in A. The OUTER JOIN query fails to use the index on b.aid:
mysql> EXPLAIN SELECT a.* FROM a JOIN b ON a.id=b.aid; ------- ------+ ------- +------ --+---- ------- ----+-- ------- +------ ---+--- ------- --+---- ----+-- ------- ------- ------- ---+ ------- ------+ ------- +------ --+---- ------- ----+-- ------- +------ ---+--- ------- --+---- ----+-- ------- ------- ------- ---+ ------- ------+ ------- +------ --+---- ------- ----+-- ------- +------ ---+--- ------- --+---- ----+-- ------- ------- ------- ---+
+----+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+
| 1 | SIMPLE | b | index | b_aid | b_aid | 5 | NULL | 999000 | Using where; Using index |
| 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | test.b.aid | 1 | NULL |
+----+
2 rows in set (0.00 sec)
mysql> EXPLAIN SELECT a.* FROM a LEFT OUTER JOIN b ON a.id=b.aid WHERE b.id IS NOT NULL; ------- ------+ ------- +------ --+---- ------- ----+-- ------- +------ ---+--- ------- --+---- ----+-- ------- ----+ ------- ------+ ------- +------ --+---- ------- ----+-- ------- +------ ---+--- ------- --+---- ----+-- ------- ----+ ------- ------+ ------- +------ --+---- ------- ----+-- ------- +------ ---+--- ------- --+---- ----+-- ------- ----+
+----+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+
| 1 | SIMPLE | b | ALL | PRIMARY,b_aid | NULL | NULL | NULL | 999000 | Using where |
| 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | test.b.aid | 1 | NULL |
+----+
2 rows in set (0.00 sec)
mysql>