outerjoins used as a result of plugin architecture are inefficient

Bug #1380823 reported by Mike Bayer
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Invalid
Undecided
Unassigned

Bug Description

Hi there -

I'm posting this as a bug sort of as a means to locate who best to talk about a. how critical these queries are and b. what other approaches would be feasible (I'm zzzeek on IRC).

We're talking here about the plugin architecture in neutron/db/common_db_mixin.py, where the register_model_query_hook() method presents a way of applying modifiers to queries. This system appears to be used by: db/external_net_db.py, plugins/ml2/plugin.py, db/portbindings_db.py, plugins/metaplugin/meta_neutron_plugin.py.

What the use of the hook has in common in these cases is that a LEFT OUTER JOIN is applied to the Query early on, in anticipation of either the filter_hook or result_filters being applied to the query, but only *possibly*, and then even within those hooks as supplied, again only *possibly*. It's these two "*possiblies*" that leads to the use of LEFT OUTER JOIN - this extra table is present in the query's FROM clause, but if we decide we don't need to filter on it, its OK! it's just a left outer join. And even, in the case of external_net_db.py, maybe we even add a criteria "WHERE <extra model id> IS NULL", that is doing a "not contains" off of this left outer join.

The result is that we can get a query like this:

    SELECT a.* FROM a LEFT OUTER JOIN b ON a.id=b.aid WHERE b.id IS NOT NULL

this can happen for example if using External_net_db_mixin, the outerjoin to ExternalNetwork is created, _network_filter_hook applies "expr.or_(ExternalNetwork.network_id != expr.null())", and that's it.

The database will usually have a much easier time if this query is expressed correctly:

   SELECT a.* FROM a INNER JOIN b ON a.id=b.aid

the reason this bugs me is because the SQL output is being compromised as a result of how the plugin system is organized here. Preferable would be a system where the plugins are either organized into fewer functions that perform all the checking at once, or if the plugin system had more granularity to know that it needs to apply an optional JOIN or not.

There's a lot of ways I could propose reorganizing this but I wanted to talk to someone on IRC to make sure that no external projects are using these hooks, and to get some other background.

Overall long term I seek to consolidate the use of model_query into oslo.db, so I'm looking to take in all of its variants into a common form.

Revision history for this message
Mike Bayer (zzzeek) wrote :

some background on how this query difference can be a big deal in many cases (more often than not), this schema has 10000 rows in A and around 1M in B, linked to a subset of rows in A. The OUTER JOIN query fails to use the index on b.aid:

 mysql> EXPLAIN SELECT a.* FROM a JOIN b ON a.id=b.aid;
 +----+-------------+-------+--------+---------------+---------+---------+------------+--------+--------------------------+
 | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
 +----+-------------+-------+--------+---------------+---------+---------+------------+--------+--------------------------+
 | 1 | SIMPLE | b | index | b_aid | b_aid | 5 | NULL | 999000 | Using where; Using index |
 | 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | test.b.aid | 1 | NULL |
 +----+-------------+-------+--------+---------------+---------+---------+------------+--------+--------------------------+
 2 rows in set (0.00 sec)

 mysql> EXPLAIN SELECT a.* FROM a LEFT OUTER JOIN b ON a.id=b.aid WHERE b.id IS NOT NULL;
 +----+-------------+-------+--------+---------------+---------+---------+------------+--------+-------------+
 | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
 +----+-------------+-------+--------+---------------+---------+---------+------------+--------+-------------+
 | 1 | SIMPLE | b | ALL | PRIMARY,b_aid | NULL | NULL | NULL | 999000 | Using where |
 | 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | test.b.aid | 1 | NULL |
 +----+-------------+-------+--------+---------------+---------+---------+------------+--------+-------------+
 2 rows in set (0.00 sec)

 mysql>

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

I think it might be better to start off a thread in a mailing list to discuss this/
Plus there might be some interest to discuss the issue at the summit

Changed in neutron:
status: New → Invalid
Revision history for this message
Mike Bayer (zzzeek) wrote :

I feel like the ML is where my ideas go to die of being ignored :). will try thanks

Revision history for this message
Mike Bayer (zzzeek) wrote :
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.