semijoin much slower than in_to_exists

Bug #929732 reported by fimbulvetr on 2012-02-09
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
In Progress
High
Sergey Petrunia

Bug Description

I originally commented on this bug: https://bugs.launchpad.net/maria/+bug/806894

but now I'm not so sure that is my issue.

With semijoin=on, I am seeing more than an order of magnitude of performance drop vs. in_to_exists.

I will upload my dataset to the ftp server and followup afterwards. For now, query is:

SELECT count(*)
FROM v
         LEFT JOIN c
         ON v.cid = c.cid
WHERE v.t >= '2012-01-31 05:00:00'
AND v.t <= '2012-02-08 07:59:59'
AND v.did = '208'
AND c.pid = '3124'
AND v.cid IN
         ( SELECT c.cid
         FROM c
         WHERE c.pid = '3124'
         AND c.s = 0
         AND
                 (
                         (
                                 c.cid IN
                                 ( /*Inner query 1*/ SELECT v.cid
                                 FROM v
                                 WHERE v.did = 208
                                 AND v.t >= '2012-01-31 05:00:00'
                                 AND v.t <= '2012-02-08 07:59:59'
                                 )
                         )
                 AND
                         (
                                 c.cid IN
                                 ( /*Inner query 2*/ SELECT v.cid
                                 FROM v
                                 WHERE v.did = 208
                                 AND v.t >= '2012-01-31 05:00:00'
                                 AND v.t <= '2012-02-08 07:59:59'
                                 )
                         )
                 AND
                         (
                                 c.cid IN
                                 ( /*Inner query 3*/ SELECT v.cid
                                 FROM v
                                 WHERE v.did = 208
                                 AND v.t >= '2012-01-31 05:00:00'
                                 AND v.t <= '2012-02-08 07:59:59'
                                 )
                         )
                 )
         );

mysql> show variables like 'optimizer_switch';
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| optimizer_switch | index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=on,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)

mysql> select version();
+----------------------+
| version() |
+----------------------+
| 5.3.4-MariaDB-rc-log |
+----------------------+
1 row in set (0.00 sec)

(revno: 3411)

explain with no semijoin:
+----+--------------------+-------+-----------------+-----------------+---------+---------+------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-----------------+-----------------+---------+---------+------------+------+-----------------------------------------------------------+
| 1 | PRIMARY | v | range | cid,did | did | 17 | NULL | 275 | Using where; Using index; Using temporary; Using filesort |
| 1 | PRIMARY | c | eq_ref | PRIMARY | PRIMARY | 8 | test.v.cid | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | c | unique_subquery | PRIMARY | PRIMARY | 8 | func | 1 | Using where |
| 5 | DEPENDENT SUBQUERY | v | index_subquery | PRIMARY,cid,did | cid | 16 | func,const | 1 | Using index; Using where |
| 4 | DEPENDENT SUBQUERY | v | index_subquery | PRIMARY,cid,did | cid | 16 | func,const | 1 | Using index; Using where |
| 3 | DEPENDENT SUBQUERY | v | index_subquery | PRIMARY,cid,did | cid | 16 | func,const | 1 | Using index; Using where |
+----+--------------------+-------+-----------------+-----------------+---------+---------+------------+------+-----------------------------------------------------------+

result:

+----------+
| count(*) |
+----------+
| 275 |
+----------+
1 row in set (0.01 sec)

explain w/ semijoin (set session optimizer_switch='semijoin=on';)

+----+-------------+-------+--------+---------------+---------+---------+------------------+------+-------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+-------------------------------------------+
| 1 | PRIMARY | v | range | cid,did | did | 17 | NULL | 275 | Using where; Using index |
| 1 | PRIMARY | v | ref | cid,did | cid | 16 | test.v.cid,const | 1 | Using where; Using index; Start temporary |
| 1 | PRIMARY | v | ref | cid,did | cid | 16 | test.v.cid,const | 1 | Using where; Using index |
| 1 | PRIMARY | v | ref | cid,did | cid | 16 | test.v.cid,const | 1 | Using where; Using index |
| 1 | PRIMARY | c | eq_ref | PRIMARY | PRIMARY | 8 | test.v.cid | 1 | Using where |
| 1 | PRIMARY | c | eq_ref | PRIMARY | PRIMARY | 8 | test.v.cid | 1 | Using where; End temporary |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+-------------------------------------------+

result:

+----------+
| count(*) |
+----------+
| 275 |
+----------+
1 row in set (0.51 sec)

A couple of things I found:

With semijoin off and more tables added as left joins in Inner query 1|2|3, the query takes approx the same times (or maybe linear increase). With semijoin on and more tables added as left joins to Inner query 1|2|3, query takes much more time (30s in my tests, so not linear)

On my server with flashcache ssds backed stores and much higher end CPU (Core i7), the semijoin query takes .49s. With aws instance with trashy ebs and 2006 era AMD processors, semijoin varient takes 2.5s. The CPU/IO does not affect query times with semijoin=off. The result is always fast (0.01s).

fimbulvetr (fimbulvetr) wrote :

The test data has been uploaded to the private FTP server, named bug_929732_fimbulvetr.sql.bz2

summary: - semijoin much slower than materialization
+ semijoin much slower than in_to_exists
Changed in maria:
assignee: nobody → Sergey Petrunia (sergefp)
Changed in maria:
status: New → Incomplete
status: Incomplete → In Progress
Sergey Petrunia (sergefp) wrote :

On my laptop, the query takes ~0.8 sec when semi-join is enabled. and 0.01 sec when it is not.

EXPLAINs are same as posted. For both plans, #rows column has values of {275,1,1,1,1,1}, which gives an estimate of about 1375 rows read.

Sergey Petrunia (sergefp) wrote :

(run the query without semi-join)

MariaDB [information_schema]> select * from table_statistics;
+--------------+------------+-----------+--------------+------------------------+
| TABLE_SCHEMA | TABLE_NAME | ROWS_READ | ROWS_CHANGED | ROWS_CHANGED_X_INDEXES |
+--------------+------------+-----------+--------------+------------------------+
| bug929732 | c | 420 | 0 | 0 |
| bug929732 | v | 1509 | 0 | 0 |
+--------------+------------+-----------+--------------+------------------------+
2 rows in set (0.00 sec)

(run the query with semi-join)

MariaDB [information_schema]> select * from table_statistics;
+--------------+------------+-----------+--------------+------------------------+
| TABLE_SCHEMA | TABLE_NAME | ROWS_READ | ROWS_CHANGED | ROWS_CHANGED_X_INDEXES |
+--------------+------------+-----------+--------------+------------------------+
| bug929732 | c | 918 | 0 | 0 |
| bug929732 | v | 896565 | 0 | 0 |
+--------------+------------+-----------+--------------+------------------------+

Apparently, the plan with semi-join reads many more records. We'll need to
figure out if that is justified, and if yes, why the optimizer could not
foresee that.

Changed in maria:
importance: Undecided → High
Sergey Petrunia (sergefp) wrote :

The problem seems to be related to nested subqueries. The top-level subquery is equivalent to a join, if I manually change it to be a join:
SELECT count(*)
FROM (v
         LEFT JOIN c
         ON v.cid = c.cid)
   join c as c2
WHERE v.t >= '2012-01-31 05:00:00'
AND v.t <= '2012-02-08 07:59:59'
AND v.did = '208'
AND c.pid = '3124'
AND v.cid = c2.cid
AND c2.pid = '3124'
AND c2.s = 0
AND
 (
         (
                 c2.cid IN
                 ( /*Inner query 1*/ SELECT v2.cid
                 FROM v AS v2
                 WHERE v2.did = 208
                 AND v2.t >= '2012-01-31 05:00:00'
                 AND v2.t <= '2012-02-08 07:59:59'
                 )
         )
 AND
         (
                 c2.cid IN
                 ( /*Inner query 2*/ SELECT v3.cid
                 FROM v as v3
                 WHERE v3.did = 208
                 AND v3.t >= '2012-01-31 05:00:00'
                 AND v3.t <= '2012-02-08 07:59:59'
                 )
         )
 AND
         (
                 c2.cid IN
                 ( /*Inner query 3*/ SELECT v4.cid
                 FROM v as v4
                 WHERE v4.did = 208
                 AND v4.t >= '2012-01-31 05:00:00'
                 AND v4.t <= '2012-02-08 07:59:59'
                 )
         )
 );

.. then the query takes 0.02-0.03 sec

Changed in maria:
milestone: none → 5.3
Sergey Petrunia (sergefp) wrote :

^^ I mean it takes 0.02 - 0.03 sec regardless of whether I have semijoin=on or semijoin=off

Sergey Petrunia (sergefp) wrote :

Took the original query and changed table names inside deepest-level subqueries to be v2, v3, v4 (so we can tell them apart in the counters), and I get this:

+----+-------------+-------+--------+---------------+---------+---------+-----------------------+------+-------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+------+-------------------------------------------+
| 1 | PRIMARY | v | range | cid,did | did | 17 | NULL | 275 | Using where; Using index |
| 1 | PRIMARY | v2 | ref | cid,did | cid | 16 | bug929732.v.cid,const | 1 | Using where; Using index; Start temporary |
| 1 | PRIMARY | v3 | ref | cid,did | cid | 16 | bug929732.v.cid,const | 1 | Using where; Using index |
| 1 | PRIMARY | v4 | ref | cid,did | cid | 16 | bug929732.v.cid,const | 1 | Using where; Using index |
| 1 | PRIMARY | c | eq_ref | PRIMARY | PRIMARY | 8 | bug929732.v.cid | 1 | Using where |
| 1 | PRIMARY | c2 | eq_ref | PRIMARY | PRIMARY | 8 | bug929732.v.cid | 1 | Using where; End temporary |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+------+-------------------------------------------+
6 rows in set (0.00 sec)

statistics shows this:
MariaDB [bug929732]> SHOW TABLE_STATISTICS;
+--------------+------------+-----------+--------------+-------------------------+
| Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+--------------+------------+-----------+--------------+-------------------------+
| bug929732 | v | 276 | 0 | 0 |
| bug929732 | v2 | 3994 | 0 | 0 |
| bug929732 | v3 | 53506 | 0 | 0 |
| bug929732 | v4 | 837280 | 0 | 0 |
| bug929732 | c | 498 | 0 | 0 |
+--------------+------------+-----------+--------------+-------------------------+
5 rows in set (0.00 sec)

MariaDB [bug929732]> SHOW INDEX_STATISTICS;
+--------------+------------+------------+-----------+
| Table_schema | Table_name | Index_name | Rows_read |
+--------------+------------+------------+-----------+
| bug929732 | v | did | 276 |
| bug929732 | v2 | cid | 3994 |
| bug929732 | v3 | cid | 53506 |
| bug929732 | v4 | cid | 837280 |
| bug929732 | c | PRIMARY | 498 |
+--------------+------------+------------+-----------+

Michael Widenius (monty) on 2012-03-20
Changed in maria:
importance: High → Medium
fimbulvetr (fimbulvetr) wrote :

Hi,

I just confirmed this problem is in maria 5.5.24 (Compiled from source) build as well. We are eagerly awaiting this fix as semijoin is otherwise quite good. I am willing to take a number of steps to help fix this bug.

Sergey Petrunia (sergefp) wrote :

Thanks. I'll raise the priority.

Changed in maria:
milestone: 5.3 → 5.5
importance: Medium → High
fimbulvetr (fimbulvetr) wrote :

Hi Sergey,

That is a very interesting last result you posted. I just compiled 5.5.25 and attempted to reproduce the SHOW TABLE_STATISTICS like this:

+--------------+------------+-----------+--------------+-------------------------+
| Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+--------------+------------+-----------+--------------+-------------------------+
| bug929732 | v | 276 | 0 | 0 |
| bug929732 | v2 | 3994 | 0 | 0 |
| bug929732 | v3 | 53506 | 0 | 0 |
| bug929732 | v4 | 837280 | 0 | 0 |
| bug929732 | c | 498 | 0 | 0 |
+--------------+------------+-----------+--------------+-------------------------+

However, I must confess that I may not know how to get table_statistics to recognize v, v2, v3 and v4 as different tables in the output. Mine always show only v on table_statistics, but the explain output does recognize them as different tables. I am using:

SELECT count(*)
FROM v
         LEFT JOIN c
         ON v.cid = c.cid
WHERE v.t >= '2012-01-31 05:00:00'
AND v.t <= '2012-02-08 07:59:59'
AND v.did = '208'
AND c.pid = '3124'
AND v.cid IN
         ( SELECT c.cid
         FROM c
         WHERE c.pid = '3124'
         AND c.s = 0
         AND
                 (
                         (
                                 c.cid IN
                                 ( /*Inner query 1*/ SELECT v2.cid
                                 FROM v as v2
                                 WHERE v2.did = 208
                                 AND v2.t >= '2012-01-31 05:00:00'
                                 AND v2.t <= '2012-02-08 07:59:59'
                                 )
                         )
                 AND
                         (
                                 c.cid IN
                                 ( /*Inner query 2*/ SELECT v3.cid
                                 FROM v as v3
                                 WHERE v3.did = 208
                                 AND v3.t >= '2012-01-31 05:00:00'
                                 AND v3.t <= '2012-02-08 07:59:59'
                                 )
                         )
                 AND
                         (
                                 c.cid IN
                                 ( /*Inner query 3*/ SELECT v4.cid
                                 FROM v as v4
                                 WHERE v4.did = 208
                                 AND v4.t >= '2012-01-31 05:00:00'
                                 AND v4.t <= '2012-02-08 07:59:59'
                                 )
                         )
                 )
         );

I had noticed that each v (v, v2, v3, v4) in your statistics reads roughly 14x as many rows as the previous. I was trying to see if a v5 would read ~14*837280.

That is an interesting pattern and may lead us to the culprit. I have a debug build compiled, if you can paste in the commands I need to start mysql with to get the relevant debugging for the optimizer, I can have a look at the source and better understand the problem.

Sergey Petrunia (sergefp) wrote :

> However, I must confess that I may not know how to get table_statistics to recognize v, v2, v3 and v4 as different tables in the output.

I do it like this:

create table v2 like v;
insert into v2 select * from v;
## same for v3, v4

make the different subqueries access the created tables v2, v3, v4.

Sergey Petrunia (sergefp) wrote :

To re-iterate:

When I try the flattened (where 'SELECT c2.cid FROM c2' subquery is changed
into a join), I get a decent plan.

It executes in 0.04 secs (my laptop, debug build) and reads 2766 rows.
Playing with semi-join opimization flags in optimizer switch allows one to get
several query plans which all do 1600-2700 reads and are done in 0.03-0.04
sec.

Sergey Petrunia (sergefp) wrote :

For the original query, the explain looks decent: http://mariadb.org/explain_analyzer/analyze/muybt

rows={275, 1, 1, 1, 1, 1}, which gives total_rows =
275+275+275+275+275+275 = 1650, which is on par with the flattened query plans.

The execution is different, though:

+--------------+------------+-----------+
| Table_schema | Table_name | Rows_read |
+--------------+------------+-----------+
| bug929732 | v | 276 | # OK
| bug929732 | v2 | 3994 | # on estimate, should be 276 x 1=276. Instead, it is 14.5 x times more.
| bug929732 | v3 | 53506 | # again, 13.4 times more
| bug929732 | v4 | 837280 | # again, 15.6 times more

| bug929732 | c | 249 | # Now, we are back to small number of reads
| bug929732 | c2 | 249 | # and here, too
+--------------+------------+-----------+

Sergey Petrunia (sergefp) wrote :

## Lets check index statistics:
+-------+------------+----------+--------------+-------------+-----------+-------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality |
+-------+------------+----------+--------------+-------------+-----------+-------------+
| v2 | 1 | cid | 1 | cid | A | 589 |
| v2 | 1 | cid | 2 | did | A | 589 |

## For full scan on table v2, EXPLAIN gives rows=1083, which gives
rec_per_key=1.8. EXPLAIN shows '1'.

Checking how many records subquery with table v2 has for table v:

select count(*) from v where
  v.t >= '2012-01-31 05:00:00'
  AND v.t <= '2012-02-08 07:59:59'
  AND v.did = '208';
# This gives 275 rows, which confirms the previous findings

select count(*) from v where
v.t >= '2012-01-31 05:00:00'
AND v.t <= '2012-02-08 07:59:59'
AND v.did = '208'
and v.cid IN
   ( /*Inner query 1*/ SELECT v2.cid
   FROM v2
   WHERE v2.did = 208
   AND v2.t >= '2012-01-31 05:00:00'
   AND v2.t <= '2012-02-08 07:59:59'
   );
## ^^ 275 rows again: the subquery has a match for each row of table v.

select count(*) from v,v2 where
v.t >= '2012-01-31 05:00:00'
AND v.t <= '2012-02-08 07:59:59'
AND v.did = '208'
and v.cid = v2.cid and
v2.did = 208
   AND v2.t >= '2012-01-31 05:00:00'
   AND v2.t <= '2012-02-08 07:59:59'
   ;
(output is count(*)=1145)

+--------------+------------+-----------+
| Table_schema | Table_name | Rows_read |
+--------------+------------+-----------+
| bug929732 | v | 276 |
| bug929732 | v2 | 3994 |
+--------------+------------+-----------+

## This gives: total subquery's fanout is 1145/276= 4x,
## the number of records that one needs to read is 15x.

Sergey Petrunia (sergefp) wrote :

== Observation about the dataset ==
- The subqueries have high fanout, it is 4x overall, 15x if one counts the
  number of rows that need to be examined.
- The real fanout is much higher than the estimate provided to the 1x.

== Analysis ==
Semi-join optimizer/run-time is incapable of processing nested subqueries. When
it encounters nested subuqeries it will flatten them. The original query of
this bug has this form:

 v LEFT JOIN c SEMI JOIN (c2 SEMI JOIN v2
                             SEMI JOIN v3
                             SEMI JOIN v4)

Note the 2-level semi-join nesting. The nesting is eliminated: the query is
converted to

 v LEFT JOIN c JOIN c2 SEMI JOIN (v2 JOIN v3 JOIN v4)

All subqueries are put together into one cross-product-join-subquery. Fanout
of this new subquery is a product of fanouts of each subqueries.

In some cases, it is not a problem. For instance, consider FirstMatch strategy
which is run on a dataset where the every table from the subquery immediately
produces a matching record (that is, the first row read from the table happens
to satisfy relevant part of the WHERE clause).
In this case, FirstMatch's execution short-cutting property will make
subquery's fanout irrelevant: only the first matching record will be read.

However, for the query in this bug, the plan was Duplicate Weedout:

+-----+------+-----------+----+-----------------------------------------+
|table|type |ref |rows|Extra |
+-----+------+-----------+----+-----------------------------------------+
|v |range |NULL | 275|Using where; Using index |
|v2 |ref |v.cid,const| 1|Using where; Using index; Start temporary|
|v3 |ref |v.cid,const| 1|Using where; Using index |
|v4 |ref |v.cid,const| 1|Using where; Using index |
|c |eq_ref|v.cid | 1|Using where |
|c2 |eq_ref|v.cid | 1|Using where; End temporary |
+-----+------+-----------+----+-----------------------------------------+

the semi-join nest consists of tables (v2, v3, v4), and it is correlated with
table c because of the IN-equalities:

  c2.cid=v2.cid AND c2.cid=v3.cid AND c2.cid=v4.cid

Duplicate Elimination operates on the join order of "v2,v3,v4,c,c2". Note that
the outer table is at the end. When the outer table is at the end, Duplicate
Elimination is unable to do any short-cutting, and so ends up reading many more records.

Sergey Petrunia (sergefp) wrote :

This is not easy to solve. Created https://mariadb.atlassian.net/browse/MDEV-401 to work on a solution.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers