semijoin much slower than in_to_exists
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
MariaDB |
In Progress
|
High
|
Sergey Petrunia |
Bug Description
I originally commented on this bug: https:/
but now I'm not so sure that is my issue.
With semijoin=on, I am seeing more than an order of magnitude of performance drop vs. in_to_exists.
I will upload my dataset to the ftp server and followup afterwards. For now, query is:
SELECT count(*)
FROM v
LEFT JOIN c
ON v.cid = c.cid
WHERE v.t >= '2012-01-31 05:00:00'
AND v.t <= '2012-02-08 07:59:59'
AND v.did = '208'
AND c.pid = '3124'
AND v.cid IN
( SELECT c.cid
FROM c
WHERE c.pid = '3124'
AND c.s = 0
AND
(
)
);
mysql> show variables like 'optimizer_switch';
+------
| Variable_name | Value |
+------
| optimizer_switch | index_merge=
+------
1 row in set (0.02 sec)
mysql> select version();
+------
| version() |
+------
| 5.3.4-MariaDB-
+------
1 row in set (0.00 sec)
(revno: 3411)
explain with no semijoin:
+----+-
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-
| 1 | PRIMARY | v | range | cid,did | did | 17 | NULL | 275 | Using where; Using index; Using temporary; Using filesort |
| 1 | PRIMARY | c | eq_ref | PRIMARY | PRIMARY | 8 | test.v.cid | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | c | unique_subquery | PRIMARY | PRIMARY | 8 | func | 1 | Using where |
| 5 | DEPENDENT SUBQUERY | v | index_subquery | PRIMARY,cid,did | cid | 16 | func,const | 1 | Using index; Using where |
| 4 | DEPENDENT SUBQUERY | v | index_subquery | PRIMARY,cid,did | cid | 16 | func,const | 1 | Using index; Using where |
| 3 | DEPENDENT SUBQUERY | v | index_subquery | PRIMARY,cid,did | cid | 16 | func,const | 1 | Using index; Using where |
+----+-
result:
+----------+
| count(*) |
+----------+
| 275 |
+----------+
1 row in set (0.01 sec)
explain w/ semijoin (set session optimizer_
+----+-
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-
| 1 | PRIMARY | v | range | cid,did | did | 17 | NULL | 275 | Using where; Using index |
| 1 | PRIMARY | v | ref | cid,did | cid | 16 | test.v.cid,const | 1 | Using where; Using index; Start temporary |
| 1 | PRIMARY | v | ref | cid,did | cid | 16 | test.v.cid,const | 1 | Using where; Using index |
| 1 | PRIMARY | v | ref | cid,did | cid | 16 | test.v.cid,const | 1 | Using where; Using index |
| 1 | PRIMARY | c | eq_ref | PRIMARY | PRIMARY | 8 | test.v.cid | 1 | Using where |
| 1 | PRIMARY | c | eq_ref | PRIMARY | PRIMARY | 8 | test.v.cid | 1 | Using where; End temporary |
+----+-
result:
+----------+
| count(*) |
+----------+
| 275 |
+----------+
1 row in set (0.51 sec)
A couple of things I found:
With semijoin off and more tables added as left joins in Inner query 1|2|3, the query takes approx the same times (or maybe linear increase). With semijoin on and more tables added as left joins to Inner query 1|2|3, query takes much more time (30s in my tests, so not linear)
On my server with flashcache ssds backed stores and much higher end CPU (Core i7), the semijoin query takes .49s. With aws instance with trashy ebs and 2006 era AMD processors, semijoin varient takes 2.5s. The CPU/IO does not affect query times with semijoin=off. The result is always fast (0.01s).
Changed in maria: | |
assignee: | nobody → Sergey Petrunia (sergefp) |
Changed in maria: | |
status: | New → Incomplete |
status: | Incomplete → In Progress |
Changed in maria: | |
importance: | High → Medium |
The test data has been uploaded to the private FTP server, named bug_929732_ fimbulvetr. sql.bz2