Wrong result with subquery semijoin materialization and outer join

Bug #795530 reported by Timour Katchaounov
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
High
Sergey Petrunia

Bug Description

The following query over the 'world' database incorrectly produces wrong result
of 1 row instead of an empty result:

create database world;
use world;
source mysql-test/include/world_schema.inc
source mysql-test/include/world.inc

set @@optimizer_switch='materialization=on,semijoin=on';

MariaDB [world]> select * from CountryLanguage where (Language, Country) IN (SELECT City.Name, Country.Code FROM City LEFT JOIN Country ON (Country = Code and Code > 'LLL'));
+---------+----------+------------+
| Country | Language | Percentage |
+---------+----------+------------+
| KEN | Meru | 5.5 |
+---------+----------+------------+
1 row in set (0.22 sec)

set @@optimizer_switch='materialization=on,semijoin=off';
=> empty result;

Revision history for this message
Timour Katchaounov (timour) wrote :

In addition, running the above query with semijoin materialization is twice slower
than with non-semijoin materialization.

set @@optimizer_switch='materialization=on,semijoin=off';

MariaDB [world]> select * from CountryLanguage where (Language, Country) IN (SELECT City.Name, Country.Code FROM City LEFT JOIN Country ON (Country = Code and Code > 'LLL'));

Empty set (0.10 sec)

Notice 0.1 sec vs 0.22 sec, averaged over several executions.

Changed in maria:
status: New → Confirmed
assignee: nobody → Sergey Petrunia (sergefp)
importance: Undecided → High
milestone: none → 5.3
Revision history for this message
Sergey Petrunia (sergefp) wrote :

A smaller testcase:

create table t1 (a int);
create table t2 (a int, b char(10));

insert into t1 values (1),(2);
insert into t2 values (1, 'one'), (3, 'three');

create table t3 (b char(10));
insert into t3 values('three'),( 'four');
insert into t3 values('three'),( 'four');
insert into t3 values('three'),( 'four');
insert into t3 values('three'),( 'four');

MariaDB [j48]> set optimizer_switch='semijoin=off';
Query OK, 0 rows affected (0.00 sec)

MariaDB [j48]> select * from t3 where t3.b in (select t2.b from t1 left join t2 on t1.a=t2.a);
Empty set (0.01 sec)

MariaDB [j48]> set optimizer_switch='semijoin=on';
Query OK, 0 rows affected (0.00 sec)

MariaDB [j48]> select * from t3 where t3.b in (select t2.b from t1 left join t2 on t1.a=t2.a);
+-------+
| b |
+-------+
| three |
| three |
| three |
| three |
+-------+
4 rows in set (0.00 sec)

Revision history for this message
Sergey Petrunia (sergefp) wrote :

The problem seems to be in SJ-Merged materialization. For some reason, end_sj_materialize() fails to see that it is writing columns of NULL-comlemented rows, and writes some (previous?) non-NULL data instead.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

The wrong query result problem has been fixed, and the fix has been pushed.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

As for performance:

The difference between performance of semi-join and non-semi-join materialization plans is because
non-semijoin materialization does materialization+lookup (its only strategy):

+----+-------------+-----------------+--------+---------------+---------+---------+-----------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------+------+--------------------------+
| 1 | PRIMARY | CountryLanguage | ALL | NULL | NULL | NULL | NULL | 984 | Using where |
| 2 | SUBQUERY | City | ALL | NULL | NULL | NULL | NULL | 4079 | |
| 2 | SUBQUERY | Country | eq_ref | PRIMARY | PRIMARY | 3 | j4.City.Country | 1 | Using where; Using index |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------+------+--------------------------+

while semi-join materialization does materialization+scan

+----+-------------+-----------------+--------+---------------+---------+---------+------------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+--------+---------------+---------+---------+------------------------------+------+--------------------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 4079 | |
| 1 | PRIMARY | CountryLanguage | eq_ref | PRIMARY | PRIMARY | 33 | j4.Country.Code,j4.City.Name | 1 | Using index condition |
| 2 | SUBQUERY | City | ALL | NULL | NULL | NULL | NULL | 4079 | |
| 2 | SUBQUERY | Country | eq_ref | PRIMARY | PRIMARY | 3 | j4.City.Country | 1 | Using where; Using index |
+----+-------------+-----------------+--------+---------------+---------+---------+------------------------------+------+--------------------------+

Revision history for this message
Sergey Petrunia (sergefp) wrote :

I'm getting the following average query times (in seconds):
semi-join materialization, mean: 0.1042105
non-semi-join materialzation, mean: 0.1852632

Revision history for this message
Sergey Petrunia (sergefp) wrote :

The slowdown will be addressed separately here: https://bugs.launchpad.net/maria/+bug/806894

Changed in maria:
status: Confirmed → Fix Committed
Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.