Point deletes get locked when partition pruning in effect

Bug #1666187 reported by Rick Pizzi
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.6
Triaged
High
Unassigned
5.7
Invalid
High
Unassigned

Bug Description

It seems that issuing a DELETE that does partition pruning causes all other deletes to get locked in toku_db_get_range_lock().

Please consider the following table:

CREATE TABLE `BIG_STORAGE_00` (
  `HASH_ID` char(64) NOT NULL,
  `SERIALIZATION_TYPE` enum('JAVA','PROTOSTUFF') NOT NULL,
  `COMPRESSION_TYPE` enum('DEFLATE','LZ4','NONE','GZIP') NOT NULL,
  `RAW_DATA` mediumblob NOT NULL,
  `LAST_UPDATE` datetime NOT NULL,
  `EXPIRE_DATE` datetime NOT NULL,
  KEY `EXPIRE_DATE_IX` (`EXPIRE_DATE`),
  KEY `HASH_ID` (`HASH_ID`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (TO_DAYS(EXPIRE_DATE))
(PARTITION p652 VALUES LESS THAN (736739) ENGINE = TokuDB,
 PARTITION p653 VALUES LESS THAN (736740) ENGINE = TokuDB,
 PARTITION p654 VALUES LESS THAN (736741) ENGINE = TokuDB,
 PARTITION p655 VALUES LESS THAN (736742) ENGINE = TokuDB,
 PARTITION p656 VALUES LESS THAN (736743) ENGINE = TokuDB,
 PARTITION p657 VALUES LESS THAN (736744) ENGINE = TokuDB,
 PARTITION p658 VALUES LESS THAN (736745) ENGINE = TokuDB,
 PARTITION p659 VALUES LESS THAN (736746) ENGINE = TokuDB,
....

The following DELETEs will behave correctly:

DELETE FROM testdb1.BIG_STORAGE_00 WHERE HASH_ID = SHA2('0elucwrjledekmwc6462265kjkkzivtrsjiv', 256);

The following DELETEs will take a long time and cause all other DELETEs to be locked in toku_db_get_range_lock():

DELETE FROM testdb1.BIG_STORAGE_00 WHERE HASH_ID = SHA2('9xntrpqvbjnxilpf0251103kzlwcczgvxruh', 256) AND EXPIRE_DATE > NOW() ;

The difference is only in partition pruning (first DELETE type does not prune, second one does).
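The pruning difference can be confirmed with EXPLAIN PARTITIONS on both statements (a diagnostic sketch; the SHA2 argument is a placeholder, and 5.6 supports EXPLAIN on DELETE):

```sql
-- First form: no EXPIRE_DATE predicate, so all partitions are listed.
EXPLAIN PARTITIONS
DELETE FROM testdb1.BIG_STORAGE_00
 WHERE HASH_ID = SHA2('...', 256);

-- Second form: the EXPIRE_DATE predicate prunes to a subset of partitions.
EXPLAIN PARTITIONS
DELETE FROM testdb1.BIG_STORAGE_00
 WHERE HASH_ID = SHA2('...', 256) AND EXPIRE_DATE > NOW();
```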

In the second case, running SHOW PROCESSLIST shows that the first DELETE to be executed is taking a long time:

| 9456 | sb | localhost | sbtest | Query | 63 | Queried about 3420000 rows | DELETE FROM testdb1.BIG_STORAGE_00 WHERE HASH_ID = SHA2('6ywwdfzzzaphircc3195044astqtthdgmwwb', 256) AND EXPIRE_DATE > NOW() | 0 | 3427667 |

This is reproducible with sysbench; any number of threads will do, even two:

| 9742 | sb | localhost | sbtest | Query | 3 | updating | DELETE FROM testdb1.BIG_STORAGE_00 WHERE HASH_ID = SHA2('0cighcnthwitbxyw3603818gzwjferdbhncq', 256) AND EXPIRE_DATE > NOW() | 0 | 0 |
| 9743 | sb | localhost | sbtest | Query | 7 | Queried about 440000 rows | DELETE FROM testdb1.BIG_STORAGE_00 WHERE HASH_ID = SHA2('0elucwrjledekmwe0146646cckwuyijcdxhv', 256) AND EXPIRE_DATE > NOW() | 0 | 444891 |

Below is the stack dump that shows the locking problem (sysbench running with 100 threads). We are being hit by this in production.

Server version: 5.6.35-80.0-log Percona Server (GPL), Release 80.0, Revision f113994f31

   2304 pthread_cond_wait,ha_tokudb::??(ha_tokudb.so),start_thread(libpthread.so.0),clone(libc.so.6)
    903 pthread_cond_wait,one_thread_per_connection_end,do_handle_one_connection,handle_one_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
     99 pthread_cond_timedwait,toku_db_wait_range_lock(ha_tokudb.so),toku_db_get_range_lock(ha_tokudb.so),ha_tokudb::prelock_range(ha_tokudb.so),ha_tokudb::read_range_first(ha_tokudb.so),ha_partition::handle_ordered_index_scan,handler::multi_range_read_next,QUICK_RANGE_SELECT::get_next,::??,mysql_delete,mysql_execute_command,mysql_parse,dispatch_command,do_handle_one_connection,handle_one_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
     18 libaio::??(libaio.so.1),::??,::??,::??,start_thread(libpthread.so.0),clone(libc.so.6)
      4 pthread_cond_timedwait,::??,::??,start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_timedwait,ha_tokudb::??(ha_tokudb.so),start_thread(libpthread.so.0),clone(libc.so.6)
      2 select(libc.so.6),::??,::??,start_thread(libpthread.so.0),clone(libc.so.6)
      2 pthread_cond_wait,::??,::??,start_thread(libpthread.so.0),clone(libc.so.6)
      1 sigwait(libpthread.so.0),signal_hand,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 sigwaitinfo(libc.so.6),::??,start_thread(libpthread.so.0),clone(libc.so.6)
      1 select(libc.so.6),::??,??,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_mutex_lock(libpthread.so.0),ha_tokudb::??(ha_tokudb.so),ha_tokudb::??(ha_tokudb.so),ha_tokudb::??(ha_tokudb.so),toku_db_start_range_lock(ha_tokudb.so),ha_tokudb::??(ha_tokudb.so),ha_tokudb::??(ha_tokudb.so),ha_tokudb::??(ha_tokudb.so),ha_tokudb::??(ha_tokudb.so),ha_tokudb::??(ha_tokudb.so),ha_tokudb::??(ha_tokudb.so),ha_tokudb::??(ha_tokudb.so),toku_ft_search(ha_tokudb.so),toku_ft_cursor_set(ha_tokudb.so),toku_c_getf_set(ha_tokudb.so),ha_tokudb::??(ha_tokudb.so),ha_tokudb::??(ha_tokudb.so),ha_tokudb::read_full_row(ha_tokudb.so),ha_tokudb::get_next(ha_tokudb.so),ha_tokudb::index_next(ha_tokudb.so),handler::ha_index_next,handler::read_range_next,ha_tokudb::read_range_next(ha_tokudb.so),ha_partition::handle_ordered_next,handler::multi_range_read_next,QUICK_RANGE_SELECT::get_next,::??,mysql_delete,mysql_execute_command,mysql_parse,dispatch_command,do_handle_one_connection,handle_one_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,tokudb::background::job_manager_t::real_thread_func(ha_tokudb.so),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,::??,ib_wqueue_timedwait,::??,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,evictor::run_eviction_thread(ha_tokudb.so),ha_tokudb::??(ha_tokudb.so),start_thread(libpthread.so.0),clone(libc.so.6)
      1 poll(libc.so.6),vio_io_wait,vio_socket_io_wait,vio_read,::??,::??,my_net_read,do_command,do_handle_one_connection,handle_one_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 poll(libc.so.6),handle_connections_sockets,mysqld_main,__libc_start_main(libc.so.6),_start
      1 nanosleep(libc.so.6),usleep(libc.so.6),ha_tokudb::??(ha_tokudb.so),start_thread(libpthread.so.0),clone(libc.so.6)

Tags: upstream
Revision history for this message
Rick Pizzi (pizzi) wrote :

Forgot to mention...
Server version: 5.6.35-80.0-log Percona Server (GPL), Release 80.0, Revision f113994f31

Rick Pizzi (pizzi)
summary: - TokuDB: possible scalability issue with many inserts in multiple
- schemas
+ TokuDB: deletes by range get locked in toku_db_get_range_lock()
Rick Pizzi (pizzi)
description: updated
Revision history for this message
Rick Pizzi (pizzi) wrote : Re: TokuDB: deletes by range get locked in toku_db_get_range_lock()

We just tested this using InnoDB and guess what, the problem is not inside Toku, it is at the upper layer (MySQL)!

See below.

   2304 pthread_cond_wait,ha_tokudb::??(ha_tokudb.so),start_thread(libpthread.so.0),clone(libc.so.6)
     99 pthread_cond_wait,::??,::??,::??,::??,::??,handler::ha_index_read_map,handler::read_range_first,ha_partition::handle_ordered_index_scan,handler::multi_range_read_next,QUICK_RANGE_SELECT::get_next,::??,mysql_delete,mysql_execute_command,mysql_parse,dispatch_command,do_handle_one_connection,handle_one_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
     18 libaio::??(libaio.so.1),::??,::??,::??,start_thread(libpthread.so.0),clone(libc.so.6)
      4 pthread_cond_timedwait,::??,::??,start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_timedwait,ha_tokudb::??(ha_tokudb.so),start_thread(libpthread.so.0),clone(libc.so.6)
      2 select(libc.so.6),::??,::??,start_thread(libpthread.so.0),clone(libc.so.6)
      2 pthread_cond_wait,one_thread_per_connection_end,do_handle_one_connection,handle_one_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      2 pthread_cond_wait,::??,::??,start_thread(libpthread.so.0),clone(libc.so.6)
      1 sigwaitinfo(libc.so.6),::??,start_thread(libpthread.so.0),clone(libc.so.6)
      1 sigwait(libpthread.so.0),signal_hand,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 select(libc.so.6),::??,??,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,tokudb::background::job_manager_t::real_thread_func(ha_tokudb.so),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,evictor::run_eviction_thread(ha_tokudb.so),ha_tokudb::??(ha_tokudb.so),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,::??,ib_wqueue_timedwait,::??,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pread64(libpthread.so.0),::??,::??,::??,::??,::??,::??,::??,::??,handler::ha_index_next,handler::read_range_next,ha_partition::handle_ordered_next,handler::multi_range_read_next,QUICK_RANGE_SELECT::get_next,::??,mysql_delete,mysql_execute_command,mysql_parse,dispatch_command,do_handle_one_connection,handle_one_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 poll(libc.so.6),vio_io_wait,vio_socket_io_wait,vio_read,::??,::??,my_net_read,do_command,do_handle_one_connection,handle_one_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 poll(libc.so.6),handle_connections_sockets,mysqld_main,__libc_start_main(libc.so.6),_start
      1 nanosleep(libc.so.6),usleep(libc.so.6),ha_tokudb::??(ha_tokudb.so),start_thread(libpthread.so.0),clone(libc.so.6)

tags: removed: tokudb
summary: - TokuDB: deletes by range get locked in toku_db_get_range_lock()
+ Deletes by range get locked when partition pruning in effect
summary: - Deletes by range get locked when partition pruning in effect
+ Point deletes get locked when partition pruning in effect
Revision history for this message
Rick Pizzi (pizzi) wrote :

Updated bug title to the above effect.
This is a table without an explicit primary key.
Will try now to add a primary key and see if the problem persists.

Revision history for this message
Rick Pizzi (pizzi) wrote :

I confirm that this problem only hits when there is no primary key in the table.
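A sketch of that follow-up test (hypothetical DDL, not from the report; note that on a partitioned table MySQL requires every PRIMARY KEY and UNIQUE key to include all columns used in the partitioning expression, so a key on HASH_ID alone would be rejected):

```sql
-- EXPIRE_DATE must be part of the key because the table is
-- partitioned BY RANGE (TO_DAYS(EXPIRE_DATE)).
ALTER TABLE testdb1.BIG_STORAGE_00
  ADD PRIMARY KEY (HASH_ID, EXPIRE_DATE);
```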

Revision history for this message
Rick Pizzi (pizzi) wrote :

It also looks to be tied to the presence of partitioning.

Revision history for this message
Rick Pizzi (pizzi) wrote :

From further testing, it seems that the problem is due to one of the DELETEs not using the index on HASH_ID and performing a full table scan, locking all encountered rows in the process.

This behaviour is the same with TokuDB or InnoDB.

Please see the example below (InnoDB), where you can clearly see that the DELETE with key value '4lxhgdldpvvujbcy5327094ofsclzyjbdeda' is taking a long time and locking all rows in the process.
This, in turn, blocks not only DELETEs but also INSERTs, due to index locking on the EXPIRE_DATE_IX key.

This is with 40 threads.

>show processlist;
+------+--------+-----------+---------+---------+------+----------+------------------------------------------------------------------------------------------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+--------+-----------+---------+---------+------+----------+------------------------------------------------------------------------------------------------------+-----------+---------------+
| 8 | rpizzi | localhost | testdb1 | Query | 0 | init | show processlist | 0 | 0 |
| 5616 | sb | localhost | sbtest | Query | 26 | updating | DELETE FROM testdb1.BIG_STORAGE_INNO WHERE HASH_ID = SHA2('5rhzrnfvzfvzllnu1265637abroygafhsdmm', 25 | 0 | 0 |
| 5617 | sb | localhost | sbtest | Query | 26 | update | INSERT INTO testdb1.BIG_STORAGE_INNO VALUES(SHA2('3cazdcpqwhkuhtqp7403240mmovjqarjdoqm', 256), 'PRO | 0 | 0 |
| 5618 | sb | localhost | sbtest | Query | 26 | updating | DELETE FROM testdb1.BIG_STORAGE_INNO WHERE HASH_ID = SHA2('2lyxdikshdufnlzp2242581segxwsypdgecz', 25 | 0 | 0 |
| 5619 | sb | localhost | sbtest | Query | 26 | updating | DELETE FROM testdb1.BIG_STORAGE_INNO WHERE HASH_ID = SHA2('8vmfsrhtyllkqven7054433fkzycoederdoe', 25 | 0 | 0 |
| 5620 | sb | localhost | sbtest | Query | 26 | updating | DELETE FROM testdb1.BIG_STORAGE_INNO WHERE HASH_ID = SHA2('5mgeyldjwkxifigg5852506jdrlzskoicznv', 25 | 0 | 0 |
| 5621 | sb | localhost | sbtest | Query | 26 | updating | DELETE FROM testdb1.BIG_STORAGE_INNO WHERE HASH_ID = SHA2('0irydlwwywtoxuvn9130689gthtzsqgzkcvk', 25 | 0 | 0 |
| 5622 | sb | localhost | sbtest | Query | 148 | updating | DELETE FROM testdb1.BIG_STORAGE_INNO WHERE HASH_ID = SHA2('4lxhgdldpvvujbcy5327094ofsclzyjbdeda', 25 | 0 | 1157127 |
| 5623 | sb | localhost | sbtest | Query | 26 | updating | DELETE FROM testdb1.BIG_STORAGE_INNO WHERE HASH_ID = SHA2('4pyhhfwemyvqlguk3167581ddvkfsahsaydz', 25 | 0 | 0 |
| 5624 | sb | localhost | sbtest | Query | 26 | update | INSERT INTO testdb1.BIG_STORAGE_INNO VALUES(SHA2('0pdsnnaenjkrvfgu9164068pkrgktokkfmpp', 256), 'PRO | 0 | 0 |
| 5625 | sb | localhost | sb...

Revision history for this message
Rick Pizzi (pizzi) wrote :

Please note - output of processlist is truncated, but all the DELETEs are of this type:

DELETE FROM testdb1.BIG_STORAGE_INNO WHERE HASH_ID = SHA2('1bhmsimfmaacxras8463772txokjavcztfdw', 256) AND EXPIRE_DATE > NOW()

Also, this is a select - delete - insert test (autocommit mode).
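For the InnoDB runs, the blocking chain suggested by the processlist can be inspected directly (a diagnostic sketch using the standard 5.6 information_schema tables, not something run in this report):

```sql
-- Show which transaction each waiting DELETE/INSERT is blocked on.
SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
  FROM information_schema.INNODB_LOCK_WAITS w
  JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
  JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;
```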

Revision history for this message
Rick Pizzi (pizzi) wrote :

Wait, just found that the optimizer *randomly* seems to prefer the index on EXPIRE_DATE instead of the one on HASH_ID for the DELETEs......

>explain partitions DELETE FROM testdb1.BIG_STORAGE_INNO WHERE HASH_ID = SHA2('1bhmsimfmaacxras8463772txokjavcztfdw', 256) AND EXPIRE_DATE > NOW();
+----+-------------+------------------+------------------------------------------------------------------------------------------------------------------------------+-------+------------------------+----------------+---------+-------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------------------------------------------------------------------------------------------------------------------------------+-------+------------------------+----------------+---------+-------+------+-------------+
| 1 | SIMPLE | BIG_STORAGE_INNO | p652,p660,p661,p662,p663,p664,p665,p666,p667,p668,p669,p670,p671,p672,p673,p674,p675,p676,p677,p678,p679,p680,p681,p682,p683 | range | EXPIRE_DATE_IX,HASH_ID | EXPIRE_DATE_IX | 5 | const | 4 | Using where |
+----+-------------+------------------+------------------------------------------------------------------------------------------------------------------------------+-------+------------------------+----------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

db11>explain partitions DELETE FROM testdb1.BIG_STORAGE_INNO WHERE HASH_ID = SHA2('1bhmsimfmaacxras8463772txokjavcztfdw', 256) AND EXPIRE_DATE > NOW();
+----+-------------+------------------+------------------------------------------------------------------------------------------------------------------------------+-------+------------------------+---------+---------+-------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------------------------------------------------------------------------------------------------------------------------------+-------+------------------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | BIG_STORAGE_INNO | p652,p660,p661,p662,p663,p664,p665,p666,p667,p668,p669,p670,p671,p672,p673,p674,p675,p676,p677,p678,p679,p680,p681,p682,p683 | range | EXPIRE_DATE_IX,HASH_ID | HASH_ID | 64 | const | 10 | Using where |
+----+-------------+------------------+------------------------------------------------------------------------------------------------------------------------------+-------+------------------------+---------+---------+-------+------+-------------+
1 row in set (0.07 sec)

This just doesn't seem right to me (HASH_ID has a high cardinality, unlike the date, and also the date is a partitioning key and should only be used for pruning purposes) but may exp...

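As a possible workaround (a sketch, untested against this schema): in 5.6 the single-table DELETE syntax does not accept index hints, but the multi-table DELETE form does, so the plan could be pinned to HASH_ID like this:

```sql
-- Multi-table DELETE form solely so that FORCE INDEX is accepted;
-- the SHA2 argument is a placeholder.
DELETE b
  FROM testdb1.BIG_STORAGE_INNO AS b FORCE INDEX (HASH_ID)
 WHERE b.HASH_ID = SHA2('...', 256)
   AND b.EXPIRE_DATE > NOW();
```

Note that the multi-table form does not allow ORDER BY or LIMIT, which does not matter for these point deletes.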

Revision history for this message
Rick Pizzi (pizzi) wrote :

Looks like an upstream bug, except in Oracle MySQL, it is *much* worse, as the optimizer *always* prefers the index on EXPIRE_DATE, even if cardinality of that index is almost zero:

explain DELETE FROM testdb1.BIG_STORAGE_INNO WHERE HASH_ID = SHA2('6fpsrcwcccjigdek2851400rxuafibyepqny', 256) AND EXPIRE_DATE > NOW() ;
+----+-------------+------------------+-------+------------------------+----------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+------------------------+----------------+---------+-------+------+-------------+
| 1 | SIMPLE | BIG_STORAGE_INNO | range | EXPIRE_DATE_IX,HASH_ID | EXPIRE_DATE_IX | 5 | const | 3 | Using where |
+----+-------------+------------------+-------+------------------------+----------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

select INDEX_NAME, CARDINALITY FROM information_schema.STATISTICS s where s.table_schema = 'testdb1' AND s.table_name = 'BIG_STORAGE_INNO';
+----------------+-------------+
| INDEX_NAME | CARDINALITY |
+----------------+-------------+
| EXPIRE_DATE_IX | 8 |
| HASH_ID | 2657212 |
+----------------+-------------+
2 rows in set (0.00 sec)

Revision history for this message
Rick Pizzi (pizzi) wrote :
tags: added: upstream
Revision history for this message
Rick Pizzi (pizzi) wrote :

This is actually an optimizer bug, see upstream bug report.

Revision history for this message
Sveta Smirnova (svetasmirnova) wrote :

Not repeatable with 5.7

Revision history for this message
Sveta Smirnova (svetasmirnova) wrote :

On the attached dump, run the following queries to repeat with 5.6:

explain partitions delete FROM test.BIG_STORAGE_INNO WHERE HASH_ID = 'db1c3c3191c9c2af0cc9bcd14b5b930b8ecc3555a43c4185e901be21339ee6d9' AND EXPIRE_DATE > '2017-02-24 00:03:00' ;
explain partitions select * FROM test.BIG_STORAGE_INNO WHERE HASH_ID = 'db1c3c3191c9c2af0cc9bcd14b5b930b8ecc3555a43c4185e901be21339ee6d9' AND EXPIRE_DATE > '2017-02-24 00:03:00' ;

Revision history for this message
Rick Pizzi (pizzi) wrote :

This should be the same bug fixed in https://bugs.launchpad.net/percona-server/+bug/1671152

Revision history for this message
Sveta Smirnova (svetasmirnova) wrote :

I tested with InnoDB; this is the same as the Oracle bug, while bug #1671152 is about TokuDB.

Revision history for this message
Rick Pizzi (pizzi) wrote :

My bad, I mixed up bug reports :-) Filed too many lately!!
Please ignore my comment above. Thanks!

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1063
