pt-online-schema-change causes deadlocks under heavy write load

Bug #988036 reported by DJ Hanson
26
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Fix Released
High
Daniel Nichter
2.1
Fix Released
High
Daniel Nichter

Bug Description

While attempting to run some tests to determine the expected additional load pt-online-schema-change would cause in write heavy environment I have run into a repeatable deadlock problem.

I am trying to answer a couple questions before using this tool in our production environment. Those questions are
1) Can this tool be used safely in a production environment under heavy write load.
2) What is the potential impact to write throughput while the schema change is in flight.

In order to accomplish this I have done the following
1) Take a mysqldump snapshot of the production database.
2) Copied 1 weeks worth of binlogs, starting with the transaction referenced in the mysqldump, and processed them into a stream of sql files

My experimental control:
1) Restore the production snapshot
2) Replay the processed binlogs timing how long it takes to apply each one (e.g. for X in binlog*.sql; do time mysql -u root --default-character-set=utf8 < $X; echo $X; done)

My experimental procedure is identical to my control except that while the binlogs being streamed into mysql run the online schema change (e.g. pt-online-schema-change --alter "ADD COLUMN aNewNumber BIGINT(20) NULL DEFAULT NULL" --drop-old-table --user root -A utf8 --execute D=somedb,t=someTable )

The version of the percona toolkit is 2.1.1-1
$ pt-online-schema-change --version
pt-online-schema-change 2.1.1

The mysql database version is Percona 5.5.16-55
mysql> show variables like 'version%';
+-------------------------+-----------------------------------------------------+
| Variable_name | Value |
+-------------------------+-----------------------------------------------------+
| version | 5.5.16-55 |
| version_comment | Percona Server (GPL), Release rel22.0, Revision 189 |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+-------------------------+-----------------------------------------------------+

The full (sanitized) output one of the runs that resulted in the deadlock follows including the innodb status immediately after pt-online-schema-change bailed out.

pt-online-schema-change --alter "ADD COLUMN newby BIGINT(20) NULL DEFAULT NULL" --drop-old-table --user root -A utf8 --execute D=somedb,t=sometable
Altering `somedb`.`sometable`...
Creating new table...
Created new table somedb._sometable_new OK.
Altering new table...
Altered `somedb`.`_sometable_new` OK.
Creating triggers...
Created triggers OK.
Copying approximately 302588620 rows...
Copying `somedb`.`sometable`: 0% 02:50:16 remain
Copying `somedb`.`sometable`: 0% 02:53:06 remain
...
Error copying rows from `somedb`.`sometable` to `somedb`.`_sometable_new`: DBD::mysql::st execute failed: Deadlock found when trying to get lock; try restarting transaction [for Statement "INSERT LOW_PRIORITY IGNORE INTO `somedb`.`_sometable_new` (`somedataid`, `somerowid`, `columndefid`, `datatype`, `someid`, `displayname`, `datastring`, `datanumeric`, `formuladefinition`, `someformat1`, `haslink`, `somelogid`) SELECT `somedataid`, `somerowid`, `columndefid`, `datatype`, `someid`, `displayname`, `datastring`, `datanumeric`, `formuladefinition`, `someformat1`, `haslink`, `somelogid` FROM `somedb`.`sometable` FORCE INDEX(`PRIMARY`) WHERE ((`somedataid` >= ?)) AND ((`somedataid` <= ?)) /*pt-online-schema-change 31672 copy nibble*/" with ParamValues: 1='262934336', 0='262910306'] at /usr/bin/pt-online-schema-change line 6593.

Dropping triggers...
Dropped triggers OK.
Dropping new table...
Dropped new table OK.
`somedb`.`sometable` was not altered.

mysql> SHOW ENGINE INNODB STATUS\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
120423 15:18:31 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 4 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 3413 1_second, 2815 sleeps, 341 10_second, 19 background, 19 flush
srv_master_thread log flush and writes: 4395
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 41485, signal count 56845
Mutex spin waits 241897, rounds 1592751, OS waits 35886
RW-shared spins 17882, rounds 80843, OS waits 1301
RW-excl spins 7590, rounds 148186, OS waits 3174
Spin rounds per wait: 6.58 mutex, 4.52 RW-shared, 19.52 RW-excl
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
605243 OS file reads, 2345220 OS file writes, 156562 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 67793, seg size 67795, 15302 merges
merged operations:
 insert 32390, delete mark 28567, delete 6482
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 101998957, node heap has 736 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---

LOG
---
Log sequence number 671964880604
Log flushed up to 671964880604
Last checkpoint at 671964879630
Max checkpoint age 216721613
Checkpoint age target 209949063
Modified age 974
Checkpoint age 974
0 pending log writes, 0 pending chkp writes
79961 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 52948893696; in additional pool allocated 0
Internal hash tables (constant factor + variable factor)
    Adaptive hash index 828054496 (815991656 + 12062840)
    Page hash 51000296 (buffer pool 0 only)
    Dictionary cache 204965986 (204000176 + 965810)
    File system 145592 (82672 + 62920)
    Lock system 127499984 (127499608 + 376)
    Recovery system 0 (0 + 0)
Dictionary memory allocated 965810
Buffer pool size 3145727

Buffer pool size, bytes 51539591168
Free buffers 2534916
Database pages 610075
Old database pages 225196
Modified db pages 23
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 29429, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 605091, created 901318, written 2213475
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 610075, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
---OLDEST VIEW---
Normal read view
Read view low limit trx n:o BE930
Read view up limit trx id BE930
Read view low limit trx id BE930
Read view individually stored trx ids:
-----------------
Main thread process no. 31604, id 1198377280, state: flushing log
Number of rows inserted 76114387, updated 217064, deleted 68485, read 173066333
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
------------------------
LATEST DETECTED DEADLOCK
------------------------

120423 14:41:01
*** (1) TRANSACTION:
TRANSACTION B67FA, ACTIVE 0 sec fetching rows
mysql tables in use 2, locked 2
LOCK WAIT 10 lock struct(s), heap size 1248, 711 row lock(s), undo log entries 706
MySQL thread id 2, OS thread handle 0x4a9a4940, query id 1544298 localhost root Sending data
INSERT LOW_PRIORITY IGNORE INTO `somedb`.`_sometable_new` (`somedataid`, `somerowid`, `columndefid`, `datatype`, `someid`, `displayname`, `datastring`, `datanumeric`, `formuladefinition`, `someformat1`, `haslink`, `somelogid`) SELECT `somedataid`, `somerowid`, `columndefid`, `datatype`, `someid`, `displayname`, `datastring`, `datanumeric`, `formuladefinition`, `someformat1`, `haslink`, `somelogid` FROM `somedb`.`sometable` FORCE INDEX(`PRIMARY`) WHERE ((`somedataid` >= '262910306')) AND ((`somedataid` <= '262934336')) /*pt-online-schema-change 31672 copy nibble*/
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 213 page no 948647 n bits 232 index `PRIMARY` of table `somedb`.`sometable` trx id B67FA lock mode S waiting
*** (2) TRANSACTION:
TRANSACTION B67B6, ACTIVE 26 sec setting auto-inc lock
mysql tables in use 2, locked 2
1107 lock struct(s), heap size 145848, 2024 row lock(s), undo log entries 1486
MySQL thread id 9, OS thread handle 0x4a9e5940, query id 1544300 localhost root update
REPLACE INTO `somedb`.`_sometable_new` (`somedataid`, `somerowid`, `columndefid`, `datatype`, `someid`, `displayname`, `datastring`, `datanumeric`, `formuladefinition`, `someformat1`, `haslink`, `somelogid`) VALUES (NEW.`somedataid`, NEW.`somerowid`, NEW.`columndefid`, NEW.`datatype`, NEW.`someid`, NEW.`displayname`, NEW.`datastring`, NEW.`datanumeric`, NEW.`formuladefinition`, NEW.`someformat1`, NEW.`haslink`, NEW.`somelogid`)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 213 page no 948647 n bits 232 index `PRIMARY` of table `somedb`.`sometable` trx id B67B6 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `somedb`.`_sometable_new` trx id B67B6 lock mode AUTO-INC waiting
*** WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
Trx id counter BE930
Purge done for trx's n:o < BE927 undo n:o < 0
History list length 1796
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 37, OS thread handle 0x4a963940, query id 2134938 localhost root
SHOW ENGINE INNODB STATUS
----------------------------
END OF INNODB MONITOR OUTPUT
============================

Related branches

Revision history for this message
DJ Hanson (dj-a) wrote :

Additionally, I have just confirmed that If I add a delay (5-15ms) between each statement as it is fed into the mysql client while replaying the binary logs that the deadlock takes longer to get unlucky, but still occurs consistently during the process.

Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

Deadlocks are inevitable in transactional systems, so the problem is not that the tool is causing a deadlock, but that it doesn't handle it correctly. It should retry several times. In fact we designed it so that it is most likely to be the victim of a deadlock, and hence will not disrupt other processes (which are often NOT written to handle deadlocks correctly, a good reason not to cause problems for them if avoidable).

We need to retry on deadlocks up to the configurable number of times, specified by --retries.

Revision history for this message
Julien Duponchelle (noplay) wrote :

I see the same problems. A retries options could be very usefull.

Revision history for this message
DJ Hanson (dj-a) wrote :

I just tried again with retries. I still got saw deadlocks in both pt-online-schema-change and my simulated binlog stream. Since the lock-wait-timeout was set to 5 I would expect to have all deadlocks happen in the pt-online-schema change instead of the "application" with the lock timeout set to 50.

pt-online-schema-change --alter "ADD COLUMN newby BIGINT(20) NULL DEFAULT NULL" --retries 10 --lock-wait-timeout 5 --drop-old-table --user root -A utf8 --execute D=somedb,t=someTable

mysql> show variables like 'innodb_lock%';
+--------------------------------+-------+
| Variable_name | Value |
+--------------------------------+-------+
| innodb_lock_wait_timeout | 50 |
| innodb_locks_unsafe_for_binlog | OFF |
+--------------------------------+-------+
2 rows in set (0.00 sec)

tags: added: dead-lock pt-online-schema-change
Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

How is your simulated binlog stream handling transactions/locking? If it's holding locks on huge ranges of rows, then pt-osc --retries may need to be very high. Or the inverse might be happening: since pt-osc 2.1 adjusts its own chunk size, maybe it's using a very large chunk size and it can't get locks for all those rows. Along with increasing --retries, try lowering --chunk-time to keep the tool from selecting too many rows.

tags: added: percona-23453
Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

Note that lock wait timeout settings have absolutely nothing to do with deadlocks.

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

Two things were done: first, actual Deadlock errors are not caught and retried; second, there's a 0.25s delay between retries to let one or the other contentious process to move on. The customer reports that the changes have allowed the tool to run without dying on a very large, long-running table.

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

s/actual Deadlock errors are not caught/actual Deadlock errors are NOW caught/

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-308

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.