inter-node deadlocking of DDL and write statement
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Galera |
New
|
Undecided
|
Unassigned | ||
MySQL patches by Codership |
New
|
Undecided
|
Seppo Jaakola |
Bug Description
We are running Galera in a heavy production environment the specifics are 5.5.24-log Percona XtraDB Cluster with two nodes and the Galera arbitrator, only one node is being queried, the other node is simply on standby for now. We've had several occurrences where a DDL statement simultaneous with a write statement to the same table on the same node will create (we are guessing) some sort of inter-node deadlocking and lock the entire cluster indefinitely. The fact that it locks the entire cluster is probably due to Bug #928919, however, the fact a deadlock occurs when a DDL and write happen simultaneously on the same node is the issue for this bug.
We seen this manifest in two specific examples,
1) a truncate in the application deadlocked with an insert, both called by application code.
2) using pt-online-
The reason we believe it has something to do with inter-node deadlocking is we also ran a single instance of Percona cluster (no other nodes) for a number of weeks, and used pt-online-
I'll post the processlist of the latter below which I'll modify/truncate as there are about 120 other read/write statements either metadata locks on table1, row locks on table2, or other various states.
mysql> show processlist;
+------
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+------
| 943723 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL | 0 | 0 | 1 |
| 943731 | replicant | xxxxxxxxx:45853 | NULL | Binlog Dump | 763330 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL | 0 | 0 | 2 |
| 174268601 | root | localhost | xxxxxxxx | Sleep | 720 | | NULL | 14 | 14 | 15 |
| 174753898 | root | localhost | xxxxxxxx | Query | 0 | sleeping | show processlist | 0 | 0 | 1 |
| 174779808 | root | localhost | xxxxxxxx | Query | 11 | Waiting for table metadata lock | CREATE TRIGGER `pt_osc_
| 174780182 | root | localhost:60992 | xxxxxxxxx | Query | 11 | query end | INSERT INTO xxxxxxxxx.table2 (udid, mac_address, ifa, open_id, open_id_slot, android_device_id, andr | 0 | 2 | 2 |
| 174780183 | root | localhost:22389 | xxxxxxxxx | Query | 11 | query end | UPDATE xxxxxxxx.table2
SET currently_
| 0 | 4 | 2 |
| 174780184 | root | localhost:50262 | xxxxxxxx | Query | 11 | query end | UPDATE xxxxxxxxx.table2
SET currently_
| 0 | 4 | 2 |
| 174780187 | root | localhost:42015 | xxxxxxxxx | Query | 11 | Updating | UPDATE xxxxxxxxx.table2 SET
| 174780188 | root | localhost:22392 | xxxxxxxx | Query | 11 | wsrep waiting on replaying | UPDATE xxxxxxx.table2 SET
| 174780192 | root | localhost:60102 | xxxxxxxxx | Query | 11 | Waiting for table metadata lock | UPDATE xxxxxxx.table1
SET last_session_
| 174780194 | root | localhost:46779 | xxxxxxxxx | Query | 11 | Sending data | SELECT id, first_IAP_ID, suspected_user FROM xxxxxxxx.table1
WHERE user_id = NAME_CONS | 0 | 0 | 1 |
| 174780196 | root | localhost:32770 | xxxxxxxxx | Query | 11 | Updating | UPDATE xxxxxxx.table2 SET
| 174780197 | root | localhost:22395 | xxxxxxxx | Query | 11 | Sending data | SELECT id, installs, updates, purchases, sessions, current_players
FROM xxxxxxx.table3 | 0 | 4 | 1 |
-------
Seppo, could you please comment on this. Is it MDL-related? Could it have been fixed?