inter-node deadlocking of DDL and write statement

Reported by michael morse on 2012-10-23
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Galera
Undecided
Unassigned
MySQL patches by Codership
Undecided
Seppo Jaakola

Bug Description

We are running Galera in a heavy production environment the specifics are 5.5.24-log Percona XtraDB Cluster with two nodes and the Galera arbitrator, only one node is being queried, the other node is simply on standby for now. We've had several occurrences where a DDL statement simultaneous with a write statement to the same table on the same node will create (we are guessing) some sort of inter-node deadlocking and lock the entire cluster indefinitely. The fact that it locks the entire cluster is probably due to Bug #928919, however, the fact a deadlock occurs when a DDL and write happen simultaneously on the same node is the issue for this bug.

We seen this manifest in two specific examples,

1) a truncate in the application deadlocked with an insert, both called by application code.
2) using pt-online-schema-change, the triggers that are created as a first step, deadlocked with several updates

The reason we believe it has something to do with inter-node deadlocking is we also ran a single instance of Percona cluster (no other nodes) for a number of weeks, and used pt-online-schema-change on a number of busy tables and had no issues with either case. Once we saw #1, We also took out the truncate and replaced with a delete * (it's a small table), and have had no further issues.

I'll post the processlist of the latter below which I'll modify/truncate as there are about 120 other read/write statements either metadata locks on table1, row locks on table2, or other various states.

mysql> show processlist;
+-----------+----------------------+-----------------------+------------+-------------+--------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+-----------+---------------+-----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+-----------+----------------------+-----------------------+------------+-------------+--------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+-----------+---------------+-----------+
| 943723 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL | 0 | 0 | 1 |
| 943731 | replicant | xxxxxxxxx:45853 | NULL | Binlog Dump | 763330 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL | 0 | 0 | 2 |
| 174268601 | root | localhost | xxxxxxxx | Sleep | 720 | | NULL | 14 | 14 | 15 |
| 174753898 | root | localhost | xxxxxxxx | Query | 0 | sleeping | show processlist | 0 | 0 | 1 |
| 174779808 | root | localhost | xxxxxxxx | Query | 11 | Waiting for table metadata lock | CREATE TRIGGER `pt_osc_xxxxxxxxxx_table1_del` AFTER DELETE ON `xxxxxxxx`.`xxxxx | 0 | 0 | 1 |
| 174780182 | root | localhost:60992 | xxxxxxxxx | Query | 11 | query end | INSERT INTO xxxxxxxxx.table2 (udid, mac_address, ifa, open_id, open_id_slot, android_device_id, andr | 0 | 2 | 2 |
| 174780183 | root | localhost:22389 | xxxxxxxxx | Query | 11 | query end | UPDATE xxxxxxxx.table2
        SET currently_playing=0,
            currently_playing_game=NULL
  | 0 | 4 | 2 |
| 174780184 | root | localhost:50262 | xxxxxxxx | Query | 11 | query end | UPDATE xxxxxxxxx.table2
        SET currently_playing=0,
            currently_playing_game=NULL
  | 0 | 4 | 2 |
| 174780187 | root | localhost:42015 | xxxxxxxxx | Query | 11 | Updating | UPDATE xxxxxxxxx.table2 SET
            last_play= NAME_CONST('i_timestamp',_latin1'2012-10-23 09:2 | 0 | 0 | 1 |
| 174780188 | root | localhost:22392 | xxxxxxxx | Query | 11 | wsrep waiting on replaying | UPDATE xxxxxxx.table2 SET
            last_play= NAME_CONST('i_timestamp',_latin1'2012-10-23 09:2 | 0 | 1 | 2 |
| 174780192 | root | localhost:60102 | xxxxxxxxx | Query | 11 | Waiting for table metadata lock | UPDATE xxxxxxx.table1
          SET last_session_timestamp = NAME_CONST('i_current_time | 0 | 2 | 1 |
| 174780194 | root | localhost:46779 | xxxxxxxxx | Query | 11 | Sending data | SELECT id, first_IAP_ID, suspected_user FROM xxxxxxxx.table1
 WHERE user_id = NAME_CONS | 0 | 0 | 1 |
| 174780196 | root | localhost:32770 | xxxxxxxxx | Query | 11 | Updating | UPDATE xxxxxxx.table2 SET
            last_play= NAME_CONST('i_timestamp',_latin1'2012-10-23 09:2 | 0 | 0 | 1 |
| 174780197 | root | localhost:22395 | xxxxxxxx | Query | 11 | Sending data | SELECT id, installs, updates, purchases, sessions, current_players
        FROM xxxxxxx.table3 | 0 | 4 | 1 |

--------------------

Alex Yurchenko (ayurchen) wrote :

Seppo, could you please comment on this. Is it MDL-related? Could it have been fixed?

Changed in codership-mysql:
assignee: nobody → Seppo Jaakola (seppo-jaakola)
michael morse (esrom-m) wrote :

just to revisit, we've subsequently seen this behavior even on one independent node, and it's likely this was not a Galera issue and more likely a classic metadata locking issue where an open transaction is holding an open connection, in this case ID #174268601 in 'sleep' state above is the oldest, causing the lock.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers