Deadlock on COMMIT after ALTER TABLE on separate connection
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC |
Invalid
|
Undecided
|
Unassigned |
Bug Description
Server version: 5.6.20-68.0-56-log Percona XtraDB Cluster (GPL), Release 25.7, wsrep_25.7.r4126
Using ALTER TABLE to update the default of a column will provoke a deadlock on other connections that did read that table.
Example:
CREATE TABLE foobar (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
foo_id INT NOT NULL,
bar_id INT DEFAULT 1);
INSERT INTO foobar(foo_id) VALUES (1), (1), (1), (2), (2);
/* On connection 1 */
BEGIN;
SELECT * FROM foobar WHERE foo_id = 1;
/* On connection 2 */
ALTER TABLE foobar ALTER COLUMN bar_id SET DEFAULT 2; -- instantaneous
/* On connection 1 */
COMMIT;-- -> ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Tested with MySQL 5.5.54 and MariaDB 10.1.23, their behavior is that the ALTER TABLE locks until the first transaction is commited.
Our current workaround is to use LOCK TABLE foobar WRITE in the ALTER TABLE so it will wait until the first transaction is commited
PXC is multi-master solution and so some of these semantics that apply to local server are not directly applicable to multi-master solution.
For example: In case above, ALTER TABLE is being invoked on node-2 which is then replicated instantaneously on node-1. Now PXC semantics says replicated transaction takes priority over local transaction. Also, local transaction is still in progress and not yet committed.
There is conflict of read lock and so local transaction is forced aborted and rolled back causing the DEADLOCK error message.
You can get more information in log file by enabling wsrep_debug=1 (don't keep it ON on production server).
2017-06-19 21:01:45 5160 [Note] WSREP: MDL conflict
schema: test
request: (2 seqno 6 wsrep (1, 1, 0) cmd 0 3 ALTER TABLE foobar ALTER COLUMN bar_id SET DEFAULT 2)
granted: (3 seqno -1 wsrep (0, 0, 2) cmd 0 0 (null))
2017-06-19 21:01:45 5160 [Note] WSREP: MDL ticket: type: shared read space: TABLE db: test name: foobar
2017-06-19 21:01:45 5160 [Note] WSREP: MDL conflict-> BF abort
schema: test
request: (2 seqno 6 wsrep (1, 1, 0) cmd 0 3 ALTER TABLE foobar ALTER COLUMN bar_id SET DEFAULT 2)
granted: (3 seqno -1 wsrep (0, 0, 2) cmd 0 55 (null))
2017-06-19 21:01:45 5160 [Note] WSREP: MDL ticket: type: shared read space: TABLE db: test name: foobar
2017-06-19 21:01:45 5160 [Note] WSREP: wsrep_abort_thd, by: 140074722432768, victim: 140074722699008
2017-06-19 21:01:45 5160 [Note] WSREP: abort transaction: BF: ALTER TABLE foobar ALTER COLUMN bar_id SET DEFAULT 2 victim: (null)
2017-06-19 21:01:45 5160 [Note] WSREP: cluster conflict due to high priority abort for threads:
2017-06-19 21:01:45 5160 [Note] WSREP: Winning thread:
THD: 2, mode: applier, state: executing, conflict: no conflict, seqno: 6
SQL: ALTER TABLE foobar ALTER COLUMN bar_id SET DEFAULT 2
2017-06-19 21:01:45 5160 [Note] WSREP: Victim thread:
THD: 3, mode: local, state: idle, conflict: aborting, seqno: -1
SQL: (null)
2017-06-19 21:01:45 5160 [Note] WSREP: BF kill (1, seqno: 6), victim: (3) trx: 1817