"alter table" + full disk => inconsistent cluster state
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC) | Expired | Undecided | Unassigned |
Bug Description
We had an incident in production (RHEL 6) and have reproduced it in a lab environment (Ubuntu 14.04) as well:
* Created a table with a text field
* Inserted a significant number of rows into the table
* Filled up the disk on one node in the cluster
* Ran "alter table foo CHANGE COLUMN TEXT TEXT varchar(1024) CHARACTER SET 'utf8' COLLATE 'utf8_unicode_ci' NULL DEFAULT NULL;" on the node with the almost full disk
* It fails because the disk fills up while it is writing to a temp table: "ERROR 1114 (HY000): The table '#sql-217e_8d6bb' is full"
* As expected, the schema change did not go through on the node with the almost full disk
* BUT! The change has gone through on the other nodes!
* Next: "insert into foo (text) values ('this is funny');"
* The other cluster nodes will go down in flames.
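The divergence described in the steps above could be caught before the next write by comparing the column type each node reports (for example the COLUMN_TYPE value from information_schema.COLUMNS). The following is only a minimal sketch: it assumes the per-node values were already collected by hand, and the helper name `find_schema_drift` and the node names are hypothetical, not part of Percona XtraDB Cluster.

```python
from collections import defaultdict


def find_schema_drift(column_types_by_node):
    """Group nodes by the column type they report.

    column_types_by_node maps a node name to the column type string
    that node reports for the same column.
    Returns {column_type: [nodes]} when nodes disagree, or {} when
    every node reports the same type.
    """
    groups = defaultdict(list)
    for node, col_type in column_types_by_node.items():
        # Normalise case and whitespace so "TEXT" and "text " compare equal.
        groups[col_type.strip().lower()].append(node)
    if len(groups) <= 1:
        return {}
    return {col_type: sorted(nodes) for col_type, nodes in groups.items()}


# Hypothetical values mirroring the report: the ALTER failed on node1 only.
reported = {
    "node1": "text",
    "node2": "varchar(1024)",
    "node3": "varchar(1024)",
}
drift = find_schema_drift(reported)
for col_type, nodes in sorted(drift.items()):
    print(f"{col_type}: {', '.join(nodes)}")
```

An empty result means all nodes agree; any other result lists which nodes hold which definition, so the odd one out can be fixed before an insert is replicated.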
Error messages:
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
at galera/
Retrying 2th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
at galera/
Retrying 3th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
at galera/
Retrying 4th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
at galera/
Retrying 5th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
at galera/
Retrying 6th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
at galera/
Retrying 7th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
at galera/
Retrying 8th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
at galera/
Retrying 9th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
at galera/
Retrying 10th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [ERROR] WSREP: Failed to apply trx: source: ae217f1d-
151111 15:07:03 [ERROR] WSREP: Failed to apply trx 213836 10 times
151111 15:07:03 [ERROR] WSREP: Node consistency compromized, aborting...
151111 15:07:03 [Note] WSREP: Closing send monitor...
151111 15:07:03 [Note] WSREP: Closed send monitor.
151111 15:07:03 [Note] WSREP: gcomm: terminating thread
151111 15:07:03 [Note] WSREP: gcomm: joining thread
151111 15:07:03 [Note] WSREP: gcomm: closing backend
151111 15:07:03 [Note] WSREP: view(view_
} joined {
} left {
} partitioned {
})
151111 15:07:03 [Note] WSREP: view((empty))
151111 15:07:03 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
151111 15:07:03 [Note] WSREP: gcomm: closed
151111 15:07:03 [Note] WSREP: Flow-control interval: [16, 16]
151111 15:07:03 [Note] WSREP: Received NON-PRIMARY.
151111 15:07:03 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 213836)
151111 15:07:03 [Note] WSREP: Received self-leave message.
151111 15:07:03 [Note] WSREP: Flow-control interval: [0, 0]
151111 15:07:03 [Note] WSREP: Received SELF-LEAVE. Closing connection.
151111 15:07:03 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 213836)
151111 15:07:03 [Note] WSREP: RECV thread exiting 0: Success
151111 15:07:03 [Note] WSREP: recv_thread() joined.
151111 15:07:03 [Note] WSREP: Closing replication queue.
151111 15:07:03 [Note] WSREP: Closing slave action queue.
151111 15:07:03 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted
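The log above repeats the same apply failure until the node gives up and aborts. To condense such a log, something like the following sketch could be used. It assumes the message formats are exactly as they appear in this report; the function name `summarize_apply_failures` is hypothetical, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Patterns taken from the message formats seen in this report's log.
APPLY_FAIL = re.compile(r"Failed to apply app buffer: seqno: (\d+)")
GAVE_UP = re.compile(r"Failed to apply trx (\d+) (\d+) times")
SQL_ERROR = re.compile(r"\[ERROR\] Slave SQL: .*Error_code: (\d+)")


def summarize_apply_failures(log_lines):
    """Summarise failed apply attempts per seqno from error log lines."""
    failed_attempts = Counter()  # seqno -> number of "Failed to apply app buffer" lines
    gave_up = {}                 # seqno -> retry count reported when the node gave up
    error_codes = set()          # distinct Slave SQL error codes seen
    for line in log_lines:
        match = APPLY_FAIL.search(line)
        if match:
            failed_attempts[int(match.group(1))] += 1
        match = GAVE_UP.search(line)
        if match:
            gave_up[int(match.group(1))] = int(match.group(2))
        match = SQL_ERROR.search(line)
        if match:
            error_codes.add(int(match.group(1)))
    return {
        "failed_attempts": dict(failed_attempts),
        "gave_up": gave_up,
        "error_codes": sorted(error_codes),
    }


sample = [
    "151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677",
    "151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1",
    "151111 15:07:03 [ERROR] WSREP: Failed to apply trx 213836 10 times",
]
summary = summarize_apply_failures(sample)
print(summary)
```

In practice the lines would be read from the node's error log file rather than from an inline list.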
Package versions:
Trusty:
ii percona-xtrabackup 2.1.8-1 amd64 Open source backup tool for InnoDB and XtraDB
ii percona-
ii percona-
ii percona-
ii percona-
ii percona-
RHEL:
Percona-
Percona-
Percona-
Percona-
percona-
There were some issues in 5.5 with handling a full disk.
It seems you can reproduce the issue in a lab environment. Can you check whether it goes away with 5.6.26-25.12.1?
If it does, I would strongly recommend scheduling a full upgrade.