Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

"alter table" + full disk => inconsistent cluster state

Bug #1515293 reported by tobixen on 2015-11-11

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Expired	Undecided	Unassigned

Bug Description

Had an incident in production (RHEL6) and have confirmed this in a lab environment (Ubuntu14.04) as well:

* Created a table with a text field
* Inserted a significant amount of rows in the table
* Filled up the disk on one node in the cluster
* Done an "alter table foo CHANGE COLUMN TEXT TEXT varchar(1024) CHARACTER SET 'utf8' COLLATE 'utf8_unicode_ci' NULL DEFAULT NULL;" on the node with almost full disk
* It fails, disk goes full while it's writing to a temp table: "ERROR 1114 (HY000): The table '#sql-217e_8d6bb' is full"
* As expected, the schema change did not go through at the node with almost full disk
* BUT! The change has gone through at the other nodes!
* Next: "insert into foo (text) values ('this is funny');"
* the other cluster nodes will go down in flames.

Error messages:

151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 2th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 3th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 4th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 5th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 6th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 7th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 8th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 9th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 10th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [ERROR] WSREP: Failed to apply trx: source: ae217f1d-66cc-11e5-af6c-b6baac5aee45 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 579259 trx_id: 3353415 seqnos (l: 95073, g: 213836, s: 213835, d: 213835, ts: 1447250823359790453)
151111 15:07:03 [ERROR] WSREP: Failed to apply trx 213836 10 times
151111 15:07:03 [ERROR] WSREP: Node consistency compromized, aborting...
151111 15:07:03 [Note] WSREP: Closing send monitor...
151111 15:07:03 [Note] WSREP: Closed send monitor.
151111 15:07:03 [Note] WSREP: gcomm: terminating thread
151111 15:07:03 [Note] WSREP: gcomm: joining thread
151111 15:07:03 [Note] WSREP: gcomm: closing backend
151111 15:07:03 [Note] WSREP: view(view_id(NON_PRIM,38d360bb-84c4-11e5-ba3a-1e63560571d9,147) memb {
        bf167418-669d-11e5-8566-cbb5402aedbc,
} joined {
} left {
} partitioned {
        38d360bb-84c4-11e5-ba3a-1e63560571d9,
        ae217f1d-66cc-11e5-af6c-b6baac5aee45,
})
151111 15:07:03 [Note] WSREP: view((empty))
151111 15:07:03 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
151111 15:07:03 [Note] WSREP: gcomm: closed
151111 15:07:03 [Note] WSREP: Flow-control interval: [16, 16]
151111 15:07:03 [Note] WSREP: Received NON-PRIMARY.
151111 15:07:03 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 213836)
151111 15:07:03 [Note] WSREP: Received self-leave message.
151111 15:07:03 [Note] WSREP: Flow-control interval: [0, 0]
151111 15:07:03 [Note] WSREP: Received SELF-LEAVE. Closing connection.
151111 15:07:03 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 213836)
151111 15:07:03 [Note] WSREP: RECV thread exiting 0: Success
151111 15:07:03 [Note] WSREP: recv_thread() joined.
151111 15:07:03 [Note] WSREP: Closing replication queue.
151111 15:07:03 [Note] WSREP: Closing slave action queue.
151111 15:07:03 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted

Package versions:

Trusty:

ii percona-xtrabackup 2.1.8-1 amd64 Open source backup tool for InnoDB and XtraDB
ii percona-xtradb-cluster-client-5.5 5.5.37-25.10+dfsg-0ubuntu0.14.04.1 amd64 Percona Server database client binaries
ii percona-xtradb-cluster-common-5.5 5.5.37-25.10+dfsg-0ubuntu0.14.04.1 all Percona Server database common files
ii percona-xtradb-cluster-galera-2.x 165-0ubuntu1 amd64 Synchronous multi-master replication plugin for transactional applications
ii percona-xtradb-cluster-server 5.5.37-25.10+dfsg-0ubuntu0.14.04.1 all Percona XtraDB Cluster database server
ii percona-xtradb-cluster-server-5.5 5.5.37-25.10+dfsg-0ubuntu0.14.04.1 amd64 Percona Server database server binaries

RHEL:

Percona-Server-shared-51.x86_64 5.1.73-rel14.12.625.rhel6 @percona-updated
Percona-XtraDB-Cluster-client-55.x86_64
                                   1:5.5.41-25.11.853.el6 @percona-updated
Percona-XtraDB-Cluster-galera-3.x86_64
                                   3.12.2-1.rhel6 @percona-updated
Percona-XtraDB-Cluster-server-55.x86_64
                                   1:5.5.41-25.11.853.el6 @percona-updated
percona-xtrabackup.x86_64 2.3.2-1.el6 @percona-updated