"alter table" + full disk => inconsistent cluster state

Bug #1515293 reported by tobixen
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Expired
Undecided
Unassigned

Bug Description

Had an incident in production (RHEL6) and have confirmed this in a lab environment (Ubuntu14.04) as well:

* Created a table with a text field
* Inserted a significant amount of rows in the table
* Filled up the disk on one node in the cluster
* Done an "alter table foo CHANGE COLUMN TEXT TEXT varchar(1024) CHARACTER SET 'utf8' COLLATE 'utf8_unicode_ci' NULL DEFAULT NULL;" on the node with almost full disk
* It fails, disk goes full while it's writing to a temp table: "ERROR 1114 (HY000): The table '#sql-217e_8d6bb' is full"
* As expected, the schema change did not go through at the node with almost full disk
* BUT! The change has gone through at the other nodes!
* Next: "insert into foo (text) values ('this is funny');"
* the other cluster nodes will go down in flames.

Error messages:

151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 2th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 3th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 4th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 5th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 6th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 7th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 8th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 9th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [Warning] WSREP: Failed to apply app buffer: seqno: 213836, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 10th time
151111 15:07:03 [ERROR] Slave SQL: Column 1 of table 'tobiastest.foo' cannot be converted from type 'varchar(3072)' to type 'varchar(1013)', Error_code: 1677
151111 15:07:03 [Warning] WSREP: RBR event 2 Write_rows apply warning: 3, 213836
151111 15:07:03 [ERROR] WSREP: Failed to apply trx: source: ae217f1d-66cc-11e5-af6c-b6baac5aee45 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 579259 trx_id: 3353415 seqnos (l: 95073, g: 213836, s: 213835, d: 213835, ts: 1447250823359790453)
151111 15:07:03 [ERROR] WSREP: Failed to apply trx 213836 10 times
151111 15:07:03 [ERROR] WSREP: Node consistency compromized, aborting...
151111 15:07:03 [Note] WSREP: Closing send monitor...
151111 15:07:03 [Note] WSREP: Closed send monitor.
151111 15:07:03 [Note] WSREP: gcomm: terminating thread
151111 15:07:03 [Note] WSREP: gcomm: joining thread
151111 15:07:03 [Note] WSREP: gcomm: closing backend
151111 15:07:03 [Note] WSREP: view(view_id(NON_PRIM,38d360bb-84c4-11e5-ba3a-1e63560571d9,147) memb {
        bf167418-669d-11e5-8566-cbb5402aedbc,
} joined {
} left {
} partitioned {
        38d360bb-84c4-11e5-ba3a-1e63560571d9,
        ae217f1d-66cc-11e5-af6c-b6baac5aee45,
})
151111 15:07:03 [Note] WSREP: view((empty))
151111 15:07:03 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
151111 15:07:03 [Note] WSREP: gcomm: closed
151111 15:07:03 [Note] WSREP: Flow-control interval: [16, 16]
151111 15:07:03 [Note] WSREP: Received NON-PRIMARY.
151111 15:07:03 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 213836)
151111 15:07:03 [Note] WSREP: Received self-leave message.
151111 15:07:03 [Note] WSREP: Flow-control interval: [0, 0]
151111 15:07:03 [Note] WSREP: Received SELF-LEAVE. Closing connection.
151111 15:07:03 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 213836)
151111 15:07:03 [Note] WSREP: RECV thread exiting 0: Success
151111 15:07:03 [Note] WSREP: recv_thread() joined.
151111 15:07:03 [Note] WSREP: Closing replication queue.
151111 15:07:03 [Note] WSREP: Closing slave action queue.
151111 15:07:03 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted

Package versions:

Trusty:

ii percona-xtrabackup 2.1.8-1 amd64 Open source backup tool for InnoDB and XtraDB
ii percona-xtradb-cluster-client-5.5 5.5.37-25.10+dfsg-0ubuntu0.14.04.1 amd64 Percona Server database client binaries
ii percona-xtradb-cluster-common-5.5 5.5.37-25.10+dfsg-0ubuntu0.14.04.1 all Percona Server database common files
ii percona-xtradb-cluster-galera-2.x 165-0ubuntu1 amd64 Synchronous multi-master replication plugin for transactional applications
ii percona-xtradb-cluster-server 5.5.37-25.10+dfsg-0ubuntu0.14.04.1 all Percona XtraDB Cluster database server
ii percona-xtradb-cluster-server-5.5 5.5.37-25.10+dfsg-0ubuntu0.14.04.1 amd64 Percona Server database server binaries

RHEL:

Percona-Server-shared-51.x86_64 5.1.73-rel14.12.625.rhel6 @percona-updated
Percona-XtraDB-Cluster-client-55.x86_64
                                   1:5.5.41-25.11.853.el6 @percona-updated
Percona-XtraDB-Cluster-galera-3.x86_64
                                   3.12.2-1.rhel6 @percona-updated
Percona-XtraDB-Cluster-server-55.x86_64
                                   1:5.5.41-25.11.853.el6 @percona-updated
percona-xtrabackup.x86_64 2.3.2-1.el6 @percona-updated

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

There were some issues in 5.5 in handling disk full.

Seems like you could reproduce the issue in lab environment can you check if goes off with 5.6.26-25.12.1.
If yes, then I would strongly recommend to schedule a full upgrade.

Changed in percona-xtradb-cluster:
status: New → Incomplete
Revision history for this message
tobixen (tobixen) wrote :

Reproduced on 5.6, trusty

$ mysqld --version
mysqld Ver 5.6.26-74.0-56 for debian-linux-gnu on x86_64 (Percona XtraDB Cluster (GPL), Release rel74.0, Revision 624ef81, WSREP version 25.12, wsrep_25.12)

$ sudo aptitude search ~i,?perc
i A percona-xtrabackup - Open source backup tool for InnoDB and XtraDB
iB percona-xtradb-cluster-client-5.6 - Percona XtraDB Cluster database client binaries
i A percona-xtradb-cluster-common-5.6 - Percona XtraDB Cluster database common files (e.g. /etc/mysql/my.cnf)
iBA percona-xtradb-cluster-galera-3.x - Galera components of Percona XtraDB Cluster
iB percona-xtradb-cluster-server-5.6 - Percona XtraDB Cluster database server binaries

Revision history for this message
tobixen (tobixen) wrote :

Is there anything more I can do?

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for Percona XtraDB Cluster because there has been no activity for 60 days.]

Changed in percona-xtradb-cluster:
status: Incomplete → Expired
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1865

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.