Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

Handle full tables more gracefully

Bug #1250380 reported by Daniël van Eeden on 2013-11-12

This bug affects 1 person

Affects	Status	Importance	Assigned to
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)	Status tracked in 5.6
5.5	Confirmed	Undecided	Unassigned
5.6	Incomplete	Critical	Unassigned

Bug Description

Set up a 3-node PXC cluster.

Config node 1 and node 2:
[mysqld]
datadir=/var/lib/mysql
innodb_data_file_path = "ibdata1:10M:autoextend"
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_sst_method="xtrabackup"
wsrep_sst_auth="root:"
wsrep_cluster_address="gcomm://galera1,galera2,galera3"
wsrep_cluster_name=mycluster1

The config on node 3 has one setting changed:
innodb_data_file_path = "ibdata1:10M:autoextend:max:30M"

Now start to insert data on node 1 of the cluster.

When the ibdata1 file on node 3 fills up, the following is printed in the error log:
131112 10:54:28 [ERROR] /usr/sbin/mysqld: The table 'test1' is full
131112 10:54:29 [ERROR] Slave SQL: Error 'The table 'test1' is full' on query. Default database: 'test1'. Query: 'insert into test1(name) select name from test1', Error_code: 1114
131112 10:54:29 [Warning] WSREP: RBR event 2 Query apply warning: 1, 200
131112 10:54:29 [ERROR] WSREP: Failed to apply trx: source: 68577052-4b75-11e3-9ab5-a3d2291aca1d version: 2 local: 0 state: APPLYING flags: 129 conn_id: 57 trx_id: 1329 seqnos (l: 27, g: 200, s: 199, d: 199, ts: 1384250065993742598)
131112 10:54:29 [ERROR] WSREP: Failed to apply app buffer: seqno: 200, status: WSREP_FATAL
at galera/src/replicator_smm.cpp:apply_wscoll():52
at galera/src/replicator_smm.cpp:apply_trx_ws():118
131112 10:54:29 [ERROR] WSREP: Node consistency compromized, aborting...
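The failure sequence above can be picked out of an error log with a short script. This is only a sketch: the regular expressions are derived from the log lines quoted in this report and nothing else, and the function name summarize_apply_failure is hypothetical. Other PXC or Galera versions may word these messages differently.

```python
import re

# Patterns are based only on the log lines quoted in this report (assumption:
# other versions may format these messages differently).
ERROR_CODE_RE = re.compile(r"Error_code: (\d+)")
APPLY_FAIL_RE = re.compile(
    r"WSREP: Failed to apply app buffer: seqno: (\d+), status: (\w+)"
)
# The spelling "compromized" matches the message as it appears in the log.
ABORT_MARKER = "Node consistency compromized, aborting"


def summarize_apply_failure(log_lines):
    """Return (error_codes, failed_seqnos, aborted) found in the given lines."""
    error_codes = set()
    failed_seqnos = []
    aborted = False
    for line in log_lines:
        for match in ERROR_CODE_RE.finditer(line):
            error_codes.add(int(match.group(1)))
        match = APPLY_FAIL_RE.search(line)
        if match:
            failed_seqnos.append((int(match.group(1)), match.group(2)))
        if ABORT_MARKER in line:
            aborted = True
    return error_codes, failed_seqnos, aborted


# Three of the lines from the log excerpt in this report.
sample = [
    "131112 10:54:29 [ERROR] Slave SQL: Error 'The table 'test1' is full' on query. "
    "Default database: 'test1'. Query: 'insert into test1(name) select name from test1', "
    "Error_code: 1114",
    "131112 10:54:29 [ERROR] WSREP: Failed to apply app buffer: seqno: 200, status: WSREP_FATAL",
    "131112 10:54:29 [ERROR] WSREP: Node consistency compromized, aborting...",
]

print(summarize_apply_failure(sample))
```

Error code 1114 together with the abort marker identifies the "table is full" case described here, as opposed to other apply failures.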

PXC Version: 5.5.34-23.7.6-565.precise

One possible solution: if a table is (almost) full, stop certification, which will prevent clients from inserting more data. The cluster will continue to function as a read-only setup.

A second option: just kick the node out of the cluster (or maybe just put it in desync). This would make it easier for the admin to connect to the server and fix the issue.

While I was fixing this issue, Galera used SST on this node instead of IST, even though IST should have been possible.

Please also note that XtraBackup for SST also compares the actual file size of ibdata1 with the maximum in the config file. These may not match, which causes SST to fail.
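To illustrate the kind of mismatch meant here, the following sketch parses a single innodb_data_file_path entry and compares its max setting with an actual file size. It is not XtraBackup's own check: the helper names (size_to_bytes, parse_data_file_spec, exceeds_configured_max) are hypothetical, and only the simple single-file form used in this report is handled.

```python
import re

# Size suffixes accepted in innodb_data_file_path values.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def size_to_bytes(text):
    """Convert a size such as '10M' or '30M' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS.get(match.group(2), 1)


def parse_data_file_spec(spec):
    """Parse one entry like 'ibdata1:10M:autoextend:max:30M'.

    Assumption: a single file entry with no ';' separators and no ':' in
    the file name, as in the configs shown in this report.
    """
    parts = spec.strip().strip('"').split(":")
    options = parts[2:]
    max_bytes = None
    if "max" in options:
        max_bytes = size_to_bytes(options[options.index("max") + 1])
    return {
        "name": parts[0],
        "initial_bytes": size_to_bytes(parts[1]),
        "autoextend": "autoextend" in options,
        "max_bytes": max_bytes,
    }


def exceeds_configured_max(spec, actual_bytes):
    """True if the file on disk is larger than the configured max."""
    max_bytes = parse_data_file_spec(spec)["max_bytes"]
    return max_bytes is not None and actual_bytes > max_bytes


node3_spec = '"ibdata1:10M:autoextend:max:30M"'
print(parse_data_file_spec(node3_spec))
print(exceeds_configured_max(node3_spec, 31 * 1024 ** 2))
```

On a real node the actual size would come from something like os.path.getsize on the ibdata1 file in the datadir.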

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2013-11-12:

@Daniel,

Regarding your first solution: yes, we are looking at handling ENOSPC more gracefully on the Percona Server side (which should apply to PXC as well). I will link to the appropriate blueprint/bug later on.

Daniël van Eeden (dveeden) wrote on 2013-11-12:

@Raghavendra,

That's great. But keep in mind that there will not be an ENOSPC for a datafile with a max setting, as there will be space on the filesystem but not in the datafile.

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2013-11-13:

@Daniel,

Yes, I noticed that, but InnoDB uses DB_OUT_OF_FILE_SPACE for both cases (though this needs to be checked more closely). There is also DB_MUST_GET_MORE_FILE_SPACE, which is used alongside it to indicate another condition.

Nilnandan Joshi (nilnandan-joshi) wrote on 2014-08-13:

Verified with PXC 5.5.37 using the test case above.

140813 12:01:16 [ERROR] /usr/sbin/mysqld: The table 'nil' is full
140813 12:01:16 [ERROR] Slave SQL: Could not execute Write_rows event on table test.nil; The table 'nil' is full, Error_code: 1114; handler error HA_ERR_RECORD_FILE_FULL; the event's master log FIRST, end_log_pos 1079, Error_code: 1114
140813 12:01:16 [Warning] WSREP: RBR event 2 Write_rows apply warning: 135, 23
140813 12:01:16 [ERROR] WSREP: Failed to apply trx: source: a184f43e-22b2-11e4-b9dc-43c26ee89db5 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 6 trx_id: 1300 seqnos (l: 25, g: 23, s: 22, d: 22, ts: 1407911473082143065)
140813 12:01:16 [ERROR] WSREP: Failed to apply trx 23 10 times
140813 12:01:16 [ERROR] WSREP: Node consistency compromized, aborting...
140813 12:01:16 [Note] WSREP: Closing send monitor...
140813 12:01:16 [Note] WSREP: Closed send monitor.
140813 12:01:16 [Note] WSREP: gcomm: terminating thread

Nilnandan Joshi (nilnandan-joshi) wrote on 2014-08-13:

Tried to verify with PXC 5.6.19. I didn't get any error on any of the servers, but node 3 silently stopped writing when ibdata1 reached its configured maximum size. I also found data inconsistency.

On Master Node1/Node2:

mysql> select count(*) from nil;
+----------+
| count(*) |
+----------+
| 16777216 |
+----------+
1 row in set (16.35 sec)

On Node3:

mysql> select count(*) from nil;
+----------+
| count(*) |
+----------+
| 8388608 |
+----------+
1 row in set (18.83 sec)

mysql>
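The two counts above are consecutive powers of two. Assuming the same doubling statement as in the original report was used (insert into ... select ... from the same table, which doubles the row count on each run), this would mean node 3 is missing exactly the last doubling. A quick check of the arithmetic, with a hypothetical helper name:

```python
# Row counts reported above.
node12_rows = 16777216
node3_rows = 8388608


def doublings(rows):
    """Return k if rows == 2**k, otherwise None."""
    if rows > 0 and rows & (rows - 1) == 0:
        return rows.bit_length() - 1
    return None


print(doublings(node12_rows), doublings(node3_rows))
print(node12_rows - node3_rows)  # rows missing on node 3
```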

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-924
