Handle full tables more gracefully

Bug #1250380 reported by Daniël van Eeden
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC), status tracked in 5.6

Series  Status      Importance  Assigned to
5.5     Confirmed   Undecided   Unassigned
5.6     Incomplete  Critical    Unassigned

Bug Description

Set up a 3-node PXC cluster.

Config node 1 and node 2:
[mysqld]
datadir=/var/lib/mysql
innodb_data_file_path = "ibdata1:10M:autoextend"
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_sst_method="xtrabackup"
wsrep_sst_auth="root:"
wsrep_cluster_address="gcomm://galera1,galera2,galera3"
wsrep_cluster_name=mycluster1

The config on node 3 has one setting changed:
innodb_data_file_path = "ibdata1:10M:autoextend:max:30M"

Now start inserting data on node 1 of the cluster.

When the ibdata1 file on node 3 fills up, the following is printed in the error log:
131112 10:54:28 [ERROR] /usr/sbin/mysqld: The table 'test1' is full
131112 10:54:29 [ERROR] Slave SQL: Error 'The table 'test1' is full' on query. Default database: 'test1'. Query: 'insert into test1(name) select name from test1', Error_code: 1114
131112 10:54:29 [Warning] WSREP: RBR event 2 Query apply warning: 1, 200
131112 10:54:29 [ERROR] WSREP: Failed to apply trx: source: 68577052-4b75-11e3-9ab5-a3d2291aca1d version: 2 local: 0 state: APPLYING flags: 129 conn_id: 57 trx_id: 1329 seqnos (l: 27, g: 200, s: 199, d: 199, ts: 1384250065993742598)
131112 10:54:29 [ERROR] WSREP: Failed to apply app buffer: seqno: 200, status: WSREP_FATAL
         at galera/src/replicator_smm.cpp:apply_wscoll():52
         at galera/src/replicator_smm.cpp:apply_trx_ws():118
131112 10:54:29 [ERROR] WSREP: Node consistency compromized, aborting...
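
The "table is full" condition shows up in the error log before the node aborts, so a log watcher could flag it. The following is a minimal sketch, not part of PXC or Galera: the function name `full_tables` and the regular expression are assumptions based only on the log lines quoted in this report.

```python
import re

# Assumed pattern, derived from the [ERROR] lines quoted in this report.
TABLE_FULL = re.compile(r"\[ERROR\].*The table '(?P<table>[^']+)' is full")


def full_tables(log_lines):
    """Return the distinct table names reported as full, in order of first appearance."""
    seen = []
    for line in log_lines:
        match = TABLE_FULL.search(line)
        if match and match.group("table") not in seen:
            seen.append(match.group("table"))
    return seen


if __name__ == "__main__":
    sample = [
        "131112 10:54:28 [ERROR] /usr/sbin/mysqld: The table 'test1' is full",
        "131112 10:54:29 [Warning] WSREP: RBR event 2 Query apply warning: 1, 200",
    ]
    print(full_tables(sample))
```

This only detects the error after the fact; it does not prevent the applier failure described above.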

PXC Version: 5.5.34-23.7.6-565.precise

One possible solution: if a table is (almost) full, stop certification, which will prevent clients from inserting more data. The cluster will keep functioning as a read-only setup.

The second option: just kick the node out of the cluster (or maybe just put it in desync), to make it easier for the admin to connect to the server and fix the issue.

While fixing this issue, Galera used SST on this node instead of IST, even though IST should have been possible.

Please also note that XtraBackup for SST also compares the actual file size of ibdata1 with the maximum in the config file. The two may not match, which will cause SST to fail.
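
To illustrate the size comparison mentioned above, here is a minimal sketch that parses an `innodb_data_file_path` value and reports how far a data file can still grow before it reaches its `max` cap. The helper names (`parse_size`, `parse_data_file_path`, `growth_headroom`) are hypothetical, and this is not how XtraBackup or InnoDB performs the check. It assumes sizes always carry a K, M or G suffix, as in the configs shown in this report.

```python
import re

_UNIT_BYTES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '10M' into bytes (K, M or G suffix required)."""
    match = re.fullmatch(r"(\d+)([KMG])", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNIT_BYTES[match.group(2)]


def parse_data_file_path(value):
    """Parse 'name:size[:autoextend[:max:size]]' specs separated by ';'."""
    files = []
    for spec in value.strip().strip('"').split(";"):
        parts = [p.strip() for p in spec.split(":")]
        if len(parts) < 2:
            raise ValueError(f"malformed data file spec: {spec!r}")
        entry = {
            "name": parts[0],
            "initial": parse_size(parts[1]),
            "autoextend": False,
            "max": None,  # None means no cap was configured
        }
        if len(parts) >= 3 and parts[2].lower() == "autoextend":
            entry["autoextend"] = True
            if len(parts) >= 5 and parts[3].lower() == "max":
                entry["max"] = parse_size(parts[4])
        files.append(entry)
    return files


def growth_headroom(entry, actual_bytes):
    """Bytes the file may still grow; None if uncapped, negative if already over the cap."""
    if entry["max"] is None:
        return None
    return entry["max"] - actual_bytes
```

On a real node, `actual_bytes` could come from `os.path.getsize()` on the ibdata1 file in the datadir. A negative result would correspond to the mismatch described above, where the file on disk is larger than the configured maximum.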

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Daniel,

Regarding your first solution: yes, we are looking at handling ENOSPC more gracefully on the Percona Server side (which should apply to PXC as well). I will link to the appropriate blueprint/bug later on.

Daniël van Eeden (dveeden) wrote :

@Raghavendra,

That's great. But keep in mind that there will not be an ENOSPC for a datafile with a max setting, as there will be space on the filesystem but not in the datafile.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Daniel,

Yes, I noticed that, but DB_OUT_OF_FILE_SPACE is used for both cases in InnoDB (though this needs to be checked more deeply). There is also DB_MUST_GET_MORE_FILE_SPACE, which is used alongside it to indicate another condition.

Nilnandan Joshi (nilnandan-joshi) wrote :

Verified with PXC 5.5.37 using the above test case.

140813 12:01:16 [ERROR] /usr/sbin/mysqld: The table 'nil' is full
140813 12:01:16 [ERROR] Slave SQL: Could not execute Write_rows event on table test.nil; The table 'nil' is full, Error_code: 1114; handler error HA_ERR_RECORD_FILE_FULL; the event's master log FIRST, end_log_pos 1079, Error_code: 1114
140813 12:01:16 [Warning] WSREP: RBR event 2 Write_rows apply warning: 135, 23
140813 12:01:16 [ERROR] WSREP: Failed to apply trx: source: a184f43e-22b2-11e4-b9dc-43c26ee89db5 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 6 trx_id: 1300 seqnos (l: 25, g: 23, s: 22, d: 22, ts: 1407911473082143065)
140813 12:01:16 [ERROR] WSREP: Failed to apply trx 23 10 times
140813 12:01:16 [ERROR] WSREP: Node consistency compromized, aborting...
140813 12:01:16 [Note] WSREP: Closing send monitor...
140813 12:01:16 [Note] WSREP: Closed send monitor.
140813 12:01:16 [Note] WSREP: gcomm: terminating thread

Nilnandan Joshi (nilnandan-joshi) wrote :

Tried to verify with PXC 5.6.19. I didn't get an error on any of the servers, but the node silently stopped writing when ibdata1 reached its threshold size. I also found data inconsistency.

On Master Node1/Node2:

mysql> select count(*) from nil;
+----------+
| count(*) |
+----------+
| 16777216 |
+----------+
1 row in set (16.35 sec)

On Node3:

mysql> select count(*) from nil;
+----------+
| count(*) |
+----------+
| 8388608 |
+----------+
1 row in set (18.83 sec)

mysql>
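
The row counts above differ by exactly a factor of two (16777216 is 2^24, 8388608 is 2^23), which would be consistent with one doubling insert not having been applied on node 3, though the report does not confirm this. A check like the one done by hand here could be sketched as follows. The function name `divergent_nodes` is hypothetical, and the counts are assumed to have been collected already, for example by running `select count(*)` on each node.

```python
from collections import Counter


def divergent_nodes(counts):
    """Return the nodes whose row count differs from the most common count."""
    majority, _ = Counter(counts.values()).most_common(1)[0]
    return sorted(node for node, count in counts.items() if count != majority)


if __name__ == "__main__":
    # Values taken from the output quoted in this comment.
    observed = {"node1": 16777216, "node2": 16777216, "node3": 8388608}
    print(divergent_nodes(observed))
```

A row count is only a coarse signal; matching counts do not prove that the nodes hold identical data.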

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-924
