Whole cluster freezes if one node goes full

Bug #1525300 reported by tobixen
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Won't Fix
Undecided
Unassigned

Bug Description

$ mysqld --version
mysqld Ver 5.6.26-74.0-56 for debian-linux-gnu on x86_64 (Percona XtraDB Cluster (GPL), Release rel74.0, Revision 624ef81, WSREP version 25.12, wsrep_25.12)

Three-node setup.

I trimmed down the disk space so the disk was almost full at one node, ran some big insert jobs at another node (one query insert into foo select * from bar ...). The node with the full disk did eventually die, and the cluster could live on with only two nodes active. This is good.

Then I turned on log_slave_updates on the node with full disk, restarted percona and repeated the experiment, but now with many small inserts to make sure the binlog would grow. This caused the whole cluster to freeze ...

(error log on node with full disk)
2015-12-11 15:32:05 23823 [Warning] Disk is full writing '/var/lib/mysql/bin.000003' (Errcode: 28 - No space left on device). Waiting fo
r someone to free space...
2015-12-11 15:32:05 23823 [Warning] Retry in 60 secs. Message reprinted in 600 secs

On the other nodes, all write queries got stuck in state "wsrep in pre-commit stage". (I didn't check read queries though).

Please advice if you want more information.

Revision history for this message
Marc Castrovinci (marc-castrovinci) wrote :

This seems like an expected result. If certification doesn't pass then why would the cluster apply the change?

Revision history for this message
Jericho Rivera (jericho-rivera) wrote :
Download full text (3.5 KiB)

Can confirm on latest PXC 5.6
[root@pxc4 ~]# rpm -qa|grep Percona-XtraDB
Percona-XtraDB-Cluster-client-56-5.6.32-25.17.1.el6.x86_64
Percona-XtraDB-Cluster-shared-56-5.6.32-25.17.1.el6.x86_64
Percona-XtraDB-Cluster-56-5.6.32-25.17.1.el6.x86_64
Percona-XtraDB-Cluster-galera-3-3.17-1.rhel6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.32-25.17.1.el6.x86_64

-- Fri Oct 21 01:19:35 EDT 2016
Filesystem 1K-blocks Used Available Use% Mounted on
/data/perc 370416932 114732276 255684656 31% /
none 492 0 492 0% /dev
/dev/loop0 1479088 1385904 0 100% /data <-- datadir on pxc4
tmpfs 8075084 0 8075084 0% /dev/shm

sysbench on pxc1 (see writes) go to 0.00
[ 784s] threads: 2, tps: 0.00, reads: 0.00, writes: 632.05, response time: 6.41ms (95%), errors: 0.00, reconnects: 0.00
[ 786s] threads: 2, tps: 0.00, reads: 0.00, writes: 1063.51, response time: 4.45ms (95%), errors: 0.00, reconnects: 0.00
[ 788s] threads: 2, tps: 0.00, reads: 0.00, writes: 1340.50, response time: 2.69ms (95%), errors: 0.00, reconnects: 0.00
[ 790s] threads: 2, tps: 0.00, reads: 0.00, writes: 648.49, response time: 4.61ms (95%), errors: 0.00, reconnects: 0.00
[ 792s] threads: 2, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[ 794s] threads: 2, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[ 796s] threads: 2, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[ 798s] threads: 2, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00

myq_status on pxc1
01:19:32 P 5 3 Sync 864µs 0 0 0 1405 0.0 0.7M 0 0 235k 122 90 1 1 0
01:19:33 P 5 3 Sync 914µs 0 0 0 1453 0.0 0.7M 0 0 235k 144 90 1 1 0
01:19:34 P 5 3 Sync 923µs 0 0 0 442 0.0 0.2M 0 0 235k 586 91 1 1 412
01:19:35 P 5 3 Sync 923µs 0 0 0 0 0.0 0.0 0 0 235k 586 91 1 1 1070
01:19:36 P 5 3 Sync 923µs 0 0 0 0 0.0 0.0 0 0 235k 586 91 1 1 1118
percona_cluster / pxc1 (idx: 2) / Galera 3.17(r447d194)
Wsrep Cluster Node Repl Queue Ops Bytes Conflct Gcache Window Flow
    time P cnf # Stat Laten Up Dn Up Dn Up Dn lcf bfa ist idx dst appl comm p_ms
01:19:38 P 5 3 Sync 923µs 0 0 0 0 0.0 0.0 0 0 235k 586 91 1 1 1081
01:19:39 P 5 3 Sync 923µs 0 0 0 0 0.0 0.0 0 0 235k 586 91 1 1 1046
01:19:40 P 5 3 Sync 923µs 0 0 0 0 0.0 0.0 0 0 235k 586 91 1 1 1043

myq_status on pxc4 node that had disk filled, myq_status stopped updating at 01:19:32
01:19:27 P 5 3 Sync 947µs 0 0 0 652 0.0 0.3M 0 0 469k 94 90 1 1 0
01:19:28 P 5 3 Sync 952µs 0 0 0 795 0.0 0.4M 0 0 469k 107 90 1 1 0
01:19:29 P 5 3 Sync 960µs 0 0 0 671 0.0 0.3M 0 0 469k 128 90 1 1 0
01:19:30 P 5 3 Sync 953µs 0 2 0 1232 0....

Read more...

Changed in percona-xtradb-cluster:
status: New → Confirmed
Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

From log it seems like node is waiting to write bin-log event but due to disk full couldn't proceed. Node is unfortunately not killed so from cluster perspective it is still an active entity that is not able to process things.

I also see flow control is triggered causing other nodes to stop till the node is recovered.

Unfortunately the node is not in recoverable state.

----------------

As pointed in previous comment, with 5.7 MySQL ensure that server goes down for fatal error like this which is correct fix for this issue.

Changed in percona-xtradb-cluster:
status: Confirmed → Invalid
status: Invalid → Won't Fix
Revision history for this message
markus_albe (markus-albe) wrote :

Krunal,

Would be possible to back-port the behavior to 5.6? It's affecting internal customer (i209101) and seems a sensible choice to evict a node that is stalled in a way that is known to [likely] require human intervention to resolve.

Thanks for checking!

Revision history for this message
Przemek (pmalkowski) wrote :
Download full text (7.2 KiB)

I think it is important to distinguish two situations:

1 - main MySQL datadir partition gets full, so that InnoDB table spaces cannot grow
2 - binary logs are on separate partition and only this partition gets full.

In the first case, indeed 5.7 node aborts:

2017-10-19T13:25:50.805645Z 1 [ERROR] InnoDB: posix_fallocate(): Failed to preallocate data for file ./sbtest/sbtest2.ibd, desired size 1048576 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/ operating-system-error-codes.html
2017-10-19T13:25:50.810999Z 1 [Warning] InnoDB: 1048576 bytes should have been written. Only 933888 bytes written. Retrying for the remaining bytes.
2017-10-19T13:25:50.811230Z 1 [Warning] InnoDB: Retry attempts for writing partial data failed.
2017-10-19T13:25:50.811689Z 1 [ERROR] InnoDB: Write to file ./sbtest/sbtest2.ibdfailed at offset 9437184, 1048576 bytes should have been written, only 933888 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
2017-10-19T13:25:50.811796Z 1 [ERROR] InnoDB: Error number 28 means 'No space left on device'
2017-10-19T13:25:50.811865Z 1 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2017-10-19T13:25:50.812077Z 1 [Warning] InnoDB: Error while writing 1048576 zeroes to ./sbtest/sbtest2.ibd starting at offset 9437184
2017-10-19T13:25:50.819133Z 1 [ERROR] /usr/sbin/mysqld: The table 'sbtest2' is full
2017-10-19T13:25:50.819276Z 1 [ERROR] Slave SQL: Could not execute Write_rows event on table sbtest.sbtest2; The table 'sbtest2' is full, Error_code: 1114; handler error HA_ERR_RECORD_FILE_FULL; the event's master log FIRST, end_log_pos 1338042, Error_code: 1114
2017-10-19T13:25:50.819370Z 1 [Warning] WSREP: RBR event 166 Write_rows apply warning: 135, 176703
2017-10-19T13:25:51.178155Z 1 [Warning] WSREP: Failed to apply app buffer: seqno: 176703, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 2th time
2017-10-19T13:25:51.470102Z 1 [ERROR] InnoDB: posix_fallocate(): Failed to preallocate data for file ./sbtest/sbtest2.ibd, desired size 114688 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/ operating-system-error-codes.html
2017-10-19T13:25:51.470750Z 1 [Warning] InnoDB: Retry attempts for writing partial data failed.
2017-10-19/usr/bin/mysqld_safe: line 218: 10469 Aborted (...)

In a similar test on 5.6.37, when peer node main datadir partition is full (binlogs disabled), the writer node gets stuck on write transaction forever:
*************************** 4. row ***************************
           Id: 15
         User: root
         Host: localhost
           db: test
      Command: Query
         Time: 4...

Read more...

Revision history for this message
Przemek (pmalkowski) wrote :
Download full text (3.9 KiB)

Apologies, please disregard part of my previous comment about 5.6 behavior different in first scenario. I just tested on a wrong, very big table, so just the INSERT..SELECT was too long.

Correct test ended up with similar abort on 5.6.37:

2017-10-19 19:13:51 7fc63404f700 InnoDB: Error: Write to file ./test/j2.ibd failed at offset 19922944.
InnoDB: 1048576 bytes should have been written, only 0 were written.
InnoDB: Operating system error number 28.
InnoDB: Check that your OS and file system support files of this size.
InnoDB: Check also that the disk is not full or a disk quota exceeded.
InnoDB: Error number 28 means 'No space left on device'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/operating-system-error-codes.html
2017-10-19 19:13:51 28725 [ERROR] /usr/sbin/mysqld: The table 'j2' is full
2017-10-19 19:13:51 28725 [ERROR] Slave SQL: Could not execute Write_rows event on table test.j2; The table 'j2' is full, Error_code: 1114; handler error HA_ERR_RECORD_FILE_FULL; the event's master log FIRST, end_log_pos 8156999, Error_code: 1114
2017-10-19 19:13:51 28725 [Warning] WSREP: RBR event 1001 Write_rows apply warning: 135, 698196
2017-10-19 19:13:53 28725 [Note] WSREP: 0.0 (percona2): State transfer to 2.0 (percona1) complete.
2017-10-19 19:13:53 28725 [Note] WSREP: Member 0.0 (percona2) synced with group.
2017-10-19 19:13:58 28725 [Warning] WSREP: Failed to apply app buffer: seqno: 698196, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 2th time
2017-10-19 19:14:03 28725 [ERROR] /usr/sbin/mysqld: The table 'j2' is full
2017-10-19 19:14:03 28725 [ERROR] Slave SQL: Could not execute Write_rows event on table test.j2; The table 'j2' is full, Error_code: 1114; handler error HA_ERR_RECORD_FILE_FULL; the event's master log FIRST, end_log_pos 8156999, Error_code: 1114
2017-10-19 19:14:03 28725 [Warning] WSREP: RBR event 1001 Write_rows apply warning: 135, 698196
2017-10-19 19:14:08 28725 [Warning] WSREP: Failed to apply app buffer: seqno: 698196, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 3th time
2017-10-19 19:14:13 28725 [ERROR] /usr/sbin/mysqld: The table 'j2' is full
2017-10-19 19:14:13 28725 [ERROR] Slave SQL: Could not execute Write_rows event on table test.j2; The table 'j2' is full, Error_code: 1114; handler error HA_ERR_RECORD_FILE_FULL; the event's master log FIRST, end_log_pos 8156999, Error_code: 1114
2017-10-19 19:14:13 28725 [Warning] WSREP: RBR event 1001 Write_rows apply warning: 135, 698196
2017-10-19 19:14:20 28725 [Warning] WSREP: Failed to apply app buffer: seqno: 698196, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 4th time
2017-10-19 19:14:26 28725 [ERROR] /usr/sbin/mysqld: The table 'j2' is full
2017-10-19 19:14:26 28725 [ERROR] Slave SQL: Could not execute Write_rows event on table test.j2; The table 'j2' is full, Error_code: 1114; handler error HA_ERR_RECORD_FILE_FULL; the event's master log FIRST, end_log_pos 8156999, Error_code: 1114
2017-10-19 19:14:26 28725 [Warning] WSREP: RBR event 1001 Write_rows apply warning: 135, 698196
2017-10-19 19:14:32 28725 [ERROR] WSREP: Failed to apply trx...

Read more...

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1871

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.