disk full log-bin node causes cluster hang
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MySQL patches by Codership | New | Undecided | Unassigned |
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | Status tracked in 5.6 | | |
5.5 | Invalid | Undecided | Unassigned |
5.6 | Invalid | Undecided | Unassigned |
Bug Description
Without log-bin enabled, a node that fills its disk generates RBR apply errors:
140922 15:08:09 [ERROR] Slave SQL: Could not execute Write_rows event on table sbtest.sbtest1; The table 'sbtest1' is full, Error_code: 1114; handler error HA_ERR_
140922 15:08:09 [Warning] WSREP: RBR event 2 Write_rows apply warning: 135, 40012
The node then aborts and the cluster continues. However, I question whether this is good behavior when multiple nodes fill their disks simultaneously.
If instead that node has log-bin enabled, it does not abort on the RBR apply error. It hangs with this message:
140922 15:00:18 [Warning] Disk is full writing './mysql-
140922 15:00:18 [Warning] Retry in 60 secs. Message reprinted in 600 secs
In this case the node just blocks on writes and keeps flow control engaged indefinitely, which results in a total cluster hang.
These are both MySQL behaviors, I think. The question is whether either behavior is correct and whether we can or should do anything to modify it.
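As a rough way to tell the two cases apart from a node's error log, here is a minimal Python sketch. The regular expressions are taken only from the log excerpts quoted above and may not match the wording in other versions; the function name `classify_disk_full` and its return values are hypothetical, not part of any MySQL or Galera tooling.

```python
import re

# Patterns derived from the log excerpts in this report (assumption: other
# server versions may word these messages differently).
APPLY_ERROR = re.compile(
    r"\[ERROR\] Slave SQL: Could not execute \w+ event .*Error_code: 1114"
)
BINLOG_DISK_FULL = re.compile(r"\[Warning\] Disk is full writing")


def classify_disk_full(lines):
    """Classify error-log lines into the two disk-full cases described above.

    Returns "hang" if the binlog disk-full warning is present (the node is
    expected to block and hold flow control), "abort" if only the table-full
    apply error is present (the node is expected to abort), or None if
    neither message is found.
    """
    result = None
    for line in lines:
        if BINLOG_DISK_FULL.search(line):
            # The hanging case is the one that stalls the cluster, so
            # report it as soon as it is seen.
            return "hang"
        if APPLY_ERROR.search(line):
            result = "abort"
    return result
```

This could be pointed at the tail of the error log from a monitoring script, so that the "hang" case raises an alert before the rest of the cluster stalls on flow control.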
I tested this on 5.5, but I think it applies to 5.6 as well.
I tested 5.6 and couldn't reproduce this. I'm guessing some behavior changed in 5.6 that causes the node to abort instead of hang. This may be a "won't fix", but at least the situation is documented now.