Comment 4 for bug 1019473

Josiah Poirier (josiahp) wrote :

I am also experiencing this error. It happened yesterday and again today. I have a 3-node cluster running on Amazon EC2 m2.xlarge instances. Each node is in a different Availability Zone, but they are all in the same region.

Each time this happens I'm left with one node that still has a bunch of mysql threads running but won't let me log in with the mysql client. It just hangs after I enter the password, and I have to close the terminal and ssh to the box again.

The only way I've been able to recover the cluster is to kill all the mysql processes, enable split brain mode, and start one node. I then start the second node, which does a full re-sync, and finally the third node, which also does a full re-sync.

There is one PHP process that inserts records into the table frequently, and another process that runs every 20 seconds. The second process can hit any of the three nodes in the cluster, and could possibly hit more than one node at the same time. The queries look like this:

$time = time();
$query1 = "SELECT column1 FROM some_table_one WHERE column2 <=".$time;
...php while loop...
$query2 = "DELETE FROM some_table_one WHERE column2 <=".$time;

This code is from third-party software, but it is open source...

I have pulled the error logs from the SQL servers and can pull the variables and table create info. If you need them, let me know where to send them.