Comment 3 for bug 985747

Laurent Minost (lolomin) wrote :

Hi,

I'm hitting this bug very frequently when restarting our nodes: not every time, but almost 80% of the time. The most problematic behaviour is that both nodes end up stalled, each seemingly waiting for the other to finish the IST, so both nodes become unavailable and direct human intervention is needed to recover. As new information: on the node that should receive the IST, I have to kill -9 mysql, otherwise it cannot stop properly by itself.

node1 (receiver/recovering node) :
...
Version: '5.5.20-log' socket: '/opt/mysql-galera/data/mysql.sock' port: 3306 Source distribution, wsrep_23.4.rXXXX
120426 9:39:06 [Note] WSREP: SST received: 61408137-81c8-11e1-0800-5dbd477990e1:8373272
120426 9:39:06 [Note] WSREP: Receiving IST: 114 writesets, seqnos 8373272-8373386
...

node2 (donor) :
...
120426 9:39:05 [Note] WSREP: IST request: 61408137-81c8-11e1-0800-5dbd477990e1:8373272-8373386|tcp://192.168.0.10:4568
120426 9:39:05 [Note] WSREP: Running: 'wsrep_sst_rsync 'donor' '192.168.0.10:4444/rsync_sst' 'sst:5T13wPid' '/opt/mysql-galera/data/' '/etc/my-galera.cnf' '61408137-81c8-11e1-0800-5dbd477990e1' '8373272' '1''
120426 9:39:05 [Note] WSREP: sst_donor_thread signaled with 0
120426 9:39:05 [Note] WSREP: async IST sender starting to serve tcp://192.168.0.10:4568 sending 8373273-8373386
...

As I'm able to reproduce the problem "easily" for the moment, if you need any other information or debug traces, don't hesitate to ask.