MySQL patches by Codership

rsync SST script returns confusing error code and little diagnostic on rsync protocol mismatch

Reported by Yuri Golovko on 2012-01-18
This bug affects 1 person
Affects                      Importance  Assigned to
MySQL patches by Codership   Medium      Alex Yurchenko
5.1                          Medium      Alex Yurchenko
5.5                          Medium      Alex Yurchenko
Percona XtraDB Cluster       Undecided   Unassigned

Bug Description

I'm having a problem similar to this one: https://bugs.launchpad.net/codership-mysql/+bug/797396

Environment: Percona XtraDB Cluster, installed from the Percona repo on RHEL 5, with the wsrep config taken from http://www.percona.com/doc/percona-xtradb-cluster/3nodesec2.html

When I try to join the second node (node01) to the cluster, I see this error on that node:

120118 9:07:32 [Note] WSREP: Prepared IST receiver, listening at: tcp://192.168.100.220:4568
120118 9:07:32 [Note] WSREP: Node 1 (node01) requested state transfer from '*any*'. Selected 0 (node00)(SYNCED) as donor.
120118 9:07:32 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
120118 9:07:32 [Note] WSREP: Requesting state transfer: success, donor: 0
120118 9:07:33 [ERROR] WSREP: Failed to parse uuid:seqno pair: 'rsync process ended without creating '/var/lib/mysql//rsync_sst_complete''
120118 9:07:33 [ERROR] WSREP: SST failed: 22 (Invalid argument)
120118 9:07:33 [ERROR] Aborting

And I see the following error on the first node (node00) when the second one tries to join:

120118 8:10:50 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync 'donor' '192.168.100.220:4444/rsync_sst' '(null)' '/var/lib/mysql/' '/etc/my.cnf' 'd75c72ca-41c4-11e1-0800-20f251b58169' '0' '0' 2>sst.err

120118 8:10:50 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync 'donor' '192.168.100.220:4444/rsync_sst' '(null)' '/var/lib/mysql/' '/etc/my.cnf' 'd75c72ca-41c4-11e1-0800-20f251b58169' '0' '0' 2>sst.err: 12 (Cannot allocate memory)

Both nodes have enough memory and a basic wsrep configuration. Please help me figure out whether this is a bug, a misconfiguration, or something else.

Vadim Tkachenko (vadim-tk) wrote :

Before anything else: did you disable SELinux (echo 0 > /selinux/enforce)
and stop the firewall (service iptables stop)?

Yuri Golovko (yuris) wrote :

Sure. SELinux is disabled, the firewall is off, and all servers are on the same LAN subnet with no network issues.

Vadim Tkachenko (vadim-tk) wrote :

Then please show the contents of the sst.err files on both nodes.

Yuri Golovko (yuris) wrote :

At node00:

$ cat /var/lib/mysql/sst.err
rsync: read error: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(614) [sender=2.6.8]

At node01, this file apparently has not been created.

Teemu Ollakka (teemu-ollakka) wrote :

Please check that both nodes have the same rsync version. The wsrep_sst_rsync script may be passing through the exit code of the rsync process, 12, which means 'Error in rsync protocol data stream'. As far as I remember, this can be caused by incompatible rsync versions.
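
The point above can be sketched as a small lookup: the donor log printed "12 (Cannot allocate memory)", which is the text Linux gives for errno 12 (ENOMEM), while the same number as an rsync exit status means something quite different. The function below is a hypothetical helper, not part of wsrep_sst_rsync; it covers only a handful of the exit codes listed in the rsync man page.

```shell
#!/bin/sh
# Hypothetical helper (not part of wsrep_sst_rsync): translate an rsync exit
# status into the description given in the rsync man page.
# Only a subset of the documented codes is listed here.
rsync_exit_meaning() {
    case "$1" in
        0)  echo "Success" ;;
        1)  echo "Syntax or usage error" ;;
        2)  echo "Protocol incompatibility" ;;
        10) echo "Error in socket I/O" ;;
        11) echo "Error in file I/O" ;;
        12) echo "Error in rsync protocol data stream" ;;
        23) echo "Partial transfer due to error" ;;
        30) echo "Timeout in data send/receive" ;;
        *)  echo "Unrecognized rsync exit status: $1" ;;
    esac
}

# The donor log in this report showed status 12.
rsync_exit_meaning 12
```

Read this way, the status on node00 agrees with the "(code 12)" line in its sst.err rather than indicating a memory shortage.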

Yuri Golovko (yuris) wrote :

Thank you very much :)

It was indeed an rsync interoperability issue, caused by different rsync versions on the two nodes.
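
For anyone hitting the same symptom, a version check along these lines may save time. This is a minimal sketch with hypothetical helper names; it assumes the first line of `rsync --version` has the usual "rsync  version X  protocol version N" shape and that the error line carries a "[sender=X]" tag like the one in sst.err above. The sample `--version` line is illustrative only, since the thread does not record what either node actually printed.

```shell
#!/bin/sh
# Hypothetical helper: extract the version from the first line of
# `rsync --version`, e.g. "rsync  version 2.6.8  protocol version 29".
rsync_version_of() {
    printf '%s\n' "$1" | awk '{ print $3; exit }'
}

# Hypothetical helper: extract the version from an rsync error line that
# ends in a tag such as "[sender=2.6.8]".
rsync_version_from_error() {
    printf '%s\n' "$1" | sed -n 's/.*\[[a-z]*=\([^]]*\)].*/\1/p'
}

# Report whether two version strings are identical.
same_rsync_version() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch: $1 vs $2"
    fi
}

# Error line taken from the sst.err shown in this report.
err_line='rsync error: error in rsync protocol data stream (code 12) at io.c(614) [sender=2.6.8]'
donor_version=$(rsync_version_from_error "$err_line")

# Illustrative input only: on a real node this would come from
# `rsync --version | head -n 1`.
sample_line='rsync  version 2.6.8  protocol version 29'
other_version=$(rsync_version_of "$sample_line")

same_rsync_version "$donor_version" "$other_version"
```

In practice the second value would be gathered by running `rsync --version` on each node and comparing the results by hand or with a helper like the one above.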

summary: - second node crush when trying to join cluster
+ rsync SST script returns confusing error code and little diagnostic on
+ rsync protocol mismatch
no longer affects: galera
Changed in percona-xtradb-cluster:
status: New → In Progress
Alex Yurchenko (ayurchen) wrote :

Fix released in 5.5.20-23.4.

Alex Yurchenko (ayurchen) wrote :

Fix released in 5.1.62-23.4.

Changed in percona-xtradb-cluster:
status: In Progress → Fix Released