rsync SST script returns confusing error code and little diagnostic on rsync protocol mismatch

Bug #918218 reported by Yuri Golovko
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
MySQL patches by Codership | Fix Released | Medium | Alex Yurchenko | -
5.1 | Fix Released | Medium | Alex Yurchenko | -
5.5 | Fix Released | Medium | Alex Yurchenko | -
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | Fix Released | Undecided | Unassigned | -

Bug Description

I'm having a problem similar to this one: https://bugs.launchpad.net/codership-mysql/+bug/797396

Environment: Percona XtraDB cluster, installed from Percona repo on RHEL 5, wsrep config taken from http://www.percona.com/doc/percona-xtradb-cluster/3nodesec2.html

When I try to join the second node (node01), I see this error on the second node:

120118 9:07:32 [Note] WSREP: Prepared IST receiver, listening at: tcp://192.168.100.220:4568
120118 9:07:32 [Note] WSREP: Node 1 (node01) requested state transfer from '*any*'. Selected 0 (node00)(SYNCED) as donor.
120118 9:07:32 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
120118 9:07:32 [Note] WSREP: Requesting state transfer: success, donor: 0
120118 9:07:33 [ERROR] WSREP: Failed to parse uuid:seqno pair: 'rsync process ended without creating '/var/lib/mysql//rsync_sst_complete''
120118 9:07:33 [ERROR] WSREP: SST failed: 22 (Invalid argument)
120118 9:07:33 [ERROR] Aborting

And I see the following error on the first node (node00) when trying to join the second one:

120118 8:10:50 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync 'donor' '192.168.100.220:4444/rsync_sst' '(null)' '/var/lib/mysql/' '/etc/my.cnf' 'd75c72ca-41c4-11e1-0800-20f251b58169' '0' '0' 2>sst.err

120118 8:10:50 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync 'donor' '192.168.100.220:4444/rsync_sst' '(null)' '/var/lib/mysql/' '/etc/my.cnf' 'd75c72ca-41c4-11e1-0800-20f251b58169' '0' '0' 2>sst.err: 12 (Cannot allocate memory)

Both nodes have enough memory and a basic wsrep configuration. Please help me figure out whether this is a bug, a misconfiguration, or something else.

Vadim Tkachenko (vadim-tk) wrote :

Before anything else:
Did you disable SELinux (echo 0 > /selinux/enforce)
and stop the firewall (service iptables stop)?

Yuri Golovko (yuris) wrote :

Sure. SELinux is disabled, the firewall is off, and all servers are on the same LAN subnet, so there are no network issues.

Vadim Tkachenko (vadim-tk) wrote :

Then please show the contents of the sst.err files on both nodes.

Yuri Golovko (yuris) wrote :

On node00:

$ cat /var/lib/mysql/sst.err
rsync: read error: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(614) [sender=2.6.8]

On node01, this file apparently was not created.
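
The sst.err output above already carries two useful facts: the rsync exit code and the rsync version of the side that reported it. A minimal sketch for pulling both out of such a file follows. The helper name parse_rsync_error is hypothetical, and the pattern only assumes the "rsync error: ... (code N) ... [role=version]" layout seen in the output above.

```shell
# Hypothetical helper: read rsync stderr on stdin and print the exit code,
# the reporting role (sender/receiver) and that side's rsync version.
# Lines that do not look like "rsync error: ... (code N) ... [role=ver]"
# are ignored.
parse_rsync_error() {
    sed -n -E 's/^rsync error: .*\(code ([0-9]+)\).*\[([a-zA-Z]+)=([^]]+)\]$/code=\1 role=\2 version=\3/p'
}

# Usage (path taken from the report above):
#   parse_rsync_error < /var/lib/mysql/sst.err
```

Running it on both nodes, where the file exists, gives a quick way to see which rsync version each side is using.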

Teemu Ollakka (teemu-ollakka) wrote :

Please check that both nodes have the same rsync version. It might be that the wsrep_sst_rsync script returns the exit code 12 of the rsync process, which means 'Error in rsync protocol data stream'. As far as I remember, this can be caused by incompatible rsync versions.
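
This would also explain the misleading "12 (Cannot allocate memory)" in the donor log: if the script's exit status is printed as a system errno, 12 maps to ENOMEM on Linux, even though rsync meant something else by it. That reading is an inference from the log lines, not something the script source confirms here. A small sketch that translates a few rsync exit codes into their meaning as documented in the rsync man page (the function name rsync_exit_meaning is hypothetical, and the list is deliberately partial):

```shell
# Hypothetical helper: map an rsync exit code to its documented meaning.
# Only a subset of the codes from the rsync man page is listed.
rsync_exit_meaning() {
    case "$1" in
        0)  echo "Success" ;;
        1)  echo "Syntax or usage error" ;;
        2)  echo "Protocol incompatibility" ;;
        5)  echo "Error starting client-server protocol" ;;
        10) echo "Error in socket I/O" ;;
        11) echo "Error in file I/O" ;;
        12) echo "Error in rsync protocol data stream" ;;
        23) echo "Partial transfer due to error" ;;
        30) echo "Timeout in data send/receive" ;;
        *)  echo "Unlisted rsync exit code: $1" ;;
    esac
}

# Usage: translate the code reported in the donor log.
#   rsync_exit_meaning 12
```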

Yuri Golovko (yuris) wrote :

Thank you very much :)

It was indeed an rsync interoperability issue caused by different versions.
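
For anyone hitting the same symptom, the version check can be scripted before attempting SST. The sketch below assumes SSH access to hosts named node00 and node01 (the node names from this report) and that the first line of `rsync --version` contains the word "version" followed by the version number. Both helper names are hypothetical. Differing versions do not always mean incompatibility, so treat a mismatch as a hint rather than proof.

```shell
# Hypothetical helper: extract the version number from the first line of
# `rsync --version`, i.e. the token right after the first word "version".
rsync_version_from_banner() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "version") { print $(i + 1); exit } }'
}

# Hypothetical helper: succeed only if the two version strings are identical.
same_rsync_version() {
    [ "$1" = "$2" ]
}

# Usage, assuming SSH access to both nodes:
#   v0=$(ssh node00 rsync --version | head -n 1 | rsync_version_from_banner)
#   v1=$(ssh node01 rsync --version | head -n 1 | rsync_version_from_banner)
#   same_rsync_version "$v0" "$v1" || echo "rsync mismatch: node00=$v0 node01=$v1"
```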

summary: - second node crush when trying to join cluster
+ rsync SST script returns confusing error code and little diagnostic on
+ rsync protocol mismatch
no longer affects: galera
Changed in percona-xtradb-cluster:
status: New → In Progress
Alex Yurchenko (ayurchen) wrote :

fix released in 5.5.20-23.4

Alex Yurchenko (ayurchen) wrote :

fix released in 5.1.62-23.4

Changed in percona-xtradb-cluster:
status: In Progress → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1201
