Percona XtraDB Cluster - HA scalable solution for MySQL

Can't join second node probably due to rsync related issues (handshake interrupted by system call)

Reported by Daniel Guzmán Burgos on 2012-02-13
This bug affects 1 person
Affects: Percona XtraDB Cluster
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

I was following the three-node single-box setup on my lab instance (CentOS 5), as described in
http://www.percona.com/doc/percona-xtradb-cluster/singlebox.html

The SST method is rsync.

First node starts without any problem:

Extract from show status like 'wsrep%':

+----------------------------+--------------------------------------+
| wsrep_cluster_state_uuid   | da848620-53fa-11e1-0800-19d93dacb492 |
| wsrep_cluster_status       | Primary                              |
| wsrep_connected            | ON                                   |
| wsrep_local_index          | 0                                    |
| wsrep_provider_name        | Galera                               |
| wsrep_provider_vendor      | Codership Oy <email address hidden>  |
| wsrep_provider_version     | 2.0beta(rXXXX)                       |
| wsrep_ready                | ON                                   |
+----------------------------+--------------------------------------+

Open ports:

tcp 0 0 0.0.0.0:4000 0.0.0.0:* LISTEN 29437/mysqld
tcp 0 0 0.0.0.0:4010 0.0.0.0:* LISTEN 29437/mysqld

When I try to start the second node to join the cluster, everything seems fine (wsrep_sst_rsync, SST request, etc.), but the end of the log says:

 [Note] WSREP: SST received: da848620-53fa-11e1-0800-19d93dacb492:0
terminate called after throwing an instance of 'gu::Exception'
  what(): interrupted by ctrl: 4 (Interrupted system call)
  at galera/src/ist.cpp:recv_handshake_response():336

The process then aborts.

The primary node then tries to reconnect, unsuccessfully:

120213 10:04:07 [Note] WSREP: (cea2bdc2-5652-11e1-0800-002f2e624285, 'tcp://0.0.0.0:4010') reconnecting to d72a5b08-5652-11e1-0800-a686a66b4bf5 (tcp://10.240.110.31:5010), attempt 480
120213 10:04:37 [Note] WSREP: (cea2bdc2-5652-11e1-0800-002f2e624285, 'tcp://0.0.0.0:4010') reconnecting to d72a5b08-5652-11e1-0800-a686a66b4bf5 (tcp://10.240.110.31:5010), attempt 510

I'm using the binaries for RHEL5.

Complete log:

[root@mysql-labo01 mysql]# bin/mysqld --defaults-file=/etc/my.5000.cnf
120213 9:27:38 [Note] WSREP: wsrep_load(): loading provider library '/usr/local/mysql/lib/libgalera_smm.so'
120213 9:27:38 [Note] WSREP: wsrep_load(): Galera 2.0beta(rXXXX) by Codership Oy <email address hidden> loaded succesfully.
120213 9:27:38 [Note] WSREP: Reusing existing '/data/bench/d2//galera.cache'.
120213 9:27:38 [Note] WSREP: Passing config to GCS: gcache.dir = /data/bench/d2/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/bench/d2//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gmcast.listen_addr = tcp://0.0.0.0:5010; replicator.commit_order = 3
120213 9:27:38 [Note] WSREP: wsrep_sst_grab()
120213 9:27:38 [Note] WSREP: Start replication
120213 9:27:38 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
120213 9:27:38 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
120213 9:27:38 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
120213 9:27:38 [Note] WSREP: protonet asio version 0
120213 9:27:38 [Note] WSREP: backend: asio
120213 9:27:38 [Note] WSREP: GMCast version 0
120213 9:27:38 [Note] WSREP: (e1902662-564e-11e1-0800-620c62a38645, 'tcp://0.0.0.0:5010') listening at tcp://0.0.0.0:5010
120213 9:27:38 [Note] WSREP: (e1902662-564e-11e1-0800-620c62a38645, 'tcp://0.0.0.0:5010') multicast: , ttl: 1
120213 9:27:38 [Note] WSREP: EVS version 0
120213 9:27:38 [Note] WSREP: PC version 0
120213 9:27:38 [Note] WSREP: gcomm: connecting to group 'trimethylxanthine', peer '10.240.110.31:4010'
120213 9:27:39 [Note] WSREP: GMCast::handle_stable_view: view(view_id(PRIM,2594eb66-564c-11e1-0800-19e0d6ca0192,2) memb {
 2594eb66-564c-11e1-0800-19e0d6ca0192,
 e1902662-564e-11e1-0800-620c62a38645,
} joined {
} left {
} partitioned {
})
120213 9:27:39 [Note] WSREP: declaring 2594eb66-564c-11e1-0800-19e0d6ca0192 stable
120213 9:27:39 [Note] WSREP: gcomm: connected
120213 9:27:39 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
120213 9:27:39 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
120213 9:27:39 [Note] WSREP: Opened channel 'trimethylxanthine'
120213 9:27:39 [Note] WSREP: Waiting for SST to complete.
120213 9:27:39 [Note] WSREP: New COMPONENT: primary = yes, my_idx = 1, memb_num = 2
120213 9:27:39 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
120213 9:27:39 [Note] WSREP: STATE EXCHANGE: sent state msg: e1dd4a1e-564e-11e1-0800-867286936c4d
120213 9:27:39 [Note] WSREP: STATE EXCHANGE: got state msg: e1dd4a1e-564e-11e1-0800-867286936c4d from 0 (node4000)
120213 9:27:39 [Note] WSREP: STATE EXCHANGE: got state msg: e1dd4a1e-564e-11e1-0800-867286936c4d from 1 (node5000)
120213 9:27:39 [Note] WSREP: Quorum results:
 version = 2,
 component = PRIMARY,
 conf_id = 1,
 members = 1/2 (joined/total),
 act_id = 0,
 last_appl. = -1,
 protocols = 0/2/1 (gcs/repl/appl),
 group UUID = da848620-53fa-11e1-0800-19d93dacb492
120213 9:27:39 [Note] WSREP: Flow-control interval: [12, 23]
120213 9:27:39 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
120213 9:27:39 [Note] WSREP: New cluster view: global state: da848620-53fa-11e1-0800-19d93dacb492:0, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 1
120213 9:27:39 [Warning] WSREP: Gap in state sequence. Need state transfer.
120213 9:27:41 [Note] WSREP: Running: 'wsrep_sst_rsync 'joiner' '10.240.110.31:5020' 'root:' '/data/bench/d2/' '/etc/my.5000.cnf' '26094' 2>sst.err'
120213 9:27:42 [Note] WSREP: Prepared SST request: rsync|10.240.110.31:5020/rsync_sst
120213 9:27:42 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120213 9:27:42 [Note] WSREP: Assign initial position for certification: 0, protocol version: 1
120213 9:27:42 [Note] WSREP: State transfer required:
 Group state: da848620-53fa-11e1-0800-19d93dacb492:0
 Local state: 00000000-0000-0000-0000-000000000000:-1
120213 9:27:42 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.240.110.31:5011
120213 9:27:42 [Note] WSREP: Node 1 (node5000) requested state transfer from '*any*'. Selected 0 (node4000)(SYNCED) as donor.
120213 9:27:42 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
120213 9:27:42 [Note] WSREP: Requesting state transfer: success, donor: 0
120213 9:27:42 [Note] WSREP: 0 (node4000): State transfer to 1 (node5000) complete.
120213 9:27:42 [Note] WSREP: Member 0 (node4000) synced with group.
120213 9:27:43 [Note] WSREP: SST complete, seqno: 0
120213 9:27:43 [Note] Plugin 'FEDERATED' is disabled.
120213 9:27:43 InnoDB: The InnoDB memory heap is disabled
120213 9:27:43 InnoDB: Mutexes and rw_locks use GCC atomic builtins
120213 9:27:43 InnoDB: Compressed tables use zlib 1.2.3
120213 9:27:43 InnoDB: Using Linux native AIO
120213 9:27:43 InnoDB: Initializing buffer pool, size = 128.0M
120213 9:27:43 InnoDB: Completed initialization of buffer pool
120213 9:27:43 InnoDB: highest supported file format is Barracuda.
120213 9:27:43 InnoDB: Waiting for the background threads to start
120213 9:27:44 Percona XtraDB (http://www.percona.com) 1.1.8-20.1 started; log sequence number 1597945
120213 9:27:44 [Note] Event Scheduler: Loaded 0 events
120213 9:27:44 [Note] WSREP: Signalling provider to continue.
120213 9:27:44 [Note] WSREP: Received SST: da848620-53fa-11e1-0800-19d93dacb492:0
120213 9:27:44 [Note] bin/mysqld: ready for connections.
Version: '5.5.17-22.1' socket: '/tmp/mysql.5000.sock' port: 5000 Percona XtraDB Cluster (GPL), Release 22.1, Revision 3683 wsrep_22.3.r3683
120213 9:27:44 [Note] WSREP: SST received: da848620-53fa-11e1-0800-19d93dacb492:0
terminate called after throwing an instance of 'gu::Exception'
  what(): interrupted by ctrl: 4 (Interrupted system call)
  at galera/src/ist.cpp:recv_handshake_response():336
Abortado
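
For anyone reading a long joiner log like the one above, a minimal sketch for pulling out the "Shifting X -> Y" state transitions may help show how far the node got before it aborted. The function names and the regular expression are illustrative assumptions based only on the line format visible in this log; they are not part of Galera or Percona XtraDB Cluster.

```python
import re

# Assumed line format, taken from the log above, e.g.:
#   120213 9:27:39 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
SHIFT_RE = re.compile(r"WSREP: Shifting ([\w/]+) -> ([\w/]+) \(TO: (-?\d+)\)")


def state_transitions(log_text):
    """Return (from_state, to_state, seqno) tuples in log order."""
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in SHIFT_RE.finditer(log_text)
    ]


def last_state(log_text):
    """Return the last state the node shifted to, or None if no shift was logged."""
    transitions = state_transitions(log_text)
    return transitions[-1][1] if transitions else None


# The three transition lines copied from the log above.
sample = """\
120213 9:27:39 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
120213 9:27:39 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
120213 9:27:42 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
"""

for old, new, seqno in state_transitions(sample):
    print(f"{old} -> {new} (TO: {seqno})")
print("last state:", last_state(sample))
```

In this log the last transition is PRIMARY -> JOINER; no further shift is logged before the gu::Exception is thrown.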

Daniel Guzmán Burgos (nethalo) wrote :

This was due to rsync version problems. It's working now.

Changed in percona-xtradb-cluster:
status: New → Invalid