Setting gmcast.listen_addr manually does not allow the node's own address in the gcomm address list

Reported by Teemu Ollakka on 2013-01-14
This bug affects 2 people
Affects                                            Importance   Assigned to
Galera (status tracked in 3.x)
  2.x                                              High         Teemu Ollakka
  3.x                                              High         Teemu Ollakka
Percona XtraDB Cluster (status tracked in Trunk)
  5.6                                              Undecided    Unassigned
  Trunk                                            Undecided    Unassigned

Bug Description

Specifying gmcast.listen_addr manually in the provider options

wsrep_provider_options="gmcast.listen_addr=tcp://10.10.1.100:4567;gcache.size=20G;gcs.fc_factor=0.99;gcs.fc_limit=256;gcs.fc_master_slave=yes"

and including the node's own IP address in the gcomm address list

wsrep_cluster_address=gcomm://10.10.1.100,10.10.1.101,10.10.1.102

causes gcomm to fail to connect to the cluster with the following message:

130111 13:53:46 [ERROR] WSREP: failed to open gcomm backend connection: 22: connect address points to listen address 'tcp://10.10.1.100:4567', check that cluster address '10.10.1.100:4567' is correct: 22 (Invalid argument)
         at gcomm/src/gmcast.cpp:GMCast():165
130111 13:53:46 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -22 (Invalid argument)

The reason for this is a check in the GMCast constructor that rejects any peer address list containing the node's own listen address. This check should be removed; when the node's own listen address appears in the address list, it should simply be skipped.
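
The proposed behaviour (skip the node's own listen address rather than reject the whole list) can be sketched as follows. This is only an illustration in Python, not the Galera code in gcomm/src/gmcast.cpp. The helper names normalize and peers_without_self are hypothetical. The sketch also assumes that entries without an explicit port use 4567, and it ignores any "?option=value" suffix on the cluster address.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 4567  # assumed default port for entries that omit one


def normalize(addr, default_port=DEFAULT_PORT):
    """Return (host, port) for 'host', 'host:port' or 'tcp://host:port'."""
    if "://" not in addr:
        addr = "tcp://" + addr
    parts = urlsplit(addr)
    return parts.hostname, parts.port or default_port


def peers_without_self(cluster_address, listen_addr):
    """Drop entries that match listen_addr instead of raising an error."""
    own = normalize(listen_addr)
    # Strip the 'gcomm://' scheme, leaving the comma-separated host list.
    hosts = cluster_address.split("://", 1)[-1]
    return [h for h in hosts.split(",") if h and normalize(h) != own]


peers = peers_without_self(
    "gcomm://10.10.1.100,10.10.1.101,10.10.1.102",
    "tcp://10.10.1.100:4567",
)
print(peers)  # ['10.10.1.101', '10.10.1.102']
```

With the single-node reproduction case (gcomm://127.0.0.1 and a listen address of tcp://127.0.0.1:4567) the same helper returns an empty peer list rather than failing.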

This is more likely to be hit when running more than one instance on the same box.


Still reproducible:

sudo /pxc/bin/mysqld --defaults-file=/pxc/etc/my.cnf.local --basedir=/pxc --user=mysql --wsrep_provider_options='gmcast.listen_addr=tcp://127.0.0.1:4567' --wsrep-cluster-address="gcomm://127.0.0.1" --wsrep-start-position='5805bbc4-3038-11e3-937d-aa946afe7370:18490'
131227 22:55:13 [Warning] WSREP: wsrep_sst_receive_address is set to '127.0.0.1:4001' which makes it impossible for another host to reach this one. Please set it to the address which this node can be connected at by other cluster members.
131227 22:55:13 [Note] WSREP: wsrep_start_position var submitted: '5805bbc4-3038-11e3-937d-aa946afe7370:18490'
131227 22:55:13 [Note] WSREP: Read nil XID from storage engines, skipping position init
131227 22:55:13 [Note] WSREP: wsrep_load(): loading provider library '/pxc/lib/libgalera_smm.so'
131227 22:55:13 [Note] WSREP: wsrep_load(): Galera 2.8(rXXXX) by Codership Oy <email address hidden> loaded successfully.
131227 22:55:13 [Note] WSREP: Found saved state: 5805bbc4-3038-11e3-937d-aa946afe7370:-1
131227 22:55:13 [Note] WSREP: Reusing existing '/pxc/datadir//galera.cache'.
131227 22:55:13 [Note] WSREP: Passing config to GCS: base_host = 127.0.0.1; base_port = 4567; cert.log_conflicts = no; gcache.dir = /pxc/datadir/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /pxc/datadir//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://127.0.0.1:4567; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
131227 22:55:13 [Note] WSREP: Assign initial position for certification: 18490, protocol version: -1
131227 22:55:13 [Note] WSREP: wsrep_sst_grab()
131227 22:55:13 [Note] WSREP: Start replication
131227 22:55:13 [Note] WSREP: Setting initial position to 5805bbc4-3038-11e3-937d-aa946afe7370:18490
131227 22:55:13 [Note] WSREP: protonet asio version 0
131227 22:55:13 [Note] WSREP: backend: asio
131227 22:55:13 [Note] WSREP: GMCast version 0
131227 22:55:13 [ERROR] WSREP: failed to open gcomm backend connection: 22: connect address points to listen address 'tcp://127.0.0.1:4567', check that cluster address '127.0.0.1:4567' is correct: 22 (Invalid argument)
         at gcomm/src/gmcast.cpp:GMCast():166
131227 22:55:13 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():196: Failed to open backend connection: -22 (Invalid argument)
131227 22:55:13 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1292: Failed to open channel 'Archie' at 'gcomm://127.0.0.1': -22 (Invalid argument)
131227 22:55:13 [ERROR] WSREP: gcs connect failed: Invalid argument
131227 22:55:13 [ERROR] WSREP: wsrep::connect() failed: 7
131227 22:55:13 [ERROR] Aborting

131227 22:55:13 [Note] WSREP: Service disconnected.
131227 22:55:14 [Note] WSREP: Some threads may fail to exit.
131227 22:55:14 [Note] /pxc/bin/mysqld: Shutdown complete

A workaround would be to:

a) Leave out the $self address, either by not setting gmcast.listen_addr at all (thus skipping that option) or by removing $self from wsrep_cluster_ad...
