Setting gmcast.listen_addr manually does not allow the node's own address in the gcomm address list

Bug #1099478 reported by Teemu Ollakka
This bug affects 3 people
Affects                                       Status        Importance  Assigned to    Milestone
Galera (status tracked in 3.x)
  2.x                                         Fix Released  High        Teemu Ollakka
  3.x                                         Fix Released  High        Teemu Ollakka
Percona XtraDB Cluster (status tracked in 5.6; moved to https://jira.percona.com/projects/PXC)
  5.5                                         Fix Released  Undecided   Unassigned
  5.6                                         Fix Released  Undecided   Unassigned
percona-xtradb-cluster-galera-2.x (Ubuntu)    New           Undecided   Unassigned

Bug Description

Having specified gmcast.listen_addr manually in the provider options

wsrep_provider_options="gmcast.listen_addr=tcp://10.10.1.100:4567;gcache.size=20G;gcs.fc_factor=0.99;gcs.fc_limit=256;gcs.fc_master_slave=yes"

and having the node's own IP address in the gcomm address list

wsrep_cluster_address=gcomm://10.10.1.100,10.10.1.101,10.10.1.102

causes gcomm to fail to connect to the cluster with the following message:

130111 13:53:46 [ERROR] WSREP: failed to open gcomm backend connection: 22: connect address points to listen address 'tcp://10.10.1.100:4567', check that cluster address '10.10.1.100:4567' is correct: 22 (Invalid argument)
         at gcomm/src/gmcast.cpp:GMCast():165
130111 13:53:46 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -22 (Invalid argument)

The reason for this is a check in the GMCast constructor that does not allow the listen address to appear in the peer address list. This check should be removed, and the node's own listen address should simply be skipped when it appears in the address list.
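
The proposed behavior can be sketched as follows. This is only an illustration in Python, not the actual change to gcomm/src/gmcast.cpp: the helper names `normalize` and `peers_without_self` are hypothetical, the sketch assumes the default port 4567 when none is given, and it ignores any options appended to the cluster address as well as IPv6 literals.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 4567  # assumed default group communication port


def normalize(addr, default_port=DEFAULT_PORT):
    """Return (host, port) for 'tcp://host:port', 'host:port' or 'host'."""
    if "://" not in addr:
        addr = "tcp://" + addr
    parts = urlsplit(addr)
    return parts.hostname, parts.port or default_port


def peers_without_self(cluster_address, listen_addr):
    """Hypothetical helper: skip the listen address instead of failing."""
    self_addr = normalize(listen_addr)
    hosts = cluster_address.split("://", 1)[1]
    peers = [normalize(h) for h in hosts.split(",") if h]
    # Keep every peer except the node's own listen address.
    return [p for p in peers if p != self_addr]


peers = peers_without_self(
    "gcomm://10.10.1.100,10.10.1.101,10.10.1.102",
    "tcp://10.10.1.100:4567",
)
print(peers)  # [('10.10.1.101', 4567), ('10.10.1.102', 4567)]
```

With this approach the same wsrep_cluster_address line could be shared by all nodes, since each node would drop only its own entry.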

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

This has a greater impact when running more than one instance on the same box.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Still reproducible:

sudo /pxc/bin/mysqld --defaults-file=/pxc/etc/my.cnf.local --basedir=/pxc --user=mysql --wsrep_provider_options='gmcast.listen_addr=tcp://127.0.0.1:4567' --wsrep-cluster-address="gcomm://127.0.0.1" --wsrep-start-position='5805bbc4-3038-11e3-937d-aa946afe7370:18490'
131227 22:55:13 [Warning] WSREP: wsrep_sst_receive_address is set to '127.0.0.1:4001' which makes it impossible for another host to reach this one. Please set it to the address which this node can be connected at by other cluster members.
131227 22:55:13 [Note] WSREP: wsrep_start_position var submitted: '5805bbc4-3038-11e3-937d-aa946afe7370:18490'
131227 22:55:13 [Note] WSREP: Read nil XID from storage engines, skipping position init
131227 22:55:13 [Note] WSREP: wsrep_load(): loading provider library '/pxc/lib/libgalera_smm.so'
131227 22:55:13 [Note] WSREP: wsrep_load(): Galera 2.8(rXXXX) by Codership Oy <email address hidden> loaded successfully.
131227 22:55:13 [Note] WSREP: Found saved state: 5805bbc4-3038-11e3-937d-aa946afe7370:-1
131227 22:55:13 [Note] WSREP: Reusing existing '/pxc/datadir//galera.cache'.
131227 22:55:13 [Note] WSREP: Passing config to GCS: base_host = 127.0.0.1; base_port = 4567; cert.log_conflicts = no; gcache.dir = /pxc/datadir/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /pxc/datadir//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://127.0.0.1:4567; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
131227 22:55:13 [Note] WSREP: Assign initial position for certification: 18490, protocol version: -1
131227 22:55:13 [Note] WSREP: wsrep_sst_grab()
131227 22:55:13 [Note] WSREP: Start replication
131227 22:55:13 [Note] WSREP: Setting initial position to 5805bbc4-3038-11e3-937d-aa946afe7370:18490
131227 22:55:13 [Note] WSREP: protonet asio version 0
131227 22:55:13 [Note] WSREP: backend: asio
131227 22:55:13 [Note] WSREP: GMCast version 0
131227 22:55:13 [ERROR] WSREP: failed to open gcomm backend connection: 22: connect address points to listen address 'tcp://127.0.0.1:4567', check that cluster address '127.0.0.1:4567' is correct: 22 (Invalid argument)
         at gcomm/src/gmcast.cpp:GMCast():166
131227 22:55:13 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():196: Failed to open backend connection: -22 (Invalid argument)
131227 22:55:13 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1292: Failed to open channel 'Archie' at 'gcomm://127.0.0.1': -22 (Invalid argument)
131227 22:55:13 [ERROR] WSREP: gcs connect failed: Invalid argument
131227 22:55:13 [ERROR] WSREP: wsrep::connect() failed: 7
131227 22:55:13 [ERROR] Aborting

131227 22:55:13 [Note] WSREP: Service disconnected.
131227 22:55:14 [Note] WSREP: Some threads may fail to exit.
131227 22:55:14 [Note] /pxc/bin/mysqld: Shutdown complete

Workaround would be to:

a) Omit the $self address from one of the two settings: either leave out gmcast.listen_addr (thus skipping that option) or remove $self from wsrep_cluster_ad...
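
As a rough illustration of the second variant (removing $self from the cluster address), the following Python sketch derives a per-node address from the two settings. The function names `parse_provider_options` and `workaround_cluster_address` are hypothetical, the default port 4567 is assumed when an entry has no port, and IPv6 literals are not handled.

```python
def parse_provider_options(options):
    """Split 'key=value;key=value' into a dict, ignoring malformed entries."""
    pairs = (item.split("=", 1) for item in options.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def workaround_cluster_address(options, cluster_address, default_port="4567"):
    """Hypothetical helper: drop the node's own entry from the address list."""
    listen = parse_provider_options(options).get("gmcast.listen_addr")
    if listen is None:
        return cluster_address  # nothing to conflict with
    self_host, _, self_port = listen.split("://", 1)[-1].partition(":")
    self_port = self_port or default_port
    scheme, hosts = cluster_address.split("://", 1)
    kept = []
    for entry in hosts.split(","):
        host, _, port = entry.partition(":")
        if entry and (host, port or default_port) != (self_host, self_port):
            kept.append(entry)
    return scheme + "://" + ",".join(kept)


fixed = workaround_cluster_address(
    "gmcast.listen_addr=tcp://10.10.1.100:4567;gcache.size=20G",
    "gcomm://10.10.1.100,10.10.1.101,10.10.1.102",
)
print(fixed)  # gcomm://10.10.1.101,10.10.1.102
```

Note that for the single-address reproduction above the result would be an empty `gcomm://` list, which Galera treats as a request to bootstrap a new cluster, so this variant should be applied with care.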


Alex Yurchenko (ayurchen) wrote :

Max Krasilnikov (pseudo) wrote :

percona-xtradb-cluster-galera-2.x 165-0ubuntu1 on Ubuntu 14.04.3 LTS (trusty) is also affected.

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1280
