Percona XtraDB Cluster - HA scalable solution for MySQL

Can't add a node's own IP address to gcomm, signal 11

Reported by Miguel Angel Nieto on 2013-01-14
This bug affects 2 people
Affects: Galera (Importance: Undecided, Assigned to: Unassigned)
Affects: Percona XtraDB Cluster (Importance: High, Assigned to: Unassigned)

Bug Description

I've created a three node cluster:

Debian Squeeze - 5.5.28-55 Percona XtraDB Cluster (GPL), wsrep_23.7.r3821

PXC1 (192.168.1.201)
PXC2 (192.168.1.202)
PXC3 (192.168.1.203)

After bootstrapping PXC1 (gcomm://), I try to start the other two nodes. If a node includes its own IP address in wsrep_cluster_address, mysqld crashes on startup. For example, I can't start PXC2 with:

wsrep_cluster_address = gcomm://192.168.1.201,192.168.1.202,192.168.1.203

but I can start it with:

wsrep_cluster_address = gcomm://192.168.1.201,192.168.1.203

The same applies to the third node. This is the signal 11 crash shown in the error log:

130114 14:37:05 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
130114 14:37:05 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.mRIZ4Gjtj8
130114 14:37:10 mysqld_safe WSREP: Failed to recover position:
130114 14:37:10 [Note] WSREP: Read nil XID from storage engines, skipping position init
130114 14:37:10 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
130114 14:37:10 [Note] WSREP: wsrep_load(): Galera 2.2(r115) by Codership Oy <email address hidden> loaded succesfully.
130114 14:37:10 [Note] WSREP: Found saved state: c517cd30-5e46-11e2-0800-28ff9c7ae02f:-1
130114 14:37:10 [Note] WSREP: Reusing existing 'pxc_cache'.
130114 14:37:10 [Note] WSREP: Passing config to GCS: base_host = 192.168.1.202; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = pxc_cache; gcache.page_size = 128M; gcache.size = 512M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
130114 14:37:10 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
130114 14:37:10 [Note] WSREP: wsrep_sst_grab()
130114 14:37:10 [Note] WSREP: Start replication
130114 14:37:10 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
130114 14:37:10 [Note] WSREP: protonet asio version 0
130114 14:37:10 [Note] WSREP: backend: asio
130114 14:37:10 [Note] WSREP: GMCast version 0
130114 14:37:10 [Note] WSREP: (7f507679-5e4f-11e2-0800-a2583c8f4758, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
130114 14:37:10 [Note] WSREP: (7f507679-5e4f-11e2-0800-a2583c8f4758, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
130114 14:37:10 [Note] WSREP: EVS version 0
130114 14:37:10 [Note] WSREP: PC version 0
130114 14:37:10 [Note] WSREP: gcomm: connecting to group 'PXC', peer '192.168.1.201:,192.168.1.202:,192.168.1.203:'
13:37:10 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=151
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 330485 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7e4d15]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x6b2884]
/lib/libpthread.so.0(+0xeff0)[0x7fb960c26ff0]
/lib/libc.so.6(memcpy+0x292)[0x7fb95fe6fdc2]
/usr/lib/libstdc++.so.6(_ZNSt15basic_streambufIcSt11char_traitsIcEE6xsputnEPKcl+0x7b)[0x7fb95df1925b]
/usr/lib/libstdc++.so.6(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x1b5)[0x7fb95df0f3b5]
/usr/lib64/libgalera_smm.so(_ZN5gcomm6GMCast18handle_establishedEPNS_6gmcast5ProtoE+0x23f)[0x7fb95e8679af]
/usr/lib64/libgalera_smm.so(_ZN5gcomm6GMCast9handle_upEPKvRKN2gu8DatagramERKNS_11ProtoUpMetaE+0x86e)[0x7fb95e86b5ee]
/usr/lib64/libgalera_smm.so(_ZN5gcomm10Protostack8dispatchEPKvRKN2gu8DatagramERKNS_11ProtoUpMetaE+0x5c)[0x7fb95e89490c]
/usr/lib64/libgalera_smm.so(_ZN5gcomm12AsioProtonet8dispatchERKPKvRKN2gu8DatagramERKNS_11ProtoUpMetaE+0x4b)[0x7fb95e8c66cb]
/usr/lib64/libgalera_smm.so(_ZN5gcomm13AsioTcpSocket12read_handlerERKN4asio10error_codeEm+0x758)[0x7fb95e8a3778]
/usr/lib64/libgalera_smm.so(_ZN4asio6detail7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS4_EEEEN5boost5arrayINS_14mutable_bufferELm1EEENS8_3_bi6bind_tImNS8_4_mfi3mf2ImN5gcomm13AsioTcpSocketERKNS_10error_codeEmEENSC_5list3INSC_5valueINS8_10shared_ptrISH_EEEEPFNS8_3argILi1EEEvEPFNSR_ILi2EEEvEEEEENSD_IvNSF_IvSH_SK_mEESY_EEEclESK_mi+0xc5)[0x7fb95e8b8c95]
/usr/lib64/libgalera_smm.so(_ZN4asio6detail23reactive_socket_recv_opINS0_17consuming_buffersINS_14mutable_bufferEN5boost5arrayIS3_Lm1EEEEENS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceISB_EEEES6_NS4_3_bi6bind_tImNS4_4_mfi3mf2ImN5gcomm13AsioTcpSocketERKNS_10error_codeEmEENSF_5list3INSF_5valueINS4_10shared_ptrISK_EEEEPFNS4_3argILi1EEEvEPFNSU_ILi2EEEvEEEEENSG_IvNSI_IvSK_SN_mEES11_EEEEE11do_completeEPNS0_15task_io_serviceEPNS0_25task_io_service_operationESL_m+0x306)[0x7fb95e8b9966]
/usr/lib64/libgalera_smm.so(_ZN4asio6detail15task_io_service3runERNS_10error_codeE+0x45a)[0x7fb95e8d0bba]
/usr/lib64/libgalera_smm.so(_ZN5gcomm12AsioProtonet10event_loopERKN2gu8datetime6PeriodE+0x1d6)[0x7fb95e8c7216]
/usr/lib64/libgalera_smm.so(_ZN5gcomm2PC7connectEv+0x1a0)[0x7fb95e87e060]
/usr/lib64/libgalera_smm.so(_ZN9GCommConn7connectERKSs+0x431)[0x7fb95e8e5771]
/usr/lib64/libgalera_smm.so(+0x16592d)[0x7fb95e8e192d]
/usr/lib64/libgalera_smm.so(gcs_core_open+0x8d)[0x7fb95e8d9e9d]
/usr/lib64/libgalera_smm.so(gcs_open+0x272)[0x7fb95e8ddf42]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM7connectERKSsS2_S2_+0x273)[0x7fb95e9277d3]
/usr/lib64/libgalera_smm.so(galera_connect+0xa7)[0x7fb95e941df7]
/usr/sbin/mysqld(_Z23wsrep_start_replicationv+0xfa)[0x66b1ea]
/usr/sbin/mysqld(_Z18wsrep_init_startupb+0x65)[0x66bbe5]
/usr/sbin/mysqld[0x527103]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x851)[0x52bee1]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7fb95fe0ec8d]
/usr/sbin/mysqld[0x51f5d9]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
130114 14:37:10 mysqld_safe mysqld from pid file /var/lib/mysql/PXC2.pid ended
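
To make the backtrace above easier to read, here is a minimal sketch that pulls out the named frames belonging to one library. The function name `frames_in` and the regular expression are illustrative assumptions, not part of any Percona or Galera tool; the sample lines are copied from the log above.

```python
import re

# Matches frames such as
#   /usr/lib64/libgalera_smm.so(gcs_open+0x272)[0x7fb95e8ddf42]
# Frames with no "(symbol+offset)" part, e.g. /usr/sbin/mysqld[0x527103],
# do not match at all.
FRAME_RE = re.compile(
    r"^(?P<lib>\S+?)\((?P<symbol>[^+()]*)\+0x[0-9a-f]+\)\[0x[0-9a-f]+\]$"
)


def frames_in(log_lines, library_suffix):
    """Return symbols of frames whose library path ends with library_suffix.

    Order follows the log, so the first entry is the innermost frame.
    Frames with an empty symbol, e.g. "(+0x16592d)", are skipped.
    """
    symbols = []
    for line in log_lines:
        match = FRAME_RE.match(line.strip())
        if not match:
            continue
        if match.group("lib").endswith(library_suffix) and match.group("symbol"):
            symbols.append(match.group("symbol"))
    return symbols


sample = [
    "/lib/libc.so.6(memcpy+0x292)[0x7fb95fe6fdc2]",
    "/usr/lib64/libgalera_smm.so(_ZN5gcomm6GMCast18handle_establishedEPNS_6gmcast5ProtoE+0x23f)[0x7fb95e8679af]",
    "/usr/lib64/libgalera_smm.so(+0x16592d)[0x7fb95e8e192d]",
    "/usr/lib64/libgalera_smm.so(gcs_open+0x272)[0x7fb95e8ddf42]",
]
galera_frames = frames_in(sample, "libgalera_smm.so")
print(galera_frames)
```

The first Galera frame is a mangled C++ name; a demangler such as c++filt should render it as gcomm::GMCast::handle_established(gcomm::gmcast::Proto*), which is where the library calls into the stream insertion that ends in memcpy.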

PXC1 (192.168.1.201):

[mysqld]

datadir=/var/lib/mysql

wsrep_cluster_name = PXC
wsrep_cluster_address = gcomm://
wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_provider_options = 'gcache.name=pxc_cache;gcache.size=512M'
wsrep_sst_method = xtrabackup
wsrep_retry_autocommit = 2
wsrep_slave_threads = 6
wsrep_node_name = PXC1
wsrep_node_address = 192.168.1.201

wsrep_sst_auth=root

binlog_format = ROW
server-id = 201
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode = 2
innodb_flush_log_at_trx_commit = 2

PXC2 (192.168.1.202):

[mysqld]

datadir=/var/lib/mysql

wsrep_cluster_name = PXC
wsrep_cluster_address = gcomm://192.168.1.201,192.168.1.202,192.168.1.203
wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_provider_options = 'gcache.name=pxc_cache;gcache.size=512M'
wsrep_sst_method = xtrabackup
wsrep_retry_autocommit = 2
wsrep_slave_threads = 6
wsrep_node_name = PXC2
wsrep_node_address = 192.168.1.202

wsrep_sst_auth=root

binlog_format = ROW
server-id = 202
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode = 2
innodb_flush_log_at_trx_commit = 2

PXC3 (192.168.1.203):

[mysqld]

datadir=/var/lib/mysql

wsrep_cluster_name = PXC
wsrep_cluster_address = gcomm://192.168.1.201,192.168.1.202,192.168.1.203
wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_provider_options = 'gcache.name=pxc_cache;gcache.size=512M'
wsrep_sst_method = xtrabackup
wsrep_retry_autocommit = 2
wsrep_slave_threads = 6
wsrep_node_name = PXC3
wsrep_node_address = 192.168.1.203

wsrep_sst_auth=root

binlog_format = ROW
server-id = 203
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode = 2
innodb_flush_log_at_trx_commit = 2
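
The workaround described above (leaving a node's own IP address out of wsrep_cluster_address) can be sketched as a small helper for generating the per-node value. This is only an illustration under the assumptions of this report: the function name `peer_cluster_address` is hypothetical, and the addresses are the three nodes listed in the description.

```python
def peer_cluster_address(all_nodes, own_ip):
    """Build a gcomm:// URL that lists every node except own_ip."""
    peers = [ip for ip in all_nodes if ip != own_ip]
    return "gcomm://" + ",".join(peers)


nodes = ["192.168.1.201", "192.168.1.202", "192.168.1.203"]

# Print the value each node would use on affected versions.
for ip in nodes:
    print(ip, "wsrep_cluster_address =", peer_cluster_address(nodes, ip))
```

For PXC2 this yields gcomm://192.168.1.201,192.168.1.203, the value that started successfully. Note that in this report PXC1 was bootstrapped with an empty gcomm:// instead, so the generated value for PXC1 would only apply when it rejoins an existing cluster.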

Teemu Ollakka (teemu-ollakka) wrote :

This is actually a Galera bug: an already-freed pointer was dereferenced while constructing the message for an exception that was about to be thrown.

The fix was pushed in lp:galera/2.x revision 133 (http://bazaar.launchpad.net/~codership/galera/2.x/revision/133)

Robert Navarro (crshman) wrote :

@Teemu, any idea when this will be packaged and pushed to the repos?

Changed in percona-xtradb-cluster:
status: New → Confirmed
importance: Undecided → High
milestone: none → 5.5.29-23.8
Changed in percona-xtradb-cluster:
milestone: 5.5.29-23.8 → 5.5.29-27.3.1
status: Confirmed → Fix Committed
milestone: 5.5.29-27.3.1 → 5.5.29-23.7.1
Changed in percona-xtradb-cluster:
status: Fix Committed → Fix Released
affects: codership-mysql → galera
Changed in galera:
status: New → Fix Released