Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

XtraDB Cluster instance crashed in OpenShift Origin 1.2

Bug #1615397 reported by Cameron Braid on 2016-08-21

This bug affects 1 person

Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Sorry, I'm not sure what information you need here, so I have pasted the log file. The stack trace is at the bottom.

The environment is running in an OpenShift Origin cluster, v1.2.

Host is CentOS Linux release 7.2.1511 (Core)

Using a custom container (extract from the Dockerfile):
FROM ubuntu:precise
...
RUN echo "deb http://repo.percona.com/apt precise main" >> /etc/apt/sources.list.d/percona.list
RUN echo "deb-src http://repo.percona.com/apt precise main" >> /etc/apt/sources.list.d/percona.list
RUN apt-key adv --keyserver keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
RUN apt-get update && apt-get upgrade -y
RUN apt-get install -y percona-xtradb-cluster-56 qpress xtrabackup python-software-properties vim wget curl netcat telnet less dnsutils iputils-ping

The datadir in the container uses a hostPath volume type in Origin.

The volume is on an ext4 filesystem on the host.

I'm working on a staging cluster and am currently prototyping. I have seen this error a few times. Restarting the container results in the same error. The only way to resolve it is to rm -f the data dir and then restart the container (which triggers an SST).

I'll hang onto the datadir so that I can run any further debugging with it if you direct me to.

Here is the log:

exec mysqld --log-error=error.log --wsrep-node-name=xtradb-node02 --wsrep_start_position=84871112-57c2-11e6-b954-bf80bb6ad885:505206 --wsrep_sst_donor=10.128.0.9
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fe4694e436d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): Connection ID (thread ID): 1
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
2016-08-21 15:18:16 0 [Warning] option 'wsrep_max_ws_rows': unsigned value 1310720 adjusted to 1048576
2016-08-21 15:18:16 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2016-08-21 15:18:16 0 [Note] mysqld (mysqld 5.6.30-76.3-56) starting as process 1 ...
2016-08-21 15:18:16 1 [Note] WSREP: Read nil XID from storage engines, skipping position init
2016-08-21 15:18:16 1 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
2016-08-21 15:18:16 1 [Note] WSREP: wsrep_load(): Galera 3.16(r5c765eb) by Codership Oy <email address hidden> loaded successfully.
2016-08-21 15:18:16 1 [Note] WSREP: CRC-32C: using hardware acceleration.
2016-08-21 15:18:16 1 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
2016-08-21 15:18:16 1 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.128.1.3; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false
2016-08-21 15:18:16 1 [Note] WSREP: Service thread queue flushed.
2016-08-21 15:18:16 1 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2016-08-21 15:18:16 1 [Note] WSREP: wsrep_sst_grab()
2016-08-21 15:18:16 1 [Note] WSREP: Start replication
2016-08-21 15:18:16 1 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2016-08-21 15:18:16 1 [Note] WSREP: protonet asio version 0
2016-08-21 15:18:16 1 [Note] WSREP: Using CRC-32C for message checksums.
2016-08-21 15:18:16 1 [Note] WSREP: backend: asio
2016-08-21 15:18:16 1 [Note] WSREP: gcomm thread scheduling priority set to other:0
2016-08-21 15:18:16 1 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2016-08-21 15:18:16 1 [Note] WSREP: restore pc from disk failed
2016-08-21 15:18:16 1 [Note] WSREP: GMCast version 0
2016-08-21 15:18:16 1 [Warning] WSREP: Failed to resolve tcp://xtradb-node02:4567
2016-08-21 15:18:16 1 [Note] WSREP: (7c144311, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-08-21 15:18:16 1 [Note] WSREP: (7c144311, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-08-21 15:18:16 1 [Note] WSREP: EVS version 0
2016-08-21 15:18:16 1 [Note] WSREP: gcomm: connecting to group 'xtradb', peer 'xtradb-node01:,xtradb-node02:,xtradb-node03:'
2016-08-21 15:18:16 1 [Note] WSREP: (7c144311, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2016-08-21 15:18:17 1 [Note] WSREP: declaring 600ebe01 at tcp://10.128.0.9:4567 stable
2016-08-21 15:18:17 1 [Note] WSREP: declaring 95940ff9 at tcp://10.128.2.8:4567 stable
2016-08-21 15:18:17 1 [Note] WSREP: Node 600ebe01 state prim
2016-08-21 15:18:17 1 [Note] WSREP: view(view_id(PRIM,600ebe01,43) memb {
600ebe01,0
7c144311,0
95940ff9,0
} joined {
} left {
} partitioned {
})
2016-08-21 15:18:17 1 [Note] WSREP: save pc into disk
2016-08-21 15:18:17 1 [Note] WSREP: gcomm: connected
2016-08-21 15:18:17 1 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2016-08-21 15:18:17 1 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-08-21 15:18:17 1 [Note] WSREP: Opened channel 'xtradb'
2016-08-21 15:18:17 1 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2016-08-21 15:18:17 1 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2016-08-21 15:18:17 1 [Note] WSREP: Waiting for SST to complete.
2016-08-21 15:18:17 1 [Note] WSREP: STATE EXCHANGE: sent state msg: 7c64dc2a-67b2-11e6-973c-b3cd3d13b2e2
2016-08-21 15:18:17 1 [Note] WSREP: STATE EXCHANGE: got state msg: 7c64dc2a-67b2-11e6-973c-b3cd3d13b2e2 from 0 (xtradb-node01)
2016-08-21 15:18:17 1 [Note] WSREP: STATE EXCHANGE: got state msg: 7c64dc2a-67b2-11e6-973c-b3cd3d13b2e2 from 2 (xtradb-node03)
2016-08-21 15:18:17 1 [Note] WSREP: STATE EXCHANGE: got state msg: 7c64dc2a-67b2-11e6-973c-b3cd3d13b2e2 from 1 (xtradb-node02)
2016-08-21 15:18:17 1 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 42,
members = 2/3 (joined/total),
act_id = 516900,
last_appl. = -1,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 84871112-57c2-11e6-b954-bf80bb6ad885
2016-08-21 15:18:17 1 [Note] WSREP: Flow-control interval: [28, 28]
2016-08-21 15:18:17 1 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 516900)
2016-08-21 15:18:17 1 [Note] WSREP: State transfer required:
Group state: 84871112-57c2-11e6-b954-bf80bb6ad885:516900
Local state: 00000000-0000-0000-0000-000000000000:-1
2016-08-21 15:18:17 1 [Note] WSREP: New cluster view: global state: 84871112-57c2-11e6-b954-bf80bb6ad885:516900, view# 43: Primary, number of nodes: 3, my index: 1, protocol version 3
2016-08-21 15:18:17 1 [Warning] WSREP: Gap in state sequence. Need state transfer.
2016-08-21 15:18:17 1 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.128.1.3' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '1' '' '
WSREP_SST: [INFO] Streaming with xbstream (20160821 15:18:17.656)
WSREP_SST: [INFO] Using socat as streamer (20160821 15:18:17.658)
WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql//sst_in_progress (20160821 15:18:17.662)
WSREP_SST: [INFO] Evaluating timeout -k 1810 1800 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20160821 15:18:17.687)
2016-08-21 15:18:17 1 [Note] WSREP: Prepared SST request: xtrabackup-v2|10.128.1.3:4444/xtrabackup_sst//1
2016-08-21 15:18:17 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-08-21 15:18:17 1 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-08-21 15:18:17 1 [Note] WSREP: Service thread queue flushed.
2016-08-21 15:18:17 1 [Note] WSREP: Assign initial position for certification: 516900, protocol version: 3
2016-08-21 15:18:17 1 [Note] WSREP: Service thread queue flushed.
2016-08-21 15:18:17 1 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (84871112-57c2-11e6-b954-bf80bb6ad885): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():507. IST will be unavailable.
2016-08-21 15:18:17 1 [Warning] WSREP: Member 1.0 (xtradb-node02) requested state transfer from '10.128.0.9', but it is impossible to select State Transfer donor: No route to host
2016-08-21 15:18:17 1 [ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
2016-08-21 15:18:17 1 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2016-08-21 15:18:17 1 [Note] WSREP: Closing send monitor...
2016-08-21 15:18:17 1 [Note] WSREP: Closed send monitor.
2016-08-21 15:18:17 1 [Note] WSREP: gcomm: terminating thread
2016-08-21 15:18:17 1 [Note] WSREP: gcomm: joining thread
2016-08-21 15:18:17 1 [Note] WSREP: gcomm: closing backend
2016-08-21 15:18:17 1 [Note] WSREP: view(view_id(NON_PRIM,600ebe01,43) memb {
7c144311,0
} joined {
} left {
} partitioned {
600ebe01,0
95940ff9,0
})
2016-08-21 15:18:17 1 [Note] WSREP: view((empty))
2016-08-21 15:18:17 1 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-08-21 15:18:17 1 [Note] WSREP: gcomm: closed
2016-08-21 15:18:17 1 [Note] WSREP: Flow-control interval: [16, 16]
2016-08-21 15:18:17 1 [Note] WSREP: Received NON-PRIMARY.
2016-08-21 15:18:17 1 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 516900)
2016-08-21 15:18:17 1 [Note] WSREP: Received self-leave message.
2016-08-21 15:18:17 1 [Note] WSREP: Flow-control interval: [0, 0]
2016-08-21 15:18:17 1 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2016-08-21 15:18:17 1 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 516900)
2016-08-21 15:18:17 1 [Note] WSREP: RECV thread exiting 0: Success
2016-08-21 15:18:17 1 [Note] WSREP: recv_thread() joined.
2016-08-21 15:18:17 1 [Note] WSREP: Closing replication queue.
2016-08-21 15:18:17 1 [Note] WSREP: Closing slave action queue.
2016-08-21 15:18:17 1 [Note] WSREP: mysqld: Terminated.
15:18:17 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=10002
thread_count=2
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3984923 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f54d4000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f54f405ea60 thread_stack 0x40000
mysqld(my_print_stacktrace+0x2e)[0x90943e]
mysqld(handle_fatal_signal+0x494)[0x69c384]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f557db85cb0]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xd7)[0x7f557cfdf6f7]
/usr/lib/galera3/libgalera_smm.so(+0x72fa8)[0x7f54efc10fa8]
/usr/lib/galera3/libgalera_smm.so(+0x1ee91f)[0x7f54efd8c91f]
/usr/lib/galera3/libgalera_smm.so(_ZN6galera13ReplicatorSMM18send_state_requestEPKNS0_12StateRequestEb+0x376)[0x7f54efd9c3e6]
/usr/lib/galera3/libgalera_smm.so(_ZN6galera13ReplicatorSMM22request_state_transferEPvRK10wsrep_uuidlPKvl+0xdd)[0x7f54efd9da1d]
/usr/lib/galera3/libgalera_smm.so(_ZN6galera13ReplicatorSMM19process_conf_changeEPvRK15wsrep_view_infoiNS_10Replicator5StateEl+0x84c)[0x7f54efd9247c]
/usr/lib/galera3/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x4eb)[0x7f54efd6d5bb]
/usr/lib/galera3/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x57)[0x7f54efd6ea27]
/usr/lib/galera3/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x7b)[0x7f54efd9173b]
/usr/lib/galera3/libgalera_smm.so(galera_recv+0x1d)[0x7f54efda0d8d]
mysqld[0x5e1d01]
mysqld(start_wsrep_THD+0x2f8)[0x5c7288]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f557db7de9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f557d09936d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): Connection ID (thread ID): 1
Status: NOT_KILLED
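
To make the warnings and errors in a log like the one above easier to spot, here is a minimal Python sketch that keeps only the `[Warning]` and `[ERROR]` lines. It assumes the `YYYY-MM-DD HH:MM:SS <thread-id> [Level] message` prefix seen in this log. The names `LINE_RE` and `filter_log` are hypothetical and not part of any Percona or Galera tooling. Continuation lines without that prefix (stack frames, view membership lists) are skipped.

```python
import re

# Assumed line shape, taken from the log above, e.g.:
# 2016-08-21 15:18:17 1 [ERROR] WSREP: Requesting state transfer failed: ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+ "
    r"\[(?P<level>Note|Warning|ERROR)\] (?P<msg>.*)$"
)


def filter_log(lines, levels=("Warning", "ERROR")):
    """Return (timestamp, level, message) tuples for lines at the given levels."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        # Lines that do not carry the timestamp/level prefix are ignored.
        if match and match.group("level") in levels:
            hits.append((match.group("ts"), match.group("level"), match.group("msg")))
    return hits
```

Calling `filter_log(open("error.log"))` on a file like the one passed to `--log-error` would return the matching entries in order. For this log that should include the "impossible to select State Transfer donor" warning and the two "No route to host" errors that precede the abort.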

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1921
