Undetected state gap. Can't continue.

Bug #1334606 reported by Teemu Ollakka
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
Status tracked in 5.6
5.5
Fix Committed
High
Teemu Ollakka
5.6
Fix Released
High
Teemu Ollakka
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Status tracked in 5.6
5.5
Fix Released
High
Unassigned
5.6
Fix Released
High
Unassigned

Bug Description

Nod that had been previously part of the cluster was started so that it first entered into non-primary component. For some reason local state was cleared on mysqld side, leading to undetected state gap error.

2014-06-26 10:49:37 6056 [Note] WSREP: Read nil XID from storage engines, skippi
ng position init
2014-06-26 10:49:37 6056 [Note] WSREP: wsrep_load(): loading provider library '/
home/vagrant/galera/local4/galera/lib/libgalera_smm.so'
2014-06-26 10:49:38 6056 [Note] WSREP: wsrep_load(): Galera 3.6dev(rXXXX) by Cod
ership Oy <email address hidden> loaded successfully.
2014-06-26 10:49:38 6056 [Note] WSREP: CRC-32C: using hardware acceleration.
2014-06-26 10:49:38 6056 [Note] WSREP: Found saved state: 2a7f5131-fc61-11e3-809
a-e6e6e4d4557c:0
2014-06-26 10:49:38 6056 [Note] WSREP: Passing config to GCS: base_host = 192.16
8.17.11; base_port = 4567; cert.log_conflicts = no; debug = no; evs.debug_log_ma
sk = 0x4; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.
info_log_mask = 0x4; evs.join_retrans_period = PT1S; evs.max_install_timeouts =
3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT
5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /hom
e/vagrant/galera/local4/mysql/var/; gcache.keep_pages_size = 0; gcache.mem_size
= 0; gcache.name = /home/vagrant/galera/local4/mysql/var//galera.cache; gcache.p
age_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.
fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_th
rottle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limi
t = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announ
ce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb =
 false; pc.npvo
2014-06-26 10:49:38 6056 [Note] WSREP: Service thread queue flushed.
2014-06-26 10:49:38 6056 [Note] WSREP: Assign initial position for certification
: 0, protocol version: -1
2014-06-26 10:49:38 6056 [Note] WSREP: wsrep_sst_grab()
2014-06-26 10:49:38 6056 [Note] WSREP: Start replication
2014-06-26 10:49:38 6056 [Note] WSREP: Setting initial position to 2a7f5131-fc61
-11e3-809a-e6e6e4d4557c:0
2014-06-26 10:49:38 6056 [Note] WSREP: protonet asio version 0
2014-06-26 10:49:38 6056 [Note] WSREP: Using CRC-32C (optimized) for message che
cksums.
2014-06-26 10:49:38 6056 [Note] WSREP: backend: asio
2014-06-26 10:49:38 6056 [Warning] WSREP: access file(gvwstate.dat) failed(No su
ch file or directory)
2014-06-26 10:49:38 6056 [Note] WSREP: restore pc from disk failed
2014-06-26 10:49:38 6056 [Note] WSREP: GMCast version 0
2014-06-26 10:49:38 6056 [Note] WSREP: (91a8037e, 'tcp://0.0.0.0:10041') listeni
ng at tcp://0.0.0.0:10041
2014-06-26 10:49:38 6056 [Note] WSREP: (91a8037e, 'tcp://0.0.0.0:10041') multica
st: , ttl: 1
2014-06-26 10:49:38 6056 [Note] WSREP: EVS version 0
2014-06-26 10:49:38 6056 [Note] WSREP: PC version 0
2014-06-26 10:49:38 6056 [Note] WSREP: gcomm: connecting to group 'my_wsrep_clus
ter', peer '192.168.17.13:10031'
2014-06-26 10:49:38 6056 [Note] WSREP: (91a8037e, 'tcp://0.0.0.0:10041') turning
 message relay requesting on, nonlive peers:
2014-06-26 10:49:38 6056 [Note] WSREP: declaring 918e2134 at tcp://192.168.17.13
:10031 stable
2014-06-26 10:49:38 6056 [Warning] WSREP: no nodes coming from prim view, prim n
ot possible
2014-06-26 10:49:38 6056 [Note] WSREP: gcomm: connected
2014-06-26 10:49:38 6056 [Note] WSREP: Changing maximum packet size to 64500, re
sulting msg size: 32636
2014-06-26 10:49:38 6056 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2014-06-26 10:49:38 6056 [Note] WSREP: Opened channel 'my_wsrep_cluster'
2014-06-26 10:49:38 6056 [Note] WSREP: New COMPONENT: primary = no, bootstrap =
no, my_idx = 1, memb_num = 2
2014-06-26 10:49:38 6056 [Note] WSREP: Waiting for SST to complete.
2014-06-26 10:49:38 6056 [Note] WSREP: Flow-control interval: [23, 23]
2014-06-26 10:49:38 6056 [Note] WSREP: Received NON-PRIMARY.
2014-06-26 10:49:38 6056 [Note] WSREP: New cluster view: global state: 2a7f5131-
fc61-11e3-809a-e6e6e4d4557c:0, view# -1: non-Primary, number of nodes: 2, my ind
ex: 1, protocol version -1
2014-06-26 10:49:38 6056 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-06-26 10:49:46 6056 [Note] WSREP: (91a8037e, 'tcp://0.0.0.0:10041') turning message relay requesting off
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 8e9b4e2a at tcp://192.168.17.11:10011 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 918e2134 at tcp://192.168.17.13:10031 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91931b6f at tcp://192.168.17.12:10021 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91aae867 at tcp://192.168.17.12:10051 stable
2014-06-26 10:49:48 6056 [Note] WSREP: Node 8e9b4e2a state prim
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 8e9b4e2a at tcp://192.168.17.11:10011 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 918e2134 at tcp://192.168.17.13:10031 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91931b6f at tcp://192.168.17.12:10021 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 9199b15d at tcp://192.168.17.13:10091 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91a2005e at tcp://192.168.17.11:10071 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91a93081 at tcp://192.168.17.13:10061 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91aae867 at tcp://192.168.17.12:10051 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 9455a28c at tcp://192.168.17.12:10081 stable
2014-06-26 10:49:49 6056 [Note] WSREP: Node 8e9b4e2a state prim
2014-06-26 10:49:49 6056 [Note] WSREP: save pc into disk
2014-06-26 10:49:49 6056 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 5, memb_num = 9
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: sent state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 1 (node3)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 2 (node2)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 3 (node9)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 5 (node4)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 6 (node6)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 7 (node5)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 8 (node8)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 0 (node1)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 4 (node7)
2014-06-26 10:49:49 6056 [Note] WSREP: Quorum results:
        version = 3,
        component = PRIMARY,
        conf_id = 2,
        members = 9/9 (joined/total),
        act_id = 0,
        last_appl. = -1,
        protocols = 1/5/3 (gcs/repl/appl),
        group UUID = 2a7f5131-fc61-11e3-809a-e6e6e4d4557c
2014-06-26 10:49:49 6056 [Note] WSREP: Flow-control interval: [48, 48]
2014-06-26 10:49:49 6056 [Note] WSREP: Restored state OPEN -> JOINED (0)
2014-06-26 10:49:49 6056 [Note] WSREP: New cluster view: global state: 2a7f5131-fc61-11e3-809a-e6e6e4d4557c:0, view# 3: Primary, number of nodes: 9, my index: 5, protocol version 3
2014-06-26 10:49:49 6056 [ERROR] WSREP: Undetected state gap. Can't continue.
2014-06-26 10:49:49 6056 [ERROR] WSREP: WSREP: Group state: 2a7f5131-fc61-11e3-809a-e6e6e4d4557c:0
2014-06-26 10:49:49 6056 [ERROR] WSREP: WSREP: Local state: 00000000-0000-0000-0000-000000000000:-1
2014-06-26 10:49:49 6056 [ERROR] Aborting

Revision history for this message
Teemu Ollakka (teemu-ollakka) wrote :
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1012

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.