After crash the node refuses to start in SYNC

Bug #908021 reported by Vadim Tkachenko on 2011-12-23
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Galera
Medium
Teemu Ollakka

Bug Description

The master crashed ( report will be in different bug)
and I am trying to start it, but it can't start in SYNCED

/usr/sbin/mysqld
111223 1:47:27 [Note] Flashcache bypass: disabled
111223 1:47:27 [Note] Flashcache setup error is : ioctl failed

111223 1:47:27 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
111223 1:47:27 [Note] WSREP: wsrep_load(): Galera 2.0beta(r99) by Codership Oy <email address hidden> loaded succesfully.
111223 1:47:27 [Note] WSREP: Reusing existing '/mnt/data//galera.cache'.
111223 1:47:27 [Note] WSREP: Passing config to GCS: gcache.dir = /mnt/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /mnt/data//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; replicator.commit_order = 3
111223 1:47:27 [Note] WSREP: wsrep_sst_grab()
111223 1:47:27 [Note] WSREP: Start replication
111223 1:47:27 [Note] WSREP: Found saved state: a11cef9b-2d31-11e1-0800-695615454002:0
111223 1:47:27 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
111223 1:47:27 [Note] WSREP: Setting initial position to a11cef9b-2d31-11e1-0800-695615454002:0
111223 1:47:27 [Note] WSREP: protonet asio version 0
111223 1:47:27 [Note] WSREP: backend: asio
111223 1:47:27 [Note] WSREP: GMCast version 0
111223 1:47:27 [Note] WSREP: (fa996d02-2d31-11e1-0800-e95116f8b4c3, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
111223 1:47:27 [Note] WSREP: (fa996d02-2d31-11e1-0800-e95116f8b4c3, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
111223 1:47:27 [Note] WSREP: EVS version 0
111223 1:47:27 [Note] WSREP: PC version 0
111223 1:47:27 [Note] WSREP: gcomm: connecting to group 'trimethylxanthine', peer ''
111223 1:47:27 [Note] WSREP: GMCast::handle_stable_view: view(view_id(PRIM,fa996d02-2d31-11e1-0800-e95116f8b4c3,1) memb {
        fa996d02-2d31-11e1-0800-e95116f8b4c3,
} joined {
} left {
} partitioned {
})
111223 1:47:27 [Note] WSREP: gcomm: connected
111223 1:47:27 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
111223 1:47:27 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
111223 1:47:27 [Note] WSREP: Opened channel 'trimethylxanthine'
111223 1:47:27 [Note] WSREP: New COMPONENT: primary = yes, my_idx = 0, memb_num = 1
111223 1:47:27 [Note] WSREP: Waiting for SST to complete.
111223 1:47:27 [Note] WSREP: STATE_EXCHANGE: sent state UUID: fa9a2969-2d31-11e1-0800-c6c36efcc325
111223 1:47:27 [Note] WSREP: STATE EXCHANGE: sent state msg: fa9a2969-2d31-11e1-0800-c6c36efcc325
111223 1:47:27 [Note] WSREP: STATE EXCHANGE: got state msg: fa9a2969-2d31-11e1-0800-c6c36efcc325 from 0 (node1)
111223 1:47:27 [Note] WSREP: Quorum results:
        version = 2,
        component = PRIMARY,
        conf_id = 0,
        members = 1/1 (joined/total),
        act_id = 0,
        last_appl. = -1,
        protocols = 0/2/1 (gcs/repl/appl),
        group UUID = a11cef9b-2d31-11e1-0800-695615454002
111223 1:47:27 [Note] WSREP: Flow-control interval: [8, 16]
111223 1:47:27 [Note] WSREP: Restored state OPEN -> JOINED (0)
111223 1:47:27 [Note] WSREP: Member 0 (node1) synced with group.
111223 1:47:27 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)
111223 1:47:27 [Note] WSREP: New cluster view: global state: a11cef9b-2d31-11e1-0800-695615454002:0, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 1
111223 1:47:27 [Note] WSREP: SST complete, seqno: 0
111223 1:47:27 [Note] Plugin 'FEDERATED' is disabled.
111223 1:47:27 InnoDB: The InnoDB memory heap is disabled
111223 1:47:27 InnoDB: Mutexes and rw_locks use GCC atomic builtins
111223 1:47:27 InnoDB: Compressed tables use zlib 1.2.3
111223 1:47:27 InnoDB: Using Linux native AIO
111223 1:47:27 InnoDB: Initializing buffer pool, size = 128.0M
111223 1:47:27 InnoDB: Completed initialization of buffer pool
111223 1:47:27 InnoDB: highest supported file format is Barracuda.
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
111223 1:47:27 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
111223 1:47:27 InnoDB: Waiting for the background threads to start
111223 1:47:28 Percona XtraDB (http://www.percona.com) 1.1.8-20.1 started; log sequence number 1605425
111223 1:47:28 [Note] Event Scheduler: Loaded 0 events
111223 1:47:28 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.17' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona XtraDB Cluster (GPL), Release alpha22.1, Revision 3673 wsrep_22.3.r3673
111223 1:47:28 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
111223 1:47:28 [Note] WSREP: Assign initial position for certification: 0, protocol version: 1
111223 1:47:28 [Note] WSREP: Synchronized with group, ready for connections
111223 1:47:28 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
111223 1:47:29 [Note] WSREP: evs::msg{version=0,type=1,user_type=255,order=4,seq=0,seq_range=0,aru_seq=-1,flags=0,source=fa996d02-2d31-11e1-0800-e95116f8b4c3,source_view_id=view_id(REG,6e8a535a-2d31-11e1-0800-c9012ea14f6c,16),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=16,node_list=()
} 64
111223 1:47:29 [ERROR] WSREP: exception caused by message: evs::msg{version=0,type=3,user_type=255,order=1,seq=0,seq_range=-1,aru_seq=0,flags=4,source=6e8a535a-2d31-11e1-0800-c9012ea14f6c,source_view_id=view_id(REG,6e8a535a-2d31-11e1-0800-c9012ea14f6c,16),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=337,node_list=()
}
111223 1:47:29 [ERROR] WSREP: state after handling message: evs::proto(evs::proto(fa996d02-2d31-11e1-0800-e95116f8b4c3, OPERATIONAL, view_id(REG,6e8a535a-2d31-11e1-0800-c9012ea14f6c,16)), OPERATIONAL) {
current_view=view(view_id(REG,6e8a535a-2d31-11e1-0800-c9012ea14f6c,16) memb {
        6e8a535a-2d31-11e1-0800-c9012ea14f6c,
        fa996d02-2d31-11e1-0800-e95116f8b4c3,
} joined {
} left {
} partitioned {
}),
input_map=evs::input_map: {aru_seq=0,safe_seq=0,node_index=node: {idx=0,range=[1,0],safe_seq=0} node: {idx=1,range=[1,0],safe_seq=0} ,msg_index= (1,0),evs::msg{version=0,type=1,user_type=255,order=4,seq=0,seq_range=0,aru_seq=-1,flags=0,source=fa996d02-2d31-11e1-0800-e95116f8b4c3,source_view_id=view_id(REG,6e8a535a-2d31-11e1-0800-c9012ea14f6c,16),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=16,node_list=()
}
,recovery_index= (0,0),evs::msg{version=0,type=1,user_type=255,order=4,seq=0,seq_range=0,aru_seq=-1,flags=4,source=6e8a535a-2d31-11e1-0800-c9012ea14f6c,source_view_id=view_id(REG,6e8a535a-2d31-11e1-0800-c9012e
111223 1:47:29 [ERROR] WSREP: exception from gcomm, backend must be restarted:msg_state.get_to_seq() == max_to_seq: fa996d02-2d31-11e1-0800-e95116f8b4c3 node fa996d02-2d31-11e1-0800-e95116f8b4c3 to seq not consistent with local state: max to seq 39 msg state to seq 2 (FATAL)
         at gcomm/src/pc_proto.cpp:validate_state_msgs():463
111223 1:47:29 [Note] WSREP: Received self-leave message.
111223 1:47:29 [Note] WSREP: Flow-control interval: [0, 0]
111223 1:47:29 [Note] WSREP: Received SELF-LEAVE. Closing connection.
111223 1:47:29 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 0)
111223 1:47:29 [Note] WSREP: RECV thread exiting 0: Success
111223 1:47:29 [Note] WSREP: New cluster view: global state: a11cef9b-2d31-11e1-0800-695615454002:0, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 1
111223 1:47:29 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
111223 1:47:29 [Note] WSREP: applier thread exiting (code:0)

When I try to shutdown it, it hung in
111223 1:49:26 [Note] /usr/sbin/mysqld: Normal shutdown

111223 1:49:26 [Note] WSREP: Stop replication
111223 1:49:26 [Note] WSREP: Closing send monitor...
111223 1:49:26 [Note] WSREP: Closed send monitor.

config is:

[mysqld]
datadir=/mnt/data
user=mysql

binlog_format=ROW

wsrep_provider=/usr/lib64/libgalera_smm.so

wsrep_cluster_address=gcomm://

wsrep_slave_threads=2
wsrep_cluster_name=trimethylxanthine
wsrep_sst_method=rsync
wsrep_node_name=node1

innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2

affects: codership-mysql → galera
Alex Yurchenko (ayurchen) wrote :

This is a consequence of lp:908025 - Node B didn't crash and was trying to reconnect to Node A. However Node B was in non-primary component at the moment, whereas Node A was explicitly primary and in this particular case it would be desirable that Node B obeys Node A.

Changed in galera:
importance: Undecided → Medium
milestone: none → 22.1.2
status: New → Confirmed
assignee: nobody → Teemu Ollakka (teemu-ollakka)
Changed in galera:
milestone: 23.1.2 → 23.2.1
Changed in galera:
milestone: 23.2.1 → 24.2.5
milestone: 24.2.5 → none
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers