Master crashed when slave IST failed
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Galera | Fix Released | Critical | Teemu Ollakka | | |
Bug Description
This bug led to https:/
The setup: NodeA is in EC2 west-US and NodeB is in EC2 west-EU.
Due to a misconfiguration, NodeB has no proper IST configuration.
NodeA config:
[mysqld]
datadir=/mnt/data
user=mysql
binlog_format=ROW
wsrep_provider=
wsrep_cluster_
wsrep_slave_
wsrep_cluster_
wsrep_sst_
wsrep_node_
innodb_
innodb_
NodeB config:
[mysqld]
datadir=/mnt/data
user=mysql
binlog_format=ROW
wsrep_provider=
wsrep_cluster_
wsrep_sst_
wsrep_slave_
wsrep_cluster_
wsrep_sst_
wsrep_node_
innodb_
innodb_
Now, when NodeB starts and tries to do IST, NodeA crashes with the following backtrace:
0x00007fb9c7e80905 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fb9c7e80905 in raise () from /lib64/libc.so.6
#1 0x00007fb9c7e820e5 in abort () from /lib64/libc.so.6
#2 0x00007fb9c62c7a7d in __gnu_cxx:
#3 0x00007fb9c62c5c06 in ?? () from /usr/lib64/
#4 0x00007fb9c62c5c33 in std::terminate() () from /usr/lib64/
#5 0x00007fb9c62c5c46 in ?? () from /usr/lib64/
#6 0x00007fb9c62c52d3 in __cxa_call_
#7 0x00007fb9c6b607e3 in galera:
req_
#8 0x00007fb9c6b35f1c in galera:
at galera/
#9 0x00007fb9c6b366e8 in galera:
at galera/
#10 0x00007fb9c6b533ad in galera:
at galera/
#11 0x00007fb9c6b67893 in galera_recv (gh=<value optimized out>, recv_ctx=<value optimized out>)
at galera/
#12 0x000000000058c9eb in wsrep_replicati
#13 0x000000000051abc3 in start_wsrep_THD ()
#14 0x00007fb9c8cac7e1 in start_thread () from /lib64/
#15 0x00007fb9c7f3377d in clone () from /lib64/libc.so.6
Log from NodeA:
111223 1:58:34 [Note] WSREP: GMCast:
} joined {
} left {
} partitioned {
})
111223 1:58:34 [Note] WSREP: New COMPONENT: primary = yes, my_idx = 1, memb_num = 2
111223 1:58:34 [Note] WSREP: declaring 872108f6-
111223 1:58:34 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
111223 1:58:35 [Note] WSREP: STATE EXCHANGE: sent state msg: 8806c568-
111223 1:58:35 [Note] WSREP: STATE EXCHANGE: got state msg: 8806c568-
111223 1:58:35 [Note] WSREP: STATE EXCHANGE: got state msg: 8806c568-
111223 1:58:35 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 3,
members = 1/2 (joined/total),
act_id = 3,
last_appl. = 0,
protocols = 0/2/1 (gcs/repl/appl),
group UUID = b5d62c24-
111223 1:58:35 [Note] WSREP: Flow-control interval: [12, 23]
111223 1:58:35 [Note] WSREP: New cluster view: global state: b5d62c24-
111223 1:58:35 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
111223 1:58:35 [Note] WSREP: Assign initial position for certification: 3, protocol version: 1
111223 1:58:37 [Note] WSREP: Node 0 (node2) requested state transfer from '*any*'. Selected 1 (node1)(SYNCED) as donor.
111223 1:58:37 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 3)
111223 1:58:37 [Note] WSREP: IST request: b5d62c24-
111223 1:58:37 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
111223 1:58:37 [Note] WSREP: Running: 'wsrep_sst_rsync 'donor' 'ec2-176-
111223 1:58:37 [Note] WSREP: sst_donor_thread signaled with 0
terminate called after throwing an instance of 'boost:
what(): Network is unreachable
111223 1:59:36 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
key_buffer_
read_buffer_
max_used_
max_threads=151
thread_count=3
connection_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7fb9a0000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fb9b5bf9e78 thread_stack 0x40000
/usr/sbin/
/usr/sbin/
/lib64/
/lib64/
/lib64/
111223 1:59:36 [Warning] WSREP: last inactive check more than PT1.5S ago, skipping check
/usr/lib64/
/usr/lib64/
/usr/lib64/
/usr/lib64/
/usr/lib64/
/usr/lib64/
/usr/lib64/
/usr/lib64/
/usr/lib64/
/usr/lib64/
/usr/sbin/
/usr/sbin/
/lib64/
/lib64/
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query ((nil)): is an invalid pointer
Connection ID (thread ID): 2
Status: NOT_KILLED
The manual page at http://
information that should help you find out what is causing the crash.
Log from NodeB:
111223 1:58:33 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/
111223 1:58:33 [Note] WSREP: wsrep_load(): Galera 2.0beta(r99) by Codership Oy <email address hidden> loaded succesfully.
111223 1:58:33 [Note] WSREP: Reusing existing '/mnt/data/
111223 1:58:33 [Note] WSREP: Passing config to GCS: gcache.dir = /mnt/data/; gcache.
111223 1:58:33 [Note] WSREP: wsrep_sst_grab()
111223 1:58:33 [Note] WSREP: Start replication
111223 1:58:33 [Note] WSREP: Found saved state: b5d62c24-
111223 1:58:33 [Note] WSREP: Assign initial position for certification: 1, protocol version: -1
111223 1:58:33 [Note] WSREP: Setting initial position to b5d62c24-
111223 1:58:33 [Note] WSREP: protonet asio version 0
111223 1:58:33 [Note] WSREP: backend: asio
111223 1:58:33 [Note] WSREP: GMCast version 0
111223 1:58:33 [Note] WSREP: (872108f6-
111223 1:58:33 [Note] WSREP: (872108f6-
111223 1:58:33 [Note] WSREP: EVS version 0
111223 1:58:33 [Note] WSREP: PC version 0
111223 1:58:33 [Note] WSREP: gcomm: connecting to group 'trimethylxanth
111223 1:58:34 [Note] WSREP: GMCast:
} joined {
} left {
} partitioned {
})
111223 1:58:34 [Note] WSREP: declaring b5d57d4e-
111223 1:58:34 [Note] WSREP: gcomm: connected
111223 1:58:34 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
111223 1:58:34 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
111223 1:58:34 [Note] WSREP: Opened channel 'trimethylxanthine'
111223 1:58:34 [Note] WSREP: New COMPONENT: primary = yes, my_idx = 0, memb_num = 2
111223 1:58:34 [Note] WSREP: Waiting for SST to complete.
111223 1:58:34 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 8806c568-
111223 1:58:34 [Note] WSREP: STATE EXCHANGE: sent state msg: 8806c568-
111223 1:58:34 [Note] WSREP: STATE EXCHANGE: got state msg: 8806c568-
111223 1:58:34 [Note] WSREP: STATE EXCHANGE: got state msg: 8806c568-
111223 1:58:34 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 3,
members = 1/2 (joined/total),
act_id = 3,
last_appl. = -1,
protocols = 0/2/1 (gcs/repl/appl),
group UUID = b5d62c24-
111223 1:58:34 [Note] WSREP: Flow-control interval: [12, 23]
111223 1:58:34 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 3)
111223 1:58:34 [Note] WSREP: New cluster view: global state: b5d62c24-
111223 1:58:34 [Warning] WSREP: Gap in state sequence. Need state transfer.
111223 1:58:37 [Note] WSREP: Running: 'wsrep_sst_rsync 'joiner' 'ec2-176-
111223 1:58:37 [Note] WSREP: Prepared SST request: rsync|ec2-
111223 1:58:37 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
111223 1:58:37 [Note] WSREP: Assign initial position for certification: 3, protocol version: 1
111223 1:58:37 [Note] WSREP: prepared IST receiver, listening in: tcp://10.
111223 1:58:37 [Note] WSREP: Node 0 (node2) requested state transfer from '*any*'. Selected 1 (node1)(SYNCED) as donor.
111223 1:58:37 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 3)
111223 1:58:37 [Note] WSREP: Requesting state transfer: success, donor: 1
111223 1:58:39 [Note] WSREP: SST complete, seqno: 1
111223 1:58:39 [Note] Plugin 'FEDERATED' is disabled.
111223 1:58:39 InnoDB: The InnoDB memory heap is disabled
111223 1:58:39 InnoDB: Mutexes and rw_locks use GCC atomic builtins
111223 1:58:39 InnoDB: Compressed tables use zlib 1.2.3
111223 1:58:39 InnoDB: Using Linux native AIO
111223 1:58:39 InnoDB: Initializing buffer pool, size = 128.0M
111223 1:58:39 InnoDB: Completed initialization of buffer pool
111223 1:58:39 InnoDB: highest supported file format is Barracuda.
111223 1:58:39 InnoDB: Waiting for the background threads to start
111223 1:58:40 Percona XtraDB (http://
111223 1:58:40 [Note] Event Scheduler: Loaded 0 events
111223 1:58:40 [Note] WSREP: Signalling provider to continue.
111223 1:58:40 [Note] WSREP: Received SST: b5d62c24-
111223 1:58:40 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.17' socket: '/var/lib/
111223 1:58:40 [Note] WSREP: SST finished: b5d62c24-
111223 1:58:40 [Note] WSREP: Receiving IST: 2 writesets, seqnos 1-3
111223 1:58:40 [Note] WSREP: (872108f6-
111223 1:58:41 [Note] WSREP: (872108f6-
111223 1:58:42 [Note] WSREP: evs::proto(
111223 1:58:43 [Note] WSREP: evs::proto(
111223 1:58:43 [Note] WSREP: evs::proto(
111223 1:58:44 [Note] WSREP: evs::proto(
111223 1:58:44 [Note] WSREP: evs::proto(
111223 1:58:45 [Note] WSREP: evs::proto(
111223 1:58:45 [Note] WSREP: evs::proto(
111223 1:58:46 [Note] WSREP: evs::proto(
111223 1:58:46 [Note] WSREP: evs::proto(
111223 1:58:47 [Note] WSREP: evs::proto(
111223 1:58:47 [Note] WSREP: evs::proto(
111223 1:58:48 [Note] WSREP: evs::proto(
111223 1:58:48 [Note] WSREP: evs::proto(
111223 1:58:49 [Note] WSREP: evs::proto(
111223 1:58:49 [Note] WSREP: evs::proto(
111223 1:58:50 [Note] WSREP: evs::proto(
111223 1:58:50 [Note] WSREP: evs::proto(
111223 1:58:51 [Note] WSREP: evs::proto(
111223 1:58:51 [Note] WSREP: evs::proto(
111223 1:58:52 [Note] WSREP: evs::proto(
111223 1:58:52 [Note] WSREP: evs::proto(
111223 1:58:52 [Note] WSREP: New COMPONENT: primary = no, my_idx = 0, memb_num = 1
111223 1:58:52 [Note] WSREP: Flow-control interval: [8, 16]
111223 1:58:52 [Note] WSREP: Received NON-PRIMARY.
111223 1:58:52 [Note] WSREP: Shifting JOINER -> OPEN (TO: 3)
111223 1:58:52 [Note] WSREP: GMCast:
} joined {
} left {
} partitioned {
})
111223 1:58:52 [Note] WSREP: New COMPONENT: primary = no, my_idx = 0, memb_num = 1
111223 1:58:52 [Note] WSREP: Flow-control interval: [8, 16]
111223 1:58:52 [Note] WSREP: Received NON-PRIMARY.
111223 1:58:52 [Note] WSREP: GMCast:
} joined {
} left {
} partitioned {
})
111223 2:00:01 [Note] WSREP: (872108f6-
affects: | codership-mysql → galera |
Changed in galera: | |
assignee: | nobody → Teemu Ollakka (teemu-ollakka) |
importance: | Undecided → Critical |
milestone: | none → 22.2.0beta |
Changed in galera: | |
milestone: | 22.2.0beta → 23.2.0 |
Changed in galera: | |
status: | New → In Progress |
Changed in galera: | |
status: | Fix Committed → Fix Released |
Fix pushed in lp:galera/2.x revision 107
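
For readers skimming the report: the NodeA log shows a "Network is unreachable" exception escaping the IST sender path and reaching terminate, which took the whole donor down. The sketch below is not the Galera fix (the actual change in revision 107 is not shown in this report); it only illustrates the general idea of containing a connection error at the sender thread boundary and reporting it as a failed transfer. All names (`ist_sender`, `run_ist`, `unreachable_connect`) and the joiner address are hypothetical.

```python
import errno
import threading


def unreachable_connect(addr):
    """Hypothetical stand-in for a connect call that fails like the one in the NodeA log."""
    raise OSError(errno.ENETUNREACH, "Network is unreachable")


def ist_sender(connect, peer_addr, result):
    """Thread body: catch connection errors here instead of letting them escape."""
    try:
        connect(peer_addr)
        result["ok"] = True
    except OSError as exc:
        # Record the failure for the caller; the donor process keeps running.
        result["ok"] = False
        result["error"] = exc.strerror


def run_ist(connect, peer_addr):
    """Run the sender in its own thread and return its outcome as a dict."""
    result = {}
    worker = threading.Thread(target=ist_sender, args=(connect, peer_addr, result))
    worker.start()
    worker.join()
    return result


# The address is made up for illustration; no real network access happens here.
outcome = run_ist(unreachable_connect, "tcp://joiner.example:4568")
print(outcome)
```

With this shape, an unreachable joiner results in a failed state transfer that the donor can log and recover from, rather than an abort of the donor itself.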