sometimes IST does not start
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Galera |
Fix Released
|
High
|
Teemu Ollakka |
Bug Description
Reported in https:/
- node1 : named cygnus is the node which was stopped and that is now coming back as Joiner and should received properly IST
- node2 : named laurane is the node which was choosed as Donor.
node1's shutdown was initiated at 14:46:47 as the log shows.
node1's log :
120419 14:30:02 [Note] WSREP: MDL conflict -> BF abort
request: (2 seqno 5154113 mode 1 Qstate 1 cmd 0 8 TRUNCATE TABLE poller_time)
granted: (583918 seqno 5154114 mode 0 Qstate 1 cmd 3 5 INSERT INTO poller_time (poller_id, pid, start_time, end_time) VALUES (0, 19630, NOW()
120419 14:46:47 [Note] /opt/mysql-
120419 14:46:47 [Note] WSREP: Stop replication
120419 14:46:47 [Note] WSREP: Closing send monitor...
120419 14:46:47 [Note] WSREP: Closed send monitor.
120419 14:46:47 [Note] WSREP: gcomm: terminating thread
120419 14:46:47 [Note] WSREP: gcomm: joining thread
120419 14:46:47 [Note] WSREP: gcomm: closing backend
120419 14:46:47 [Note] WSREP: evs::proto(
120419 14:46:47 [Note] WSREP: evs::proto(
120419 14:46:47 [Note] WSREP: GMCast:
} joined {
} left {
} partitioned {
})
120419 14:46:47 [Note] WSREP: New COMPONENT: primary = no, my_idx = 0, memb_num = 1
120419 14:46:47 [Note] WSREP: GMCast:
120419 14:46:47 [Note] WSREP: gcomm: closed
120419 14:46:47 [Note] WSREP: Flow-control interval: [8, 16]
120419 14:46:47 [Note] WSREP: Received NON-PRIMARY.
120419 14:46:47 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 5160086)
120419 14:46:47 [Note] WSREP: Received self-leave message.
120419 14:46:47 [Note] WSREP: Flow-control interval: [0, 0]
120419 14:46:47 [Note] WSREP: Received SELF-LEAVE. Closing connection.
120419 14:46:47 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 5160086)
120419 14:46:47 [Note] WSREP: RECV thread exiting 0: Success
120419 14:46:47 [Note] WSREP: recv_thread() joined.
120419 14:46:47 [Note] WSREP: Closing slave action queue.
120419 14:46:47 [Note] WSREP: New cluster view: global state: 61408137-
120419 14:46:48 [Note] WSREP: New cluster view: global state: 61408137-
120419 14:46:48 [Note] WSREP: applier thread exiting (code:0)
120419 14:46:49 [Note] WSREP: SST kill local trx: 588248
120419 14:46:49 [Note] WSREP: rollbacker thread exiting
120419 14:46:49 [Note] Event Scheduler: Purging the queue. 0 events
120419 14:46:49 [Note] WSREP: dtor state: CLOSED
120419 14:46:49 [Note] WSREP: mon: entered 3 oooe fraction 0 oool fraction 0
120419 14:46:49 [Note] WSREP: mon: entered 3 oooe fraction 0 oool fraction 0
120419 14:46:49 [Note] WSREP: mon: entered 930739 oooe fraction 0 oool fraction 0.000114962
120419 14:46:49 [Note] WSREP: cert index usage at exit 804
120419 14:46:49 [Note] WSREP: cert trx map usage at exit 309
120419 14:46:49 [Note] WSREP: deps set usage at exit 0
120419 14:46:49 [Note] WSREP: avg deps dist 330.854
120419 14:46:49 [Note] WSREP: wsdb trx map usage 0 conn query map usage 0
120419 14:46:49 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 5160086)
120419 14:46:49 [Note] WSREP: Flushing memory map to disk...
120419 14:46:50 [Note] unregister_
120419 14:46:50 InnoDB: Starting shutdown...
120419 14:46:53 InnoDB: Shutdown completed; log sequence number 12275710641
120419 14:46:53 [Note] /opt/mysql-
120419 14:46:54 mysqld_safe mysqld from pid file /opt/mysql-
120419 14:47:38 mysqld_safe Starting mysqld daemon with databases from /opt/mysql-
120419 14:47:38 [Note] WSREP: wsrep_load(): loading provider library '/opt/mysql-
120419 14:47:38 [Note] WSREP: wsrep_load(): Galera 2.0(rXXXX) by Codership Oy <email address hidden> loaded succesfully.
120419 14:47:38 [Note] WSREP: Reusing existing '/opt/mysql-
120419 14:47:38 [Note] WSREP: Passing config to GCS: gcache.dir = /opt/mysql-
120419 14:47:38 [Note] WSREP: wsrep_sst_grab()
120419 14:47:38 [Note] WSREP: Start replication
120419 14:47:38 [Note] WSREP: Found saved state: 61408137-
120419 14:47:38 [Note] WSREP: Assign initial position for certification: 5160086, protocol version: -1
120419 14:47:38 [Note] WSREP: Setting initial position to 61408137-
120419 14:47:38 [Note] WSREP: protonet asio version 0
120419 14:47:38 [Note] WSREP: backend: asio
120419 14:47:38 [Note] WSREP: GMCast version 0
120419 14:47:38 [Note] WSREP: (d86affd3-
120419 14:47:38 [Note] WSREP: (d86affd3-
120419 14:47:38 [Note] WSREP: EVS version 0
120419 14:47:38 [Note] WSREP: PC version 0
120419 14:47:38 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '192.168.0.5:4567'
120419 14:47:39 [Note] WSREP: GMCast:
} joined {
} left {
} partitioned {
})
120419 14:47:39 [Note] WSREP: declaring c61c9f05-
120419 14:47:39 [Note] WSREP: declaring d2e4fb87-
120419 14:47:39 [Note] WSREP: gcomm: connected
120419 14:47:39 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
120419 14:47:39 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
120419 14:47:39 [Note] WSREP: Opened channel 'my_wsrep_cluster'
120419 14:47:39 [Note] WSREP: New COMPONENT: primary = yes, my_idx = 2, memb_num = 3
120419 14:47:39 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
120419 14:47:39 [Note] WSREP: Waiting for SST to complete.
120419 14:47:39 [Note] WSREP: STATE EXCHANGE: sent state msg: d911f3b9-
120419 14:47:39 [Note] WSREP: STATE EXCHANGE: got state msg: d911f3b9-
120419 14:47:39 [Note] WSREP: STATE EXCHANGE: got state msg: d911f3b9-
120419 14:47:39 [Note] WSREP: STATE EXCHANGE: got state msg: d911f3b9-
120419 14:47:39 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 63,
members = 2/3 (joined/total),
act_id = 5160427,
last_appl. = -1,
protocols = 0/3/1 (gcs/repl/appl),
group UUID = 61408137-
120419 14:47:39 [Note] WSREP: Flow-control interval: [14, 28]
120419 14:47:39 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 5160427)
120419 14:47:39 [Note] WSREP: State transfer required:
Group state: 61408137-
Local state: 61408137-
120419 14:47:39 [Note] WSREP: New cluster view: global state: 61408137-
120419 14:47:39 [Warning] WSREP: Gap in state sequence. Need state transfer.
120419 14:47:41 [Note] WSREP: Running: 'wsrep_sst_rsync 'joiner' '192.168.0.1' 'sst:5T13wPid' '/opt/mysql-
120419 14:47:42 [Note] WSREP: Prepared SST request: rsync|192.
ERROR 2003 (HY000): Can't connect to MySQL server on 'cygnus' (111)
120419 14:47:42 [Note] WSREP: Assign initial position for certification: 5160427, protocol version: 2
120419 14:47:42 [Note] WSREP: Prepared IST receiver, listening at: tcp://192.
120419 14:47:42 [Note] WSREP: Node 2 (cygnus.
120419 14:47:42 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 5160434)
120419 14:47:42 [Note] WSREP: Requesting state transfer: success, donor: 0
120419 14:47:44 [Note] WSREP: SST complete, seqno: 5160086
120419 14:47:44 InnoDB: The InnoDB memory heap is disabled
120419 14:47:44 InnoDB: Mutexes and rw_locks use GCC atomic builtins
120419 14:47:44 InnoDB: Compressed tables use zlib 1.2.3
120419 14:47:44 InnoDB: Using Linux native AIO
120419 14:47:44 InnoDB: Initializing buffer pool, size = 512.0M
120419 14:47:44 InnoDB: Completed initialization of buffer pool
120419 14:47:44 InnoDB: highest supported file format is Barracuda.
120419 14:47:45 InnoDB: Waiting for the background threads to start
120419 14:47:46 InnoDB: 1.1.8 started; log sequence number 12275710641
120419 14:47:47 [Note] Event Scheduler: Loaded 0 events
120419 14:47:47 [Note] WSREP: Signalling provider to continue.
120419 14:47:47 [Note] WSREP: Received SST: 61408137-
120419 14:47:47 [Note] /opt/mysql-
Version: '5.5.20-log' socket: '/opt/mysql-
120419 14:47:47 [Note] WSREP: SST received: 61408137-
120419 14:47:47 [Note] WSREP: Receiving IST: 341 writesets, seqnos 5160086-5160427
node2's log :
120419 10:50:02 [Note] WSREP: MDL conflict -> BF abort
request: (1 seqno 5080636 mode 1 Qstate 1 cmd 0 8 TRUNCATE TABLE poller_time)
granted: (346474 seqno 5080637 mode 0 Qstate 1 cmd 3 5 INSERT INTO poller_time (poller_id, pid, start_time, end_time) VALUES (0, 14193, NOW()
120419 14:46:47 [Note] WSREP: GMCast:
} joined {
} left {
} partitioned {
})
120419 14:46:47 [Note] WSREP: New COMPONENT: primary = yes, my_idx = 0, memb_num = 2
120419 14:46:47 [Note] WSREP: forgetting 2d6df99a-
120419 14:46:47 [Note] WSREP: declaring d2e4fb87-
120419 14:46:47 [Note] WSREP: STATE_EXCHANGE: sent state UUID: ba05ef61-
120419 14:46:47 [Note] WSREP: STATE EXCHANGE: sent state msg: ba05ef61-
120419 14:46:47 [Note] WSREP: STATE EXCHANGE: got state msg: ba05ef61-
120419 14:46:47 [Note] WSREP: STATE EXCHANGE: got state msg: ba05ef61-
120419 14:46:47 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 62,
members = 2/2 (joined/total),
act_id = 5160086,
last_appl. = 5159779,
protocols = 0/3/1 (gcs/repl/appl),
group UUID = 61408137-
120419 14:46:47 [Note] WSREP: Flow-control interval: [12, 23]
120419 14:46:47 [Note] WSREP: New cluster view: global state: 61408137-
120419 14:46:48 [Note] WSREP: Assign initial position for certification: 5160086, protocol version: 2
120419 14:46:52 [Note] WSREP: cleaning up 2d6df99a-
120419 14:47:39 [Note] WSREP: GMCast:
} joined {
} left {
} partitioned {
})
120419 14:47:39 [Note] WSREP: New COMPONENT: primary = yes, my_idx = 0, memb_num = 3
120419 14:47:39 [Note] WSREP: declaring d2e4fb87-
120419 14:47:39 [Note] WSREP: declaring d86affd3-
120419 14:47:39 [Note] WSREP: STATE_EXCHANGE: sent state UUID: d911f3b9-
120419 14:47:39 [Note] WSREP: STATE EXCHANGE: sent state msg: d911f3b9-
120419 14:47:39 [Note] WSREP: STATE EXCHANGE: got state msg: d911f3b9-
120419 14:47:39 [Note] WSREP: STATE EXCHANGE: got state msg: d911f3b9-
120419 14:47:39 [Note] WSREP: STATE EXCHANGE: got state msg: d911f3b9-
120419 14:47:39 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 63,
members = 2/3 (joined/total),
act_id = 5160427,
last_appl. = 5160161,
protocols = 0/3/1 (gcs/repl/appl),
group UUID = 61408137-
120419 14:47:39 [Note] WSREP: Flow-control interval: [14, 28]
120419 14:47:39 [Note] WSREP: New cluster view: global state: 61408137-
120419 14:47:40 [Note] WSREP: Assign initial position for certification: 5160427, protocol version: 2
120419 14:47:42 [Note] WSREP: Node 2 (cygnus.
120419 14:47:42 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 5160434)
120419 14:47:42 [Note] WSREP: IST request: 61408137-
120419 14:47:43 [Note] WSREP: Running: 'wsrep_sst_rsync 'donor' '192.168.
120419 14:47:43 [Note] WSREP: sst_donor_thread signaled with 0
120419 14:47:43 [Note] WSREP: async IST sender starting to serve tcp://192.
node1's netstat :
# netstat -lpvnt | grep -iE "(3306|4567|4568)"
tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN 6751/mysqld
tcp 0 0 192.168.0.1:3306 0.0.0.0:* LISTEN 6751/mysqld
node2's netstat :
# netstat -lpvnt | grep -iE "(3306|4567|4568)"
tcp 0 0 192.168.0.5:3306 0.0.0.0:* LISTEN 23714/mysqld
tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN 23714/mysqld
Related branches
Changed in galera: | |
importance: | Undecided → High |
milestone: | none → 23.2.1 |
status: | New → Confirmed |
Changed in galera: | |
status: | Fix Committed → Fix Released |
Just out of curiosity: It seems Galera is trying to do both IST and SST at the same time? Is this normal or part of the bug?