Server crashes when wsrep_cluster_address is changed and wsrep_sst_method is not mysqldump

Bug #600250 reported by Alex Yurchenko
This bug affects 1 person
Affects: MySQL patches by Codership
Status: Fix Released
Importance: Undecided
Assigned to: Alex Yurchenko

Bug Description

The current code crashes badly when wsrep_cluster_address is changed: one thread tries to initialize the wsrep provider while another tries to shut down mysqld, because a non-mysqldump snapshot is not supported once the storage engines are initialized.

Error log:
100629 16:21:14 [Note] WSREP: Stop replication
100629 16:21:14 [Note] WSREP: Closing send monitor...
100629 16:21:14 [Note] WSREP: Closed send monitor.
100629 16:21:14 [Note] WSREP: gcomm: terminating thread
100629 16:21:14 [Note] WSREP: gcomm: joining thread
100629 16:21:14 [Note] WSREP: gcomm: closing backend
100629 16:21:14 [Note] WSREP: New COMPONENT: primary = no, my_idx = 0, memb_num = 1
100629 16:21:14 [Warning] WSREP: socket in state 0
100629 16:21:14 [Note] WSREP: gcomm: closed
100629 16:21:14 [Note] WSREP: Flow-control interval: [0, 1]
100629 16:21:14 [Note] WSREP: Received NON-PRIMARY.
100629 16:21:14 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 565558)
100629 16:21:14 [Note] WSREP: Received self-leave message.
100629 16:21:14 [Note] WSREP: Flow-control interval: [-6917529027641081856, -9223372036854775808]
100629 16:21:14 [Note] WSREP: Received SELF-LEAVE. Closing connection.
100629 16:21:14 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 565558)
100629 16:21:14 [Note] WSREP: RECV thread exiting 0: Success
100629 16:21:14 [Note] WSREP: recv_thread() joined.
100629 16:21:14 [Note] WSREP: Closing slave action queue.
100629 16:21:14 [Note] WSREP: Closed GCS connection
100629 16:21:14 [Note] WSREP: gcs_recv() returned -77 (File descriptor in bad state)
...
100629 16:21:14 [Note] wsrep recv thread exiting (code:5)
100629 16:21:14 [Note] WSREP starting shutdown
...
100629 16:21:14 [Note] /tmp/galera/mysql/libexec/mysqld: Normal shutdown

100629 16:21:14 [Note] WSREP: gcs_recv() returned -77 (File descriptor in bad state)
100629 16:21:14 [Note] WSREP: Stop replication
...
100629 16:21:14 [Note] wsrep recv thread exiting (code:5)
100629 16:21:14 [Note] WSREP: mm_galera_recv(): return 0
100629 16:21:14 [Note] wsrep recv thread exiting (code:0)
100629 16:21:16 [Note] WSREP: rollbacker thread exiting
100629 16:21:16 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 565558)
...
*** glibc detected *** /tmp/galera/mysql/libexec/mysqld: corrupted double-linked list: 0x000000000ddf3370 ***
100629 16:21:16 [Note] WSREP:
100629 16:21:16 [Note] WSREP: Start replication
100629 16:21:16 [Note] WSREP: Provider options: log_debug = 0; persistent_writesets = 0; local_cache_size = 20971520; dbug_spec = ;
100629 16:21:16 [Note] WSREP: Configured state: 7cbe6e26-83c6-11df-0800-ef472a8480ef:565558

Changing wsrep_cluster_address should be supported unless a snapshot really has to be taken afterwards (when the goal is to get out of split brain, it does not).

Suggested solution: mimic a successful SST (since we already have the state UUID and state seqno) and rely on the provider logic to discover that the current state UUID and seqno don't correspond to the group's.
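A rough sketch of that idea, in Python for brevity rather than the server's own language. Every name here (NodeState, state_matches_group, on_cluster_address_change) is hypothetical and not part of the wsrep API, and treating "corresponds" as "same UUID and same seqno" is an assumption. The point is only that the node reports the state it already has, as if an SST had just completed, and the comparison against the group state decides what happens next, instead of shutting mysqld down.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class NodeState:
    """Hypothetical (state UUID, seqno) pair; not a real wsrep type."""
    uuid: UUID
    seqno: int

    @classmethod
    def parse(cls, text: str) -> "NodeState":
        # Format as printed in the error log: "<uuid>:<seqno>"
        uuid_part, seqno_part = text.rsplit(":", 1)
        return cls(UUID(uuid_part), int(seqno_part))


def state_matches_group(local: NodeState, group: NodeState) -> bool:
    # Assumption: the states correspond only if both history (UUID)
    # and position (seqno) are identical.
    return local.uuid == group.uuid and local.seqno == group.seqno


def on_cluster_address_change(local: NodeState, group: NodeState) -> str:
    # Report the local state as if an SST had succeeded, then let the
    # comparison decide, rather than requesting a server shutdown.
    if state_matches_group(local, group):
        return "joined"
    return "state transfer required"
```

With the configured state from the log above, NodeState.parse("7cbe6e26-83c6-11df-0800-ef472a8480ef:565558") compared against an identical group state yields "joined"; any other group UUID or seqno yields "state transfer required".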

Changed in codership-mysql:
assignee: nobody → Alex Yurchenko (ayurchen)
milestone: none → 0.8
status: New → Confirmed
Alex Yurchenko (ayurchen) wrote :

duplicate of lp:711993

Changed in codership-mysql:
status: Confirmed → Fix Released