SST fails with misleading error and crash
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
MySQL patches by Codership |
Fix Released
|
Low
|
Alex Yurchenko |
Bug Description
When sst fails because wsrep_sst_rsync is not found,
we have next error.log:
110710 9:36:49 [Note] WSREP: New cluster view: group UUID: 8d0cd8e8-
110710 9:36:49 [Warning] WSREP: Gap in state sequence. Need state transfer.
110710 9:36:51 [Note] WSREP: Running: 'wsrep_sst_rsync 'joiner' '10.11.12.206' 'root:' '/data/bench/gal/' '21614' 2>sst.err'
110710 9:36:51 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync 'joiner' '10.11.12.206' 'root:' '/data/bench/gal/' '21614' 2>sst.err
Read: '(null)'
110710 9:36:51 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync 'joiner' '10.11.12.206' 'root:' '/data/bench/gal/' '21614' 2>sst.err: 127 (Key has expired)
110710 9:36:51 [ERROR] WSREP: SST failed: 127 (Key has expired)
110710 9:36:51 [ERROR] Aborting
110710 9:36:51 [Note] WSREP: Closing send monitor...
110710 9:36:51 [ERROR] WSREP: SST preparation failed: 127 (Key has expired)
110710 9:36:51 [Note] WSREP: Closed send monitor.
110710 9:36:51 [Warning] WSREP: wsrep_notify_cmd is not defined, skipping notification.
110710 9:36:51 [Note] WSREP: gcomm: terminating thread
110710 9:36:51 [ERROR] WSREP: View callback failed: 127 (Key has expired. This is unrecoverable, restart required.
110710 9:36:51 [Note] WSREP: Closing send monitor...
110710 9:36:51 [Note] WSREP: Closed send monitor.
110710 9:36:51 [Note] WSREP: gcomm: joining thread
110710 9:36:51 [Note] WSREP: gcomm: closing backend
110710 9:36:51 [Note] WSREP: evs::proto(
110710 9:36:51 [Note] WSREP: GMCast:
} joined {
} left {
} partitioned {
})
110710 9:36:51 [Note] WSREP: GMCast:
110710 9:36:51 [Note] WSREP: New COMPONENT: primary = no, my_idx = 0, memb_num = 1
110710 9:36:51 [Note] WSREP: gcomm: closed
110710 9:36:51 [Warning] WSREP: gcomm: backend already closed
110710 9:36:51 [ERROR] WSREP: gcs/src/
110710 9:36:51 [Note] WSREP: recv_thread() joined.
110710 9:36:51 [Warning] WSREP: Broken shutdown sequence: GCS connection state is PRIMARY, expected CLOSED
110710 9:36:51 [Note] WSREP: Shifting PRIMARY -> CLOSED (TO: 0)
110710 9:36:51 [Note] WSREP: Closing slave action queue.
110710 9:36:51 [Note] WSREP: libexec/mysqld: Terminated.
110710 9:36:51 [Note] WSREP: Flow-control interval: [8, 16]
110710 9:36:51 [Note] WSREP: Received NON-PRIMARY.
Aborted (core dumped)
There are two problems:
1. Error message [ERROR] WSREP: SST preparation failed: 127 (Key has expired)
is misleading, as "Key has expired" is totally confusing
2. MySQL ended up with coredump, but not proper exit. I even do not think mysqld should stop,
as later we may try to connect to another cluster or using different SST method
affects: | galera → codership-mysql |
Changed in codership-mysql: | |
assignee: | nobody → Alex Yurchenko (ayurchen) |
importance: | Undecided → Low |
status: | New → Fix Committed |
milestone: | none → 0.8.1 |
Changed in codership-mysql: | |
status: | Fix Committed → Fix Released |
Vadim,
1. Error code comes from the pclose() system call. Not really sure what to do about it. It is displayed for reference, should it be hidden instead?
2. To begin with, mysqld has not really started yet. In general if SST preparation call failed, there is no certainty about mysql data dir state. It could be wiped out and further initialization might be impossible. But the real problem is that you know that you can't use rsync or xtrabackup (or anything but mysqldump) when server is initialized. So there is little hope for recovery later and it is really better to stop mysqld before it really started.
3. Did it really dump core?