gu_abort in gcs_core_recv during SST leads to dangling child processes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Galera | Invalid | Undecided | Unassigned |
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC) | Confirmed | Critical | Unassigned |
Bug Description
Pasting verbatim from https:/
=======
140316 2:58:09 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 235864)
140316 2:58:09 [Note] WSREP: Requesting state transfer: success, donor: 0
140316 2:58:09 [Warning] WSREP: 0 (Arch1): State transfer to 1 (Arch2) failed: -2 (No such file or directory)
140316 2:58:09 [ERROR] WSREP: gcs/src/
140316 2:58:09 [Note] WSREP: gcomm: terminating thread
140316 2:58:09 [Note] WSREP: gcomm: joining thread
140316 2:58:09 [Note] WSREP: gcomm: closing backend
140316 2:58:10 [Note] WSREP: view(view_
} joined {
} left {
} partitioned {
})
140316 2:58:10 [Note] WSREP: view((empty))
140316 2:58:10 [Note] WSREP: gcomm: closed
140316 2:58:10 [Note] WSREP: /pxcd/bin/mysqld: Terminated.
This is an issue in Galera's gcs layer: it terminates the process with gu_abort without performing any shutdown of mysqld, not even an ungraceful one.
void
gu_abort (void)
{
    /* avoid coredump */
    struct rlimit core_limits = { 0, 0 };
    setrlimit (RLIMIT_CORE, &core_limits);
    /* restore default SIGABRT handler */
    signal (SIGABRT, SIG_DFL);
#if defined(
    gu_info ("%s: Terminated.", GU_SYS_
#else
    gu_info ("Program terminated.");
#endif
    abort();
}
As a result, no signal is sent to the child processes (wsrep_sst_*), so they keep running after mysqld has terminated.
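The effect described above can be reproduced outside Galera with a minimal POSIX sketch. The names abort_like_gu_abort and child_survives_parent_abort are hypothetical and not part of Galera or PXC. One process stands in for mysqld and another for a wsrep_sst_* script. The stand-in parent aborts the same way gu_abort does (logging omitted), and the check then asks whether the child PID still exists.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for gu_abort(): disable core dumps, restore the
 * default SIGABRT action, then abort. Logging is omitted. */
static void abort_like_gu_abort(void)
{
    struct rlimit core_limits = { 0, 0 };
    setrlimit(RLIMIT_CORE, &core_limits);
    signal(SIGABRT, SIG_DFL);
    abort();
}

/* Returns 1 if a child process is still alive after its parent aborted,
 * 0 if it is gone, -1 on setup errors. */
int child_survives_parent_abort(void)
{
    int pidpipe[2];
    if (pipe(pidpipe) != 0) return -1;

    pid_t parent = fork();
    if (parent < 0) return -1;

    if (parent == 0) {
        /* This process plays the role of mysqld. */
        close(pidpipe[0]);
        pid_t child = fork();
        if (child == 0) {
            /* This process plays the role of a wsrep_sst_* script. */
            close(pidpipe[1]);
            sleep(60);
            _exit(0);
        }
        /* Report the child's PID, then die without signalling it. */
        if (write(pidpipe[1], &child, sizeof child) != (ssize_t)sizeof child)
            _exit(1);
        abort_like_gu_abort();
    }

    close(pidpipe[1]);
    pid_t child = -1;
    ssize_t n = read(pidpipe[0], &child, sizeof child);
    close(pidpipe[0]);

    int status = 0;
    if (waitpid(parent, &status, 0) != parent) return -1;
    if (n != (ssize_t)sizeof child || child <= 0) return -1;

    /* The stand-in parent must have been killed by SIGABRT. */
    assert(WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT);

    /* Signal 0 delivers nothing; it only checks that the PID exists. */
    int alive = (kill(child, 0) == 0);
    if (alive) kill(child, SIGKILL);   /* clean up the orphan */
    return alive;
}
```

Calling child_survives_parent_abort() from a small main and printing the result is enough to try it. The sketch only illustrates that abort() in a parent does not by itself notify or terminate its children; it does not model the SST scripts themselves.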
The gu_abort call is made in gcs_core_recv:
if (ret < 0) {
    assert (recv_act->id < 0);
    if (GCS_ACT_TORDERED == recv_act->act.type && recv_act->act.buf) {
    }
    if (-ENOTRECOVERABLE == ret) {
    }
(gdb) bt
#0 0x00007ffff637c389 in raise () from /usr/lib/libc.so.6
#1 0x00007ffff637d788 in abort () from /usr/lib/libc.so.6
#2 0x00007ffff5389477 in gu_abort () at galerautils/
#3 0x00007ffff546c2a2 in gcs_core_recv (conn=0x11d3000, recv_act=
#4 0x00007ffff5470d6c in gcs_recv_thread (arg=0x11d2de0) at gcs/src/gcs.c:1116
#5 0x00007ffff736f0a2 in start_thread () from /usr/lib/
#6 0x00007ffff642cd1d in clone () from /usr/lib/libc.so.6
(gdb) quit
A debugging session is active.
Changed in percona-xtradb-cluster:
status: New → Invalid
status: Invalid → Confirmed
importance: Undecided → Critical
Related comment: https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1284245/comments/14