wsrep_sst_xtrabackup-v2 doesn't stop when mysql is SIGKILLed

Bug #1380697 reported by Sergii Golovatiuk on 2014-10-13
This bug affects 3 people
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status tracked in 5.6

Series  Status        Importance  Assigned to
5.5     Fix Released  Medium      Raghavendra D Prabhu
5.6     Fix Released  Medium      Raghavendra D Prabhu

Bug Description

I see that all of the xtrabackup-related SST processes are still running on the system, even though the mysqld process was killed by Pacemaker. MySQL also cannot initiate a new transfer, because port 4444 is already in use.

How to reproduce

Run an SST between the nodes.
While the SST is in progress, kill mysqld with SIGKILL (kill -9 <mysqld PID>).

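Expected
No socat, wsrep_sst_xtrabackup-v2 or xbstream processes remain, i.e. the following command lists nothing apart from the egrep process itself:
ps aux | egrep 'socat|wsrep_sst_xtrabackup-v2|xbstream'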

Got
ps aux | egrep 'socat|wsrep_sst_xtrabackup-v2|xbstream'
mysql 7036 0.0 0.0 4408 612 ? S 15:18 0:00 sh -c wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.0.4' --auth 'wsrep_sst:******' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '7019' ''
mysql 7037 0.0 0.0 9712 1740 ? S 15:18 0:00 /bin/bash -ue /usr//bin/wsrep_sst_xtrabackup-v2 --role joiner --address 192.168.0.4 --auth wsrep_sst:password --datadir /var/lib/mysql/ --defaults-file /etc/my.cnf --parent 7019
mysql 7323 0.0 0.0 26220 1768 ? S 15:18 0:00 socat -u TCP-LISTEN:4444,reuseaddr,nodelay,sndbuf=1048576,rcvbuf=1048576 stdio
mysql 7324 0.0 0.0 84696 1852 ? S 15:18 0:00 xbstream -x

As a temporary workaround, you can kill all of these processes and re-initiate SST.
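The workaround above can be scripted roughly as follows. This is a minimal sketch, not part of PXC: kill_stale_sst and port_in_use are hypothetical helper names, and the sketch assumes pgrep (procps) and ss (iproute2) are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the manual workaround: kill the SST processes that
# outlived mysqld, then check that nothing still listens on the SST port.
# Assumptions: pgrep (procps) and ss (iproute2) are installed; the user,
# regex and port are passed in rather than hard-coded.

# kill_stale_sst USER REGEX: SIGKILL every process owned by USER whose full
# command line matches the extended regular expression REGEX.
kill_stale_sst() {
    local user=$1 pattern=$2 pids
    pids=$(pgrep -u "$user" -f "$pattern") || return 0   # nothing matched
    echo "killing stale SST processes:" $pids
    # $pids is unquoted on purpose so that each PID becomes one argument.
    kill -9 $pids
}

# port_in_use PORT: succeed if some socket is listening on TCP port PORT.
port_in_use() {
    # tail drops the header line that ss always prints.
    ss -ltn "sport = :$1" | tail -n +2 | grep -q .
}
```

For the processes shown above, this would be run as root as kill_stale_sst mysql 'socat|wsrep_sst_xtrabackup-v2|xbstream', followed by port_in_use 4444 || echo "port 4444 is free". The regex is the one from the egrep command above, so it also matches any unrelated socat started by the mysql user.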


Vladimir Kuklin (vkuklin) wrote :

I see xtrabackup issuing select() syscalls on a file descriptor that is not open according to the /proc filesystem.

Here is an output snippet:

root@node-11:~# ps aux | grep mysql
mysql 7036 0.0 0.0 4408 612 ? S 15:18 0:00 sh -c wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.0.4' --auth 'wsrep_sst:password' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '7019' ''
mysql 7037 0.0 0.0 9712 1740 ? S 15:18 0:00 /bin/bash -ue /usr//bin/wsrep_sst_xtrabackup-v2 --role joiner --address 192.168.0.4 --auth wsrep_sst:password --datadir /var/lib/mysql/ --defaults-file /etc/my.cnf --parent 7019
mysql 7323 0.0 0.0 26220 1768 ? S 15:18 0:00 socat -u TCP-LISTEN:4444,reuseaddr,nodelay,sndbuf=1048576,rcvbuf=1048576 stdio
mysql 7324 0.0 0.0 84696 1852 ? S 15:18 0:00 xbstream -x
root 19155 0.0 0.0 12708 1940 ? S 15:42 0:00 /bin/bash /usr/lib/ocf/resource.d/mirantis/mysql-wss start
root 21767 0.0 0.0 9396 928 pts/3 S+ 15:45 0:00 grep --color=auto mysql
root@node-11:~# netstat -ntlp | grep 7323
root@node-11:~# ps aux | grep socat
mysql 7323 0.0 0.0 26220 1768 ? S 15:18 0:00 socat -u TCP-LISTEN:4444,reuseaddr,nodelay,sndbuf=1048576,rcvbuf=1048576 stdio
root 22040 0.0 0.0 9396 928 pts/3 S+ 15:46 0:00 grep --color=auto socat
root@node-11:~# netstat -nlp | grep 4444
root@node-11:~# strace -f -p 7323
Process 7323 attached - interrupt to quit
select(12, [11], [], [], NULL^C <unfinished ...>
Process 7323 detached
root@node-11:~# cat /proc/7323/fd
fd/ fdinfo/
root@node-11:~# cat /proc/7323/fd/
0 1 10 11 14 15 2 3 5 6 7 8 9

As you can see, the SST process (socat, PID 7323) is blocked in a select() syscall on a socket that no longer exists.
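To check what descriptor 11 of PID 7323 actually points to, /proc can be queried directly. A small sketch, assuming Linux /proc and root (or process-owner) privileges; fd_target, socket_inode and tcp_entry are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to resolve what a process's file descriptor refers to,
# e.g. descriptor 11 of PID 7323 in the strace output above.
# Assumptions: Linux /proc; caller is root or the owner of the process.

# fd_target PID FD: print the link target, e.g. "socket:[123456]" or a path.
fd_target() {
    readlink "/proc/$1/fd/$2"
}

# socket_inode TARGET: print INODE for a "socket:[INODE]" target, else nothing.
socket_inode() {
    local s
    case $1 in
        socket:\[*\])
            s=${1#socket:\[}
            echo "${s%\]}"
            ;;
    esac
}

# tcp_entry INODE: print the /proc/net/tcp or /proc/net/tcp6 line for that
# socket inode (column 10 is the inode; state 0A in column 4 means LISTEN).
tcp_entry() {
    awk -v ino="$1" '$10 == ino' /proc/net/tcp /proc/net/tcp6 2>/dev/null
}
```

For example: t=$(fd_target 7323 11); echo "$t"; tcp_entry "$(socket_inode "$t")". An empty tcp_entry result means the inode does not belong to a TCP socket.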

Alexey Kopytov (akopytov) wrote :

Judging from the symptoms, the correct bug summary should be "wsrep_sst_xtrabackup-v2 doesn't stop when mysql is killed". I'm also fairly sure this applies to other SST methods, not just the XtraBackup one.

Changing the summary and reassigning to the Percona XtraDB Cluster project.

summary: - xtrabackup doesn't stop when mysql is killed
+ wsrep_sst_xtrabackup-v2 doesn't stop when mysql is killed
affects: percona-xtrabackup → percona-xtradb-cluster

When mysqld is killed with "kill -9", it gets no opportunity to clean up its children and grandchildren.

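If you need to kill the entire process group of mysqld/mysqld_safe, including all of its children, run:

 kill -9 -$(ps --no-headers -o pgid -C mysqld_safe | tr -d ' ')

Note the '-' before the PGID: a negative PID tells kill to send SIGKILL to every process in that process group. The tr strips the padding spaces that ps can put in front of the PGID, which would otherwise split -PGID into two separate arguments.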
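The difference between killing one PID and killing a process group can be seen with a toy process tree. This sketch assumes bash (it uses set -m so that the background job gets its own process group) and procps pgrep/pkill; the sleep tree and the helper names count_sleepers and kill_tree are hypothetical stand-ins, not PXC code.

```shell
#!/usr/bin/env bash
# Toy demonstration: SIGKILL to a single PID orphans that process's children,
# while SIGKILL to a process group (negative PID) reaches every member.
# The tree (a bash leader with two "sleep N" children) is a hypothetical
# stand-in for mysqld_safe -> mysqld -> SST script -> socat/xbstream.

# count_sleepers N: print how many live processes run exactly "sleep N".
count_sleepers() {
    pgrep -fc "^sleep $1\$" || true           # pgrep exits 1 when the count is 0
}

# kill_tree MODE N: start a toy tree in its own process group, SIGKILL it
# ("pid" = leader only, "group" = whole process group) and store the number
# of surviving "sleep N" processes in SURVIVORS.
kill_tree() {
    local mode=$1 n=$2 leader
    set -m                                    # job control: the job gets its own PGID
    bash -c "sleep $n & sleep $n & wait" >/dev/null 2>&1 &
    leader=$!                                 # job PGID == PID of its leader
    set +m
    sleep 0.5                                 # give the children time to start
    if [ "$mode" = group ]; then
        kill -9 "-$leader"                    # negative PID: every group member
    else
        kill -9 "$leader"                     # leader only, like kill -9 <mysqld PID>
    fi
    wait "$leader" 2>/dev/null
    sleep 0.2                                 # let SIGKILL reach the other members
    SURVIVORS=$(count_sleepers "$n")
}

kill_tree pid 4242
echo "leader only:   $SURVIVORS sleepers left"
pkill -f '^sleep 4242$'                       # remove the orphans from this case
kill_tree group 4343
echo "process group: $SURVIVORS sleepers left"
```

The first case should leave both sleepers running as orphans, which is the situation described in this bug; the second should leave none.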

summary: - wsrep_sst_xtrabackup-v2 doesn't stop when mysql is killed
+ wsrep_sst_xtrabackup-v2 doesn't stop when mysql is SIGKILLed
Alexey Kopytov (akopytov) wrote :

As discussed elsewhere, we can use prctl(PR_SET_PDEATHSIG, ...), a Linux-specific call that instructs the kernel to send a specified signal to the calling process when its parent process dies, for whatever reason.

With some modifications to the SST code, we can make the SST process clean up its children when the parent process (mysqld) dies.
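prctl() cannot be called from a bash script such as wsrep_sst_xtrabackup-v2, so the following sketch swaps in a different technique: a polling watchdog that checks the parent PID and kills the guarded helpers once the parent is gone. It is only an illustration under stated assumptions (procps ps, and a known parent PID such as the --parent value passed to the SST script); parent_alive and watch_parent are hypothetical names, and this is not necessarily how the released fix works.

```shell
#!/usr/bin/env bash
# Polling watchdog as a stand-in for prctl(PR_SET_PDEATHSIG), which bash
# cannot call. Hypothetical helpers; assumes procps ps and a known parent PID.

# parent_alive PID: succeed while PID exists and is not a zombie.
parent_alive() {
    local stat
    stat=$(ps -o stat= -p "$1") || return 1    # no such process
    case $stat in
        *Z*) return 1 ;;                       # exited but not yet reaped
    esac
}

# watch_parent PARENT INTERVAL PID...: poll PARENT every INTERVAL seconds and
# SIGKILL the listed PIDs as soon as PARENT has died.
watch_parent() {
    local parent=$1 interval=$2
    shift 2
    while parent_alive "$parent"; do
        sleep "$interval"
    done
    kill -9 "$@" 2>/dev/null
}
```

Inside an SST script this would be started in the background, e.g. watch_parent "$PARENT_PID" 1 "$SOCAT_PID" "$XBSTREAM_PID" &, where PARENT_PID, SOCAT_PID and XBSTREAM_PID are hypothetical variables. Unlike PR_SET_PDEATHSIG, which the kernel delivers as soon as the parent exits, polling adds up to INTERVAL seconds of delay and can be fooled if the parent's PID is reused.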

This can be reproduced and appears to be fixable, so the status was changed to "Confirmed".

Alexey Kopytov (akopytov) wrote :

This is not a duplicate of bug #1382797. It's just that fixing bug #1382797 is a prerequisite to fixing this one.

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1113
