If donor is killed during xtrabackup SST, some processes keep running

Bug #1138439 reported by Vadim Tkachenko
This bug affects 2 people
Affects                                                                  Status         Importance  Assigned to  Milestone
MySQL patches by Codership                                               New            Undecided   Unassigned
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)  Fix Committed  Medium      Unassigned

Bug Description

If I kill the donor during an xtrabackup SST, the SST helper processes keep running:

on donor:

mysql 25318 0.0 0.0 106056 1488 pts/0 S 14:46 0:00 /bin/bash -ue /usr//bin/wsrep_sst_xtrabackup --role donor --address 208.88.225.240:4444/xtrabackup_sst --auth (null) --socket /var/lib/mysql/mysql.sock --datadir /mnt/data/mysql/ --defaults-file /etc/my.cnf --gtid 866edaf2-7fab-11e2-0800-68035b54c6d2:1644544
mysql 25334 0.1 0.1 147788 11460 pts/0 S 14:46 0:00 perl /usr//bin/innobackupex --galera-info --tmpdir=/tmp --stream=tar --defaults-file=/etc/my.cnf --socket=/var/lib/mysql/mysql.sock /tmp
mysql 25335 22.9 0.0 7540 620 pts/0 S 14:46 0:16 nc 208.88.225.240 4444
mysql 25373 36.1 0.5 242672 42272 pts/0 Sl 14:46 0:22 xtrabackup_55 --defaults-file=/etc/my.cnf --defaults-group=mysqld --backup --suspend-at-end --target-dir=/tmp --stream=tar

on joiner:
mysql 1110 0.0 0.0 106056 1504 pts/1 S 14:46 0:00 /bin/bash -ue /usr//bin/wsrep_sst_xtrabackup --role joiner --address 208.88.225.240 --auth --datadir /mnt/data/mysql/ --defaults-file /etc/my.cnf --parent 1085
mysql 1123 44.4 0.0 7540 624 pts/1 R 14:46 0:40 nc -dl 4444
mysql 1128 39.6 0.0 116000 1172 pts/1 S 14:46 0:36 tar xfi - -C /mnt/data/mysql/

The problem is that we cannot restart the DONOR and JOINER cleanly;
these processes have to be killed manually.

Handling these processes should be automatic.
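
Until that is automatic, the manual cleanup starts with finding the leftovers. A minimal sketch of that step, assuming `ps aux`-style lines like the ones above (the sample lines are abbreviated copies of that output). The helper name `find_sst_leftovers` and the marker list are hypothetical, not part of any Percona or Galera tool. `nc` and `tar` are deliberately not matched because those names are too generic to kill on sight.

```python
# Hypothetical helper: pick out likely SST leftovers from `ps aux`-style lines.
# The marker names are taken from the process listing in this report.
SST_MARKERS = ("wsrep_sst_xtrabackup", "innobackupex", "xtrabackup_55")


def find_sst_leftovers(ps_lines, markers=SST_MARKERS):
    """Return (pid, command) pairs whose command line mentions an SST marker."""
    leftovers = []
    for line in ps_lines:
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header row or malformed line
        pid, command = int(fields[1]), fields[10]
        if any(marker in command for marker in markers):
            leftovers.append((pid, command))
    return leftovers


# Abbreviated lines from the donor listing above.
sample = [
    "mysql 25334 0.1 0.1 147788 11460 pts/0 S 14:46 0:00 perl /usr//bin/innobackupex --galera-info --tmpdir=/tmp --stream=tar",
    "mysql 25335 22.9 0.0 7540 620 pts/0 S 14:46 0:16 nc 208.88.225.240 4444",
    "mysql 25373 36.1 0.5 242672 42272 pts/0 Sl 14:46 0:22 xtrabackup_55 --defaults-file=/etc/my.cnf --backup",
]
print([pid for pid, _ in find_sst_leftovers(sample)])
```

The returned PIDs still need a human check before anything is killed; the sketch only narrows the list.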

Vadim Tkachenko (vadim-tk) wrote :

I think the problem here is that when mysqld dies, it does not send a signal to its child process,
"wsrep_sst_xtrabackup", but it should.

So I am adding this bug to the codership-mysql project.
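
To illustrate the idea of a caller signalling its child, here is a minimal sketch in Python (POSIX only). It is not how mysqld starts the SST script; the function names `start_in_own_group` and `stop_group` are hypothetical. The assumption is that the caller starts the helper as the leader of its own session, so the helper and everything it spawns share one process group that can be signalled in a single call.

```python
import os
import signal
import subprocess
import sys


def start_in_own_group(argv):
    """Start a child as leader of a new session; its descendants inherit the group."""
    return subprocess.Popen(argv, start_new_session=True)


def stop_group(proc, grace_seconds=5.0):
    """SIGTERM the child's whole process group, then SIGKILL it after a grace period."""
    try:
        # With start_new_session=True the group ID equals the child's PID.
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # the group is already gone
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
    return proc.returncode


if __name__ == "__main__":
    # Stand-in for a long-running helper such as an SST script.
    child = start_in_own_group([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit status:", stop_group(child))
```

This only helps when the caller gets a chance to run its cleanup. If the caller itself is killed with SIGKILL, nothing in it runs, so the child needs its own way to notice.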

Vadim Tkachenko (vadim-tk) wrote :

OK, this problem is partially related to
https://bugs.launchpad.net/percona-xtrabackup/+bug/1135441

where innobackupex should handle a mysqld failure better.

But to solve the issue properly, the caller (mysqld) should send an appropriate signal to the child.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I cannot reproduce this on the donor and have not seen it there, but I can reproduce it on the receiver.

..............
130306 17:06:16 [Warning] WSREP: Gap in state sequence. Need state transfer.
130306 17:06:18 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'joiner' --address '10.0.2.161' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '4600''
130306 17:06:18 [Note] WSREP: Prepared SST request: xtrabackup|10.0.2.161:4444/xtrabackup_sst
130306 17:06:18 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130306 17:06:18 [Note] WSREP: Assign initial position for certification: 0, protocol version: 2
130306 17:06:18 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (e9f4b618-8651-11e2-0800-96555ca76cdb): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():442. IST will be unavailable.
130306 17:06:18 [Note] WSREP: Node 0 (Bxc2) requested state transfer from '*any*'. Selected 1 (Bxc1)(SYNCED) as donor.
130306 17:06:18 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
130306 17:06:18 [Note] WSREP: Requesting state transfer: success, donor: 1
raghu Buu:/var/lib [338:137]% p wsrep
mysql 4620 0.0 0.0 4400 616 pts/0 S 17:06 0:00 sh -c wsrep_sst_xtrabackup --role 'joiner' --address '10.0.2.161' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '4600'
mysql 4621 0.0 0.1 12372 1596 pts/0 S 17:06 0:00 /bin/bash -ue /usr//bin/wsrep_sst_xtrabackup --role joiner --address 10.0.2.161 --auth --datadir /var/lib/mysql/ --defaults-file /etc/my.cnf --parent 4600
raghu 4656 0.0 0.0 9384 924 pts/0 S+ 17:06 0:00 grep -i --color wsrep
raghu Buu:/var/lib [343]% p inno
raghu 4658 0.0 0.0 9384 924 pts/0 S+ 17:06 0:00 grep -i --color inno
raghu Buu:/var/lib [344]% /usr//bin/wsrep_sst_common: line 94: /dev/stderr: Permission denied
.....................
.................

It dies after a while, though; the 'Permission denied' message comes from that.

On the donor, does it keep running, or does it die after some time?

Vadim Tkachenko (vadim-tk) wrote :

On the donor it keeps running indefinitely.
What happens in this case is that innobackupex dies but does not send a signal to xtrabackup_55,
so we hit the bug https://bugs.launchpad.net/percona-xtrabackup/+bug/1135441
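
For the opposite direction, where the parent dies without signalling anything, one defensive pattern is for the child to watch the PID of the process that started it and clean up once that PID disappears. A minimal sketch in Python, not taken from innobackupex or xtrabackup; `pid_alive` and `wait_for_parent_exit` are hypothetical names. Polling a PID has known limits: a zombie parent still counts as existing, and a recycled PID can give a false positive.

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence and permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_parent_exit(parent_pid, poll_seconds=1.0, alive=pid_alive, sleep=time.sleep):
    """Block until parent_pid disappears; the caller then stops its own children."""
    while alive(parent_pid):
        sleep(poll_seconds)


if __name__ == "__main__":
    print(pid_alive(os.getpid()))
```

The `alive` and `sleep` parameters exist only so the loop can be exercised without real processes or real delays.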

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Waiting for xtrabackup 2.0.7 to be released before testing further.

Changed in percona-xtradb-cluster:
status: New → Triaged

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

It seems the fix has been moved to PXB 2.0.8/2.1.4.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

No longer reproducible.

Changed in percona-xtradb-cluster:
status: Triaged → Fix Committed
importance: Undecided → Medium

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1054
