MySQL patches by Codership

when mysqld dies it leaves SST scripts running

Bug #1076816 reported by Vadim Tkachenko on 2012-11-09

This bug report is a duplicate of: Bug #1138439: If donor is killed during xtrabackup SST some processes still kept running. Edit Remove

This bug affects 2 people

Affects		Status	Importance	Assigned to	Milestone
	MySQL patches by Codership	New	Undecided	Unassigned
	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	New	Undecided	Unassigned

Bug Description

I still observe cases when mysqld dies, but SST script and external utils are continuing.

i.e.
ps auxww
mysql 32618 0.0 0.0 106148 1348 ? S 15:19 0:00 /bin/bash -ue /usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64//bin/wsrep_sst_xtrabackup --role joiner --address 10.4.101.225 --auth --datadir /mnt/data/mysql/ --defaults-file /etc/my.cnf --parent 32598
mysql 32631 0.0 0.0 7608 508 ? S 15:19 0:00 nc -dl 4444
mysql 32632 0.0 0.0 113948 936 ? S 15:19 0:00 tar xfi - -C /mnt/data/mysql/

but mysqld died:
===
121108 15:20:53 [Note] WSREP: Created page /mnt/data/mysql/gcache.page.000001 of size 134217728 bytes
121108 15:21:19 [ERROR] WSREP: We ran out of resources, seemingly because we've been unsuccessfully requesting state transfer for over 116 seconds. Please check that there is at least one fully synced member in the group. Application must be restarted.
121108 15:21:19 [ERROR] WSREP: State transfer request failed unrecoverably: 35 (Resource deadlock avoided). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
121108 15:21:19 [Note] WSREP: Closing send monitor...
121108 15:21:19 [Note] WSREP: Closed send monitor.
121108 15:21:19 [Note] WSREP: gcomm: terminating thread
121108 15:21:19 [Note] WSREP: gcomm: joining thread
121108 15:21:19 [Note] WSREP: gcomm: closing backend
121108 15:21:20 [Note] WSREP: view(view_id(NON_PRIM,4b185455-29f5-11e2-0800-fdd22c2474b0,17) memb {
        b9be24b9-29fa-11e2-0800-c5e7fbdecba2,
} joined {
} left {
} partitioned {
        4b185455-29f5-11e2-0800-fdd22c2474b0,
        c277d921-29e4-11e2-0800-a963478d2ef7,
})
121108 15:21:20 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
121108 15:21:20 [Note] WSREP: view((empty))
121108 15:21:20 [Note] WSREP: gcomm: closed
121108 15:21:20 [Note] WSREP: Flow-control interval: [16, 16]
121108 15:21:20 [Note] WSREP: Received NON-PRIMARY.
121108 15:21:20 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 17556706)
121108 15:21:20 [Note] WSREP: Received self-leave message.
121108 15:21:20 [Note] WSREP: Flow-control interval: [0, 0]
121108 15:21:20 [Note] WSREP: Received SELF-LEAVE. Closing connection.
121108 15:21:20 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 17556706)
121108 15:21:20 [Note] WSREP: RECV thread exiting 0: Success
121108 15:21:20 [Note] WSREP: recv_thread() joined.
121108 15:21:20 [Note] WSREP: Closing slave action queue.
121108 15:21:20 [Note] WSREP: /usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64/bin/mysqld: Terminated.
121108 15:21:20 mysqld_safe mysqld from pid file /mnt/data/mysql/server-1352234799-az-1-region-a-geo-1.localdomain.pid ended
===

It would be good to have a way to kill SST script and nc + tar utils, as they are using network IP address and port,
and it is impossible to start node again without killing them manually.

Revision history for this message

Vadim Tkachenko (vadim-tk) wrote on 2012-11-09:

in fact, even I do clean mysqladmin shutdown,

if donor was running at that time, I still see running processes
mysql 21920 0.0 0.0 106092 1328 pts/2 S 17:05 0:00 /bin/bash -ue /usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64//bin/wsrep_sst_xtrabackup --role donor --address 10.4.101.225:4444/xtrabackup_sst --auth (null) --socket /tmp/mysql.sock --datadir /mnt/data/mysql/ --defaults-file /etc/my.cnf --gtid 9db8f201-2858-11e2-0800-0532898e059e:19606001
mysql 21932 0.1 0.0 143228 10824 pts/2 S 17:05 0:00 perl /usr/bin/innobackupex --galera-info --tmpdir=/tmp --stream=tar --defaults-file=/etc/my.cnf --socket=/tmp/mysql.sock /tmp
mysql 21934 0.0 0.0 106092 1160 pts/2 S 17:05 0:00 sh -c mysql --defaults-file='/etc/my.cnf' --socket='/tmp/mysql.sock' --unbuffered -- >/tmp/jq644fvuFh 2>/tmp/FfogPFCicM
mysql 21935 0.0 0.0 197916 1888 pts/2 S 17:05 0:00 mysql --defaults-file=/etc/my.cnf --socket=/tmp/mysql.sock --unbuffered --

after mysqld was destroyed.

Revision history for this message

Vadim Tkachenko (vadim-tk) wrote on 2012-11-09:

Download full text (3.5 KiB)

Now, even JOINER fails, and DONOR informed about it, I still see the processes running on DONOR.

DONOR log:
121108 18:53:02 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address '10.4.101.225:4444/xtrabackup_sst' --auth '(null)' --socket '/tmp/mysql.sock' --datadir '/mnt/data/mysql/' --defaults-file '/etc/my.cnf' --gtid '9db8f201-2858-11e2-0800-0532898e059e:19606001''
121108 18:53:02 [Note] WSREP: sst_donor_thread signaled with 0
121108 18:54:05 [Note] WSREP: view(view_id(PRIM,7d0187d8-2a18-11e2-0800-3ad438ce660f,3) memb {
        7d0187d8-2a18-11e2-0800-3ad438ce660f,
} joined {
} left {
} partitioned {
        91dab590-2a18-11e2-0800-66ce82790edc,
})
121108 18:54:05 [Note] WSREP: forgetting 91dab590-2a18-11e2-0800-66ce82790edc (tcp://10.4.101.225:4567)
121108 18:54:05 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
121108 18:54:05 [Note] WSREP: STATE_EXCHANGE: sent state UUID: b9cd2dba-2a18-11e2-0800-cc1626de4219
121108 18:54:05 [Note] WSREP: STATE EXCHANGE: sent state msg: b9cd2dba-2a18-11e2-0800-cc1626de4219
121108 18:54:05 [Note] WSREP: STATE EXCHANGE: got state msg: b9cd2dba-2a18-11e2-0800-cc1626de4219 from 0 (node1)
121108 18:54:05 [Note] WSREP: Quorum results:
        version = 2,
        component = PRIMARY,
        conf_id = 2,
        members = 1/1 (joined/total),
        act_id = 19606001,
        last_appl. = 0,
        protocols = 0/4/2 (gcs/repl/appl),
        group UUID = 9db8f201-2858-11e2-0800-0532898e059e
121108 18:54:05 [Note] WSREP: Flow-control interval: [16, 16]
121108 18:54:05 [Note] WSREP: New cluster view: global state: 9db8f201-2858-11e2-0800-0532898e059e:19606001, view# 3: Primary, number of nodes: 1, my index: 0, protocol version 2
121108 18:54:05 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
121108 18:54:05 [Note] WSREP: Assign initial position for certification: 19606001, protocol version: 2
121108 18:54:10 [Note] WSREP: cleaning up 91dab590-2a18-11e2-0800-66ce82790edc (tcp://10.4.101.225:4567)

====

DONOR processes:
root 2771 0.0 0.0 106228 1416 pts/2 S 18:52 0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe
mysql 3121 0.5 5.5 12581992 918872 pts/2 Sl 18:52 0:01 /usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64/bin/mysqld --basedir=/usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64 --datadir=/mnt/data/mysql --plugin-dir=/usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64/lib/mysql/plugin --user=mysql --log-error=/mnt/data/mysql/error.log --pid-file=/mnt/data/mysql/server-1352234832-az-1-region-a-geo-1.localdomain.pid
mysql 3160 0.0 0.0 106092 1324 pts/2 S 18:53 0:00 /bin/bash -ue /usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64//bin/wsrep_sst_xtrabackup --role donor --address 10.4.101.225:4444/xtrabackup_sst --auth (null) --socket /tmp/mysql.sock --datadir /mnt/data/mysql/ --defaults-file /etc/my.cnf --gtid 9db8f201-2858-11e2-0800-0532898e059e:19606001
mysql 3172 0.1 0.0 143228 10824 pts/2 S 18:53 0:00 perl /usr/bin/innobackupex --galera-info --tmpdir=/tmp --stream=tar --defaults-file=/etc/my.cnf --socket=/tmp/mysql.soc...

Now, even JOINER fails, and DONOR informed about it, I still see the processes running on DONOR.

DONOR log:
121108 18:53:02 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address '10.4.101.225:4444/xtrabackup_sst' --auth '(null)' --socket '/tmp/mysql.sock' --datadir '/mnt/data/mysql/' --defaults-file '/etc/my.cnf' --gtid '9db8f201-2858-11e2-0800-0532898e059e:19606001''
121108 18:53:02 [Note] WSREP: sst_donor_thread signaled with 0
121108 18:54:05 [Note] WSREP: view(view_id(PRIM,7d0187d8-2a18-11e2-0800-3ad438ce660f,3) memb {
        7d0187d8-2a18-11e2-0800-3ad438ce660f,
} joined {
} left {
} partitioned {
        91dab590-2a18-11e2-0800-66ce82790edc,
})
121108 18:54:05 [Note] WSREP: forgetting 91dab590-2a18-11e2-0800-66ce82790edc (tcp://10.4.101.225:4567)
121108 18:54:05 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
121108 18:54:05 [Note] WSREP: STATE_EXCHANGE: sent state UUID: b9cd2dba-2a18-11e2-0800-cc1626de4219
121108 18:54:05 [Note] WSREP: STATE EXCHANGE: sent state msg: b9cd2dba-2a18-11e2-0800-cc1626de4219
121108 18:54:05 [Note] WSREP: STATE EXCHANGE: got state msg: b9cd2dba-2a18-11e2-0800-cc1626de4219 from 0 (node1)
121108 18:54:05 [Note] WSREP: Quorum results:
        version    = 2,
        component  = PRIMARY,
        conf_id    = 2,
        members    = 1/1 (joined/total),
        act_id     = 19606001,
        last_appl. = 0,
        protocols  = 0/4/2 (gcs/repl/appl),
        group UUID = 9db8f201-2858-11e2-0800-0532898e059e
121108 18:54:05 [Note] WSREP: Flow-control interval: [16, 16]
121108 18:54:05 [Note] WSREP: New cluster view: global state: 9db8f201-2858-11e2-0800-0532898e059e:19606001, view# 3: Primary, number of nodes: 1, my index: 0, protocol version 2
121108 18:54:05 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
121108 18:54:05 [Note] WSREP: Assign initial position for certification: 19606001, protocol version: 2
121108 18:54:10 [Note] WSREP:  cleaning up 91dab590-2a18-11e2-0800-66ce82790edc (tcp://10.4.101.225:4567)

====

DONOR processes:
root      2771  0.0  0.0 106228  1416 pts/2    S    18:52   0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe
mysql     3121  0.5  5.5 12581992 918872 pts/2 Sl   18:52   0:01 /usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64/bin/mysqld --basedir=/usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64 --datadir=/mnt/data/mysql --plugin-dir=/usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64/lib/mysql/plugin --user=mysql --log-error=/mnt/data/mysql/error.log --pid-file=/mnt/data/mysql/server-1352234832-az-1-region-a-geo-1.localdomain.pid
mysql     3160  0.0  0.0 106092  1324 pts/2    S    18:53   0:00 /bin/bash -ue /usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64//bin/wsrep_sst_xtrabackup --role donor --address 10.4.101.225:4444/xtrabackup_sst --auth (null) --socket /tmp/mysql.sock --datadir /mnt/data/mysql/ --defaults-file /etc/my.cnf --gtid 9db8f201-2858-11e2-0800-0532898e059e:19606001
mysql     3172  0.1  0.0 143228 10824 pts/2    S    18:53   0:00 perl /usr/bin/innobackupex --galera-info --tmpdir=/tmp --stream=tar --defaults-file=/etc/my.cnf --socket=/tmp/mysql.sock /tmp
mysql     3174  0.0  0.0 106092  1160 pts/2    S    18:53   0:00 sh -c mysql  --defaults-file='/etc/my.cnf' --socket='/tmp/mysql.sock' --unbuffered -- >/tmp/wxtn5N8ejt 2>/tmp/gD1CmG_htH
mysql     3175  0.0  0.0 197916  1888 pts/2    S    18:53   0:00 mysql --defaults-file=/etc/my.cnf --socket=/tmp/mysql.sock --unbuffered --

Revision history for this message

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2013-04-05:

Marking as a duplicate of lp:1138439

Report a bug

This report contains Public information

Everyone can see this information.

Duplicate of bug #1138439 Remove

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.