when mysqld dies it leaves SST scripts running

Bug #1076816 reported by Vadim Tkachenko
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
New
Undecided
Unassigned
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
New
Undecided
Unassigned

Bug Description

I still observe cases when mysqld dies, but SST script and external utils are continuing.

i.e.
ps auxww
mysql 32618 0.0 0.0 106148 1348 ? S 15:19 0:00 /bin/bash -ue /usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64//bin/wsrep_sst_xtrabackup --role joiner --address 10.4.101.225 --auth --datadir /mnt/data/mysql/ --defaults-file /etc/my.cnf --parent 32598
mysql 32631 0.0 0.0 7608 508 ? S 15:19 0:00 nc -dl 4444
mysql 32632 0.0 0.0 113948 936 ? S 15:19 0:00 tar xfi - -C /mnt/data/mysql/

but mysqld died:
===
121108 15:20:53 [Note] WSREP: Created page /mnt/data/mysql/gcache.page.000001 of size 134217728 bytes
121108 15:21:19 [ERROR] WSREP: We ran out of resources, seemingly because we've been unsuccessfully requesting state transfer for over 116 seconds. Please check that there is at least one fully synced member in the group. Application must be restarted.
121108 15:21:19 [ERROR] WSREP: State transfer request failed unrecoverably: 35 (Resource deadlock avoided). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
121108 15:21:19 [Note] WSREP: Closing send monitor...
121108 15:21:19 [Note] WSREP: Closed send monitor.
121108 15:21:19 [Note] WSREP: gcomm: terminating thread
121108 15:21:19 [Note] WSREP: gcomm: joining thread
121108 15:21:19 [Note] WSREP: gcomm: closing backend
121108 15:21:20 [Note] WSREP: view(view_id(NON_PRIM,4b185455-29f5-11e2-0800-fdd22c2474b0,17) memb {
        b9be24b9-29fa-11e2-0800-c5e7fbdecba2,
} joined {
} left {
} partitioned {
        4b185455-29f5-11e2-0800-fdd22c2474b0,
        c277d921-29e4-11e2-0800-a963478d2ef7,
})
121108 15:21:20 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
121108 15:21:20 [Note] WSREP: view((empty))
121108 15:21:20 [Note] WSREP: gcomm: closed
121108 15:21:20 [Note] WSREP: Flow-control interval: [16, 16]
121108 15:21:20 [Note] WSREP: Received NON-PRIMARY.
121108 15:21:20 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 17556706)
121108 15:21:20 [Note] WSREP: Received self-leave message.
121108 15:21:20 [Note] WSREP: Flow-control interval: [0, 0]
121108 15:21:20 [Note] WSREP: Received SELF-LEAVE. Closing connection.
121108 15:21:20 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 17556706)
121108 15:21:20 [Note] WSREP: RECV thread exiting 0: Success
121108 15:21:20 [Note] WSREP: recv_thread() joined.
121108 15:21:20 [Note] WSREP: Closing slave action queue.
121108 15:21:20 [Note] WSREP: /usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64/bin/mysqld: Terminated.
121108 15:21:20 mysqld_safe mysqld from pid file /mnt/data/mysql/server-1352234799-az-1-region-a-geo-1.localdomain.pid ended
===

It would be good to have a way to kill SST script and nc + tar utils, as they are using network IP address and port,
and it is impossible to start node again without killing them manually.

Revision history for this message
Vadim Tkachenko (vadim-tk) wrote :

in fact, even I do clean mysqladmin shutdown,

if donor was running at that time, I still see running processes
mysql 21920 0.0 0.0 106092 1328 pts/2 S 17:05 0:00 /bin/bash -ue /usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64//bin/wsrep_sst_xtrabackup --role donor --address 10.4.101.225:4444/xtrabackup_sst --auth (null) --socket /tmp/mysql.sock --datadir /mnt/data/mysql/ --defaults-file /etc/my.cnf --gtid 9db8f201-2858-11e2-0800-0532898e059e:19606001
mysql 21932 0.1 0.0 143228 10824 pts/2 S 17:05 0:00 perl /usr/bin/innobackupex --galera-info --tmpdir=/tmp --stream=tar --defaults-file=/etc/my.cnf --socket=/tmp/mysql.sock /tmp
mysql 21934 0.0 0.0 106092 1160 pts/2 S 17:05 0:00 sh -c mysql --defaults-file='/etc/my.cnf' --socket='/tmp/mysql.sock' --unbuffered -- >/tmp/jq644fvuFh 2>/tmp/FfogPFCicM
mysql 21935 0.0 0.0 197916 1888 pts/2 S 17:05 0:00 mysql --defaults-file=/etc/my.cnf --socket=/tmp/mysql.sock --unbuffered --

after mysqld was destroyed.

Revision history for this message
Vadim Tkachenko (vadim-tk) wrote :
Download full text (3.5 KiB)

Now, even JOINER fails, and DONOR informed about it, I still see the processes running on DONOR.

DONOR log:
121108 18:53:02 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address '10.4.101.225:4444/xtrabackup_sst' --auth '(null)' --socket '/tmp/mysql.sock' --datadir '/mnt/data/mysql/' --defaults-file '/etc/my.cnf' --gtid '9db8f201-2858-11e2-0800-0532898e059e:19606001''
121108 18:53:02 [Note] WSREP: sst_donor_thread signaled with 0
121108 18:54:05 [Note] WSREP: view(view_id(PRIM,7d0187d8-2a18-11e2-0800-3ad438ce660f,3) memb {
        7d0187d8-2a18-11e2-0800-3ad438ce660f,
} joined {
} left {
} partitioned {
        91dab590-2a18-11e2-0800-66ce82790edc,
})
121108 18:54:05 [Note] WSREP: forgetting 91dab590-2a18-11e2-0800-66ce82790edc (tcp://10.4.101.225:4567)
121108 18:54:05 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
121108 18:54:05 [Note] WSREP: STATE_EXCHANGE: sent state UUID: b9cd2dba-2a18-11e2-0800-cc1626de4219
121108 18:54:05 [Note] WSREP: STATE EXCHANGE: sent state msg: b9cd2dba-2a18-11e2-0800-cc1626de4219
121108 18:54:05 [Note] WSREP: STATE EXCHANGE: got state msg: b9cd2dba-2a18-11e2-0800-cc1626de4219 from 0 (node1)
121108 18:54:05 [Note] WSREP: Quorum results:
        version = 2,
        component = PRIMARY,
        conf_id = 2,
        members = 1/1 (joined/total),
        act_id = 19606001,
        last_appl. = 0,
        protocols = 0/4/2 (gcs/repl/appl),
        group UUID = 9db8f201-2858-11e2-0800-0532898e059e
121108 18:54:05 [Note] WSREP: Flow-control interval: [16, 16]
121108 18:54:05 [Note] WSREP: New cluster view: global state: 9db8f201-2858-11e2-0800-0532898e059e:19606001, view# 3: Primary, number of nodes: 1, my index: 0, protocol version 2
121108 18:54:05 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
121108 18:54:05 [Note] WSREP: Assign initial position for certification: 19606001, protocol version: 2
121108 18:54:10 [Note] WSREP: cleaning up 91dab590-2a18-11e2-0800-66ce82790edc (tcp://10.4.101.225:4567)

====

DONOR processes:
root 2771 0.0 0.0 106228 1416 pts/2 S 18:52 0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe
mysql 3121 0.5 5.5 12581992 918872 pts/2 Sl 18:52 0:01 /usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64/bin/mysqld --basedir=/usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64 --datadir=/mnt/data/mysql --plugin-dir=/usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64/lib/mysql/plugin --user=mysql --log-error=/mnt/data/mysql/error.log --pid-file=/mnt/data/mysql/server-1352234832-az-1-region-a-geo-1.localdomain.pid
mysql 3160 0.0 0.0 106092 1324 pts/2 S 18:53 0:00 /bin/bash -ue /usr/local/Percona-XtraDB-Cluster-5.5.28-23.7.366.Linux.x86_64//bin/wsrep_sst_xtrabackup --role donor --address 10.4.101.225:4444/xtrabackup_sst --auth (null) --socket /tmp/mysql.sock --datadir /mnt/data/mysql/ --defaults-file /etc/my.cnf --gtid 9db8f201-2858-11e2-0800-0532898e059e:19606001
mysql 3172 0.1 0.0 143228 10824 pts/2 S 18:53 0:00 perl /usr/bin/innobackupex --galera-info --tmpdir=/tmp --stream=tar --defaults-file=/etc/my.cnf --socket=/tmp/mysql.soc...

Read more...

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Marking as a duplicate of lp:1138439

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.