xtrabackup-v2 SST fails with multiple log_bin directives in my.cnf

Bug #1362830 reported by Ernie Souhrada on 2014-08-28
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Fix Released
Undecided
Krunal Bauskar

Bug Description

When attempting to join a fresh, new node to an existing cluster, if there are multiple "log_bin" or "log-bin" directives in the system's my.cnf, the xtrabackup-v2 SST method fails. On systems such as Ubuntu/Debian, which tend to use include files, the likelihood of this occuring is apparently higher than you might think. However, RHEL/CentOS is also impacted (tests below are from CentOS 6.5):

To reproduce...
Bootstrap a single PXC node with whatever configuration you want. Set wsrep_sst_method=xtrabackup-v2

On the second node, add two "log_bin" or "log-bin" directives, such as:

log_bin = mysql-bin
log_bin = mysql-bin

It doesn't matter if you use log-bin or log_bin, nor does it matter if the paths are identical. Try to start the second node.

On the DONOR, you'll see the following in the MySQL error log:
WSREP_SST: [INFO] Streaming the backup to joiner at 192.168.224.131 4444 (20140828 12:15:12.963)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/my.cnf --no-version-check $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:192.168.224.131:4444; RC=( ${PIPESTATUS[@]} ) (20140828 12:15:12.967)
2014/08/28 12:15:12 socat[5812] E connect(4, AF=2 192.168.224.131:4444, 16): Connection refused
WSREP_SST: [ERROR] innobackupex finished with error: 1. Check /var/lib/mysql//innobackup.backup.log (20140828 12:15:13.137)
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20140828 12:15:13.139)
WSREP_SST: [INFO] Cleaning up temporary directories (20140828 12:15:13.141)
2014-08-28 12:15:13 5512 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.224.131:4444/xtrabackup_sst' --auth 'pxb:pxb' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' '' --gtid '5e0b98c4-2ee7-11e4-b8a5-2bb3225feb07:2'
2014-08-28 12:15:13 5512 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.224.131:4444/xtrabackup_sst' --auth 'pxb:pxb' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' '' --gtid '5e0b98c4-2ee7-11e4-b8a5-2bb3225feb07:2': 22 (Invalid argument)
2014-08-28 12:15:13 5512 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.224.131:4444/xtrabackup_sst' --auth 'pxb:pxb' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' '' --gtid '5e0b98c4-2ee7-11e4-b8a5-2bb3225feb07:2'
2014-08-28 12:15:13 5512 [Warning] WSREP: Could not find peer: 9a92816a-2ee7-11e4-a391-f72fde4981d7
2014-08-28 12:15:13 5512 [Warning] WSREP: 0.0 (galera01): State transfer to -1.-1 (left the group) failed: -22 (Invalid argument)

------

On the JOINER, you'll see this:
2014-08-28 12:15:02 4249 [Note] WSREP: (9a92816a, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] Proceeding with SST (20140828 12:15:02.962)
WSREP_SST: [INFO] Cleaning the existing datadir (20140828 12:15:02.964)
removed `/var/lib/mysql/gvwstate.dat'
dirname: extra operand `mysql-bin'
Try `dirname --help' for more information.
WSREP_SST: [ERROR] Cleanup after exit with status:1 (20140828 12:15:02.974)
WSREP_SST: [INFO] Removing the sst_in_progress file (20140828 12:15:02.975)
2014-08-28 12:15:02 4249 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.224.131' --auth 'pxb:pxb' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '4249' '' : 1 (Operation not permitted)
2014-08-28 12:15:02 4249 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2014-08-28 12:15:02 4249 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2014-08-28 12:15:02 4249 [ERROR] Aborting

-------

Note the "dirname: extra operand" line from the joiner's mysqld.log.

The issue comes from lines 779 - 789 of the wsrep_sst_xtrabackup-v2 script, and also the parse_cnf function in this script, lines 229-238. Running just the parse_cnf function against a my.cnf that contains two log_bin (or log-bin) entries returns the following:

/var/log/mysql/mysql-bin /var/log/mysql/mysql-bin

Which, when passed to 'dirname', gives you the error mentioned above:

(root@pxc-02 14:20:25)# dirname /var/log/mysql/mysql-bin /var/log/mysql/mysql-bin
dirname: extra operand `/var/log/mysql/mysql-bin'
Try `dirname --help' for more information.

Anyway, once you comment out one of the log_bin/log-bin directives, SST proceeds without issue. The regular xtrabackup (v1) SST method is not impacted.

------

Here is one simple way to fix it by always taking the first value returned; you may prefer some other approach.

(root@pxc-02 14:30:04)# diff -Naur wsrep_sst_xtrabackup-v2.orig wsrep_sst_xtrabackup-v2
--- wsrep_sst_xtrabackup-v2.orig 2014-08-28 14:23:52.398146520 -0700
+++ wsrep_sst_xtrabackup-v2 2014-08-28 14:29:12.078991948 -0700
@@ -778,6 +778,7 @@
             fi
             tempdir=$(parse_cnf mysqld log-bin "")
             if [[ -n ${tempdir:-} ]];then
+ tempdir=$(echo $tempdir | cut -d\ -f 1)
                 binlog_dir=$(dirname $tempdir)
                 binlog_file=$(basename $tempdir)
                 if [[ -n ${binlog_dir:-} && $binlog_dir != '.' && $binlog_dir != $DATA ]];then

Hi,

I have tried to reproduce this with PXC 5.5 and 5.6 but couldn't able to do it. Second node started successfully.

[root@percona-pxc55-2 mysql]# cat /etc/my.cnf | grep log_bin
log_bin = mysql-bin
log_bin = mysql-bin
[root@percona-pxc55-2 mysql]#

root@deb-pxc56-2:/var/lib/mysql# cat /etc/mysql/my.cnf | grep log_bin
#log_bin = /var/log/mysql/mysql-bin.log
log_bin = mysql-bin
log_bin = mysql-bin
root@deb-pxc56-2:/var/lib/mysql#

Can you provide my.cnf for both the nodes? it might be depend on what section, you have set those variables.

Changed in percona-xtradb-cluster:
status: New → Incomplete

Node 1:

(root@pxc-base 15:22:40)# cat /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
log-bin=mysql-bin
skip_name_resolve
max_allowed_packet=64M
binlog_format=ROW
innodb_autoinc_lock_mode=2
innodb_log_file_size=50M
innodb_file_per_table=1
default_storage_engine=InnoDB
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=pxb:pxb
wsrep_cluster_address=gcomm://10.10.10.136,10.10.10.137
wsrep_node_address=10.10.10.136
wsrep_node_name=galera01
wsrep_provider=/usr/lib64/libgalera_smm.so
server_id=100
open_files_limit=65535

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

(root@pxc-base 15:24:35)# service mysql bootstrap-pxc
Bootstrapping PXC (Percona XtraDB Cluster)Starting MySQL (Percona XtraDB Cluster).. SUCCESS!

-------

Node 2:
(root@pxc-02 15:22:32)# cat /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
log-bin=mysql-bin
log-bin=mysql-bin
skip_name_resolve
max_allowed_packet=64M
binlog_format=ROW
innodb_autoinc_lock_mode=2
innodb_log_file_size=50M
innodb_file_per_table=1
default_storage_engine=InnoDB
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=pxb:pxb
wsrep_cluster_address=gcomm://10.10.10.136,10.10.10.137
wsrep_node_address=10.10.10.137
wsrep_node_name=galera02
wsrep_provider=/usr/lib64/libgalera_smm.so
server_id=100
open_files_limit=65535

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

(root@pxc-02 15:27:26)# rm -rf /var/lib/mysql/*
[/var/lib/mysql]

(root@pxc-02 15:27:31)# ls -l /var/lib/mysql/
total 0

[/var/lib/mysql]
(root@pxc-02 15:27:33)# service mysql start
Starting MySQL (Percona XtraDB Cluster)....State transfer in progress, setting sleep higher
. ERROR! The server quit without updating PID file (/var/lib/mysql/pxc-02.deadbunny.lan.pid).
 ERROR! MySQL (Percona XtraDB Cluster) server startup failed!

----
CentOS 6.5, PXC 5.6.19

Launchpad Janitor (janitor) wrote :

[Expired for Percona XtraDB Cluster because there has been no activity for 60 days.]

Changed in percona-xtradb-cluster:
status: Incomplete → Expired

Opening the bug again. i57706 also affected. SST fixed after one of the duplicate lines was removed.

Changed in percona-xtradb-cluster:
status: Expired → New
tags: added: i57706
Krunal Bauskar (krunal-bauskar) wrote :

I tried this with latest revison from trunk but couldn't reproduce it as quoted in bug.
[pxc-node] $grep "log-bin" n2.cnf
log-bin=mysql-bin
log-bin=mysql-bin

What I surely saw is the error message from script

removed directory: ‘/opt/projects/codebase/pxc/installed/pxc/pxc-node/dn2/performance_schema’
WSREP_SST: [INFO] Cleaning the binlog directory .
. as well (20151204 12:56:39.540)
find: paths must precede expression: ./mysql-bin\.[0-9]+$
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
rm: cannot remove ‘.’: Is a directory
rm: cannot remove ‘./*.index’: No such file or directory

(But this didn't caused node startup to get aborted).

Krunal Bauskar (krunal-bauskar) wrote :

So as pointed before the issue was with duplicate values.
We have resolved this by fixing the parse_cnf function. parse_cnf will now consider the last occurring value if there are multiple values.

Hrvoje Matijakovic (hrvojem) wrote :
Changed in percona-xtradb-cluster:
status: New → Fix Committed
milestone: none → 5.6.27-25.13
Changed in percona-xtradb-cluster:
assignee: nobody → Krunal Bauskar (krunal-bauskar)
Changed in percona-xtradb-cluster:
status: Fix Committed → Fix Released

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1723

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers