sst_xtrabackup and sst_xtrabackup-v2 are broken on solaris

Bug #1263476 reported by Ryan Gordon
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
MySQL patches by Codership | New | Undecided | Unassigned |
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | Confirmed | Medium | Unassigned |
5.5 | Won't Fix | Medium | Unassigned |
5.6 | Won't Fix | Medium | Unassigned |

Bug Description

After attempting to boot a new PXC cluster on Solaris, we ran into a bug when using the xtrabackup or xtrabackup-v2 SST mechanisms. The error manifests as the following in the MySQL error log:

innobackupex: Error: Terminated with SIGPIPE at /mysql//bin/innobackupex line 1736.
/mysql//bin/wsrep_sst_xtrabackup-v2: line 413: ss: command not found
grep: illegal option -- E
Usage: grep [-c|-l|-q] [-r|-R] -hHbnsviw pattern file . . .
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20131222 12:19:07.871)
/mysql//bin/wsrep_sst_xtrabackup-v2: line 413: ss: command not found
grep: illegal option -- E
Usage: grep [-c|-l|-q] [-r|-R] -hHbnsviw pattern file . . .
/mysql//bin/wsrep_sst_xtrabackup-v2: line 413: ss: command not found
grep: illegal option -- E
Usage: grep [-c|-l|-q] [-r|-R] -hHbnsviw pattern file . . .
/mysql//bin/wsrep_sst_xtrabackup-v2: line 413: ss: command not found
grep: illegal option -- E
Usage: grep [-c|-l|-q] [-r|-R] -hHbnsviw pattern file . . .
/mysql//bin/wsrep_sst_xtrabackup-v2: line 413: ss: command not found
grep: illegal option -- E
Usage: grep [-c|-l|-q] [-r|-R] -hHbnsviw pattern file . . .
/mysql//bin/wsrep_sst_xtrabackup-v2: line 413: ss: command not found
grep: illegal option -- E
Usage: grep [-c|-l|-q] [-r|-R] -hHbnsviw pattern file . . .
/mysql//bin/wsrep_sst_xtrabackup-v2: line 413: ss: command not found
grep: illegal option -- E
Usage: grep [-c|-l|-q] [-r|-R] -hHbnsviw pattern file . . .
/mysql//bin/wsrep_sst_xtrabackup-v2: line 413: ss: command not found

Essentially the problem is that 'ss' is a Linux-only program and thus is not available on Solaris. We're using rsync for now until this gets fixed.

Tags: sst xtrabackup
Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Ryan,

Yes, looks like ifconfig needs to be used there. Since solaris is
not a supported platform, SST scripts are not tested there.

So, you may have issues with other binaries like qpress, pv (though
these are optional), xtrabackup (and its perl deps), please check http://www.percona.com/doc/percona-xtradb-cluster/5.5/manual/xtrabackup_sst.html and let us know if any other binaries' dependencies may not be met on Solaris.

Ryan Gordon (ryan-5) wrote :

Hi,

I went ahead and double checked what is available against those dependencies, and it does indeed look like only pv and ss are not available on Solaris. We have been using xtrabackup on Solaris for a while with a patch to make it compile.

It also does look like qpress is compatible with solaris fortunately.

Best,
Ryan

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Looks like pv is available for Solaris as here:
http://www.ivarch.com/programs/pv.shtml

Regarding ss, it needs to be replaced with netstat/lsof for solaris.

i.e.

        ss -p state listening "( sport = :$PORT )" | grep -qE 'socat|nc' && break

replaced by

        lsof -i :$PORT | grep LISTEN | grep -qE 'socat|nc' && break

Let us know if that works.

Ryan Gordon (ryan-5) wrote :

Hi,

I went ahead and tried that. Initially I got this:

/mysql//bin/wsrep_sst_xtrabackup-v2: line 413: lsof: command not found
grep: illegal option -- E
Usage: grep [-c|-l|-q] [-r|-R] -hHbnsviw pattern file . . .
/mysql//bin/wsrep_sst_xtrabackup-v2: line 413: lsof: command not found
grep: illegal option -- E

I confirmed that lsof did exist:

# which lsof
/opt/local/bin/lsof

The problem is that xtrabackup-v2 sst is started with this PATH: /usr/sbin:/sbin:/mysql//bin:/sbin:/usr/sbin:/bin:/usr/bin:/mysql/bin

Perhaps this is a separate bug because we correctly start MySQL with this path: /opt/local/bin:/opt/local/sbin:/usr/bin:/usr/sbin:/mysql/bin

As a workaround I hardcoded it as this: /opt/local/bin/lsof -i :$PORT | /opt/local/bin/grep LISTEN | /opt/local/bin/grep -qE 'socat|nc' && break

After applying that workaround I'm not entirely sure this is still working correctly.

lsof -i looks for a file in this format:

* Displaying file information for all processes..
PID PROCESS FILE/DEVICE
-----------------------------------------------------------------
83620 socat /path/to/log/file.log
83621 xbstream /path/to/log/file.log

so unfortunately lsof -i :4444 will never return anything because it needs to be in the format of a file.

However, lsof -o 4444 will work, for example:
# lsof -o 4444
* Displaying process information for port 4444..
PID PROCESS IP PORT
-----------------------------------------------------------------
16402 socat 0.0.0.0 4444

It looks like if we use the following the command will work correctly:
lsof -o $PORT | grep -qE 'socat|nc' && break

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Ryan,

Thanks for the details.

a)
The lsof I had provided was for linux so may not have matched
correctly.

The requirement is that it should see if there is a listening
process named either socat or nc on that $PORT.

Now:
 lsof -o $PORT | grep -qE 'socat|nc' && break

 Is there a way to check if the $PORT is a listening one?

b)
 Regarding /opt, in general, the FHS states that any standard
 binaries/packages have to be in /usr, /bin, /sbin etc. and /opt
 is reserved for custom installed ones - http://www.pathname.com/fhs/2.2/fhs-3.12.html

 Don't /bin, /sbin, /usr/bin etc. have grep, lsof etc.?

 In any case, for now, I suggest symlinking or updating the PATH
 for the daemon (which will be passed to the script) or having them in
 $basedir/bin (mysql basedir).
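
For the listening check asked about in (a), one possible approach is to parse `netstat -an`, which reports a LISTEN state on Linux, Solaris and the BSDs. The following is only a rough sketch under assumptions: `port_in_listen_state` and `wait_port_ready` are hypothetical helper names, not functions in wsrep_sst_xtrabackup-v2, and unlike the ss pipeline this does not verify that the listener is socat or nc, because `netstat -an` does not print process names. It uses plain grep with -q only, so it avoids the -E option that the stock Solaris grep rejects.

```shell
# Hypothetical helper: reads `netstat -an` output on stdin and succeeds
# if a socket in LISTEN state is bound to port $1.
port_in_listen_state()
{
    # Local addresses look like 0.0.0.0:4444 (Linux) or *.4444 (Solaris, BSD).
    # The trailing space stops 4444 from matching 44445; this assumes the
    # columns are separated by spaces rather than tabs.
    grep LISTEN | grep -q "[.:]$1 "
}

# Hypothetical wrapper that a wait loop could call instead of the ss pipeline.
# It does NOT check the process name (socat or nc), only the port state.
wait_port_ready()
{
    netstat -an 2>/dev/null | port_in_listen_state "$1"
}
```

In the SST script's wait loop this would be used as `wait_port_ready $PORT && break`.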

Ryan Gordon (ryan-5) wrote :

Hi,

a)

Unfortunately, lsof for Unix does not make that distinction.

b)

On Solaris this is unfortunately not how it works. GNU tools are not installed to /usr, /bin, /sbin, or /opt. In fact /usr, /bin, and /sbin are all read-only mounts from the global zone, and /opt is a writable directory. GNU tools are installed to /opt/local/bin, which is in the PATH we pass to the MySQL daemon. I do not know whether it is a bug that the PATH we pass to the MySQL daemon is not passed on to the SST script, or whether there is a separate configuration option for setting the SST script's PATH. Without /opt/local/bin in the PATH, the tools the script finds do exist, but they are not the GNU versions. They are older, Solaris-specific tools (old grep, old lsof, etc.).
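
As a rough illustration of the PATH workaround, the SST script (or a wrapper around it) could put the GNU tool directories in front of whatever PATH it inherits. This is a sketch only: `prepend_path` is a hypothetical helper, not something wsrep_sst_xtrabackup-v2 provides, and /opt/local/bin and /opt/local/sbin are simply the locations mentioned in this report.

```shell
# Hypothetical helper: put directory $1 at the front of PATH, but only if
# the directory exists and is not already listed.
prepend_path()
{
    [ -d "$1" ] || return 0
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
}

# Directories taken from this report; on systems without them these
# calls do nothing.
prepend_path /opt/local/sbin
prepend_path /opt/local/bin
export PATH
```

Because missing directories are skipped, the same lines should be harmless on Linux hosts.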

Ryan Gordon (ryan-5) wrote :

Hi,

Wanted to report another solaris specific issue. I can report this separately if needed:

get_proc()
{
    set +e
    nproc=$(grep -c processor /proc/cpuinfo)
    [[ -z $nproc || $nproc -eq 0 ]] && nproc=1
    set -e
}

/proc/cpuinfo isn't available on Solaris. There are ways to count the number of physical CPUs available, but the nature of Illumos Solaris is that several clients each get their own "zone" on a server (almost like a VM). So if a server has 32 CPUs but the zone is only allocated 8, the entire zone will be allowed to burst for a short time and is then rate limited, which is not ideal. The ideal solution is an option like --processors that sets the number of processors to use.
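
One way such an override could look is sketched below. Everything new here is an assumption: `SST_NPROC` is a hypothetical environment variable standing in for the proposed --processors option, and the `psrinfo` fallback counts the virtual processors visible to the zone, which, as described above, may be more than the zone is actually allocated. The function assigns with `|| nproc=""` instead of toggling `set +e`/`set -e`, so a failing grep does not abort a script running under `set -e`.

```shell
# Sketch only: SST_NPROC is a hypothetical override, not an existing option.
get_proc()
{
    nproc=""
    if [ -n "$SST_NPROC" ]; then
        # An explicit override wins, e.g. the CPU count allocated to the zone.
        nproc=$SST_NPROC
    elif [ -r /proc/cpuinfo ]; then
        # Linux: same count as the original function.
        nproc=$(grep -c processor /proc/cpuinfo 2>/dev/null) || nproc=""
    elif command -v psrinfo >/dev/null 2>&1; then
        # Solaris: psrinfo prints one line per virtual processor.
        nproc=$(psrinfo | wc -l | tr -d ' ')
    fi
    # Fall back to 1 when the value is empty, zero or not a number.
    case "$nproc" in
        ''|0|*[!0-9]*) nproc=1 ;;
    esac
}
```

With this in place, a zone limited to 8 CPUs could start the daemon with `SST_NPROC=8` in its environment.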

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Ryan,

I will look into adding /opt/bin to PATH as well. Regarding /proc/cpuinfo, will something like http://www.alper.net/hardware/numbers-of-cpu-cores-in-solaris-10/ help or won't that take into consideration the zone? For now, I will check if that file exists.

Ryan Gordon (ryan-5) wrote :

Hi Raghu,

Just to be clear, on Solaris it should be /opt/local/bin, not /opt/bin.

Using one of those commands would help determine the number of physical cores on the server, but it would still be necessary to implement an override option for those of us who use Joyent (or anyone who uses a Solaris/SunOS/Illumos zone).

Thanks for looking into this!

Best,
Ryan

Derek (ygan-dsr6r-9ft1) wrote :

Hi,

I've attached a patch for SmartOS (Joyent Public Cloud) support. Can we get this introduced in the next rounds of Xtrabackup? That would be great.

Thanks!

Derek (ygan-dsr6r-9ft1) wrote :

Attaching a new patch that works for SunOS and SmartOS.

Krunal Bauskar (krunal-bauskar) wrote :

Solaris is not a supported platform. I am not sure we want to merge this patch and start maintaining it until we have full support for Solaris.

Tao Zhou (angeloudy) wrote :

I am experiencing the same issue on FreeBSD. Replacing

ss -p state listening "( sport = :$PORT )" | grep -qE 'socat|nc' && break

with

        lsof -i :$PORT | grep LISTEN | grep -qE 'socat|nc' && break

solved the problem.
Also, the tar command is not compatible with FreeBSD either.

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1094
