XtraDB cluster does not properly detect bonded network interfaces

Bug #1007554 reported by Swany
This bug affects 5 people
Affects                      Status      Importance  Assigned to  Milestone
MySQL patches by Codership   Incomplete  Undecided   Unassigned   -
Percona XtraDB Cluster       Expired     Undecided   Unassigned   -
(Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC)

Bug Description

When using bonded network interfaces:
WSREP: Failed to read output of: '/sbin/ifconfig | grep -m1 -1 -E '^[a-z]?eth[0-9]' | tail -n 1 | awk '{ print $2 }' | awk -F : '{ print $2 }''
[Warning] WSREP: Failed to autoguess base node address
[Note] WSREP: Service disconnected.
[Note] WSREP: Some threads may fail to exit.
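The autoguess one-liner in the log above only looks at interfaces whose names match `^[a-z]?eth[0-9]`. On a bonded host the IPv4 address is usually configured on `bond0`, which that pattern never matches (any enslaved `ethN` interfaces do match, but normally carry no address of their own). A minimal bash sketch that applies the same pattern to a few interface names; `matches_autoguess_pattern` is a hypothetical helper, not code from mysqld:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if an interface name matches the pattern
# that the hard-coded wsrep autoguess one-liner greps for.
matches_autoguess_pattern() {
  grep -q -E '^[a-z]?eth[0-9]' <<< "$1"
}

# Check a few typical interface names against that pattern.
for ifname in eth0 peth1 bond0 em1; do
  if matches_autoguess_pattern "$ifname"; then
    echo "$ifname: matched"
  else
    echo "$ifname: not matched"
  fi
done
```

Only eth0 and peth1 are reported as matched; bond0 and em1 are skipped.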

This might work better and should be interface type agnostic:
ping -i 1 -c 1 $(hostname) | grep -m1 -1 -E '\([0-9.]+{4}\)' | awk '{ print $3 }' | sed 's/[()]//g' | grep -v '127.0.0.1'

There should probably also be an option to set the preferred IP via an environment variable before installing the RPM. Multiple interfaces may exist, and the administrator may prefer a particular one (not just the first in the list, or the one associated with the hostname). For unattended installations, automatic interface selection may not work.
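A minimal bash sketch of such an override, assuming a hypothetical environment variable `WSREP_PREFERRED_ADDRESS` (neither mysqld nor the RPM scripts are known to read it) and a list of candidate addresses already gathered from the interfaces:

```shell
#!/usr/bin/env bash
# Sketch: pick the node address, letting an administrator override the
# automatic choice. WSREP_PREFERRED_ADDRESS is a hypothetical variable name,
# not something mysqld or the RPM scripts actually read.
# Arguments: candidate addresses found on this host, in preference order.
choose_address() {
  local candidate
  if [[ -n "${WSREP_PREFERRED_ADDRESS:-}" ]]; then
    # Only accept the override if it is really configured on this host.
    for candidate in "$@"; do
      if [[ "$candidate" == "$WSREP_PREFERRED_ADDRESS" ]]; then
        echo "$candidate"
        return 0
      fi
    done
    echo "preferred address $WSREP_PREFERRED_ADDRESS not found" >&2
    return 1
  fi
  # No override: fall back to the first candidate, if there is one.
  if [[ $# -gt 0 ]]; then
    echo "$1"
    return 0
  fi
  return 1
}
```

For example, `WSREP_PREFERRED_ADDRESS=10.1.4.20 choose_address 192.168.0.5 10.1.4.20` prints `10.1.4.20`, while without the variable the first candidate wins. Rejecting an override that is not configured on the host is a design choice: it fails loudly instead of advertising an address the node cannot listen on.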

Maciej Dobrzanski (mushu) wrote :

I think pinging $(hostname) is not a good solution either, because it also relies on something that does not have to exist (e.g. the hostname may not resolve to anything). The following command finds all non-loopback IPv4 addresses on interfaces that are up and running.

/sbin/ifconfig | sed -e '/./{H;$!d};x;/LOOPBACK/d;/inet addr:.\+ UP [ A-Z]\+RUNNING/!d;s/.*inet addr:\([0-9.]\+\).*/\1/'

Appending '| head -1' automatically chooses the first IP address. However, as ifconfig is being deprecated, it might be a good idea to use /sbin/ip instead.
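A sketch of the same idea with /sbin/ip, assuming iproute2 is installed and its one-line (`-o`) output format; the function names are hypothetical. It filters out 127.0.0.0/8 addresses rather than relying on interface names, so it treats `bond0` like any other interface:

```shell
#!/usr/bin/env bash
# Sketch: list non-loopback IPv4 addresses using iproute2 instead of ifconfig.
# `ip -o -4 addr show up` prints one line per address, e.g.
#   4: bond0    inet 10.1.4.20/24 brd 10.1.4.255 scope global bond0\ ...
# so field 2 is the interface name and field 4 is ADDRESS/PREFIX.

# Parse `ip -o -4 addr show` output on stdin; print one address per line,
# skipping anything in 127.0.0.0/8.
parse_ip_addr_output() {
  awk '$3 == "inet" && $4 !~ /^127\./ { split($4, a, "/"); print a[1] }'
}

# Query the live system (only interfaces that are up).
list_ipv4_addrs() {
  ip -o -4 addr show up | parse_ip_addr_output
}
```

`list_ipv4_addrs | head -n 1` then picks the first address, with the same caveat as `head -1`: "first" is simply whatever order ip reports.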

Baron Schwartz (baron-xaprb) wrote : Re: [Bug 1007554] Re: XtraDB cluster does not properly detect bonded network interfaces

The sed command suggested in the last comment may not be completely
reliable and portable. I've found that sed works differently in some
cases with respect to regex patterns and so forth. The command doesn't
work on my laptop, for example.

Perl might actually be a better tool for this case:

/sbin/ifconfig | perl -e '$/="";while(<>){$_!~m/LOOPBACK/&&/UP.*RUNNING/&&s/^.*addr:(\S+).*$/$1/s&&print $_, "\n"}'

Maciej Dobrzanski (mushu) wrote :

Perl is no longer shipped with the base system in some Linux distributions, so I guess Python would be a better choice these days. But this should be done with simple tools if portability is important. I don't see any reason not to replace the one-liner with a call to an external bash script, which would have a lot more flexibility in handling the task.
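A rough sketch of what such an external script could look like, assuming bash and iproute2; the function names and the optional interface argument (e.g. `bond0`) are hypothetical, not an interface any package ships. Passing an interface name lets the administrator pin the choice instead of relying on whichever address comes first:

```shell
#!/usr/bin/env bash
# Sketch of an external address-guessing script of the kind proposed above.
# Function names and the optional IFNAME argument are hypothetical.
# Usage (as functions): guess_address [IFNAME]   e.g. guess_address bond0

# Print IPv4 addresses from `ip -o -4 addr show` output on stdin,
# skipping 127.0.0.0/8 and, if an interface name is given, other interfaces.
select_addresses() {
  awk -v want="${1:-}" '
    $3 != "inet" { next }
    $4 ~ /^127\./ { next }
    want != "" && $2 != want { next }
    { split($4, a, "/"); print a[1] }'
}

# Print the first usable address (optionally on one interface) or fail.
guess_address() {
  local addr
  addr=$(ip -o -4 addr show up | select_addresses "${1:-}" | head -n 1)
  if [[ -z "$addr" ]]; then
    echo "no usable IPv4 address found${1:+ on $1}" >&2
    return 1
  fi
  echo "$addr"
}
```

Saved as a script, it would end with `guess_address "$@"`, so that running it with `bond0` prints the bond's address or exits non-zero with a message instead of silently picking another interface.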

Baron Schwartz (baron-xaprb) wrote :

I agree on a script instead of a one-liner.

Alex Yurchenko (ayurchen) wrote :

I can't confirm this with the current mysql-wsrep code. In fact, no wsrep initialisation, and therefore no address guessing, should happen during RPM installation. If I understand it correctly, the only reason to run mysqld during installation is to create the base data directory, and this can (and should) be done in standalone mode.

Changed in codership-mysql:
status: New → Incomplete
Piotr Biel (piotr-biel) wrote :

Alex,

This is a bigger problem. It affects not only installation/initialization but the entire cluster.

With:
wsrep_sst_receive_address=bond0 10.1.4.20

It can't use xtrabackup/rsync for SST:

[ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup 'donor' 'bond0 10.1.2.10:4444/xtrabackup_sst'

nc: port range not valid

This is because:

+ nc bond0 10.1.4.21 4444

is being executed.
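The xtrace line suggests that the configured value is expanded unquoted, so `bond0 10.1.4.21` becomes two words and nc receives `bond0` as the host and `10.1.4.21` as a port, hence `port range not valid`. A minimal bash sketch of that word splitting, plus a hypothetical pre-flight check (not part of wsrep_sst_xtrabackup) that rejects values containing whitespace:

```shell
#!/usr/bin/env bash
# Show how an unquoted expansion splits a value into separate arguments.
split_words() {
  # Deliberately unquoted to demonstrate word splitting.
  # shellcheck disable=SC2086
  printf '%s\n' $1
}

# Hypothetical pre-flight check for a wsrep_sst_receive_address value:
# reject empty values and values containing whitespace (e.g. "bond0 10.1.4.20").
is_valid_sst_address() {
  [[ -n "$1" && "$1" != *[[:space:]]* ]]
}
```

`split_words 'bond0 10.1.4.21'` prints two lines, and `is_valid_sst_address 'bond0 10.1.4.20'` fails while `is_valid_sst_address 10.1.4.20` succeeds, matching the configuration that is reported to work.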

Piotr Biel (piotr-biel) wrote :

120604 3:10:08 [Warning] WSREP: Gap in state sequence. Need state transfer.
120604 3:10:10 [ERROR] WSREP: Failed to read output of: '/sbin/ifconfig | grep -m1 -1 -E '^[a-z]?eth[0-9]' | tail -n 1 | awk '{ print $2 }' | awk -F : '{ print $2 }''
120604 3:10:10 [ERROR] WSREP: Could not prepare state transfer request: failed to guess address to accept state transfer at. wsrep_sst_receive_address must be set manually.
120604 3:10:10 [ERROR] Aborting

Now, if I set:

wsrep_sst_receive_address=bond0 10.1.4.20

it ends up with the issue described above.

Piotr Biel (piotr-biel) wrote :

With:

wsrep_sst_receive_address=10.1.4.20

it works fine.

Vadim Tkachenko (vadim-tk) wrote :

Alex,

The /sbin/ifconfig call happens from mysqld.
The main point is to use an external script, not a hard-coded line in mysqld.

Alex Yurchenko (ayurchen) wrote :

Vadim,

I know that this ifconfig call is invoked from mysqld. What I meant is that if it is happening _during_ install, the bug is not that it fails to detect the right address, but that this call happens at all. And I suspect that this bug is either in Percona's mysqld code or in the RPM installation script, since I could not reproduce it with our RPMs.

As for the script: are you seriously considering writing a script that would 100% correctly autodetect the right IP address on the right interface in installations targeted for HA, and advising people to rely on that instead of manual configuration?

Alex Yurchenko (ayurchen) wrote :

Here's another manifestation of that bug: http://forum.percona.com/index.php?t=rview&goto=8749#msg_8749
Again, the wsrep provider (Galera) is loaded during installation.

Changed in percona-xtradb-cluster:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for Percona XtraDB Cluster because there has been no activity for 60 days.]

Changed in percona-xtradb-cluster:
status: Incomplete → Expired
churnd (churnd) wrote :

I'm having this exact problem as well when trying to use xtrabackup for SST. Was there ever a recommended solution?

Changed in percona-xtradb-cluster:
status: Expired → Incomplete
churnd (churnd) wrote :

Please disregard the previous comment. My issue was related to SELinux being enabled, not this particular bug.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@churnd,

Setting wsrep_node_address should do.

Launchpad Janitor (janitor) wrote :

[Expired for Percona XtraDB Cluster because there has been no activity for 60 days.]

Changed in percona-xtradb-cluster:
status: Incomplete → Expired
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1219
