Rejoining galera cluster fails with default wsrep_sst_method setting

Bug #1478105 reported by Jimmy McCrory
This bug affects 3 people
Affects            Status        Importance  Assigned to  Milestone
OpenStack-Ansible  Fix Released  Critical    David Wilde
Juno               Fix Released  Critical    David Wilde
Kilo               Fix Released  Critical    David Wilde
Trunk              Fix Released  Critical    David Wilde

Bug Description

wsrep_sst_method is set to xtrabackup, but this seems to be causing a problem when a node attempts to re-join a cluster by requesting a state transfer.

This issue was encountered when an infrastructure host was offline for around a week for hardware replacements. Changing the setting to wsrep_sst_method=xtrabackup-v2 allowed it to rejoin the cluster.
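For anyone checking whether a node still carries the old setting, here is a minimal sketch that reads the value out of my.cnf-style text. The function names are hypothetical and not part of any tool mentioned in this report, and the parser is deliberately simple: it ignores !include directives and does not handle trailing inline comments.

```python
def read_sst_method(cnf_text):
    """Return the last wsrep_sst_method value in my.cnf-style text, or None."""
    value = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, section headers, and !include directives.
        if not line or line[0] in "#;[!":
            continue
        key, sep, val = line.partition("=")
        # Option files accept dashes and underscores interchangeably in names.
        if sep and key.strip().replace("-", "_") == "wsrep_sst_method":
            value = val.strip().strip("'\"")
    return value


def needs_change(cnf_text):
    """True when the config still selects the legacy xtrabackup SST method."""
    return read_sst_method(cnf_text) == "xtrabackup"


sample = """
[mysqld]
wsrep_sst_method = xtrabackup
"""
print(read_sst_method(sample))  # xtrabackup
print(needs_change(sample))     # True
```

In practice the text would come from the node's own config file (the logs in this report show /etc/mysql/my.cnf being passed as the defaults file).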

Reproduced on a Juno deployment:
From within a single infra host's galera container:
service mysql stop
rm /var/lib/mysql/*
service mysql start

Attaching logs for the joining node and the donor node, which was chosen to provide the state transfer.

The actual failure looks to be coming from the /usr/bin/innobackupex script, so there could be some problem with the particular version installed.

# dpkg -l | grep xtrabackup
ii percona-xtrabackup 2.1.8-1 amd64 Open source backup tool for InnoDB and XtraDB
ii xtrabackup 2.1.8-1 all Transitional package for percona-xtrabackup

# dpkg -l | grep mariadb-galera
ii mariadb-galera-server-5.5 5.5.41+maria-1~trusty amd64 MariaDB database server with Galera cluster binaries

Jimmy McCrory (jimmy-mccrory) wrote :
Kevin Carter (kevin-carter) wrote :

In Kilo we're working on getting the stack to use MariaDB 10 with xtrabackup-v2; the spec/PR is awaiting reviews here: https://review.openstack.org/#/c/178259/ . As for Juno, we're looking into the problem and will work on getting a fix in soon.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (kilo)

Fix proposed to branch: kilo
Review: https://review.openstack.org/206118

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (juno)

Fix proposed to branch: juno
Review: https://review.openstack.org/206122

Kevin Carter (kevin-carter) wrote :
Jacob Wagner (swagner1104) wrote :

Some output from the donor's log file when this is set incorrectly, added for bug history purposes:

150727 15:18:48 [Note] WSREP: Flow-control interval: [23, 23]
150727 15:18:48 [Note] WSREP: New cluster view: global state: 02e60dd2-20fb-11e5-b5d1-b668c5066ff4:6805821, view# 2: Primary, number of nodes: 2, my index: 0, protocol version 3
150727 15:18:48 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
150727 15:18:48 [Note] WSREP: REPL Protocols: 7 (3, 2)
150727 15:18:48 [Note] WSREP: Service thread queue flushed.
150727 15:18:48 [Note] WSREP: Assign initial position for certification: 6805821, protocol version: 3
150727 15:18:48 [Note] WSREP: Service thread queue flushed.
150727 15:18:49 [Note] WSREP: Member 1.0 (605021-infra02_galera_container-d43cfc6c) requested state transfer from '*any*'. Selected 0.0 (605023-infra03_galera_container-bd4b49ab)(SYNCED) as donor.
150727 15:18:49 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 6805826)
150727 15:18:49 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
150727 15:18:49 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address '172.27.236.238:4444/xtrabackup_sst' --auth 'root:5cd96ef20bdc302a5c1d0e965bfe5b1a' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --gtid '02e60dd2-20fb-11e5-b5d1-b668c5066ff4:6805826''
150727 15:18:49 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with tar (20150727 15:18:49.302)
WSREP_SST: [INFO] Using socat as streamer (20150727 15:18:49.304)
WSREP_SST: [INFO] Streaming the backup to joiner at 172.27.236.238 4444 (20150727 15:18:49.313)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf $INNOEXTRA --galera-info --stream=$sfmt ${TMPDIR} 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:172.27.236.238:4444; RC=( ${PIPESTATUS[@]} ) (20150727 15:18:49.316)
WSREP_SST: [ERROR] innobackupex finished with error: 13. Check /var/lib/mysql//innobackup.backup.log (20150727 15:18:49.511)
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20150727 15:18:49.514)
150727 15:18:49 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup --role 'donor' --address '172.27.236.238:4444/xtrabackup_sst' --auth 'root:5cd96ef20bdc302a5c1d0e965bfe5b1a' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --gtid '02e60dd2-20fb-11e5-b5d1-b668c5066ff4:6805826'
150727 15:18:49 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'donor' --address '172.27.236.238:4444/xtrabackup_sst' --auth 'root:5cd96ef20bdc302a5c1d0e965bfe5b1a' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --gtid '02e60dd2-20fb-11e5-b5d1-b668c5066ff4:6805826': 22 (Invalid argument)
150727 15:18:49 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup --role 'donor' --address '172.27.236.238:4444/xtrabackup_sst' --auth 'root:5cd96ef20bdc302a5c1d0e965bfe5b1a' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --gtid '02e60dd2-20fb-11e5-b5d1-b668c5066ff4:6805826'
150727 15:18:49 [Warning] WSREP: ...
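To pull just the SST script failures out of a log like the one above, a small filter along these lines may help. This is a sketch only: the regular expression is an assumption based on the WSREP_SST lines shown in this comment, and the function name is hypothetical.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   WSREP_SST: [LEVEL] message (YYYYMMDD HH:MM:SS.mmm)
SST_LINE = re.compile(
    r"^WSREP_SST: \[(?P<level>[A-Z]+)\] (?P<msg>.*?)"
    r"(?: \((?P<ts>\d{8} [\d:.]+)\))?$"
)


def sst_errors(log_text):
    """Return the messages of all WSREP_SST lines logged at ERROR level."""
    errors = []
    for line in log_text.splitlines():
        match = SST_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            errors.append(match.group("msg"))
    return errors


excerpt = """\
WSREP_SST: [INFO] Using socat as streamer (20150727 15:18:49.304)
WSREP_SST: [ERROR] innobackupex finished with error: 13. Check /var/lib/mysql//innobackup.backup.log (20150727 15:18:49.511)
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20150727 15:18:49.514)
"""
for message in sst_errors(excerpt):
    print(message)
```

The first error message names innobackup.backup.log in the data directory, which is where the underlying innobackupex failure is recorded.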


OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (kilo)

Reviewed: https://review.openstack.org/206118
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=6f170025e6f32b9a80bfd1efee627eea3d97f267
Submitter: Jenkins
Branch: kilo

commit 6f170025e6f32b9a80bfd1efee627eea3d97f267
Author: Dave Wilde <email address hidden>
Date: Mon Jul 27 10:42:15 2015 -0500

    Templateize and change Galera SST method

    Per the bug we need to be using xtrabackup-v2 for the wsrep_sst_method. This
    patch creates an galera_sst_method variable and defaults it to xtrabackup-v2.

    Change-Id: Iee88b49e84e3a8aaf477af45b4a42a4a2c31634e
    Closes-Bug: 1478105
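
The idea in the commit message above (a galera_sst_method variable that defaults to xtrabackup-v2 and is substituted into the config) can be illustrated with a small stand-in. This is not the project's template: it uses Python's string.Template purely for illustration, and the template text, DEFAULTS dictionary, and render_cnf function are hypothetical. Only the variable name and its default come from the commit message.

```python
from string import Template

# Hypothetical one-line config template; the real template is not shown here.
CNF_TEMPLATE = Template("[mysqld]\nwsrep_sst_method = $galera_sst_method\n")

# Default taken from the commit message; deployers may override it.
DEFAULTS = {"galera_sst_method": "xtrabackup-v2"}


def render_cnf(overrides=None):
    """Render the config text, letting overrides replace the defaults."""
    values = dict(DEFAULTS)
    values.update(overrides or {})
    return CNF_TEMPLATE.substitute(values)


print(render_cnf())
print(render_cnf({"galera_sst_method": "rsync"}))
```

Keeping the method in a variable means a deployment can switch SST methods without editing the template itself.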

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (juno)

Reviewed: https://review.openstack.org/206122
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=a19e72f3610d0a13289e9cb8fbf9fb5afd5c7fb3
Submitter: Jenkins
Branch: juno

commit a19e72f3610d0a13289e9cb8fbf9fb5afd5c7fb3
Author: Dave Wilde <email address hidden>
Date: Mon Jul 27 10:48:01 2015 -0500

    Change wsrep_sst_method to xtrabackup-v2

    Per the bug we should and should have been using xtrabackup-v2 for the
    wsrep_sst_method. This patch defaults the value to xtrabackup-v2.

    Change-Id: Ibd0be33e0d69a928a9cab50065ab9d3721839d58
    Closes-Bug: 1478105

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (master)

Reviewed: https://review.openstack.org/207082
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=78d1bb611055177f393d3a9c08047a4f83c3ab23
Submitter: Jenkins
Branch: master

commit 78d1bb611055177f393d3a9c08047a4f83c3ab23
Author: Dave Wilde <email address hidden>
Date: Mon Jul 27 10:42:15 2015 -0500

    Templateize and change Galera SST method

    Per the bug we need to be using xtrabackup-v2 for the wsrep_sst_method. This
    patch creates an galera_sst_method variable and defaults it to xtrabackup-v2.

    Change-Id: Iee88b49e84e3a8aaf477af45b4a42a4a2c31634e
    Closes-Bug: 1478105
    (cherry picked from commit 6f170025e6f32b9a80bfd1efee627eea3d97f267)

Changed in openstack-ansible:
status: Confirmed → Fix Committed
Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.11

This issue was fixed in the openstack/openstack-ansible 11.2.11 release.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/openstack-ansible 11.2.12

This issue was fixed in the openstack/openstack-ansible 11.2.12 release.

Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.14

This issue was fixed in the openstack/openstack-ansible 11.2.14 release.
