garbd for backup doesn't work with lock for backup

Bug #1443154 reported by Frederic Descamps
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
New
Undecided
Unassigned

Bug Description

If we check the documentation of Codership (http://galeracluster.com/documentation-webpages/backingupthecluster.html) it's possible to backup your cluster using garbd, and several people is using this technique (I'm not 100% sure but Cluster Control does too I guess).

However since the implementation of locks for backup, this is not working any more:

[root@node3 backup]# garbd --address gcomm://192.168.90.2:4567 -o gmcast.listen_addr=tcp://0.0.0.0:444 --group LefredPXC --donor node1 --sst xtrabackup-v2
2015-04-12 01:06:29.865 INFO: CRC-32C: using "slicing-by-8" algorithm.
2015-04-12 01:06:29.866 INFO: Read config:
 daemon: 0
 name: garb
 address: gcomm://192.168.90.2:4567
 group: LefredPXC
 sst: xtrabackup-v2
 donor: node1
 options: gmcast.listen_addr=tcp://0.0.0.0:444; gcs.fc_limit=9999999; gcs.fc_factor=1.0; gcs.fc_master_slave=yes
 cfg:
 log:

2015-04-12 01:06:29.866 INFO: protonet asio version 0
2015-04-12 01:06:29.867 INFO: Using CRC-32C for message checksums.
2015-04-12 01:06:29.867 INFO: backend: asio
2015-04-12 01:06:29.867 WARN: access file(./gvwstate.dat) failed(No such file or directory)
2015-04-12 01:06:29.867 INFO: restore pc from disk failed
2015-04-12 01:06:29.867 INFO: GMCast version 0
2015-04-12 01:06:29.867 INFO: (63519dac, 'tcp://0.0.0.0:444') listening at tcp://0.0.0.0:444
2015-04-12 01:06:29.867 INFO: (63519dac, 'tcp://0.0.0.0:444') multicast: , ttl: 1
2015-04-12 01:06:29.868 INFO: EVS version 0
2015-04-12 01:06:29.868 INFO: gcomm: connecting to group 'LefredPXC', peer '192.168.90.2:4567'
2015-04-12 01:06:29.870 INFO: (63519dac, 'tcp://0.0.0.0:444') turning message relay requesting on, nonlive peers: tcp://192.168.90.3:4567
2015-04-12 01:06:30.370 INFO: declaring 3102fc82 at tcp://192.168.90.2:4567 stable
2015-04-12 01:06:30.370 INFO: declaring 8374222a at tcp://192.168.90.3:4567 stable
2015-04-12 01:06:30.371 INFO: Node 3102fc82 state prim
2015-04-12 01:06:30.372 INFO: view(view_id(PRIM,3102fc82,16) memb {
 3102fc82,0
 63519dac,0
 8374222a,0
} joined {
} left {
} partitioned {
})
2015-04-12 01:06:30.372 INFO: save pc into disk
2015-04-12 01:06:30.869 INFO: gcomm: connected
2015-04-12 01:06:30.869 INFO: Changing maximum packet size to 64500, resulting msg size: 32636
2015-04-12 01:06:30.869 INFO: Shifting CLOSED -> OPEN (TO: 0)
2015-04-12 01:06:30.869 INFO: Opened channel 'LefredPXC'
2015-04-12 01:06:30.869 INFO: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2015-04-12 01:06:30.869 INFO: STATE EXCHANGE: Waiting for state UUID.
2015-04-12 01:06:30.869 INFO: STATE EXCHANGE: sent state msg: 639e1f48-e09f-11e4-a700-db023c846e92
2015-04-12 01:06:30.869 INFO: STATE EXCHANGE: got state msg: 639e1f48-e09f-11e4-a700-db023c846e92 from 0 (node1)
2015-04-12 01:06:30.869 INFO: STATE EXCHANGE: got state msg: 639e1f48-e09f-11e4-a700-db023c846e92 from 2 (node2)
2015-04-12 01:06:30.870 INFO: STATE EXCHANGE: got state msg: 639e1f48-e09f-11e4-a700-db023c846e92 from 1 (garb)
2015-04-12 01:06:30.870 INFO: Quorum results:
 version = 3,
 component = PRIMARY,
 conf_id = 13,
 members = 2/3 (joined/total),
 act_id = 18800,
 last_appl. = -1,
 protocols = 0/7/3 (gcs/repl/appl),
 group UUID = b226add4-df80-11e4-a059-df2f970fa58d
2015-04-12 01:06:30.871 INFO: Flow-control interval: [9999999, 9999999]
2015-04-12 01:06:30.871 INFO: Shifting OPEN -> PRIMARY (TO: 18800)
2015-04-12 01:06:30.871 INFO: Sending state transfer request: 'xtrabackup-v2', size: 13
2015-04-12 01:06:30.872 INFO: Member 1.0 (garb) requested state transfer from 'node1'. Selected 0.0 (node1)(SYNCED) as donor.
2015-04-12 01:06:30.872 INFO: Shifting PRIMARY -> JOINER (TO: 18800)
2015-04-12 01:06:30.873 INFO: Closing send monitor...
2015-04-12 01:06:30.873 INFO: Closed send monitor.
2015-04-12 01:06:30.873 INFO: gcomm: terminating thread
2015-04-12 01:06:30.873 INFO: gcomm: joining thread
2015-04-12 01:06:30.873 INFO: gcomm: closing backend
2015-04-12 01:06:30.877 INFO: 1.0 (garb): State transfer from 0.0 (node1) complete.
2015-04-12 01:06:30.877 INFO: Shifting JOINER -> JOINED (TO: 18800)
2015-04-12 01:06:30.879 INFO: view(view_id(NON_PRIM,3102fc82,16) memb {
 63519dac,0
} joined {
} left {
} partitioned {
 3102fc82,0
 8374222a,0
})
2015-04-12 01:06:30.879 INFO: view((empty))
2015-04-12 01:06:30.880 INFO: gcomm: closed
2015-04-12 01:06:30.880 WARN: 0x1f68500 down context(s) not set
2015-04-12 01:06:30.880 WARN: Failed to send SYNC signal: -107 (Transport endpoint is not connected)
2015-04-12 01:06:30.881 INFO: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2015-04-12 01:06:30.881 INFO: Flow-control interval: [9999999, 9999999]
2015-04-12 01:06:30.881 INFO: Received NON-PRIMARY.
2015-04-12 01:06:30.881 INFO: Shifting JOINED -> OPEN (TO: 18800)
2015-04-12 01:06:30.881 INFO: Received self-leave message.
2015-04-12 01:06:30.881 INFO: Flow-control interval: [9999999, 9999999]
2015-04-12 01:06:30.881 INFO: Received SELF-LEAVE. Closing connection.
2015-04-12 01:06:30.881 INFO: Shifting OPEN -> CLOSED (TO: 18800)
2015-04-12 01:06:30.881 INFO: RECV thread exiting 0: Success
2015-04-12 01:06:30.881 INFO: recv_thread() joined.
2015-04-12 01:06:30.881 INFO: Closing replication queue.
2015-04-12 01:06:30.881 INFO: Closing slave action queue.
2015-04-12 01:06:30.881 WARN: Attempt to close a closed connection
2015-04-12 01:06:30.881 INFO: Exiting main loop
2015-04-12 01:06:30.881 INFO: Shifting CLOSED -> DESTROYED (TO: 18800)

and on the DONOR:

2015-04-12 01:06:30 5087 [Note] WSREP: Service thread queue flushed.
WSREP_SST: [INFO] Streaming with xbstream (20150412 01:06:31.298)
WSREP_SST: [INFO] Using socat as streamer (20150412 01:06:31.299)
WSREP_SST: [ERROR] Upgrade joiner to 5.6.21 or higher for backup locks support (20150412 01:06:31.303)
WSREP_SST: [ERROR] The joiner is not supported for this version of donor (20150412 01:06:31.304)
WSREP_SST: [ERROR] Cleanup after exit with status:93 (20150412 01:06:31.306)
WSREP_SST: [INFO] Cleaning up temporary directories (20150412 01:06:31.308)
2015-04-12 01:06:31 5087 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '' --auth 'root:' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --binlog 'node1-bin' --gtid 'b226add4-df80-11e4-a059-df2f970fa58d:18800'

But of course you don't want to configure the nodes to not use locks for backup as then this will be the default behaviour for the nodes in the cluster when SST is needed.

I don't know if it wlll be possible to use then garbd for backup and in case it's impossible, then we should document it.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

"
 WSREP_SST: [ERROR] Upgrade joiner to 5.6.21 or higher for backup locks support (20150412 01:06:31.303)
WSREP_SST: [ERROR] The joiner is not supported for this version of donor (20150412 01:06:31.304)
"

is the error here. What is the version of garbd here.

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

> 2015-04-12 01:06:31 5087 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '' --auth 'root:' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --binlog 'node1-bin' --gtid 'b226add4-df80-11e4-a059-df2f970fa58d:18800'

- somehow garbd sent state request with an empty receive address. Or, donor lost it halfway. So it must be unrelated to backup locks.

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

Disregard the above comment. The address in this case should be empty, because it is the backup script run by the donor. So initial analysis by Fred is correct. It has nothing to do with garbd version.

Revision history for this message
Holger Thiel (holger-thiel) wrote :

The variable $sst_ver is zero.

677 if [[ -z $sst_ver ]];then
678 wsrep_log_error "Upgrade joiner to 5.6.21 or higher for backup locks support"
679 wsrep_log_error "The joiner is not supported for this version of donor"
680 exit 93
681 fi

$set_ver is set by

  483 setup_ports()
  484 {
  485 if [[ "$WSREP_SST_OPT_ROLE" == "donor" ]];then
  486 SST_PORT=$(echo $WSREP_SST_OPT_ADDR | awk -F '[:/]' '{ print $2 }')
  487 REMOTEIP=$(echo $WSREP_SST_OPT_ADDR | awk -F ':' '{ print $1 }')
  488 lsn=$(echo $WSREP_SST_OPT_ADDR | awk -F '[:/]' '{ print $4 }')
  489 sst_ver=$(echo $WSREP_SST_OPT_ADDR | awk -F '[:/]' '{ print $5 }')
  490 else
  491 SST_PORT=$(echo ${WSREP_SST_OPT_ADDR} | awk -F ':' '{ print $2 }')
  492 fi
  493 }

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1818

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.