Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

wsrep_sst_xtrabackup-v2 script, longer sleep needed before SST

Bug #1413879 reported by Thomas Daugherty on 2015-01-23

This bug affects 1 person

	Status	Importance	Assigned to	Milestone
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Status tracked in 5.6
5.5	Fix Released	Undecided	Raghavendra D Prabhu	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC 5.5.41-25.11
5.6	Fix Released	Undecided	Raghavendra D Prabhu	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC 5.6.22-25.8

Bug Description

In the wsrep_sst_xtrabackup-v2 script there is a sleep before SST, which is set to 10 seconds by default. However, on my systems it takes longer than 10 seconds for the joiner to delete existing files before it starts the socat receiver. So the donor waits 10 seconds, then tries to connect via socat to the joiner, which hasn't yet run socat.

I would say this sleep needs to be at least 20 seconds, to allow time to remove existing files on the joiner side.

wsrep_log_info "Sleeping before data transfer for SST"
sleep 10

percona-xtrabackup-2.2.3-4982.el6.x86_64

Running on RHEL 6.6.

Nilnandan Joshi (nilnandan-joshi) wrote on 2015-01-29:

Verified. It's hardcoded.

<code>
...
        tcmd="$ttcmd"
        if [[ -n $progress ]];then
            get_footprint
            tcmd="$pcmd | $tcmd"
        elif [[ -n $rlimit ]];then
            adjust_progress
            tcmd="$pcmd | $tcmd"
        fi

        wsrep_log_info "Sleeping before data transfer for SST"
        sleep 10

        wsrep_log_info "Streaming the backup to joiner at ${REMOTEIP} ${SST_PORT:-4444}"

        if [[ -n $scomp ]];then
            tcmd="$scomp | $tcmd"
        fi

...
</code>

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2015-01-29:

This is not a PXB bug; it was reported against the wrong component.

no longer affects:	percona-xtrabackup
no longer affects:	percona-xtrabackup/2.1
no longer affects:	percona-xtrabackup/2.2

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2015-01-29:

This issue has been fixed elsewhere; the fix will be pushed to the experimental repo in a few days.

Przemek (pmalkowski) wrote on 2015-01-29:

IMHO hardcoding such a timeout is not the best idea. In one case 3 seconds may be enough, while in others it may take 30 seconds or more. But that is no reason to make the SST wait, say, 60 seconds in all cases, even when it is not necessary. Why can't the script on the donor retry several times until the joiner is ready? Some kind of SST readiness negotiation before the real transfer starts, and then a sufficiently long timeout before giving up?
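
To make the retry idea concrete, here is a minimal sketch of a bounded retry loop. It is not taken from wsrep_sst_xtrabackup-v2 and is not the fix that was eventually shipped; the function name `retry_until_ready` and the probe command passed to it are hypothetical. The probe is left as a parameter on purpose: if the joiner's socat listener accepts only a single connection, a plain TCP connect test would use up that connection, so a real readiness check would need some separate signal.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of wsrep_sst_xtrabackup-v2.
# Runs a readiness probe until it succeeds or the attempts run out.
# Usage: retry_until_ready MAX_ATTEMPTS DELAY_SECONDS PROBE_COMMAND [ARGS...]
retry_until_ready() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        # "$@" is the caller-supplied probe; exit status 0 means "ready".
        if "$@"; then
            echo "Joiner ready after ${attempt} attempt(s)"
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if (( attempt < max_attempts )); then
            sleep "$delay"
        fi
    done
    echo "Joiner not ready after ${max_attempts} attempts" >&2
    return 1
}
```

With an assumed probe function such as `joiner_is_ready`, the donor could call `retry_until_ready 30 2 joiner_is_ready "$REMOTEIP"` in place of the fixed sleep. That proceeds as soon as the joiner reports ready and gives up after 30 attempts, roughly a minute later, so a fast joiner is not held back and a slow one still gets time to clean up.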

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2015-01-30:

No, it has not been hardcoded in the fix, nor does the fix depend on it.

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1793
