wsrep_sst_xtrabackup-v2 script, longer sleep needed before SST

Bug #1413879 reported by Thomas Daugherty
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC), status tracked in 5.6

Series  Status        Importance  Assigned to           Milestone
5.5     Fix Released  Undecided   Raghavendra D Prabhu
5.6     Fix Released  Undecided   Raghavendra D Prabhu

Bug Description

In the wsrep_sst_xtrabackup-v2 script, there is a sleep before SST, which is set to 10 seconds by default. However, on my systems it takes longer than 10 seconds for the joiner to delete existing files before it starts the socat receiver. So the donor waits 10 seconds, then tries to connect via socat to the joiner, which hasn't yet run socat.

I would say this sleep needs to be at least 20 seconds, to allow time to remove existing files on the joiner side.

        wsrep_log_info "Sleeping before data transfer for SST"
        sleep 10

percona-xtrabackup-2.2.3-4982.el6.x86_64

Running on RHEL 6.6.

Tags: sst xtrabackup
Nilnandan Joshi (nilnandan-joshi) wrote :

Verified. It's hardcoded.

<code>
...
        tcmd="$ttcmd"
        if [[ -n $progress ]];then
            get_footprint
            tcmd="$pcmd | $tcmd"
        elif [[ -n $rlimit ]];then
            adjust_progress
            tcmd="$pcmd | $tcmd"
        fi

        wsrep_log_info "Sleeping before data transfer for SST"
        sleep 10

        wsrep_log_info "Streaming the backup to joiner at ${REMOTEIP} ${SST_PORT:-4444}"

        if [[ -n $scomp ]];then
            tcmd="$scomp | $tcmd"
        fi

...
</code>

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

This is not a PXB bug, reported in wrong component.

no longer affects: percona-xtrabackup
no longer affects: percona-xtrabackup/2.1
no longer affects: percona-xtrabackup/2.2
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

This issue has been fixed elsewhere; the fix will be pushed to the experimental repo in a few days.

Przemek (pmalkowski) wrote :

IMHO hardcoding such a timeout is not the best idea. In one case 3 seconds may be enough, but in others it may take 30 seconds or more. That is no reason to make the SST wait, say, 60 seconds in every case even when it is not necessary. Why can't the script on the donor retry several times until the joiner is ready? Some kind of SST readiness negotiation before the real transfer starts, with a long enough overall timeout before giving up?
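
The retry idea above could look roughly like the sketch below. It is only an illustration, not the fix that was actually applied (the fix is not shown in this report). The function name retry_until and its arguments are made up for this example and do not exist in wsrep_sst_xtrabackup-v2. It retries an arbitrary command until it succeeds or an overall timeout expires, so a fast joiner is not kept waiting and a slow one still gets enough time.

```shell
# Hypothetical helper, NOT part of wsrep_sst_xtrabackup-v2.
# Usage: retry_until <timeout_seconds> <interval_seconds> <command> [args...]
# Runs the command repeatedly until it succeeds (returns 0) or the
# overall timeout has passed (returns 1).
retry_until() {
    _ru_timeout=$1
    _ru_interval=$2
    shift 2
    # Absolute deadline in epoch seconds.
    _ru_deadline=$(( $(date +%s) + _ru_timeout ))
    while :; do
        if "$@"; then
            return 0    # command succeeded, stop waiting
        fi
        if [ "$(date +%s)" -ge "$_ru_deadline" ]; then
            return 1    # overall timeout reached, give up
        fi
        sleep "$_ru_interval"
    done
}
```

On the donor this might be called as `retry_until 60 1 joiner_is_ready`, where joiner_is_ready stands for whatever readiness check is chosen; that check is deliberately left out here. Simply connecting to the SST port as a probe may not be a safe check, since a listener that accepts only a single connection could treat the probe as the real transfer.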

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

No, it has not been hardcoded in the fix, nor does the fix depend on it.

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1793
