rsync issues with SST/IST on joiner

Bug #1169676 reported by Raghavendra D Prabhu
This bug affects 2 people
Affects                        Status        Importance  Assigned to           Milestone
MySQL patches by Codership     Fix Released  Undecided   Unassigned
Percona XtraDB Cluster         Fix Released  High        Raghavendra D Prabhu
  (moved to https://jira.percona.com/projects/PXC)

Bug Description

The following issues occur on the joiner during SST and IST with rsync SST:

a) When SST fails, the rsync daemon keeps running.

b) When SST is retried, rsync fails with a pid error.

Either the daemon must be killed when the failure in a) happens, or the
retry in b) must not fail with that error.
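A minimal sketch of a guard that would avoid failure b): before the joiner starts a new rsync daemon, it kills whatever daemon an earlier failed SST left behind. It assumes the old daemon's pid was recorded in a pid file; the function name `kill_stale_rsync` and its pid-file argument are hypothetical and not taken from wsrep_sst_rsync.

```shell
#!/usr/bin/env bash
# Sketch: kill an rsync daemon left behind by an earlier failed SST before a
# new one is started. kill_stale_rsync and its pid-file argument are
# hypothetical; wsrep_sst_rsync's own pid handling may differ.
set -u

kill_stale_rsync() {
    local pidfile=$1
    local pid i
    # No pid file means there is no earlier daemon to clean up.
    [ -f "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    # Signal the old daemon only if the pid is set and the process still exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        kill "$pid"
        # Give it up to ~5 seconds to go away.
        for i in 1 2 3 4 5; do
            kill -0 "$pid" 2>/dev/null || break
            sleep 1
        done
    fi
    # Drop the stale pid file so the new daemon can write a fresh one.
    rm -f "$pidfile"
}
```

It would be called once at the start of the joiner role, before launching the daemon. A sturdier version would also confirm that the pid still belongs to an rsync process before signalling it, since pids can be reused.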

During IST, it fails as follows:

        group UUID = 11264cec-06e6-11e2-0800-61616b1fc754
130417 0:00:36 [Note] WSREP: Flow-control interval: [12, 23]
130417 0:00:36 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 17)
130417 0:00:36 [Note] WSREP: State transfer required:
        Group state: 11264cec-06e6-11e2-0800-61616b1fc754:17
        Local state: 11264cec-06e6-11e2-0800-61616b1fc754:3
130417 0:00:36 [Note] WSREP: New cluster view: global state: 11264cec-06e6-11e2-0800-61616b1fc754:17, view# 6: Primary, number of nodes: 2, my index: 1, protocol version 2
130417 0:00:36 [Warning] WSREP: Gap in state sequence. Need state transfer.
130417 0:00:38 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.0.2.154' --auth 'root:test' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '2656''
/usr//bin/wsrep_sst_common: line 94: /dev/stderr: Permission denied
130417 0:00:38 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '10.0.2.154' --auth 'root:test' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '2656'
        Read: '(null)'
130417 0:00:38 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '10.0.2.154' --auth 'root:test' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '2656': 1 (Operation not permitted)
130417 0:00:38 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
130417 0:00:38 [ERROR] Aborting

Changed in percona-xtradb-cluster:
milestone: none → 5.5.30-24.8
importance: Undecided → High
status: New → Triaged
assignee: nobody → Raghavendra D Prabhu (raghavendra-prabhu)
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

A similar issue is highlighted in lp:1143052, where the rsync daemon keeps running between ISTs and causes problems.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

With 5.5.30 I don't see any issues, since cleanup is in place. However, if an rsync daemon is already running, subsequent SST and IST can fail. In that case the daemon has to be killed manually; it being there at all is abnormal, because the script kills it when it exits.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

One thing I see in wsrep_sst_rsync is `trap cleanup_joiner EXIT`: cleanup_joiner is not called when TERM and similar signals are sent, so manual cleanup will be required if the parent script is killed with those signals.

Mrten (bugzilla-ii) wrote :

I regularly have an rsync daemon left running, usually when the upgrade fails for some reason.

I normally trap EXIT, SIGTERM, SIGKILL and SIGINT. Is there any reason not to add those? I suppose a crash is not an exit...

Alex Yurchenko (ayurchen) wrote :

The full code is actually:

    trap "exit 32" HUP PIPE
    trap "exit 3" INT TERM
    trap cleanup_joiner EXIT

So SIGHUP, SIGPIPE, SIGINT and SIGTERM are converted into exits, and cleanup_joiner is supposed to run on exit. IIRC SIGKILL cannot be trapped; it's signal 9 (kill -9).
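To see how the quoted traps fit together, the bash sketch below runs a child shell with the same HUP/PIPE, INT/TERM and EXIT traps, sends it SIGTERM, and checks that the EXIT handler ran. `cleanup_demo`, the marker and ready files, and the `sleep` stand-in for the rsync daemon are hypothetical; the real cleanup_joiner does more. Under bash it is expected to report exit status 3 and the marker text `cleaned`.

```shell
#!/usr/bin/env bash
# Sketch: show that the trap layout quoted above turns SIGTERM into
# "exit 3", which in turn fires the EXIT trap. cleanup_demo, the marker file
# and the sleep "daemon" are hypothetical stand-ins, not wsrep_sst_rsync code.
set -u

marker=$(mktemp)   # the EXIT handler writes "cleaned" here
ready=$(mktemp)    # the child creates this once its traps are installed
rm -f "$marker" "$ready"

bash -c '
    cleanup_demo() {
        kill "$daemon" 2>/dev/null   # stop the stand-in daemon
        echo cleaned > "$1"
    }
    trap "exit 32" HUP PIPE
    trap "exit 3" INT TERM
    trap "cleanup_demo \"$1\"" EXIT
    sleep 30 &                       # stand-in for the rsync daemon
    daemon=$!
    : > "$2"                         # tell the parent the traps are in place
    wait "$daemon"                   # interrupted by SIGTERM, then the trap runs
' demo "$marker" "$ready" &
child=$!

# Wait (up to ~5 s) for the child to install its traps.
tries=0
while [ ! -e "$ready" ] && [ "$tries" -lt 50 ]; do
    sleep 0.1
    tries=$((tries + 1))
done

kill -TERM "$child"    # TERM -> "exit 3" -> EXIT trap -> cleanup_demo
status=0
wait "$child" || status=$?
echo "exit status: $status, marker: $(cat "$marker")"
```

Sending SIGKILL instead of SIGTERM would skip both traps and leave the `sleep` running, which matches the point above that killed scripts still need manual cleanup.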

It occurs to me that perhaps SIGABRT may be inherited from the parent and must be trapped as well?

Changed in percona-xtradb-cluster:
status: Triaged → Fix Committed
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

The fix released in this case is for xtrabackup: it runs cleanup_joiner on exit. If the script is killed outright, cleanup still has to be done manually.

Changed in percona-xtradb-cluster:
status: Fix Committed → Fix Released
Alex Yurchenko (ayurchen) wrote :

Merged into codership-mysql some time ago (around r3838 in 5.5 and r3917 in 5.6).

Changed in codership-mysql:
status: New → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-957
