sst_donor_thread stuck/tables_flushed file not created

Bug #1246787 reported by Teemu Ollakka
This bug affects 1 person
Affects                      Status      Importance  Assigned to  Milestone
MySQL patches by Codership   New         Undecided   Unassigned
Percona XtraDB Cluster       (moved to https://jira.percona.com/projects/PXC; status tracked in 5.6)
  5.5                        Incomplete  Undecided   Unassigned
  5.6                        Incomplete  Undecided   Unassigned

Bug Description

In the seesaw test, the donor got stuck after logging:

2013-10-31 15:44:19 11500 [Note] WSREP: Provider paused at a13fb7c4-422f-11e3-94de-7e24119842fa:191505
2013-10-31 15:44:19 11500 [Note] WSREP: Tables flushed.

It appeared that sst_donor_thread was waiting for input from the wsrep_sst_rsync script:

#0 0x00007f00405c48cd in read () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007f0040558ff8 in _IO_new_file_underflow (fp=0x7effd4008190) at fileops.c:619
#2 0x00007f004055a03e in _IO_default_uflow (fp=0x7effd4008190) at genops.c:440
#3 0x00007f004054e18a in _IO_getline_info (fp=0x7effd4008190, buf=0x7efff8ff8dc0 "flush tables", n=127, delim=10,
    extract_delim=1, eof=0x0) at iogetline.c:74
#4 0x00007f004054d06b in _IO_fgets (buf=0x7efff8ff8dc0 "flush tables", n=<optimized out>, fp=0x7effd4008190) at iofgets.c:58
#5 0x000000000062ba0e in my_fgets (buf=0x7efff8ff8dc0 "flush tables", buf_len=128, stream=0x7effd4008190)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.6/sql/wsrep_sst.cc:302
#6 0x000000000062dd47 in sst_donor_thread (a=0x7f003c0d6040)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.6/sql/wsrep_sst.cc:859
#7 0x00007f00410b6e9a in start_thread (arg=0x7efff8ff9700) at pthread_create.c:308
#8 0x00007f00405d1ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#9 0x0000000000000000 in ?? ()

(gdb) f 6
#6 0x000000000062dd47 in sst_donor_thread (a=0x7f003c0d6040)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.6/sql/wsrep_sst.cc:859
859 out= my_fgets (out_buf, out_len, proc.pipe());
(gdb) p locked
$14 = true

In other words, wsrep_sst_rsync had already written "flush tables" to the pipe.

The process list indicates that wsrep_sst_rsync was waiting for the "tables_flushed" file to be created:

11500 pts/19 tl 1:07 /run/shm/galera/local2/mysql/sbin/mysqld --defaults-file=/run/shm/galera/local2/mysql/etc/my.cnf --user=teemu -
14848 pts/19 S 0:00 \_ sh -c wsrep_sst_rsync --role 'donor' --address 'gw:10013/rsync_sst' --auth 'root:rootpass' --socket '/run/s
14850 pts/19 S 0:52 \_ /bin/bash -ue /run/shm/galera/local2/mysql//bin/wsrep_sst_rsync --role donor --address gw:10013/rsync_s
28488 pts/19 S 0:00 \_ sleep 0.2

The sleep corresponds to the lines

        # wait for tables flushed and state ID written to the file
        while [ ! -r "$FLUSHED" ] && ! grep -q ':' "$FLUSHED" >/dev/null 2>&1
        do
            sleep 0.2
        done

in the script.

So either "tables_flushed" was never created (unlikely, since that should not happen without an error message in the log), or the file was somehow deleted before the script saw it.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

A related issue is that this loop can run forever under some circumstances, so it would be prudent to add a timeout after which it bails out (for example, if FTWRL is not possible at all).
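
A minimal sketch of such a timeout, not the actual fix: wait_for_flushed is a hypothetical helper name, and the 30-second default is an arbitrary choice. It is also slightly stricter than the loop quoted above, because it assumes the script should wait until the file is both readable and contains a ':'.

```shell
# Hypothetical helper, not part of wsrep_sst_rsync.
# Waits until the file named by $1 is readable and contains a ':'
# (the separator in the state ID), polling every 0.2 s as the original
# loop does. Gives up after $2 seconds (default 30, an arbitrary value).
wait_for_flushed()
{
    local flushed="$1"
    local timeout="${2:-30}"
    local deadline
    deadline=$(( $(date +%s) + timeout ))

    until [ -r "$flushed" ] && grep -q ':' "$flushed" 2>/dev/null
    do
        if [ "$(date +%s)" -ge "$deadline" ]
        then
            echo "Timed out after ${timeout}s waiting for $flushed" >&2
            return 1
        fi
        sleep 0.2
    done
    return 0
}
```

In the donor script this could be called as `wait_for_flushed "$FLUSHED" || exit 1`. The expectation (an assumption, not verified here) is that the script exiting closes the pipe, so the donor thread's blocking read returns instead of hanging indefinitely.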

Krunal Bauskar (krunal-bauskar) wrote :

Can you test this with the latest 5.6?

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1499
