sst_donor_thread stuck/tables_flushed file not created

Bug #1246787 reported by Teemu Ollakka on 2013-10-31
Affects: MySQL patches by Codership
    Importance: Undecided, Assigned to: Unassigned
Affects: Percona XtraDB Cluster, moved to https://jira.percona.com/projects/PXC (status tracked in 5.6)
    5.5: Status: Incomplete, Importance: Undecided, Assigned to: Unassigned
    5.6: Status: Incomplete, Importance: Undecided, Assigned to: Unassigned

Bug Description

In a seesaw test the donor got stuck after logging:

2013-10-31 15:44:19 11500 [Note] WSREP: Provider paused at a13fb7c4-422f-11e3-94de-7e24119842fa:191505
2013-10-31 15:44:19 11500 [Note] WSREP: Tables flushed.

It appeared that sst_donor_thread was waiting for input from the wsrep_sst_rsync script:

#0 0x00007f00405c48cd in read () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007f0040558ff8 in _IO_new_file_underflow (fp=0x7effd4008190) at fileops.c:619
#2 0x00007f004055a03e in _IO_default_uflow (fp=0x7effd4008190) at genops.c:440
#3 0x00007f004054e18a in _IO_getline_info (fp=0x7effd4008190, buf=0x7efff8ff8dc0 "flush tables", n=127, delim=10,
    extract_delim=1, eof=0x0) at iogetline.c:74
#4 0x00007f004054d06b in _IO_fgets (buf=0x7efff8ff8dc0 "flush tables", n=<optimized out>, fp=0x7effd4008190) at iofgets.c:58
#5 0x000000000062ba0e in my_fgets (buf=0x7efff8ff8dc0 "flush tables", buf_len=128, stream=0x7effd4008190)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.6/sql/wsrep_sst.cc:302
#6 0x000000000062dd47 in sst_donor_thread (a=0x7f003c0d6040)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.6/sql/wsrep_sst.cc:859
#7 0x00007f00410b6e9a in start_thread (arg=0x7efff8ff9700) at pthread_create.c:308
#8 0x00007f00405d1ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#9 0x0000000000000000 in ?? ()

(gdb) f 6
#6 0x000000000062dd47 in sst_donor_thread (a=0x7f003c0d6040)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.6/sql/wsrep_sst.cc:859
859 out= my_fgets (out_buf, out_len, proc.pipe());
(gdb) p locked
$14 = true

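In other words, wsrep_sst_rsync had already written "flush tables" to the pipe, and the donor thread had acted on it (locked is true) and was waiting for the next line.

The process list indicates that wsrep_sst_rsync was waiting for the "tables_flushed" file to be created: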

11500 pts/19 tl 1:07 /run/shm/galera/local2/mysql/sbin/mysqld --defaults-file=/run/shm/galera/local2/mysql/etc/my.cnf --user=teemu -
14848 pts/19 S 0:00 \_ sh -c wsrep_sst_rsync --role 'donor' --address 'gw:10013/rsync_sst' --auth 'root:rootpass' --socket '/run/s
14850 pts/19 S 0:52 \_ /bin/bash -ue /run/shm/galera/local2/mysql//bin/wsrep_sst_rsync --role donor --address gw:10013/rsync_s
28488 pts/19 S 0:00 \_ sleep 0.2

The sleep corresponds to these lines

        # wait for tables flushed and state ID written to the file
        while [ ! -r "$FLUSHED" ] && ! grep -q ':' "$FLUSHED" >/dev/null 2>&1
        do
            sleep 0.2
        done

in the script.

So either "tables_flushed" was never created (unlikely, since that should not happen without an error message in the log), or the file was somehow deleted before the script saw it.

A related issue is that this loop can run forever under some circumstances, so it would be prudent to add a timeout after which it bails out (if FTWRL is not possible at all).
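A bounded version of the wait loop could look roughly like the sketch below. This is an illustration only, not the actual fix: the function name wait_for_flushed, the 60-second default and the error message are all assumptions. The condition is written as "file is readable and contains a colon", which follows the comment in the script rather than the quoted condition verbatim.

```shell
# Hypothetical helper (not part of wsrep_sst_rsync): poll until the file
# is readable and contains a ':' (the state ID), or give up after a timeout.
wait_for_flushed() {
    local file=$1
    local timeout_sec=${2:-60}          # assumed default, not from the script
    local tries=$(( timeout_sec * 5 ))  # 5 polls per second at 0.2 s each

    while ! { [ -r "$file" ] && grep -q ':' "$file" 2>/dev/null; }
    do
        if [ "$tries" -le 0 ]; then
            echo "Timed out after ~${timeout_sec}s waiting for $file" >&2
            return 1
        fi
        tries=$(( tries - 1 ))
        sleep 0.2
    done
    return 0
}
```

In the script this might be called as wait_for_flushed "$FLUSHED" 60, with the caller reporting an error and exiting when it returns non-zero. The timeout is approximate because only the sleep intervals are counted.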

Krunal Bauskar (krunal-bauskar) wrote :

Can you test this with the latest 5.6?

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1499
