apply-log hangs on partial backup

Bug #342540 reported by Percona
Affects: Percona XtraBackup (moved to https://jira.percona.com/projects/PXB)
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

I took a backup with:
./innobackupex-1.5.1 --include=art53.* ./ --stream=tar > /mnt/data/tmp/dat.tar

Then I unpacked dat.tar and ran:
 ./innobackupex-1.5.1 --apply-log /mnt/data/tmp/dat/

There were a bunch of errors like:

090313 16:25:30 InnoDB: Error: table 'test/test_innodb'
InnoDB: in InnoDB data dictionary has tablespace id 1699,
InnoDB: but tablespace with that id or name does not exist. Have
InnoDB: you deleted or moved .ibd files?
InnoDB: This may also be a table created with CREATE TEMPORARY TABLE
InnoDB: whose .ibd and .frm files MySQL automatically removed, but the
InnoDB: table still exists in the InnoDB internal data dictionary.
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.0/en/innodb-troubleshooting.html
InnoDB: for how to resolve the issue.
090313 16:25:30 InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: Last MySQL binlog file position 0 437810557, file name ./db02-bin.001225
InnoDB: Starting in background the rollback of uncommitted transactions
090313 16:25:35 InnoDB: Rolling back trx with id 4 9421214, 12 rows to undo
090313 16:25:36 InnoDB: Started; log sequence number 1171 2823684590
090313 16:25:36 InnoDB: error: space object of table art60/link_out60,
InnoDB: space id 1453 did not exist in memory. Retrying an open.
090313 16:25:36 InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
090313 16:25:36 InnoDB: Error: trying to open a table, but could not
InnoDB: open the tablespace file './art60/link_out60.ibd'!
InnoDB: Have you moved InnoDB .ibd files around without using the
InnoDB: commands DISCARD TABLESPACE and IMPORT TABLESPACE?
InnoDB: It is also possible that this is a temporary table #sql...,
InnoDB: and MySQL removed the .ibd file for this.
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.0/en/innodb-troubleshooting.html
InnoDB: for how to resolve the issue.

[notice (again)]
  If you use binary log and don't use any hack of group commit,
  the binary log position seems to be:
InnoDB: Last MySQL binlog file position 0 437810557, file name ./db02-bin.001225

090313 16:25:36 InnoDB: Starting shutdown...

InnoDB: Rolling back of trx id 4 9421214 completed
090313 16:25:36 InnoDB: Rollback of non-prepared transactions completed

But on the last operation it seems to hang (I waited a few hours).

After Ctrl-C I got:

innobackupex: Error:
innobackupex: ibbackup failed at ./innobackupex-1.5.1 line 427.

Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

This is not enough information to pinpoint the problem.
I'd like to see stack traces of the xtrabackup threads while it is hung.
Thank you.
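
One way to capture such stack traces while xtrabackup is stuck (an illustration only, assuming gdb is installed; <pid> stands for the process id of the hung xtrabackup process):

$ gdb -p <pid>
(gdb) thread apply all bt
(gdb) detach
(gdb) quit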

Revision history for this message
Percona (percona-team) wrote :

It hangs in another place. Check the error message:

InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 090316 23:08:37 InnoDB: ERROR: the age of the last checkpoint is 65066237,
InnoDB: which exceeds the log group capacity 58567680.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.
44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

There is no progress after this point.

xtrabackup backtrace:

(gdb) bt
#0 0x00000030e42c28d6 in __select_nocancel () from /lib64/tls/libc.so.6
#1 0x00000000004f9ecb in os_thread_sleep ()
#2 0x00000000004acb87 in recv_apply_hashed_log_recs ()
#3 0x00000000004a7896 in log_checkpoint ()
#4 0x00000000004a7e9f in log_fsp_current_free_limit_set_and_checkpoint ()
#5 0x00000000004b210e in fsp_header_get_free_limit ()
#6 0x000000000040c7e5 in innobase_start_or_create_for_mysql ()
#7 0x0000000000403a29 in innodb_init () at xtrabackup.c:1187
#8 0x0000000000406385 in xtrabackup_prepare_func () at xtrabackup.c:2621
#9 0x00000000004067dc in main (argc=0, argv=0x677c48) at xtrabackup.c:2818

Revision history for this message
Percona (percona-team) wrote :

Another similar hang.

Note that the size of xtrabackup_logfile is 82650K.

090316 23:55:58 InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 090316 23:56:01 InnoDB: ERROR: the age of the last checkpoint is 84630934,
InnoDB: which exceeds the log group capacity 76173927.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.
14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

The select()-based usleep emulation sometimes seems to hang when called from xtrabackup.
I have added a patch that switches it from select() to usleep() for now.
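
For reference, a minimal sketch of the two approaches (an illustration only, not the actual patch; the function names below are made up, while the real change is inside InnoDB's os_thread_sleep()):

====
#include <sys/select.h>
#include <unistd.h>

/* Sketch only: the original implementation emulates a microsecond
   sleep with select() on empty fd sets, blocking until the timeout
   expires. */
static void sleep_with_select(unsigned long usec)
{
	struct timeval t;

	t.tv_sec = usec / 1000000;
	t.tv_usec = usec % 1000000;

	select(0, NULL, NULL, NULL, &t);
}

/* The workaround simply calls usleep() instead. */
static void sleep_with_usleep(unsigned long usec)
{
	usleep(usec);
}

int main(void)
{
	/* 0.5 s, the delay used in the recovery wait loop */
	sleep_with_select(500000);
	sleep_with_usleep(500000);

	return 0;
}
====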

Changed in percona-xtrabackup:
assignee: nobody → yasufumi-kinoshita
status: New → Fix Committed
Revision history for this message
Percona (percona-team) wrote :

What about this error message:

InnoDB: ERROR: the age of the last checkpoint is 84630934,
InnoDB: which exceeds the log group capacity 76173927.

?

Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

Hmm... Good question.
It shows that new transaction log records are generated during the recovery.
It may not be a problem as long as the contents of xtrabackup_logfile are not overwritten.
We may have to keep enough margin in xtrabackup_logfile before the recovery, or suppress the generation of new logs.
I will investigate. Please give me some time.
Thank you.
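
For context, the message comes from InnoDB's checkpoint-age check. A rough, simplified sketch of the arithmetic (an illustration with assumed names and example values, not the actual InnoDB code, which lives in log0log.c and works on the log_sys structure):

====
#include <stdio.h>

/* Simplified sketch, not the real InnoDB code: the "age of the last
   checkpoint" is roughly how far the current LSN has run ahead of
   the last checkpoint. If it exceeds the usable capacity of the log
   group, InnoDB prints the error quoted above. The numbers here are
   example values taken from the report. */
int main(void)
{
	unsigned long last_checkpoint_lsn = 100000000UL;
	unsigned long current_lsn = last_checkpoint_lsn + 84630934UL;
	unsigned long log_group_capacity = 76173927UL;

	unsigned long checkpoint_age = current_lsn - last_checkpoint_lsn;

	if (checkpoint_age > log_group_capacity) {
		printf("ERROR: the age of the last checkpoint is %lu,\n"
		       "which exceeds the log group capacity %lu.\n",
		       checkpoint_age, log_group_capacity);
	}

	return 0;
}
====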

Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

Hello,
The message
"
InnoDB: ERROR: the age of the last checkpoint is 84630934,
InnoDB: which exceeds the log group capacity 76173927.
"
is not a problem as long as 'xtrabackup --prepare' doesn't crash or get terminated during the recovery.

If you see the message
"
InnoDB: ERROR: We were only able to scan the log up to
InnoDB: 0 1180263936, but a checkpoint was at 0 1180264362.
InnoDB: It is possible that the database is now corrupt!
"
during a second 'xtrabackup --prepare' run after a crash of the first 'xtrabackup --prepare', it means that new log records were written over xtrabackup_logfile.
This will happen if the crash occurred after "InnoDB: Apply batch completed".

Changed in percona-xtrabackup:
importance: Undecided → High
Revision history for this message
Percona (percona-team) wrote :

It still hangs for me

InnoDB: Doing recovery: scanned up to log sequence number 1315 1446268928
InnoDB: Doing recovery: scanned up to log sequence number 1315 1451511808
InnoDB: Doing recovery: scanned up to log sequence number 1315 1456754688
InnoDB: Doing recovery: scanned up to log sequence number 1315 1461997568
InnoDB: Doing recovery: scanned up to log sequence number 1315 1467240448
InnoDB: Doing recovery: scanned up to log sequence number 1315 1472483328
InnoDB: Doing recovery: scanned up to log sequence number 1315 1477726208
InnoDB: Doing recovery: scanned up to log sequence number 1315 1482969088
InnoDB: Doing recovery: scanned up to log sequence number 1315 1488211968
InnoDB: Doing recovery: scanned up to log sequence number 1315 1493454848
InnoDB: Doing recovery: scanned up to log sequence number 1315 1498697728
InnoDB: Doing recovery: scanned up to log sequence number 1315 1503940608
InnoDB: Doing recovery: scanned up to log sequence number 1315 1509183488
InnoDB: Doing recovery: scanned up to log sequence number 1315 1514426368
InnoDB: Doing recovery: scanned up to log sequence number 1315 1519669248
InnoDB: Doing recovery: scanned up to log sequence number 1315 1524912128
InnoDB: Doing recovery: scanned up to log sequence number 1315 1530155008
InnoDB: Doing recovery: scanned up to log sequence number 1315 1535397888
090325 23:49:13 InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

backtrace:

0x00000030e42901d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
(gdb) bt
#0 0x00000030e42901d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1 0x00000030e42c3058 in usleep () from /lib64/tls/libc.so.6
#2 0x00000000004accf7 in recv_apply_hashed_log_recs ()
#3 0x00000000004ae859 in recv_scan_log_recs ()
#4 0x00000000004af7c3 in recv_recovery_from_checkpoint_start ()
#5 0x000000000040c7cf in innobase_start_or_create_for_mysql ()
#6 0x00000000004039d9 in innodb_init () at xtrabackup.c:1187
#7 0x00000000004063c2 in xtrabackup_prepare_func () at xtrabackup.c:2636
#8 0x000000000040681c in main (argc=0, argv=0x677c48) at xtrabackup.c:2833

Please note that it hangs at 99% every time; could it somehow be related to reaching 99%?

Changed in percona-xtrabackup:
status: Fix Committed → Confirmed
Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

Hmm... strange...

It may be this loop in log0recv.c:
====
 /* Wait until all the pages have been processed */

 while (recv_sys->n_addrs != 0) {

  mutex_exit(&(recv_sys->mutex));

  os_thread_sleep(500000);

  mutex_enter(&(recv_sys->mutex));
 }
====

The possible causes are:
(1) __nanosleep_nocancel hangs up,
OR
(2) recv_sys->n_addrs never becomes 0.

If it is not (1), then recv_sys->n_addrs may be stuck at a non-zero value.
But I cannot reproduce the hang, and I have not found the faulty code yet.

Please give me the output of the following at the time of the hang:

(gdb) p recv_sys->n_addrs

Revision history for this message
Vadim Tkachenko (vadim-tk) wrote :

I think the hang may also happen here:

        /* Another batch is already being applied: release the mutex,
           sleep, and retry from the top of the loop */
        if (recv_sys->apply_batch_on) {

                mutex_exit(&(recv_sys->mutex));

                os_thread_sleep(500000);

                goto loop;
        }

Here is the backtrace with more debug info:

(gdb) bt
#0 0x00000030e42901d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1 0x00000030e42c3058 in usleep () from /lib64/tls/libc.so.6
#2 0x0000000000501168 in os_thread_sleep (tm=548682064480) at os0thread.c:271
#3 0x00000000004b2c45 in recv_apply_hashed_log_recs (allow_ibuf=0) at log0recv.c:1441
#4 0x00000000004b4a85 in recv_scan_log_recs (apply_automatically=1, available_memory=100663296, store_to_hash=1,
    buf=0x2a9c2a4200 "\221?E%\002", len=65536, start_lsn={high = 1315, low = 3163179008},
    contiguous_lsn=0x7fbffff160, group_scanned_lsn=0x7fbffff170) at log0recv.c:2538
#5 0x00000000004b4bda in recv_group_scan_log_recs (group=0x2a956604b8, contiguous_lsn=0x7fbffff160,
    group_scanned_lsn=0x7fbffff170) at log0recv.c:2577
#6 0x00000000004b55d1 in recv_recovery_from_checkpoint_start (type=78656949, limit_lsn=
      {high = 4294967295, low = 4294967295}, min_flushed_lsn={high = 1132, low = 2546717136}, max_flushed_lsn=
      {high = 1132, low = 2546717136}) at log0recv.c:2819
#7 0x000000000040c2d3 in innobase_start_or_create_for_mysql () at srv0start.c:1446
#8 0x00000000004039d9 in innodb_init () at xtrabackup.c:1187
#9 0x00000000004063c2 in xtrabackup_prepare_func () at xtrabackup.c:2636
#10 0x000000000040681c in main (argc=0, argv=0x67dc48) at xtrabackup.c:2833
(gdb) p recv_sys->apply_batch_on
$1 = 1
(gdb) p recv_sys->n_addrs
$2 = 4

Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

I have fixed it to count down recv_sys->n_addrs even if the tablespace does not exist.

http://bazaar.launchpad.net/~percona-dev/percona-xtrabackup/alpha-0.3/revision/30
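
Roughly, the idea behind that revision (a sketch of the approach only, not the committed patch; tablespace_is_missing() is a placeholder name, not a real InnoDB function):

====
 /* Sketch of the idea only, not the committed patch. When the data
    file backing a hashed page is missing (as it is for tables
    excluded from a partial backup), skip applying its log records
    but still count the address down, so that the wait loop
    "while (recv_sys->n_addrs != 0)" can reach zero and return.
    tablespace_is_missing() stands in for the real existence check. */

 if (tablespace_is_missing(recv_addr->space)) {

  recv_addr->state = RECV_PROCESSED;

  ut_a(recv_sys->n_addrs);
  recv_sys->n_addrs--;

  continue;
 }
====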

Hmm... It may be a bug in InnoDB...

* I think the hang never happens at
  "if (recv_sys->apply_batch_on) {"

Changed in percona-xtrabackup:
status: Confirmed → Fix Committed
Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

I have reported a related bug to bugs.mysql.com:

http://bugs.mysql.com/43948

Changed in percona-xtrabackup:
status: Fix Committed → Fix Released
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXB-251
