Assertion failure on --prepare --apply-log-only

Bug #938594 reported by Alexey Kopytov
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Percona XtraBackup moved to https://jira.percona.com/projects/PXB
Fix Released
High
Alexey Kopytov
1.6
Fix Released
High
Alexey Kopytov
2.0
Fix Released
High
Alexey Kopytov

Bug Description

In MySQL 5.1.57 a new assertion was added as a part of the fix for http://bugs.mysql.com/bug.php?id=59641. That assertion is not applicable when doing recovery with --apply-log-only option, so it fails in this case like this after successfully applying the log:

xtrabackup: starting shutdown with innodb_fast_shutdown = 1
120221 8:35:01 InnoDB: Starting shutdown...
120221 8:35:01 InnoDB: Assertion failure in thread 46955612838640 in
file trx/trx0sys.c line 1914
InnoDB: Failing assertion: UT_LIST_GET_LEN(trx_sys->trx_list) ==
trx_n_prepared
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
Aborted

Since the assertion was introduced in 5.1.57 and XtraBackup was rebased from 5.1.55 to 5.1.59 in version 1.6.4, it's a regression from 1.6.3.

Related branches

summary: Assertion failure on --prepare --apply-log-only when XA transactions are
+ used
summary: - Assertion failure on --prepare --apply-log-only when XA transactions are
- used
+ Assertion failure on --prepare --apply-log-only
Revision history for this message
Matt (whereswardy) wrote :

I've tested this branch and it fixes the problem we were having too. It wasn't from a XA transaction, but we were getting the same assertion failure.

Also, our prepare times for --redo-only have gone from about 30 minutes to 15 seconds, (doesn't seem to do log recovery at all anymore) not sure if it should've been like this before but the tested backups are still valid so that's pretty good!

Revision history for this message
Alexey Kopytov (akopytov) wrote :

Yes, it was not in fact related to XA transactions, so I updated the summary.

But I don't see how it might have any effect on prepare times. It must be something else.

Revision history for this message
Matt (whereswardy) wrote :

The difference is that with 1.6.5 during the prepare step it would do a huge log recovery step that would take about 30 minutes, that looked like this:

120223 17:13:31 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Doing recovery: scanned up to log sequence number 63858176 (1 %)
...
InnoDB: Doing recovery: scanned up to log sequence number 68738858 (99 %)
120223 17:13:32 InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed

It would be the "Progress in percents" step that would take about 99% of the prepare time before, and I'd always have to give it a lot of memory (typically 16GB) in order to get it to run at that speed (any less and it would take even longer).

This branch doesn't do that log recovery bit at all, but I still end up with a valid backup at the end.

Revision history for this message
Matt (whereswardy) wrote :

My bad, replication was stopped at the time so there were no queries or transactions occurring during the backups as it's just a backup slave, which meant that there was nothing to replay or recover. Doing a backup at the moment with replication running and it's gone back to doing it's usual thing of taking quite a while. I'm getting the "1 transaction(s) which must be rolled back or cleaned up" message again which was a pre-cursor to the assertion failure before, so hopefully this will succeed.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXB-320

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.