Xtrabackup 2.0.7 or 2.1.2 failed: expected log block no. 5458041, but got no. 1070811265 from the log file

Reported by Nickolay Ihalainen on 2013-07-30
Affects: Percona XtraBackup (status tracked in 2.2)

Series  Importance  Assigned to
2.0     High        Alexey Kopytov
2.1     High        Alexey Kopytov
2.2     High        Alexey Kopytov

Bug Description

Affected versions:
percona-xtrabackup-2.1.2-611.rhel5
percona-xtrabackup-2.0.7-552.rhel5

xtrabackup version 2.0.7 for Percona Server 5.1.59 unknown-linux-gnu (x86_64) (revision id: 552)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /database/mysql
xtrabackup: Target instance is assumed as followings.
xtrabackup: innodb_data_home_dir = ./
xtrabackup: innodb_data_file_path = ibdata1:10M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 2
xtrabackup: innodb_log_file_size = 2147483648
xtrabackup: using O_DIRECT
InnoDB: Warning: allocated tablespace 424, old maximum was 9 ...

xtrabackup: error: log block numbers mismatch:
xtrabackup: error: expected log block no. 5458041, but got no. 1070811265 from the log file.
xtrabackup: Error: xtrabackup_copy_logfile() failed.
innobackupex: Error: ibbackup child process has died at /usr/bin/innobackupex line 386.

The error was emitted 179m45.411s after the backup started.

Transaction log information before the backup started:
Log sequence number 186856561401671
Log flushed up to 186856559716066
Last checkpoint at 186854266314745
Max checkpoint age 3476324967
Checkpoint age target 3367689812
Modified age 2265855030
Checkpoint age 2295086926
0 pending log writes, 0 pending chkp writes
1019036192 log i/o's done, 220.02 log i/o's/second

Right after the xtrabackup crash:
Log sequence number 186919780065548
Log flushed up to 186919776998233
Last checkpoint at 186917463665464
Max checkpoint age 3476324967
Checkpoint age target 3367689812
Modified age 2313548226
Checkpoint age 2316400084
0 pending log writes, 0 pending chkp writes
1020869843 log i/o's done, 88.32 log i/o's/second

Here are several errors from previous runs:
 xtrabackup: error: expected log block no. 3388077, but got no. 1068741301 from the log file.

>> log scanned up to (183619734185080)
>> log scanned up to (183619741409546)
xtrabackup: error: log block numbers mismatch:
xtrabackup: error: expected log block no. 2551941, but got no. 1067905165 from the log file.

InnoDB recovery information from the previous month:
130618 23:46:21 Percona XtraDB (http://www.percona.com) 1.0.15-12.7 started; log sequence number 166828151398485

As you can see, the block no. difference is always 1065353224, compared to a much smaller LSN change:
(186919780065548-186856561401671)/1024/1024/1024 = 58G
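
For reference, a small sketch of how an LSN maps to a log block number, modelled on InnoDB's log_block_convert_lsn_to_no (the 512-byte block size, the 30-bit mask, and the trailing "+ 1" are assumptions based on the InnoDB source of this era, not taken from this report):

#include <stdio.h>

#define OS_FILE_LOG_BLOCK_SIZE 512ULL

/* Map an LSN to a log block number; the number occupies only 30 bits,
   so it wraps back to a small value every 2^30 blocks. */
unsigned long lsn_to_block_no(unsigned long long lsn)
{
    return (unsigned long)((lsn / OS_FILE_LOG_BLOCK_SIZE) & 0x3FFFFFFFULL) + 1;
}

int main(void)
{
    /* LSNs from this report: before the backup and right after the crash. */
    printf("block no. at start: %lu\n", lsn_to_block_no(186856561401671ULL));
    printf("block no. at crash: %lu\n", lsn_to_block_no(186919780065548ULL));
    /* The ~58 GB of LSN growth crosses a 2^30-block boundary, so the
       second number comes out far smaller than the first: the block
       numbers wrapped somewhere in between. */
    return 0;
}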

Alexey Kopytov (akopytov) wrote:

Please provide the InnoDB log configuration from the server.

Changed in percona-xtrabackup:
status: New → Incomplete
Nickolay Ihalainen (ihanick) wrote:

Here are the InnoDB settings from my.cnf:
innodb_file_per_table=1
innodb_max_purge_lag=100000
innodb_buffer_pool_size=120GB
innodb_log_buffer_size=80M
innodb_log_file_size=2GB
innodb_open_files=1000
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=32
innodb_autoinc_lock_mode=0
innodb_file_format=Barracuda
innodb_io_capacity=400

And all InnoDB-related settings from SHOW VARIABLES:
have_innodb YES
ignore_builtin_innodb OFF
innodb_adaptive_checkpoint estimate
innodb_adaptive_flushing OFF
innodb_adaptive_hash_index ON
innodb_additional_mem_pool_size 8388608
innodb_auto_lru_dump 0
innodb_autoextend_increment 8
innodb_autoinc_lock_mode 0
innodb_buffer_pool_shm_checksum ON
innodb_buffer_pool_shm_key 0
innodb_buffer_pool_size 128849018880
innodb_change_buffering inserts
innodb_checkpoint_age_target 0
innodb_checksums ON
innodb_commit_concurrency 0
innodb_concurrency_tickets 500
innodb_data_file_path ibdata1:10M:autoextend
innodb_data_home_dir
innodb_dict_size_limit 0
innodb_doublewrite ON
innodb_doublewrite_file
innodb_enable_unsafe_group_commit 0
innodb_expand_import 0
innodb_extra_rsegments 0
innodb_extra_undoslots OFF
innodb_fast_checksum OFF
innodb_fast_recovery OFF
innodb_fast_shutdown 1
innodb_file_format Barracuda
innodb_file_format_check Barracuda
innodb_file_per_table ON
innodb_flush_log_at_trx_commit 2
innodb_flush_log_at_trx_commit_session 3
innodb_flush_method O_DIRECT
innodb_flush_neighbor_pages 1
innodb_force_recovery 0
innodb_ibuf_accel_rate 100
innodb_ibuf_active_contract 1
innodb_ibuf_max_size 64424493056
innodb_io_capacity 400
innodb_lazy_drop_table 0
innodb_lock_wait_timeout 50
innodb_locks_unsafe_for_binlog OFF
innodb_log_block_size 512
innodb_log_buffer_size 83886080
innodb_log_file_size 2147483648
innodb_log_files_in_group 2
innodb_log_group_home_dir ./
innodb_max_dirty_pages_pct 75
innodb_max_purge_lag 100000
innodb_mirrored_log_groups 1
innodb_old_blocks_pct 37
innodb_old_blocks_time 0
innodb_open_files 1000
innodb_overwrite_relay_log_info OFF
innodb_page_size 16384
innodb_pass_corrupt_table 0
innodb_read_ahead linear
innodb_read_ahead_threshold 56
innodb_read_io_threads 4
innodb_recovery_stats OFF
innodb_replication_delay 0
innodb_rollback_on_timeout OFF
innodb_show_locks_held 10
innodb_show_verbose_locks 0
innodb_spin_wait_delay 6
innodb_stats_auto_update 1
innodb_stats_method nulls_equal
innodb_stats_on_metadata ON
innodb_stats_sample_pages 8
innodb_stats_update_need_lock 1
innodb_strict_mode OFF
innodb_support_xa ON
innodb_sync_spin_loops 30
innodb_table_locks ON
innodb_thread_concurrency 32
innodb_thread_concurrency_timer_based OFF
innodb_thread_sleep_delay 10000
innodb_use_purge_thread 1
innodb_use_sys_malloc ON
innodb_use_sys_stats_table OFF
innodb_version 1.0.15-12.7
innodb_write_io_threads 4

tags: added: i33538
Changed in percona-xtrabackup:
status: Incomplete → New
Alexey Kopytov (akopytov) wrote:

The problem is a combination of two things:

1. InnoDB log block numbers wrap around at 0x40000000. So the next log block after 1073741823 has number 0, rather than 1073741824.
2. XtraBackup uses the following logic to detect "old" blocks (i.e. the current log tail, after which it has to stop scanning): if the number of the block just read from the log is less than the number it expects to scan next, the block is an old one, so stop scanning.

#2 doesn't work when log block numbers wrap around: we may be looking at an old block whose number predates the wrap, while the current "expected" number has already wrapped.
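
For illustration, a minimal sketch of this failure mode in C (this is not XtraBackup's actual code, and the wrap-aware variant is just one possible shape of a fix):

#include <stdio.h>

#define LOG_BLOCK_MAX 0x40000000UL  /* block numbers wrap at 2^30 */

/* The check from point 2 above: a block whose number is smaller than
   the expected one is assumed to be old (already-overwritten tail). */
int is_old_block_naive(unsigned long block_no, unsigned long expected_no)
{
    return block_no < expected_no;
}

/* A wrap-aware variant: compare modulo the 30-bit number space and
   call the block old when it lags the expected number by less than
   half of the wrap range. */
int is_old_block_wrap_aware(unsigned long block_no, unsigned long expected_no)
{
    unsigned long lag = (expected_no - block_no) & (LOG_BLOCK_MAX - 1);
    return lag != 0 && lag < LOG_BLOCK_MAX / 2;
}

int main(void)
{
    /* Numbers from this report: the expected number has already wrapped
       (small), while the stale block on disk carries a large pre-wrap
       number. The naive check declares the stale block "new" and keeps
       scanning, which surfaces as the "log block numbers mismatch" error. */
    unsigned long expected = 5458041UL;
    unsigned long on_disk  = 1070811265UL;

    printf("naive      says old: %d\n", is_old_block_naive(on_disk, expected));
    printf("wrap-aware says old: %d\n", is_old_block_wrap_aware(on_disk, expected));
    return 0;
}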

Nickolay Ihalainen (ihanick) wrote:

The bug is a regression; it's possible to make a backup with 2.0.6.

Alexey Kopytov (akopytov) wrote:

No, it's not a regression. Two conditions must be met in order to hit this bug:

1. The log block numbers must wrap around, which only happens once per 1 GB of log writes.
2. The wrap-around point must be between the last checkpoint and the current log tail at the time the backup starts.

This makes the bug not that easy to hit. Even when you do hit it, it may disappear later, once the server moves the last checkpoint past the wrap-around point.
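
Condition 2 reduces to simple arithmetic on cumulative block counts (a hypothetical sketch, assuming block numbers are derived as lsn / 512 masked to 30 bits; none of this is XtraBackup's actual code):

#include <stdio.h>

/* The bug window from condition 2: the last checkpoint and the current
   log tail fall on opposite sides of a 2^30 block-number boundary, so
   the numbers wrap somewhere inside the range XtraBackup must scan. */
int wrap_between(unsigned long long checkpoint_lsn, unsigned long long tail_lsn)
{
    unsigned long long cp_blocks   = checkpoint_lsn / 512;
    unsigned long long tail_blocks = tail_lsn / 512;

    /* Different multiples of 2^30 mean the numbers wrapped in between. */
    return (cp_blocks >> 30) != (tail_blocks >> 30);
}

int main(void)
{
    /* Hypothetical LSNs straddling a block-number boundary
       (not taken from this report). */
    unsigned long long boundary_lsn = (1ULL << 30) * 512;
    printf("window open: %d\n",
           wrap_between(boundary_lsn - 4096, boundary_lsn + 4096));
    return 0;
}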

Alexey Kopytov (akopytov) wrote:

The fix for this bug was incorrect due to a couple of unfortunate omissions. I will submit a follow-up fix.
