Comment 2 for bug 1706342

Revision history for this message
Rick Pizzi (pizzi) wrote : Re: [Bug 1706342] xtrabackup 2.4 log copy thread hangs if there is no enough CPU

It doesn’t seem to be related with any DDL. Most of the time I get the other error.
Question, should the “log scanned up to … (LSN)” appear once every second (by default)?
It seems to get “stuck" in our tests quite often, like in:

170725 22:49:32 >> log scanned up to (97599947036309)
170725 22:49:33 >> log scanned up to (97599947908874)
170725 22:49:34 >> log scanned up to (97599948863500)
170725 22:49:35 >> log scanned up to (97599949768269)
170725 22:49:36 >> log scanned up to (97599950590202)

(pauses several seconds)

170725 22:49:43 >> log scanned up to (97599956699160)
170725 22:49:44 >> log scanned up to (97599957736494)


Riccardo Pizzi
<email address hidden>
http://www.pinballowners.com/webmaster

On 25/lug/2017, at 19:41, Shahriyar Rzayev <email address hidden> wrote:

> Are you running DDL run during backup? If yes then please see also this reports:
> https://bugs.launchpad.net/percona-xtrabackup/+bug/1555626
> https://bugs.launchpad.net/percona-xtrabackup/+bug/1532878
>
> According the error message it is similar as described here:
>
> https://bugs.launchpad.net/percona-xtrabackup/+bug/1532878/comments/3
>
> And now you can try to use 2.4.8 with new options as shown here:
>
> https://bugs.launchpad.net/percona-xtrabackup/+bug/1532878/comments/13
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1706342
>
> Title:
> xtrabackup 2.4 log copy thread hangs if there is no enough CPU
>
> Status in Percona XtraBackup:
> New
>
> Bug description:
> We have been trying to take backups of a large MySQL 5.7.18 instance using xtrabackup 2.4.7.
> We have observed that if cpu usage hits 100% the log copy threads hangs, and after a little while xtrabackup dies with an error saying that it is unable to cope with the redo log growth:
>
> xtrabackup: error: log block numbers mismatch:
> xtrabackup: error: expected log block no. 212999933, but got no. 215097077 from the log file.
> xtrabackup: error: it looks like InnoDB log has wrapped around before xtrabackup could process all records due to either log copying being too slow, or log files being too small.
>
> Actually, depending on when xtrabackup realizes the problem, other
> errors may show up too, including assertions:
>
> InnoDB: Last flushed lsn: 97554425340507 load_index lsn 97554732668223
> [FATAL] InnoDB: An optimized(without redo logging) DDLoperation has been performed. All modified pages may not have been flushed to the disk yet.
> PXB will not be able take a consistent backup. Retry the backup operation
> 2017-07-25 10:17:48 0x7f38933d6700 InnoDB: Assertion failure in thread 139880965170944 in file ut0ut.cc line 916
>
>
> From some quick troubleshooting, the log copy threads appears to hang in:
>
> Thread 16 (Thread 0x7fde978b4700 (LWP 12487)):
> #0 0x000000399a80ba0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1 0x000000000092525e in os_event::timed_wait(timespec const*) ()
> #2 0x000000000092557e in os_event::wait_time_low(unsigned long, long) ()
> #3 0x000000000075ce54 in log_copying_thread(void*) ()
> #4 0x000000399a807a51 in start_thread () from /lib64/libpthread.so.0
> #5 0x000000399a0e893d in clone () from /lib64/libc.so.6
>
>
> We have been trying different options and settings until we realized it all depends on the fact CPU on the test machine is available or not.
>
> Turns out that this only happens if you hit 100% utilisation, and we
> were able to make xtrabackup work correctly by reducing CPU usage (we
> removed compression to save CPU, and reduced parallelism in xtrabackup
> itself).
>
> We made several tests and as long as there is available idle CPU the
> log copy thread works fine, but it hangs if there is no CPU left for
> it and never recovers.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/percona-xtrabackup/+bug/1706342/+subscriptions
>