XtraBackup segfaults after period of time

Bug #1166888 reported by Justin La Sotten on 2013-04-09
This bug affects 1 person
Affects                         Status         Importance   Assigned to      Milestone
Percona XtraBackup              Invalid        Undecided    Unassigned
  (moved to https://jira.percona.com/projects/PXB)
  2.0                           Fix Released   High         Alexey Kopytov
  2.1                           Invalid        Undecided    Unassigned

Bug Description

Version:
XtraBackup 2.0.6-x86_64.

Description:
I have a large (700+ GB) 64-bit MySQL 5.0.77 instance.
When innobackupex is run, it begins to perform the backup. It will run for quite some time (transfers ~60GB) and then segfaults.

Command Invocation:
innobackupex --stream=tar ./ | gzip | ssh backup@backup_server "cd /var/lib/mysql/; tar izxf -"

Segfault:
Apr 8 19:17:23 db3 kernel: xtrabackup_51[12854]: segfault at 0000000000000084 rip 00000000004e59f5 rsp 0000000043debff0 error 4
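The kernel line above can be decoded a little further. On x86, "error 4" is the page-fault error code: bit 0 clear means the page was not present, bit 1 clear means a read, and bit 2 set means the fault happened in user mode. A faulting address of 0x84 is so close to zero that it is consistent with a NULL pointer being dereferenced at a small field offset. The sketch below decodes those bits; the function names are hypothetical, and treating anything in the first 4 KiB as a NULL dereference is a heuristic assumption, not a rule.

```c
#include <assert.h>
#include <stdio.h>

/* Bits of the x86 page-fault error code that the Linux kernel prints as
 * "error N" in its segfault message. */
enum {
    PF_PROT  = 1 << 0, /* set: protection violation; clear: page not present */
    PF_WRITE = 1 << 1, /* set: write access; clear: read access */
    PF_USER  = 1 << 2  /* set: fault raised while running in user mode */
};

struct pf_error {
    int user;  /* 1 if the faulting access came from user mode */
    int write; /* 1 for a write, 0 for a read */
    int prot;  /* 1 for a protection violation, 0 for a not-present page */
};

/* Splits a raw error code into its individual flags. */
static struct pf_error decode_pf_error(unsigned long code)
{
    struct pf_error e = {
        .user  = (code & PF_USER)  != 0,
        .write = (code & PF_WRITE) != 0,
        .prot  = (code & PF_PROT)  != 0,
    };
    return e;
}

/* Heuristic (assumption): nothing is normally mapped in the first 4 KiB,
 * so a fault below that is almost always a NULL pointer plus an offset. */
static int looks_like_null_deref(unsigned long fault_addr)
{
    return fault_addr < 4096;
}

/* Prints a one-line interpretation of a kernel segfault report. */
static void print_segfault(unsigned long fault_addr, unsigned long code)
{
    struct pf_error e = decode_pf_error(code);

    printf("%s-mode %s at 0x%lx, %s%s\n",
           e.user ? "user" : "kernel",
           e.write ? "write" : "read",
           fault_addr,
           e.prot ? "protection violation" : "page not present",
           looks_like_null_deref(fault_addr) ? " (likely NULL + offset)" : "");
}
```

For the crash reported here, print_segfault(0x84, 4) prints "user-mode read at 0x84, page not present (likely NULL + offset)".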

Strace:
... { SNIP } ...
stat("/tmp/xtrabackup_suspended", 0x1e9e1140) = -1 ENOENT (No such file or directory)
wait4(12189, 0x7fff4c2d1fc4, WNOHANG, NULL) = 0
nanosleep({0, 100000000}, NULL) = 0
... { END SNIP } ...

stat("/tmp/xtrabackup_suspended", 0x1e9e1140) = -1 ENOENT (No such file or directory)
wait4(12189, 0x7fff4c2d1fc4, WNOHANG, NULL) = 0
nanosleep({0, 100000000}, NULL) = 0
stat("/tmp/xtrabackup_suspended", 0x1e9e1140) = -1 ENOENT (No such file or directory)
wait4(12189, 0x7fff4c2d1fc4, WNOHANG, NULL) = 0
nanosleep({0, 100000000}, NULL) = 0
stat("/tmp/xtrabackup_suspended", 0x1e9e1140) = -1 ENOENT (No such file or directory)
wait4(12189, 0x7fff4c2d1fc4, WNOHANG, NULL) = 0
nanosleep({0, 100000000}, NULL) = 0
stat("/tmp/xtrabackup_suspended", 0x1e9e1140) = -1 ENOENT (No such file or directory)
wait4(12189, 0x7fff4c2d1fc4, WNOHANG, NULL) = 0
nanosleep({0, 100000000}, 0) = ? ERESTART_RESTARTBLOCK (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
restart_syscall(<... resuming interrupted call ...>) = 0
stat("/tmp/xtrabackup_suspended", 0x1e9e1140) = -1 ENOENT (No such file or directory)
wait4(12189, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV}], WNOHANG, NULL) = 12189
write(2, "innobackupex: Error: ibbackup ch"..., 88innobackupex: Error: ibbackup child process has died at /usr/bin/innobackupex line 381.
) = 88
umask(0) = 022
umask(022) = 0
umask(066) = 022
chmod("/tmp/X5QwfzQ7jU", 0600) = 0
umask(022) = 066
fstat(4, {st_mode=S_IFREG|0600, st_size=36, ...}) = 0
stat("/tmp/X5QwfzQ7jU", {st_mode=S_IFREG|0600, st_size=36, ...}) = 0
stat("/tmp/X5QwfzQ7jU", {st_mode=S_IFREG|0600, st_size=36, ...}) = 0
close(4) = 0
umask(0) = 022
umask(022) = 0
umask(066) = 022
chmod("/tmp/X5QwfzQ7jU", 0600) = 0
umask(022) = 066
lstat("/tmp/X5QwfzQ7jU", {st_mode=S_IFREG|0600, st_size=36, ...}) = 0
unlink("/tmp/X5QwfzQ7jU") = 0
umask(0) = 022
umask(022) = 0
umask(066) = 022
chmod("/tmp/miPIhpfqzr", 0600) = 0
umask(022) = 066
fstat(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
stat("/tmp/miPIhpfqzr", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
stat("/tmp/miPIhpfqzr", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
close(3) = 0
umask(0) = 022
umask(022) = 0
umask(066) = 022
chmod("/tmp/miPIhpfqzr", 0600) = 0
umask(022) = 066
lstat("/tmp/miPIhpfqzr", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
unlink("/tmp/miPIhpfqzr") = 0
close(5) = 0
exit_group(2) = ?


Alexey Kopytov (akopytov) wrote :

The strace output is from the innobackupex process, which doesn't provide any useful info. innobackupex is waiting for the xtrabackup_suspended file to be created by the xtrabackup_51 process, and it terminates once it notices that process has died.
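The polling pattern visible in the trace (stat the suspend file, a non-blocking wait4 on the child, then a 100 ms nanosleep, repeated) can be sketched in C as below. innobackupex itself is a Perl script, so this is not its code; the function name, return codes and the path argument are hypothetical, chosen only to mirror the system calls in the strace.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

enum wait_result {
    WAIT_ERROR        = -1, /* waitpid() failed for a reason other than EINTR */
    SUSPEND_FILE_SEEN =  0, /* the child created the marker file */
    CHILD_EXITED      =  1, /* the child exited normally before creating it */
    CHILD_KILLED      =  2  /* the child was killed by a signal (e.g. SIGSEGV) */
};

/* Polls every 100 ms until `path` exists or child `pid` terminates.
 * On CHILD_KILLED, the terminating signal is stored in *sig_out. */
static enum wait_result wait_for_suspend_file(const char *path, pid_t pid,
                                              int *sig_out)
{
    const struct timespec tick = {0, 100000000}; /* 100 ms, as in the strace */
    struct stat st;
    int status;

    for (;;) {
        /* stat("/tmp/xtrabackup_suspended", ...) in the trace */
        if (stat(path, &st) == 0)
            return SUSPEND_FILE_SEEN;

        /* wait4(pid, ..., WNOHANG, NULL): non-blocking liveness check */
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) {
            if (WIFSIGNALED(status)) {
                if (sig_out != NULL)
                    *sig_out = WTERMSIG(status);
                return CHILD_KILLED;
            }
            return CHILD_EXITED;
        }
        if (r == -1 && errno != EINTR)
            return WAIT_ERROR;

        /* A SIGCHLD during the sleep appears as ERESTART_RESTARTBLOCK and
         * restart_syscall in the trace; the next waitpid() reaps the child. */
        nanosleep(&tick, NULL);
    }
}
```

In the reported run, the equivalent of CHILD_KILLED with SIGSEGV is what makes innobackupex print "ibbackup child process has died" and exit with status 2.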

It is unclear what could cause the segfault in xtrabackup_51. Is there anything relevant in the xtrabackup log, or does it just crash silently? Can you provide the part of the log leading up to the crash?

Changed in percona-xtrabackup:
status: New → Incomplete

@Justin,

Since it is crashing with a segfault, can you enable core dumps and provide a backtrace from the core file?

gdb `which xtrabackup_51` --core <core-file> --batch --quiet -ex "thread apply all bt full" -ex "quit"

Is this XtraBackup built from source or installed from the Percona RPM?

If the latter, make sure the percona-xtrabackup debug package is installed as well before producing the core.
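Enabling core dumps usually just means raising the core-file size limit before starting innobackupex; from a shell that is `ulimit -c unlimited`. Resource limits are inherited across fork and exec, so the xtrabackup_51 child gets the same limit. A minimal C equivalent for a wrapper program is sketched below; the function name is hypothetical, and it only raises the soft limit to the existing hard limit, which is all an unprivileged process is allowed to do.

```c
#include <assert.h>
#include <sys/resource.h>

/* Raises the soft core-file size limit to the hard limit so that a crashing
 * process, and any child it later forks and execs, can write a core file.
 * Returns 0 on success, -1 on failure (errno is set by getrlimit/setrlimit). */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max; /* unprivileged code may not exceed rlim_max */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

On Linux, where the core file is written (and what it is named) is controlled by /proc/sys/kernel/core_pattern; the resulting file is what the gdb command above expects as <core-file>.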

@Alexey
When it crashes, this is the output of innobackupex:

... { SNIP } ...
>> log scanned up to (1181 627995473)
>> log scanned up to (1181 628102970)
>> log scanned up to (1181 628350216)
>> log scanned up to (1181 628834186)
>> log scanned up to (1181 629085674)
>> log scanned up to (1181 629414248)
>> log scanned up to (1181 629821568)
>> log scanned up to (1181 630278611)
>> log scanned up to (1181 630609818)
>> log scanned up to (1181 630840460)
>> log scanned up to (1181 631256774)
>> log scanned up to (1181 631514315)
>> log scanned up to (1181 631846649)
>> log scanned up to (1181 632230846)
>> log scanned up to (1181 632676220)
>> log scanned up to (1181 633047117)
>> log scanned up to (1181 633635219)
>> log scanned up to (1181 633980679)
innobackupex: Error: ibbackup child process has died at /usr/bin/innobackupex line 381.

That's not very helpful, since it is basically what the strace showed too. I'm running it again now to get the backtrace; I'll post again when I have that output.

@Raghavendra
I've attached the backtrace you requested to this ticket.

Forgot to mention that I'm using the Percona build (RPM).

Alexey Kopytov (akopytov) wrote :

Justin,

Thanks for the backtrace. The problem is clear now: we don't initialize per-thread data in the log copying thread. No idea why we never hit it before.

Anyway, the fix is trivial and will be included in the next release.
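The class of bug described above can be illustrated with POSIX thread-specific data. The actual XtraBackup fix is not shown in this report, so every name below (the context struct, the init/get helpers, the worker) is hypothetical; the sketch only shows why a thread that never initializes its per-thread data ends up with a NULL pointer, and why dereferencing a field of it faults near address 0, consistent with the 0x84 in the kernel log.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread state; stands in for whatever per-thread data the
 * log copying thread was expected to set up. */
struct thread_ctx {
    long pages_copied;
};

static pthread_key_t ctx_key;
static pthread_once_t ctx_key_once = PTHREAD_ONCE_INIT;

static void ctx_key_create(void)
{
    /* free() runs as the destructor when a thread holding a value exits */
    pthread_key_create(&ctx_key, free);
}

/* Allocates this thread's context; every thread must call it before use. */
static int thread_ctx_init(void)
{
    pthread_once(&ctx_key_once, ctx_key_create);
    struct thread_ctx *ctx = calloc(1, sizeof *ctx);
    if (ctx == NULL)
        return -1;
    if (pthread_setspecific(ctx_key, ctx) != 0) {
        free(ctx);
        return -1;
    }
    return 0;
}

/* Returns NULL for a thread that never called thread_ctx_init(). */
static struct thread_ctx *thread_ctx_get(void)
{
    pthread_once(&ctx_key_once, ctx_key_create);
    return pthread_getspecific(ctx_key);
}

/* Hypothetical log-copying thread. Without the thread_ctx_init() call,
 * ctx would be NULL and ctx->pages_copied would fault near address 0. */
static void *log_copy_thread(void *arg)
{
    long *result = arg;

    if (thread_ctx_init() != 0)
        return NULL;

    struct thread_ctx *ctx = thread_ctx_get();
    assert(ctx != NULL);
    ctx->pages_copied++;
    *result = ctx->pages_copied; /* copy out: ctx is freed at thread exit */
    return NULL;
}
```

The design point is that thread-specific values are per thread: initializing them in the main thread does nothing for a thread created later, so each thread entry point has to perform its own setup.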

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXB-1213
