XtraBackup segfaults after period of time

Reported by Justin La Sotten on 2013-04-09
This bug affects 1 person
Affects             Status  Importance  Assigned to     Milestone
Percona XtraBackup          Undecided   Unassigned
2.0                         High        Alexey Kopytov
2.1                         Undecided   Unassigned

Bug Description

Version:
XtraBackup 2.0.6-x86_64.

Description:
I have a large 64-bit mysql 5.0.77 instance, 700+GB.
When innobackupex is run, it begins to perform the backup. It will run for quite some time (transfers ~60GB) and then segfaults.

Command Invocation:
innobackupex --stream=tar ./ | gzip | ssh backup@backup_server "cd /var/lib/mysql/; tar izxf -"

Segfault:
Apr 8 19:17:23 db3 kernel: xtrabackup_51[12854]: segfault at 0000000000000084 rip 00000000004e59f5 rsp 0000000043debff0 error 4
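
For readers decoding the kernel line above: on x86, the `error` value is the page-fault error code, a bitmask (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The following is a minimal sketch of that decoding; `decode_pf_error` is a hypothetical helper name, not part of XtraBackup or any standard tool.

```shell
# Hypothetical helper: decode the x86 page-fault error code printed by the kernel.
decode_pf_error() {
    code=$1
    # Bit 2: fault happened in user mode (set) or kernel mode (clear).
    if [ $((code & 4)) -ne 0 ]; then mode="user-mode"; else mode="kernel-mode"; fi
    # Bit 1: the faulting access was a write (set) or a read (clear).
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    # Bit 0: protection violation (set) or page not present (clear).
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    echo "$mode $access, $cause"
}

decode_pf_error 4
```

Under this reading, `error 4` is a user-mode read of an unmapped page, and a fault address as low as 0x84 is typical of a field access through a NULL pointer. With debug symbols installed, `addr2line -f -e "$(which xtrabackup_51)" 0x4e59f5` may map the `rip` value to a function name.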

Strace:
... { SNIP } ...
stat("/tmp/xtrabackup_suspended", 0x1e9e1140) = -1 ENOENT (No such file or directory)
wait4(12189, 0x7fff4c2d1fc4, WNOHANG, NULL) = 0
nanosleep({0, 100000000}, NULL) = 0
... { END SNIP } ...

stat("/tmp/xtrabackup_suspended", 0x1e9e1140) = -1 ENOENT (No such file or directory)
wait4(12189, 0x7fff4c2d1fc4, WNOHANG, NULL) = 0
nanosleep({0, 100000000}, NULL) = 0
stat("/tmp/xtrabackup_suspended", 0x1e9e1140) = -1 ENOENT (No such file or directory)
wait4(12189, 0x7fff4c2d1fc4, WNOHANG, NULL) = 0
nanosleep({0, 100000000}, NULL) = 0
stat("/tmp/xtrabackup_suspended", 0x1e9e1140) = -1 ENOENT (No such file or directory)
wait4(12189, 0x7fff4c2d1fc4, WNOHANG, NULL) = 0
nanosleep({0, 100000000}, NULL) = 0
stat("/tmp/xtrabackup_suspended", 0x1e9e1140) = -1 ENOENT (No such file or directory)
wait4(12189, 0x7fff4c2d1fc4, WNOHANG, NULL) = 0
nanosleep({0, 100000000}, 0) = ? ERESTART_RESTARTBLOCK (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
restart_syscall(<... resuming interrupted call ...>) = 0
stat("/tmp/xtrabackup_suspended", 0x1e9e1140) = -1 ENOENT (No such file or directory)
wait4(12189, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV}], WNOHANG, NULL) = 12189
write(2, "innobackupex: Error: ibbackup ch"..., 88innobackupex: Error: ibbackup child process has died at /usr/bin/innobackupex line 381.
) = 88
umask(0) = 022
umask(022) = 0
umask(066) = 022
chmod("/tmp/X5QwfzQ7jU", 0600) = 0
umask(022) = 066
fstat(4, {st_mode=S_IFREG|0600, st_size=36, ...}) = 0
stat("/tmp/X5QwfzQ7jU", {st_mode=S_IFREG|0600, st_size=36, ...}) = 0
stat("/tmp/X5QwfzQ7jU", {st_mode=S_IFREG|0600, st_size=36, ...}) = 0
close(4) = 0
umask(0) = 022
umask(022) = 0
umask(066) = 022
chmod("/tmp/X5QwfzQ7jU", 0600) = 0
umask(022) = 066
lstat("/tmp/X5QwfzQ7jU", {st_mode=S_IFREG|0600, st_size=36, ...}) = 0
unlink("/tmp/X5QwfzQ7jU") = 0
umask(0) = 022
umask(022) = 0
umask(066) = 022
chmod("/tmp/miPIhpfqzr", 0600) = 0
umask(022) = 066
fstat(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
stat("/tmp/miPIhpfqzr", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
stat("/tmp/miPIhpfqzr", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
close(3) = 0
umask(0) = 022
umask(022) = 0
umask(066) = 022
chmod("/tmp/miPIhpfqzr", 0600) = 0
umask(022) = 066
lstat("/tmp/miPIhpfqzr", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
unlink("/tmp/miPIhpfqzr") = 0
close(5) = 0
exit_group(2) = ?

Related branches

lp:~akopytov/percona-xtrabackup/bug1166888-2.1
Merged into lp:percona-xtrabackup/2.1 at revision 540
Alexey Kopytov: Approve on 2013-04-15
Laurynas Biveinis: Needs Fixing on 2013-04-15
Alexey Kopytov (akopytov) wrote :

The strace output is from the innobackupex process, so it doesn't provide any useful info. innobackupex is waiting for the xtrabackup_suspended file to be created by the xtrabackup_51 process, but then terminates because that process has died.

It is unclear what could be causing the segfault in xtrabackup_51. Is there anything that might be relevant in the xtrabackup log, or does it just crash silently? Can you provide the part of the log from just before the crash occurs?

Changed in percona-xtrabackup:
status: New → Incomplete
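
The pattern visible in the strace (a stat on the suspend file, a non-blocking wait4 on the child, then a 100 ms sleep) can be sketched as a shell loop. This is an illustration only, not the actual innobackupex code; `wait_for_suspend` and its arguments are hypothetical names, and `sleep 0.1` assumes a sleep implementation that accepts fractional seconds (GNU coreutils does).

```shell
# Hypothetical sketch: wait until a child creates a marker file, or notice that it died.
# $1 = marker file path, $2 = PID of a background child of this shell.
wait_for_suspend() {
    marker=$1
    child=$2
    while [ ! -e "$marker" ]; do
        # kill -0 only tests whether the process still exists; it sends no signal.
        if ! kill -0 "$child" 2>/dev/null; then
            # Collect the child's exit status (128 + signal number if it was killed).
            status=0
            wait "$child" || status=$?
            echo "child process has died (exit status $status)" >&2
            return 2
        fi
        sleep 0.1
    done
    return 0
}
```

A caller would start the child in the background and pass `$!` as the second argument. As in the report, the loop itself only learns that the child died, not why, which is why the log and a backtrace from the child are needed.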

@Justin,

Since it is crashing with a segfault, can you enable core dumps and
provide a backtrace from the resulting core file?

gdb `which xtrabackup_51` --core <core-file> --batch --quiet -ex "thread apply all bt full" -ex "quit"

Is this XtraBackup built from source or installed from the Percona
RPM?

If the latter, make sure you have the percona-xtrabackup debug package installed as well before producing the core.
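
Enabling core dumps, as requested above, usually means raising the core-size limit in the shell that launches the backup, since child processes inherit the limit. A minimal sketch, assuming a Linux shell; `run_with_core` is a hypothetical wrapper name, and where the core file ends up depends on the system's `/proc/sys/kernel/core_pattern` setting.

```shell
# Hypothetical wrapper: run a command with core dumps enabled for it and its children.
run_with_core() {
    (
        # Raise the soft core-size limit in this subshell only, leaving the
        # calling shell untouched. This can fail if the hard limit is lower.
        ulimit -c unlimited 2>/dev/null || echo "warning: could not raise core limit" >&2
        "$@"
    )
}
```

For example, `run_with_core innobackupex --stream=tar ./` in place of the plain `innobackupex` call in the original pipeline should let xtrabackup_51 write a core file when it crashes, which can then be passed to the gdb command above.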

@Alexey
When it crashes this is what the output of innobackupex is:

... { SNIP } ...
>> log scanned up to (1181 627995473)
>> log scanned up to (1181 628102970)
>> log scanned up to (1181 628350216)
>> log scanned up to (1181 628834186)
>> log scanned up to (1181 629085674)
>> log scanned up to (1181 629414248)
>> log scanned up to (1181 629821568)
>> log scanned up to (1181 630278611)
>> log scanned up to (1181 630609818)
>> log scanned up to (1181 630840460)
>> log scanned up to (1181 631256774)
>> log scanned up to (1181 631514315)
>> log scanned up to (1181 631846649)
>> log scanned up to (1181 632230846)
>> log scanned up to (1181 632676220)
>> log scanned up to (1181 633047117)
>> log scanned up to (1181 633635219)
>> log scanned up to (1181 633980679)
innobackupex: Error: ibbackup child process has died at /usr/bin/innobackupex line 381.

That's not very helpful, as it is basically what the strace showed too. I'm running it again now to get the backtrace, and I'll post again when I have that output.

@Raghavendra
I've attached the backtrace you requested to this ticket.

Forgot to mention that I'm using the build from Percona (RPM).

Alexey Kopytov (akopytov) wrote :

Justin,

Thanks for the backtrace. The problem is clear now: we don't initialize per-thread data in the log copying thread. No idea why we never hit it before.

Anyway, the fix is trivial and will be included in the next release.
