InnoDB with shm buffer pool crashes after recovery

Bug #649408 reported by Vadim Tkachenko
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
High
Unassigned

Bug Description

I do next steps:

- kill mysqld;
- remove shm segment
- start mysqld ( it creates new shm segment)
- InnoDB performs recovery

when recovery finished and we start load on server , InnoDB crashes.

There is bt from gdb and core file.

I have core file and binary.

Thread 1 (Thread 10616):
#0 mem_heap_create_block (heap=0x2acb641be400, n=1048576, type=0,
    file_name=0x2acb641be408 <Address 0x2acb641be408 out of bounds>, line=528) at mem/mem0mem.c:357
#1 0x0000000000707a36 in mem_heap_add_block (heap=0x2acb641be400, n=1048576) at mem/mem0mem.c:446
#2 0x0000000000707db8 in mem_heap_alloc (heap=0x8, data=0x2ac65e4c942c, len=18446744073709546452) at ./include/mem0mem.ic:186
#3 mem_heap_dup (heap=0x8, data=0x2ac65e4c942c, len=18446744073709546452) at mem/mem0mem.c:125
#4 0x000000000075544e in trx_undo_rec_copy (roll_ptr=..., heap=0x2acb641be400) at ./include/trx0rec.ic:110
#5 trx_undo_get_undo_rec_low (roll_ptr=..., heap=0x2acb641be400) at trx/trx0rec.c:1360
#6 0x00000000007554da in trx_undo_get_undo_rec (roll_ptr=..., trx_id=..., undo_rec=0x599f1768, heap=0x2acb641be400)
    at trx/trx0rec.c:1398
#7 0x00000000007573ce in trx_undo_prev_version_build (index_rec=0x2ac65e53a79a <Address 0x2ac65e53a79a out of bounds>,
    index_mtr=<value optimized out>, rec=0x2ac65e53a79a <Address 0x2ac65e53a79a out of bounds>, index=0x2acb6800a7f8,
    offsets=0x599f1ea0, heap=0x2acb641be400, old_vers=0x599f1828) at trx/trx0rec.c:1483
#8 0x000000000074741b in row_vers_build_for_consistent_read (rec=0x2ac65e53a79a <Address 0x2ac65e53a79a out of bounds>,
    mtr=0x599f1a00, index=0x2acb6800a7f8, offsets=0x599f21d0, view=0x2acb680d2468, offset_heap=0x599f21d8,
    in_heap=0x2acb64100e80, old_vers=0x599f21c8) at row/row0vers.c:559
#9 0x000000000073e821 in row_search_for_mysql (buf=0x2acb6808d2a0 <Address 0x2acb6808d2a0 out of bounds>, mode=2,
    prebuilt=0x2acb6809fb08, match_mode=1, direction=0) at row/row0sel.c:4247
#10 0x00000000006e5165 in ha_innobase::index_read (this=0x2acb6808ce80,
    buf=0x2acb6808d2a0 <Address 0x2acb6808d2a0 out of bounds>, key_ptr=<value optimized out>, key_len=<value optimized out>,
    find_flag=HA_READ_KEY_EXACT) at handler/ha_innodb.cc:5801
#11 0x0000000000656ee9 in handler::index_read_idx_map (this=0x2acb6808ce80,
    buf=0x2acb6808d2a0 <Address 0x2acb6808d2a0 out of bounds>, index=<value optimized out>,
    key=0x2acb1f0eec58 <Address 0x2acb1f0eec58 out of bounds>, keypart_map=3, find_flag=HA_READ_KEY_EXACT) at handler.cc:4428
#12 0x00000000005e2897 in join_read_const (tab=0x2acb1f0ee9f0) at sql_select.cc:11909
#13 0x00000000005eb9dd in join_read_const_table (tab=0x2acb1f0ee9f0, pos=0x2acb1f0ecf60) at sql_select.cc:11812
#14 0x00000000005ec3f6 in make_join_statistics (join=0x2acb1f0ecec8, tables_arg=0x2acb641b5eb8, conds=0x2acb1f0ecd20,
---Type <return> to continue, or q <return> to quit---
    keyuse_array=0x2acb1f0ee488) at sql_select.cc:2933
#15 0x00000000005ede07 in JOIN::optimize (this=0x2acb1f0ecec8) at sql_select.cc:1035
#16 0x00000000005f7101 in mysql_select (thd=0x2acb6804c950, rref_pointer_array=0x2acb641b53f8, tables=0x2acb641b5eb8,
    wild_num=0, fields=..., conds=0x2acb1f0ecd20, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0,
    select_options=2416724480, result=0x2acb641b9568, unit=0x2acb641b4e00, select_lex=0x2acb641b5228) at sql_select.cc:2518
#17 0x00000000005f7b55 in handle_select (thd=0x2acb6804c950, lex=0x2acb641b4d60, result=0x2acb641b9568,
    setup_tables_done_option=0) at sql_select.cc:280
#18 0x00000000005864b1 in execute_sqlcom_select (thd=0x2acb6804c950, all_tables=0x2acb641b5eb8) at sql_parse.cc:5181
#19 0x0000000000589721 in mysql_execute_command (thd=0x2acb6804c950) at sql_parse.cc:2362
#20 0x0000000000603520 in Prepared_statement::execute (this=0x2acb641b49f0, expanded_query=<value optimized out>,
    open_cursor=false) at sql_prepare.cc:3879
#21 0x00000000006056f4 in Prepared_statement::execute_loop (this=0x2acb641b49f0, expanded_query=0x599f4180, open_cursor=false,
    packet=<value optimized out>, packet_end=<value optimized out>) at sql_prepare.cc:3554
#22 0x0000000000605a09 in mysqld_stmt_execute (thd=0x2acb6804c950,
    packet_arg=0x2acb6804fcf1 <Address 0x2acb6804fcf1 out of bounds>, packet_length=23) at sql_prepare.cc:2601
#23 0x000000000058fc61 in dispatch_command (command=COM_STMT_EXECUTE, thd=0x2acb6804c950,
    packet=0x2acb6804fcf1 <Address 0x2acb6804fcf1 out of bounds>, packet_length=0) at sql_parse.cc:1231
#24 0x0000000000590e30 in do_command (thd=0x2acb6804c950) at sql_parse.cc:901
#25 0x0000000000582d17 in handle_one_connection (arg=0x2acb6804c950) at sql_connect.cc:1738
#26 0x0000003d2760673d in ?? ()
#27 0x0000000000000000 in ?? ()

Related branches

Changed in percona-server:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Yasufumi Kinoshita (yasufumi-kinoshita)
Revision history for this message
Vadim Tkachenko (vadim-tk) wrote :
Download full text (5.2 KiB)

Issue is repeated very well.

Second time I've got message:

thd: 0x2acb5c09b890
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
InnoDB: Error: trying to access update undo rec field 1895425 in index "PRIMARY" of table "tpcc"."warehouse"
InnoDB: but index has only 11 fields
InnoDB: Submit a detailed bug report to http://bugs.mysql.com
InnoDB: Run also CHECK TABLE "tpcc"."warehouse"
InnoDB: n_fields = 4, i = 1, ptr 0x2acb40e42911
InnoDB: Error: trying to access update undo rec for table tpcc/warehouse
InnoDB: but the table id in the undo record is wrong
InnoDB: Submit a detailed bug report to http://bugs.mysql.com
InnoDB: Run also CHECK TABLE tpcc/warehouse
InnoDB: table tpcc/warehouse, index PRIMARY, n_uniq 1
InnoDB: undo rec address 0x2acb40e428f8, type 12 cmpl_info 1
InnoDB: undo rec table id 0 66, index table id 0 49

and stacktrace:

(gdb) bt
#0 mem_heap_create_block (heap=0x2acb4c106950, n=1048576, type=0,
    file_name=0x2acb4c106958 <Address 0x2acb4c106958 out of bounds>, line=528) at mem/mem0mem.c:357
#1 0x0000000000707a36 in mem_heap_add_block (heap=0x2acb4c106950, n=1048576) at mem/mem0mem.c:446
#2 0x0000000000707db8 in mem_heap_alloc (heap=0x8, data=0x2ac6546faca6, len=18446744073709540186) at ./include/mem0mem.ic:186
#3 mem_heap_dup (heap=0x8, data=0x2ac6546faca6, len=18446744073709540186) at mem/mem0mem.c:125
#4 0x000000000075544e in trx_undo_rec_copy (roll_ptr=..., heap=0x2acb4c106950) at ./include/trx0rec.ic:110
#5 trx_undo_get_undo_rec_low (roll_ptr=..., heap=0x2acb4c106950) at trx/trx0rec.c:1360
#6 0x00000000007554da in trx_undo_get_undo_rec (roll_ptr=..., trx_id=..., undo_rec=0x59c71768, heap=0x2acb4c106950)
    at trx/trx0rec.c:1398
#7 0x00000000007573ce in trx_undo_prev_version_build (index_rec=0x2ac65e270880 <Address 0x2ac65e270880 out of bounds>,
    index_mtr=<value optimized out>, rec=0x2ac65e270880 <Address 0x2ac65e270880 out of bounds>, index=0x2aca9c02c788,
    offsets=0x59c71ea0, heap=0x2acb4c106950, old_vers=0x59c71828) at trx/trx0rec.c:1483
#8 0x000000000074741b in row_vers_build_for_consistent_read (rec=0x2ac65e270880 <Address 0x2ac65e270880 out of bounds>,
    mtr=0x59c71a00, index=0x2aca9c02c788, offsets=0x59c721d0, view=0x2acb54015aa8, offset_heap=0x59c721d8,
    in_heap=0x2acb4c106800, old_vers=0x59c721c8) at row/row0vers.c:559
#9 0x000000000073e821 in row_search_for_mysql (buf=0x2acb40dad4c0 <Address 0x2acb40dad4c0 out of bounds>, mode=2,
    prebuilt=0x2acb40dade48, match_mode=1, direction=0) at row/row0sel.c:4247
#10 0x00000000006e5165 in ha_innobase::index_read (this=0x2acb40dad0a0,
    buf=0x2acb40dad4c0 <Address 0x2acb40dad4c0 out of bounds>, key_ptr=<value optimized out>, key_len=<value optimized out>,
    find_flag=HA_READ_KEY_EXACT) at handler/ha_innodb.cc:5801
#11 0x0000000000656ee9 in handler::index_read_idx_map (this=0x2acb40dad0a0,
    buf=0x2acb40dad4c0 <Address 0x2acb40dad4c0 out of bounds>, index=<value optimized out>,
    key=0x2acb406500a0 <Address 0x2acb406500a0 out of bounds>, keypart_map=1, find_flag=HA_READ_KEY_EXACT) at handler.cc:4428
#12 0x00000000005e289...

Read more...

Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

If the fix would not be effective, it might be the problem of the server.
The crash is caused by malloc() returning NULL pointer.

Changed in percona-server:
status: Confirmed → Fix Committed
Revision history for this message
Vadim Tkachenko (vadim-tk) wrote :

I still have crash after performing recovery.

Revision history for this message
Vadim Tkachenko (vadim-tk) wrote :
Download full text (4.3 KiB)

Program terminated with signal 11, Segmentation fault.
#0 mem_heap_create_block (heap=0x2acaac0eb9c0, n=100704, type=0, file_name=0x2acaac0eb9c8 "0vers.c", line=528)
    at mem/mem0mem.c:357
357 mem/mem0mem.c: No such file or directory.
        in mem/mem0mem.c
(gdb) bt
#0 mem_heap_create_block (heap=0x2acaac0eb9c0, n=100704, type=0, file_name=0x2acaac0eb9c8 "0vers.c", line=528)
    at mem/mem0mem.c:357
#1 0x0000000000707a46 in mem_heap_add_block (heap=0x2acaac0eb9c0, n=100704) at mem/mem0mem.c:446
#2 0x0000000000707dc8 in mem_heap_alloc (heap=0x8, data=0x2ac96edb5397, len=18446744073709546601) at ./include/mem0mem.ic:186
#3 mem_heap_dup (heap=0x8, data=0x2ac96edb5397, len=18446744073709546601) at mem/mem0mem.c:125
#4 0x000000000075545e in trx_undo_rec_copy (roll_ptr=..., heap=0x2acaac0eb9c0) at ./include/trx0rec.ic:110
#5 trx_undo_get_undo_rec_low (roll_ptr=..., heap=0x2acaac0eb9c0) at trx/trx0rec.c:1360
#6 0x00000000007554ea in trx_undo_get_undo_rec (roll_ptr=..., trx_id=..., undo_rec=0x4182b768, heap=0x2acaac0eb9c0)
    at trx/trx0rec.c:1398
#7 0x00000000007573de in trx_undo_prev_version_build (index_rec=0x2ac96e76af66 "\201e\203", index_mtr=<value optimized out>,
    rec=0x2ac96e76af66 "\201e\203", index=0x2acaa7f65728, offsets=0x4182bea0, heap=0x2acaac0eb9c0, old_vers=0x4182b828)
    at trx/trx0rec.c:1483
#8 0x000000000074742b in row_vers_build_for_consistent_read (rec=0x2ac96e76af66 "\201e\203", mtr=0x4182ba00,
    index=0x2acaa7f65728, offsets=0x4182c1d0, view=0x2acaa8025798, offset_heap=0x4182c1d8, in_heap=0x2acaac0eb870,
    old_vers=0x4182c1c8) at row/row0vers.c:559
#9 0x000000000073e831 in row_search_for_mysql (buf=0x2acaa7f727f0 "\320'\367\247\312*", mode=2, prebuilt=0x2acaa7f73428,
    match_mode=1, direction=0) at row/row0sel.c:4247
#10 0x00000000006e5175 in ha_innobase::index_read (this=0x2acaa7f723d0, buf=0x2acaa7f727f0 "\320'\367\247\312*",
    key_ptr=<value optimized out>, key_len=<value optimized out>, find_flag=HA_READ_KEY_EXACT) at handler/ha_innodb.cc:5803
#11 0x0000000000656ee9 in handler::index_read_idx_map (this=0x2acaa7f723d0, buf=0x2acaa7f727f0 "\320'\367\247\312*",
    index=<value optimized out>, key=0x2acaa8042828 "e\001\003", keypart_map=3, find_flag=HA_READ_KEY_EXACT) at handler.cc:4428
#12 0x00000000005e2897 in join_read_const (tab=0x2acaa80425c0) at sql_select.cc:11909
#13 0x00000000005eb9dd in join_read_const_table (tab=0x2acaa80425c0, pos=0x2acaa8040b30) at sql_select.cc:11812
#14 0x00000000005ec3f6 in make_join_statistics (join=0x2acaa8040a98, tables_arg=0x2acaac0dc9d0, conds=0x2acaa80408f0,
    keyuse_array=0x2acaa8042058) at sql_select.cc:2933
#15 0x00000000005ede07 in JOIN::optimize (this=0x2acaa8040a98) at sql_select.cc:1035
#16 0x00000000005f7101 in mysql_select (thd=0x2acaa7ea6010, rref_pointer_array=0x2acaac0dbf08, tables=0x2acaac0dc9d0,
    wild_num=0, fields=..., conds=0x2acaa80408f0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0,
    select_options=2416724480, result=0x2acaac0ddd08, unit=0x2acaac0db910, select_lex=0x2acaac0dbd38) at sql_select.cc:2518
#17 0x00000000005f7b55 in handle_select (thd=0x2acaa7ea6010, lex=0x2acaac0db870, result=0x2acaac0ddd08,
    ...

Read more...

Revision history for this message
Vadim Tkachenko (vadim-tk) wrote :

it seems it can damage recovery area, as I am getting crash now even after clean start / shutdown

Revision history for this message
Vadim Tkachenko (vadim-tk) wrote :

another stack
#0 0x0000003d26e30265 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x0000003d26e30265 in raise () from /lib64/libc.so.6
#1 0x0000003d26e31d10 in abort () from /lib64/libc.so.6
#2 0x0000000000731187 in row_mysql_handle_errors (new_err=0x5a75f1e8, trx=0x2acb501cab98, thr=0x2acb680fc348, savept=0x0)
    at row/row0mysql.c:582
#3 0x000000000073d473 in row_search_for_mysql (buf=0x2acb680f1430 "\020\024\017h\313*", mode=2, prebuilt=0x2acb680f1db8,
    match_mode=1, direction=0) at row/row0sel.c:4589
#4 0x00000000006e5175 in ha_innobase::index_read (this=0x2acb680f1010, buf=0x2acb680f1430 "\020\024\017h\313*",
    key_ptr=<value optimized out>, key_len=<value optimized out>, find_flag=HA_READ_KEY_EXACT) at handler/ha_innodb.cc:5803
#5 0x0000000000656ee9 in handler::index_read_idx_map (this=0x2acb680f1010, buf=0x2acb680f1430 "\020\024\017h\313*",
    index=<value optimized out>, key=0x2acab000a520 "6\003", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at handler.cc:4428
#6 0x00000000005e2897 in join_read_const (tab=0x2acb680f8128) at sql_select.cc:11909
#7 0x00000000005eb9dd in join_read_const_table (tab=0x2acb680f8128, pos=0x2acab00087d0) at sql_select.cc:11812
#8 0x00000000005ec3f6 in make_join_statistics (join=0x2acab0008710, tables_arg=0x2acb506f7af8, conds=0x2acab0008540,
    keyuse_array=0x2acab0009cd0) at sql_select.cc:2933
#9 0x00000000005ede07 in JOIN::optimize (this=0x2acab0008710) at sql_select.cc:1035
#10 0x00000000005f7101 in mysql_select (thd=0x2acaae7a2030, rref_pointer_array=0x2acb506f6c98, tables=0x2acb506f7af8,
    wild_num=0, fields=..., conds=0x2acab0008540, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0,
    select_options=2416724480, result=0x2acb506f9c80, unit=0x2acb506f66a0, select_lex=0x2acb506f6ac8) at sql_select.cc:2518
#11 0x00000000005f7b55 in handle_select (thd=0x2acaae7a2030, lex=0x2acb506f6600, result=0x2acb506f9c80,
    setup_tables_done_option=0) at sql_select.cc:280
#12 0x00000000005864b1 in execute_sqlcom_select (thd=0x2acaae7a2030, all_tables=0x2acb506f7af8) at sql_parse.cc:5181
#13 0x0000000000589721 in mysql_execute_command (thd=0x2acaae7a2030) at sql_parse.cc:2362
#14 0x0000000000603520 in Prepared_statement::execute (this=0x2acb506ebff0, expanded_query=<value optimized out>,
    open_cursor=false) at sql_prepare.cc:3879
#15 0x00000000006056f4 in Prepared_statement::execute_loop (this=0x2acb506ebff0, expanded_query=0x5a761180, open_cursor=false,
    packet=<value optimized out>, packet_end=<value optimized out>) at sql_prepare.cc:3554
#16 0x0000000000605a09 in mysqld_stmt_execute (thd=0x2acaae7a2030, packet_arg=0x2acb504ec021 "\001", packet_length=29)
    at sql_prepare.cc:2601
#17 0x000000000058fc61 in dispatch_command (command=COM_STMT_EXECUTE, thd=0x2acaae7a2030, packet=0x2acb504ec021 "\001",
    packet_length=6) at sql_parse.cc:1231
#18 0x0000000000590e30 in do_command (thd=0x2acaae7a2030) at sql_parse.cc:901
#19 0x0000000000582d17 in handle_one_connection (arg=0x2acaae7a2030) at sql_connect.cc:1738
#20 0x0000003d2760673d in start_thread () from /lib64/libpthread.so.0
#21 0x0000003d26ed3d1d in clone () from /lib64/libc.so.6

Revision history for this message
Vadim Tkachenko (vadim-tk) wrote :

It crashes for me with 100% if I kill mysqld and start after that.

I think we need to research more to understand what's wrong.

Changed in percona-server:
status: Fix Committed → Confirmed
Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

I found it may not be related to buffer_pool_shm.
buffer_pool_shm is just a trigger to elicit the problem.

Always the recovery process seems to have problem.
Using "MALLOC_CHECK_" environment says error always at the environment.

I continue to analyze.

Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

It seems specific problem only at the server.
And it seems to be caused by multi sessions' load.
(single thread load seems to safe)

I currently suspect
"Hardware bug" of the server
which like "breakable cache coherency" and cannot keep atomic operations.

The work area of malloc() seems to be broken by the multi-threads' load.

In the end,
Percona-Server and XtraDB must be innocent.

Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

BTW,
if the server has the hardware bug about cache coherency between cores/HT,
The hangup or irregular concurrent access about mutex/lock should happen.

How about the other hangup problems of the server?

Changed in percona-server:
status: Confirmed → Invalid
Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

I couldn't reach to prove the hardware bug of the server.
So, it may be problem of malloc() at the server.
(e.g. not thread safe)

But I don't understand why.

Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

It might be problem of MALLOC_CHECK_

Changed in percona-server:
status: Invalid → Incomplete
Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

I cannot reproduce now, with correct operation.
(And I executed "yum update glibc" also before the retesting)

The malloc() may return NULL pointer, when the free memory is not enough.
If you still meet the crash, I suspect you have not deleted the shm with another key.

Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

The another type of crash may still exist.
I will fix it soon. It should be the last fix for this bug.

Changed in percona-server:
status: Incomplete → Confirmed
Changed in percona-server:
status: Confirmed → Fix Committed
Changed in percona-server:
milestone: none → 5.1.50-12.1
Changed in percona-server:
status: Fix Committed → Fix Released
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-433

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.