Valgrind: free in buf_page_get_gen (Invalid read in buf_flush_batch / buf_flush_list) + free in buf_page_get_gen (Invalid read in buf_flush_page_and_try_neighbors)

Bug #1086680 reported by Roel Van de Paar
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Invalid
High
Laurynas Biveinis
5.1
Fix Released
High
Laurynas Biveinis
5.5
Fix Released
High
Laurynas Biveinis
5.6
Invalid
High
Laurynas Biveinis

Bug Description

121125 7:05:01 [Note] /data/ssd/qa/Percona-Server-5.5.28-rel29.1-351.Linux.x86_64/bin/mysqld: ready for connections.
Version: '5.5.28-rel29.1-log' socket: '/data/ssd/qa/94/current1_1/tmp/master.sock' port: 13000 Percona Server with XtraDB (GPL), Release rel29.1, Revision 351
==11657== Thread 16:
==11657== Invalid read of size 8
==11657== at 0x87E0C6: buf_flush_batch (buf0flu.c:1770)
==11657== by 0x87E3FF: buf_flush_list (buf0flu.c:2047)
==11657== by 0x823CC1: srv_master_thread (srv0srv.c:3692)
==11657== by 0x38E260677C: start_thread (pthread_create.c:301)
==11657== by 0x38E16D3C1C: clone (in /lib64/libc-2.5.so)
==11657== Address 0x152e01a8 is 56 bytes inside a block of size 152 free'd
==11657== at 0x4A072BA: free (vg_replace_malloc.c:446)
==11657== by 0x87446A: buf_page_get_gen (buf0buf.ic:886)
==11657== by 0x85ABD1: btr_free_externally_stored_field (btr0cur.c:4981)
==11657== by 0x863723: btr_cur_pessimistic_delete (btr0cur.c:5078)
==11657== by 0x926276: row_undo_ins (row0uins.c:117)
==11657== by 0x91EC13: row_undo_step (row0undo.c:320)
==11657== by 0x913049: que_run_threads (que0que.c:1257)
==11657== by 0x839CB8: trx_general_rollback_for_mysql (trx0roll.c:97)
==11657== by 0x8077F0: row_mysql_handle_errors (row0mysql.c:593)
==11657== by 0x80892B: row_insert_for_mysql (row0mysql.c:1264)
==11657== by 0x7EFA89: ha_innobase::write_row(unsigned char*) (ha_innodb.cc:6006)
==11657== by 0x69B7E1: handler::ha_write_row(unsigned char*) (handler.cc:5359)
==11657== by 0x5673D0: write_record(THD*, TABLE*, st_copy_info*) (sql_insert.cc:1788)
==11657== by 0x56E463: mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool) (sql_insert.cc:956)
==11657== by 0x57EE0B: mysql_execute_command(THD*) (sql_parse.cc:3065)
==11657== by 0x583112: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5811)
==11657==
==11657== Invalid read of size 1
==11657== at 0x87D8FB: buf_flush_page_and_try_neighbors (buf0buf.ic:77)
==11657== by 0x87E008: buf_flush_batch (buf0flu.c:1738)
==11657== by 0x87E3FF: buf_flush_list (buf0flu.c:2047)
==11657== by 0x823CC1: srv_master_thread (srv0srv.c:3692)
==11657== by 0x38E260677C: start_thread (pthread_create.c:301)
==11657== by 0x38E16D3C1C: clone (in /lib64/libc-2.5.so)
==11657== Address 0x152e017b is 11 bytes inside a block of size 152 free'd
==11657== at 0x4A072BA: free (vg_replace_malloc.c:446)
==11657== by 0x87446A: buf_page_get_gen (buf0buf.ic:886)
==11657== by 0x85ABD1: btr_free_externally_stored_field (btr0cur.c:4981)
==11657== by 0x863723: btr_cur_pessimistic_delete (btr0cur.c:5078)
==11657== by 0x926276: row_undo_ins (row0uins.c:117)
==11657== by 0x91EC13: row_undo_step (row0undo.c:320)
==11657== by 0x913049: que_run_threads (que0que.c:1257)
==11657== by 0x839CB8: trx_general_rollback_for_mysql (trx0roll.c:97)
==11657== by 0x8077F0: row_mysql_handle_errors (row0mysql.c:593)
==11657== by 0x80892B: row_insert_for_mysql (row0mysql.c:1264)
==11657== by 0x7EFA89: ha_innobase::write_row(unsigned char*) (ha_innodb.cc:6006)
==11657== by 0x69B7E1: handler::ha_write_row(unsigned char*) (handler.cc:5359)
==11657== by 0x5673D0: write_record(THD*, TABLE*, st_copy_info*) (sql_insert.cc:1788)
==11657== by 0x56E463: mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool) (sql_insert.cc:956)
==11657== by 0x57EE0B: mysql_execute_command(THD*) (sql_parse.cc:3065)
==11657== by 0x583112: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5811)
==11657==
==11657== Invalid read of size 1
[...]
[........ numerous occurrences of Invalid read of size 1 or 4 or 8 in buf_flush_page_and_try_neighbors as above ........]
[...]
==11657==
==11657== HEAP SUMMARY:
==11657== in use at exit: 1,560 bytes in 5 blocks
==11657== total heap usage: 3,162,238 allocs, 3,162,233 frees, 32,397,098,285 bytes allocated
==11657==
==11657== LEAK SUMMARY:
==11657== definitely lost: 0 bytes in 0 blocks
==11657== indirectly lost: 0 bytes in 0 blocks
==11657== possibly lost: 0 bytes in 0 blocks
==11657== still reachable: 0 bytes in 0 blocks
==11657== suppressed: 1,560 bytes in 5 blocks
==11657==
==11657== For counts of detected and suppressed errors, rerun with: -v
==11657== ERROR SUMMARY: 12 errors from 12 contexts (suppressed: 4 from 4)

Related branches

Revision history for this message
Roel Van de Paar (roel11) wrote :

This bug is a split from bug 1083058

Revision history for this message
Roel Van de Paar (roel11) wrote :
Download full text (3.9 KiB)

In Debug + Valgrind, the sever asserts with:

130101 11:04:32 InnoDB: Assertion failure in thread 395134720 in file buf0flu.c line 482
InnoDB: Failing assertion: bpage->in_flush_list

Just after giving errors like these:

==19340== Invalid read of size 8
==19340== at 0x8E9F6B: buf_flush_page_and_try_neighbors (buf0flu.c:480)
==19340== by 0x8EC87E: buf_flush_batch (buf0flu.c:1740)
==19340== by 0x8ED41D: buf_flush_list (buf0flu.c:2047)
==19340== by 0x8734D6: srv_master_thread (srv0srv.c:3512)
==19340== by 0x3DA1807D13: start_thread (in /usr/lib64/libpthread-2.15.so)
==19340== by 0x3DA14F168C: clone (in /usr/lib64/libc-2.15.so)
==19340== Address 0x4c5da00 is 128 bytes inside a block of size 192 free'd
==19340== at 0x4A06666: free (vg_replace_malloc.c:446)
==19340== by 0x8A9543: ut_free (ut0mem.c:221)
==19340== by 0x8E1E78: buf_page_get_gen (buf0buf.ic:886)
==19340== by 0x8E386B: buf_page_create (buf0buf.c:3760)
==19340== by 0x940759: fsp_page_create (fsp0fsp.c:1546)
==19340== by 0x941899: fseg_alloc_free_page_low (fsp0fsp.c:2833)
==19340== by 0x9426E2: fseg_alloc_free_page_general (fsp0fsp.c:2907)
==19340== by 0x8B05FF: btr_page_alloc (btr0btr.c:993)
==19340== by 0x8C5918: btr_store_big_rec_extern_fields (btr0cur.c:4509)
==19340== by 0x9CA4B3: row_ins_index_entry_low (row0ins.c:2237)
==19340== by 0x9CDE87: row_ins_index_entry (row0ins.c:2297)
==19340== by 0x9CEBCD: row_ins_step (row0ins.c:2373)
==19340== by 0x84014F: row_insert_for_mysql (row0mysql.c:1255)
==19340== by 0x81CDDB: ha_innobase::write_row(unsigned char*) (ha_innodb.cc:6061)
==19340== by 0x6BA555: handler::ha_write_row(unsigned char*) (handler.cc:5395)
==19340== by 0x585B2E: write_record(THD*, TABLE*, st_copy_info*) (sql_insert.cc:1788)

Backtrace:

Thread 5 (LWP 19421):
+bt
#0 0x0000003da180c60c in pthread_kill () from /lib64/libpthread.so.0
#1 0x00000000007dbe18 in my_write_core (sig=6) at /ssd/BT-16724-xtradb-bmp-requests-5.5-BUILD-TREE/Percona-Server-5.5.28-rel29.1/mysys/stacktrace.c:433
#2 0x00000000006b1a68 in handle_fatal_signal (sig=6) at /ssd/BT-16724-xtradb-bmp-requests-5.5-BUILD-TREE/Percona-Server-5.5.28-rel29.1/sql/signal_handler.cc:249
#3 <signal handler called>
#4 0x0000003da1435935 in raise () from /lib64/libc.so.6
#5 0x0000003da14370e8 in abort () from /lib64/libc.so.6
#6 0x00000000008e9fab in buf_flush_ready_for_flush (flush_type=BUF_FLUSH_LIST, bpage=0x4c5d980) at /ssd/BT-16724-xtradb-bmp-requests-5.5-BUILD-TREE/Percona-Server-5.5.28-rel29.1/storage/innobase/buf/buf0flu.c:482
#7 buf_flush_page_and_try_neighbors (bpage=bpage@entry=0x4c5d980, flush_type=flush_type@entry=BUF_FLUSH_LIST, n_to_flush=n_to_flush@entry=22, count=count@entry=0x178d3668) at /ssd/BT-16724-xtradb-bmp-requests-5.5-BUILD-TREE/Percona-Server-5.5.28-rel29.1/storage/innobase/buf/buf0flu.c:1587
#8 0x00000000008ec87f in buf_flush_flush_list_batch (lsn_limit=277030465, min_n=22, buf_pool=0x5dc0c28) at /ssd/BT-16724-xtradb-bmp-requests-5.5-BUILD-TREE/Percona-Server-5.5.28-rel29.1/storage/innobase/buf/buf0flu.c:1740
#9 buf_flush_batch (buf_pool=buf_pool@entry=0x5dc0c28, flush_type=flush_type@entry=BUF_FLUSH_LIST, min_n=min_n@entry=22, l...

Read more...

Revision history for this message
Roel Van de Paar (roel11) wrote :

Server still asserts w/ --innodb_adaptive_flushing_method=native

Revision history for this message
Roel Van de Paar (roel11) wrote :

Server *does not* assert w/ --innodb_adaptive_flushing_method=0

Revision history for this message
Roel Van de Paar (roel11) wrote :

Was able to assert server w/ --innodb_adaptive_flushing_method=0 also now. This option may or may not affect reproducibility level.

Revision history for this message
Roel Van de Paar (roel11) wrote :

Discussed w/ Laurynas. This bug looks now unrelated to bug 1083058. Laurynas doing code analysis. Roel has testcase which semi-reliably reproduces the sporadic issue. Issue is single thread reproducible, but executing same sql in many threads helps w/ reproducibility.

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

Roel,

I see that you only tried different flushing methods by changing the parameter --innodb_adaptive_flushing_method to different values. It would be worthwhile to test with the adaptive_flushing itself disabled, that might mean a different code path all together. I would also note that the customer who was hitting the same bug, has not have the bug seen after disabling adaptive_flushing

Revision history for this message
Roel Van de Paar (roel11) wrote :

Correction on #6:
--innodb_adaptive_flushing_method=native is the same as ...-method=0 (And server asserts on both)
When testing with: --innodb_adaptive_flushing=0, the server no longer asserts.
So, bug 1083058 may still be related to this bug.

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Roel -

Is 5.1 affected?

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

I have a theory based on instrumented runs.

We need two threads, one at buf_page_get_gen and another at
buf_flush_flush_list_batch, trying to work on the same dirty
compressed (i.e. state = BUF_BLOCK_ZIP_DIRTY) page.

1) buf_flush_flush_list_batch acquires flush_list_mutex, gets a
   page from the flush list, releases flush_list_mutex.
2) buf_page_get_gen attempts to read the same page, acquires
   zip_mutex, gets the dirty compressed page descriptor, allocates
   an uncompressed page, and replaces the descriptor with the page in
   the flush_list by a buf_flush_relocate_on_flush_list() call. That
   call acquires flush_list_mutex, sets in_flush_list = FALSE; but
   leaves oldest_modification alone, releases the flush_list_mutex.
   Caller then invalidates the descriptor memory, releases zip_mutex,
   frees its memory.
3) buf_flush_flush_list_batch attempts to read from the deallocated
   compressed page descriptor. Assuming that the memory was not
   overwritten since deallocation, the code reads oldest_modification
   != 0 and then trips on !in_flush_list.

Revision history for this message
Roel Van de Paar (roel11) wrote :

Laurynas, is this related to Bug 934377 - InnoDB: Failing assertion: (buf_pool->flush_list).count > 0 ?

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

It could be. The backtrace of that bug seems to be a bit removed from the underlying cause, so hard to say ATM.

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

Laurynas,

What about the case when compressed tablespace is not being used. I suppose dirty read from the flush_list is still done nevertheless, is that correct?

Revision history for this message
Roel Van de Paar (roel11) wrote :

5.1 Testing has revealed no assert, nor any related Valgrind issues.
Several rounds of patch>testing have resulted in a prototype patch being available which no longer asserts, no Valgrind's'

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Ovais -

Yes, that's bug 1083058. For non-compressed tables the current bug does not trigger.

Revision history for this message
Roel Van de Paar (roel11) wrote :

Assert in buf_page_get_gen in bug 1100178

Revision history for this message
Roel Van de Paar (roel11) wrote :
Download full text (8.9 KiB)

Assert now also seen in 5.1

130117 9:24:09 InnoDB: Assertion failure in thread 386553600 in file buf/buf0flu.c line 407
InnoDB: Failing assertion: bpage->in_flush_list

Directly after a Valgrind warning

==7461== Thread 16:
==7461== Invalid read of size 1
==7461== at 0x97724D: buf_page_get_state (buf0buf.ic:176)
==7461== by 0x9775C8: buf_page_get_mutex (buf0buf.ic:319)
==7461== by 0x977638: buf_page_get_mutex_enter (buf0buf.ic:347)
==7461== by 0x984865: buf_flush_batch (buf0flu.c:1470)
==7461== by 0x9354CB: srv_master_thread (srv0srv.c:3108)
==7461== by 0x3DA1807D13: start_thread (in /usr/lib64/libpthread-2.15.so)
==7461== by 0x3DA14F168C: clone (in /usr/lib64/libc-2.15.so)
==7461== Address 0x19ff03f8 is 8 bytes inside a block of size 192 free'd
==7461== at 0x4A06666: free (vg_replace_malloc.c:446)
==7461== by 0x9577E8: ut_free (ut0mem.c:227)
==7461== by 0x9780AE: buf_page_free_descriptor (buf0buf.ic:798)
==7461== by 0x97C87F: buf_page_get_gen (buf0buf.c:2079)
==7461== by 0x97EA44: buf_page_create (buf0buf.c:2972)
==7461== by 0x9B898E: fsp_page_create (fsp0fsp.c:1608)
==7461== by 0x9BBD9E: fseg_alloc_free_page_low (fsp0fsp.c:2899)
==7461== by 0x9BBF18: fseg_alloc_free_page_general (fsp0fsp.c:2973)
==7461== by 0x95BF31: btr_page_alloc_low (btr0btr.c:970)
==7461== by 0x95BFA0: btr_page_alloc (btr0btr.c:1005)
==7461== by 0x96C505: btr_store_big_rec_extern_fields (btr0cur.c:4382)
==7461== by 0x911BA7: row_ins_index_entry_low (row0ins.c:2246)
==7461== by 0x911CCA: row_ins_index_entry (row0ins.c:2297)
==7461== by 0x911FF9: row_ins_index_entry_step (row0ins.c:2382)
==7461== by 0x9122C3: row_ins (row0ins.c:2514)
==7461== by 0x912594: row_ins_step (row0ins.c:2624)
==7461==
==7461== Invalid read of size 1
==7461== at 0x97724D: buf_page_get_state (buf0buf.ic:176)
==7461== by 0x9775C8: buf_page_get_mutex (buf0buf.ic:319)
==7461== by 0x981F8D: buf_flush_ready_for_flush (buf0flu.c:402)
==7461== by 0x984881: buf_flush_batch (buf0flu.c:1476)
==7461== by 0x9354CB: srv_master_thread (srv0srv.c:3108)
==7461== by 0x3DA1807D13: start_thread (in /usr/lib64/libpthread-2.15.so)
==7461== by 0x3DA14F168C: clone (in /usr/lib64/libc-2.15.so)
==7461== Address 0x19ff03f8 is 8 bytes inside a block of size 192 free'd
==7461== at 0x4A06666: free (vg_replace_malloc.c:446)
==7461== by 0x9577E8: ut_free (ut0mem.c:227)
==7461== by 0x9780AE: buf_page_free_descriptor (buf0buf.ic:798)
==7461== by 0x97C87F: buf_page_get_gen (buf0buf.c:2079)
==7461== by 0x97EA44: buf_page_create (buf0buf.c:2972)
==7461== by 0x9B898E: fsp_page_create (fsp0fsp.c:1608)
==7461== by 0x9BBD9E: fseg_alloc_free_page_low (fsp0fsp.c:2899)
==7461== by 0x9BBF18: fseg_alloc_free_page_general (fsp0fsp.c:2973)
==7461== by 0x95BF31: btr_page_alloc_low (btr0btr.c:970)
==7461== by 0x95BFA0: btr_page_alloc (btr0btr.c:1005)
==7461== by 0x96C505: btr_store_big_rec_extern_fields (btr0cur.c:4382)
==7461== by 0x911BA7: row_ins_index_entry_low (row0ins.c:2246)
==7461== by 0x911CCA: row_ins_index_entry (row0ins.c:2297)
==7461== by 0x911FF9: row_ins_index_entry_step (row0ins.c:2382)
==7461== ...

Read more...

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :

Assert seen again in QA-16274-5.5 tree at revid Percona-Server-5.5.28-rel29.3-416-debug.Linux.x86_64, this time aborting in buf_flush_try_yield. Attaching full bt set + error log.

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :

If we should split this issue into a new bug for clarity etc, please let me know and I'll be happy to arrange.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Roel -

The last error is indeed unrelated and should be split to a new bug.

Revision history for this message
Roel Van de Paar (roel11) wrote :

Split to bug 1110102 from comment 20 up to 24.

tags: added: xtradb
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Related: bug 1122462.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-609

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.