Adaptive flushing does a dirty scan of the flush list

Bug #1083058 reported by Roel Van de Paar
14
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
High
Laurynas Biveinis
5.1
Fix Released
High
Laurynas Biveinis
5.5
Fix Released
High
Laurynas Biveinis

Bug Description

121124 3:56:12 [Note] /data/ssd/qa/Percona-Server-5.5.28-rel29.1-351.Linux.x86_64/bin/mysqld: ready for connections.
Version: '5.5.28-rel29.1-log' socket: '/data/ssd/qa/93/current1_8/tmp/master.sock' port: 13140 Percona Server with XtraDB (GPL), Release rel29.1, Revision 351
==3005== Thread 16:
==3005== Invalid read of size 8
==3005== at 0x8243C8: srv_master_thread (srv0srv.c:3487)
==3005== by 0x38E260677C: start_thread (pthread_create.c:301)
==3005== by 0x38E16D3C1C: clone (in /lib64/libc-2.5.so)
==3005== Address 0x15a488e0 is 96 bytes inside a block of size 152 free'd
==3005== at 0x4A072BA: free (vg_replace_malloc.c:446)
==3005== by 0x87446A: buf_page_get_gen (buf0buf.ic:886)
==3005== by 0x879DF3: buf_page_create (buf0buf.c:3760)
==3005== by 0x8BB89B: fsp_page_create (fsp0fsp.c:1547)
==3005== by 0x8C4411: fseg_alloc_free_page_low (fsp0fsp.c:2833)
==3005== by 0x8C4F6E: fseg_alloc_free_page_general (fsp0fsp.c:2907)
==3005== by 0x85B235: btr_store_big_rec_extern_fields (btr0cur.c:4487)
==3005== by 0x918B6C: row_ins_index_entry_low (row0ins.c:2234)
==3005== by 0x91CC52: row_ins_step (row0ins.c:2294)
==3005== by 0x8088F5: row_insert_for_mysql (row0mysql.c:1255)
==3005== by 0x7EFA89: ha_innobase::write_row(unsigned char*) (ha_innodb.cc:6006)
==3005== by 0x69B7E1: handler::ha_write_row(unsigned char*) (handler.cc:5359)
==3005== by 0x5673D0: write_record(THD*, TABLE*, st_copy_info*) (sql_insert.cc:1788)
==3005== by 0x56E463: mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool) (sql_insert.cc:956)
==3005== by 0x57EE0B: mysql_execute_command(THD*) (sql_parse.cc:3065)
==3005== by 0x583112: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5811)
==3005==
==3005== Invalid read of size 8
==3005== at 0x8243E4: srv_master_thread (srv0srv.c:3492)
==3005== by 0x38E260677C: start_thread (pthread_create.c:301)
==3005== by 0x38E16D3C1C: clone (in /lib64/libc-2.5.so)
==3005== Address 0x15a488c0 is 64 bytes inside a block of size 152 free'd
==3005== at 0x4A072BA: free (vg_replace_malloc.c:446)
==3005== by 0x87446A: buf_page_get_gen (buf0buf.ic:886)
==3005== by 0x879DF3: buf_page_create (buf0buf.c:3760)
==3005== by 0x8BB89B: fsp_page_create (fsp0fsp.c:1547)
==3005== by 0x8C4411: fseg_alloc_free_page_low (fsp0fsp.c:2833)
==3005== by 0x8C4F6E: fseg_alloc_free_page_general (fsp0fsp.c:2907)
==3005== by 0x85B235: btr_store_big_rec_extern_fields (btr0cur.c:4487)
==3005== by 0x918B6C: row_ins_index_entry_low (row0ins.c:2234)
==3005== by 0x91CC52: row_ins_step (row0ins.c:2294)
==3005== by 0x8088F5: row_insert_for_mysql (row0mysql.c:1255)
==3005== by 0x7EFA89: ha_innobase::write_row(unsigned char*) (ha_innodb.cc:6006)
==3005== by 0x69B7E1: handler::ha_write_row(unsigned char*) (handler.cc:5359)
==3005== by 0x5673D0: write_record(THD*, TABLE*, st_copy_info*) (sql_insert.cc:1788)
==3005== by 0x56E463: mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool) (sql_insert.cc:956)
==3005== by 0x57EE0B: mysql_execute_command(THD*) (sql_parse.cc:3065)
==3005== by 0x583112: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5811)
==3005==
[.....]
==3005==
==3005== HEAP SUMMARY:
==3005== in use at exit: 1,560 bytes in 5 blocks
==3005== total heap usage: 4,481,035 allocs, 4,481,030 frees, 40,235,084,856 bytes allocated
==3005==
==3005== LEAK SUMMARY:
==3005== definitely lost: 0 bytes in 0 blocks
==3005== indirectly lost: 0 bytes in 0 blocks
==3005== possibly lost: 0 bytes in 0 blocks
==3005== still reachable: 0 bytes in 0 blocks
==3005== suppressed: 1,560 bytes in 5 blocks
==3005==
==3005== For counts of detected and suppressed errors, rerun with: -v
==3005== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4)

Preliminary Analysis thanks to Laurynas: Adaptive flushing algorithms do dirty reads from the flush list and then from the pages on that list. Because of that, nothing protects to read from a free'd or malloc'ed but uninitialized page descriptor if the same page happens to be read in by other thread.

Related branches

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

InnoDB master thread code in question with the dirty reads:
...
  } else if (srv_adaptive_flushing && srv_adaptive_flushing_method == 1) {
...
     for (j = 0; j < srv_buf_pool_instances; j++) {
      buf_pool_t* buf_pool;
      ulint n_blocks;

      buf_pool = buf_pool_from_array(j);

      /* The scanning flush_list is optimistic here */

      level = 0;
      n_blocks = 0;
      bpage = UT_LIST_GET_FIRST(buf_pool->flush_list);

      while (bpage != NULL) {
       ib_uint64_t oldest_modification = bpage->oldest_modification;
       if (oldest_modification != 0) {
        level += log_sys->max_checkpoint_age
          - (lsn - oldest_modification);
       }
       bpage = UT_LIST_GET_NEXT(flush_list, bpage);
       n_blocks++;
      }
...

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Interestingly the same code in 5.1 takes the flush list mutex.

Revision history for this message
Roel Van de Paar (roel11) wrote :

Laurynas, do you think this is the result of a recent change?

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

No, it's been there for a while, most likely since initial Percona Server 5.5 versions.

Revision history for this message
Roel Van de Paar (roel11) wrote : Re: Valgrind: free in buf_page_get_gen (srv_master_thread/buf_flush_batch/buf_flush_page_and_try_neighbors)
Download full text (4.4 KiB)

Another occurrence with somewhat different stacks:

121125 7:05:01 [Note] /data/ssd/qa/Percona-Server-5.5.28-rel29.1-351.Linux.x86_64/bin/mysqld: ready for connections.
Version: '5.5.28-rel29.1-log' socket: '/data/ssd/qa/94/current1_1/tmp/master.sock' port: 13000 Percona Server with XtraDB (GPL), Release rel29.1, Revision 351
==11657== Thread 16:
==11657== Invalid read of size 8
==11657== at 0x87E0C6: buf_flush_batch (buf0flu.c:1770)
==11657== by 0x87E3FF: buf_flush_list (buf0flu.c:2047)
==11657== by 0x823CC1: srv_master_thread (srv0srv.c:3692)
==11657== by 0x38E260677C: start_thread (pthread_create.c:301)
==11657== by 0x38E16D3C1C: clone (in /lib64/libc-2.5.so)
==11657== Address 0x152e01a8 is 56 bytes inside a block of size 152 free'd
==11657== at 0x4A072BA: free (vg_replace_malloc.c:446)
==11657== by 0x87446A: buf_page_get_gen (buf0buf.ic:886)
==11657== by 0x85ABD1: btr_free_externally_stored_field (btr0cur.c:4981)
==11657== by 0x863723: btr_cur_pessimistic_delete (btr0cur.c:5078)
==11657== by 0x926276: row_undo_ins (row0uins.c:117)
==11657== by 0x91EC13: row_undo_step (row0undo.c:320)
==11657== by 0x913049: que_run_threads (que0que.c:1257)
==11657== by 0x839CB8: trx_general_rollback_for_mysql (trx0roll.c:97)
==11657== by 0x8077F0: row_mysql_handle_errors (row0mysql.c:593)
==11657== by 0x80892B: row_insert_for_mysql (row0mysql.c:1264)
==11657== by 0x7EFA89: ha_innobase::write_row(unsigned char*) (ha_innodb.cc:6006)
==11657== by 0x69B7E1: handler::ha_write_row(unsigned char*) (handler.cc:5359)
==11657== by 0x5673D0: write_record(THD*, TABLE*, st_copy_info*) (sql_insert.cc:1788)
==11657== by 0x56E463: mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool) (sql_insert.cc:956)
==11657== by 0x57EE0B: mysql_execute_command(THD*) (sql_parse.cc:3065)
==11657== by 0x583112: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5811)
==11657==
==11657== Invalid read of size 1
==11657== at 0x87D8FB: buf_flush_page_and_try_neighbors (buf0buf.ic:77)
==11657== by 0x87E008: buf_flush_batch (buf0flu.c:1738)
==11657== by 0x87E3FF: buf_flush_list (buf0flu.c:2047)
==11657== by 0x823CC1: srv_master_thread (srv0srv.c:3692)
==11657== by 0x38E260677C: start_thread (pthread_create.c:301)
==11657== by 0x38E16D3C1C: clone (in /lib64/libc-2.5.so)
==11657== Address 0x152e017b is 11 bytes inside a block of size 152 free'd
==11657== at 0x4A072BA: free (vg_replace_malloc.c:446)
==11657== by 0x87446A: buf_page_get_gen (buf0buf.ic:886)
==11657== by 0x85ABD1: btr_free_externally_stored_field (btr0cur.c:4981)
==11657== by 0x863723: btr_cur_pessimistic_delete (btr0cur.c:5078)
==11657== by 0x926276: row_undo_ins (row0uins.c:117)
==11657== by 0x91EC13: row_undo_step (row0undo.c:320)
==11657== by 0x913049: que_run_threads (que0que.c:1257)
==11657== by 0x839CB8: trx_general_rollback_for_mysql (trx0roll.c:97)
==11657== by 0x8077F0: row_mysql_handle_errors (row0mysql.c:593)
==11657== by 0x80892B: row_insert_for_mysql (row0mysql.c:1264)
==11657== by 0x7EFA89: ha_innobase::write_row(unsigned ch...

Read more...

summary: - Valgrind: free in buf_page_get_gen (srv_master_thread)
+ Valgrind: free in buf_page_get_gen
+ (srv_master_thread/buf_flush_batch/buf_flush_page_and_try_neighbors)
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Roel -

The two other stacktraces are an unrelated issue, should be a separate bug.

Revision history for this message
Roel Van de Paar (roel11) wrote :

Laurynas:

Ok. Are both likely to be individual/completely separate bugs or are they likely related to each other (and/or to the original issue)?

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Roel -

Currently it seems likely that they are related to each other, but not related to the original issue.

summary: - Valgrind: free in buf_page_get_gen
- (srv_master_thread/buf_flush_batch/buf_flush_page_and_try_neighbors)
+ Valgrind: free in buf_page_get_gen (Invalid read in srv_master_thread)
Revision history for this message
Roel Van de Paar (roel11) wrote : Re: Valgrind: free in buf_page_get_gen (Invalid read in srv_master_thread)

Second issue split off to/now tracked in bug 1086680

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :

Bug 1086680 and bug 1085015 were made duplicates of this one

summary: - Valgrind: free in buf_page_get_gen (Invalid read in srv_master_thread)
+ Adaptive flushing does a dirty scan of the flush list | Valgrind: free
+ in buf_page_get_gen | Debug Assert: bpage->in_flush_list
Revision history for this message
Roel Van de Paar (roel11) wrote : Re: Adaptive flushing does a dirty scan of the flush list | Valgrind: free in buf_page_get_gen | Debug Assert: bpage->in_flush_list

See other bugs for additional information like assert info & assert backtrace.

Revision history for this message
Roel Van de Paar (roel11) wrote :

Bug has now been reproduced with --innodb_adaptive_flushing=0 also. Internal discussion.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Adaptive flushing method keep_average is affected too, on both 5.1 and 5.5.

summary: - Adaptive flushing does a dirty scan of the flush list | Valgrind: free
- in buf_page_get_gen | Debug Assert: bpage->in_flush_list
+ Adaptive flushing does a dirty scan of the flush list
tags: added: xtradb
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-607

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.