Percona Server with XtraDB

Adaptive flushing does a dirty scan of the flush list

Reported by Roel Van de Paar on 2012-11-26
14
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Percona Server
High
Laurynas Biveinis
5.1
High
Laurynas Biveinis
5.5
High
Laurynas Biveinis

Bug Description

121124 3:56:12 [Note] /data/ssd/qa/Percona-Server-5.5.28-rel29.1-351.Linux.x86_64/bin/mysqld: ready for connections.
Version: '5.5.28-rel29.1-log' socket: '/data/ssd/qa/93/current1_8/tmp/master.sock' port: 13140 Percona Server with XtraDB (GPL), Release rel29.1, Revision 351
==3005== Thread 16:
==3005== Invalid read of size 8
==3005== at 0x8243C8: srv_master_thread (srv0srv.c:3487)
==3005== by 0x38E260677C: start_thread (pthread_create.c:301)
==3005== by 0x38E16D3C1C: clone (in /lib64/libc-2.5.so)
==3005== Address 0x15a488e0 is 96 bytes inside a block of size 152 free'd
==3005== at 0x4A072BA: free (vg_replace_malloc.c:446)
==3005== by 0x87446A: buf_page_get_gen (buf0buf.ic:886)
==3005== by 0x879DF3: buf_page_create (buf0buf.c:3760)
==3005== by 0x8BB89B: fsp_page_create (fsp0fsp.c:1547)
==3005== by 0x8C4411: fseg_alloc_free_page_low (fsp0fsp.c:2833)
==3005== by 0x8C4F6E: fseg_alloc_free_page_general (fsp0fsp.c:2907)
==3005== by 0x85B235: btr_store_big_rec_extern_fields (btr0cur.c:4487)
==3005== by 0x918B6C: row_ins_index_entry_low (row0ins.c:2234)
==3005== by 0x91CC52: row_ins_step (row0ins.c:2294)
==3005== by 0x8088F5: row_insert_for_mysql (row0mysql.c:1255)
==3005== by 0x7EFA89: ha_innobase::write_row(unsigned char*) (ha_innodb.cc:6006)
==3005== by 0x69B7E1: handler::ha_write_row(unsigned char*) (handler.cc:5359)
==3005== by 0x5673D0: write_record(THD*, TABLE*, st_copy_info*) (sql_insert.cc:1788)
==3005== by 0x56E463: mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool) (sql_insert.cc:956)
==3005== by 0x57EE0B: mysql_execute_command(THD*) (sql_parse.cc:3065)
==3005== by 0x583112: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5811)
==3005==
==3005== Invalid read of size 8
==3005== at 0x8243E4: srv_master_thread (srv0srv.c:3492)
==3005== by 0x38E260677C: start_thread (pthread_create.c:301)
==3005== by 0x38E16D3C1C: clone (in /lib64/libc-2.5.so)
==3005== Address 0x15a488c0 is 64 bytes inside a block of size 152 free'd
==3005== at 0x4A072BA: free (vg_replace_malloc.c:446)
==3005== by 0x87446A: buf_page_get_gen (buf0buf.ic:886)
==3005== by 0x879DF3: buf_page_create (buf0buf.c:3760)
==3005== by 0x8BB89B: fsp_page_create (fsp0fsp.c:1547)
==3005== by 0x8C4411: fseg_alloc_free_page_low (fsp0fsp.c:2833)
==3005== by 0x8C4F6E: fseg_alloc_free_page_general (fsp0fsp.c:2907)
==3005== by 0x85B235: btr_store_big_rec_extern_fields (btr0cur.c:4487)
==3005== by 0x918B6C: row_ins_index_entry_low (row0ins.c:2234)
==3005== by 0x91CC52: row_ins_step (row0ins.c:2294)
==3005== by 0x8088F5: row_insert_for_mysql (row0mysql.c:1255)
==3005== by 0x7EFA89: ha_innobase::write_row(unsigned char*) (ha_innodb.cc:6006)
==3005== by 0x69B7E1: handler::ha_write_row(unsigned char*) (handler.cc:5359)
==3005== by 0x5673D0: write_record(THD*, TABLE*, st_copy_info*) (sql_insert.cc:1788)
==3005== by 0x56E463: mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool) (sql_insert.cc:956)
==3005== by 0x57EE0B: mysql_execute_command(THD*) (sql_parse.cc:3065)
==3005== by 0x583112: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5811)
==3005==
[.....]
==3005==
==3005== HEAP SUMMARY:
==3005== in use at exit: 1,560 bytes in 5 blocks
==3005== total heap usage: 4,481,035 allocs, 4,481,030 frees, 40,235,084,856 bytes allocated
==3005==
==3005== LEAK SUMMARY:
==3005== definitely lost: 0 bytes in 0 blocks
==3005== indirectly lost: 0 bytes in 0 blocks
==3005== possibly lost: 0 bytes in 0 blocks
==3005== still reachable: 0 bytes in 0 blocks
==3005== suppressed: 1,560 bytes in 5 blocks
==3005==
==3005== For counts of detected and suppressed errors, rerun with: -v
==3005== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4)

Preliminary Analysis thanks to Laurynas: Adaptive flushing algorithms do dirty reads from the flush list and then from the pages on that list. Because of that, nothing protects to read from a free'd or malloc'ed but uninitialized page descriptor if the same page happens to be read in by other thread.

Related branches

lp:~laurynas-biveinis/percona-server/bug1083058-5.1
Merged into lp:percona-server/5.1 at revision 523
Laurynas Biveinis: Approve on 2013-02-06
Sergei Glushchenko: Needs Information (g2) on 2013-02-06
Stewart Smith (community): Approve on 2013-02-06
lp:~laurynas-biveinis/percona-server/bug1083058-5.5
Merged into lp:percona-server/5.5 at revision 416
Stewart Smith (community): Approve on 2013-02-05
Laurynas Biveinis: Pending requested 2013-01-16

InnoDB master thread code in question with the dirty reads:
...
  } else if (srv_adaptive_flushing && srv_adaptive_flushing_method == 1) {
...
     for (j = 0; j < srv_buf_pool_instances; j++) {
      buf_pool_t* buf_pool;
      ulint n_blocks;

      buf_pool = buf_pool_from_array(j);

      /* The scanning flush_list is optimistic here */

      level = 0;
      n_blocks = 0;
      bpage = UT_LIST_GET_FIRST(buf_pool->flush_list);

      while (bpage != NULL) {
       ib_uint64_t oldest_modification = bpage->oldest_modification;
       if (oldest_modification != 0) {
        level += log_sys->max_checkpoint_age
          - (lsn - oldest_modification);
       }
       bpage = UT_LIST_GET_NEXT(flush_list, bpage);
       n_blocks++;
      }
...

Interestingly the same code in 5.1 takes the flush list mutex.

Roel Van de Paar (roel11) wrote :

Laurynas, do you think this is the result of a recent change?

No, it's been there for a while, most likely since initial Percona Server 5.5 versions.

Download full text (4.4 KiB)

Another occurrence with somewhat different stacks:

121125 7:05:01 [Note] /data/ssd/qa/Percona-Server-5.5.28-rel29.1-351.Linux.x86_64/bin/mysqld: ready for connections.
Version: '5.5.28-rel29.1-log' socket: '/data/ssd/qa/94/current1_1/tmp/master.sock' port: 13000 Percona Server with XtraDB (GPL), Release rel29.1, Revision 351
==11657== Thread 16:
==11657== Invalid read of size 8
==11657== at 0x87E0C6: buf_flush_batch (buf0flu.c:1770)
==11657== by 0x87E3FF: buf_flush_list (buf0flu.c:2047)
==11657== by 0x823CC1: srv_master_thread (srv0srv.c:3692)
==11657== by 0x38E260677C: start_thread (pthread_create.c:301)
==11657== by 0x38E16D3C1C: clone (in /lib64/libc-2.5.so)
==11657== Address 0x152e01a8 is 56 bytes inside a block of size 152 free'd
==11657== at 0x4A072BA: free (vg_replace_malloc.c:446)
==11657== by 0x87446A: buf_page_get_gen (buf0buf.ic:886)
==11657== by 0x85ABD1: btr_free_externally_stored_field (btr0cur.c:4981)
==11657== by 0x863723: btr_cur_pessimistic_delete (btr0cur.c:5078)
==11657== by 0x926276: row_undo_ins (row0uins.c:117)
==11657== by 0x91EC13: row_undo_step (row0undo.c:320)
==11657== by 0x913049: que_run_threads (que0que.c:1257)
==11657== by 0x839CB8: trx_general_rollback_for_mysql (trx0roll.c:97)
==11657== by 0x8077F0: row_mysql_handle_errors (row0mysql.c:593)
==11657== by 0x80892B: row_insert_for_mysql (row0mysql.c:1264)
==11657== by 0x7EFA89: ha_innobase::write_row(unsigned char*) (ha_innodb.cc:6006)
==11657== by 0x69B7E1: handler::ha_write_row(unsigned char*) (handler.cc:5359)
==11657== by 0x5673D0: write_record(THD*, TABLE*, st_copy_info*) (sql_insert.cc:1788)
==11657== by 0x56E463: mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool) (sql_insert.cc:956)
==11657== by 0x57EE0B: mysql_execute_command(THD*) (sql_parse.cc:3065)
==11657== by 0x583112: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5811)
==11657==
==11657== Invalid read of size 1
==11657== at 0x87D8FB: buf_flush_page_and_try_neighbors (buf0buf.ic:77)
==11657== by 0x87E008: buf_flush_batch (buf0flu.c:1738)
==11657== by 0x87E3FF: buf_flush_list (buf0flu.c:2047)
==11657== by 0x823CC1: srv_master_thread (srv0srv.c:3692)
==11657== by 0x38E260677C: start_thread (pthread_create.c:301)
==11657== by 0x38E16D3C1C: clone (in /lib64/libc-2.5.so)
==11657== Address 0x152e017b is 11 bytes inside a block of size 152 free'd
==11657== at 0x4A072BA: free (vg_replace_malloc.c:446)
==11657== by 0x87446A: buf_page_get_gen (buf0buf.ic:886)
==11657== by 0x85ABD1: btr_free_externally_stored_field (btr0cur.c:4981)
==11657== by 0x863723: btr_cur_pessimistic_delete (btr0cur.c:5078)
==11657== by 0x926276: row_undo_ins (row0uins.c:117)
==11657== by 0x91EC13: row_undo_step (row0undo.c:320)
==11657== by 0x913049: que_run_threads (que0que.c:1257)
==11657== by 0x839CB8: trx_general_rollback_for_mysql (trx0roll.c:97)
==11657== by 0x8077F0: row_mysql_handle_errors (row0mysql.c:593)
==11657== by 0x80892B: row_insert_for_mysql (row0mysql.c:1264)
==11657== by 0x7EFA89: ha_innobase::write_row(unsigned ch...

Read more...

summary: - Valgrind: free in buf_page_get_gen (srv_master_thread)
+ Valgrind: free in buf_page_get_gen
+ (srv_master_thread/buf_flush_batch/buf_flush_page_and_try_neighbors)

Roel -

The two other stacktraces are an unrelated issue, should be a separate bug.

Roel Van de Paar (roel11) wrote :

Laurynas:

Ok. Are both likely to be individual/completely separate bugs or are they likely related to each other (and/or to the original issue)?

Roel -

Currently it seems likely that they are related to each other, but not related to the original issue.

summary: - Valgrind: free in buf_page_get_gen
- (srv_master_thread/buf_flush_batch/buf_flush_page_and_try_neighbors)
+ Valgrind: free in buf_page_get_gen (Invalid read in srv_master_thread)
Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :

Bug 1086680 and bug 1085015 were made duplicates of this one

summary: - Valgrind: free in buf_page_get_gen (Invalid read in srv_master_thread)
+ Adaptive flushing does a dirty scan of the flush list | Valgrind: free
+ in buf_page_get_gen | Debug Assert: bpage->in_flush_list
Roel Van de Paar (roel11) wrote :

Bug has now been reproduced with --innodb_adaptive_flushing=0 also. Internal discussion.

Adaptive flushing method keep_average is affected too, on both 5.1 and 5.5.

summary: - Adaptive flushing does a dirty scan of the flush list | Valgrind: free
- in buf_page_get_gen | Debug Assert: bpage->in_flush_list
+ Adaptive flushing does a dirty scan of the flush list
tags: added: xtradb
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers