Comment 3 for bug 791030

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

This issue is not fixed yet. Configuring multiple adaptive hash index (AHI) partitions, mostly in combination with multiple buffer pools, still triggers a latch deadlock.

--Thread 139558755972864 has waited at btr0sea.c line 1197 for 930.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x9390f558 'btr_search_latch_part[i]'
a writer (thread id 139558755972864) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file btr0sea.c line 1099
Last time write locked in file /home/jenkins/workspace/percona-server-5.5-rpms/label_exp/centos6-64/target/BUILD/Percona-Server-5.5.27-rel28.1/Percona-Server-5.5.27-rel28.1/storage/innobase/btr/btr0sea.c line 669
InnoDB: Warning: a long semaphore wait:

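As you can see, thread 139558755972864 is waiting on itself: it is both the thread that has waited for the X-lock and the writer that has reserved the latch in wait-exclusive mode, while one reader is still registered on the latch.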

The call chain is as follows, innermost frame first (I will provide the full trace later):

    pthread_cond_wait
    os_cond_wait
    os_event_wait_low
    sync_array_wait_event
    rw_lock_x_lock_wait
    rw_lock_x_lock_low
    rw_lock_x_lock_func
    pfs_rw_lock_x_lock_func
    btr_search_drop_page_hash_index
    buf_LRU_free_block
    buf_LRU_free_from_common_LRU_list
    buf_LRU_search_and_free_block
    buf_LRU_get_free_block
    buf_block_alloc
    btr_search_check_free_space_in_heap
    btr_search_info_update_slow
    btr_search_info_update
    btr_cur_search_to_nth_level
    btr_pcur_open_with_no_init_func
    row_sel_try_search_shortcut_for_mysql
    row_search_for_mysql
    ha_innobase::index_read
    join_read_key
    sub_select
    evaluate_join_record
    sub_select
    evaluate_join_record
    sub_select
    evaluate_join_record
    sub_select
    evaluate_join_record
    sub_select
    evaluate_join_record
    sub_select
    evaluate_join_record
    sub_select
    do_select
    JOIN::exec
    mysql_select
    handle_select
    execute_sqlcom_select
    mysql_execute_command
    mysql_parse
    dispatch_command
    do_handle_one_connection
    handle_one_connection
    start_thread
    clone

The suspect is btr_search_drop_page_hash_index, since it handles the multiple-partition case differently, and the FIXME mentioned in the bug description is still present in the code:

=================================
    if (btr_search_index_num > 1) {
        rw_lock_t* btr_search_latch;

        /* FIXME: This may be optimistic implementation still. */
        btr_search_latch = (rw_lock_t*)(block->btr_search_latch);
        if (UNIV_LIKELY(!btr_search_latch)) {
            if (block->index) {
                goto retry;
            }
            return;
        }
        ......
        ..
=================================================

It has also been reported in lp:331659.