SHOW STATUS blocked by binlog commit | Server becomes unresponsive during flushing after loading big files through LOAD DATA INFILE

Bug #1646100 reported by Francisco Bordenave on 2016-11-30
This bug affects 2 people
Affects: Percona Server (moved to https://jira.percona.com/projects/PS); status tracked in 5.7

  Series  Status        Importance  Assigned to
  5.5     Invalid       Undecided   Unassigned
  5.6     Fix Released  Medium      Laurynas Biveinis
  5.7     Fix Released  Medium      Laurynas Biveinis

Bug Description

In Percona Server 5.7, when loading big files using LOAD DATA INFILE, the server becomes unresponsive for some time during the "query end" stage. During this flushing, new connections get stuck in the login state until LOAD DATA INFILE finishes; this can take several minutes depending on the file size.
This is the processlist output. We had changed permissions for several users to keep them from connecting during the process, so the Killed status may be connections being rejected:
performance_schema> show processlist;
+--------+-----------------+---------------------+---------------------------+-------------+------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+--------+-----------------+---------------------+---------------------------+-------------+------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+-----------+---------------+
| 763513 | sysconfig | 10.96.123.30:42632 | system_configurations | Sleep | 83 | | NULL | 1 | 0 |
| 770141 | engin12 | 10.96.122.234:41073 | conf | Killed | 180 | cleaning up | NULL | 0 | 0 |
| 776030 | testing | localhost | dsp_manager | Query | 103 | starting | show binary logs | 0 | 0 |
| 779746 | engin12 | 10.96.122.234:59783 | conf | Sleep | 5 | | NULL | 1 | 0 |
| 780089 | ppp | 10.96.123.38:39834 | ppp | Query | 484 | starting | commit | 0 | 0 |
| 785960 | ppp | 10.96.123.38:37732 | ppp | Query | 482 | starting | commit | 0 | 0 |
| 786387 | creat_123_serv | 10.96.122.227:41110 | creative_approval_service | Query | 482 | starting | commit | 0 | 0 |
| 786423 | testing | localhost | performance_schema | Query | 0 | starting | show processlist | 0 | 0 |
| 786519 | testing | localhost | ai | Query | 840 | query end | load data local infile '/mnt/backups/db/utd/archive/20161124/ai/rrr_price_floor_10.txt' into table r | 0 | 0 |
| 788242 | slave | 10.96.122.201:51043 | NULL | Binlog Dump | 1435 | Master has sent all binlog to slave; waiting for more updates | NULL | 0 | 0 |
| 788358 | testing_user123 | 10.96.122.223:34029 | inventory_restriction | Query | 485 | starting | commit | 0 | 0 |
| 790646 | dspmgr | 10.96.120.78:44732 | dsp_manager | Killed | 449 | cleaning up | NULL | 0 | 0 |
| 790796 | fastl123 | 10.96.122.153:56137 | fastl123_installer | Killed | 446 | cleaning up | NULL | 0 | 0 |
| 790797 | fastl123 | 10.96.122.153:56138 | fastl123_installer | Killed | 446 | cleaning up | NULL | 0 | 0 |
| 790798 | fastl123 | 10.96.122.153:56139 | fastl123_installer | Killed | 446 | cleaning up | NULL | 0 | 0 |
| 790799 | fastl123 | 10.96.122.153:56141 | fastl123_installer | Killed | 446 | cleaning up | NULL | 0 | 0 |
| 790801 | fastl123 | 10.96.122.152:52671 | fastl123_installer | Query | 496 | starting | commit | 0 | 0 |
| 790803 | fastl123 | 10.96.122.152:52673 | fastl123_installer | Query | 495 | starting | commit | 0 | 0 |
| 790901 | testingMatcher | 10.83.125.21:53663 | testingMatcher | Query | 491 | starting | commit | 0 | 0 |
| 790910 | testingMatcher | 10.83.125.21:53664 | testingMatcher | Killed | 766 | cleaning up | NULL | 1 | 0 |
| 790911 | testingMatcher | 10.83.125.21:53665 | testingMatcher | Killed | 766 | cleaning up | NULL | 1 | 0 |
| 790912 | testingMatcher | 10.83.125.21:53666 | testingMatcher | Killed | 766 | cleaning up | NULL | 1 | 0 |
| 790913 | testingMatcher | 10.83.125.21:53667 | testingMatcher | Killed | 766 | cleaning up | NULL | 1 | 0 |
| 790914 | testingMatcher | 10.83.125.21:53668 | testingMatcher | Killed | 766 | cleaning up | NULL | 1 | 0 |
| 790915 | testingMatcher | 10.83.125.21:53669 | testingMatcher | Killed | 766 | cleaning up | NULL | 1 | 0 |
| 790916 | testingMatcher | 10.83.125.21:53670 | testingMatcher | Killed | 766 | cleaning up | NULL | 1 | 0 |
| 790917 | testingMatcher | 10.83.125.21:53671 | testingMatcher | Killed | 766 | cleaning up | NULL | 1 | 0 |
| 790918 | testingMatcher | 10.83.125.21:53672 | testingMatcher | Killed | 766 | cleaning up | NULL | 1 | 0 |
| 791191 | engin12 | 10.96.122.141:58556 | arw | Killed | 316 | cleaning up | NULL | 0 | 0 |
| 791210 | dspmgr | 10.96.122.199:46862 | dsp_manager | Killed | 312 | cleaning up | NULL | 0 | 0 |
| 791234 | fastl123 | 10.96.122.14:37170 | fastl123_stats | Killed | 303 | cleaning up | NULL | 0 | 0 |
| 791422 | ppp | 10.96.123.38:34256 | ppp | Killed | 364 | cleaning up | NULL | 0 | 0 |
| 791423 | ppp | 10.96.123.38:34258 | ppp | Killed | 364 | cleaning up | NULL | 0 | 0 |
| 791424 | ppp | 10.96.123.38:34257 | ppp | Killed | 364 | cleaning up | NULL | 0 | 0 |
| 791433 | dspmgr | 10.96.120.78:47808 | dsp_manager | Killed | 257 | cleaning up | NULL | 0 | 0 |
| 792019 | fastl123 | 10.96.122.152:57568 | fastl123_installer | Killed | 415 | cleaning up | NULL | 0 | 0 |
| 792020 | fastl123 | 10.96.122.152:57569 | fastl123_installer | Killed | 414 | cleaning up | NULL | 0 | 0 |
| 792021 | fastl123 | 10.96.122.152:57570 | fastl123_installer | Killed | 454 | cleaning up | NULL | 0 | 0 |
| 792022 | fastl123 | 10.96.122.152:57571 | fastl123_installer | Killed | 424 | cleaning up | NULL | 0 | 0 |
| 792023 | fastl123 | 10.96.122.152:57572 | fastl123_installer | Killed | 440 | cleaning up | NULL | 0 | 0 |
| 792054 | ai | 10.96.122.229:46898 | ai | Query | 479 | starting | commit | 0 | 0 |
| 792066 | testing_user123 | 10.96.122.223:49000 | inventory_restriction | Killed | 429 | cleaning up | NULL | 0 | 0 |
| 792068 | fastl123 | 10.96.122.152:57741 | fastl123_installer | Killed | 444 | cleaning up | NULL | 0 | 0 |
| 792073 | readonly | 10.83.40.170:54683 | NULL | Query | 473 | Sending data | SHOW GLOBAL STATUS | 0 | 0 |
| 792074 | testingMatcher | 10.83.125.21:53866 | testingMatcher | Killed | 469 | cleaning up | NULL | 0 | 0 |
| 792075 | testingMatcher | 10.83.125.21:53867 | testingMatcher | Killed | 473 | cleaning up | NULL | 1 | 0 |
| 792076 | testingMatcher | 10.83.125.21:53868 | testingMatcher | Killed | 473 | cleaning up | NULL | 1 | 0 |
| 792077 | testingMatcher | 10.83.125.21:53869 | testingMatcher | Killed | 473 | cleaning up | NULL | 1 | 0 |
| 792078 | testingMatcher | 10.83.125.21:53870 | testingMatcher | Killed | 473 | cleaning up | NULL | 1 | 0 |
| 792079 | testing | localhost | dsp_manager | Killed | 473 | cleaning up | NULL | 0 | 0 |
| 792080 | testingMatcher | 10.83.125.21:53871 | testingMatcher | Killed | 472 | cleaning up | NULL | 1 | 0 |
| 792081 | testingMatcher | 10.83.125.21:53872 | testingMatcher | Killed | 472 | cleaning up | NULL | 1 | 0 |
| 792082 | testingMatcher | 10.83.125.21:53873 | testingMatcher | Killed | 472 | cleaning up | NULL | 1 | 0 |
| 792083 | testingMatcher | 10.83.125.21:53874 | testingMatcher | Killed | 472 | cleaning up | NULL | 1 | 0 |
| 792084 | testingMatcher | 10.83.125.21:53875 | testingMatcher | Killed | 472 | cleaning up | NULL | 1 | 0 |
| 792085 | testing | localhost | dsp_manager | Killed | 471 | cleaning up | NULL | 0 | 0 |
| 792086 | testing | localhost | dsp_manager | Killed | 470 | cleaning up | NULL | 0 | 0 |
| 792087 | creat_123_serv | 10.83.124.54:44097 | NULL | Killed | 470 | login | NULL | 0 | 0 |
| 792088 | creat_123_serv | 10.83.124.54:44096 | NULL | Killed | 470 | login | NULL | 0 | 0 |
| 792089 | creat_123_serv | 10.83.124.54:44098 | NULL | Killed | 470 | login | NULL | 0 | 0 |
| 792090 | testing | localhost | dsp_manager | Killed | 469 | cleaning up | NULL | 0 | 0 |
| 792091 | creat_123_serv | 10.83.124.54:44099 | NULL | Killed | 469 | login | NULL | 0 | 0 |
| 792092 | creat_123_serv | 10.83.124.54:44100 | NULL | Killed | 469 | login | NULL | 0 | 0 |
| 792093 | creat_123_serv | 10.83.124.54:44101 | NULL | Killed | 469 | login | NULL | 0 | 0 |
| 792094 | testing | localhost | dsp_manager | Killed | 468 | cleaning up | NULL | 0 | 0 |
| 792095 | creat_123_serv | 10.83.124.53:52018 | NULL | Killed | 468 | login | NULL | 0 | 0 |
| 792096 | creat_123_serv | 10.83.124.53:52019 | NULL | Killed | 468 | login | NULL | 0 | 0 |
| 792097 | creat_123_serv | 10.83.124.53:52020 | NULL | Killed | 468 | login | NULL | 0 | 0 |
| 792098 | creat_123_serv | 10.83.124.54:44102 | NULL | Killed | 468 | login | NULL | 0 | 0 |
| 792099 | creat_123_serv | 10.83.124.54:44103 | NULL | Killed | 468 | login | NULL | 0 | 0 |
| 792100 | creat_123_serv | 10.83.124.54:44104 | NULL | Killed | 467 | login | NULL | 0 | 0 |
| 792101 | testing | localhost | dsp_manager | Killed | 467 | cleaning up | NULL | 0 | 0 |
| 792102 | creat_123_serv | 10.83.124.53:52021 | NULL | Killed | 467 | login | NULL | 0 | 0 |
| 792103 | creat_123_serv | 10.83.124.53:52022 | NULL | Killed | 467 | login | NULL | 0 | 0 |
| 792104 | creat_123_serv | 10.83.124.53:52024 | NULL | Killed | 467 | login | NULL | 0 | 0 |
| 792105 | creat_123_serv | 10.83.124.54:44106 | NULL | Killed | 466 | login | NULL | 0 | 0 |
| 792106 | creat_123_serv | 10.83.124.54:44105 | NULL | Killed | 466 | login | NULL | 0 | 0 |
| 792107 | creat_123_serv | 10.83.124.54:44108 | NULL | Killed | 466 | login | NULL | 0 | 0 |
| 792108 | testing | localhost | dsp_manager | Killed | 466 | cleaning up | NULL | 0 | 0 |
| 792109 | creat_123_serv | 10.83.124.53:52025 | NULL | Killed | 466 | login | NULL | 0 | 0 |
| 792110 | creat_123_serv | 10.83.124.53:52026 | NULL | Killed | 466 | login | NULL | 0 | 0 |
| 792111 | creat_123_serv | 10.83.124.53:52028 | NULL | Killed | 466 | login | NULL | 0 | 0 |
| 792112 | creat_123_serv | 10.83.124.54:44109 | NULL | Killed | 465 | login | NULL | 0 | 0 |
| 792113 | creat_123_serv | 10.83.124.54:44110 | NULL | Killed | 465 | login | NULL | 0 | 0 |
| 792114 | creat_123_serv | 10.83.124.54:44111 | NULL | Killed | 465 | login | NULL | 0 | 0 |
| 792115 | creat_123_serv | 10.83.124.54:44112 | NULL | Killed | 465 | login | NULL | 0 | 0 |
| 792116 | testing | localhost | dsp_manager | Killed | 465 | cleaning up | NULL | 0 | 0 |
| 792117 | creat_123_serv | 10.83.124.53:52029 | NULL | Killed | 465 | login | NULL | 0 | 0 |
| 792118 | creat_123_serv | 10.83.124.53:52030 | NULL | Killed | 465 | login | NULL | 0 | 0 |
| 792119 | creat_123_serv | 10.83.124.53:52031 | NULL | Killed | 465 | login | NULL | 0 | 0 |
| 792120 | fastl123 | 10.96.122.152:57910 | fastl123_installer | Killed | 434 | cleaning up | NULL | 0 | 0 |
| 792121 | creat_123_serv | 10.83.124.54:44113 | NULL | Killed | 464 | login | NULL | 0 | 0 |
| 792122 | testing | localhost | dsp_manager | Killed | 464 | cleaning up | NULL | 0 | 0 |
| 792123 | creat_123_serv | 10.83.124.53:52032 | NULL | Killed | 464 | login | NULL | 0 | 0 |
| 792124 | creat_123_serv | 10.83.124.53:52033 | NULL | Killed | 464 | login | NULL | 0 | 0 |
| 792125 | creat_123_serv | 10.83.124.53:52034 | NULL | Killed | 464 | login | NULL | 0 | 0 |
| 792126 | creat_123_serv | 10.83.124.53:52035 | NULL | Killed | 464 | login | NULL | 0 | 0 |
| 792127 | creat_123_serv | 10.83.124.54:44114 | NULL | Killed | 463 | login | NULL | 0 | 0 |
| 792128 | testing | localhost | dsp_manager | Killed | 463 | cleaning up | NULL | 0 | 0 |
| 792129 | creat_123_serv | 10.83.124.53:52036 | NULL | Killed | 463 | login | NULL | 0 | 0 |
| 792130 | creat_123_serv | 10.83.124.54:44117 | NULL | Killed | 462 | login | NULL | 0 | 0 |
+--------+-----------------+---------------------+---------------------------+-------------+------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+-----------+---------------+
101 rows in set (0.01 sec)

pt-pmp suggests contention on a mutex taken in THD::release_resources, in the thread handling code used in Percona Server:
[<email address hidden> pmp]# perl pt-pmp -p 23791 -i 3 -s 1
Mon Nov 28 11:09:39 PST 2016

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff993fd000
    192 __lll_lock_wait(libpthread.so.0),_L_lock_1233(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),THD::release_resources,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
     84 poll(libc.so.6),vio_io_wait,vio_socket_io_wait,vio_read,net_read_raw_loop,net_read_packet,my_net_read,Protocol_classic::read_packet,Protocol_classic::get_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
     30 libaio::??(libaio.so.1),LinuxAIOHandler::collect,LinuxAIOHandler::poll,os_aio_handler,fil_aio_wait,io_handler_thread,start_thread(libpthread.so.0),clone(libc.so.6)
     24 nanosleep(libpthread.so.0),os_thread_sleep,buf_lru_manager,start_thread(libpthread.so.0),clone(libc.so.6)
     15 pthread_cond_wait,Stage_manager::enroll_for,MYSQL_BIN_LOG::change_stage,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      9 pthread_cond_wait,os_event::wait_low,srv_worker_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_wait,os_event::wait_low,srv_purge_coordinator_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_wait,os_event::wait_low,buf_resize_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_wait,os_event::wait_low,buf_dump_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_wait,compress_gtid_table,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,srv_monitor_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,srv_error_monitor_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,lock_wait_timeout_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,ib_wqueue_timedwait,fts_optimize_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,dict_stats_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_timedwait,MYSQL_BIN_LOG::wait_for_update_bin_log,Binlog_sender::send_binlog,Binlog_sender::run,mysql_binlog_send,com_binlog_dump,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 poll(libc.so.6),Mysqld_socket_listener::listen_for_connection_event,mysqld_main,__libc_start_main(libc.so.6),_start
      3 nanosleep(libpthread.so.0),os_thread_sleep,srv_master_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 __lll_lock_wait(libpthread.so.0),_L_lock_1034(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),MYSQL_BIN_LOG::set_status_variables,show_binlog_vars,PFS_status_variable_cache::manifest,PFS_status_variable_cache::do_materialize_global,table_global_status::rnd_init,ha_perfschema::rnd_init,handler::ha_rnd_init,init_read_record,join_init_read_record,sub_select,JOIN::exec,TABLE_LIST::materialize_derived,join_materialize_derived,QEP_TAB::prepare_scan,sub_select,JOIN::exec,handle_query,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 __lll_lock_wait(libpthread.so.0),_L_lock_1034(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),MYSQL_BIN_LOG::change_stage,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 do_sigwait(libpthread.so.0),sigwait(libpthread.so.0),signal_hand,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 do_sigwaitinfo(libc.so.6),sigwaitinfo(libc.so.6),timer_notify_thread_func,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      2 read(libpthread.so.0),my_read,my_b_fill,MYSQL_BIN_LOG::do_write_cache,MYSQL_BIN_LOG::write_cache,binlog_cache_data::flush,MYSQL_BIN_LOG::flush_thread_caches,MYSQL_BIN_LOG::process_flush_stage_queue,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit_stmt,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 write(libpthread.so.0),my_write,_my_b_write,MYSQL_BIN_LOG::do_write_cache,MYSQL_BIN_LOG::write_cache,binlog_cache_data::flush,MYSQL_BIN_LOG::flush_thread_caches,MYSQL_BIN_LOG::process_flush_stage_queue,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit_stmt,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 ut_time_ms,buf_flush_page_cleaner_coordinator,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_mutex_unlock(libpthread.so.0),os_event_reset,buf_flush_page_cleaner_coordinator,start_thread(libpthread.so.0),clone(libc.so.6)
      1 ::??,::??,buf_flush_page_cleaner_coordinator,start_thread(libpthread.so.0),clone(libc.so.6)

THD::release_resources is defined in sql/sql_class.cc as follows (PS 5.7.14-7):
void THD::release_resources()
{
  DBUG_ASSERT(m_release_resources_done == false);

  Global_THD_manager::get_instance()->release_thread_id(m_thread_id);

  mysql_mutex_lock(&LOCK_status);
  add_to_status(&global_status_var, &status_var, false);
  /*
    Status queries after this point should not aggregate THD::status_var
    since the values has been added to global_status_var.
    The status values are not reset so that they can still be read
    by performance schema.
  */
  status_var_aggregated= true;
  mysql_mutex_unlock(&LOCK_status);

  /* Ensure that no one is using THD */
  mysql_mutex_lock(&LOCK_thd_data);
  mysql_mutex_lock(&LOCK_query_plan);

  /* Close connection */
#ifndef EMBEDDED_LIBRARY
  if (is_classic_protocol() && get_protocol_classic()->get_vio())
  {
    vio_delete(get_protocol_classic()->get_vio());
    get_protocol_classic()->end_net();
  }
#endif

  /* modification plan for UPDATE/DELETE should be freed. */
  DBUG_ASSERT(query_plan.get_modification_plan() == NULL);
  mysql_mutex_unlock(&LOCK_query_plan);
  mysql_mutex_unlock(&LOCK_thd_data);
  mysql_mutex_lock(&LOCK_thd_query);
  mysql_mutex_unlock(&LOCK_thd_query);

  stmt_map.reset(); /* close all prepared statements */
  if (!cleanup_done)
    cleanup();

  mdl_context.destroy();
  ha_close_connection(this);
  /*
    Debug sync system must be closed after ha_close_connection, because
    DEBUG_SYNC is used in InnoDB connection handlerton close.
  */
#if defined(ENABLED_DEBUG_SYNC)
  /* End the Debug Sync Facility. See debug_sync.cc. */
  debug_sync_end_thread(this);
#endif /* defined(ENABLED_DEBUG_SYNC) */

  plugin_thdvar_cleanup(this, m_enable_plugins);

  DBUG_ASSERT(timer == NULL);

  if (timer_cache)
    thd_timer_destroy(timer_cache);

#ifndef EMBEDDED_LIBRARY
  if (rli_fake)
  {
    rli_fake->end_info();
    delete rli_fake;
    rli_fake= NULL;
  }
  mysql_audit_free_thd(this);
#endif

  if (current_thd == this)
    restore_globals();
  m_release_resources_done= true;
}

It seems these three mutexes might be related (perhaps the Percona Server slow query log patch?):
mysql_mutex_lock(&LOCK_thd_query);
mysql_mutex_lock(&LOCK_thd_data);
mysql_mutex_lock(&LOCK_query_plan);

We tested the community version (5.7.16) with exactly the same dataset and load, and the problem is not present; it was also not present in the 5.6 series.

How to repeat:
Fire up a VM with 2 cores, 2G of RAM, good IO capacity (maybe around 10k IOPS), and 1G of innodb_buffer_pool_size, then load a file bigger than the buffer pool. We reproduced the error with files of 2G and 5G; sync_binlog=0 and innodb_flush_log_at_trx_commit=2 seem unrelated after a few tests with different configurations.
On a side note, the loaded table uses unique keys and text fields, which can increase the flushing during the commit stage.

description: updated

Can you provide the raw unaggregated PMP samples?

tags: added: performance
removed: percona-server-5.7

Sure, file attached.

THD::release_resources takes several mutexes; without line number info in the PMP it is hard to say which one, and nothing in the rest of the threads obviously comes up as holding something THD::release_resources needs. We would need PFS mutex contention data or a reproduction procedure.

How to repeat:
Fire up a VM with 2 cores, 2G of RAM, good IO capacity (maybe around 10k IOPS), and 1G of innodb_buffer_pool_size, then load a file bigger than the buffer pool:
- Table definition:
CREATE TABLE `test` (
  `process_date` datetime NOT NULL,
  `account_id` int(11) NOT NULL,
  `rtb_rule_id` int(11) NOT NULL,
  `revenue_gain` decimal(15,4) NOT NULL DEFAULT '0.0000',
  `recommended_floor` decimal(10,4) NOT NULL DEFAULT '0.0000',
  `revenue_gain_currency_id` char(3) NOT NULL DEFAULT 'USD',
  `recommended_floor_currency_id` char(3) NOT NULL DEFAULT 'USD',
  `bid_distribution` longtext,
  `price_floor_distribution` longtext,
  `fill_rate_distribution` longtext,
  `created` datetime NOT NULL,
  `updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  KEY `account_rtb_rule` (`account_id`,`rtb_rule_id`),
  KEY `create_date` (`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

- Command:
load data local infile '/tmp/test.txt' into table test;

I could provide sample rows in private; I can't publish them.

Some more related info:
OS is CentOS 5.8, and PS was installed via a custom RPM we created, since Percona does not provide RPMs for the CentOS/RHEL 5 series.

This is caused by the binary log Binlog_snapshot_file and Binlog_snapshot_position status variables: reading them tries to take LOCK_log with LOCK_status already locked. LOCK_log is held by the committing thread, which has a lot of work to binlog the file load, and LOCK_status is what all connection threads try to take in THD::release_resources.

tags: added: consistent-binlog-snapshot
Yves Trudeau (y-trudeau) wrote :

Here's a similar case: a deadlock between multiple threads around LOCK_log, LOCK_index, LOCK_status and LOCK_thd_data. The pt-pmp output and the raw trace are attached.

Yves, my PMP analysis is missing some piece of the puzzle that you have found above.

What I see is

    Global_THD_manager::add_thd threads try to lock LOCK_thd_list (but
    don't appear to hold anything relevant)
    THD::release_resources threads try to lock LOCK_status (but don't
    appear to hold anything relevant)

    Thread 2266: tries to lock thd->LOCK_thd_data
       has LOCK_thd_list and LOCK_index
    Thread 1984: tries to lock LOCK_log
       has LOCK_status
    Thread 357: tries to lock LOCK_index
       has LOCK_log

    dep graph: 1984 -> 357 -> 2266

I'm missing the thread(s) that would complete the deadlock loop here: one that tries to lock LOCK_status and holds LOCK_thd_data.

At the same time, I see the following in the PMP aggregation

      1 __lll_lock_wait(libpthread.so.0),_L_lock_791(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),native_mutex_lock(thr_mutex.h:84),my_mutex_lock(thr_mutex.h:84),inline_mysql_mutex_lock(thr_mutex.h:84),MYSQL_BIN_LOG::change_stage(thr_mutex.h:84),MYSQL_BIN_LOG::ordered_commit(binlog.cc:9150),MYSQL_BIN_LOG::commit(binlog.cc:8426),ha_commit_trans(handler.cc:1818),trans_commit_stmt(transaction.cc:458),mysql_execute_command(sql_parse.cc:5241),Prepared_statement::execute(sql_prepare.cc:4281),Prepared_statement::execute_loop(sql_prepare.cc:3889),mysqld_stmt_execute(sql_prepare.cc:2679),dispatch_command(sql_parse.cc:1381),do_command(sql_parse.cc:1010),handle_connection(connection_handler_per_thread.cc:312),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)

Which I fail to find in the raw PMP.

Can you help?

Yves Trudeau (y-trudeau) wrote :

I checked this morning and the customer trashed the EC2 instance. I had run the tool a few times and assumed the instance was completely frozen. Apparently it wasn't. Here's the pt-pmp of the raw output. The problem is in there...

Yves, are you saying that there is no confirmed deadlock, only a stall, as in the analysis up to comment #6? Do you have raw stack traces for the PMP in comment #9?

Discussed with Yves on Slack. Raw PMP in comment #7 corresponds to processed PMP in comment #9.

Upon further analysis I found the missing bit of comment #8: thread 1984 also holds one of the thd->LOCK_thd_data mutexes, and so the picture is

    Thread 2266: tries to lock thd->LOCK_thd_data
       has LOCK_index
    Thread 1984: tries to lock LOCK_log
       has thd->LOCK_thd_data
    Thread 357: tries to lock LOCK_index
       has LOCK_log

1984 -> 357 -> 2266 -> 1984. Deadlock. Will move this to a separate bug, as the symptoms are different from a transient COMMIT stall with concurrent SHOW STATUS, and will fix both together.

tags: added: regression

Will fix the underlying cause (SHOW STATUS locking LOCK_status, then LOCK_log) for 5.6 too.

summary: - Server becomes unresponsive during flushing after loading big files
- through LOAD DATA INFILE
+ SHOW STATUS blocked by binlog commit | Server becomes unresponsive
+ during flushing after loading big files through LOAD DATA INFILE

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1765
