Description:
mysqladmin shutdown command hangs when it's called after flush error (ERROR 1290) on slave instance using super_read_only variable.
Instance does not accept more connections and the only way to stop it is killing mysql process.
Seems to be realted with upstream bugs #84332 (compression of mysql.gtid_executed has been prevented when super_read_only = 1) and #84437 (super-read-only does not allow FLUSH LOGS on 5.7).
How to repeat:
Variables for Master:
gtid_mode = ON
enforce_gtid_consistency = ON
Variables for Slave:
log_bin=mysql-bin
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON
super_read_only = ON
Requirements:
A) Master/slave configuration using GTIDs.
B) Slave instance is using super_read_only.
C) High amount of transactions been executed on master.
Procedure:
A) Execute high amount of transactions on master, I've used following sysbench command:
sysbench --test=oltp.lua --mysql-socket=/tmp/mysql_sandbox5716.sock --report-interval=1 --oltp-tables-count=100000 --oltp-table-size=1000 --max-time=1800 --oltp-read-only=off --max-requests=0 --num-threads=8 --rand-type=uniform --db-driver=mysql --mysql-user=msandbox --mysql-password=msandbox --mysql-db=test prepare
B) Table mysql.gtid_executed grows to a considerable amount of rows:
mysql> select count(*),@@super_read_only,@@read_only from mysql.gtid_executed;
+----------+-------------------+-------------+
| count(*) | @@super_read_only | @@read_only |
+----------+-------------------+-------------+
| 2684 | 1 | 1 |
+----------+-------------------+-------------+
C) Execute flush logs using mysqladmin or using mysql cli till get 1290 error:
juan.arruti@bm-support01:~$ /opt/percona_server/5.7.16/bin/mysqladmin -u msandbox -p flush-logs --socket=/tmp/mysql_sandbox5718.sock
Enter password:
mysqladmin: refresh failed; error: 'The MySQL server is running with the --super-read-only option so it cannot execute this statement'
D) Shut down MySQL process:
juan.arruti@bm-support01:~$ /opt/percona_server/5.7.16/bin/mysqladmin -u msandbox -p shutdown --socket=/tmp/mysql_sandbox5718.sock
Enter password:
^CWarning; Aborted waiting on pid file: '/home/juan.arruti/sandboxes/msb_100025_7_16/data/mysql_sandbox5718.pid' after 365 seconds
Here is output from pt-pmp, it seems that is waiting for gtid_table thread but we saw in upstream bug #84332 that this is not happening.
root@bm-support01:~# pt-pmp -p 47425
Sat Jan 21 13:14:04 EST 2017
92 pthread_join.c: No such file or directory.
10 __io_getevents_0_4,LinuxAIOHandler::collect(os0file.cc:2502),LinuxAIOHandler::poll(os0file.cc:2648),os_aio_linux_handler(os0file.cc:2704),os_aio_handler(os0file.cc:2704),fil_aio_wait(fil0fil.cc:5869),io_handler_thread(srv0start.cc:330),start_thread,clone
3 pthread_cond_wait,wait(os0event.h:156),os_event::wait_low(os0event.h:156),os_event_wait_low(os0event.cc:328),srv_worker_thread(srv0srv.cc:2879),start_thread,clone
1 row_upd(row0upd.cc:3067),row_upd_step(row0upd.cc:3158),row_update_for_mysql_using_upd_graph(row0mysql.cc:2524),row_update_for_mysql(row0mysql.cc:2722),ha_innobase::delete_row(ha_innodb.cc:8701),handler::ha_delete_row(handler.cc:8365),Gtid_table_persistor::compress_first_consecutive_range(rpl_gtid_persist.cc:639),Gtid_table_persistor::compress_in_single_transaction(rpl_gtid_persist.cc:581),Gtid_table_persistor::compress(rpl_gtid_persist.cc:544),Gtid_state::compress(rpl_gtid_state.cc:855),compress_gtid_table(rpl_gtid_persist.cc:861),pfs_spawn_thread(pfs.cc:2188),start_thread,clone
1 pthread_join,terminate_compress_gtid_table_thread(rpl_gtid_persist.cc:947),mysqld_main(mysqld.cc:5316),__libc_start_main,_start
1 pthread_cond_wait,wait(os0event.h:156),os_event::wait_low(os0event.h:156),os_event_wait_low(os0event.cc:328),srv_purge_coordinator_suspend(srv0srv.cc:3039),srv_purge_coordinator_thread(srv0srv.cc:3039),start_thread,clone
1 pthread_cond_wait,wait(os0event.h:156),os_event::wait_low(os0event.h:156),os_event_wait_low(os0event.cc:328),buf_resize_thread(buf0buf.cc:3043),start_thread,clone
1 pthread_cond_wait,wait(os0event.h:156),os_event::wait_low(os0event.h:156),os_event_wait_low(os0event.cc:328),buf_dump_thread(buf0dump.cc:781),start_thread,clone
1 pthread_cond_wait,native_cond_wait(thr_cond.h:140),my_cond_wait(thr_cond.h:140),inline_mysql_cond_wait(thr_cond.h:140),signal_hand(thr_cond.h:140),pfs_spawn_thread(pfs.cc:2188),start_thread,clone
1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),srv_monitor_thread(srv0srv.cc:1807),start_thread,clone
1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),srv_error_monitor_thread(srv0srv.cc:1997),start_thread,clone
1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),pc_sleep_if_needed(buf0flu.cc:2757),buf_flush_page_cleaner_coordinator(buf0flu.cc:2757),start_thread,clone
1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),lock_wait_timeout_thread(lock0wait.cc:501),start_thread,clone
1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),ib_wqueue_timedwait(ut0wqueue.cc:160),fts_optimize_thread(fts0opt.cc:3045),start_thread,clone
1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),dict_stats_thread(dict0stats_bg.cc:421),start_thread,clone
1 poll,vio_io_wait(viosocket.c:1157),vio_socket_io_wait(viosocket.c:116),vio_ssl_read(viossl.c:196),net_read_raw_loop(net_serv.cc:672),net_read_packet_header(net_serv.cc:756),net_read_packet(net_serv.cc:756),my_net_read(net_serv.cc:899),Protocol_classic::read_packet(protocol_classic.cc:808),Protocol_classic::get_command(protocol_classic.cc:965),do_command(sql_parse.cc:949),handle_connection(connection_handler_per_thread.cc:312),pfs_spawn_thread(pfs.cc:2188),start_thread,clone
1 nanosleep,os_thread_sleep(os0thread.cc:300),srv_master_sleep(srv0srv.cc:2667),srv_master_thread(srv0srv.cc:2667),start_thread,clone
1 nanosleep,os_thread_sleep(os0thread.cc:300),buf_lru_manager_sleep_if_needed(buf0flu.cc:3556),buf_lru_manager(buf0flu.cc:3556),start_thread,clone
1 do_sigwaitinfo,__GI___sigwaitinfo,timer_notify_thread_func(posix_timers.c:77),pfs_spawn_thread(pfs.cc:2188),start_thread,clone
Workaround seems to be disable super_read_only to let mysql compress gtid_table ant then shutdown the instance.
Suggested fix: Let MySQL compress gtid_table when super_read_only is enable.
Tested in PS 5.7.16 and MySQL Community 5.7.17