mysqladmin shutdown hangs after flush-logs error

Bug #1658354 reported by Juan Pablo Arruti
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.5
Invalid
Undecided
Unassigned
5.6
New
Undecided
Unassigned
5.7
Triaged
High
Robert Golebiowski

Bug Description

Description:

mysqladmin shutdown command hangs when it's called after flush error (ERROR 1290) on slave instance using super_read_only variable.
Instance does not accept more connections and the only way to stop it is killing mysql process.

Seems to be realted with upstream bugs #84332 (compression of mysql.gtid_executed has been prevented when super_read_only = 1) and #84437 (super-read-only does not allow FLUSH LOGS on 5.7).

How to repeat:

Variables for Master:

gtid_mode = ON
enforce_gtid_consistency = ON

Variables for Slave:

log_bin=mysql-bin
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON
super_read_only = ON

Requirements:

A) Master/slave configuration using GTIDs.
B) Slave instance is using super_read_only.
C) High amount of transactions been executed on master.

Procedure:

A) Execute high amount of transactions on master, I've used following sysbench command:

sysbench --test=oltp.lua --mysql-socket=/tmp/mysql_sandbox5716.sock --report-interval=1 --oltp-tables-count=100000 --oltp-table-size=1000 --max-time=1800 --oltp-read-only=off --max-requests=0 --num-threads=8 --rand-type=uniform --db-driver=mysql --mysql-user=msandbox --mysql-password=msandbox --mysql-db=test prepare

B) Table mysql.gtid_executed grows to a considerable amount of rows:

mysql> select count(*),@@super_read_only,@@read_only from mysql.gtid_executed;
+----------+-------------------+-------------+
| count(*) | @@super_read_only | @@read_only |
+----------+-------------------+-------------+
| 2684 | 1 | 1 |
+----------+-------------------+-------------+

C) Execute flush logs using mysqladmin or using mysql cli till get 1290 error:

juan.arruti@bm-support01:~$ /opt/percona_server/5.7.16/bin/mysqladmin -u msandbox -p flush-logs --socket=/tmp/mysql_sandbox5718.sock
Enter password:
mysqladmin: refresh failed; error: 'The MySQL server is running with the --super-read-only option so it cannot execute this statement'

D) Shut down MySQL process:

juan.arruti@bm-support01:~$ /opt/percona_server/5.7.16/bin/mysqladmin -u msandbox -p shutdown --socket=/tmp/mysql_sandbox5718.sock
Enter password:

^CWarning; Aborted waiting on pid file: '/home/juan.arruti/sandboxes/msb_100025_7_16/data/mysql_sandbox5718.pid' after 365 seconds

Here is output from pt-pmp, it seems that is waiting for gtid_table thread but we saw in upstream bug #84332 that this is not happening.

root@bm-support01:~# pt-pmp -p 47425
Sat Jan 21 13:14:04 EST 2017
92 pthread_join.c: No such file or directory.
     10 __io_getevents_0_4,LinuxAIOHandler::collect(os0file.cc:2502),LinuxAIOHandler::poll(os0file.cc:2648),os_aio_linux_handler(os0file.cc:2704),os_aio_handler(os0file.cc:2704),fil_aio_wait(fil0fil.cc:5869),io_handler_thread(srv0start.cc:330),start_thread,clone
      3 pthread_cond_wait,wait(os0event.h:156),os_event::wait_low(os0event.h:156),os_event_wait_low(os0event.cc:328),srv_worker_thread(srv0srv.cc:2879),start_thread,clone
      1 row_upd(row0upd.cc:3067),row_upd_step(row0upd.cc:3158),row_update_for_mysql_using_upd_graph(row0mysql.cc:2524),row_update_for_mysql(row0mysql.cc:2722),ha_innobase::delete_row(ha_innodb.cc:8701),handler::ha_delete_row(handler.cc:8365),Gtid_table_persistor::compress_first_consecutive_range(rpl_gtid_persist.cc:639),Gtid_table_persistor::compress_in_single_transaction(rpl_gtid_persist.cc:581),Gtid_table_persistor::compress(rpl_gtid_persist.cc:544),Gtid_state::compress(rpl_gtid_state.cc:855),compress_gtid_table(rpl_gtid_persist.cc:861),pfs_spawn_thread(pfs.cc:2188),start_thread,clone
      1 pthread_join,terminate_compress_gtid_table_thread(rpl_gtid_persist.cc:947),mysqld_main(mysqld.cc:5316),__libc_start_main,_start
      1 pthread_cond_wait,wait(os0event.h:156),os_event::wait_low(os0event.h:156),os_event_wait_low(os0event.cc:328),srv_purge_coordinator_suspend(srv0srv.cc:3039),srv_purge_coordinator_thread(srv0srv.cc:3039),start_thread,clone
      1 pthread_cond_wait,wait(os0event.h:156),os_event::wait_low(os0event.h:156),os_event_wait_low(os0event.cc:328),buf_resize_thread(buf0buf.cc:3043),start_thread,clone
      1 pthread_cond_wait,wait(os0event.h:156),os_event::wait_low(os0event.h:156),os_event_wait_low(os0event.cc:328),buf_dump_thread(buf0dump.cc:781),start_thread,clone
      1 pthread_cond_wait,native_cond_wait(thr_cond.h:140),my_cond_wait(thr_cond.h:140),inline_mysql_cond_wait(thr_cond.h:140),signal_hand(thr_cond.h:140),pfs_spawn_thread(pfs.cc:2188),start_thread,clone
      1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),srv_monitor_thread(srv0srv.cc:1807),start_thread,clone
      1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),srv_error_monitor_thread(srv0srv.cc:1997),start_thread,clone
      1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),pc_sleep_if_needed(buf0flu.cc:2757),buf_flush_page_cleaner_coordinator(buf0flu.cc:2757),start_thread,clone
      1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),lock_wait_timeout_thread(lock0wait.cc:501),start_thread,clone
      1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),ib_wqueue_timedwait(ut0wqueue.cc:160),fts_optimize_thread(fts0opt.cc:3045),start_thread,clone
      1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),dict_stats_thread(dict0stats_bg.cc:421),start_thread,clone
      1 poll,vio_io_wait(viosocket.c:1157),vio_socket_io_wait(viosocket.c:116),vio_ssl_read(viossl.c:196),net_read_raw_loop(net_serv.cc:672),net_read_packet_header(net_serv.cc:756),net_read_packet(net_serv.cc:756),my_net_read(net_serv.cc:899),Protocol_classic::read_packet(protocol_classic.cc:808),Protocol_classic::get_command(protocol_classic.cc:965),do_command(sql_parse.cc:949),handle_connection(connection_handler_per_thread.cc:312),pfs_spawn_thread(pfs.cc:2188),start_thread,clone
      1 nanosleep,os_thread_sleep(os0thread.cc:300),srv_master_sleep(srv0srv.cc:2667),srv_master_thread(srv0srv.cc:2667),start_thread,clone
      1 nanosleep,os_thread_sleep(os0thread.cc:300),buf_lru_manager_sleep_if_needed(buf0flu.cc:3556),buf_lru_manager(buf0flu.cc:3556),start_thread,clone
      1 do_sigwaitinfo,__GI___sigwaitinfo,timer_notify_thread_func(posix_timers.c:77),pfs_spawn_thread(pfs.cc:2188),start_thread,clone

Workaround seems to be disable super_read_only to let mysql compress gtid_table ant then shutdown the instance.

Suggested fix: Let MySQL compress gtid_table when super_read_only is enable.

Revision history for this message
Juan Pablo Arruti (juanarru) wrote :

Tested in PS 5.7.16 and MySQL Community 5.7.17

tags: added: upstream
tags: added: 162685
Revision history for this message
Robert Golebiowski (robert.golebiowski) wrote :

This should be fixed by commit:

commit b24fedd4a043088f3d9117bea3011788d4f6531e
Author: Sujatha Sivakumar <email address hidden>
Date: Tue Nov 15 12:01:05 2016 +0530

    Bug#22857926: ASSERTION `! IS_SET()' AT SQL_ERROR.CC:38 IN
    READ_ONLY MODE FOR MANY RPL CMDS.

    Problem:
    =======
    This bug is opened to fix assertion `! is_set()' at
    sql_error.cc:380 in "void Diagnostics_area::set_ok_status
    (ulonglong, ulonglong, const char*)" for many RPL commands.
    This issue has been fixed for couple of commands with
    Bug#22097534. But, still issue exists for few other
    commands.

    RPL commands affected:-
    =======================
    [PENDING] 1. FLUSH LOGS;
    [PENDING] 2. RESET SLAVE ALL;
    [PENDING] 3. SET GLOBAL gtid_purged='aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1';
    [FIXED] 4. CHANGE MASTER TO..
    [FIXED] 5. CHANGE REPLICATION FILTER
    [FIXED] 6. START SLAVE;
    [FIXED] 7. STOP SLAVE;

    Analysis:
    ========
    When the above set of commands are executed they will try
    to update replication specific system tables. For example
    mysql.gtid_executed
    mysql.salve_master_info
    mysql.relay_log_info

    This scenario is applicable to all replication specific
    internal system tables. During the fix for Bug22097534 it
    was concluded that "read only" mode should prevent changes
    to user data, but not to prevent normal Server operations or
    to prevent a DBA from managing the state of the MySQL Server
    itself.

    Hence the above commands should be able to update data in
    replication specific internal tables.

    Fix:
    ===
    There exists a flag named 'ignore_global_read_lock' whose
    intention is to ignore the read only option and allow
    commits to replication specific system tables. Reused
    this flag as part of fix. If this flag is set then the
    additional check for 'read only' option is not done.
    Hence 'read only' mode error will not be reported and
    command will complete successfully.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Should be re-verified and closed?

Revision history for this message
Juan Pablo Arruti (juanarru) wrote :

Upstream bugs #84332 (compression of mysql.gtid_executed has been prevented when super_read_only = 1) and #84437 (super-read-only does not allow FLUSH LOGS on 5.7) are not longer happening in Percona Server 5.7.18, these two bugs were related with this hung. So this bug, can't be reproduced in PS 5.7.18.

Revision history for this message
Peter Schwaller (peter-schwaller) wrote :

Fixed upstream?

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1052

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.