Percona Server moved to https://jira.percona.com/projects/PS

mysqladmin shutdown hangs after flush-logs error

Bug #1658354 reported by Juan Pablo Arruti on 2017-01-21

This bug affects 1 person

	Status	Importance	Assigned to
MySQL Server	Unknown	Unknown	mysql-bugs #84597
Percona Server moved to https://jira.percona.com/projects/PS	Status tracked in 5.7
5.5	Invalid	Undecided	Unassigned
5.6	New	Undecided	Unassigned
5.7	Triaged	High	Robert Golebiowski

Bug Description

Description:

mysqladmin shutdown command hangs when it's called after flush error (ERROR 1290) on slave instance using super_read_only variable.
Instance does not accept more connections and the only way to stop it is killing mysql process.

Seems to be realted with upstream bugs #84332 (compression of mysql.gtid_executed has been prevented when super_read_only = 1) and #84437 (super-read-only does not allow FLUSH LOGS on 5.7).

How to repeat:

Variables for Master:

gtid_mode = ON
enforce_gtid_consistency = ON

Variables for Slave:

log_bin=mysql-bin
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON
super_read_only = ON

Requirements:

A) Master/slave configuration using GTIDs.
B) Slave instance is using super_read_only.
C) High amount of transactions been executed on master.

Procedure:

A) Execute high amount of transactions on master, I've used following sysbench command:

sysbench --test=oltp.lua --mysql-socket=/tmp/mysql_sandbox5716.sock --report-interval=1 --oltp-tables-count=100000 --oltp-table-size=1000 --max-time=1800 --oltp-read-only=off --max-requests=0 --num-threads=8 --rand-type=uniform --db-driver=mysql --mysql-user=msandbox --mysql-password=msandbox --mysql-db=test prepare

B) Table mysql.gtid_executed grows to a considerable amount of rows:

mysql> select count(*),@@super_read_only,@@read_only from mysql.gtid_executed;
+----------+-------------------+-------------+
| count(*) | @@super_read_only | @@read_only |
+----------+-------------------+-------------+
| 2684 | 1 | 1 |
+----------+-------------------+-------------+

C) Execute flush logs using mysqladmin or using mysql cli till get 1290 error:

juan.arruti@bm-support01:~$ /opt/percona_server/5.7.16/bin/mysqladmin -u msandbox -p flush-logs --socket=/tmp/mysql_sandbox5718.sock
Enter password:
mysqladmin: refresh failed; error: 'The MySQL server is running with the --super-read-only option so it cannot execute this statement'

D) Shut down MySQL process:

juan.arruti@bm-support01:~$ /opt/percona_server/5.7.16/bin/mysqladmin -u msandbox -p shutdown --socket=/tmp/mysql_sandbox5718.sock
Enter password:

^CWarning; Aborted waiting on pid file: '/home/juan.arruti/sandboxes/msb_100025_7_16/data/mysql_sandbox5718.pid' after 365 seconds

Here is output from pt-pmp, it seems that is waiting for gtid_table thread but we saw in upstream bug #84332 that this is not happening.

root@bm-support01:~# pt-pmp -p 47425
Sat Jan 21 13:14:04 EST 2017
92 pthread_join.c: No such file or directory.
     10 __io_getevents_0_4,LinuxAIOHandler::collect(os0file.cc:2502),LinuxAIOHandler::poll(os0file.cc:2648),os_aio_linux_handler(os0file.cc:2704),os_aio_handler(os0file.cc:2704),fil_aio_wait(fil0fil.cc:5869),io_handler_thread(srv0start.cc:330),start_thread,clone
      3 pthread_cond_wait,wait(os0event.h:156),os_event::wait_low(os0event.h:156),os_event_wait_low(os0event.cc:328),srv_worker_thread(srv0srv.cc:2879),start_thread,clone
      1 row_upd(row0upd.cc:3067),row_upd_step(row0upd.cc:3158),row_update_for_mysql_using_upd_graph(row0mysql.cc:2524),row_update_for_mysql(row0mysql.cc:2722),ha_innobase::delete_row(ha_innodb.cc:8701),handler::ha_delete_row(handler.cc:8365),Gtid_table_persistor::compress_first_consecutive_range(rpl_gtid_persist.cc:639),Gtid_table_persistor::compress_in_single_transaction(rpl_gtid_persist.cc:581),Gtid_table_persistor::compress(rpl_gtid_persist.cc:544),Gtid_state::compress(rpl_gtid_state.cc:855),compress_gtid_table(rpl_gtid_persist.cc:861),pfs_spawn_thread(pfs.cc:2188),start_thread,clone
      1 pthread_join,terminate_compress_gtid_table_thread(rpl_gtid_persist.cc:947),mysqld_main(mysqld.cc:5316),__libc_start_main,_start
      1 pthread_cond_wait,wait(os0event.h:156),os_event::wait_low(os0event.h:156),os_event_wait_low(os0event.cc:328),srv_purge_coordinator_suspend(srv0srv.cc:3039),srv_purge_coordinator_thread(srv0srv.cc:3039),start_thread,clone
      1 pthread_cond_wait,wait(os0event.h:156),os_event::wait_low(os0event.h:156),os_event_wait_low(os0event.cc:328),buf_resize_thread(buf0buf.cc:3043),start_thread,clone
      1 pthread_cond_wait,wait(os0event.h:156),os_event::wait_low(os0event.h:156),os_event_wait_low(os0event.cc:328),buf_dump_thread(buf0dump.cc:781),start_thread,clone
      1 pthread_cond_wait,native_cond_wait(thr_cond.h:140),my_cond_wait(thr_cond.h:140),inline_mysql_cond_wait(thr_cond.h:140),signal_hand(thr_cond.h:140),pfs_spawn_thread(pfs.cc:2188),start_thread,clone
      1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),srv_monitor_thread(srv0srv.cc:1807),start_thread,clone
      1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),srv_error_monitor_thread(srv0srv.cc:1997),start_thread,clone
      1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),pc_sleep_if_needed(buf0flu.cc:2757),buf_flush_page_cleaner_coordinator(buf0flu.cc:2757),start_thread,clone
      1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),lock_wait_timeout_thread(lock0wait.cc:501),start_thread,clone
      1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),ib_wqueue_timedwait(ut0wqueue.cc:160),fts_optimize_thread(fts0opt.cc:3045),start_thread,clone
      1 pthread_cond_timedwait,os_event::timed_wait(os0event.cc:81),os_event::wait_time_low(os0event.cc:208),os_event_wait_time_low(os0event.cc:311),dict_stats_thread(dict0stats_bg.cc:421),start_thread,clone
      1 poll,vio_io_wait(viosocket.c:1157),vio_socket_io_wait(viosocket.c:116),vio_ssl_read(viossl.c:196),net_read_raw_loop(net_serv.cc:672),net_read_packet_header(net_serv.cc:756),net_read_packet(net_serv.cc:756),my_net_read(net_serv.cc:899),Protocol_classic::read_packet(protocol_classic.cc:808),Protocol_classic::get_command(protocol_classic.cc:965),do_command(sql_parse.cc:949),handle_connection(connection_handler_per_thread.cc:312),pfs_spawn_thread(pfs.cc:2188),start_thread,clone
      1 nanosleep,os_thread_sleep(os0thread.cc:300),srv_master_sleep(srv0srv.cc:2667),srv_master_thread(srv0srv.cc:2667),start_thread,clone
      1 nanosleep,os_thread_sleep(os0thread.cc:300),buf_lru_manager_sleep_if_needed(buf0flu.cc:3556),buf_lru_manager(buf0flu.cc:3556),start_thread,clone
      1 do_sigwaitinfo,__GI___sigwaitinfo,timer_notify_thread_func(posix_timers.c:77),pfs_spawn_thread(pfs.cc:2188),start_thread,clone

Workaround seems to be disable super_read_only to let mysql compress gtid_table ant then shutdown the instance.

Suggested fix: Let MySQL compress gtid_table when super_read_only is enable.

Tags:

Revision history for this message

Juan Pablo Arruti (juanarru) wrote on 2017-01-21:

Tested in PS 5.7.16 and MySQL Community 5.7.17

Laurynas Biveinis (laurynas-biveinis) on 2017-01-23

tags:

added: upstream

Sveta Smirnova (svetasmirnova) on 2017-01-23

tags:

added: 162685

Revision history for this message

Robert Golebiowski (robert.golebiowski) wrote on 2017-05-23:

This should be fixed by commit:

commit b24fedd4a043088f3d9117bea3011788d4f6531e
Author: Sujatha Sivakumar <email address hidden>
Date: Tue Nov 15 12:01:05 2016 +0530

Bug#22857926: ASSERTION `! IS_SET()' AT SQL_ERROR.CC:38 IN
READ_ONLY MODE FOR MANY RPL CMDS.

    Problem:
    =======
    This bug is opened to fix assertion `! is_set()' at
    sql_error.cc:380 in "void Diagnostics_area::set_ok_status
    (ulonglong, ulonglong, const char*)" for many RPL commands.
    This issue has been fixed for couple of commands with
    Bug#22097534. But, still issue exists for few other
    commands.

    RPL commands affected:-
    =======================
    [PENDING] 1. FLUSH LOGS;
    [PENDING] 2. RESET SLAVE ALL;
    [PENDING] 3. SET GLOBAL gtid_purged='aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1';
    [FIXED] 4. CHANGE MASTER TO..
    [FIXED] 5. CHANGE REPLICATION FILTER
    [FIXED] 6. START SLAVE;
    [FIXED] 7. STOP SLAVE;

    Analysis:
    ========
    When the above set of commands are executed they will try
    to update replication specific system tables. For example
    mysql.gtid_executed
    mysql.salve_master_info
    mysql.relay_log_info

    This scenario is applicable to all replication specific
    internal system tables. During the fix for Bug22097534 it
    was concluded that "read only" mode should prevent changes
    to user data, but not to prevent normal Server operations or
    to prevent a DBA from managing the state of the MySQL Server
    itself.

Hence the above commands should be able to update data in
replication specific internal tables.

    Fix:
    ===
    There exists a flag named 'ignore_global_read_lock' whose
    intention is to ignore the read only option and allow
    commits to replication specific system tables. Reused
    this flag as part of fix. If this flag is set then the
    additional check for 'read only' option is not done.
    Hence 'read only' mode error will not be reported and
    command will complete successfully.

This should be fixed by commit:

commit b24fedd4a043088f3d9117bea3011788d4f6531e
Author: Sujatha Sivakumar <sujatha.sivakumar@oracle.com>
Date:   Tue Nov 15 12:01:05 2016 +0530

Bug#22857926: ASSERTION `! IS_SET()' AT SQL_ERROR.CC:38 IN
    READ_ONLY MODE FOR MANY RPL CMDS.
    
    Problem:
    =======
    This bug is opened to fix assertion `! is_set()' at
    sql_error.cc:380 in "void Diagnostics_area::set_ok_status
    (ulonglong, ulonglong, const char*)" for many RPL commands.
    This issue has been fixed for couple of commands with
    Bug#22097534. But, still issue exists for few other
    commands.
    
    RPL commands affected:-
    =======================
    [PENDING] 1. FLUSH LOGS;
    [PENDING] 2. RESET SLAVE ALL;
    [PENDING] 3. SET GLOBAL gtid_purged='aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1';
    [FIXED]   4. CHANGE MASTER TO..
    [FIXED]   5. CHANGE REPLICATION FILTER
    [FIXED]   6. START SLAVE;
    [FIXED]   7. STOP SLAVE;
    
    Analysis:
    ========
    When the above set of commands are executed they will try
    to update replication specific system tables. For example
    mysql.gtid_executed
    mysql.salve_master_info
    mysql.relay_log_info
    
    This scenario is applicable to all replication specific
    internal system tables. During the fix for Bug22097534 it
    was concluded that "read only" mode should prevent changes
    to user data, but not to prevent normal Server operations or
    to prevent a DBA from managing the state of the MySQL Server
    itself.
    
    Hence the above commands should be able to update data in
    replication specific internal tables.
    
    Fix:
    ===
    There exists a flag named 'ignore_global_read_lock' whose
    intention is to ignore the read only option and allow
    commits to replication specific system tables. Reused
    this flag as part of fix. If this flag is set then the
    additional check for 'read only' option is not done.
    Hence 'read only' mode error will not be reported and
    command will complete successfully.

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2017-05-24:

Should be re-verified and closed?

Revision history for this message

Juan Pablo Arruti (juanarru) wrote on 2017-05-24:

Upstream bugs #84332 (compression of mysql.gtid_executed has been prevented when super_read_only = 1) and #84437 (super-read-only does not allow FLUSH LOGS on 5.7) are not longer happening in Percona Server 5.7.18, these two bugs were related with this hung. So this bug, can't be reproduced in PS 5.7.18.

Revision history for this message

Peter Schwaller (peter-schwaller) wrote on 2017-06-01:

Fixed upstream?

Revision history for this message

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-25:

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1052

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

mysql-bugs #84597 Edit

Bug watches keep track of this bug in other bug trackers.