Percona Server crashes during stop/restart when using userstat=1 and Slave_open_temp_tables > 0

Bug #1732252 reported by Francisco Bordenave
Affects: Percona Server (moved to https://jira.percona.com/projects/PS; status tracked in 5.7)
  5.5: Triaged / High / Unassigned
  5.6: Triaged / High / Unassigned
  5.7: Triaged / High / Unassigned

Bug Description

Bug description:
Performing a clean stop/restart with userstat=1 on a multi-source slave causes a crash during the shutdown process.

How to repeat:
1- Create a multi-source slave (tested with 1 slave and 2 masters).
2- Enable userstat.
3- Stop/restart the instance.
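
As a sketch, steps 2-3 on the slave amount to the following (userstat can also be enabled in my.cnf; per the bug title, Slave_open_temp_tables must be > 0 at shutdown time):

```sql
-- On the slave: enable user/table statistics collection at runtime
SET GLOBAL userstat = 1;

-- Replicated temporary tables should still be open (see bug title)
SHOW GLOBAL STATUS LIKE 'Slave_open_temp_tables';

-- A clean shutdown then produces the signal 11 reported below
SHUTDOWN;
```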

This is the stack trace reported in the error log:
2017-11-14T10:51:24.747540-08:00 0 [Warning] /usr/sbin/mysqld: Forcing close of thread 6 user: 'test'

18:51:24 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=17179869184
read_buffer_size=131072
max_used_connections=4
max_threads=601
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 17016081 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xed35ac]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x7a15e1]
/lib64/libpthread.so.0[0x3f1ae0f7e0]
/usr/sbin/mysqld(_ZN7handler25update_global_table_statsEv+0x17f)[0x80704f]
/usr/sbin/mysqld(_Z15close_temporaryP5TABLEbb+0xa6)[0xc53bc6]
/usr/sbin/mysqld(_ZN14Relay_log_info22close_temporary_tablesEv+0x3b)[0xe9e32b]
/usr/sbin/mysqld(_Z25delete_slave_info_objectsv+0x136)[0xe88a56]
/usr/sbin/mysqld(signal_hand+0x4a6)[0x793956]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0x1208a24]
/lib64/libpthread.so.0[0x3f1ae07aa1]
/lib64/libc.so.6(clone+0x6d)[0x3f1aae8aad]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.

Tested with Percona Server 5.7.15 and 5.7.18-15-log.

Masters are 5.7.18-15-log fwiw.

Tags: userstat
Changed in percona-server:
assignee: nobody → Sveta Smirnova (svetasmirnova)
Revision history for this message
Sveta Smirnova (svetasmirnova) wrote :

Thank you for the report.

I cannot repeat the described behavior with version 5.7.19. Please try with this version and, if the issue is still repeatable, provide the full option files you use for the 3 servers and specify whether any activity happens on them before the slave restart.

Changed in percona-server:
status: New → Incomplete
Revision history for this message
Roel Van de Paar (roel11) wrote :

Please have a look at these, which may be related: MS 75707, MS 75741, PS 1417382.

$ ~/percona-qa/text_string.sh error.log # error log stack as shown above
ZN7handler25update_global_table_statsEv
$ grep "ZN7handler25update_global_table_statsEv" ~/percona-qa/known_bugs.strings
#ZN7handler25update_global_table_statsEv ## Fixed ## https://bugs.mysql.com/bug.php?id=75707 (duplicated by 75741) (LP PS#1417382)

Revision history for this message
Roel Van de Paar (roel11) wrote :

Bugs 75707 and 75741 mention only a 5.5 fix version, not 5.6 or 5.7. LP 1417382 does, but its fix predates the version used above. I would first try to establish whether this is the same bug.

Revision history for this message
Francisco Bordenave (francisco-bordenave) wrote :

Sveta, one thing I noticed is that the Slave_open_temp_tables counter shows some temporary tables open before the restart:
show global status like 'slave_open_temp_tables';
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| Slave_open_temp_tables | 31 |
+------------------------+-------+

I noticed this counter after issuing STOP SLAVE FOR CHANNEL, which produced this warning:
frpp-sql0004:ui> stop slave for channel 'revv';
Query OK, 0 rows affected, 1 warning (0.03 sec)

show warnings;
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 3022 | This operation may not be safe when the slave has temporary tables. The tables will be kept open until the server restarts or until the tables are deleted by any replicated DROP statement. Suggest to wait until slave_open_temp_tables = 0. |
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Maybe related? I can see some temporary-table operations in the crash stack. One test worth trying is to add load on one of the masters using SBR that pushes queries with temporary tables to the slave (e.g. INSERT...SELECT), which would make the test closer to reality.

Makes sense?
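
The suggested load could be sketched as follows (the table and column names here are illustrative, not from the report):

```sql
-- On one of the masters, under statement-based replication:
SET SESSION binlog_format = 'STATEMENT';

-- Temporary-table work that replicates to the slave and bumps
-- Slave_open_temp_tables there (t_src and t_dst are hypothetical tables):
CREATE TEMPORARY TABLE tmp_load (id INT) ENGINE=InnoDB;
INSERT INTO tmp_load SELECT id FROM t_src LIMIT 100;
INSERT INTO t_dst SELECT id FROM tmp_load;
```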

Revision history for this message
Sveta Smirnova (svetasmirnova) wrote :

Francisco,

I tried with such a load but was not able to repeat the crash. Please try with the latest version, 5.7.19: the bug may already be fixed.

Roel,

the scenario from bug #1417382 does not crash for me, even with a multi-threaded slave.

tags: added: userstat
Revision history for this message
Francisco Bordenave (francisco-bordenave) wrote :

Sveta, Laurynas,

I have some new info: I could reproduce the crash on a single-source slave (i.e. 1 master, 1 slave), and the stack trace looks a bit different while crashing at the same point:

19:07:24 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=17179869184
read_buffer_size=131072
max_used_connections=5
max_threads=601
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 17016081 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xed35ac]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x7a15e1]
/lib64/libpthread.so.0[0x30f480f7e0]
/usr/sbin/mysqld(_ZN7handler25update_global_table_statsEv+0x17f)[0x80704f]
/usr/sbin/mysqld(_Z15close_temporaryP5TABLEbb+0xa6)[0xc53bc6]
/usr/sbin/mysqld(_ZN14Relay_log_info22close_temporary_tablesEv+0x3b)[0xe9e32b]
/usr/sbin/mysqld(_Z25delete_slave_info_objectsv+0x136)[0xe88a56]
/usr/sbin/mysqld(signal_hand+0x4a6)[0x793956]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0x1208a24]
/lib64/libpthread.so.0[0x30f4807aa1]
/lib64/libc.so.6(clone+0x6d)[0x30f44e8aad]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.

This frame seems to be the key:
/usr/sbin/mysqld(_ZN7handler25update_global_table_statsEv+0x17f)[0x80704f]

The failure happens during the stop process, and only with userstat=1.

It also looks like there is some relation between userstat=1 and Slave_open_temp_tables > 0 (not sure yet); tomorrow I should have root access here and can probably trace the problem with gdb.

Revision history for this message
Sveta Smirnova (svetasmirnova) wrote :

Thank you for the update and Slack chat. Verified as described.

Updated "How to repeat" instructions.

1. Start a master-slave pair using SBR
2. Start replication on slave
3. On master:
create temporary table t2(f1 int) engine=innodb;
insert into t2 values(1),(2),(3);
4. Leave connection open
5. On slave:
stop slave;
set global userstat=1;
6. Shut down the slave
7. Check the error log file
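
Put together, the steps above correspond to roughly this SQL (the master session must stay open so the temporary table is not dropped):

```sql
-- Master (statement-based replication), session left open:
CREATE TEMPORARY TABLE t2 (f1 INT) ENGINE=InnoDB;
INSERT INTO t2 VALUES (1), (2), (3);

-- Slave, after the events have replicated:
STOP SLAVE;
SET GLOBAL userstat = 1;
SHUTDOWN;  -- then check the error log for the signal 11 stack
```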

Revision history for this message
Sveta Smirnova (svetasmirnova) wrote :

Test case for MTR:

--source include/master-slave.inc
--source include/have_innodb.inc

set binlog_format='statement';
create temporary table t2(f1 int) engine=innodb;
insert into t2 values(1),(2),(3);

--sync_slave_with_master
stop slave;
set global userstat=1;

--let $rpl_server_number= 2
--source include/rpl_restart_server.inc

summary: - Percona Server crashes during stop/restart when using userstat=1 in
- multi-sourced slaves
+ Percona Server crashes during stop/restart when using userstat=1and
+ Slave_open_temp_tables > o
summary: - Percona Server crashes during stop/restart when using userstat=1and
- Slave_open_temp_tables > o
+ Percona Server crashes during stop/restart when using userstat=1 and
+ Slave_open_temp_tables > 0
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1124
