5.7.16-10 crash with "InnoDB: Assertion failure in thread 140546057099008 in file ut0ut.cc line 917"

Bug #1653764 reported by Emily Slocombe
This bug affects 7 people
Affects: Percona Server (moved to https://jira.percona.com/projects/PS)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

InnoDB: ###### Diagnostic info printed to the standard error stream
2017-01-03T15:34:52.028365Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2017-01-03 07:34:52 0x7fd36dced700 InnoDB: Assertion failure in thread 140546057099008 in file ut0ut.cc line 917
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
15:34:52 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=12582912
read_buffer_size=2097152
max_used_connections=1601
max_threads=1601
thread_count=1600
connection_count=1600
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4949964 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x80000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xec876c]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x79e5b1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7ff56d4ae330]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7ff56c8efc37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7ff56c8f3028]
/usr/sbin/mysqld[0x76cfb0]
/usr/sbin/mysqld(_ZN2ib5fatalD1Ev+0xee)[0x10f15fe]
/usr/sbin/mysqld(srv_error_monitor_thread+0xe40)[0x108cb70]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7ff56d4a6184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7ff56c9b337d]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
2017-01-03T15:34:59.010998Z mysqld_safe Number of processes running now: 0
2017-01-03T15:34:59.012141Z mysqld_safe mysqld restarted
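
The 600-second limit in the fatal message above is the long semaphore wait threshold enforced by InnoDB's error monitor thread (srv_error_monitor_thread in the stack trace). As far as I know, Percona Server exposes it as innodb_fatal_semaphore_wait_threshold, so purely as a sketch (verify the variable exists and is dynamic in your release) the threshold can be raised while the root cause is investigated:

  # illustrative only: check for the variable, then raise the threshold to 1200 s
  mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_fatal_semaphore_wait_threshold';"
  mysql -e "SET GLOBAL innodb_fatal_semaphore_wait_threshold = 1200;"

Raising the threshold only postpones the intentional crash; it does not fix whatever is holding the latch.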

Revision history for this message
Emily Slocombe (eslocombe) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

This is very non-specific - all it says is that something (it is not clear what) was locked for more than 10 minutes. The bug report needs at least the stack traces from all the threads and the workload details.
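
For anyone hitting this and wanting to supply that data, a minimal way to capture stack traces from every thread of a running mysqld is gdb in batch mode, or pt-pmp from Percona Toolkit (the output file names below are just examples):

  # all-thread backtraces with gdb (briefly pauses mysqld while attached)
  gdb --batch -p "$(pidof mysqld)" -ex "set pagination 0" -ex "thread apply all bt" > mysqld-all-threads.txt

  # or an aggregated view with Percona Toolkit's pt-pmp
  pt-pmp --pid "$(pidof mysqld)" > mysqld-pmp.txt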

Changed in percona-server:
status: New → Incomplete
Revision history for this message
Emily Slocombe (eslocombe) wrote :

Attaching the InnoDB status monitor output.

Changed in percona-server:
status: Incomplete → New
Revision history for this message
Przemek (pmalkowski) wrote :

Emily, how many tables does this instance have in total?

Revision history for this message
Emily Slocombe (eslocombe) wrote :

Databases: 256065
Tables: 5212706

on this instance.

Revision history for this message
mark (gruenberg) wrote :

It looks like we ran into this bug.

This happened while running a long-running ALTER on a large partitioned, compressed table. The ALTER had been running for approximately 3 hours; while the DDL was running the server was swapping, and innodb_buffer_pool_size was reduced from 10G to 9G.
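
For context, 5.7 can resize the buffer pool online, so a shrink like this happens in chunks while the server keeps running; a rough sketch of the commands involved (the 9G figure simply mirrors the value above):

  # request the resize and watch its progress
  mysql -e "SET GLOBAL innodb_buffer_pool_size = 9 * 1024 * 1024 * 1024;"
  mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_resize_status';"

Shrinking while a heavy DDL is dirtying pages may itself stall on flushing, so it is worth watching that status variable rather than assuming the resize completed.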

Revision history for this message
mark (gruenberg) wrote :

running: mysqld 5.7.18-14

InnoDB: ###### Diagnostic info printed to the standard error stream
2017-06-17T15:30:17.098108Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2017-06-17 11:30:17 0x7f34ecdfe700 InnoDB: Assertion failure in thread 139865289123584 in file ut0ut.cc line 917

Revision history for this message
Michael Shield (mike.shield) wrote :

Running mysqld 5.7.19-17 as a new slave in a 1-master, 3-slave architecture.

This is the first server to run 5.7; the others were to be upgraded beginning today. This slave had been running error-free for 4 days when a simple INSERT caused a failure similar to the one referenced above.

2017-11-21T02:12:41.279263Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139710996440832 has waited at row0ins.cc line 2511 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x7f1daef78f70 created in file buf0buf.cc line 1456
a writer (thread id 139709876315904) has reserved it in mode SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time read locked in file row0sel.cc line 3765

InnoDB: ###### Diagnostic info printed to the standard error stream
2017-11-21T02:24:33.293036Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2017-11-21 02:24:33 0x7f1109c90700 InnoDB: Assertion failure in thread 139711155341056 in file ut0ut.cc line 917

key_buffer_size=67108864
read_buffer_size=4194304
max_used_connections=1
max_threads=4097
thread_count=2
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 33677646 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xf0362b]
/usr/sbin/mysqld(handle_fatal_signal+0x471)[0x7a1561]
/lib64/libpthread.so.0(+0xf5e0)[0x7f1de00735e0]
/lib64/libc.so.6(gsignal+0x37)[0x7f1dde17c1f7]
/lib64/libc.so.6(abort+0x148)[0x7f1dde17d8e8]
/usr/sbin/mysqld[0x76f3f6]
/usr/sbin/mysqld(_ZN2ib5fatalD1Ev+0xfd)[0x10d2a9d]
/usr/sbin/mysqld(srv_error_monitor_thread+0xadb)[0x106da5b]
/lib64/libpthread.so.0(+0x7e25)[0x7f1de006be25]
/lib64/libc.so.6(clone+0x6d)[0x7f1dde23f34d]

Given that this machine is not yet in service, I may just restart from a clean backup, but would like to know what's going on.

Revision history for this message
Michael Shield (mike.shield) wrote :

Update

There does appear to be a possibility that, despite having a 4-SSD RAID10, the DB was a victim of too high a vm.dirty_ratio setting (the CentOS 7 default) combined with a large amount of memory (128 GB). Chasing the following errors through Red Hat suggests this is a possible cause of the underlying 120-second timeout (a sysctl sketch follows the trace below).

Nov 21 02:01:01 systemd[1]: Stopping User Slice of root.
Nov 21 02:05:58 kernel: INFO: task mysqld:84452 blocked for more than 120 seconds.
Nov 21 02:05:58 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 21 02:05:58 kernel: mysqld D ffff88103aeaa080 0 84452 1 0x00000080
Nov 21 02:05:58 kernel: ffff880d8e6279a0 0000000000000082 ffff880dc4fd8fd0 ffff880d8e627fd8
Nov 21 02:05:58 kernel: ffff880d8e627fd8 ffff880d8e627fd8 ffff880dc4fd8fd0 ffff88103e9d6cc0
Nov 21 02:05:58 kernel: 0000000000000000 7fffffffffffffff 0000000000000000 ffff88103aeaa080
Nov 21 02:05:58 kernel: Call Trace:
Nov 21 02:05:58 kernel: [<ffffffff816a9589>] schedule+0x29/0x70
Nov 21 02:05:58 kernel: [<ffffffff816a7099>] schedule_timeout+0x239/0x2c0
Nov 21 02:05:58 kernel: [<ffffffffc0002df7>] ? dm_make_request+0x127/0x190 [dm_mod]
Nov 21 02:05:58 kernel: [<ffffffff812f8c05>] ? generic_make_request+0x105/0x310
Nov 21 02:05:58 kernel: [<ffffffff810e93ac>] ? ktime_get_ts64+0x4c/0xf0
Nov 21 02:05:58 kernel: [<ffffffff816a8c0d>] io_schedule_timeout+0xad/0x130
Nov 21 02:05:58 kernel: [<ffffffff816a8ca8>] io_schedule+0x18/0x20
Nov 21 02:05:58 kernel: [<ffffffff8124227d>] do_blockdev_direct_IO+0x1bdd/0x2050
Nov 21 02:05:58 kernel: [<ffffffffc02ffa00>] ? xfs_find_bdev_for_inode+0x20/0x20 [xfs]
Nov 21 02:05:58 kernel: [<ffffffff81242745>] __blockdev_direct_IO+0x55/0x60
Nov 21 02:05:58 kernel: [<ffffffffc02ffa00>] ? xfs_find_bdev_for_inode+0x20/0x20 [xfs]
Nov 21 02:05:58 kernel: [<ffffffffc02ffa40>] ? xfs_get_blocks_dax_fault+0x20/0x20 [xfs]
Nov 21 02:05:58 kernel: [<ffffffffc030b758>] xfs_file_dio_aio_write+0x188/0x390 [xfs]
Nov 21 02:05:58 kernel: [<ffffffffc02ffa00>] ? xfs_find_bdev_for_inode+0x20/0x20 [xfs]
Nov 21 02:05:58 kernel: [<ffffffffc02ffa40>] ? xfs_get_blocks_dax_fault+0x20/0x20 [xfs]
Nov 21 02:05:58 kernel: [<ffffffffc030bd22>] xfs_file_aio_write+0x102/0x1b0 [xfs]
Nov 21 02:05:58 kernel: [<ffffffff812001ed>] do_sync_write+0x8d/0xd0
Nov 21 02:05:58 kernel: [<ffffffff81200cad>] vfs_write+0xbd/0x1e0
Nov 21 02:05:58 kernel: [<ffffffff81201c72>] SyS_pwrite64+0x92/0xc0
Nov 21 02:05:58 kernel: [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
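
If the high-dirty-ratio theory holds, the knobs involved are vm.dirty_ratio and vm.dirty_background_ratio (or their *_bytes counterparts); a sketch with purely illustrative values for a 128 GB host:

  # current settings
  sysctl vm.dirty_ratio vm.dirty_background_ratio
  # lower them so writeback starts earlier and in smaller bursts (values are guesses, tune per workload)
  sysctl -w vm.dirty_background_ratio=3
  sysctl -w vm.dirty_ratio=10
  # persist across reboots (file name is arbitrary)
  printf 'vm.dirty_background_ratio = 3\nvm.dirty_ratio = 10\n' > /etc/sysctl.d/99-mysql-dirty.conf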

Revision history for this message
Michael Shield (mike.shield) wrote :

Not going well.

The problem occurs between 2 and 4 days after slave creation, full restore from backup, DB upgrade to 5.7, and slave catch-up.

It is not the file system (occurs with XFS and ext4) or the I/O scheduler (noop and deadline). I have raised a call with RH but am not hopeful, as this is a CentOS server.

The next step will be to not upgrade to 5.7, but to let the slave run at 5.6 for some time, which may establish whether the issue is with the DB or the operating system.

Mike

Revision history for this message
Michael Shield (mike.shield) wrote :

No issues after 7 days running as a 5.6 slave - would appear to be an upgrade issue.

Not sure how to proceed; ideas welcome.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PS-3623

Revision history for this message
Michael Shield (mike.shield) wrote : Re: [Bug 1653764] Re: 5.7.16-10 crash with "InnoDB: Assertion failure in thread 140546057099008 in file ut0ut.cc line 917"

Shahriyar,

Thanks for letting me know.

We replicated the setup in AWS and experienced no difficulties.

I have since rebuilt the machine, replacing the LVM based raid10 volume
with one created on the Dell Perc 730p raid controller.

We have had no issues since that time, and though we've applied various operating system updates, the hardware appears to have been the issue.

Getting that resolved between RH/CentOS, Dell and yourselves is beyond my resources, but it may help you elsewhere.

Thanks,

Mike

Mike Shield
Systems & Database Lead Engineer
+44 (0) 1223421355 Office
+44 (0) 7775713864 Mobile


Revision history for this message
luojia (luojia) wrote :

When truncating a table, we hit the same problem in 5.7.21-21-log Percona Server (GPL), Release 21, Revision 2a37e4e.
Linux version 2.6.32-431.20.3.el6.mt20161028.x86_64 (<email address hidden>) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) ) #1 SMP Fri Oct 28 17:50:35 CST 2016
log:
2018-07-06T20:01:18.064913+08:00 0 [ERROR] InnoDB: ibuf cursor restoration fails!. ibuf record inserted to page 3910:31517
2018-07-06T20:01:18.064980+08:00 0 [ERROR] InnoDB: Submit a detailed bug report to http://bugs.mysql.com
PHYSICAL RECORD: n_fields 9; 1-byte offsets; info bits 0
 0: len 4; hex 00000e59; asc Y;;
 1: len 1; hex 00; asc ;;
 2: len 4; hex 0019515f; asc Q_;;
 3: len 30; hex 000300010c0f000c00e00c0f001000e00c0f000800e00c0f002000e08603; asc ; (total 34 bytes);
 4: len 3; hex 303034; asc 004;;
 5: len 2; hex 5052; asc PR;;
 6: len 2; hex 3039; asc 09;;
 7: len 6; hex 2a2a2a2a2a2a; asc ******;;
 8: len 4; hex 8d78f926; asc x &;;
PHYSICAL RECORD: n_fields 9; 1-byte offsets; info bits 0
 0: len 4; hex 00000e59; asc Y;;
 1: len 1; hex 00; asc ;;
 2: len 4; hex 0019515f; asc Q_;;
 3: len 30; hex 000300010c0f000c00e00c0f001000e00c0f000800e00c0f002000e08603; asc ; (total 34 bytes);
 4: len 3; hex 303034; asc 004;;
 5: len 2; hex 5052; asc PR;;
 6: len 2; hex 3039; asc 09;;
 7: len 6; hex 2a2a2a2a2a2a; asc ******;;
 8: len 4; hex 8d78f926; asc x &;;
DATA TUPLE: 3 fields;
 0: len 4; hex 00000f46; asc F;;
 1: len 1; hex 00; asc ;;
 2: len 4; hex 00007b1d; asc { ;;
PHYSICAL RECORD: n_fields 9; 1-byte offsets; info bits 32
 0: len 4; hex 00000f46; asc F;;
 1: len 1; hex 00; asc ;;
 2: len 4; hex 00007b1d; asc { ;;
 3: len 30; hex 000400010c0f0020802d0c0f000c802d0c0f0014002d0c0f0014002d8608; asc - - - - ; (total 34 bytes);
 4: len 6; hex 523150314752; asc R1P1GR;;
 5: len 2; hex 4b4c; asc KL;;
 6: len 3; hex 415448; asc ATH;;
 7: len 3; hex 545045; asc TPE;;
 8: len 8; hex 80000000f29c7f54; asc T;;
2018-07-06T20:01:18.070471+08:00 0 [ERROR] [FATAL] InnoDB: Failed to restore ibuf position.
2018-07-06 20:01:18 0x7f828f8ca700 InnoDB: Assertion failure in thread 140198730835712 in file ut0ut.cc line 943
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
12:01:18 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33...
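
The assertion here differs from the semaphore-wait one above: it fires when InnoDB cannot restore its change buffer (ibuf) cursor during the TRUNCATE. Purely as an assumption, not something confirmed in this thread, disabling change buffering is a common way to take the ibuf out of the picture while investigating:

  # illustrative mitigation only; verify against your version's documentation first
  mysql -e "SET GLOBAL innodb_change_buffering = 'none';"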

Read more...

Revision history for this message
Marc O'Dell (modell18) wrote :

Had this issue on a REPLACE ... SELECT.
I set innodb_flush_neighbors=0 and the problem went away.
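
innodb_flush_neighbors controls whether InnoDB also flushes pages adjacent to the one being flushed, which mainly helps spinning disks; turning it off on SSD-backed servers is common advice. A minimal sketch (dynamic at runtime, plus my.cnf for persistence):

  mysql -e "SET GLOBAL innodb_flush_neighbors = 0;"
  # and in my.cnf under [mysqld]:
  #   innodb_flush_neighbors = 0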
