Server crashed without specific reason [test_quick_select]

Bug #1102392 reported by Steffen Boehme
This bug affects 1 person
Affects                                                       Status      Importance  Assigned to  Milestone
Percona Server moved to https://jira.percona.com/projects/PS  Incomplete  Undecided   Unassigned
5.1                                                           Won't Fix   Undecided   Unassigned
5.5                                                           Expired     High        Unassigned
5.6                                                           Incomplete  Undecided   Unassigned
Ubuntu                                                        Invalid     Undecided   Unassigned

Bug Description

Today one of our db slaves crashed after an uptime of more than 90 days.
The stack trace below was in the db error log.
I will also attach some more information gathered from the server itself (status, variables) and from external tools.

Server-Log:

12:19:24 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=4294967296
read_buffer_size=131072
max_used_connections=1749
max_threads=3072
thread_count=33
connection_count=33
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5018392 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7faf701912c0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7faf46724e58 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7d0b75]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x69d004]
/lib/libpthread.so.0(+0xf8f0)[0x7fb8bc72e8f0]
/lib/libc.so.6(+0x7ac85)[0x7fb8bb94cc85]
/lib/libc.so.6(cfree+0x73)[0x7fb8bb9500d3]
/usr/sbin/mysqld(free_root+0xe9)[0x7c6049]
/usr/sbin/mysqld(_ZN10SQL_SELECT17test_quick_selectEP3THD6BitmapILj64EEyyb+0x599)[0x76a479]
/usr/sbin/mysqld(_ZN4JOIN8optimizeEv+0x3967)[0x5d1e77]
/usr/sbin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xd1)[0x5d2691]
/usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x1cd)[0x5d851d]
/usr/sbin/mysqld[0x59401a]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x31ca)[0x59a78a]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x333)[0x59c313]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x15df)[0x59d99f]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0xdf)[0x63c6ef]
/usr/sbin/mysqld(handle_one_connection+0x51)[0x63c821]
/lib/libpthread.so.0(+0x69ca)[0x7fb8bc7259ca]
/lib/libc.so.6(clone+0x6d)[0x7fb8bb9b8cdd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fafb5d1a480): is an invalid pointer
Connection ID (thread ID): 439598602
Status: NOT_KILLED

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
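
The memory estimate printed in the log can be checked by hand. The log does not print sort_buffer_size, so the sketch below backs it out from the other values. It assumes that "K" means 1024 bytes and that the printed total is rounded down; the resulting sort_buffer_size is an inference from the log's formula, not a value read from the server configuration, and the function name is made up for illustration.

```python
# Values copied from the crash log above.
key_buffer_size = 4294967296   # bytes
read_buffer_size = 131072      # bytes
max_threads = 3072
reported_total_kib = 5018392   # the "K bytes" figure printed by mysqld


def implied_sort_buffer_size(total_kib, key_buf, read_buf, threads):
    """Solve key_buf + (read_buf + sort_buf) * threads = total for sort_buf.

    Assumption: 1 K = 1024 bytes. The log rounds the total, so the
    result is only approximate.
    """
    per_thread_bytes = (total_kib * 1024 - key_buf) // threads
    return per_thread_bytes - read_buf


sort_buf = implied_sort_buffer_size(
    reported_total_kib, key_buffer_size, read_buffer_size, max_threads
)

# Plug the inferred value back into the formula from the log.
estimate_kib = (
    key_buffer_size + (read_buffer_size + sort_buf) * max_threads
) // 1024

print("implied sort_buffer_size (bytes):", sort_buf)
print("estimate (K bytes):", estimate_kib)
```

Note that this estimate only covers the buffers named in the formula; it says nothing about other memory the process holds.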

Steffen Boehme (boemm) wrote :

All those attached files were created at 12:00 UTC (19 minutes before the server crashed).

Finally we have some extracts from the current ps and top output at 12:00 UTC:

> cat ps.dbslave03
40981104 39682272 /usr/sbin/mysqld --defaults-file=/local/mysql/dbslave03/my.cnf --basedir=/usr --datadir=/local/mysql/dbslave03/data --plugin-dir=/usr/lib/mysql/plugin --user=mysql --skip-name-resolve --log-error=/local/mysql/dbslave03/data/dbslave03.intern.err --open-files-limit=7680 --pid-file=/local/mysql/dbslave03/mysqld_net.pid --socket=/local/mysql/dbslave03/mysql_net.sock --port=3306
  4096 680 /bin/sh /usr/bin/mysqld_safe --defaults-file=/local/mysql/dbslave03/my.cnf --datadir=/local/mysql/dbslave03/data --pid-file=/local/mysql/dbslave03/mysqld_net.pid --log-error=/local/mysql/dbslave03/data/dbslave03.intern.err --skip-name-resolve

> cat top.dbslave03
top - 12:01:36 up 195 days, 1:45, 0 users, load average: 5.10, 4.54, 4.50
Tasks: 650 total, 1 running, 649 sleeping, 0 stopped, 0 zombie
Cpu(s): 6.6%us, 1.4%sy, 0.0%ni, 88.5%id, 3.1%wa, 0.0%hi, 0.4%si, 0.0%st
Mem: 66095704k total, 63557080k used, 2538624k free, 728048k buffers
Swap: 0k total, 0k used, 0k free, 9373856k cached

  PID USER PR NI VIRT RES SWAP SHR S %CPU %MEM TIME+ COMMAND
20933 mysql 20 0 39.1g 37g 1.2g 5636 S 0 60.0 35780:42 /usr/sbin/mysql

I hope this information is useful for you.

Steffen Boehme (boemm) wrote :

Some more info about the OS:

> cat /etc/issue
Ubuntu 10.04.4 LTS \n \l

> uname -s
Linux
root@dbslave03:~# uname -a
Linux dbslave03 2.6.32-41-server #91-Ubuntu SMP Wed Jun 13 11:58:56 UTC 2012 x86_64 GNU/Linux

> ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

information type: Public → Private Security
information type: Private Security → Public
Roel Van de Paar (roel11) wrote :

Hi Steffen, thanks for the detailed info. Could you please set up your server for core file generation (see http://dev.mysql.com/doc/refman/5.5/en/server-options.html#option_mysqld_core-file)? You may also need to adjust OS settings to allow [large] core dump files to be generated. Once you have a core dump, please execute:

gdb /path_to_mysqld/bin/mysqld ./core.pid
[... wait for gdb prompt ...]
set trace-commands on
set logging on
set pagination off
set print pretty on
set print array on
set print array-indexes on
set print elements 4096
thread apply all bt

Please then send us the resulting gdb.txt.
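
Before waiting for the next crash, it may be worth confirming that the OS would actually write a core file; the `ulimit -a` output earlier in this report shows a core file size of 0 for that shell. The following is a minimal Python sketch, assuming a Linux host. The function names are hypothetical, and the check only describes the process that runs it, so it would need to run in the same environment that starts mysqld to be meaningful.

```python
import resource
from pathlib import Path


def core_dump_settings():
    """Return the core-size limits of this process and the kernel core pattern."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    pattern_file = Path("/proc/sys/kernel/core_pattern")  # Linux-specific
    pattern = pattern_file.read_text().strip() if pattern_file.exists() else None
    return {"soft_limit": soft, "hard_limit": hard, "core_pattern": pattern}


def describe(limit):
    """Render an rlimit value (in bytes) in a readable form."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


settings = core_dump_settings()
print("core soft limit:", describe(settings["soft_limit"]))
print("core hard limit:", describe(settings["hard_limit"]))
print("core_pattern:   ", settings["core_pattern"])
if settings["soft_limit"] == 0:
    print("core dumps are disabled for this process")
```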

Roel Van de Paar (roel11) wrote :

Hi Steffen, also, do you have any log of the queries that were executed (with a view to building a test case)?

Changed in percona-server:
importance: Undecided → High
summary: - Server crashed without specific reason
+ Server crashed without specific reason [test_quick_select]
Steffen Boehme (boemm) wrote :

Hi,

thanks for your comment.
I can enable the core dump option, but I'm unsure about the size of the core dump file.
Will this option dump the whole process memory?
In our case that would be more than 30 GB, including all buffers and so on.
My other concern is how long the server took to crash.
As said above, the server ran well for more than 90 days.
So if I enable the option, it may take a long time until the next crash produces a core dump file (if it ever does).

This leads me to the second comment:
We do not have full logs of the queries running against the servers.
We did not run any test cases; the server was in full production use (unchanged for 90 days).
So I cannot provide the queries executed on this instance.
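
On the size question, a rough way to plan is to treat the virtual size of the process as an upper bound for the core file and compare it with the free space where the dump would be written. The sketch below is only an illustration under that assumption; the actual core size depends on kernel settings. The function names are hypothetical, and the figure 40981104 is taken from the ps extract above on the assumption that it is the virtual size in KiB.

```python
import shutil


def parse_vm_size_kib(status_text):
    """Extract VmSize (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmSize:"):
            return int(line.split()[1])
    return None


def core_dump_fits(vm_size_kib, free_bytes):
    """True if a dump as large as the virtual size would fit in free_bytes."""
    return vm_size_kib * 1024 <= free_bytes


def check_process(pid, dump_dir):
    """Compare a live process's virtual size with free space in dump_dir (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        size = parse_vm_size_kib(fh.read())
    if size is None:
        raise ValueError(f"no VmSize entry for pid {pid}")
    return core_dump_fits(size, shutil.disk_usage(dump_dir).free)


# Worked example with the figure from the ps extract (assumed to be KiB):
print(core_dump_fits(40981104, 50 * 1024**3))  # 50 GiB free
print(core_dump_fits(40981104, 10 * 1024**3))  # 10 GiB free
```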

Fabio Marconi (fabiomarconi) wrote :

closing ubuntu tag
---
Ubuntu Bug Squad volunteer triager
http://wiki.ubuntu.com/BugSquad

Changed in ubuntu:
status: New → Invalid
Launchpad Janitor (janitor) wrote :

[Expired for Percona Server 5.5 because there has been no activity for 60 days.]

Launchpad Janitor (janitor) wrote :

[Expired for Percona Server because there has been no activity for 60 days.]

Changed in percona-server:
status: Incomplete → Expired
Valerii Kravchuk (valerii-kravchuk) wrote :

Is this still repeatable with recent versions? If it is, please send the information requested by Roel in the previous comments.

Chris Calender (chriscalender) wrote :

Just saw this in Percona Server 5.1.42, just after corruption was detected. Perhaps it will help for those cases where corruption is not involved (or perhaps others need to check for corruption):

InnoDB: Page directory corruption: infimum not pointed to
131217 7:35:32 InnoDB: Page dump in ascii and hex (16384 bytes):
 len 16384; hex 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000...

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PS-2880
