crashes (sig 11) with 5.3.7-MariaDB union query

Bug #1020645 reported by Peter (Stig) Edwards on 2012-07-03
This bug affects 2 people

Affects: MariaDB
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hello and thanks for mariadb-5.3.7-Linux-x86_64.tar.gz,

  Running on 2.6.18-274.7.1.el5 x86_64 x86_64 x86_64 GNU/Linux RedHat EL.

  We recently upgraded some mysqld instances in a pool from MySQL 5.1.X to MariaDB 5.3.7, and we have had two crashes (on different instances on different hosts) with the same backtrace and very similar queries. The pool is a production pool, so the priority has been restoration of service and a rollback. The query from the first backtrace did not cause a crash when run in isolation on our development and staging instances (I have not tested the second yet). The development and staging instances do not have identical configurations, data, or queries running, but they are the same version (MariaDB 5.3.7) on the same OS and the same architecture. Also, both MariaDB instances ran for several days in production before crashing.

 Here are the contents of the error log for the first crash (the backtrace is the same for the second crash and the query is very similar). I have removed the actual query reported and included a representation of it; I can send the actual query and table definitions privately.

  I am wondering if (and hoping that) the backtrace looks familiar.

120627 5:20:14 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 5.3.7-MariaDB-log
key_buffer_size=33554432
read_buffer_size=2097152
max_used_connections=319
max_threads=3001
thread_count=29
connection_count=29
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18510029 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x2ab7fc3499b0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x512fc0f8 thread_stack 0x48000
./bin/mysqld(my_print_stacktrace+0x2e) [0xa2b62e]
./bin/mysqld(handle_fatal_signal+0x3f9) [0x7627f9]
/lib64/libpthread.so.0 [0x341f00eb70]
./bin/mysqld [0x6be068]
./bin/mysqld(JOIN::exec()+0x852) [0x6cfc52]
./bin/mysqld(st_select_lex_unit::exec()+0x184) [0x7ab904]
./bin/mysqld(mysql_union(THD*, st_lex*, select_result*, st_select_lex_unit*, unsigned long)+0x2e) [0x7add0e]
./bin/mysqld(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x82) [0x6d2632]
./bin/mysqld [0x647e7e]
./bin/mysqld(mysql_execute_command(THD*)+0x3a58) [0x64da78]
./bin/mysqld(mysql_parse(THD*, char*, unsigned int, char const**)+0x299) [0x650859]
./bin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0xa9b) [0x65174b]
./bin/mysqld(do_command(THD*)+0x101) [0x6522a1]
./bin/mysqld(handle_one_connection+0xfd) [0x6436ad]
/lib64/libpthread.so.0 [0x341f00673d]
/lib64/libc.so.6(clone+0x6d) [0x341e4d44bd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x2ab7deb1abf8):
(select many fields from a few tables with joins and inner joins group by sort by limit 5000)
UNION ALL
(select many fields from a few tables with joins and inner joins group by sort by limit 5000)
order by limit 5000
Connection ID (thread ID): 30483258
Status: KILL_CONNECTION
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on
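The worst-case memory estimate in the log above can be reproduced from the formula mysqld prints. A minimal sketch, assuming a hypothetical sort_buffer_size of 4 MiB (the log does not show the actual value):

```python
# Worst-case memory estimate, as printed by mysqld in the crash log:
#   key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads
key_buffer_size = 33554432    # from the log
read_buffer_size = 2097152    # from the log
max_threads = 3001            # from the log
sort_buffer_size = 4194304    # hypothetical; not shown in the log

worst_case_kb = (key_buffer_size
                 + (read_buffer_size + sort_buffer_size) * max_threads) // 1024
print(worst_case_kb)  # → 18470912
```

With these values the estimate comes to roughly 18 GB, in the same ballpark as the 18510029 K in the log, which is why the log suggests decreasing some of the variables in the equation when memory is tight.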

Thank you.
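For reference, the anonymized query above has roughly this overall shape; the table and column names here are hypothetical placeholders, not the actual query (which was sent privately):

```sql
-- Hypothetical sketch of the crashing query's shape: each union branch has
-- its own GROUP BY / ORDER BY / LIMIT, with an outer ORDER BY ... LIMIT on
-- the union result. t1/t2/c1/c2 are placeholder names.
(SELECT t1.c1, t2.c2
   FROM t1
   INNER JOIN t2 ON t2.id = t1.t2_id
  GROUP BY t1.c1
  ORDER BY t2.c2
  LIMIT 5000)
UNION ALL
(SELECT t1.c1, t2.c2
   FROM t1
   INNER JOIN t2 ON t2.id = t1.t2_id
  GROUP BY t1.c1
  ORDER BY t2.c2
  LIMIT 5000)
ORDER BY c2
LIMIT 5000;
```

The backtrace shows the crash inside JOIN::exec() called from st_select_lex_unit::exec() and mysql_union(), i.e. while executing a query of this union shape.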

Elena Stepanova (elenst) wrote :

Hi Peter,

According to the log, the connection is in KILL_CONNECTION status -- were you trying to kill it because it was hung?
If so, it is somewhat reminiscent of bug #998516 (don't be fooled by the 'released' status; it was fixed after 5.3.7).
Otherwise, there is not much of a stack trace, but I will see what else we had with UNION recently.

Peter (Stig) Edwards wrote :

We were not trying to kill the connection. I don't think there is anything in place doing that, but I shall check our logs. The other crash also had KILL_CONNECTION status.
Thanks.

Peter (Stig) Edwards wrote :

I don't see any queries killing other queries or connections at around that time, nor anything in the binlog that looks like it might kill queries. We have a scheduled rollback of the remaining MariaDB pool member this week, and will probably try again with 5.3.8 when it is out and internally QA'ed. If the remaining 5.3.7 instance crashes before we roll it back, I'll aim to at least try running the query from its backtrace before the rollback; otherwise we will try (again) to reproduce in the development and staging environments. Thanks.

Elena Stepanova (elenst) wrote :

Please do upload the query, table structures and my.cnf to the private FTP. If you can provide data dumps, that might help too; otherwise please at least give us an idea of how many rows the tables contain.
Thanks.

Peter (Stig) Edwards wrote :

I have uploaded the query, the table structures, the row counts and the my.cnf to the private FTP; the filename has this bug number in it. I cannot provide a data dump right now, but may be able to when we do the rollback. It would be about 2GB of data. Thanks.

Elena Stepanova (elenst) wrote :

Hi Peter,

You mentioned earlier that you tried the query on your dev/staging instances; but the query that came with the backtrace you provided is corrupted (all of the GROUP BY and part of the ORDER BY in the first union part are gone, as well as the end of the query). Do you have the full version?

Peter (Stig) Edwards wrote :

Sorry; I have resent it with the original full version. Thanks.

Peter (Stig) Edwards wrote :

I was able to take dumps of the tables (changes are frequent, so the data is not the same as at the point of the crash) and import them into a development instance. The development instance has the same my.cnf apart from the port and a smaller innodb_buffer_pool_size. This was with the latest crash query, and it did not crash the development instance. Some of the data is sensitive/private and I am unable to send it to you without first changing much of it, so until I have a reproducer I will hold off on changing the data so that it can be sent.
Thanks for looking.
