MariaDB 5.3.3-rc is erroneously (I think) reporting "Out of memory" and then crashing

Bug #929603 reported by Peter (Stig) Edwards
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
MariaDB
Fix Committed
Undecided
Unassigned

Bug Description

Thank you for MariaDB 5.3.3-rc

MariaDB 5.3.3-rc is erroneously (I think) reporting "Out of memory" and then crashing.

From the stack trace it looks like it is creating an internal tmp table.

Sorry, I do not have a reproducer or a core dump. I do have several monitoring systems that collectively run frequently and none show any data that indicate the host (Red Hat EL 5.7 x86_64, 48G RAM, 2G swap, glibc-2.5-65, ) was low on memory (no swap used or swapping, only significant process was mysqld using vsz=28701476 rss=27949352 21s before the crash, 14s before the 2nd "Out of memory" in the error log and 30s before the 1st "Out of memory". The user running mysqld does have a max memory size and virtual memory ulimits of 48G set (I just lowered this to 44G).

One thing I noticed in the crash log is the key_buffer_size=0 , it's not, it is 25769803776 (key_buffer = 24576M), I have noticed this in a few other MariaDB bugs.

I will share a my.cnf file if I can isolate a reproducer, the my.cnf file does have max_heap_table_size = 2048MB (2147483648) in it, while tmp_table_size is the default 16777216. aria_sort_buffer_size is the default 128m, hmm I wonder if this should be at least the same as max_heap_table_size (to avoid the crash). I need to bump it up for performance anyway.

 I'll work on getting a core file to be produced if it crashes again.

120208 22:56:38 [ERROR] mysqld: Out of memory (Needed 2256587384 bytes)
120208 22:56:38 [ERROR] Out of memory; check if mysqld or some other process uses all available memory; if not, you may have to use 'ulimit' to allow mysqld to use more memory or you can add more swap space
120208 22:57:16 [ERROR] mysqld: Out of memory (Needed 2572758680 bytes)
120208 22:57:16 [ERROR] Out of memory; check if mysqld or some other process uses all available memory; if not, you may have to use 'ulimit' to allow mysqld to use more memory or you can add more swap space
120208 22:57:23 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

Server version: 5.3.3-MariaDB-rc-log
key_buffer_size=0
read_buffer_size=8388608
max_used_connections=18
max_threads=153
threads_connected=7
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2038701 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x2ab0cc9319e0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x402fc0f8 thread_stack 0x48000
./bin/mysqld(my_print_stacktrace+0x2e) [0xa295ae]
./bin/mysqld(handle_segfault+0x49f) [0x63ef0f]
/lib64/libpthread.so.0 [0x3519a0eb70]
./bin/mysqld(create_internal_tmp_table(st_table*, st_key*, st_maria_columndef*, st_maria_columndef**, unsigned long long)+0x218) [0x6aae88]
./bin/mysqld(mysql_derived_create(THD*, st_lex*, TABLE_LIST*)+0xc4) [0x7ac904]
./bin/mysqld(mysql_handle_single_derived(st_lex*, TABLE_LIST*, unsigned int)+0xb8) [0x7ad8f8]
./bin/mysqld(st_join_table::preread_init()+0xa1) [0x6ab6b1]
./bin/mysqld(sub_select(JOIN*, st_join_table*, bool)+0x496) [0x6b08a6]
./bin/mysqld [0x6bc385]
./bin/mysqld(JOIN::exec()+0x842) [0x6cf192]
./bin/mysqld(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x185) [0x6d1225]
./bin/mysqld(mysql_multi_update(THD*, TABLE_LIST*, List<Item>*, List<Item>*, Item*, unsigned long long, enum_duplicates, bool, st_select_lex_unit*, st_select_lex*)+0x115) [0x6eb085]
./bin/mysqld(mysql_execute_command(THD*)+0xcb0) [0x64afe0]
./bin/mysqld(mysql_parse(THD*, char*, unsigned int, char const**)+0x299) [0x650b09]
./bin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0xa54) [0x6519b4]
./bin/mysqld(do_command(THD*)+0x101) [0x6524e1]
./bin/mysqld(handle_one_connection+0xfd) [0x6439ed]
/lib64/libpthread.so.0 [0x3519a0673d]
/lib64/libc.so.6(clone+0x6d) [0x3518ed44bd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x2ab0ce944d48): /* comment removed */ UPDATE p_prelim_org AS org JOIN ( SELECT encore_load, cfterm_id, GROUP_CONCAT( CONCAT("[",label,"]") ORDER BY indent ASC SEPARATOR '' ) AS full_label, MAX( indent ) + 1 AS indent /* looking for the data's indent level, not the section */ FROM p_raw_sections GROUP BY encore_load, cfterm_id ) AS slabels ON org.encore_load = slabels.encore_load AND org.cfterm_id = slabels.cfterm_id SET org.section_label = slabels.full_label, org.indent = slabels.indent
Connection ID (thread ID): 43272
Status: NOT_KILLED
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Revision history for this message
Peter (Stig) Edwards (thatsafunnyname) wrote :
Download full text (5.4 KiB)

I have a reproducer for the "Out of memory" error being logged (not a crash), it is currently not trivial.
After increasing aria_pagecache_buffer_size from 128MB to 2048MB I no longer saw the "Out of memory" error in the error log, instead the mysql client returns "Using too big key for internal temp tables". Both errors go away when I remove the line:
  group_concat_max_len=131072
from the my.cnf file. Without the line the reported global default value of 1024 is being used. I also get the error using 65536, 32768, 16384, 8192, 4096 and 2048, only with 1024 does it run error free.

The queries with the errors are using GROUP_CONCAT.

"Out of memory" query:

UPDATE
  p_prelim_org AS pre
JOIN ( SELECT
         pre.encore_load,
         pre.cfterm_id,
         GROUP_CONCAT( CONCAT("[", fin_sec.label, "]") ORDER BY fin_sec.indent ASC SEPARATOR '' ) AS sub_label,
         MAX( fin_sec.indent ) AS sub_indent
       FROM
         p_prelim_org AS pre
         JOIN
         p_prelim_org AS sub_check
         ON pre.encore_load = sub_check.encore_load AND
            pre.cfterm_id_match = sub_check.cfterm_id AND
            pre.status = "MATCHED" AND
            pre.whos_structure = "FINAL" AND
            pre.indent_match + 1 < sub_check.indent AND
            pre.table_type = "CF"
         JOIN
         p_final_sections AS fin_sec
         ON pre.encore_load = fin_sec.encore_load AND
            pre.cfterm_id_match = fin_sec.cfterm_id AND
            pre.indent_match >= fin_sec.indent
            GROUP BY pre.encore_load, pre.cfterm_id
      ) AS sub_sec
        ON pre.encore_load = sub_sec.encore_load AND
           pre.cfterm_id = sub_sec.cfterm_id
        SET pre.whos_structure = "PRELIM", pre.section_label = sub_sec.sub_label,
            pre.indent = sub_sec.sub_indent + 1, pre.source = CONCAT( 'Used FINAL struct, ', pre.source )

EXPLAIN on the inner select:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: pre
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 76
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: sub_check
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 5
          ref: ffar_load_a.pre.encore_load,ffar_load_a.pre.cfterm_id_match
         rows: 1
        Extra: Using where
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: fin_sec
         type: ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 5
          ref: ffar_load_a.pre.encore_load,ffar_load_a.pre.cfterm_id_match
         rows: 2
        Extra: Using index condition

"Using too big key for internal temp tables" query:

UPDATE
  p_prelim_org AS org
JOIN ( SELECT
         encore_load,
         cfterm_id,
     ...

Read more...

Revision history for this message
Steffen Plotner (swplotner) wrote :
Download full text (5.0 KiB)

Hi,

I would like to comment that I have the same problem. We used to run (and now still do) MariaDB-server-5.3.2-102.el5.x86_64.rpm, I installed 3 days ago, MariaDB-server-5.3.5-113.el5.x86_64.rpm on a CentOS 5 series server. Within 1 day of running, there was a query done once a day which caused the following crash:

No swap space was every used, the system has 8GB of RAM - I doubt that the system could not allocate 64MB or 151MB as it indicates below. I am afraid that the problem exists past the 5.3.2 release. To be clear, the problem does not exist in 5.3.2 - I have been running that for the last 6 months without a single hickup - and I have been very pleased with that stability.

Steffen

120301 11:56:12 Percona XtraDB (http://www.percona.com) 1.0.17-13.0 started; log sequence number 1403936934864
120301 11:56:12 [Note] Event Scheduler: Loaded 0 events
120301 11:56:12 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.3.5-MariaDB-ga-mariadb113-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 (MariaDB - http://mariadb.com/)
120302 8:28:05 [ERROR] mysqld: Out of memory (Needed 67372056 bytes)
120302 8:28:05 [ERROR] Out of memory; check if mysqld or some other process uses all available memory; if not, you may have to use 'ulimit' to allow mysqld to use more memory or you can add more swap space
120304 8:27:17 [ERROR] mysqld: Out of memory (Needed 158575640 bytes)
120304 8:27:17 [ERROR] Out of memory; check if mysqld or some other process uses all available memory; if not, you may have to use 'ulimit' to allow mysqld to use more memory or you can add more swap space
2d2d2d d:2d:2d [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 5.3.5-MariaDB-ga-mariadb113-log
key_buffer_size=2147483648
read_buffer_size=2097152
max_used_connections=135
max_threads=251
thread_count=71
connection_count=71
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3128499 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x2aab98687540
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 514010f8 thread_stack 0x48000
/usr/sbin/mysqld(my_print_stacktrace+0x2e) [0xa0a26e]
/usr/sbin/mysqld(handle_fatal_signal+0x427) [0x741dc7]
/lib64/libpthread.so.0 [0x358540eb10]
/usr/sbin/mysqld(create_internal_tmp_table(st_table*, st_key*, st_maria_columndef*, st_maria_columndef**, unsigned long long)+0x218) [0x68a888]
/usr/sbin/mysqld(mysql_derived_create(THD*, st_lex*, TABLE_LIST*)+0xc4) [0x78d404]
/usr/sbin/mysqld(mysql_handle_single_derived(st_lex*, TABLE_LIST*, unsigned i...

Read more...

Revision history for this message
Elena Stepanova (elenst) wrote :

Hi Peter,

If the problem you described is still reproducible on 5.3.5, could you please add the remaining info (DDL for the table p_final_sections used in the first query, and also contents of the tables, if possible -- it would help a lot).

Thank you!

Hi Steffen,

Do you have any more information about the failure? Did it happen only once, or is it recurring? Were you able to identify the query that causes the problem?

Thank you,

Elena

Revision history for this message
Peter (Stig) Edwards (thatsafunnyname) wrote :

Hello Elena,

Thank you for working on the other bugs I reported recently.

I am no longer running 5.3.3-rc, currently using 5.3.5-ga.

After increasing aria_pagecache_buffer_size from 128MB to 2048MB which I needed to do for performance, and after removing the global group_concat_max_len=131072 (which I also needed to do) this problem did not repeat in 5.3.3-rc, 5.3.4-rc and 5.3.5-ga.

Sorry I never isolated a small/trivial reproducer. I do not have the DDL for the tables at the time of the problem or the contents of them, and I no longer have a reproducer.

If it occurs again I will aim to isolate a reproducer I can provide to you.

Cheers.

Revision history for this message
Elena Stepanova (elenst) wrote :

Hi Peter, Steffen,

The problem you reported looks the same as described in bug https://bugs.launchpad.net/maria/+bug/955776. The latter exists in 5.3.5 and 5.5.21, but is not reproducible on current maria/5.3 (revno 3459+) and maria/5.5 (revno 3334+). If you are using release builds and encounter the problem again before the fix is released, please try to set optimizer_switch=derived_with_keys=off.

For now, I will close this bug as a duplicate of 955776 -- although yours came earlier, the other one is reproducible on recent releases which makes it easier to chase and track. If you see a reason why it is not a duplicate, please feel free to re-open it.

Thank you.

Changed in maria:
status: New → Invalid
status: Invalid → Incomplete
Revision history for this message
Elena Stepanova (elenst) wrote :

Actually, there is no 'duplicate' in launchpad, so we will close this bug when a fix for the other one has been released.

Revision history for this message
Peter (Stig) Edwards (thatsafunnyname) wrote :

Hello Elena,

  Thank you. I think you may have meant https://bugs.launchpad.net/bugs/957409 and not https://bugs.launchpad.net/bugs/955776 in comment #5 above? I do see "Mark as duplicate" in launchpad, it "will, by default, hide it from search results listings." I agree that the problem I reported looks like https://bugs.launchpad.net/bugs/957409 and am happy for this to be marked as a duplicate.

Cheers.

Revision history for this message
Elena Stepanova (elenst) wrote :

Hi Peter,

You are right of course about the numbers, sorry -- juggling too many at once and ending up with bad copy-pastes.

The problem seems to be the same as described in bug #957409 which, in turn, is another manifestation of bug #953649, so all three should have been fixed by now.

Changed in maria:
status: Incomplete → Fix Committed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.