Second valgrind warning /crash in hp_process_field_data_to_chunkset with an out-of-memory situation

Bug #790828 reported by Philip Stoev on 2011-05-31
This bug affects 1 person
Affects: percona-projects-qa | Importance: Low | Assigned to: Alexey Kopytov

Bug Description

When executing an RQG stress test under valgrind, memory consumption grew suddenly (most likely due to trying to insert too many 2MB blobs in a table) and the following was produced in the server error log file:

110531 17:12:08 [ERROR] /home/philips/bzr/mysql-55-eb/sql/mysqld: Out of memory (Needed 129872 bytes)
==16380== Thread 19:
==16380== Invalid write of size 1
==16380== at 0x4007634: memcpy (mc_replace_strmem.c:497)
==16380== by 0x8617123: hp_process_field_data_to_chunkset (hp_record.c:173)
==16380== by 0x861733D: hp_process_record_data_to_chunkset (hp_record.c:276)
==16380== by 0x86173C4: hp_copy_record_data_to_chunkset (hp_record.c:306)
==16380== by 0x8618172: heap_update (hp_update.c:66)
==16380== by 0x860FEB8: ha_heap::update_row(unsigned char const*, unsigned char*) (ha_heap.cc:265)
==16380== by 0x835A24A: handler::ha_update_row(unsigned char const*, unsigned char*) (handler.cc:4806)
==16380== by 0x8293F8F: mysql_update(THD*, TABLE_LIST*, List<Item>&, List<Item>&, Item*, unsigned int, st_order*, unsigned long long, enum_duplicates, bool, unsigned long long*, unsigned long long*) (sql_update.cc:713)
==16380== by 0x8204368: mysql_execute_command(THD*) (sql_parse.cc:2662)
==16380== by 0x820C025: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5503)
==16380== by 0x82006ED: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1034)
==16380== by 0x81FFBDB: do_command(THD*) (sql_parse.cc:771)
==16380== by 0x82D03B8: do_handle_one_connection(THD*) (sql_connect.cc:776)
==16380== by 0x82D007B: handle_one_connection (sql_connect.cc:724)
==16380== by 0x821918: start_thread (in /lib/libpthread-2.12.1.so)
==16380== by 0x76ACCD: clone (in /lib/libc-2.12.1.so)
==16380== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==16380==

I interpret this to mean that a certain memory operation could not be completed, returned 0, and this 0 was subsequently used by the heap storage engine. A cursory code inspection showed that the return values of most memory management calls are checked, but not all of them.

I can provide a test case for this bug, however a code inspection may be the best way to fix this situation.

The core and the binary are available if needed, both locally and remotely; the compressed size is 2 GB.

Philip Stoev (pstoev-askmonty) wrote :

mysql bzr version-info
revision-id: <email address hidden>
date: 2011-05-31 11:33:25 +0300
build-date: 2011-05-31 21:44:55 +0300
revno: 3483
branch-nick: mysql-55-eb

RQG bzr version-info
revision-id: <email address hidden>
date: 2011-05-31 14:18:45 +0200
build-date: 2011-05-31 21:45:08 +0300
revno: 809
branch-nick: randgen-heap

RQG command line:

perl runall.pl --queries=100000000 --validator=None --queries=100M --mysqld=--log-output=file --seed=time --mysqld=--max_heap_table_size=3Gb --threads=2 --grammar=conf/engines/heap/heap_ddl_multi.yy --basedir1=/home/philips/bzr/mysql-55-eb --valgrind --duration=21600

description: updated
Changed in percona-projects-qa:
milestone: none → 5.5.13-eb
Alexey Kopytov (akopytov) wrote :

Code inspection has not revealed any code paths that might lead to a NULL pointer dereference. Manual tests of inserting BLOBs while emulating OOM in a debugger show correct behavior: the "out of memory" error is returned to the client.

I'm now running the randgen test with the reported command line. The test has been running for ~2 hours so far with no errors.

Alexey Kopytov (akopytov) wrote :

Setting to low importance, since it doesn't look like a showstopper to me.

The bug seems to be valid, even though it's not yet clear what exactly leads to that state. There are lots of OOM bugs in the server, so a loaded server will likely fail under an OOM condition anyway, if not in HEAP then elsewhere.

The workaround is to set max_heap_table_size appropriately.

Changed in percona-projects-qa:
assignee: nobody → Alexey Kopytov (akopytov)
importance: Undecided → Low
Philip Stoev (pstoev-askmonty) wrote :

I do agree that there are other places in the server where OOM is not handled properly.

This particular test shows a failure when an UPDATE statement tries to update all records in a table to the largest blob from randgen's data directory. So if there are any unguarded paths, they should be in the UPDATE path rather than the INSERT path.

Alexey Kopytov (akopytov) wrote :

The randgen test has completed in 7 hours with no errors.

Philip Stoev (pstoev-askmonty) wrote :

Ok, let me know if you ever want to work on this bug and I will provide a resource-constrained VM where the problem is reproducible in 2 hours or so. I will not be doing any further testing that causes OOM.
