V3 Segfault/hang on drop table after executing (disallowed) update graph operation

Bug #1233113 reported by Andrew McDonnell on 2013-09-30
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OQGRAPH
Undecided
Andrew McDonnell

Bug Description

Test case:

CREATE TABLE graph_base (
    from_id INT UNSIGNED NOT NULL,
    to_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (from_id,to_id),
    INDEX (to_id)
  ) ENGINE=MyISAM;
CREATE TABLE graph (
    latch VARCHAR(32) NULL,
    origid BIGINT UNSIGNED NULL,
    destid BIGINT UNSIGNED NULL,
    weight DOUBLE NULL,
    seq BIGINT UNSIGNED NULL,
    linkid BIGINT UNSIGNED NULL,
    KEY (latch, origid, destid) USING HASH,
    KEY (latch, destid, origid) USING HASH
  ) ENGINE=OQGRAPH DATA_TABLE='graph_base' ORIGID='from_id', DESTID='to_id';
INSERT INTO graph_base(from_id, to_id) VALUES (1,2), (2,1);
INSERT INTO graph_base(from_id, to_id) VALUES (3,2), (2,3);
INSERT INTO graph_base(from_id, to_id) VALUES (1,3), (3,1);
update graph set origid=123;
DROP TABLE graph_base;
DROP TABLE graph;

#0 malloc_consolidate (av=0x7ffff6d03e40) at malloc.c:5137
#1 0x00007ffff6a19518 in _int_free (av=0x7ffff6d03e40, p=0x1d47950) at malloc.c:5018
#2 0x00007ffff6a1c94c in *__GI___libc_free (mem=<optimized out>) at malloc.c:3739
#3 0x0000000000a68525 in free_root (root=0x7ffff0aafd50, MyFlags=0) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/mysys/my_alloc.c:391
#4 0x000000000061ee02 in TABLE_SHARE::destroy (this=0x1d471b0) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/table.cc:457
#5 0x000000000067d8df in tdc_delete_share_from_hash (share=0x1d471b0) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/table_cache.cc:518
#6 0x000000000067e564 in tdc_release_share (share=0x1d471b0) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/table_cache.cc:953
#7 0x000000000067f21a in tdc_remove_table (thd=<optimized out>, remove_type=TDC_RT_REMOVE_ALL, db=0x1bca258 "test", table_name=<optimized out>, kill_delayed_threads=false) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/table_cache.cc:1108
#8 0x00000000005ff475 in mysql_rm_table_no_locks (thd=0x1d5f298, tables=<optimized out>, if_exists=<optimized out>, drop_temporary=false, drop_view=<optimized out>, dont_log_query=<optimized out>) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/sql_table.cc:2461
#9 0x00000000005ffd6e in mysql_rm_table (thd=0x1d5f298, tables=0x1bc9c90, if_exists=0 '\000', drop_temporary=0 '\000') at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/sql_table.cc:2111
#10 0x000000000058c425 in mysql_execute_command (thd=0x1d5f298) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/sql_parse.cc:3583
#11 0x000000000058f83d in mysql_parse (thd=0x1d5f298, rawbuf=<optimized out>, length=16, parser_state=0x7ffff0ab2c60) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/sql_parse.cc:6264
#12 0x0000000000590d46 in dispatch_command (command=COM_QUERY, thd=0x1d5f298, packet=<optimized out>, packet_length=<optimized out>) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/sql_parse.cc:1277
#13 0x00000000006435f4 in do_handle_one_connection (thd_arg=<optimized out>) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/sql_connect.cc:1379
#14 0x0000000000643680 in handle_one_connection (arg=0x1d5f298) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/sql_connect.cc:1293
#15 0x0000000000822590 in pfs_spawn_thread (arg=<optimized out>) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/storage/perfschema/pfs.cc:1853
#16 0x00007ffff7bc98ca in start_thread (arg=<optimized out>) at pthread_create.c:300
#17 0x00007ffff6a7592d in clone () at ..

Discovered when adding simple test case for operations disallowed by read-only

Andrew McDonnell (andymc73) wrote :

Seems to be limited to `update`, not triggered by insert/delete/truncate

Changed in oqgraph:
assignee: nobody → Andrew McDonnell (andymc73)
status: New → In Progress
Andrew McDonnell (andymc73) wrote :

our code is not in the backtrace so presumably triggered by something writing to bogus memory / not recovering, when update attempted

Andrew McDonnell (andymc73) wrote :

Not sure if relevant yet, discovered in simple `cmake ..` build (for which extant test suite still passes) without my typical CONFIGURE="-DWITH_EXTRA_CHARSETS=complex -DWITH_PLUGIN_ARIA=1-DWITH_READLINE=1 -DWITH_SSL=bundled -DWITH_MAX=1 -DWITH_EMBEDDED_SERVER=1" options

Andrew McDonnell (andymc73) wrote :

Incidentally also happens when closing mysql client immediately after the update operation.

Yields a different stack trace this time

(gdb) bt
#0 malloc_consolidate (av=0x7ffff6d03e40) at malloc.c:5137
#1 0x00007ffff6a19518 in _int_free (av=0x7ffff6d03e40, p=0x24a1ee0) at malloc.c:5018
#2 0x00007ffff6a1c94c in *__GI___libc_free (mem=<optimized out>) at malloc.c:3739
#3 0x0000000000a68525 in free_root (root=0x246fa78, MyFlags=0) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/mysys/my_alloc.c:391
#4 0x000000000056632e in free_memory (this=<optimized out>) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/sql_error.h:729
#5 THD::~THD (this=0x246ab78, __in_chrg=<optimized out>) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/sql_class.cc:1578
#6 0x0000000000515225 in unlink_thd (thd=0x246ab78) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/mysqld.cc:2672
#7 0x0000000000519c50 in one_thread_per_connection_end (thd=0x7ffff6d03e40, put_in_cache=false) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/mysqld.cc:2783
#8 0x000000000064351a in do_handle_one_connection (thd_arg=<optimized out>) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/sql_connect.cc:1390
#9 0x0000000000643680 in handle_one_connection (arg=0x246ab78) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/sql/sql_connect.cc:1293
#10 0x0000000000822590 in pfs_spawn_thread (arg=<optimized out>) at /home/andrew/develop/maria/repo/10.0-oqgraph3-varchar/storage/perfschema/pfs.cc:1853
#11 0x00007ffff7bc98ca in start_thread (arg=<optimized out>) at pthread_create.c:300
#12 0x00007ffff6a7592d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#13 0x0000000000000000 in ?? ()

Andrew McDonnell (andymc73) wrote :
Download full text (4.3 KiB)

Valgrind reports multiple errors around the path via ha_oqgraph::close()

Invalid read of size 4
==28028== at 0x272EE579: oqgraph3::cursor_ptr::~cursor_ptr() (oqgraph_thunk.h:132)
==28028== by 0x272EE77A: open_query::edges_cursor::~edges_cursor() (oqgraph_thunk.h:68)
==28028== by 0x272EA0E6: open_query::oqgraph::release_cursor() (graphcore.cc:706)
==28028== by 0x272E7243: ha_oqgraph::external_lock(THD*, int) (ha_oqgraph.cc:1129)
==28028== by 0x6C4A11: handler::ha_external_lock(THD*, int) (handler.cc:5740)
==28028== by 0x7728A0: unlock_external(THD*, TABLE**, unsigned int) (lock.cc:671)
==28028== by 0x772A27: mysql_unlock_tables(THD*, st_mysql_lock*, bool) (lock.cc:382)
==28028== by 0x5519F7: close_thread_tables(THD*) (sql_base.cc:960)
==28028== by 0x5894E1: mysql_execute_command(THD*) (sql_parse.cc:5023)
==28028== by 0x58F83C: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6264)
==28028== by 0x590D45: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1277)
==28028== by 0x6435F3: do_handle_one_connection(THD*) (sql_connect.cc:1379)
==28028== Address 0x1c1f6370 is 0 bytes inside a block of size 80 free'd
==28028== at 0x4C23E0F: operator delete(void*) (vg_replace_malloc.c:387)
==28028== by 0x272EA0CC: open_query::oqgraph::release_cursor() (graphcore.cc:703)
==28028== by 0x272E7243: ha_oqgraph::external_lock(THD*, int) (ha_oqgraph.cc:1129)
==28028== by 0x6C4A11: handler::ha_external_lock(THD*, int) (handler.cc:5740)
==28028== by 0x7728A0: unlock_external(THD*, TABLE**, unsigned int) (lock.cc:671)
==28028== by 0x772A27: mysql_unlock_tables(THD*, st_mysql_lock*, bool) (lock.cc:382)
==28028== by 0x5519F7: close_thread_tables(THD*) (sql_base.cc:960)
==28028== by 0x5894E1: mysql_execute_command(THD*) (sql_parse.cc:5023)
==28028== by 0x58F83C: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6264)
==28028== by 0x590D45: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1277)
==28028== by 0x6435F3: do_handle_one_connection(THD*) (sql_connect.cc:1379)
==28028== by 0x64367F: handle_one_connection (sql_connect.cc:1293)
==28028==
==28028== Invalid write of size 4
==28028== at 0x272EE580: oqgraph3::cursor_ptr::~cursor_ptr() (oqgraph_thunk.h:132)
==28028== by 0x272EE77A: open_query::edges_cursor::~edges_cursor() (oqgraph_thunk.h:68)
==28028== by 0x272EA0E6: open_query::oqgraph::release_cursor() (graphcore.cc:706)
==28028== by 0x272E7243: ha_oqgraph::external_lock(THD*, int) (ha_oqgraph.cc:1129)
==28028== by 0x6C4A11: handler::ha_external_lock(THD*, int) (handler.cc:5740)
==28028== by 0x7728A0: unlock_external(THD*, TABLE**, unsigned int) (lock.cc:671)
==28028== by 0x772A27: mysql_unlock_tables(THD*, st_mysql_lock*, bool) (lock.cc:382)
==28028== by 0x5519F7: close_thread_tables(THD*) (sql_base.cc:960)
==28028== by 0x5894E1: mysql_execute_command(THD*) (sql_parse.cc:5023)
==28028== by 0x58F83C: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6264)
==28028== by 0x590D45: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_...

Read more...

Andrew McDonnell (andymc73) wrote :

Interestingly it looks like int oqgraph::random(bool scan) gets called in spite of our readonliness
and also rnd_next even though we dont do a query!

So it seems that

update graph set x=y;

will actually run a query before it works out we are read-only...

also for some reason two calls to fetch_row which increment the cursor ref count twice...

Andrew McDonnell (andymc73) wrote :

Ultimately seems to be caused by not releasing _rnd_cursor in release_cursor() method, and the this->cursor as well, before freeing the share object

Andrew McDonnell (andymc73) wrote :

That fix though now regressed bug 1133093 hrrm

Hi Andrew

On 30/09/13 23:55, Andrew McDonnell wrote:
> Interestingly it looks like int oqgraph::random(bool scan) gets called in spite of our readonliness
> and also rnd_next even though we dont do a query!
>
> So it seems that
>
> update graph set x=y;
>
> will actually run a query before it works out we are read-only...
>
> also for some reason two calls to fetch_row which increment the cursor
> ref count twice...

Did you see what the debug log trail is - it might also be the optimiser
trying to figure something out.

An UPDATE will definitely always do a read before write.

Regards,
Arjen.
--
Arjen Lentz, Exec.Director @ Open Query (http://openquery.com)
Australian peace of mind for your MySQL/MariaDB infrastructure.

Follow us at http://openquery.com/blog/ & http://twitter.com/openquery

Arjen Lentz (arjen-lentz) wrote :
Download full text (4.9 KiB)

Hi Andrew

On 30/09/13 21:56, Andrew McDonnell wrote:
> Valgrind reports multiple errors around the path via ha_oqgraph::close()

This might be something to run by Sergei Golubchik or another MariaDB
whiz, if you reckon the server core might be doing the wrong thing.
Because our engine operates differently, it might catch bugs that other
engines don't.

> Invalid read of size 4
> ==28028== at 0x272EE579: oqgraph3::cursor_ptr::~cursor_ptr() (oqgraph_thunk.h:132)
> ==28028== by 0x272EE77A: open_query::edges_cursor::~edges_cursor() (oqgraph_thunk.h:68)
> ==28028== by 0x272EA0E6: open_query::oqgraph::release_cursor() (graphcore.cc:706)
> ==28028== by 0x272E7243: ha_oqgraph::external_lock(THD*, int) (ha_oqgraph.cc:1129)
> ==28028== by 0x6C4A11: handler::ha_external_lock(THD*, int) (handler.cc:5740)
> ==28028== by 0x7728A0: unlock_external(THD*, TABLE**, unsigned int) (lock.cc:671)
> ==28028== by 0x772A27: mysql_unlock_tables(THD*, st_mysql_lock*, bool) (lock.cc:382)
> ==28028== by 0x5519F7: close_thread_tables(THD*) (sql_base.cc:960)
> ==28028== by 0x5894E1: mysql_execute_command(THD*) (sql_parse.cc:5023)
> ==28028== by 0x58F83C: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6264)
> ==28028== by 0x590D45: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1277)
> ==28028== by 0x6435F3: do_handle_one_connection(THD*) (sql_connect.cc:1379)
> ==28028== Address 0x1c1f6370 is 0 bytes inside a block of size 80 free'd
> ==28028== at 0x4C23E0F: operator delete(void*) (vg_replace_malloc.c:387)
> ==28028== by 0x272EA0CC: open_query::oqgraph::release_cursor() (graphcore.cc:703)
> ==28028== by 0x272E7243: ha_oqgraph::external_lock(THD*, int) (ha_oqgraph.cc:1129)
> ==28028== by 0x6C4A11: handler::ha_external_lock(THD*, int) (handler.cc:5740)
> ==28028== by 0x7728A0: unlock_external(THD*, TABLE**, unsigned int) (lock.cc:671)
> ==28028== by 0x772A27: mysql_unlock_tables(THD*, st_mysql_lock*, bool) (lock.cc:382)
> ==28028== by 0x5519F7: close_thread_tables(THD*) (sql_base.cc:960)
> ==28028== by 0x5894E1: mysql_execute_command(THD*) (sql_parse.cc:5023)
> ==28028== by 0x58F83C: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6264)
> ==28028== by 0x590D45: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1277)
> ==28028== by 0x6435F3: do_handle_one_connection(THD*) (sql_connect.cc:1379)
> ==28028== by 0x64367F: handle_one_connection (sql_connect.cc:1293)
> ==28028==
> ==28028== Invalid write of size 4
> ==28028== at 0x272EE580: oqgraph3::cursor_ptr::~cursor_ptr() (oqgraph_thunk.h:132)
> ==28028== by 0x272EE77A: open_query::edges_cursor::~edges_cursor() (oqgraph_thunk.h:68)
> ==28028== by 0x272EA0E6: open_query::oqgraph::release_cursor() (graphcore.cc:706)
> ==28028== by 0x272E7243: ha_oqgraph::external_lock(THD*, int) (ha_oqgraph.cc:1129)
> ==28028== by 0x6C4A11: handler::ha_external_lock(THD*, int) (handler.cc:5740)
> ==28028== by 0x7728A0: unlock_external(THD*, TABLE**, unsigned int) (lock.cc:671)
> ==28028== by 0x772A27: mysql_unlock_tables(THD*, st_mysql_lock*, bool) (...

Read more...

Andrew McDonnell (andymc73) wrote :

The valgrind is in our code.

I am suspicious It is looking like something to do with the boost intrusive pointer, where I am finding the reference counter is not all the way back to zero when the storage engine is closed

Aside, having multiple cursor type objects gets very confusing with debugging after a while

i.e. open_query::oqgraph_cursor vs. oqgraph3::cursor classes

once we get merged into 10.0 I think I will look at refactoring some of that with a different naming convention for the 'cursor' objects that walk the graph rather than map onto the database - maybe 'walker' ??

Andrew McDonnell (andymc73) wrote :

I managed to fix it, but I have so much debug in there now I cant tell what actually did it, so I am going to have to add things in/out one at a time. Too late to do that now...

It is likely to be a matter of properly 'releasing all the things' when the table gets unlocked, and in the correct order

Andrew McDonnell (andymc73) wrote :

This bug depends on how maria is built:

(a) Doesn't manifest:

cmake -DWITH_EXTRA_CHARSETS=complex -DWITH_PLUGIN_ARIA=1 -DWITH_READLINE=1 -DWITH_SSL=bundled -DWITH_MAX=1 -DWITH_EMBEDDED_SERVER=1

(b)

Manifests as a hang on drop table after the `update graph set` statement when drop table is next statement
Manifests as a hang on `select * from graph ... order by` statement when that statement is next statement

cmake -DCMAKE_BUILD_TYPE=Debug

(c) same conditions as (a),

but when MTR run with --gdb, results in first-mentioned stack trace downstream of ha_oqgraph::close() in closefrm (b)-1, and outside of our code downstream of mysql_execute_command()

Changed in oqgraph:
status: In Progress → Fix Committed
Andrew McDonnell (andymc73) wrote :

OK, fixed now.

Arjen Lentz (arjen-lentz) wrote :

Hi Andrew

On 02/10/13 21:20, Andrew McDonnell wrote:
> Aside, having multiple cursor type objects gets very confusing with
> debugging after a while
>
> i.e. open_query::oqgraph_cursor vs. oqgraph3::cursor classes
>
> once we get merged into 10.0 I think I will look at refactoring some of
> that with a different naming convention for the 'cursor' objects that
> walk the graph rather than map onto the database - maybe 'walker' ??

Agreed.
The term 'cursor' is already used in SQL context anyway, so best to use
different terms for different code in the server, to avoid confusion.

Regards,
Arjen.
--
Arjen Lentz, Exec.Director @ Open Query (http://openquery.com.au)
Australian peace of mind for your MySQL/MariaDB infrastructure.

Follow us http://openquery.com.au/blog/ & http://twitter.com/openquery

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers