Erroneous sql dump crashes drizzled

Bug #307317 reported by Andrew Hutchings
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Drizzle
Fix Released
Critical
Eric Day

Bug Description

I'm using up to date drizzle trunk (2008-12-12) on Fedora 10 x86_64.

The attached PHP script will generate a basic dump file. This dump file is missing semi-colons off the INSERTs so it is expected to generate some kind of error. When inserting with "drizzle < drizzle.sql" I instead get the following error:

ERROR 2006 (HY000) at line 100000: Drizzle server has gone away

and the following crash in drizzled:

drizzled: sql_parse.cc:386: bool do_command(Session*): Assertion `session->is_error()' failed.
081212 0:00:40 - drizzled got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
 or one of the libraries it was linked against is corrupt, improperly built,
 or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388600
read_buffer_size=131072
max_used_connections=2
max_threads=8
thread_count=1
connection_count=1
It is possible that drizzled could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 26776 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

session: 0x1b20520
Attempting backtrace. You can use the following information to find out
where drizzled died. If you see no messages after this, something went
terribly wrong...
../drizzle/sbin/drizzled [0x4fd84e]
../drizzle/sbin/drizzled [0x421727]
/lib64/libpthread.so.0 [0x311a80f0f0]
/lib64/libc.so.6(gsignal+0x35) [0x3119c32ed5]
/lib64/libc.so.6(abort+0x183) [0x3119c34a43]
/lib64/libc.so.6(__assert_fail+0xe9) [0x3119c2bec9]
../drizzle/sbin/drizzled [0x4bfef4]
../drizzle/sbin/drizzled [0x477320]
/lib64/libpthread.so.0 [0x311a8073da]
/lib64/libc.so.6(clone+0x6d) [0x3119ce627d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
session->query at (nil) is invalid pointer
session->thread_id=1
session->killed=NOT_KILLED

Revision history for this message
Andrew Hutchings (linuxjedi) wrote :
Revision history for this message
Jay Pipes (jaypipes) wrote :

This bug has continually been a pain in my ass as well, while developing the test automation framework. I traced the server with GDB and the assert() occurs after net_real_read() happens and returns a packet_length == ~(uint32_t) 0, which happens to be the definition of packet_error.

Trying to verify what was going on, I traced the error further and ended up in the macros defined in /drizzled/korr.h, which left me scratching my head, since drizzled wasn't compiled with debug symbols and thus the macros were optimized away and I couldn't get any information out of GDB.

I've recompiled drizzled with debug symbols but haven't had time to do another trace. Would be very helpful to get some input from others on this. At some point, we've changed the protocol or warning/error handling and the changes have borked the communication between server and client.

Revision history for this message
Jay Pipes (jaypipes) wrote :

Currently, I cannot confirm the behaviour via the drizzle command line client (though for me it appears in other places). Here is a python script to generate the SQL which caused the original failure:

import random

f= open("drizzle.sql","w")
f.write("USE test;\n")
f.write("""CREATE TABLE `testing` (
  `id` int,
  `rand` int,
  PRIMARY KEY (id));\n""")

for i in range (1,10000):
  f.write("INSERT INTO testing SET id=%d, rand=%d\n" % (i, random.randrange(1,10000)))
f.close()

Unfortunately, I cannot reproduce the bug:

jpipes@serialcoder:~/repos/drizzle/new-test-runner/tests$ ../client/drizzle --port=9306 --user=root < drizzle.sql
ERROR 1064 (42000) at line 10005: You have an error in your SQL syntax; check the manual that corresponds to your Drizzle server version for the right syntax to use near 'INSERT INTO testing SET id=2, rand=5715
INSERT INTO testing SET id=3, rand=3418
' at line 2

Revision history for this message
Andrew Hutchings (linuxjedi) wrote :

Yep, I get that with your script too, try increasing the for loop to 100000, this seems to trigger it for me.

Revision history for this message
Andrew Hutchings (linuxjedi) wrote :

Trace with debugging symbols:

drizzled: sql_parse.cc:386: bool do_command(Session*): Assertion `session->is_error()' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffef472950 (LWP 6884)]
0x0000003119c32ed5 in raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Current language: auto; currently c
(gdb) bt
#0 0x0000003119c32ed5 in raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x0000003119c34a43 in abort () at abort.c:88
#2 0x0000003119c2bec9 in __assert_fail (
    assertion=0x79fad8 "session->is_error()", file=0x79f473 "sql_parse.cc",
    line=386, function=0x7a0110 "bool do_command(Session*)") at assert.c:78
#3 0x00000000004d9c6f in do_command (session=0x10ea6e0) at sql_parse.cc:386
#4 0x00000000004852f8 in libevent_thread_proc (arg=0x0) at scheduler.cc:535
#5 0x000000311a8073da in start_thread (arg=<value optimized out>)
    at pthread_create.c:297
#6 0x0000003119ce627d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

Revision history for this message
Jay Pipes (jaypipes) wrote :

Yes, sure enough, increasing the loop by a power of 10 to 100000 causes the server to crash.

Changing this to a CRITICAL bug. This is a major security hole, since any client can throw a bunch of data at the drizzled server and cause it to crash.

Revision history for this message
Jay Pipes (jaypipes) wrote :

The following is from sql_parse.cc:363-398

  /*
    This thread will do a blocking read from the client which
    will be interrupted when the next command is received from
    the client, the connection is closed or "net_wait_timeout"
    number of seconds has passed
  */
  my_net_set_read_timeout(net, session->variables.net_wait_timeout);

  /*
    XXX: this code is here only to clear possible errors of init_connect.
    Consider moving to init_connect() instead.
  */
  session->clear_error(); // Clear error message
  session->main_da.reset_diagnostics_area();

  net_new_transaction(net);

  packet_length= my_net_read(net);
  if (packet_length == packet_error)
  {
    /* Check if we can continue without closing the connection */

    /* The error must be set. */
    assert(session->is_error());
    net_end_statement(session);

    if (net->error != 3)
    {
      return_value= true; // We have to close it.
      goto out;
    }

    net->error= 0;
    return_value= false;
    goto out;
  }

The assert(session->is_error()); after checking of packet_length == packet_error, is the problem. Sure, asserting that the session has an error set is a good safety check, however throwing an assert any time after reading from a client connection (my_net_read()) is a security problem, since the client can send junk and crash the server.

Steps that need to be done:

1) Identify *why* the if block is being entered. We are using ~(uint32_t) 0 as the packet_error code. Does the my_net_read() function incorrectly return ~(uint32_t) 0 when a large packet is being sent? Check into this.

2) Change the assert() to be a graceful fail and a close of the connection, not a complete bomb of the server...

Changed in drizzle:
importance: Undecided → Critical
status: New → Confirmed
Revision history for this message
Jay Pipes (jaypipes) wrote :

I created a test case based on the output of the script above, ran it through ./dtr and received the following:

main.bug307317 [ fail ]

drizzletest: At line 6: The input buffer is too small for this query.x
check your query or increase MAX_QUERY and recompile

The result from queries just before the failure was:
USE test;
CREATE TABLE IF NOT EXISTS `testing` (
`id` int,
`rand` int,
PRIMARY KEY (id));

Therefore, I suspect our current testing suite may not be capable of reproducing this bug...

Revision history for this message
Jay Pipes (jaypipes) wrote :

Assigning to myself. I traced through GDB and have found the source of the problem, at least in this case....

Changed in drizzle:
assignee: nobody → jaypipes
Revision history for this message
Jay Pipes (jaypipes) wrote :

The problem is in my_net_read(). After the call to net_realloc() (drizzled/protocol.cc:720), which occurs when the incoming packet length for multi-packet statements is greater than net->max_packet, my_net_read() fails to set the error on the *Session.

Revision history for this message
Jay Pipes (jaypipes) wrote :

FYI, this bug still exists on trunk:

jpipes@serialcoder:~/repos/drizzle/trunk-with-temporal/tests$ ../client/drizzle --user=root --port=9306 < drizzle.sql
ERROR 2006 (HY000) at line 100005: Drizzle server has gone away

Changed in drizzle:
assignee: jaypipes → eday
milestone: none → aloha
Revision history for this message
Lee Bieber (kalebral-deactivatedaccount) wrote :

I'm marking this as fixed as I can no longer reproduce this when running on Solaris, Mac and Ubuntu. Tested with build 1103

Changed in drizzle:
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.