dbt2 fails with 1024 connections

Bug #500031 reported by Lee Bieber
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Drizzle
Fix Released
High
Jay Pipes

Bug Description

dbt2 will randomly fail when trying to run with 1024 connections, builds before that are fine. It appears that drizzled stops responding - see the logs on orisndriz01 at http://hudson.drizzle.org/view/Drizzle-staging/job/Drizzle-dbt2-staging/45/console as an example, specifically client.out, driver.out and driver/error.log

See the complete log file at http://hudson.drizzle.org/view/Drizzle-staging/job/Drizzle-dbt2-staging/45/console

From drizzled error log:
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled: Forcing close of thread 0 user: ''
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled: ready for connections.
Wed Dec 23 13:37:24 2009 INFO: Version: '2009.12.1247' Source distribution (staging-dbt2-dbt2-r1247)
Wed Dec 23 13:37:24 2009 INFO: terminate called after throwing an instance of 'std::out_of_range'
Wed Dec 23 13:37:24 2009 INFO: what(): basic_string::substr
Wed Dec 23 13:37:24 2009 INFO: 091223 13:32:28 - drizzled got signal 6;
Wed Dec 23 13:37:24 2009 INFO: This could be because you hit a bug. It is also possible that this binary
Wed Dec 23 13:37:24 2009 INFO: or one of the libraries it was linked against is corrupt, improperly built,
Wed Dec 23 13:37:24 2009 INFO: or misconfigured. This error can also be caused by malfunctioning hardware.
Wed Dec 23 13:37:24 2009 INFO: We will try our best to scrape up some info that will hopefully help diagnose
Wed Dec 23 13:37:24 2009 INFO: the problem, but since we have already crashed, something is definitely wrong
Wed Dec 23 13:37:24 2009 INFO: and this may fail.
Wed Dec 23 13:37:24 2009 INFO:
Wed Dec 23 13:37:24 2009 INFO: key_buffer_size=8388608
Wed Dec 23 13:37:24 2009 INFO: read_buffer_size=131072
Wed Dec 23 13:37:24 2009 INFO: max_used_connections=1024
Wed Dec 23 13:37:24 2009 INFO: connection_count=1024
Wed Dec 23 13:37:24 2009 INFO: It is possible that drizzled could use up to
Wed Dec 23 13:37:24 2009 INFO: key_buffer_size + (read_buffer_size + sort_buffer_size)*thread_count
Wed Dec 23 13:37:24 2009 INFO: bytes of memory
Wed Dec 23 13:37:24 2009 INFO: Hope that's ok; if not, decrease some variables in the equation.
Wed Dec 23 13:37:24 2009 INFO:
Wed Dec 23 13:37:24 2009 INFO: session: 0x3cd6b70
Wed Dec 23 13:37:24 2009 INFO: Attempting backtrace. You can use the following information to find out
Wed Dec 23 13:37:24 2009 INFO: where drizzled died. If you see no messages after this, something went
Wed Dec 23 13:37:24 2009 INFO: terribly wrong...
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled [0x61a90e]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled(print_stacktrace+0xd) [0x61aa5d]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled(handle_segfault+0x343) [0x4eec03]
Wed Dec 23 13:37:24 2009 INFO: /lib/libpthread.so.0 [0x7fa2b0b13080]
Wed Dec 23 13:37:24 2009 INFO: /lib/libc.so.6(gsignal+0x35) [0x7fa2b07c4fb5]
Wed Dec 23 13:37:24 2009 INFO: /lib/libc.so.6(abort+0x183) [0x7fa2b07c6bc3]
Wed Dec 23 13:37:24 2009 INFO: /usr/lib/libstdc++.so.6(__gnu_cxx::__verbose_terminate_handler()+0x114) [0x7fa2b12860b4]
Wed Dec 23 13:37:24 2009 INFO: /usr/lib/libstdc++.so.6 [0x7fa2b12844b6]
Wed Dec 23 13:37:24 2009 INFO: /usr/lib/libstdc++.so.6 [0x7fa2b12844e3]
Wed Dec 23 13:37:24 2009 INFO: /usr/lib/libstdc++.so.6 [0x7fa2b12845ca]
Wed Dec 23 13:37:24 2009 INFO: /usr/lib/libstdc++.so.6(std::__throw_out_of_range(char const*)+0x67) [0x7fa2b1220777]
Wed Dec 23 13:37:24 2009 INFO: /usr/lib/libstdc++.so.6 [0x7fa2b1263223]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled(drizzled::TemporalFormat::matches(char const*, unsigned long, drizzled::Temporal*)+0x250) [0x63a240]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled(drizzled::DateTime::from_string(char const*, unsigned long)+0x46) [0x639646]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled(Field_timestamp::store(char const*, unsigned int, charset_info_st const*)+0x40) [0x500050]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled(Field::store_time(st_drizzle_time*, enum_drizzle_timestamp_type)+0x41) [0x4f4ac1]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled(fill_record(Session*, List<Item>&, List<Item>&, bool)+0x74) [0x5d4ad4]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled(mysql_update(Session*, TableList*, List<Item>&, List<Item>&, Item*, unsigned int, order_st*, unsigned long, enum_duplicates, bool)+0xa77) [0x60b197]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled(drizzled::statement::Update::execute()+0xd5) [0x629015]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled [0x5edf5a]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled [0x5eeaa7]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled(dispatch_command(enum_server_command, Session*, char*, unsigned int)+0x3ba) [0x5efa2a]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled(Session::executeStatement()+0x91) [0x5bd7d1]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/drizzled/drizzled(Session::run()+0x97) [0x5bec27]
Wed Dec 23 13:37:24 2009 INFO: /home/drizzle/repos/staging-dbt2-dbt2-r1247/plugin/.libs/libmulti_thread_plugin.so(session_thread+0x62) [0x7fa29efcf972]
Wed Dec 23 13:37:24 2009 INFO: /lib/libpthread.so.0 [0x7fa2b0b0b3ba]
Wed Dec 23 13:37:24 2009 INFO: /lib/libc.so.6(clone+0x6d) [0x7fa2b0877fcd]
Wed Dec 23 13:37:24 2009 INFO: Trying to get some variables.
Wed Dec 23 13:37:24 2009 INFO: Some pointers may be invalid and cause the dump to abort...
Wed Dec 23 13:37:24 2009 INFO: session->query at 0x3cdc4b8 = UPDATE order_line
Wed Dec 23 13:37:24 2009 INFO: SET ol_delivery_d = current_timestamp
Wed Dec 23 13:37:24 2009 INFO: WHERE ol_o_id = 2101
Wed Dec 23 13:37:24 2009 INFO: AND ol_w_id = 10
Wed Dec 23 13:37:24 2009 INFO: AND ol_d_id = 3
Wed Dec 23 13:37:24 2009 INFO: session->thread_id=257
Wed Dec 23 13:37:24 2009 INFO: session->killed=NOT_KILLED
Wed Dec 23 13:37:24 2009 INFO: pure virtual method called
Wed Dec 23 13:37:24 2009 INFO: terminate called recursively

Related branches

Revision history for this message
Lee Bieber (kalebral-deactivatedaccount) wrote :

Attaching valgrind out put, this part is pretty scary!

valgrind: the 'impossible' happened:
   VG_N_THREADS is too low
==17024== at 0x3802A7AC: report_and_quit (m_libcassert.c:140)
==17024== by 0x3802A8C4: panic (m_libcassert.c:215)
==17024== by 0x3802A932: vgPlain_core_panic_at (m_libcassert.c:220)
==17024== by 0x3802A951: vgPlain_core_panic (m_libcassert.c:225)
==17024== by 0x3804CE91: vgPlain_alloc_ThreadState (scheduler.c:187)
==17024== by 0x380966F9: vgSysWrap_amd64_linux_sys_clone_before (syswrap-amd64-linux.c:205)
==17024== by 0x380501A0: vgPlain_client_syscall (syswrap-main.c:942)
==17024== by 0x3804D672: handle_syscall (scheduler.c:824)
==17024== by 0x3804E676: vgPlain_scheduler (scheduler.c:1018)
==17024== by 0x38060CB0: run_a_thread_NORETURN (syswrap-linux.c:89)

Changed in drizzle:
milestone: none → bell
importance: Undecided → High
status: New → Confirmed
assignee: nobody → Brian Aker (brianaker)
Revision history for this message
Lee Bieber (kalebral-deactivatedaccount) wrote :

Ignore the line in the original description "builds before that are fine", I've gotten it to fail on many different builds

description: updated
Revision history for this message
Jay Pipes (jaypipes) wrote :

I'm so happy this showed up! \o/

Bug 388889 never had a repeatable test case and now this is it :)

Assigning to myself as it seems to be caused by TemporalFormat.

-jay

Changed in drizzle:
assignee: Brian Aker (brianaker) → Jay Pipes (jaypipes)
status: Confirmed → In Progress
Revision history for this message
Jay Pipes (jaypipes) wrote :

After investigation into this, I discovered that there was a race condition in TemporalFormat::match():

TemporalFormat::_re is the compiled PCRE regular expression object inside each of the TemporalFormat objects, which are shared among all threads and live in global scope.

Unfortunately, TemporalFormat::match() was using TemporalFormat::_match_vector as its match state. At high concurrency, this means that the following could happen:

Thread 1 executes pcre_exec() and finds a match, therefore populating TemporalFormat::_match_vector of integers with the position offsets of the matched pieces of the temporal object.

Thread 1, during construction of the Temporal output of TemporalFormat::match(), uses these _match_vector position offsets in calling std::string::substr on a copy of the matched string, essentially "cutting up" the string into year, month, day, etc.

Thread 2 executes pcre_exec() and also finds a match, thereby changing TemporalFormat::_match_vector to something different

Thread 1 continues trying to use std::string::substr(), but now uses offsets that are invalid for its string, thereby producing an out_of_range exception.

The solution is to pull the TemporalFormat::_match_vector member variable and instead put a function-scope-level match_vector variable on the stack inside TemporalFormat::match().

Jay Pipes (jaypipes)
Changed in drizzle:
status: In Progress → Fix Committed
Jay Pipes (jaypipes)
Changed in drizzle:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Bug attachments