Select query with outer join of 3 _REPOS_ tables from trafci fails with error 201 and ESP core

Bug #1438943 reported by Aruna Sadashiva
This bug affects 1 person
Affects     Status          Importance  Assigned to  Milestone
Trafodion   Fix Committed   High        Justin Du

Bug Description

Trafodion daily build 3/29.

From trafci:
Set schema "_REPOS_";
Select * from metric_query_table, metric_query_aggr_table, metric_session_table;

The query returns error 201 and produces ESP core files with the stack below. The same query works from sqlci.

(gdb) bt
#0 0x00007ffff28b5625 in raise () from /lib64/libc.so.6
#1 0x00007ffff28b6e05 in abort () from /lib64/libc.so.6
#2 0x00007ffff40aaa55 in os::abort(bool) ()
   from /usr/java/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#3 0x00007ffff422af87 in VMError::report_and_die() ()
   from /usr/java/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#4 0x00007ffff40af96f in JVM_handle_linux_signal ()
   from /usr/java/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#5 <signal handler called>
#6 0x00007fffe7962dcf in ?? ()
#7 0x00007ffff6d5b0f7 in eval (this=0x7fffe60c0780, newEntry=0x7ffff7ee36e0,
    moveExpr=0x7fffe7962c50, hashValue=666, rc=0x7ffff7ef5b7c,
    skipMemoryCheck=0) at ../exp/exp_expr.h:368
#8 Cluster::insert (this=0x7fffe60c0780, newEntry=0x7ffff7ee36e0,
    moveExpr=0x7fffe7962c50, hashValue=666, rc=0x7ffff7ef5b7c,
    skipMemoryCheck=0) at ../executor/cluster.cpp:1551
#9 0x00007ffff6da92e5 in insert (this=0x7ffff7ef5968)
    at ../executor/cluster.h:1361
#10 ex_hashj_tcb::workReadInner (this=0x7ffff7ef5968)
    at ../executor/ex_hashj.cpp:1737
#11 0x00007ffff6dad360 in ex_hashj_tcb::workUp (this=0x7ffff7ef5968)
    at ../executor/ex_hashj.cpp:918
#12 0x00007ffff6e9ec73 in ExScheduler::work (this=0x7ffff7ebdfb8,
    prevWaitTime=<value optimized out>) at ../executor/ExScheduler.cpp:328
#13 0x00007ffff6d70842 in ExEspFragInstanceDir::work (this=0x7fffffffc070,
    prevWaitTime=356936) at ../executor/ex_esp_frag_dir.cpp:757
#14 0x000000000040620f in runESP (argc=3, argv=0x7fffffffc498,
    guaReceiveFastStart=0x0) at ../bin/ex_esp_main.cpp:401
#15 0x0000000000406643 in main (argc=3, argv=0x7fffffffc498)
    at ../bin/ex_esp_main.cpp:255

Tags: sql-exe
Changed in trafodion:
assignee: nobody → Selvaganesan Govindarajan (selva-ganesan)
Changed in trafodion:
assignee: Selvaganesan Govindarajan (selva-ganesan) → Justin Du (justin-du-2)
status: New → In Progress
Justin Du (justin-du-2) wrote :

I'm not sure why the query gets an ESP core only when run from trafci, but there is clearly a problem in computing the hash buffer size. Debugging at frame #19 (Cluster::insert()), the hash buffer (bufferPool_) contains:
(gdb) p *(HashBufferSerial*)bufferPool_
$29 = {
  <HashBuffer> = {
    <NABasicObject> = {
      _vptr.NABasicObject = 0x7ffff71b5990,
      h_ = 0x7ffff7ebaeb0
    },
    members of HashBuffer:
    cluster_ = 0x7fffe60c0780,
    bufferSize_ = 412360,
    maxRowLength_ = 412360,
    isVariableLength_ = 0,
    considerBufferDefrag_ = 0,
    rows_ = 0x7fffdb16ce08 "",
    maxNumFullRowsSer_ = 0,
    freeSpace_ = 412344,
    currRow_ = 0x0,
    nextAvailRow_ = 0x7fffdb16ce08 "",
    next_ = 0x7fffe60c08b0,
    prev_ = 0x0,
    heap_ = 0x0,
    {
      data_ = 0x7fffdb16ce00 "",
      header_ = 0x7fffdb16ce00
    }
  }, <No data fields>}

This causes the call at line 1520 of sql/executor/cluster.cpp
   dataPointer = bufferPool_->castToSerial()->getFreeRow();
to return null. That null pointer may then be used as the target address when storing rows from the right side in the hash buffer.

We need to fix this maxRowLength_ problem first (possibly in the generator) and then retry.
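A simplified model of the failure, to illustrate why getFreeRow() comes back null with the values in the dump. SerialBufferModel and its members are hypothetical stand-ins, not the real HashBufferSerial class. The 8 + 8 bytes of overhead are chosen to match the gap between bufferSize_ (412360) and freeSpace_ (412344), and the row length is assumed to equal maxRowLength_.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical overheads (not Trafodion constants): an 8-byte buffer
// header plus 8 extra bytes excluded from the free space at creation.
constexpr std::size_t kHeaderBytes = 8;
constexpr std::size_t kExtraBytes = 8;

// Hypothetical model of a serial hash buffer that hands out fixed-length
// row slots one after another; not the real HashBufferSerial.
class SerialBufferModel {
 public:
  SerialBufferModel(std::size_t bufferSize, std::size_t rowLength)
      : storage_(new char[bufferSize]),
        rowLength_(rowLength),
        freeSpace_(bufferSize - kHeaderBytes - kExtraBytes),
        nextAvailRow_(storage_ + kHeaderBytes) {
    assert(bufferSize >= kHeaderBytes + kExtraBytes);
  }
  ~SerialBufferModel() { delete[] storage_; }
  SerialBufferModel(const SerialBufferModel&) = delete;
  SerialBufferModel& operator=(const SerialBufferModel&) = delete;

  // Same idea as getFreeRow(): return the next free slot, or nullptr
  // when one more row does not fit in the remaining free space.
  char* getFreeRow() {
    if (rowLength_ > freeSpace_) return nullptr;
    char* row = nextAvailRow_;
    nextAvailRow_ += rowLength_;
    freeSpace_ -= rowLength_;
    return row;
  }

  std::size_t freeSpace() const { return freeSpace_; }

 private:
  char* storage_;
  std::size_t rowLength_;
  std::size_t freeSpace_;
  char* nextAvailRow_;
};
```

With the dump values, SerialBufferModel buf(412360, 412360) reports freeSpace() == 412344, and buf.getFreeRow() returns nullptr. A caller that writes the row through that pointer without checking it would fault, which would be consistent with the signal handler frame in the backtrace.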

Justin Du (justin-du-2) wrote :

In the hash join generator code, the minimum hash buffer size is chosen to hold at least one row plus a hash row struct (16 bytes); for a right row, this combined size is called the extended row size. However, when the buffer is created at runtime, its free space (freeSpace_) is set to exclude the hash buffer header (8 bytes) and an extra 8 bytes. The row actually stored in the buffer is the extended row, which then does not fit in the remaining free space, so the getFreeRow() call returns null.
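The size mismatch described above can be checked with a little compile-time arithmetic. This is a sketch under assumptions: the constant and function names are hypothetical, and the 412344-byte row length is inferred from the dump (bufferSize_ of 412360 minus the 16-byte hash row struct), assuming the buffer was sized at exactly the generator's minimum.

```cpp
#include <cstddef>

// Sizes in bytes from the explanation above; names are hypothetical.
constexpr std::size_t kHashRowBytes = 16;  // hash row struct per right row
constexpr std::size_t kHeaderBytes = 8;    // hash buffer header
constexpr std::size_t kExtraBytes = 8;     // extra bytes excluded at runtime

// Generator side: smallest buffer that holds one extended row.
constexpr std::size_t generatorMinBufferSize(std::size_t rowLength) {
  return rowLength + kHashRowBytes;
}

// Runtime side: free space of a newly created buffer.
constexpr std::size_t runtimeFreeSpace(std::size_t bufferSize) {
  return bufferSize - kHeaderBytes - kExtraBytes;
}

// True when one extended row fits in a buffer of the given size.
constexpr bool extendedRowFits(std::size_t rowLength, std::size_t bufferSize) {
  return rowLength + kHashRowBytes <= runtimeFreeSpace(bufferSize);
}

// Assumed 412344-byte row: these reproduce bufferSize_ and freeSpace_.
static_assert(generatorMinBufferSize(412344) == 412360, "bufferSize_");
static_assert(runtimeFreeSpace(412360) == 412344, "freeSpace_");
static_assert(!extendedRowFits(412344, generatorMinBufferSize(412344)),
              "the extended row is 16 bytes too large");
```

Because the runtime subtracts exactly the 16 bytes the generator never reserved, a buffer of the generator's minimum size is 16 bytes short for any row length, not only for this one.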

I tried increasing the hash buffer size (HashBufferSerial::bufferSize_) to accommodate the few extra bytes, and there was no ESP core any more.
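One way to express that adjustment is to fold the runtime overhead into the minimum size. This is a hypothetical sketch with names invented for illustration; the committed change is not shown in this report and may compute the padding differently.

```cpp
#include <cstddef>

// Hypothetical constants mirroring the sizes discussed in this report.
constexpr std::size_t kHashRowBytes = 16;             // hash row struct per right row
constexpr std::size_t kRuntimeOverheadBytes = 8 + 8;  // header + extra bytes

// Padded minimum: one extended row plus the bytes the runtime subtracts
// before it computes the buffer's free space.
constexpr std::size_t paddedMinBufferSize(std::size_t rowLength) {
  return rowLength + kHashRowBytes + kRuntimeOverheadBytes;
}

// Free space the runtime computes for a buffer of the given size.
constexpr std::size_t runtimeFreeSpace(std::size_t bufferSize) {
  return bufferSize - kRuntimeOverheadBytes;
}

// With the padding, the free space equals the extended row size exactly.
static_assert(runtimeFreeSpace(paddedMinBufferSize(412344)) ==
                  412344 + kHashRowBytes,
              "one extended row fits");
static_assert(paddedMinBufferSize(412344) == 412376,
              "16 bytes larger than the 412360-byte buffer in the dump");
```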

Justin Du (justin-du-2)
Changed in trafodion:
status: In Progress → Fix Committed