EXECUTE.BATCH update creates core-file

Bug #1274962 reported by Guy Groulx
This bug affects 1 person
Affects: Trafodion
Status: In Progress
Importance: High
Assigned to: Sandhya Sundaresan

Bug Description

Using the atomics test suite.

Test 10 is an update test using JDBC batch feature.
Test 12 is a delete test using JDBC batch feature.

The test is configured to batch 500 rows and then call executeBatch(). The update code is shown below; the delete code is similar.

    /* Prepare the update statement. */
    String sqlStmtCmd = "UPDATE " + dbTableName + " SET COL1=?, COL2=? WHERE CNT = ?";
    try {
      sqlStmt = dbConnection.prepareStatement(sqlStmtCmd);
    } catch (SQLException e) {
      log_error("Exception in update_with_rowsets\n");
      return(e.getErrorCode());
    }

...

    for (i = 0; i < (int)numbtmf; i++) { // numbtmf is a passed parameter, set to 500 in my tests.
      try {
        sqlStmt.setLong(1, up_col1);
        sqlStmt.setLong(2, up_col2);
        sqlStmt.setLong(3, hv_cnt);
        sqlStmt.addBatch();
        hv_cnt++;
        if (hv_cnt >= hv_cnt2) hv_cnt = hv_cnt1; /* Need to restart. */
      } catch (SQLException e) {
        log_error("Exception in update_with_rowsets\n");
        SQLException nextException;

        nextException = e;
        retcode = e.getErrorCode();
        do {
          log_error(nextException.getMessage() + " State: " + nextException.getSQLState() + "\n");
        } while ((nextException = nextException.getNextException()) != null);
      }
    }

    beginTXN();

    try {
      resultCountL = sqlStmt.executeBatch(); // This is where it hangs for a very long time.
      rowCnt+=resultCountL.length;
      if (resultCountL.length != numbtmf) {
        log_error("Error UPDATING!! resultCount: " + resultCountL.length + "\n");
        retcode = 1;
      } else {
        for (i = 0; i < resultCountL.length; i++) {
          if ((resultCountL[i] != 1) && (resultCountL[i] != Statement.SUCCESS_NO_INFO)) {
            log_error("Error UPDATING!! resultCount: " + resultCountL[i] + "\n");
            retcode = resultCountL[i];
            break;
          }
        }
      }
      sqlStmt.clearBatch();
    } catch (SQLException e) {
      log_error("Exception in update_with_rowsets\n");
      SQLException nextException;

      nextException = e;
      retcode = e.getErrorCode();
      do {
        log_error(nextException.getMessage() + " State: " + nextException.getSQLState() + "\n");
      } while ((nextException = nextException.getNextException()) != null);
    }

    commitTXN();
    return retcode;

Eventually, we get:
Exception in update_with_rowsets
Exception in update_with_rowsets
Batch Update Failed, See next exception for details State: HY000
Batch Update Failed, See next exception for details State: HY000
*** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::insertRows returned error HBASE_ACCESS_ERROR(-705). Cause:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
Fri Jan 31 14:38:42 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:38:43 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:38:44 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:38:45 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:38:47 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:38:49 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:38:53 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:38:57 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:39:05 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:39:21 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException

org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:183)
org.apache.hadoop.hbase.client.transactional.TransactionalTable.put(TransactionalTable.java:306)
org.apache.hadoop.hbase.client.transactional.RMInterface.put(RMInterface.java:155)
org.trafodion.sql.HBaseAccess.HTableClient.insertRows(HTableClient.java:500)
. [2014-01-31 14:39:21] State: X08CG
*** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::insertRows returned error HBASE_ACCESS_ERROR(-705). Cause:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
Fri Jan 31 14:38:42 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:38:43 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:38:44 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:38:45 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:38:47 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:38:49 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:38:53 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:38:57 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:39:05 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException
Fri Jan 31 14:39:21 UTC 2014, org.apache.hadoop.hbase.client.transactional.TransactionalTable$7@1c36686e, java.lang.NullPointerException

org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:183)
org.apache.hadoop.hbase.client.transactional.TransactionalTable.put(TransactionalTable.java:306)
org.apache.hadoop.hbase.client.transactional.RMInterface.put(RMInterface.java:155)
org.trafodion.sql.HBaseAccess.HTableClient.insertRows(HTableClient.java:500)
. [2014-01-31 14:39:21] State: X08CG
Test TSUR0500: 0 loops from 31 Jan 14 14:32:29.970 to 14:44:57.120 for 747 secs

We get a similar result when doing batch deletes.

Tags: sql-cmp
Guy Groulx (guy-groulx) wrote :

Debugging also shows that the initial exception has an error code of 0. That is, in the code above, after
          retcode = e.getErrorCode();
retcode has a value of 0.
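This matches how JDBC drivers typically report batch failures: executeBatch() throws a BatchUpdateException wrapper whose own vendor code can be 0, while the underlying errors are chained via getNextException(). A minimal, self-contained sketch of walking the chain to find the first real error code (the messages, states, and codes below are illustrative, not taken from the driver):

```java
import java.sql.BatchUpdateException;
import java.sql.SQLException;
import java.sql.Statement;

public class BatchErrorDemo {
    // Walk the chained exceptions; the wrapper's own getErrorCode() is often 0,
    // so the real vendor codes must be read from the getNextException() chain.
    static int firstNonZeroErrorCode(SQLException e) {
        for (SQLException ex = e; ex != null; ex = ex.getNextException()) {
            if (ex.getErrorCode() != 0) return ex.getErrorCode();
        }
        return 0;
    }

    public static void main(String[] args) {
        // Simulate what a driver might build: a wrapper with vendor code 0
        // chaining the underlying error (values here are illustrative).
        SQLException cause =
            new SQLException("Unable to access Hbase interface", "X08CG", -8448);
        BatchUpdateException wrapper =
            new BatchUpdateException("Batch Update Failed, See next exception for details",
                                     "HY000", 0, new int[] { Statement.EXECUTE_FAILED });
        wrapper.setNextException(cause);

        System.out.println(firstNonZeroErrorCode(wrapper)); // prints -8448
    }
}
```

So logging only the top-level getErrorCode(), as the test does, will report 0; the chained exceptions carry the usable codes.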

Guy Groulx (guy-groulx)
tags: added: transaction
tags: added: dtm
removed: transaction
Anoop Sharma (anoop-sharma) wrote :

Assigning it to the connectivity group to validate whether this is still an issue.

Changed in trafodion:
assignee: nobody → Arvind Narain (arvind-narain)
Guy Groulx (guy-groulx)
Changed in trafodion:
importance: High → Critical
Changed in trafodion:
milestone: none → r0.7
assignee: Arvind Narain (arvind-narain) → nobody
assignee: nobody → Oliver (oliver-bucaojit)
Changed in trafodion:
assignee: Oliver (oliver-bucaojit) → Atanu Mishra (atanu-mishra)
Atanu Mishra (atanu-mishra) wrote :

-----Original Message-----
From: Groulx, Guy
Sent: Wednesday, May 21, 2014 4:00 PM
To: Goyal, Narendra (SeaQuest R&D); Mishra, Atanu; Kakarlamudi, Rao
Cc: Broeder, Sean
Subject: RE: [Bug 1274962] Re: EXECUTE.BATCH hangs for a long time doing updates or deletes

Hi Narendra,

My target table is in trouble right now.
Hopefully, it will get fixed by tomorrow and I'll try it out.

Guy

-----Original Message-----
From: Goyal, Narendra (SeaQuest R&D)
Sent: Wednesday, May 21, 2014 6:52 PM
To: Mishra, Atanu; Groulx, Guy; Kakarlamudi, Rao
Cc: Broeder, Sean
Subject: RE: [Bug 1274962] Re: EXECUTE.BATCH hangs for a long time doing updates or deletes

On the face of it, it does not look like a TM issue. It would be great if it could be reproduced with changes to log4j.hdfs.config to produce trace info:

...
# Narendra - somehow the previous value of the following property does not
# produce trace info. This will produce the hdfs.java.log file in the current
# working directory.
log4j.appender.hbaseclient.file=hdfs.java.log
...
log4j.logger.org.apache.hadoop.hbase.client.transactional=TRACE
log4j.logger.org.trafodion.sql=TRACE
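Pulling those lines together, a minimal log4j fragment for capturing the client-side trace might look like the sketch below. The appender name hbaseclient and file name come from the property quoted above; the appender class and layout settings are assumptions filled in for completeness:

```properties
# Send HBase client / Trafodion SQL trace to a file in the current working directory
log4j.appender.hbaseclient=org.apache.log4j.FileAppender
log4j.appender.hbaseclient.file=hdfs.java.log
log4j.appender.hbaseclient.layout=org.apache.log4j.PatternLayout
log4j.appender.hbaseclient.layout.ConversionPattern=%d %-5p %c - %m%n

# Enable TRACE for the transactional client and Trafodion SQL layers
log4j.logger.org.apache.hadoop.hbase.client.transactional=TRACE
log4j.logger.org.trafodion.sql=TRACE
```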

Changed in trafodion:
status: New → In Progress
information type: Proprietary → Public
tags: added: sql-cmp
removed: dtm
Changed in trafodion:
assignee: Atanu Mishra (atanu-mishra) → Ravisha Neelakanthappa (ravisha-neelakanthappa)
Ravisha Neelakanthappa (ravisha-neelakanthappa) wrote :

Fix for bug 1329965 also fixes this bug.

With the 1329965 fix, the plan looks like:
TRAFODION_VSBB_SCAN ======================= SEQ_NO 3 NO CHILDREN
TABLE_NAME ............... TBL500
REQUESTS_IN ............ 100
ROWS/REQUEST ............. 1
EST_OPER_COST ............ 0.03
EST_TOTAL_COST ........... 0.03
DESCRIPTION
  max_card_est ......... 100
  fragment_id ............ 0
  parent_frag ............ (none)
  fragment_type .......... master
  scan_type .............. subset scan of table TRAFODION.HBASE.TBL500
  key_columns ............ _SALT_, CNT
  part_key_predicates .... (CNT = (__array3 RowsetArrayScan
                             \:_sys_rowset_index4)) and ("_SALT_" =
                             (HashDistPartHash(cast((__array3 RowsetArrayScan
                             \:_sys_rowset_index4))) Hash2Distrib cast(8)))
  begin_key .............. (_SALT_ = (HashDistPartHash(cast(cast((cast((__array
                             3 RowsetArrayScan \:_sys_rowset_index4)) /
                             cast(1000000))))) Hash2Distrib cast(8))),
                             (CNT = (__array3 RowsetArrayScan
                             \:_sys_rowset_index4))
  end_key ................ (_SALT_ = (HashDistPartHash(cast((__array3
                             RowsetArrayScan \:_sys_rowset_index4)))
                             Hash2Distrib cast(8))),
                             (CNT = (__array3 RowsetArrayScan
                            \:_sys_rowset_index4))

Without the 1329965 fix, we get:
TRAFODION_SCAN ============================ SEQ_NO 3 NO CHILDREN
TABLE_NAME ............... TBL500
REQUESTS_IN ............ 100
ROWS/REQUEST ............. 1
EST_OPER_COST ............ 0.2
EST_TOTAL_COST ........... 0.2
DESCRIPTION
  max_card_est ......... 100
  fragment_id ............ 0
  parent_frag ............ (none)
  fragment_type .......... master
  scan_type .............. subset scan of table TRAFODION.HBASE.TBL500
  key_columns ............ _SALT_, CNT
  executor_predicates .... (cast((cast(CNT) * cast(1000000))) = (__array3
                             RowsetArrayScan \:_sys_rowset_index4))
  part_key_predicates .... (CNT = (__array3 RowsetArrayScan
                             \:_sys_rowset_index4))
  begin_key .............. (_SALT_ = <min>), (CNT = (__array3 RowsetArrayScan
                             \:_sys_rowset_index4))
  end_key ................ (_SALT_ = <max>), (CNT = (__array3 RowsetArrayScan
                             \:_sys_rowset_index4))

I will close this bug as a duplicate of 1329965.

tags: added: duplicate
removed: sql-cmp
tags: added: sql-cmp
removed: duplicate
Guy Groulx (guy-groulx) wrote :

Just tried the test on our system with the git150105 version.

Inserted 20500 rows into table using batch inserts.

Then tried test 10, which does JDBC batch updates.

/home/squser2/guy/JDBCAtomics> ./runOneJDBCTest.sh ID tbl500 10 30
Running the test as follows:
/home/tools/jdk1.7.0_05_64/jre/bin/java -Djdbc.drivers=com.hp.jdbc.HPT4Driver -Dhpt4jdbc.properties=/home/squser2/guy/JDBCAtomics/info.properties JDBCAtomics ID URL tbl500 500 10 500 30
Exception in update_with_rowsets
Exception in update_with_rowsets
Batch Update Failed, See next exception for details State: HY000
Batch Update Failed, See next exception for details State: HY000
There was a problem reading from the server State: HY000
There was a problem reading from the server State: HY000
The message header was not long enough State: HY000
The message header was not long enough State: HY000
Exception There was a problem reading from the server in beginTxn.
Exception There was a problem reading from the server in beginTxn.
com.hp.jdbc.HPT4Exception: There was a problem reading from the server
 at com.hp.jdbc.HPT4Messages.createSQLException(HPT4Messages.java:283)
 at com.hp.jdbc.InputOutput.doIO(InputOutput.java:372)
 at com.hp.jdbc.T4Connection.getReadBuffer(T4Connection.java:153)
 at com.hp.jdbc.T4Connection.EndTransaction(T4Connection.java:397)
 at com.hp.jdbc.InterfaceConnection.endTransaction(InterfaceConnection.java:1223)
 at com.hp.jdbc.InterfaceConnection.commit(InterfaceConnection.java:435)
 at com.hp.jdbc.HPT4Connection.commit(HPT4Connection.java:110)
 at JDBCAtomics.commitTXN(JDBCAtomics.java:424)
 at JDBCAtomics.update_with_rowsets_loop(JDBCAtomics.java:1524)
 at JDBCAtomics.update_with_rowsets(JDBCAtomics.java:1441)
 at JDBCAtomics.update_with_rowsets(JDBCAtomics.java:1384)
 at JDBCAtomics.main(JDBCAtomics.java:2738)
Test IDUR0500: 1000 loops from 9 Jan 15 18:30:49.429 to 18:30:54.898 for 5 secs
/home/squser2/guy/JDBCAtomics>

And a core was created.
2015-01-09 18:30:54 /local/cores/1008/core.1420828254.n009.40732.mxosrvr

Core was generated by `mxosrvr -ZkHost n004.cm.cluster:2181,n005.cm.cluster:2181,n006.cm.cluster:2181'.
Program terminated with signal 6, Aborted.
#0 0x00007ffff504c8a5 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff504c8a5 in raise () from /lib64/libc.so.6
#1 0x00007ffff504e00d in abort () from /lib64/libc.so.6
#2 0x00007ffff1858810 in assert_botch_abend (
    f=0x7ffff351251b "../executor/ExHbaseAccess.cpp", l=1905,
    m=0x7ffff3512539 "Direct row buffer overflow", c=<value optimized out>)
    at ../export/NAAbort.cpp:282
#3 0x00007ffff341dada in ExHbaseAccessTcb::patchDirectRowBuffers (
    this=0x7fffe434dd08) at ../executor/ExHbaseAccess.cpp:1905
#4 0x00007ffff34350f0 in ExHbaseAccessSQRowsetTcb::work (this=0x7fffe434dd08)
    at ../executor/ExHbaseIUD.cpp:3942
#5 0x00007ffff3452503 in ExScheduler::work (this=0x7fffe434bf78,
    prevWaitTime=<value optimized out>) at ../executor/ExScheduler.cpp:328
#6 0x00007ffff3380462 in ex_root_tcb::execute (this=0x7fffe434f148,
    cliGlobals=0xeef140, glob=0x7fffce88fc60, input_desc=0x7fffcec64bd8,
    diagsArea=@0x7fffe580b3b0, reExecute=0) at ../executor/ex_root.cpp:1055
#7 0x00007fff...


Guy Groulx (guy-groulx) wrote :

Removing the link to bug 1329965 since the bug is still there for updates.

Guy Groulx (guy-groulx) wrote :

Tried the test with an ODBC row array and got the same core.

Guy Groulx (guy-groulx) wrote :

DROP TABLE TRAFODION.MXOLTP.TBL500;
CREATE TABLE TRAFODION.MXOLTP.TBL500
  (
    CNT INT UNSIGNED NO DEFAULT NOT NULL
  , COL1 INT UNSIGNED NO DEFAULT NOT NULL
  , COL2 INT UNSIGNED NO DEFAULT NOT NULL
  , COL3 INT UNSIGNED NO DEFAULT NOT NULL
  , COL4 INT UNSIGNED NO DEFAULT NOT NULL
  , COL5 INT UNSIGNED NO DEFAULT NOT NULL
  , COL6 INT UNSIGNED NO DEFAULT NOT NULL
  , COL7 INT UNSIGNED NO DEFAULT NOT NULL
  , COL10 CHAR(468) NO DEFAULT NOT NULL
  , PRIMARY KEY (CNT ASC)
  ) SALT USING 20 PARTITIONS
  ;
exit

The update command used is in the code above.
    up_col1 = 11;
    up_col2 = 22;

Changed in trafodion:
assignee: Ravisha Neelakanthappa (ravisha-neelakanthappa) → Sandhya Sundaresan (sandhya-sundaresan)
milestone: r0.7 → r1.1
milestone: r1.1 → r0.9
Changed in trafodion:
assignee: Sandhya Sundaresan (sandhya-sundaresan) → Mike Hanlon (mike-hanlon)
milestone: r0.9 → r1.0
Changed in trafodion:
importance: Critical → High
summary: - EXECUTE.BATCH hangs for a long time doing updates or deletes
+ EXECUTE.BATCH update creates core-file
Changed in trafodion:
milestone: r1.0 → r1.1
Changed in trafodion:
milestone: r1.1 → r2.0
assignee: Mike Hanlon (mike-hanlon) → Sandhya Sundaresan (sandhya-sundaresan)