Master Executor reporting error 8605

Bug #1412955 reported by Buddy Wilbanks
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Trafodion
In Progress
High
Arvind Narain

Bug Description

We are seeing this error message streaming during the order entry benchmark.

2015-01-17 11:25:49,116, ERROR, SQL, Node Number: 0, CPU: 0, PIN: 63243, Process Name: $Z001GLY, SQLCODE: 8605, QID: MXID110000632432122882535634324540
00000000206U3333300_24_STMT_COMMIT_1, *** ERROR[8605] Committing a transaction which has not started.
2015-01-17 11:25:50,667, ERROR, SQL, Node Number: 0, CPU: 0, PIN: 63243, Process Name: $Z001GLY, SQLCODE: 8605, QID: MXID110000632432122882535634324540
00000000206U3333300_24_STMT_COMMIT_1, *** ERROR[8605] Committing a transaction which has not started.
2015-01-17 11:26:17,823, ERROR, SQL, Node Number: 0, CPU: 0, PIN: 63243, Process Name: $Z001GLY, SQLCODE: 8605, QID: MXID110000632432122882535634324540
00000000206U3333300_24_STMT_COMMIT_1, *** ERROR[8605] Committing a transaction which has not started.

The counts are prolific. The master above shows every couple of seconds.

master_exec_0_13398.log:10241
master_exec_0_13942.log:12065
master_exec_0_17055.log:10110
master_exec_0_19990.log:11068
master_exec_0_20733.log:11818
master_exec_0_21408.log:10560
master_exec_0_22203.log:10633
master_exec_0_22274.log:1223
master_exec_0_24271.log:10096
master_exec_0_24852.log:1477
master_exec_0_24895.log:10791
master_exec_0_24990.log:12213
master_exec_0_26398.log:11618
master_exec_0_26536.log:1298

The only place where the error occurs is when commitTransaction is called and there isn’t an xnInProgress.

executor/ex_transaction.cpp: EXE_COMMIT_TRANSACTION_ERROR, &errorCond_);
exp/ExpErrorEnums.h: EXE_COMMIT_TRANSACTION_ERROR = 8605,

short ExTransaction::commitTransaction(NABoolean waited)
{
  dp2Xns_ = FALSE;

  if (! xnInProgress())
    {
      // Set the transaDiagsArea.
      // This is the first error. So reset the diags area.
      if (transDiagsArea_)
      {
        transDiagsArea_->decrRefCount();
        transDiagsArea_ = NULL;
      }
      ExRaiseSqlError(heap_, &transDiagsArea_,
                      EXE_COMMIT_TRANSACTION_ERROR, &errorCond_);
      return -1;
    }

It must have something to do with inherited transactions. inheritTransaction is the only place where we set XnInProgress_ to false. The order entry does not use ESPs so there is no reason to have an inherited transaction.

Guy has shown the problem with a simple test case.

First scenario:
- I’m going to do a prepare/exec without using transactions.

/home/squser2> trafci.sh

Welcome to Trafodion Command Interface
Copyright(C) 2013-2014 Hewlett-Packard Development Company, L.P.

User Name: squser2
Host Name/IP Address: n001:21000
Password:

Connected to Trafodion

SQL>set schema mxoltp;

--- SQL operation complete.

SQL>prepare cmd from select [first 10]* from tbl500;

*** WARNING[6008] Statistics for column (CNT) from table TRAFODION.MXOLTP.TBL500 were not available. As a result, the access path chosen might not be the best possible. [2015-01-20 19:26:52]
--- SQL command prepared.

SQL>execute cmd;

CNT COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL10
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- --------------------------------------------------------------------------------------------------------------------------------
         3 1 2 0 4 5 6 7 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL

       176 1 2 1 4 5 6 7 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL

--- 10 row(s) selected.

SQL>exit
/home/squser2>

Here’s the master_exec_ log.
/opt/hp/squser2/git150118/logs> tail -f master_exec_6_63946.log

2015-01-20 19:26:52,591, INFO, SQL, Node Number: 0, CPU: 6, PIN: 63946, Process Name: $Z061H71, SQLCODE: 6008, QID: MXID11006063946212288460034592277000000000406U3333300_1177_CMD, *** WARNING[6008] Statistics for column (CNT) from table TRAFODION.MXOLTP.TBL500 were not available. As a result, the access path chosen might not be the best possible.
^C
/opt/hp/squser2/git150118/logs>

Second Scenario :
- I’m going to do a prepare and do the exec within a transactions.

/home/squser2> trafci.sh

Welcome to Trafodion Command Interface
Copyright(C) 2013-2014 Hewlett-Packard Development Company, L.P.

User Name: squser2
Host Name/IP Address: n001:21000
Password:

Connected to Trafodion

SQL>set schema mxoltp;

--- SQL operation complete.

SQL>prepare cmd from select [first 10]* from tbl500;

*** WARNING[6008] Statistics for column (CNT) from table TRAFODION.MXOLTP.TBL500 were not available. As a result, the access path chosen might not be the best possible. [2015-01-20 19:35:04]
--- SQL command prepared.

SQL>begin work;

--- SQL operation complete.

SQL>execute cmd;

CNT COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL10
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- --------------------------------------------------------------------------------------------------------------------------------
         3 1 2 0 4 5 6 7 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL

       176 1 2 1 4 5 6 7 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL

--- 10 row(s) selected.

SQL>commit work;

--- SQL operation complete.

SQL>exit
/home/squser2>

Here’s the master_exec_ log.
2015-01-20 19:35:04,011, INFO, SQL, Node Number: 0, CPU: 4, PIN: 2006, Process Name: $Z0401MB, SQLCODE: 6008, QID: MXID11004002006212288460034800667000000000406U3333300_1177_CMD, *** WARNING[6008] Statistics for column (CNT) from table TRAFODION.MXOLTP.TBL500 were not available. As a result, the access path chosen might not be the best possible.
2015-01-20 19:36:25,843, INFO, SQL.ESP, Node Number: 0, CPU: 4, PIN: 21984, Process Name: $Z040HY4,,, An ESP process is launched.
2015-01-20 19:36:38,979, ERROR, SQL, Node Number: 0, CPU: 4, PIN: 2006, Process Name: $Z0401MB, SQLCODE: 8609, QID: MXID11004002006212288460034800667000000000406U3333300_1160_STMT_ROLLBACK_1, *** ERROR[8609] Waited rollback performed without starting a transaction.

Notice in my second scenario here, an ESP is started. Why?????
And the 8609 msg shows up.

Revision history for this message
Anoop Sharma (anoop-sharma) wrote :

To answer the last question: "Notice in my second scenario here, an ESP is started. Why?????"

The query being issued is:
prepare cmd from select [first 10]* from tbl500;

This is not a unique or oltp query and will be scanning the table without any key or scan predicates.
Parallel execution using esps may be chosen.

An explain will show what the plan looks like.

Not yet clear on why the other error 8609 is being returned since this is a non-transactional query.
Could be that a transaction state is leftover from a previous query. We have seen some issues with
transaction state not being handled correctly in ESPs. Selva has looked at it.
Maybe he can comment.

Changed in trafodion:
assignee: nobody → Selvaganesan Govindarajan (selva-ganesan)
Changed in trafodion:
assignee: Selvaganesan Govindarajan (selva-ganesan) → Sandhya Sundaresan (sandhya-sundaresan)
status: New → In Progress
Revision history for this message
Sandhya Sundaresan (sandhya-sundaresan) wrote :

Fix will be in connectivity.
If a transaction isn't active in the CLI context, the mxosrvr should not call SQL CLI to commit/rollback work. This will be similar to the way it is done today in connectivity code during disconnect time.
The SQL behavior will remain unchanged.

Changed in trafodion:
milestone: r1.0 → r1.1
assignee: Sandhya Sundaresan (sandhya-sundaresan) → Arvind Narain (arvind-narain)
Revision history for this message
Guy Groulx (guy-groulx) wrote :

Did some testing regarding 8605.
We used the odbc atomics program in gdb step mode to see when the mxosvrs were generating the 8605 errors in the master_exec_#_pid.log files.

1- When the odbc program calls
    retcode = SQLConnect(hdbc,(unsigned char *)dsnName,SQL_NTS,(unsigned char *)user,SQL_NTS,(unsigned char *)password,SQL_NTS);
and it is the first ever connection to a particular mxosrvr, it logs an 8506 error. Retcode is 0 in the client program. IE: No errors.
2- The odbc program then sets AUTOCOMMIT OFF.
Later on it does the following:
      sprintf( (char *)sqlStmt, "SELECT MAX(cnt) FROM %s", hv_tablename );
  if (alloc_hstmt()) exit( retcode );
  if (!SQL_SUCCEEDED((retcode = SQLExecDirect(hstmt,sqlStmt,SQL_NTS)))) {….}
  if (!SQL_SUCCEEDED((retcode = SQLBindCol(hstmt,1,SQL_C_SLONG,&tempval,sizeof(tempval),&tempind)))) {…}
  if ((retcode = SQLFetch(hstmt)) == SQL_NO_DATA_FOUND) {…}
      if (!SQL_SUCCEEDED((retcode = SQLEndTran(SQL_HANDLE_DBC,hdbc,SQL_COMMIT)))) {…}
At this point another 8605 message is logged in the master_exec_#_pid.log file. retcode is 0 in the client program. IE: No errors.

Changed in trafodion:
milestone: r1.1 → r1.2
Alice Chen (alchen)
tags: added: sql-exe
removed: executor
Alice Chen (alchen)
tags: added: dtm
Revision history for this message
Sandhya Sundaresan (sandhya-sundaresan) wrote :

This is a connectivity issue and fix is in connectivity.
 Not dtm nor exe. So changing tags.

tags: added: connectivity-mxosrvr
removed: dtm sql-exe
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.