Statement Label could not be allocated

Bug #1422894 reported by Gary Hall
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Trafodion
Fix Committed
High
Arvind Narain

Bug Description

This has happened multiple times since Friday on performance test system Zircon4 - reporting the last two since now running 1.0.1rc2 at present.

2015-02-17 18:36:22 ERROR -
>----------------------------------------------------------
> Unexpected SQLEXCEPTION, Error Code = -1
> Stream Number : 989 , Transaction Number : 1
> SingletonSelect , scale = 20 , key_value = user5589248998698799825
> Statement Label could not be allocated. [2015-02-17 18:36:22]
>----------------------------------------------------------

org.trafodion.jdbc.t4.HPT4Exception: Statement Label could not be allocated. [2015-02-17 18:36:22]
        at org.trafodion.jdbc.t4.HPT4Messages.throwSQLException(HPT4Messages.java:206)
        at org.trafodion.jdbc.t4.InterfaceStatement.prepare(InterfaceStatement.java:1108)
        at org.trafodion.jdbc.t4.TrafT4PreparedStatement.prepare(TrafT4PreparedStatement.java:2234)
        at org.trafodion.jdbc.t4.TrafT4Connection.prepareStatement(TrafT4Connection.java:772)
        at DatabaseConnector.prepare_statement(DatabaseConnector.java:180)
        at YCSBWorkload2.runSingletonSelect(YCSBWorkload2.java:268)
        at WorkloadDriver.run(WorkloadDriver.java:427)

======================================================================================
An mxosrvr core has occurred both times as well. Stack trace for the two cores are:

echo /local/cores/1005/core.1424198189.n020.5019.mxosrvr
echo

[Thread debugging using libthread_db enabled]
Core was generated by `mxosrvr -ZKHOST n014.cm.cluster:2181,n015.cm.cluster:2181,n016.cm.cluster:2181'.
Program terminated with signal 6, Aborted.
#0 0x00007ffff4a328a5 in raise () from /lib64/libc.so.6
#0 0x00007ffff4a328a5 in raise () from /lib64/libc.so.6
#1 0x00007ffff4a3400d in abort () from /lib64/libc.so.6
#2 0x00007ffff5d51a55 in os::abort(bool) () from /usr/java/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#3 0x00007ffff5ed1f87 in VMError::report_and_die() () from /usr/java/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#4 0x00007ffff5d5696f in JVM_handle_linux_signal () from /usr/java/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#5 <signal handler called>
#6 0x00007ffff43119e9 in SQLCLI_PerformTasks(CliGlobals *, ULng32, SQLSTMT_ID *, SQLDESC_ID *, SQLDESC_ID *, Lng32, Lng32, typedef
__va_list_tag __va_list_tag *, SQLCLI_PTR_PAIRS *, SQLCLI_PTR_PAIRS *) (cliGlobals=0xeea9c0, tasks=606, statement_id=0x19ee4d8, inpu
t_descriptor=0x0, output_descriptor=0x0, num_input_ptr_pairs=0, num_output_ptr_pairs=0, ap=0x7fffd22385d0, input_ptr_pairs=0x0, outp
ut_ptr_pairs=0x0) at ../cli/Cli.cpp:2943
#7 0x00007ffff431bbfb in SQLCLI_ExecDirect2(CliGlobals *, SQLSTMT_ID *, SQLDESC_ID *, Int32, SQLDESC_ID *, Lng32, typedef __va_list
_tag __va_list_tag *, SQLCLI_PTR_PAIRS *) (cliGlobals=0xeea9c0, statement_id=0x19ee4d8, sql_source=<value optimized out>, prepFlags=
0, input_descriptor=0x0, num_ptr_pairs=0, ap=0x7fffd22385d0, ptr_pairs=0x0) at ../cli/Cli.cpp:3718
#8 0x00007ffff4377bba in SQL_EXEC_ExecDirect2 (statement_id=0x19ee4d8, sql_source=0x7fffd22387c0, prep_flags=0, input_descriptor=0x
0, num_ptr_pairs=0) at ../cli/CliExtern.cpp:2326
#9 0x00007ffff6a74e17 in SRVR::WSQL_EXEC_ExecDirect (statement_id=0x19ee4d8, sql_source=0x7fffd22387c0, input_descriptor=0x0, num_p
tr_pairs=0) at SQLWrapper.cpp:360
#10 0x00007ffff6a6a8c0 in SRVR::EXECDIRECT (pSrvrStmt=0x19edec0) at sqlinterface.cpp:4479
#11 0x00007ffff6a2e245 in SRVR::ControlProc (pParam=0x19edec0) at csrvrstmt.cpp:757
#12 0x00007ffff6a2f008 in SRVR_STMT_HDL::ExecDirect (this=0x19edec0, inCursorName=0x0, inSqlString=0x1a54498 "insert into Trafodion.
\"_REPOS_\".metric_query_aggr_table values(0,0,0,5019,5019,9,0,0,'15.250.48.108',0,'$Z09043E','MXID110090050192122909579434510640000
00000206U3333300',CONVERTTIMESTAMP(2122909581694"..., inStmtType=<value optimized out>, inSqlStmtType=<value optimized out>, inSqlAs
yncEnable=<value optimized out>, inQueryTimeout=0) at csrvrstmt.cpp:439
#13 0x00000000004ccb1b in SessionWatchDog (arg=<value optimized out>) at SrvrConnect.cpp:818
#14 0x00007ffff45b2851 in start_thread () from /lib64/libpthread.so.0
#15 0x00007ffff4ae890d in clone () from /lib64/libc.so.6

=====
echo
echo /local/cores/1005/core.1424198231.n014.61369.mxosrvr
echo

[Thread debugging using libthread_db enabled]
Core was generated by `mxosrvr -ZKHOST n014.cm.cluster:2181,n015.cm.cluster:2181,n016.cm.cluster:2181'.
Program terminated with signal 6, Aborted.
#0 0x00007ffff4a328a5 in raise () from /lib64/libc.so.6
#0 0x00007ffff4a328a5 in raise () from /lib64/libc.so.6
#1 0x00007ffff4a3400d in abort () from /lib64/libc.so.6
#2 0x00007ffff5d51a55 in os::abort(bool) () from /usr/java/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#3 0x00007ffff5ed1f87 in VMError::report_and_die() () from /usr/java/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#4 0x00007ffff5d5696f in JVM_handle_linux_signal () from /usr/java/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#5 <signal handler called>
#6 SRVR_STMT_HDL::cleanupAll (this=0x0) at csrvrstmt.cpp:625
#7 0x00007ffff6a2ef90 in SRVR_STMT_HDL::ExecDirect (this=0x0, inCursorName=0x0, inSqlString=0x1844e68 "update Trafodion.\"_REPOS_\"
.metric_query_aggr_table set AGGREGATION_END_UTC_TS = CONVERTTIMESTAMP(212290958231118193),TOTAL_EST_ROWS_ACCESSED = 0,TOTAL_EST_ROW
S_USED = 0,TOTAL_ROWS_RETRIEVED = 0,TOTAL_"..., inStmtType=<value optimized out>, inSqlStmtType=<value optimized out>, inSqlAsyncEna
ble=<value optimized out>, inQueryTimeout=0) at csrvrstmt.cpp:414
#8 0x00000000004ccb1b in SessionWatchDog (arg=<value optimized out>) at SrvrConnect.cpp:818
#9 0x00007ffff45b2851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff4ae890d in clone () from /lib64/libc.so.6

Changed in trafodion:
importance: Undecided → High
assignee: nobody → Mike Hanlon (mike-hanlon)
Revision history for this message
Mike Hanlon (mike-hanlon) wrote :
Download full text (8.7 KiB)

Notes on pid 5019
/local/cores/1005/core.1424198189.n020.5019.mxosrvr

#5 <signal handler called>
#6 0x00007ffff43119e9 in SQLCLI_PerformTasks(CliGlobals *, ULng32, SQLSTMT_ID *, SQLDESC_ID *, SQLDESC_ID *, Lng32, Lng32, typedef __va_list_tag __va_list_tag *, SQLCLI_PTR_PAIRS *, SQLCLI_PTR_PAIRS *) (cliGlobals=0xeea9c0, tasks=606, statement_id=0x19ee4d8,
    input_descriptor=0x0, output_descriptor=0x0, num_input_ptr_pairs=0, num_output_ptr_pairs=0, ap=0x7fffd22385d0, input_ptr_pairs=0x0,
    output_ptr_pairs=0x0) at ../cli/Cli.cpp:2943
#7 0x00007ffff431bbfb in SQLCLI_ExecDirect2(CliGlobals *, SQLSTMT_ID *, SQLDESC_ID *, Int32, SQLDESC_ID *, Lng32, typedef __va_list_tag __va_list_tag *, SQLCLI_PTR_PAIRS *) (cliGlobals=0xeea9c0, statement_id=0x19ee4d8, sql_source=<value optimized out>, prepFlags=0,
    input_descriptor=0x0, num_ptr_pairs=0, ap=0x7fffd22385d0, ptr_pairs=0x0) at ../cli/Cli.cpp:3718
#8 0x00007ffff4377bba in SQL_EXEC_ExecDirect2 (statement_id=0x19ee4d8, sql_source=0x7fffd22387c0, prep_flags=0, input_descriptor=0x0,
    num_ptr_pairs=0) at ../cli/CliExtern.cpp:2326
#9 0x00007ffff6a74e17 in SRVR::WSQL_EXEC_ExecDirect (statement_id=0x19ee4d8, sql_source=0x7fffd22387c0, input_descriptor=0x0,
    num_ptr_pairs=0) at SQLWrapper.cpp:360
#10 0x00007ffff6a6a8c0 in SRVR::EXECDIRECT (pSrvrStmt=0x19edec0) at sqlinterface.cpp:4479
#11 0x00007ffff6a2e245 in SRVR::ControlProc (pParam=0x19edec0) at csrvrstmt.cpp:757
#12 0x00007ffff6a2f008 in SRVR_STMT_HDL::ExecDirect (this=0x19edec0, inCursorName=0x0,
    inSqlString=0x1a54498 "insert into Trafodion.\"_REPOS_\".metric_query_aggr_table values(0,0,0,5019,5019,9,0,0,'15.250.48.108',0,'$Z09043E','MXID11009005019212290957943451064000000000206U3333300',CONVERTTIMESTAMP(2122909581694"..., inStmtType=<value optimized out>,
    inSqlStmtType=<value optimized out>, inSqlAsyncEnable=<value optimized out>, inQueryTimeout=0) at csrvrstmt.cpp:439
#13 0x00000000004ccb1b in SessionWatchDog (arg=<value optimized out>) at SrvrConnect.cpp:818
#14 0x00007ffff45b2851 in start_thread () from /lib64/libpthread.so.0
#15 0x00007ffff4ae890d in clone () from /lib64/libc.so.6

In Frame 6 --
(gdb) p *statement_id
$3 = {
  version = -293728816,
  name_mode = 32767,
  module = 0x2932287261686320,
  identifier = 0x0,
  handle = 0x0,
  charset = 0x0,
  identifier_len = 0,
  tag = 1819047278
}

(gdb) p statement_id->module
$6 = (const SQLMODULE_ID *) 0x2932287261686320
(gdb) p statement_id->module->module_name
Cannot access memory at address 0x2932287261686328
(gdb) p stmtInfo
$7 = (StatementInfo *) 0x0

The caller has set tasks as...
3714 ULng32 tasks = CLI_PT_GET_INPUT_DESC | CLI_PT_EXEC | CLI_PT_FETCH |
3715 CLI_PT_CLOSE |CLI_PT_SPECIAL_END_PROCESS | CLI_PT_EPILOGUE;

The statement_id param is definitely corrupt.But this the first use of statement_id in SQLCLI_PerformTasks -- see tasks and stmt_info.

2893 Lng32 SQLCLI_PerformTasks(
2894 /*IN*/ CliGlobals * cliGlobals,
2895 /*IN*/ ULng32 tasks,
2896 /*IN*/ SQLSTMT_ID * statement_id,
2897 /*IN OPTIONAL*/ SQLDESC_ID * input_descriptor,
2898 /*IN OPTIONAL*/ SQLDESC_ID * output_descriptor,
2899 /*IN*/ Lng32 num_inpu...

Read more...

Revision history for this message
Mike Hanlon (mike-hanlon) wrote :
Download full text (5.9 KiB)

Quoting some email from Arvind:

From: Narain, Arvind
Sent: Tuesday, February 17, 2015 8:24 PM
To: Varshneya, Renu; Hanlon, Mike; Hegde, Anuradha; Govindarajan, Selvaganes; Du, Justin
Cc: Sundaresan, Sandhya; Capirala, Tharaknath; Le Rouzo, Christophe
Subject: RE: ZIRCON4: I've never seen this error message before

Hi Renu,

Yes it is related to enabling stats. Went thru the cores with Selva – current thinking is that this is happening because of updating of the statement list maintained by mxosrvr by two different threads - we need to make that code thread safe. We didn’t face these cores on Lava though we did run with the modified code since Feb.

These cores did happen earlier too on Zircon (Feb 13 run).

Regards
Arvind

From: Varshneya, Renu
Sent: Tuesday, February 17, 2015 4:22 PM
To: Narain, Arvind; Hanlon, Mike; Hegde, Anuradha; Govindarajan, Selvaganes; Du, Justin
Cc: Sundaresan, Sandhya; Capirala, Tharaknath
Subject: RE: ZIRCON4: I've never seen this error message before

Thanks Arvind. Even though different issues, are they related to stats?
Mike mentioned in his earlier email that the queries did not fail. Could they have been AQRed?

From: Narain, Arvind
Sent: Tuesday, February 17, 2015 4:17 PM
To: Hanlon, Mike; Varshneya, Renu; Hegde, Anuradha; Govindarajan, Selvaganes; Du, Justin
Cc: Sundaresan, Sandhya; Capirala, Tharaknath
Subject: RE: ZIRCON4: I've never seen this error message before

25567 should be a separate bug – seems like the context got messed up. In the repository context I also see CREATE TABLE LIBRARIES.

Regarding 61369, the change that went into mxosrvr was to delete an existing statement and create a new one for each of the inserts/updates done to the repository table. Granted we should be checking if pSrvrStmt is NULL before doing ExecDirect on the same (to avoid the core), but I’m seeing a pattern of 8802 and 8804 errors for the repository context:

                        pSrvrStmt = getSrvrStmt("STMT_PUBLICATION", FALSE);
                        if (pSrvrStmt != NULL)
                        {
                                pSrvrStmt->cleanupAll();
                                pSrvrStmt->Close(SQL_DROP);
                                pSrvrStmt = NULL;
                        }

                        pSrvrStmt = getSrvrStmt("STMT_PUBLICATION", TRUE);

                        ss.str("");
                        ss.clear();

This change was tested in Feb with OE and YCSB runs.

The messages that I currently see in the logs are:

n020: /opt/hp/squser4/tr101rc2/logs/master_exec_9_5019.log:2015-02-17 18:36:23,050, ERROR, MXOSRVR, Node Number: 9, CPU: 9, PIN:5019, Process Name:$Z09043E , , ,A SQL/MX query failed. SQLCODE: SQL statement allocate failed, sqlcode = -8804, sql error = *** ERROR[8804] The provided input statement does not exist in the current context. [2015-02-17 18:36:23].

n014: /opt/hp/squser4/tr101rc2/logs/master_exec_3_61369.log:2015-02-17 18:36:22,796, ERROR, MXOSRVR, Node Number: 3, CPU: 3, PIN:61369, Process Name:$Z031F3E , , ,A SQL/MX query failed. SQLCODE: SQL statement allocate failed, sqlcode = -8804, sql error = *** ERROR[8804] The provided input statement does not exist in the...

Read more...

Changed in trafodion:
assignee: Mike Hanlon (mike-hanlon) → Arvind Narain (arvind-narain)
Changed in trafodion:
milestone: none → r1.2
assignee: Arvind Narain (arvind-narain) → Daniel Lu (ping-lu)
milestone: r1.2 → r1.1
assignee: Daniel Lu (ping-lu) → nobody
assignee: nobody → Arvind Narain (arvind-narain)
status: New → Fix Committed
milestone: r1.1 → r1.0.1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.