mxosrvr core at llvm::MachineFunction::DeleteMachineInstr

Bug #1412911 reported by Aruna Sadashiva
This bug affects 1 person
Affects    Status      Importance  Assigned to  Milestone
Trafodion  Incomplete  High        James Capps

Bug Description

Found an mxosrvr core with the stack trace below during mgblty/dcs testing. Jim Capps's analysis is added as a comment.

An mxosrvr core was generated on n015 with the stack below. At the time, JDBC T4 tests were running and DbVisualizer was being used to look at repos data.

Core file on n015 on amethyst : 2015-01-19 08:42:18 /home/trafodion/aruna/core.1421656937.n015.9903.mxosrvr

Core was generated by `mxosrvr -ZKHOST n013:2181,n014:2181,n015:2181 -RZ g4q0015.houston.hp.com:3:29 -'.
Program terminated with signal 6, Aborted.
#0 0x00007ffff4a458a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install boost-filesystem-1.41.0-11.el6_1.2.x86_64 boost-program-options-1.41.0-11.el6_1.2.x86_64 boost-system-1.41.0-11.el6_1.2.x86_64 cyrus-sasl-lib-2.1.23-13.el6.x86_64 glibc-2.12-1.107.el6.x86_64 hadoop-2.3.0+cdh5.1.3+824-1.cdh5.1.3.p0.13.el6.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.9-33.el6.x86_64 libcom_err-1.41.12-12.el6.x86_64 libgcc-4.4.6-4.el6.x86_64 libselinux-2.0.94-5.3.el6.x86_64 libstdc++-4.4.6-4.el6.x86_64 libuuid-2.17.2-12.7.el6.x86_64 nspr-4.9.2-1.el6.x86_64 nss-3.14.0.0-12.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64 nss-util-3.14.0.0-2.el6.x86_64 openldap-2.4.23-26.el6.x86_64 openssl-1.0.0-20.el6_2.5.x86_64 qpid-cpp-client-0.14-22.el6_3.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0 0x00007ffff4a458a5 in raise () from /lib64/libc.so.6
#1 0x00007ffff4a4700d in abort () from /lib64/libc.so.6
#2 0x00007ffff4a837b7 in __libc_message () from /lib64/libc.so.6
#3 0x00007ffff4a890e6 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007ffff4a8bc13 in _int_free () from /lib64/libc.so.6
#5 0x00007ffff1ad244c in llvm::MachineFunction::DeleteMachineInstr(llvm::MachineInstr*) () from /opt/home/trafodion/traf_jan17/export/lib64/libtdm_sqlexp.so
#6 0x00007ffff1ac5bc5 in llvm::MachineBasicBlock::~MachineBasicBlock() ()
   from /opt/home/trafodion/traf_jan17/export/lib64/libtdm_sqlexp.so
#7 0x00007ffff1ad23c2 in llvm::MachineFunction::DeleteMachineBasicBlock(llvm::MachineBasicBlock*) ()
   from /opt/home/trafodion/traf_jan17/export/lib64/libtdm_sqlexp.so
#8 0x00007ffff1ad2b55 in llvm::MachineFunction::~MachineFunction() ()
   from /opt/home/trafodion/traf_jan17/export/lib64/libtdm_sqlexp.so
#9 0x00007ffff1ad79e2 in llvm::MachineFunctionAnalysis::releaseMemory() ()
   from /opt/home/trafodion/traf_jan17/export/lib64/libtdm_sqlexp.so
#10 0x00007ffff1f09d90 in llvm::PMDataManager::freePass(llvm::Pass*, llvm::StringRef, llvm::PassDebuggingString) ()
   from /opt/home/trafodion/traf_jan17/export/lib64/libtdm_sqlexp.so
#11 0x00007ffff1f09eec in llvm::PMDataManager::removeDeadPasses(llvm::Pass*, llvm::StringRef, llvm::PassDebuggingString) ()
   from /opt/home/trafodion/traf_jan17/export/lib64/libtdm_sqlexp.so
#12 0x00007ffff1f0f5bc in llvm::FPPassManager::runOnFunction(llvm::Function&)
    () from /opt/home/trafodion/traf_jan17/export/lib64/libtdm_sqlexp.so
#13 0x00007ffff1f0f84b in llvm::FunctionPassManagerImpl::run(llvm::Function&)
    () from /opt/home/trafodion/traf_jan17/export/lib64/libtdm_sqlexp.so
#14 0x00007ffff1f0fb5c in llvm::FunctionPassManager::run(llvm::Function&) ()
   from /opt/home/trafodion/traf_jan17/export/lib64/libtdm_sqlexp.so
#15 0x00007ffff1a4e594 in llvm::JIT::jitTheFunction(llvm::Function*, llvm::MutexGuard const&) ()
   from /opt/home/trafodion/traf_jan17/export/lib64/libtdm_sqlexp.so
#16 0x00007ffff1a4e8ac in llvm::JIT::runJITOnFunctionUnlocked(llvm::Function*, llvm::MutexGuard const&) ()
   from /opt/home/trafodion/traf_jan17/export/lib64/libtdm_sqlexp.so
#17 0x00007ffff1a4ec7a in llvm::JIT::runJITOnFunction(llvm::Function*, llvm::MachineCodeInfo*) ()
   from /opt/home/trafodion/traf_jan17/export/lib64/libtdm_sqlexp.so
#18 0x00007ffff17628f0 in PCodeCfg::layoutNativeCode (this=0x7fffcc4ff848)
    at ../exp/ExpPCodeOptsNativeExpr.cpp:9122
#19 0x00007ffff172505f in PCodeCfg::optimize (this=0x7fffcc4ff848)
    at ../exp/ExpPCodeOptimizations.cpp:2275
#20 0x00007ffff1704ce5 in ex_expr::pCodeGenerate (this=<value optimized out>,
    space=0x7fffcc4ff428, heap=0x7fffcfc4a448, f=0)
    at ../exp/ExpPCodeExpGen.cpp:975
#21 0x00007fffed7a0df4 in ExpGenerator::endExprGen (this=0x7fffcc4da758,
    expr=0x7fffd14b8828, gen_last_clause=0)
    at ../generator/GenExpGenerator.cpp:4629
#22 0x00007fffed7a3343 in ExpGenerator::generateListExpr (this=0x7fffcc4da758,
    val_id_list=..., node_type=<value optimized out>, expr=0x7fffd14b8828,
    atp=<value optimized out>, atpIndex=<value optimized out>, hdrInfo=0x0)
    at ../generator/GenExpGenerator.cpp:3699
#23 0x00007fffed7acc4d in ExpGenerator::generateContiguousMoveExpr (
    this=0x7fffcc4da758, valIdList=..., addConvNodes=0, atp=1, atpIndex=2,
    tdataF=ExpTupleDesc::SQLARK_EXPLODED_FORMAT, tupleLength=@0x7fffd14b886c,
    moveExpr=0x7fffd14b8828, tupleDesc=0x7fffd14b8810,
    tdescF=ExpTupleDesc::LONG_FORMAT, newMapTable=0x0, tgtValues=0x0,
    startOffset=0, bulkMoveSrcStartOffset=0x0, disableConstFolding=0, colArray=
    0x0, doBulkMoves=0) at ../generator/GenExpGenerator.cpp:2386
#24 0x00007fffed88446f in HbaseAccess::codeGen (this=0x7fffd14b80d0, generator=
    0x7fffd14ba3d0) at ../generator/GenRelScan.cpp:2155
#25 0x00007fffed81918b in ProbeCache::codeGen (this=0x7fffcc4e29e0,
    generator=0x7fffd14ba3d0) at ../generator/GenProbeCache.cpp:118
#26 0x00007fffed8511b0 in NestedJoin::codeGen (this=0x7fffcc780f48,
    generator=0x7fffd14ba3d0) at ../generator/GenRelJoin.cpp:3369
#27 0x00007fffed866a4b in Sort::codeGen (this=0x7fffcc4e72c0,
    generator=0x7fffd14ba3d0) at ../generator/GenRelMisc.cpp:3318
#28 0x00007fffed85101a in NestedJoin::codeGen (this=0x7fffcc4e4930,
    generator=0x7fffd14ba3d0) at ../generator/GenRelJoin.cpp:3283
#29 0x00007fffed8686c5 in RelRoot::codeGen (this=0x7fffcc6bf060,
    generator=0x7fffd14ba3d0) at ../generator/GenRelMisc.cpp:1202
#31 0x00007fffef7bdc20 in CmpMain::compile (this=0x7fffd14bdb60,
    input_str=0x7fffd09166f8 "select trim(O.catalog_name || '.' || '\"' || O.schema_name || '\"' || '.' || '\"' || O.object_name || '\"' ) constr_name, trim(O2.catalog_name || '.' || '\"' || O2.schema_name || '\"' || '.' || '\"' || O2.ob"..., charset=15, queryExpr=@0x7fffd14bda98, gen_code=0x7fffd0905158,
    gen_code_len=0x7fffd0905150, heap=0x7fffd0a72178, phase=CmpMain::END,
    fragmentDir=0x7fffd14bdcb8, op=3004, useQueryCache=1,
    cacheable=0x7fffd14bdaa8, begTime=0x7fffd14bda80, shouldLog=0)
    at ../sqlcomp/CmpMain.cpp:2408
#32 0x00007fffef7bf66c in CmpMain::sqlcomp (this=0x7fffd14bdb60,
    input_str=0x7fffd09166f8 "select trim(O.catalog_name || '.' || '\"' || O.schema_name || '\"' || '.' || '\"' || O.object_name || '\"' ) constr_name, trim(O2.catalog_name || '.' || '\"' || O2.schema_name || '\"' || '.' || '\"' || O2.ob"..., charset=15, queryExpr=@0x7fffd14bda98, gen_code=0x7fffd0905158,
    gen_code_len=0x7fffd0905150, heap=0x7fffd0a72178, phase=CmpMain::END,
    fragmentDir=0x7fffd14bdcb8, op=3004, useQueryCache=1,
    cacheable=0x7fffd14bdaa8, begTime=0x7fffd14bda80, shouldLog=0)
    at ../sqlcomp/CmpMain.cpp:1732
#33 0x00007fffef7c0970 in CmpMain::sqlcomp (this=0x7fffd14bdb60, input=...,
    gen_code=0x7fffd0905158, gen_code_len=0x7fffd0905150, heap=0x7fffd0a72178,
    phase=CmpMain::END, fragmentDir=0x7fffd14bdcb8, op=3004)
    at ../sqlcomp/CmpMain.cpp:817
#34 0x00007fffed50f198 in CmpStatement::process (this=0x7fffd08fc930,
    sqltext=<value optimized out>) at ../arkcmp/CmpStatement.cpp:508
#35 0x00007fffed502623 in CmpContext::compileDirect (this=0x7fffd0555090,
    data=0x7fffd0392e08 "\200", data_len=696, outHeap=0x7fffe378e1b8,
    charset=15, op=CmpMessageObj::SQLTEXT_COMPILE, gen_code=@0x7fffd14be240,
    gen_code_len=@0x7fffd14be248, parserFlags=163840, diagsArea=0x7fffd03930c8)
    at ../arkcmp/CmpContext.cpp:686
#36 0x00007ffff437acf7 in CliStatement::prepare2 (this=0x7fffd01cf890,
    source=0x7fffd03683d8 "select trim(O.catalog_name || '.' || '\"' || O.schema_name || '\"' || '.' || '\"' || O.object_name || '\"' ) constr_name, trim(O2.catalog_name || '.' || '\"' || O2.schema_name || '\"' || '.' || '\"' || O2.ob"..., diagsArea=..., passed_gen_code=<value optimized out>,
    passed_gen_code_len=3816350136, charset=15, unpackTdbs=1, cliFlags=144)
    at ../cli/Statement.cpp:1827
#37 0x00007ffff437b146 in CliStatement::prepare (this=0x7fffd01cf890,
    source=0x7fffd03683d8 "select trim(O.catalog_name || '.' || '\"' || O.schema_name || '\"' || '.' || '\"' || O.object_name || '\"' ) constr_name, trim(O2.catalog_name || '.' || '\"' || O2.schema_name || '\"' || '.' || '\"' || O2.ob"...,
    diagsArea=..., passed_gen_code=<value optimized out>,
    passed_gen_code_len=<value optimized out>, charset=<value optimized out>,
    unpackTdbs=1, cliFlags=144) at ../cli/Statement.cpp:1420
#38 0x00007ffff432a574 in SQLCLI_Prepare2 (cliGlobals=0xeead20,
    statement_id=0x681c740, sql_source=0x67e1fa0, gencode_ptr=0x0,
    gencode_len=0, ret_gencode_len=0x0, query_cost_info=0x7fffd14bf610,
    query_comp_stats_info=0x7fffd14be510, uniqueStmtId=<value optimized out>,
    uniqueStmtIdLen=0x0, flags=0) at ../cli/Cli.cpp:5914
#39 0x00007ffff4388b00 in SQL_EXEC_Prepare2 (statement_id=0x681c740,
    sql_source=0x67e1fa0, gencode_ptr=0x0, gencode_len=0, ret_gencode_len=0x0,
    query_cost_info=0x7fffd14bf610, comp_stats_info=0x7fffd14be510,
    uniqueStmtId=0x0, uniqueStmtIdLen=0x0, flags=0)
    at ../cli/CliExtern.cpp:4985
#40 0x00007ffff2de169b in ExeCliInterface::prepare (this=0x7fffd14c1000,
    stmtStr=<value optimized out>, module=<value optimized out>,
    stmt=0x681c740, sql_src=0x67e1fa0, input_desc=0x68a58c0,
    output_desc=0x681c6e0, outputBuf=0x7fffd14c1038, outputVarPtrList=0x0,
    inputBuf=0x7fffd14c1070, inputVarPtrList=0x0, uniqueStmtId=0x0,
    uniqueStmtIdLen=0x0, query_cost_info=0x0, comp_stats_info=0x0,
    monitorThis=0) at ../executor/ExExeUtilCli.cpp:321
#41 0x00007ffff2de309a in ExeCliInterface::fetchRowsPrologue (
    this=0x7fffd14c1000,
    sqlStrBuf=0x7fffd14bf9a0 "select trim(O.catalog_name || '.' || '\"' || O.schema_name || '\"' || '.' || '\"' || O.object_name || '\"' ) constr_name, trim(O2.catalog_name || '.' || '\"' || O2.schema_name || '\"' || '.' || '\"' || O2.ob"..., noExec=0, monitorThis=0, stmtName=0x0) at ../executor/ExExeUtilCli.cpp:1031
#42 0x00007ffff2de6db6 in ExeCliInterface::fetchAllRows (this=0x7fffd14c1000,
    infoList=@0x7fffd14c1438,
    query=0x7fffd14bf9a0 "select trim(O.catalog_name || '.' || '\"' || O.schema_name || '\"' || '.' || '\"' || O.object_name || '\"' ) constr_name, trim(O2.catalog_name || '.' || '\"' || O2.schema_name || '\"' || '.' || '\"' || O2.ob"...,
    inNumOutputEntries=0, varcharFormat=0, monitorThis=<value optimized out>,
    initInfoList=1) at ../executor/ExExeUtilCli.cpp:1114
#43 0x00007fffef849c93 in CmpSeabaseDDL::getSeabaseUserTableDesc (
    this=0x7fffd14c1640, catName=..., schName=..., objName=...,
    objType=<value optimized out>, includeInvalidDefs=108950128)
    at ../sqlcomp/CmpSeabaseDDLtable.cpp:7187
#44 0x00007fffef84c074 in CmpSeabaseDDL::getSeabaseTableDesc (
    this=0x7fffd14c1640, catName=..., schName=..., objName=...,
    objType=COM_BASE_TABLE_OBJECT, includeInvalidDefs=0)
    at ../sqlcomp/CmpSeabaseDDLtable.cpp:7491
#45 0x00007fffee168a84 in NATableDB::get (this=0x7fffd0961928, corrName=...,
    bindWA=0x7fffd14c42a0, inTableDescStruct=0x0)
    at ../optimizer/NATable.cpp:8006
#46 0x00007fffedec6cca in BindWA::getNATable (this=0x7fffd14c42a0,
    corrName=..., catmanCollectTableUsages=1, inTableDescStruct=0x0)
    at ../optimizer/BindRelExpr.cpp:1445
#47 0x00007fffeded6e33 in GenericUpdate::bindNode (this=0x7fffd087b978,
    bindWA=0x7fffd14c42a0) at ../optimizer/BindRelExpr.cpp:11539
#48 0x00007fffededbc6b in Update::bindNode (this=0x7fffd087b978,
    bindWA=0x7fffd14c42a0) at ../optimizer/BindRelExpr.cpp:10258
#49 0x00007fffedeb71c7 in RelExpr::bindChildren (this=0x7fffd087c428,
    bindWA=0x7fffd14c42a0) at ../optimizer/BindRelExpr.cpp:2180
#50 0x00007fffedef2b8e in RelRoot::bindNode (this=0x7fffd087c428,
    bindWA=0x7fffd14c42a0) at ../optimizer/BindRelExpr.cpp:5225
#51 0x00007fffef7bc85e in CmpMain::compile (this=0x7fffd14c6d50,
    input_str=0x7fffd0899190 "update Trafodion.\"_REPOS_\".metric_query_aggr_table set AGGREGATION_END_UTC_TS = CONVERTTIMESTAMP(212288416937177290),TOTAL_EST_ROWS_ACCESSED = 0,TOTAL_EST_ROWS_USED = 0,TOTAL_ROWS_RETRIEVED = 2322,TOT"...,
    charset=15, queryExpr=@0x7fffd14c6c88, gen_code=0x7fffd0884030,
    gen_code_len=0x7fffd0884028, heap=0x7fffd0a71c08, phase=CmpMain::END,
    fragmentDir=0x7fffd14c6ea8, op=3004, useQueryCache=1,
    cacheable=0x7fffd14c6c98, begTime=0x7fffd14c6c70, shouldLog=0)
    at ../sqlcomp/CmpMain.cpp:2119
#52 0x00007fffef7bf66c in CmpMain::sqlcomp (this=0x7fffd14c6d50,
    input_str=0x7fffd0899190 "update Trafodion.\"_REPOS_\".metric_query_aggr_table set AGGREGATION_END_UTC_TS = CONVERTTIMESTAMP(212288416937177290),TOTAL_EST_ROWS_ACCESSED = 0,TOTAL_EST_ROWS_USED = 0,TOTAL_ROWS_RETRIEVED = 2322,TOT"...,
    charset=15, queryExpr=@0x7fffd14c6c88, gen_code=0x7fffd0884030,
    gen_code_len=0x7fffd0884028, heap=0x7fffd0a71c08, phase=CmpMain::END,
    fragmentDir=0x7fffd14c6ea8, op=3004, useQueryCache=1,
    cacheable=0x7fffd14c6c98, begTime=0x7fffd14c6c70, shouldLog=0)
    at ../sqlcomp/CmpMain.cpp:1732
#53 0x00007fffef7c0970 in CmpMain::sqlcomp (this=0x7fffd14c6d50, input=...,
    gen_code=0x7fffd0884030, gen_code_len=0x7fffd0884028, heap=0x7fffd0a71c08,
    phase=CmpMain::END, fragmentDir=0x7fffd14c6ea8, op=3004)
    at ../sqlcomp/CmpMain.cpp:817
#54 0x00007fffed50f198 in CmpStatement::process (this=0x7fffd087b808,
    sqltext=<value optimized out>) at ../arkcmp/CmpStatement.cpp:508
#55 0x00007fffed502623 in CmpContext::compileDirect (this=0x7fffd095c090,
    data=0x7fffd02b9a18 "\200", data_len=728, outHeap=0x7fffe378e1b8,
    charset=15, op=CmpMessageObj::SQLTEXT_COMPILE, gen_code=@0x7fffd14c7430,
    gen_code_len=@0x7fffd14c7438, parserFlags=131072, diagsArea=0x7fffd02b9cf8)
    at ../arkcmp/CmpContext.cpp:686
#56 0x00007ffff437acf7 in CliStatement::prepare2 (this=0x7fffd0a65bd0,
    source=0x7fffd035fa70 "update Trafodion.\"_REPOS_\".metric_query_aggr_table set AGGREGATION_END_UTC_TS = CONVERTTIMESTAMP(212288416937177290),TOTAL_EST_ROWS_ACCESSED = 0,TOTAL_EST_ROWS_USED = 0,TOTAL_ROWS_RETRIEVED = 2322,TOT"...,
    diagsArea=..., passed_gen_code=<value optimized out>,
    passed_gen_code_len=3816350136, charset=15, unpackTdbs=1, cliFlags=144)
    at ../cli/Statement.cpp:1827
#57 0x00007ffff437b146 in CliStatement::prepare (this=0x7fffd0a65bd0,
    source=0x7fffd035fa70 "update Trafodion.\"_REPOS_\".metric_query_aggr_table set AGGREGATION_END_UTC_TS = CONVERTTIMESTAMP(212288416937177290),TOTAL_EST_ROWS_ACCESSED = 0,TOTAL_EST_ROWS_USED = 0,TOTAL_ROWS_RETRIEVED = 2322,TOT"...,
    diagsArea=..., passed_gen_code=<value optimized out>,
    passed_gen_code_len=<value optimized out>, charset=<value optimized out>,
    unpackTdbs=1, cliFlags=144) at ../cli/Statement.cpp:1420
#58 0x00007ffff432eb8c in SQLCLI_ExecDirect2(CliGlobals *, SQLSTMT_ID *, SQLDESC_ID *, Int32, SQLDESC_ID *, Lng32, typedef __va_list_tag __va_list_tag *, SQLCLI_PTR_PAIRS *) (cliGlobals=0xeead20, statement_id=0x1ef2118,
    sql_source=0x7fffd14c77b0, prepFlags=0, input_descriptor=0x0,
    num_ptr_pairs=0, ap=0x7fffd14c75c0, ptr_pairs=0x0) at ../cli/Cli.cpp:3705
#59 0x00007ffff438abba in SQL_EXEC_ExecDirect2 (statement_id=0x1ef2118,
    sql_source=0x7fffd14c77b0, prep_flags=0, input_descriptor=0x0,
    num_ptr_pairs=0) at ../cli/CliExtern.cpp:2326
#60 0x00007ffff6871db7 in SRVR::WSQL_EXEC_ExecDirect (statement_id=0x1ef2118,
    sql_source=0x7fffd14c77b0, input_descriptor=0x0, num_ptr_pairs=0)
    at SQLWrapper.cpp:360
#61 0x00007ffff6867860 in SRVR::EXECDIRECT (pSrvrStmt=0x1ef1b00)
    at sqlinterface.cpp:4479
#62 0x00007ffff682b205 in SRVR::ControlProc (pParam=0x1ef1b00)
    at csrvrstmt.cpp:757
#63 0x00007ffff682bfc8 in SRVR_STMT_HDL::ExecDirect (this=0x1ef1b00,
    inCursorName=0x0,
    inSqlString=0x657acd8 "update Trafodion.\"_REPOS_\".metric_query_aggr_table set AGGREGATION_END_UTC_TS = CONVERTTIMESTAMP(212288416937177290),TOTAL_EST_ROWS_ACCESSED = 0,TOTAL_EST_ROWS_USED = 0,TOTAL_ROWS_RETRIEVED = 2322,TOT"...,
    inStmtType=<value optimized out>, inSqlStmtType=<value optimized out>,
    inSqlAsyncEnable=<value optimized out>, inQueryTimeout=0)
    at csrvrstmt.cpp:439
#64 0x00000000004cc939 in SessionWatchDog (arg=<value optimized out>)
    at SrvrConnect.cpp:795
#65 0x00007ffff45c5851 in start_thread () from /lib64/libpthread.so.0
#66 0x00007ffff4afb90d in clone () from /lib64/libc.so.6
(gdb)

Tags: sql-cmp
Aruna Sadashiva (aruna-sadashiva) wrote :

Jim's analysis:

What I found appears to be a case of memory corruption … though I cannot really tell exactly what piece of memory was corrupted nor what piece of code might have done the damage. The general nature of the corruption is that there is something wrong with the data structures used by malloc/free to keep track of what chunks of memory are free.

The stack backtrace starts with:
#0 0x00007ffff4a458a5 in raise () from /lib64/libc.so.6
#1 0x00007ffff4a4700d in abort () from /lib64/libc.so.6
#2 0x00007ffff4a837b7 in __libc_message () from /lib64/libc.so.6
#3 0x00007ffff4a890e6 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007ffff4a8bc13 in _int_free () from /lib64/libc.so.6
#5 0x00007ffff1ad244c in llvm::MachineFunction::DeleteMachineInstr(llvm::MachineInstr*) () from /opt/home/trafodion/traf_jan17/export/lib64/libtdm_sqlexp.so

From looking at the CPU registers, I believe I found the error message that was being printed by _int_free(). It reads:
             free(): invalid next size (normal)

I wish we had the source for libc so I could determine what all that might mean in this particular case. When I googled that error message, all I found were examples of buggy code (usually by programmers new to C/C++) which resulted in that error message or similar ones.
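
As a rough illustration of what this message usually indicates: glibc reports it when, while freeing a chunk, the size field of the chunk that follows it in memory looks implausible. That typically means something wrote past the end of a neighbouring allocation, possibly long before the free() that finally tripped over it. The sketch below is a toy model in C++, not glibc's actual code or chunk layout; `ToyArena` and all of its members are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy arena (illustrative only): chunks are packed back to back as
// [size_t total_size][payload bytes...]. Real allocators are more complex.
struct ToyArena {
    std::vector<unsigned char> bytes;
    std::size_t used = 0;

    explicit ToyArena(std::size_t capacity) : bytes(capacity, 0) {}

    // Carves a chunk out of the arena and returns the offset of its header.
    std::size_t allocate(std::size_t payload) {
        std::size_t total = sizeof(std::size_t) + payload;
        std::size_t offset = used;
        assert(offset + total <= bytes.size());
        std::memcpy(&bytes[offset], &total, sizeof total);
        used += total;
        return offset;
    }

    // Reads the size header stored at the given chunk offset.
    std::size_t size_at(std::size_t offset) const {
        std::size_t s;
        std::memcpy(&s, &bytes[offset], sizeof s);
        return s;
    }

    unsigned char* payload(std::size_t offset) {
        return &bytes[offset + sizeof(std::size_t)];
    }

    // Sanity check in the spirit of "invalid next size": before releasing the
    // chunk at `offset`, look at the neighbour's header and reject sizes that
    // are too small to be a chunk or larger than the whole arena.
    bool next_size_is_valid(std::size_t offset) const {
        std::size_t next = offset + size_at(offset);
        if (next >= used) return true;  // no neighbour to inspect
        std::size_t next_size = size_at(next);
        return next_size > sizeof(std::size_t) && next_size <= bytes.size();
    }
};

// Returns true when the check passes before the overrun and fails after it.
bool demo_overflow_detected() {
    ToyArena arena(256);
    std::size_t a = arena.allocate(16);
    arena.allocate(16);  // neighbour chunk directly after `a`
    bool before = arena.next_size_is_valid(a);

    // Simulated bug: zero-fill 16 payload bytes plus sizeof(size_t) more,
    // which overwrites the neighbour's size header with 0. All writes stay
    // inside the arena's own storage, so the demo itself is well-defined.
    std::memset(arena.payload(a), 0, 16 + sizeof(std::size_t));

    bool after = arena.next_size_is_valid(a);
    return before && !after;
}
```

In this model the overrun and the failed check are separate events, which matches the situation in the core: the frame that calls free() is where the damage is noticed, not necessarily where it was done.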

Now, when I look further back in the stack backtrace, I find the query that is being compiled:
#31 0x00007fffef7bdc20 in CmpMain::compile (this=0x7fffd14bdb60,
    input_str=0x7fffd09166f8 "select trim(O.catalog_name || '.' || '\"' || O.schema_name || '\"' || '.' || '\"' || O.object_name || '\"' ) constr_name, trim(O2.catalog_name || '.' || '\"' || O2.schema_name || '\"' || '.' || '\"' || O2.ob"..., charset=15, queryExpr=@0x7fffd14bda98, . . . )
    at ../sqlcomp/CmpMain.cpp:2408

That particular select query is one of the standard metadata queries that the Compiler does every time a user query references a table. It is one of the queries that the Compiler uses to determine various attributes of the user table.

That means that the Compiler is working on a query which we have compiled thousands (or millions) of times in all the testing that we have done over the past few months. It is not working on anything strange or unusual.

I looked at the stack backtrace for the other 36 threads in this mxosrvr process, but they all appear to be doing perfectly normal things … mostly in pthread_cond_wait() or otherwise asleep waiting for something. No smoking guns.

Finally, when I look at the stack backtrace, the first 18 frames are in libc.so and libtdm_sqlexp.so which are both linked without debug symbols and none of those 18 stack frames even show a line number. [Welcome to the wonderful world of Release-Mode builds.] When I get back to the next frame, I find we are in PCodeCfg::layoutNativeCode where it calls llvm::JIT::runJITOnFunction() which is the normal entry point into LLVM code that is called by “Native Expressions.” So, nothing out of the ordinary here either.

Unless we can find a way to reliably reproduce this problem OR at least be able to reproduce it with a Debug Mode build, I don’t know how to make any additional progress on this ...
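
One generic technique that can help localize this kind of corruption in a Debug Mode build, and not something this report says was tried, is to surround suspect buffers with guard bytes and verify them at known points. The C++ sketch below is a minimal version of that idea under stated assumptions: `GuardedBuffer`, `kGuard`, and `kPattern` are hypothetical names, and nothing here is taken from the Trafodion or LLVM sources.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical guard size and fill pattern chosen for this example.
constexpr std::size_t kGuard = 16;
constexpr unsigned char kPattern = 0xA5;

// A payload buffer with a block of known bytes placed before and after it.
class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t payload_size)
        : storage_(payload_size + 2 * kGuard, kPattern),
          payload_size_(payload_size) {
        // Clear the payload so only the guards carry the pattern.
        std::fill(data(), data() + payload_size_,
                  static_cast<unsigned char>(0));
    }

    unsigned char* data() { return storage_.data() + kGuard; }
    std::size_t size() const { return payload_size_; }

    // True while both guard regions still hold the original pattern.
    bool guards_intact() const {
        auto is_pattern = [](unsigned char c) { return c == kPattern; };
        auto front_end = storage_.begin() + kGuard;
        auto back_begin = storage_.begin() + kGuard + payload_size_;
        return std::all_of(storage_.begin(), front_end, is_pattern) &&
               std::all_of(back_begin, storage_.end(), is_pattern);
    }

private:
    std::vector<unsigned char> storage_;
    std::size_t payload_size_;
};

// Returns true when the guards are intact before the overrun and damaged after.
bool demo_guard_catches_overrun() {
    GuardedBuffer buf(8);
    bool before = buf.guards_intact();

    // Off-by-one bug: the loop also writes index 8 of an 8-byte payload.
    // That byte belongs to the trailing guard, so the write stays inside
    // the object's own storage and the demo is well-defined.
    for (std::size_t i = 0; i <= buf.size(); ++i) {
        buf.data()[i] = 0x00;
    }

    return before && !buf.guards_intact();
}
```

Checking `guards_intact()` right after each operation on a buffer ties a detected overrun to the code that owns that buffer, instead of to whichever later free() happens to notice the damage.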


Changed in trafodion:
assignee: Jim Capp (jcapp) → James Capps (james-capps)
Changed in trafodion:
status: New → Incomplete