MXOSRVR abort/core during ramp-up period
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Trafodion | Fix Committed | High | Arvind Narain | | |
Bug Description
This problem occurs randomly, and only during the ramp-up of a performance test that uses 1024 parallel streams. The mxosrvr process aborts, and the client receives the unexpected exception message "There was a problem reading from the server".
The latest two cores can be found on Zircon4, and have the following stack trace info.
echo /local/
echo
…
[Thread debugging using libthread_db enabled]
Core was generated by `mxosrvr -ZKHOST n014.cm.
Program terminated with signal 6, Aborted.
#0 0x00007ffff4a328a5 in raise () from /lib64/libc.so.6
#0 0x00007ffff4a328a5 in raise () from /lib64/libc.so.6
#1 0x00007ffff4a3400d in abort () from /lib64/libc.so.6
#2 0x00007ffff4a707b7 in __libc_message () from /lib64/libc.so.6
#3 0x00007ffff4a760e6 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007ffff6a2ecc9 in SRVR_STMT_
#5 0x00000000004ccbb6 in SessionWatchDog (arg=<value optimized out>) at SrvrConnect.cpp:853
#6 0x00007ffff45b2851 in start_thread () from /lib64/
#7 0x00007ffff4ae890d in clone () from /lib64/libc.so.6
echo /local/
echo
…
[Thread debugging using libthread_db enabled]
Core was generated by `mxosrvr -ZKHOST n014.cm.
Program terminated with signal 6, Aborted.
#0 0x00007ffff4a328a5 in raise () from /lib64/libc.so.6
#0 0x00007ffff4a328a5 in raise () from /lib64/libc.so.6
#1 0x00007ffff4a3400d in abort () from /lib64/libc.so.6
#2 0x00007ffff5d51a55 in os::abort(bool) () from /usr/java/
#3 0x00007ffff5ed1f87 in VMError:
#4 0x00007ffff5d5696f in JVM_handle_
#5 <signal handler called>
#6 0x00007ffff4a7b6ec in free () from /lib64/libc.so.6
#7 0x00007ffff6a2ecad in SRVR_STMT_
#8 0x00000000004ccbb6 in SessionWatchDog (arg=<value optimized out>) at SrvrConnect.cpp:853
#9 0x00007ffff45b2851 in start_thread () from /lib64/
#10 0x00007ffff4ae890d in clone () from /lib64/libc.so.6
echo
Arvind Narain has provided additional research detail on the above cores:
In both cases pSrvrStmt is trashed.
According to the messages, this happens while setting the CQD to disable transactions in the repository context.
/opt/hp/
monitor.
monitor.
2015-02-25 03:10:06,242, ERROR, SQL, Node Number: 0, CPU: 7, PIN: 28943, Process Name: $Z070NLY, SQLCODE: 8804,, *** ERROR[8804] The provided input statement does not exist in the current context.
2015-02-25 03:10:06,242, ERROR, SQL, Node Number: 0, CPU: 7, PIN: 28943, Process Name: $Z070NLY, SQLCODE: 8804,, *** ERROR[8804] The provided input statement does not exist in the current context.
2015-02-25 03:10:06,242, ERROR, MXOSRVR, Node Number: 7, CPU: 7, PIN:28943, Process Name:$Z070NLY , , ,A NonStop Process Service error Failed to skip transaction - *** ERROR[8804] The provided input statement does not exist in the current context. [2015-02-25 03:10:06] has occurred.
/opt/hp/
monitor.
monitor.
2015-02-25 03:22:16,023, ERROR, SQL, Node Number: 0, CPU: 7, PIN: 50441, Process Name: $Z071666, SQLCODE: 8804,, *** ERROR[8804] The provided input statement does not exist in the current context.
2015-02-25 03:22:16,023, ERROR, SQL, Node Number: 0, CPU: 7, PIN: 50441, Process Name: $Z071666, SQLCODE: 8804,, *** ERROR[8804] The provided input statement does not exist in the current context.
2015-02-25 03:22:16,023, ERROR, MXOSRVR, Node Number: 7, CPU: 7, PIN:50441, Process Name:$Z071666 , , ,A NonStop Process Service error Failed to skip transaction - *** ERROR[8804] The provided input statement does not exist in the current context. [2015-02-25 03:22:16] has occurred.
Changed in trafodion: | |
importance: | Undecided → High |
Changed in trafodion: | |
assignee: | nobody → Arvind Narain (arvind-narain) |
milestone: | none → r1.0.1 |
milestone: | r1.0.1 → r1.1 |
In the UTT given earlier to address the second core seen in Bug 1422894, a mutex was added around management of the statement list maintained by mxosrvr. At that point the mutex was taken only while adding to or deleting from the list.
Now the mutex is also taken in getSrvrStmt and the equivalent functions where currentStatement gets set up.
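The locking pattern described in this comment can be sketched roughly as follows. This is a minimal illustration, not the actual mxosrvr change: `Stmt`, `StmtRegistry`, the method names, and the statement label are all hypothetical, and the use of `std::mutex` and `std::shared_ptr` is an assumption made for the example. The point it shows is that the lookup path, which also sets the current statement, takes the same lock as add and remove, so a watchdog-style thread cannot free an entry while another thread is resolving it.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a server statement handle (not the real mxosrvr type).
struct Stmt {
    std::string label;
};

// Hypothetical statement registry: one mutex guards the list AND the
// "current statement" pointer, on every path that touches either.
class StmtRegistry {
public:
    // Add path: locked (this was already the case in the earlier fix).
    std::shared_ptr<Stmt> add(const std::string& label) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto stmt = std::make_shared<Stmt>(Stmt{label});
        stmts_.push_back(stmt);
        return stmt;
    }

    // Remove path: locked; also clears the current pointer if it refers
    // to the entry being removed, so no stale reference is left behind.
    bool remove(const std::string& label) {
        std::lock_guard<std::mutex> guard(mutex_);
        for (auto it = stmts_.begin(); it != stmts_.end(); ++it) {
            if ((*it)->label == label) {
                if (current_ == *it) current_.reset();
                stmts_.erase(it);
                return true;
            }
        }
        return false;
    }

    // Lookup path (analogous in spirit to getSrvrStmt): now takes the same
    // lock, because it both walks the list and sets the current statement.
    std::shared_ptr<Stmt> find(const std::string& label) {
        std::lock_guard<std::mutex> guard(mutex_);
        for (const auto& stmt : stmts_) {
            if (stmt->label == label) {
                current_ = stmt;
                return stmt;
            }
        }
        return nullptr;
    }

    // Read the current statement under the same lock.
    std::shared_ptr<Stmt> current() {
        std::lock_guard<std::mutex> guard(mutex_);
        return current_;
    }

private:
    std::mutex mutex_;
    std::list<std::shared_ptr<Stmt>> stmts_;
    std::shared_ptr<Stmt> current_;
};

// Usage sketch: a lookup after removal yields null rather than a stale entry.
inline void usageExample() {
    StmtRegistry registry;
    registry.add("STMT_EXAMPLE");  // hypothetical label
    assert(registry.find("STMT_EXAMPLE") != nullptr);
    registry.remove("STMT_EXAMPLE");
    assert(registry.find("STMT_EXAMPLE") == nullptr);
}
```

In this sketch a caller that obtained a statement before it was removed still holds a valid object through the shared pointer; code that hands out raw pointers would instead need to hold the lock for as long as the pointer is in use.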