core file from shell during shutdown

Bug #1366227 reported by Chris Sheedy
This bug affects 1 person
Affects    Status  Importance  Assigned to     Milestone
Trafodion  New     Low         Gonzalo Correa

Bug Description

Change 332 failed in core-regress-seabase-cdh4.4 during shutdown: the shell process aborted and left a core file.

http://logs.trafodion.org/32/332/2/check/core-regress-seabase-cdh4.4/6a54826/console.html has:

Shutting down (normal) the SQ environment!
Fri Sep 5 18:52:26 UTC 2014
Processing cluster.conf on local host slave01
[$Z000BBN] Shell/shell Version 1.0.1 Release 0.8.4 (Build release [0.8.3rc1-203-ga165839master_Bld177], date 20140905_175103)
ps
[$Z000BBN] %ps
[$Z000BBN] NID,PID(os) PRI TYPE STATES NAME PARENT PROGRAM
[$Z000BBN] ------------ --- ---- ------- ----------- ----------- ---------------
[$Z000BBN] 000,00031016 000 WDG ES--A-- $WDT000 NONE sqwatchdog
[$Z000BBN] 000,00031017 000 PSD ES--A-- $PSD000 NONE pstartd
[$Z000BBN] 000,00031061 001 DTM ES--A-- $TM0 NONE tm
[$Z000BBN] 000,00013883 001 GEN ES--A-- $Z000BBN NONE shell
[$Z000BBN] 001,00031018 000 PSD ES--A-- $PSD001 NONE pstartd
[$Z000BBN] 001,00031015 000 WDG ES--A-- $WDT001 NONE sqwatchdog
[$Z000BBN] 001,00031139 001 DTM ES--A-- $TM1 NONE tm
shutdown
[$Z000BBN] %shutdown
/home/jenkins/workspace/core-regress-seabase-cdh4.4/trafodion/core/sqf/sql/scripts/sqshell: line 7: 13883 Aborted (core dumped) shell $1 $2 $3 $4 $5 $6 $7 $8 $9
Issued a 'shutdown normal' request

Shutdown in progress

# of SQ processes: 0
SQ Shutdown (normal) from /home/jenkins/workspace/core-regress-seabase-cdh4.4/trafodion/core/sql/regress Successful
Fri Sep 5 18:52:34 UTC 2014
+ ret=0
+ [[ 0 == 124 ]]
+ echo 'Return code 0'
Return code 0
+ sudo /usr/local/bin/hbase-sudo.sh stop
Stopping hbase-master
Stopping HBase master daemon (hbase-master):[ OK ]
stopping master.....
Return code 0
+ echo 'Return code 0'
Return code 0
+ cd ../../sqf/rundir
+ set +x

========= seabase
09/05/14 18:21:07 (RELEASE build)
09/05/14 18:23:51 TEST010 ### PASS ###
09/05/14 18:25:38 TEST011 ### PASS ###
09/05/14 18:27:46 TEST012 ### PASS ###
09/05/14 18:29:36 TEST013 ### PASS ###
09/05/14 18:29:50 TEST014 ### PASS ###
09/05/14 18:32:05 TEST016 ### PASS ###
09/05/14 18:32:35 TEST018 ### PASS ###
09/05/14 18:50:28 TEST020 ### PASS ###
09/05/14 18:50:44 TEST022 ### PASS ###
09/05/14 18:52:26 TEST024 ### PASS ###
09/05/14 18:21:07 - 18:52:26 (RELEASE build)

WARNING: Core files found in /home/jenkins/workspace/core-regress-seabase-cdh4.4/trafodion/core :
-rw-------. 1 jenkins jenkins 44552192 Sep 5 18:52 sql/regress/core.slave01.13883.shell

========================
Total Passed: 10
Total Failures: 0

Failure : Found 1 core files
Build step 'Execute shell' marked build as failure
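
The PID in the core file name (13883) matches both the PID that sqshell reported as aborted and the shell process $Z000BBN in the ps listing, which ties the core to the shell that issued the shutdown. A minimal sketch of that cross-check follows. It assumes the `core.<host>.<pid>.<program>` naming and the bash "Aborted (core dumped)" message format seen in this log; the function names are hypothetical and not part of any Trafodion tooling.

```python
import re

# Assumed layout, inferred from the log lines in this report:
#   core.<host>.<pid>.<program>
CORE_NAME = re.compile(r"core\.(?P<host>[^.\s]+)\.(?P<pid>\d+)\.(?P<program>\S+)$")

# bash job-status message as it appears in the console log:
#   "<script>: line N: <pid> Aborted (core dumped) <command>"
ABORTED = re.compile(r"line \d+:\s+(?P<pid>\d+)\s+Aborted\s+\(core dumped\)")


def core_info(listing_line):
    """Return (host, pid, program) parsed from a core file listing line, or None."""
    m = CORE_NAME.search(listing_line.strip())
    if m is None:
        return None
    return m.group("host"), int(m.group("pid")), m.group("program")


def aborted_pid(console_line):
    """Return the PID from an 'Aborted (core dumped)' console line, or None."""
    m = ABORTED.search(console_line)
    return int(m.group("pid")) if m else None


def core_matches_abort(listing_line, console_line):
    """True when the core file and the abort message name the same PID."""
    info = core_info(listing_line)
    pid = aborted_pid(console_line)
    return info is not None and pid is not None and info[1] == pid
```

Passing the core file listing line and the sqshell "Aborted" line from the console output above to `core_matches_abort` is enough to confirm that both refer to the same process.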

The core file's backtrace is:

-bash-4.1$ core_bt -d sql/regress
core file : -rw-------. 1 jenkins jenkins 44552192 Sep 5 18:52 sql/regress/core.slave01.13883.shell
gdb command: gdb shell sql/regress/core.slave01.13883.shell --batch -n -x /tmp/tmp.xEFWF2xufh 2>&1
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/1e/0a7d58f454926e2afb4797865d85801ed65ece
[New Thread 13884]
[New Thread 13883]
[Thread debugging using libthread_db enabled]
Core was generated by `shell -a'.
Program terminated with signal 6, Aborted.
#0 0x00000030ada32635 in raise () from /lib64/libc.so.6
#0 0x00000030ada32635 in raise () from /lib64/libc.so.6
#1 0x00000030ada33e15 in abort () from /lib64/libc.so.6
#2 0x0000000000411982 in LIOTM_assert_fun (pp_exp=0x4d4f40 "0", pp_file=0x4d175e "clio.cxx", pv_line=1022, pp_fun=0x4d2d60 "int Local_IO_To_Monitor::process_notice(message_def*)") at clio.cxx:99
#3 0x0000000000413b26 in Local_IO_To_Monitor::process_notice (this=0x7c6e80, pp_msg=<value optimized out>) at clio.cxx:1022
#4 0x0000000000413e03 in Local_IO_To_Monitor::get_io (this=0x7c6e80, pv_sig=<value optimized out>, pp_siginfo=<value optimized out>) at clio.cxx:637
#5 0x0000000000414075 in local_monitor_reader (pp_arg=0x7916) at clio.cxx:154
#6 0x00000030ae2079d1 in start_thread () from /lib64/libpthread.so.0
#7 0x00000030adae886d in clone () from /lib64/libc.so.6
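
To compare this trace with the later occurrences, the frames can be reduced to the application functions and source lines. The sketch below assumes the gdb frame layout shown above (`#N 0xADDR in FUNC (ARGS) at FILE:LINE` or `... from LIB`); the name `app_frames` is hypothetical. Lines that do not match, including frames truncated with "...", are skipped rather than guessed at.

```python
import re

# gdb frame lines as they appear in this report, for example:
#   "#3 0x0000000000413b26 in Local_IO_To_Monitor::process_notice (this=...) at clio.cxx:1022"
#   "#0 0x00000030ada32635 in raise () from /lib64/libc.so.6"
FRAME = re.compile(
    r"^#(?P<num>\d+)\s+0x[0-9a-f]+ in (?P<func>\S+) \(.*\)"
    r"(?: at (?P<file>\S+):(?P<line>\d+)| from (?P<lib>\S+))?\s*$"
)


def app_frames(backtrace):
    """Return [(function, file, line)] for frames that carry a source location.

    Intended for a single-thread trace like the one above.
    """
    frames = []
    previous = None
    for raw in backtrace.splitlines():
        text = raw.strip()
        if text == previous:
            continue  # frame #0 is printed twice in the traces in this report
        previous = text
        m = FRAME.match(text)
        if m is None or m.group("file") is None:
            continue  # not a frame line, truncated, or a library frame
        frames.append((m.group("func"), m.group("file"), int(m.group("line"))))
    return frames
```

Applied to the trace above, this keeps the four clio.cxx frames (lines 99, 1022, 637, and 154). The trace reported for change 371 in the first comment shows the same functions and line numbers in frames #2 through #5.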

Tags: foundation
Chris Sheedy (chris-sheedy) wrote:

Change 371 encountered a similar stack trace.

http://logs.trafodion.org/71/371/1/gate/core-regress-seabase-cdh4.4/f25a75e/console.html has:

[$Z000FM5] %shutdown
[$Z000FM5] Shutdown notice, level=0 received
/home/jenkins/workspace/core-regress-seabase-cdh4.4/trafodion/core/sqf/sql/scripts/sqshell: line 7: 19150 Aborted (core dumped) shell $1 $2 $3 $4 $5 $6 $7 $8 $9
Issued a 'shutdown normal' request

Shutdown in progress

# of SQ processes: 0
SQ Shutdown (normal) from /home/jenkins/workspace/core-regress-seabase-cdh4.4/trafodion/core/sql/regress Successful
Tue Sep 9 19:08:39 UTC 2014
+ ret=0
+ [[ 0 == 124 ]]
+ echo 'Return code 0'
Return code 0
+ sudo /usr/local/bin/hbase-sudo.sh stop
Stopping hbase-master
Stopping HBase master daemon (hbase-master):[ OK ]
stopping master........
Return code 0
+ echo 'Return code 0'
Return code 0
+ cd ../../sqf/rundir
+ set +x

========= seabase
09/09/14 18:36:24 (RELEASE build)
09/09/14 18:39:10 TEST010 ### PASS ###
09/09/14 18:40:49 TEST011 ### PASS ###
09/09/14 18:43:03 TEST012 ### PASS ###
09/09/14 18:44:57 TEST013 ### PASS ###
09/09/14 18:45:11 TEST014 ### PASS ###
09/09/14 18:47:31 TEST016 ### PASS ###
09/09/14 18:48:00 TEST018 ### PASS ###
09/09/14 19:06:28 TEST020 ### PASS ###
09/09/14 19:06:45 TEST022 ### PASS ###
09/09/14 19:08:32 TEST024 ### PASS ###
09/09/14 18:36:24 - 19:08:32 (RELEASE build)

WARNING: Core files found in /home/jenkins/workspace/core-regress-seabase-cdh4.4/trafodion/core :
-rw-------. 1 jenkins jenkins 44552192 Sep 9 19:08 sql/regress/core.slave08.19150.shell

========================
Total Passed: 10
Total Failures: 0

Failure : Found 1 core files

On slave08 the stack trace is:

-bash-4.1$ cd /home/jenkins/workspace/core-regress-seabase-cdh4.4/trafodion/core/sqf
-bash-4.1$ . ./sqenvr.sh
-bash-4.1$ cd ..
-bash-4.1$ core_bt
core file : -rw-------. 1 jenkins jenkins 44552192 Sep 9 19:08 ./sql/regress/core.slave08.19150.shell
gdb command: gdb shell ./sql/regress/core.slave08.19150.shell --batch -n -x /tmp/tmp.KubaBX3k7s 2>&1
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/1e/0a7d58f454926e2afb4797865d85801ed65ece
[New Thread 19151]
[New Thread 19150]
[Thread debugging using libthread_db enabled]
Core was generated by `shell -a'.
Program terminated with signal 6, Aborted.
#0 0x0000003d61a32635 in raise () from /lib64/libc.so.6
#0 0x0000003d61a32635 in raise () from /lib64/libc.so.6
#1 0x0000003d61a33e15 in abort () from /lib64/libc.so.6
#2 0x0000000000411982 in LIOTM_assert_fun (pp_exp=0x4d4f40 "0", pp_file=0x4d175e "clio.cxx", pv_line=1022, pp_fun=0x4d2d60 "int Local_IO_To_Monitor::process_notice(message_def*)") at clio.cxx:99
#3 0x0000000000413b26 in Local_IO_To_Monitor::process_notice (this=0x7c6e80, pp_msg=<value optimized out>) at clio.cxx:1022
#4 0x0000000000413e03 in Local_IO_To_Monitor::get_io (this=0x7c6e80, pv_sig=<value optimized out>, pp_siginfo=<value optimized out>) at clio.cxx:637
#5 0x0000000000414075 in local_monitor_reader (pp_arg=0x6582) at clio.cxx:154
#6 0x0000003d622079d1 in start_thread () from /lib64/libpthread.so.0
#7 0x0000003d61ae886d i...

Arvind Narain (arvind-narain) wrote:

Change 443 encountered a similar stack trace:
http://logs.trafodion.org/43/443/2/gate/core-regress-core-cdh4.4/920092b/console.html

Shutting down (normal) the SQ environment!
Wed Sep 24 17:13:14 UTC 2014
Processing cluster.conf on local host slave01
[$Z000KMR] Shell/shell Version 1.0.1 Release 0.9.0 (Build release [0.8.3rc1-276-g661bfcemaster_Bld338], date 20140924_154707)
ps
[$Z000KMR] %ps
[$Z000KMR] NID,PID(os) PRI TYPE STATES NAME PARENT PROGRAM
[$Z000KMR] ------------ --- ---- ------- ----------- ----------- ---------------
[$Z000KMR] 000,00031061 000 WDG ES--A-- $WDT000 NONE sqwatchdog
[$Z000KMR] 000,00031062 000 PSD ES--A-- $PSD000 NONE pstartd
[$Z000KMR] 000,00031114 001 DTM ES--A-- $TM0 NONE tm
[$Z000KMR] 000,00025296 001 GEN ES--A-- $Z000KMR NONE shell
[$Z000KMR] 001,00031057 000 WDG ES--A-- $WDT001 NONE sqwatchdog
[$Z000KMR] 001,00031063 000 PSD ES--A-- $PSD001 NONE pstartd
[$Z000KMR] 001,00031181 001 DTM ES--A-- $TM1 NONE tm
shutdown
[$Z000KMR] %shutdown
[$Z000KMR] Shutdown notice, level=0 received
/home/jenkins/workspace/core-regress-core-cdh4.4/trafodion/core/sqf/sql/scripts/sqshell: line 7: 25296 Aborted (core dumped) shell $1 $2 $3 $4 $5 $6 $7 $8 $9
Issued a 'shutdown normal' request

Shutdown in progress

# of SQ processes: 0
SQ Shutdown (normal) from /home/jenkins/workspace/core-regress-core-cdh4.4/trafodion/core/sql/regress Successful

===
WARNING: Core files found in /home/jenkins/workspace/core-regress-core-cdh4.4/trafodion/core :
-rw-------. 1 jenkins jenkins 44548096 Sep 24 17:13 sql/regress/core.slave01.25296.shell
core file : -rw-------. 1 jenkins jenkins 44548096 Sep 24 17:13 ./sqf/sql/regress/core.slave01.25296.shell
gdb command: gdb shell ./sqf/sql/regress/core.slave01.25296.shell --batch -n -x /tmp/tmp.njEIVJDVTc 2>&1
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/1e/0a7d58f454926e2afb4797865d85801ed65ece
[New Thread 25297]
[New Thread 25296]
[Thread debugging using libthread_db enabled]
Core was generated by `shell -a'.
Program terminated with signal 6, Aborted.
#0 0x00000030ada32635 in raise () from /lib64/libc.so.6

Thread 2 (Thread 0x7ffff7cdeb40 (LWP 25296)):
#0 0x00000030ae20b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000410965 in Local_IO_To_Monitor::wait_on_cv (this=0x7c6e80) at clio.cxx:1978
#2 0x0000000000411e21 in Local_IO_To_Monitor::send_recv (this=0x7c6e80, pp_msg=0x7ffff717e774, pv_nw=false) at clio.cxx:1452
#3 0x000000000040a426 in exit_process () at shell.cxx:1874
#4 0x000000000040a75c in shutdown (level=<value optimized out>) at shell.cxx:4183
#5 0x000000000040f7f5 in process_command (token=<value optimized out>, cmd_tail=0x7cb649 "", delimiter=32 ' ') at shell.cxx:5607
#6 0x000000000041005f in main (argc=2, argv=0x7fffffff7058) at shell.cxx:6191

Thread 1 (Thread 0x7ffff5baa700 (LWP 25297)):
#0 0x00000030ada32635 in raise () from /lib64/libc.so.6
#1 0x00000030ada33e15 in abort () from /lib64/libc.so.6
#2 0x0000000000411982 in LIOTM_assert_fun (pp_exp=0x4d4f80 "0...

tags: added: foundation
Chris Sheedy (chris-sheedy) wrote:

Change 451 (https://review.trafodion.org/#/c/451/) encountered a similar stack trace during shutdown in core-regress-seabase-cdh4.4.

https://jenkins01.trafodion.org/job/core-regress-seabase-cdh4.4/345/console has:

Shutting down (normal) the SQ environment!
Wed Sep 24 22:47:59 UTC 2014
Processing cluster.conf on local host slave07
[$Z0001ZW] Shell/shell Version 1.0.1 Release 0.9.0 (Build release [0.8.3rc1-279-g398c34fmaster_Bld345], date 20140924_213511)
ps
[$Z0001ZW] %ps
[$Z0001ZW] NID,PID(os) PRI TYPE STATES NAME PARENT PROGRAM
[$Z0001ZW] ------------ --- ---- ------- ----------- ----------- ---------------
[$Z0001ZW] 000,00009116 000 WDG ES--A-- $WDT000 NONE sqwatchdog
[$Z0001ZW] 000,00009117 000 PSD ES--A-- $PSD000 NONE pstartd
[$Z0001ZW] 000,00009158 001 DTM ES--A-- $TM0 NONE tm
[$Z0001ZW] 000,00002446 001 GEN ES--A-- $Z0001ZW NONE shell
[$Z0001ZW] 001,00009111 000 WDG ES--A-- $WDT001 NONE sqwatchdog
[$Z0001ZW] 001,00009112 000 PSD ES--A-- $PSD001 NONE pstartd
[$Z0001ZW] 001,00009238 001 DTM ES--A-- $TM1 NONE tm
shutdown
[$Z0001ZW] %shutdown
[$Z0001ZW] Shutdown notice, level=0 received
/home/jenkins/workspace/core-regress-seabase-cdh4.4/trafodion/core/sqf/sql/scripts/sqshell: line 7: 2446 Aborted (core dumped) shell $1 $2 $3 $4 $5 $6 $7 $8 $9
Issued a 'shutdown normal' request

Shutdown in progress
. . .
========= seabase
09/24/14 22:10:46 (RELEASE build)
09/24/14 22:14:01 TEST010 ### PASS ###
09/24/14 22:16:07 TEST011 ### PASS ###
09/24/14 22:18:38 TEST012 ### PASS ###
09/24/14 22:20:39 TEST013 ### PASS ###
09/24/14 22:20:57 TEST014 ### PASS ###
09/24/14 22:24:10 TEST016 ### PASS ###
09/24/14 22:24:41 TEST018 ### PASS ###
09/24/14 22:45:44 TEST020 ### PASS ###
09/24/14 22:46:03 TEST022 ### PASS ###
09/24/14 22:47:59 TEST024 ### PASS ###
09/24/14 22:10:46 - 22:47:59 (RELEASE build)

WARNING: Core files found in /home/jenkins/workspace/core-regress-seabase-cdh4.4/trafodion/core :
-rw-------. 1 jenkins jenkins 44548096 Sep 24 22:47 sql/regress/core.slave07.2446.shell
core file : -rw-------. 1 jenkins jenkins 44548096 Sep 24 22:47 ./sqf/sql/regress/core.slave07.2446.shell
executable : -rwxr-xr-x. 1 jenkins jenkins 2023501 Sep 24 21:40 /home/jenkins/workspace/core-regress-seabase-cdh4.4/trafodion/core/sqf/export/bin64/shell

gdb command:
  gdb /home/jenkins/workspace/core-regress-seabase-cdh4.4/trafodion/core/sqf/export/bin64/shell ./sqf/sql/regress/core.slave07.2446.shell --batch -n -x /tmp/tmp.YACcmrJ2m2 2>&1
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/1e/0a7d58f454926e2afb4797865d85801ed65ece
[New Thread 2447]
[New Thread 2446]
[Thread debugging using libthread_db enabled]
Core was generated by `shell -a'.
Program terminated with signal 6, Aborted.
#0 0x000000374b832635 in raise () from /lib64/libc.so.6

Thread 2 (Thread 0x7ffff7cdeb40 (LWP 2446)):
#0 0x000000374c00b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000410965 in Local_IO_...

Atanu Mishra (atanu-mishra) wrote:

We need to observe the behavior with Trafodion r0.9.

Changed in trafodion:
milestone: none → r1.0
assignee: nobody → Gonzalo Correa (gonzalo.correa)
Changed in trafodion:
importance: Undecided → Low
milestone: r1.0 → r1.2