tor-agent crash at operator() due to seg fault on scale setup
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R2.20 | Fix Committed | Medium | Vedamurthy Joshi |
Trunk | Fix Committed | Medium | Vedamurthy Joshi |
Bug Description
R2.1 42, Ubuntu 14.04, multi-node setup
ToR scale setup with 128 tor-agents and 11K VMIs
The crash files will be at http://
The following tor-agent crash was seen once:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_
Core was generated by `/usr/bin/
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000c7b181 in operator() (this=0x7fdf440
767 /usr/include/
(gdb) bt
#0 0x0000000000c7b181 in operator() (this=0x7fdf440
#1 OnEntry (this=0x7fdf440
#2 QueueTaskRunner
at controller/
#3 0x0000000000d0d290 in TaskImpl::execute (this=0x7fdf4b0
#4 0x00007fdf52ac1b3a in ?? () from /usr/lib/
#5 0x00007fdf52abd816 in ?? () from /usr/lib/
#6 0x00007fdf52abcf4b in ?? () from /usr/lib/
#7 0x00007fdf52ab90ff in ?? () from /usr/lib/
#8 0x00007fdf52ab92f9 in ?? () from /usr/lib/
#9 0x00007fdf52cdd182 in start_thread (arg=0x7fdf3bbf
#10 0x00007fdf5197dfbd in __signbitl (__x=0) at ../sysdeps/
#11 __qfcvt_r (value=0, ndigit=1002432256, decpt=0x7fdf3bb
#12 0x0000000000000000 in ?? ()
(gdb)
Changed in juniperopenstack:
assignee: Raj Reddy (rajreddy) → Megh Bhatt (meghb)
tags: added: analytics
This seems to be a case of a WorkQueue object being deleted without being shut down in the Sandesh infrastructure.
The SandeshClientSMImpl pointer's memory overlaps with the WorkQueue pointer (the WorkQueue seems to have been deleted without shutdown):
(gdb) p (TcpServer *) 0x7fdf44071d50
$167 = (SandeshClient *) 0x7fdf44071d50
(gdb) p (SandeshClient *) 0x7fdf44071d50
$168 = (SandeshClient *) 0x7fdf44071d50
(gdb) p $168->sm_.px
$169 = (SandeshClientSMImpl *) 0x7fdf4406e080
(gdb) p &$168->sm_.px->work_queue_
$170 = (WorkQueue<SandeshClientSMImpl::EventContainer> *) 0x7fdf4406e1e8
(gdb) fr 3
#3 0x0000000000d0d290 in TaskImpl::execute (this=0x7fdf4b0a5c40) at controller/src/base/task.cc:232
(gdb) p *this->parent_
$178 = (QueueTaskRunner<SandeshClientSMImpl::EventContainer, WorkQueue<SandeshClientSMImpl::EventContainer> >) {
<Task> = {
_vptr.Task = 0xdd6bf0 <vtable for QueueTaskRunner<SandeshClientSMImpl::EventContainer, WorkQueue<SandeshClientSMImpl::EventContainer> >+16>,
static kTaskInstanceAny = -1,
task_id_ = 16,
task_instance_ = 0,
task_impl_ = 0x7fdf4b0a5c40,
state_ = Task::RUN,
seqno_ = 747966,
task_recycle_ = false,
task_cancel_ = false
},
members of QueueTaskRunner<SandeshClientSMImpl::EventContainer, WorkQueue<SandeshClientSMImpl::EventContainer> >:
queue_ = 0x7fdf4406e100
}
(gdb)
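The addresses above can be compared directly: the current SandeshClientSMImpl is at 0x7fdf4406e080 and its embedded work_queue_ is at 0x7fdf4406e1e8, while the running QueueTaskRunner's queue_ is 0x7fdf4406e100. A small compile-time check of that arithmetic (the constant names and the `OffsetInto()` helper are illustrative, not from the code base) shows that queue_ lands 0x80 bytes into the SandeshClientSMImpl object instead of on its live work_queue_. This is consistent with the runner still pointing at a WorkQueue whose memory was freed and then reused, though the check alone does not prove which object originally owned that queue.

```cpp
#include <cstdint>

// Addresses copied from the gdb output in this report (x86_64 core).
constexpr std::uint64_t kSmImpl      = 0x7fdf4406e080;  // $169: (SandeshClientSMImpl *)
constexpr std::uint64_t kLiveQueue   = 0x7fdf4406e1e8;  // $170: &sm_.px->work_queue_
constexpr std::uint64_t kRunnerQueue = 0x7fdf4406e100;  // parent_->queue_ in frame #3

// Byte offset of `addr` relative to the start of an object at `base`.
constexpr std::uint64_t OffsetInto(std::uint64_t base, std::uint64_t addr) {
    return addr - base;
}

// The embedded work_queue_ sits 0x168 (360) bytes into SandeshClientSMImpl,
// so the object's storage extends at least that far past kSmImpl.
static_assert(OffsetInto(kSmImpl, kLiveQueue) == 0x168, "live queue offset");

// The runner's queue_ is only 0x80 (128) bytes in: inside the same object's
// storage, and not the address of the live work_queue_.
static_assert(OffsetInto(kSmImpl, kRunnerQueue) == 0x80, "stale queue offset");
static_assert(kRunnerQueue != kLiveQueue, "runner does not point at the live queue");
```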
Task context from the task scheduler (entry 14 maps a task name to ID 16, the crashing task's task_id_):
elem[14].left: $210 = {
static npos = <optimized out>,
_M_dataplus = {
<std::allocator<char>> = {
<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider:
_M_p = 0x1687f18 "sandesh::SandeshClientSM"
}
}
elem[14].right: $211 = 16
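The scheduler dump pairs each task name (`left`) with its numeric ID (`right`), and entry 14 carries ID 16, the same `task_id_` as the crashing QueueTaskRunner in frame #3; that is how the runner is tied back to a Sandesh task. A minimal sketch of such a two-way name/ID lookup, assuming two plain `std::map` members. The `TaskRegistry` class and its `IdFor()`/`NameFor()` methods are hypothetical; the real scheduler's container is not visible here beyond the dump's `left`/`right` fields.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical name <-> id registry, used only to illustrate the lookup;
// it is not the Contrail TaskScheduler API or its actual container.
class TaskRegistry {
public:
    // Returns the existing id for `name`, or assigns the next free one.
    int IdFor(const std::string& name) {
        auto it = by_name_.find(name);
        if (it != by_name_.end()) return it->second;
        int id = next_id_++;
        by_name_.emplace(name, id);
        by_id_.emplace(id, name);
        return id;
    }

    // Reverse lookup (what was done by hand in gdb): which name owns `id`?
    std::optional<std::string> NameFor(int id) const {
        auto it = by_id_.find(id);
        if (it == by_id_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, int> by_name_;  // name -> id, like `left` -> `right`
    std::map<int, std::string> by_id_;    // id -> name, the reverse direction
    int next_id_ = 0;
};
```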
Looking at the code, I found one potential instance where the WorkQueue in the SandeshSession object is deleted without being shut down.
We still need to determine whether there are any other instances of the same problem.
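The underlying rule is that a work queue must be shut down, stopping and waiting for whatever task drains it, before the queue object is deleted; otherwise a runner can still dereference it, as QueueTaskRunner::queue_ did here. A minimal sketch of that ordering, using a hypothetical `SimpleWorkQueue` drained by a single `std::thread` rather than the real Sandesh/Contrail WorkQueue and TaskScheduler (the class and its `Enqueue()`/`Shutdown()` methods are illustrative assumptions). Its destructor asserts that `Shutdown()` already ran, which is one way other missed-shutdown call sites could be surfaced in debug builds.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical, simplified model of a work queue drained by a background
// runner; NOT the Contrail/Sandesh WorkQueue API.
template <typename T>
class SimpleWorkQueue {
public:
    explicit SimpleWorkQueue(std::function<void(const T&)> handler)
        : handler_(std::move(handler)), runner_([this] { Run(); }) {}

    // Destroying the queue while its runner may still use `this` is the bug
    // pattern in this report. This model asserts in debug builds, and
    // std::thread's destructor terminates the process if the runner was
    // never joined.
    ~SimpleWorkQueue() { assert(shutdown_done_ && "Shutdown() must precede destruction"); }

    void Enqueue(T item) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            items_.push_back(std::move(item));
        }
        cv_.notify_one();
    }

    // Stop the runner after it drains pending items, then wait for it to
    // exit so nothing references this object afterwards. Safe to call twice.
    void Shutdown() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_one();
        if (runner_.joinable()) runner_.join();
        shutdown_done_ = true;
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !items_.empty(); });
            if (items_.empty()) return;  // stopping_ is set and nothing is left
            T item = std::move(items_.front());
            items_.pop_front();
            lock.unlock();
            handler_(item);  // run the handler outside the lock
            lock.lock();
        }
    }

    std::function<void(const T&)> handler_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<T> items_;
    bool stopping_ = false;
    bool shutdown_done_ = false;
    std::thread runner_;  // declared last so it starts after all other members
};
```

Typical use is `q.Shutdown();` immediately before the owner deletes or destroys `q`, mirroring the change needed where the SandeshSession deletes its WorkQueue.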