Comment 0 for bug 1764506

Pulkit Tandon (pulkitt) wrote : [k8s-R5.0]: Agent and collector crash observed on k8s sanity run

Configuration:
K8s 1.9.2
coat-5.0-15
Centos-7.4

Setup:
5 node setup.
1 Kube master, 3 Controllers.
2 Agent + K8s slaves.

The issues were observed in a k8s sanity run:
LogsLocation : http://10.204.216.50/Docs/logs/5.0-15_2018_04_16_17_17_20_1523889114.59/logs/
Report : http://10.204.216.50/Docs/logs/5.0-15_2018_04_16_17_17_20_1523889114.59/junit-noframes.html

Agent crash observed on one of the nodes (assertion failure in NextHopTable::FreeInterfaceId):
```
(gdb) bt full
#0 0x00007fe9e70491f7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007fe9e704a8e8 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007fe9e7042266 in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3 0x00007fe9e7042312 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4 0x0000000000c98ccf in NextHopTable::FreeInterfaceId(unsigned long) ()
No symbol table info available.
#5 0x0000000000c9efff in NextHop::~NextHop() ()
No symbol table info available.
#6 0x0000000000cad966 in CompositeNH::~CompositeNH() ()
No symbol table info available.
#7 0x0000000000ec77d8 in DBTablePartition::Remove(DBEntryBase*) ()
No symbol table info available.
#8 0x0000000000ec22e6 in DBPartition::QueueRunner::Run() ()
No symbol table info available.
#9 0x0000000000e9e44f in TaskImpl::execute() ()
No symbol table info available.
#10 0x00007fe9e7c188ca in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) () from /lib64/libtbb.so.2
No symbol table info available.
#11 0x00007fe9e7c145b6 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) () from /lib64/libtbb.so.2
No symbol table info available.
#12 0x00007fe9e7c13c8b in tbb::internal::market::process(rml::job&) () from /lib64/libtbb.so.2
No symbol table info available.
#13 0x00007fe9e7c1167f in tbb::internal::rml::private_worker::run() () from /lib64/libtbb.so.2
No symbol table info available.
#14 0x00007fe9e7c11879 in tbb::internal::rml::private_worker::thread_routine(void*) () from /lib64/libtbb.so.2
No symbol table info available.
#15 0x00007fe9e7e33e25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#16 0x00007fe9e710c34d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb)
```

Collector crash observed on one of the Controllers (assertion failure in OpServerProxy::DeleteUVEs):
```
(gdb) bt full
#0 0x00007fee763f4a37 in abort () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007fee763ec266 in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007fee763ec312 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#3 0x0000000000744b8d in OpServerProxy::DeleteUVEs(std::string const&, std::string const&, std::string const&, std::string const&) ()
No symbol table info available.
#4 0x0000000000649e46 in SandeshGenerator::DisconnectSession(VizSession*) ()
No symbol table info available.
#5 0x000000000063aeec in Collector::DisconnectSession(SandeshSession*) ()
No symbol table info available.
#6 0x000000000087863d in SandeshServerConnection::ProcessDisconnect(SandeshSession*) ()
No symbol table info available.
#7 0x0000000000875280 in ssm::Established::react(ssm::EvTcpClose const&) ()
No symbol table info available.
#8 0x00000000008755a8 in boost::statechart::simple_state<ssm::Established, SandeshStateMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*) ()
No symbol table info available.
#9 0x000000000087508b in boost::statechart::state_machine<SandeshStateMachine, ssm::Idle, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&) ()
No symbol table info available.
#10 0x000000000086d975 in SandeshStateMachine::DequeueEvent(SandeshStateMachine::EventContainer&) ()
No symbol table info available.
#11 0x0000000000874525 in QueueTaskRunner<SandeshStateMachine::EventContainer, WorkQueue<SandeshStateMachine::EventContainer> >::RunQueue() ()
No symbol table info available.
#12 0x000000000047781f in TaskImpl::execute() ()
No symbol table info available.
#13 0x00007fee777e18ca in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) () from /lib64/libtbb.so.2
No symbol table info available.
#14 0x00007fee777dd5b6 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) () from /lib64/libtbb.so.2
No symbol table info available.
#15 0x00007fee777dcc8b in tbb::internal::market::process(rml::job&) () from /lib64/libtbb.so.2
No symbol table info available.
#16 0x00007fee777da67f in tbb::internal::rml::private_worker::run() () from /lib64/libtbb.so.2
No symbol table info available.
#17 0x00007fee777da879 in tbb::internal::rml::private_worker::thread_routine(void*) () from /lib64/libtbb.so.2
No symbol table info available.
#18 0x00007fee779fce25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#19 0x00007fee764b634d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb)
```

The dumps are kept at the following path:
/home/bhushana/Documents/technical/logs/5.0-15_2018_04_16_17_17_20_1523889114.59