Image: 3.0-1 kilo ubuntu1404
core will be copied to /cs-shared/bugs/<bug-id>/
After provisioning the scale cluster with SM observed that the collector had crashed and also observe continuous dump of I/O busy messages in collector logs.
2016-02-20 Sat 14:39:59:652.410 PST b5s27 [Thread 140685827057408, Pid 5686]: b5s27:Global: MessageIndexTableInsert: Addition of message: NodeStatusUVE, message UUID: b62eb02a-0ca7-48eb-bf6c-e6d12cd98e41 to table: MessageTableMessageType FAILED
2016-02-20 Sat 14:39:59:652.467 PST b5s27 [Thread 140685827057408, Pid 5686]: ExecuteQuerySyncInternal:controller/src/database/cassandra/cql/cql_if.cc:573: SyncQuery: INSERT INTO MessageTableTimestamp (key, key2, column1, column2) VALUES (173569679, 15, 1833694, b62eb02a-0ca7-48eb-bf6c-e6d12cd98e41) USING TTL 345600 FAILED: All connections on all I/O threads are busy
2016-02-20 Sat 14:39:59:652.506 PST b5s27 [Thread 140685827057408, Pid 5686]: b5s27:Global: MessageIndexTableInsert: Addition of message: NodeStatusUVE, message UUID: b62eb02a-0ca7-48eb-bf6c-e6d12cd98e41 to table: MessageTableTimestamp FAILED
2016-02-20 Sat 14:39:59:652.573 PST b5s27 [Thread 140685827057408, Pid 5686]: ExecuteQuerySyncInternal:controller/src/database/cassandra/cql/cql_if.cc:573: SyncQuery: INSERT INTO ObjectTable (key, key2, key3, column1, column2, value) VALUES (173569679, 0, 'ObjectCollectorInfo', 'b5s27', 1833694, b62eb02a-0ca7-48eb-bf6c-e6d12cd98e41) USING TTL 345600 FAILED: All connections on all I/O threads are busy
2016-02-20 Sat 14:39:59:652.599 PST b5s27 [Thread 140685827057408, Pid 5686]: b5s27:Global: ObjectTableInsert: Addition of b5s27, message UUID b62eb02a-0ca7-48eb-bf6c-e6d12cd98e41 into table ObjectCollectorInfo FAILED
2016-02-20 Sat 14:39:59:759.448 PST b5s27 [Thread 140687297373952, Pid 5686]: ExecuteQuerySyncInternal:controller/src/database/cassandra/cql/cql_if.cc:573: SyncQuery: INSERT INTO MessageTable (key, "Source", "Namespace", "ModuleId", "InstanceId", "NodeType", "MessageTS", "Category", "Level", "Messagetype", "SequenceNum", "VersionSig", "Type", "Xmlmessage") VALUES (28371cf1-bba9-48f8-b4d7-e4aa91a738ba, 'b5s19', '', 'contrail-vrouter-agent', '0', 'Compute', 1456007999757967, '', 2147483647, 'VrouterStats', 2759, 1896819220, 6, '<VrouterStats type="sandesh"><data type="struct" identifier="1"><VrouterStatsAgent><name type="string" identifier="1" key="ObjectVRouter">b5s19</name><cpu_info type="struct" identifier="12"><CpuLoadInfo><num_cpu type="u32" identifier="1">32</num_cpu><sys_mem_info type="struct" identifier="5"><SysMemInfo><total type="u32" identifier="1">197930812</total><used type="u32" identifier="2">2361060</used><free type="u32" identifier="3">195569752</free><buffers type="u32" identifier="4">183368</buffers></SysMemInfo></sys_mem_info><meminfo type="struct" identifier="2"><MemInfo><virt type="u32" identifier="1">830124</virt><peakvirt type="u32" identifier="2">830160</peakvirt><res type="u32" identifier="3">157796</res></MemInfo></meminfo><cpuload type="struct" identifier="3"><CpuLoadAvg><one_min_avg type="double" identifier="1">0.0025</one_min_avg><five_min_avg type="double" identifier="2">0.0009375</five_min_avg><fifteen_min_avg type="double" identifier="3">0.0015625</fifteen_min_avg></CpuLoadAvg></cpuload><cpu_share type="double" identifier="4">0.313489</cpu_share></CpuLoadInfo></cpu_info></VrouterStatsAgent></data></VrouterStats>') USING TTL 345600 FAILED: All connections on all I/O threads are busy
(gdb) bt
#0 0x00007f12a9483cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f12a94870d8 in __GI_abort () at abort.c:89
#2 0x00007f12a947cb86 in __assert_fail_base (fmt=0x7f12a95cd830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x74c215 "ret",
file=file@entry=0x74c460 "controller/src/analytics/OpServerProxy.cc", line=line@entry=783,
function=function@entry=0x74ca80 <OpServerProxy::DeleteUVEs(std::string const&, std::string const&, std::string const&, std::string const&)::__PRETTY_FUNCTION__> "virtual bool OpServerProxy::DeleteUVEs(const string&, const string&, const string&, const string&)") at assert.c:92
#3 0x00007f12a947cc32 in __GI___assert_fail (assertion=0x74c215 "ret", file=0x74c460 "controller/src/analytics/OpServerProxy.cc", line=783,
function=0x74ca80 <OpServerProxy::DeleteUVEs(std::string const&, std::string const&, std::string const&, std::string const&)::__PRETTY_FUNCTION__> "virtual bool OpServerProxy::DeleteUVEs(const string&, const string&, const string&, const string&)") at assert.c:101
#4 0x000000000055f378 in OpServerProxy::DeleteUVEs (this=0x276ff30, source=..., module=..., node_type=..., instance_id=...) at controller/src/analytics/OpServerProxy.cc:783
#5 0x0000000000509042 in SandeshGenerator::DisconnectSession (this=this@entry=0x7f1224002cb0, vsession=vsession@entry=0x2800e80) at controller/src/analytics/generator.cc:249
#6 0x00000000004f2d9d in Collector::ReceiveSandeshCtrlMsg (this=0x2774330, state_machine=<optimized out>, session=<optimized out>, sandesh=<optimized out>)
at controller/src/analytics/collector.cc:292
#7 0x0000000000677f70 in SandeshServerConnection::ProcessSandeshCtrlMessage (this=this@entry=0x27f7410, msg=..., header=..., sandesh_name=...,
header_offset=header_offset@entry=698) at tools/sandesh/library/cpp/sandesh_connection.cc:170
#8 0x0000000000676c6e in ssm::ServerInit::react (this=this@entry=0x7f1258003cc0, event=...) at tools/sandesh/library/cpp/sandesh_state_machine.cc:303
#9 0x00000000006773d5 in react<ssm::ServerInit, boost::statechart::event_base, void const*> (eventType=<synthetic pointer>, evt=..., stt=...)
at /usr/include/boost/statechart/custom_reaction.hpp:42
#10 local_react_impl<boost::mpl::list3<boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::
statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> >, boost::statechart::simple_state<ssm::ServerInit, SandeshS
tateMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (
eventType=0xaaf1c0 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>, evt=..., stt=...)
at /usr/include/boost/statechart/simple_state.hpp:816
#11 local_react<boost::mpl::list3<boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> > > (
eventType=0xaaf1c0 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>, evt=..., this=0x7f1258003cc0)
at /usr/include/boost/statechart/simple_state.hpp:851
#12 local_react_impl<boost::mpl::list4<boost::statechart::custom_reaction<ssm::EvTcpClose>, boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart
::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> >, boos
t::statechart::simple_state<ssm::ServerInit, SandeshStateMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (eventType=<optimized out>,
evt=..., stt=...) at /usr/include/boost/statechart/simple_state.hpp:820
#13 local_react<boost::mpl::list4<boost::statechart::custom_reaction<ssm::EvTcpClose>, boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> > > (
eventType=<optimized out>, evt=..., this=<optimized out>) at /usr/include/boost/statechart/simple_state.hpp:851
#14 local_react_impl<boost::mpl::list<boost::statechart::transition<ssm::EvStop, ssm::Idle, SandeshStateMachine, &SandeshStateMachine::OnIdle>, boost::statechart::custom_reacti
on<ssm::EvTcpClose>, boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_sta
te_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> >, boost::statechart::simple_state<ssm::ServerInit, SandeshStateMachine, boost
::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (
eventType=0xaaf1c0 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>, evt=..., stt=...)
at /usr/include/boost/statechart/simple_state.hpp:820
---Type <return> to continue, or q <return> to quit---
#15 local_react<boost::mpl::list<boost::statechart::transition<ssm::EvStop, ssm::Idle, SandeshStateMachine, &SandeshStateMachine::OnIdle>, boost::statechart::custom_reaction<ss
m::EvTcpClose>, boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> > > (
eventType=0xaaf1c0 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>, evt=..., this=0x7f1258003cc0)
at /usr/include/boost/statechart/simple_state.hpp:851
#16 boost::statechart::simple_state<ssm::ServerInit, SandeshStateMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl (
this=0x7f1258003cc0, evt=..., eventType=0xaaf1c0 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>)
at /usr/include/boost/statechart/simple_state.hpp:489
#17 0x0000000000674f2b in operator() (this=<synthetic pointer>) at /usr/include/boost/statechart/state_machine.hpp:87
#18 operator()<boost::statechart::detail::send_function<boost::statechart::detail::state_base<std::allocator<void>, boost::statechart::detail::rtti_policy>, boost::statechart::event_base, void const*>, boost::statechart::state_machine<SandeshStateMachine, ssm::Idle>::exception_event_handler> (this=0x28055a8, action=...)
at /usr/include/boost/statechart/null_exception_translator.hpp:33
#19 boost::statechart::state_machine<SandeshStateMachine, ssm::Idle, std::allocator<void>, boost::statechart::null_exception_translator>::send_event (this=0x2805550, evt=...)
at /usr/include/boost/statechart/state_machine.hpp:889
#20 0x000000000066d015 in process_event (evt=..., this=0x2805550) at /usr/include/boost/statechart/state_machine.hpp:275
#21 SandeshStateMachine::DequeueEvent (this=0x2805550, ec=...) at tools/sandesh/library/cpp/sandesh_state_machine.cc:777
#22 0x00000000006740d7 in operator() (a0=<error reading variable: access outside bounds of object referenced via synthetic pointer>, this=0x7f125fbfd0d0)
at /usr/include/boost/function/function_template.hpp:767
#23 QueueTaskRunner<SandeshStateMachine::EventContainer, WorkQueue<SandeshStateMachine::EventContainer> >::RunQueue (this=0x7f1220002db0)
at controller/src/base/queue_task.h:87
#24 0x00000000006dd76c in TaskImpl::execute (this=0x7f12a019be40) at controller/src/base/task.cc:253
#25 0x00007f12aaa0bb3a in ?? ()
#26 0x00007f12a0193f28 in ?? ()
#27 0x00007f12a0193f40 in ?? ()
#28 0x0000000000000001 in ?? ()
#29 0x00007f12a42b6f00 in ?? ()
#30 0x00007f12a0193f28 in ?? ()
#31 0x0000000000000000 in ?? ()
(gdb) f 4
#4 0x000000000055f378 in OpServerProxy::DeleteUVEs (this=0x276ff30, source=..., module=..., node_type=..., instance_id=...) at controller/src/analytics/OpServerProxy.cc:783
783 controller/src/analytics/OpServerProxy.cc: No such file or directory.
(gdb) p ret
$1 = false
This assert is seen if a redis operation does not complete in 15s.
Its possible that the system was very busy.
The collector log has several messages like these:
All connections on all I/O threads are busy
All hosts in current policy attempted and were either unavailable or failed
Host 10.84.25.29 had the following error on startup: Unable to connect
Connect error 'connection refused' on host 10.84.25.29