collector: assert @ OpServerProxy::DeleteUVEs

Bug #1548005 reported by Senthilnathan Murugappan
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Juniper Openstack
Status tracked in Trunk
R3.0
Incomplete
High
Anish Mehta
Trunk
Incomplete
High
Anish Mehta

Bug Description

Image: 3.0-1 kilo ubuntu1404
core will be copied to /cs-shared/bugs/<bug-id>/

After provisioning the scale cluster with SM observed that the collector had crashed and also observe continuous dump of I/O busy messages in collector logs.

2016-02-20 Sat 14:39:59:652.410 PST b5s27 [Thread 140685827057408, Pid 5686]: b5s27:Global: MessageIndexTableInsert: Addition of message: NodeStatusUVE, message UUID: b62eb02a-0ca7-48eb-bf6c-e6d12cd98e41 to table: MessageTableMessageType FAILED
2016-02-20 Sat 14:39:59:652.467 PST b5s27 [Thread 140685827057408, Pid 5686]: ExecuteQuerySyncInternal:controller/src/database/cassandra/cql/cql_if.cc:573: SyncQuery: INSERT INTO MessageTableTimestamp (key, key2, column1, column2) VALUES (173569679, 15, 1833694, b62eb02a-0ca7-48eb-bf6c-e6d12cd98e41) USING TTL 345600 FAILED: All connections on all I/O threads are busy
2016-02-20 Sat 14:39:59:652.506 PST b5s27 [Thread 140685827057408, Pid 5686]: b5s27:Global: MessageIndexTableInsert: Addition of message: NodeStatusUVE, message UUID: b62eb02a-0ca7-48eb-bf6c-e6d12cd98e41 to table: MessageTableTimestamp FAILED
2016-02-20 Sat 14:39:59:652.573 PST b5s27 [Thread 140685827057408, Pid 5686]: ExecuteQuerySyncInternal:controller/src/database/cassandra/cql/cql_if.cc:573: SyncQuery: INSERT INTO ObjectTable (key, key2, key3, column1, column2, value) VALUES (173569679, 0, 'ObjectCollectorInfo', 'b5s27', 1833694, b62eb02a-0ca7-48eb-bf6c-e6d12cd98e41) USING TTL 345600 FAILED: All connections on all I/O threads are busy
2016-02-20 Sat 14:39:59:652.599 PST b5s27 [Thread 140685827057408, Pid 5686]: b5s27:Global: ObjectTableInsert: Addition of b5s27, message UUID b62eb02a-0ca7-48eb-bf6c-e6d12cd98e41 into table ObjectCollectorInfo FAILED
2016-02-20 Sat 14:39:59:759.448 PST b5s27 [Thread 140687297373952, Pid 5686]: ExecuteQuerySyncInternal:controller/src/database/cassandra/cql/cql_if.cc:573: SyncQuery: INSERT INTO MessageTable (key, "Source", "Namespace", "ModuleId", "InstanceId", "NodeType", "MessageTS", "Category", "Level", "Messagetype", "SequenceNum", "VersionSig", "Type", "Xmlmessage") VALUES (28371cf1-bba9-48f8-b4d7-e4aa91a738ba, 'b5s19', '', 'contrail-vrouter-agent', '0', 'Compute', 1456007999757967, '', 2147483647, 'VrouterStats', 2759, 1896819220, 6, '<VrouterStats type="sandesh"><data type="struct" identifier="1"><VrouterStatsAgent><name type="string" identifier="1" key="ObjectVRouter">b5s19</name><cpu_info type="struct" identifier="12"><CpuLoadInfo><num_cpu type="u32" identifier="1">32</num_cpu><sys_mem_info type="struct" identifier="5"><SysMemInfo><total type="u32" identifier="1">197930812</total><used type="u32" identifier="2">2361060</used><free type="u32" identifier="3">195569752</free><buffers type="u32" identifier="4">183368</buffers></SysMemInfo></sys_mem_info><meminfo type="struct" identifier="2"><MemInfo><virt type="u32" identifier="1">830124</virt><peakvirt type="u32" identifier="2">830160</peakvirt><res type="u32" identifier="3">157796</res></MemInfo></meminfo><cpuload type="struct" identifier="3"><CpuLoadAvg><one_min_avg type="double" identifier="1">0.0025</one_min_avg><five_min_avg type="double" identifier="2">0.0009375</five_min_avg><fifteen_min_avg type="double" identifier="3">0.0015625</fifteen_min_avg></CpuLoadAvg></cpuload><cpu_share type="double" identifier="4">0.313489</cpu_share></CpuLoadInfo></cpu_info></VrouterStatsAgent></data></VrouterStats>') USING TTL 345600 FAILED: All connections on all I/O threads are busy

(gdb) bt
#0 0x00007f12a9483cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f12a94870d8 in __GI_abort () at abort.c:89
#2 0x00007f12a947cb86 in __assert_fail_base (fmt=0x7f12a95cd830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x74c215 "ret",
    file=file@entry=0x74c460 "controller/src/analytics/OpServerProxy.cc", line=line@entry=783,
    function=function@entry=0x74ca80 <OpServerProxy::DeleteUVEs(std::string const&, std::string const&, std::string const&, std::string const&)::__PRETTY_FUNCTION__> "virtual bool OpServerProxy::DeleteUVEs(const string&, const string&, const string&, const string&)") at assert.c:92
#3 0x00007f12a947cc32 in __GI___assert_fail (assertion=0x74c215 "ret", file=0x74c460 "controller/src/analytics/OpServerProxy.cc", line=783,
    function=0x74ca80 <OpServerProxy::DeleteUVEs(std::string const&, std::string const&, std::string const&, std::string const&)::__PRETTY_FUNCTION__> "virtual bool OpServerProxy::DeleteUVEs(const string&, const string&, const string&, const string&)") at assert.c:101
#4 0x000000000055f378 in OpServerProxy::DeleteUVEs (this=0x276ff30, source=..., module=..., node_type=..., instance_id=...) at controller/src/analytics/OpServerProxy.cc:783
#5 0x0000000000509042 in SandeshGenerator::DisconnectSession (this=this@entry=0x7f1224002cb0, vsession=vsession@entry=0x2800e80) at controller/src/analytics/generator.cc:249
#6 0x00000000004f2d9d in Collector::ReceiveSandeshCtrlMsg (this=0x2774330, state_machine=<optimized out>, session=<optimized out>, sandesh=<optimized out>)
    at controller/src/analytics/collector.cc:292
#7 0x0000000000677f70 in SandeshServerConnection::ProcessSandeshCtrlMessage (this=this@entry=0x27f7410, msg=..., header=..., sandesh_name=...,
    header_offset=header_offset@entry=698) at tools/sandesh/library/cpp/sandesh_connection.cc:170
#8 0x0000000000676c6e in ssm::ServerInit::react (this=this@entry=0x7f1258003cc0, event=...) at tools/sandesh/library/cpp/sandesh_state_machine.cc:303
#9 0x00000000006773d5 in react<ssm::ServerInit, boost::statechart::event_base, void const*> (eventType=<synthetic pointer>, evt=..., stt=...)
    at /usr/include/boost/statechart/custom_reaction.hpp:42
#10 local_react_impl<boost::mpl::list3<boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::
statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> >, boost::statechart::simple_state<ssm::ServerInit, SandeshS
tateMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (
    eventType=0xaaf1c0 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>, evt=..., stt=...)
    at /usr/include/boost/statechart/simple_state.hpp:816
#11 local_react<boost::mpl::list3<boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> > > (
    eventType=0xaaf1c0 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>, evt=..., this=0x7f1258003cc0)
    at /usr/include/boost/statechart/simple_state.hpp:851
#12 local_react_impl<boost::mpl::list4<boost::statechart::custom_reaction<ssm::EvTcpClose>, boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart
::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> >, boos
t::statechart::simple_state<ssm::ServerInit, SandeshStateMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (eventType=<optimized out>,
    evt=..., stt=...) at /usr/include/boost/statechart/simple_state.hpp:820
#13 local_react<boost::mpl::list4<boost::statechart::custom_reaction<ssm::EvTcpClose>, boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> > > (
    eventType=<optimized out>, evt=..., this=<optimized out>) at /usr/include/boost/statechart/simple_state.hpp:851
#14 local_react_impl<boost::mpl::list<boost::statechart::transition<ssm::EvStop, ssm::Idle, SandeshStateMachine, &SandeshStateMachine::OnIdle>, boost::statechart::custom_reacti
on<ssm::EvTcpClose>, boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_sta
te_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> >, boost::statechart::simple_state<ssm::ServerInit, SandeshStateMachine, boost
::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (
    eventType=0xaaf1c0 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>, evt=..., stt=...)
    at /usr/include/boost/statechart/simple_state.hpp:820
---Type <return> to continue, or q <return> to quit---
#15 local_react<boost::mpl::list<boost::statechart::transition<ssm::EvStop, ssm::Idle, SandeshStateMachine, &SandeshStateMachine::OnIdle>, boost::statechart::custom_reaction<ss
m::EvTcpClose>, boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> > > (
    eventType=0xaaf1c0 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>, evt=..., this=0x7f1258003cc0)
    at /usr/include/boost/statechart/simple_state.hpp:851
#16 boost::statechart::simple_state<ssm::ServerInit, SandeshStateMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl (
    this=0x7f1258003cc0, evt=..., eventType=0xaaf1c0 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>)
    at /usr/include/boost/statechart/simple_state.hpp:489
#17 0x0000000000674f2b in operator() (this=<synthetic pointer>) at /usr/include/boost/statechart/state_machine.hpp:87
#18 operator()<boost::statechart::detail::send_function<boost::statechart::detail::state_base<std::allocator<void>, boost::statechart::detail::rtti_policy>, boost::statechart::event_base, void const*>, boost::statechart::state_machine<SandeshStateMachine, ssm::Idle>::exception_event_handler> (this=0x28055a8, action=...)
    at /usr/include/boost/statechart/null_exception_translator.hpp:33
#19 boost::statechart::state_machine<SandeshStateMachine, ssm::Idle, std::allocator<void>, boost::statechart::null_exception_translator>::send_event (this=0x2805550, evt=...)
    at /usr/include/boost/statechart/state_machine.hpp:889
#20 0x000000000066d015 in process_event (evt=..., this=0x2805550) at /usr/include/boost/statechart/state_machine.hpp:275
#21 SandeshStateMachine::DequeueEvent (this=0x2805550, ec=...) at tools/sandesh/library/cpp/sandesh_state_machine.cc:777
#22 0x00000000006740d7 in operator() (a0=<error reading variable: access outside bounds of object referenced via synthetic pointer>, this=0x7f125fbfd0d0)
    at /usr/include/boost/function/function_template.hpp:767
#23 QueueTaskRunner<SandeshStateMachine::EventContainer, WorkQueue<SandeshStateMachine::EventContainer> >::RunQueue (this=0x7f1220002db0)
    at controller/src/base/queue_task.h:87
#24 0x00000000006dd76c in TaskImpl::execute (this=0x7f12a019be40) at controller/src/base/task.cc:253
#25 0x00007f12aaa0bb3a in ?? ()
#26 0x00007f12a0193f28 in ?? ()
#27 0x00007f12a0193f40 in ?? ()
#28 0x0000000000000001 in ?? ()
#29 0x00007f12a42b6f00 in ?? ()
#30 0x00007f12a0193f28 in ?? ()
#31 0x0000000000000000 in ?? ()

(gdb) f 4
#4 0x000000000055f378 in OpServerProxy::DeleteUVEs (this=0x276ff30, source=..., module=..., node_type=..., instance_id=...) at controller/src/analytics/OpServerProxy.cc:783
783 controller/src/analytics/OpServerProxy.cc: No such file or directory.
(gdb) p ret
$1 = false

Tags: analytics
description: updated
Revision history for this message
Anish Mehta (amehta00) wrote :

This assert is seen if a redis operation does not complete in 15s.

Its possible that the system was very busy.

The collector log has several messages like these:

All connections on all I/O threads are busy
All hosts in current policy attempted and were either unavailable or failed
Host 10.84.25.29 had the following error on startup: Unable to connect
Connect error 'connection refused' on host 10.84.25.29

Revision history for this message
Anish Mehta (amehta00) wrote :

Senthil, how easy is this to reproduce?
Can we get access to a system in this state?

Revision history for this message
Anish Mehta (amehta00) wrote :

As per Raj:
"
rajreddy [8:21 AM]
for now we are thinking of increasing I/O threads to num_cpus/2 — while continuing the perf study.."

We need more logs to analyze this crash: - /var/log/messaages, redis logs etc.

tags: removed: blocker
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.