In an HA cluster, a vrouter crash is seen soon after provisioning

Bug #1403295 reported by Vinod Nair
This bug affects 1 person
Affects            Status                   Importance  Assigned to  Milestone
Juniper Openstack  Status tracked in Trunk
  R2.0             Fix Committed            Critical    Nipa
  Trunk            Fix Committed            Critical    Nipa

Bug Description

In an Ubuntu 14.04 HA cluster with Icehouse, a vrouter crash/core is seen soon after Contrail provisioning.

Image: 2.0 Build 17 (the issue is also seen in builds 15 and 16)

The backtrace is as follows:

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fdbc8eaa5cb in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0 0x00007fdbc8eaa5cb in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x0000000000ab6f22 in AgentXmppChannel::HandleAgentXmppClientChannelEvent(AgentXmppChannel*, xmps::PeerState) ()
#2 0x0000000000bda6cb in XmppClient::NotifyConnectionEvent(XmppChannelMux*, xmps::PeerState) ()
#3 0x0000000000c09ec2 in ?? ()
#4 0x0000000000c13092 in xmsm::OpenSent::react(xmsm::EvXmppOpen const&) ()
#5 0x0000000000c17ba6 in boost::statechart::simple_state<xmsm::OpenSent, XmppStateMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*) ()
#6 0x0000000000c16a2f in boost::statechart::state_machine<XmppStateMachine, xmsm::Idle, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&) ()
#7 0x0000000000c0b637 in XmppStateMachine::DequeueEvent(boost::intrusive_ptr<boost::statechart::event_base const>&) ()
#8 0x0000000000c16ff5 in QueueTaskRunner<boost::intrusive_ptr<boost::statechart::event_base const>, WorkQueue<boost::intrusive_ptr<boost::statechart::event_base const> > >::RunQueue() ()
#9 0x0000000000e69de0 in TaskImpl::execute() ()
#10 0x00007fdbc9113b3a in ?? () from /usr/lib/libtbb.so.2
#11 0x00007fdbc910f816 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007fdbc910ef4b in ?? () from /usr/lib/libtbb.so.2
#13 0x00007fdbc910b0ff in ?? () from /usr/lib/libtbb.so.2
#14 0x00007fdbc910b2f9 in ?? () from /usr/lib/libtbb.so.2
#15 0x00007fdbc932f182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007fdbc8607fbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)

Vinod Nair (vinodnair)
Changed in juniperopenstack:
milestone: none → r2.0-fcs
milestone: r2.0-fcs → none
tags: added: blocker
Vinod Nair (vinodnair) wrote :
Changed in juniperopenstack:
assignee: nobody → Hari Prasad Killi (haripk)
importance: Undecided → Critical
tags: added: vrouter
Changed in juniperopenstack:
assignee: Hari Prasad Killi (haripk) → Nipa (nipak)
information type: Proprietary → Public
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/5800
Committed: http://github.org/Juniper/contrail-controller/commit/50d8422355e1571ce3f0c40122f14f4ef2261d9f
Submitter: Zuul
Branch: R2.0

commit 50d8422355e1571ce3f0c40122f14f4ef2261d9f
Author: Nipa Kumar <email address hidden>
Date: Fri Dec 19 10:53:48 2014 -0800

Cleanup stale servers only if marked DOWN, as the hints from discovery are
to be honored only if server is detected DOWN by agent.

Core is seen due to access of deleted AgentXmppChannel. AgentXmppChannel
was deleted as discovery sent new set of services even though the vrouter
did not detect the server as DOWN. (This happens only during
provisioning, as the system is loaded and the heartbeats were not
received by discovery, so it detects the control-node as DOWN.)

Change-Id: I5b588baf59d1550d2a359dd5529074a16c873c02
Closes-Bug:1403295
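
The rule described in the commit message can be illustrated with a minimal, self-contained sketch. This is not the actual contrail-controller code: `Channel`, `PeerState`, and `ApplyDiscoveryUpdate` are hypothetical names introduced only for illustration. The sketch assumes the agent keeps one channel object per control-node and that discovery supplies the set of server addresses it currently advertises.

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the agent's own view of a control-node
// connection; this is NOT the real AgentXmppChannel class.
enum class PeerState { kUp, kDown };

struct Channel {
    std::string server;  // control-node address
    PeerState state;     // state as detected by the agent itself
};

// Hypothetical helper: apply a new server list received from discovery.
// A channel that is missing from the advertised set is removed only if
// the agent has itself marked it DOWN. A channel the agent still sees as
// UP is kept, so code that holds a reference to it never touches a
// deleted object.
std::vector<std::string> ApplyDiscoveryUpdate(
        std::vector<std::unique_ptr<Channel>> &channels,
        const std::set<std::string> &advertised) {
    std::vector<std::string> removed;
    for (auto it = channels.begin(); it != channels.end();) {
        const bool stale = advertised.count((*it)->server) == 0;
        if (stale && (*it)->state == PeerState::kDown) {
            removed.push_back((*it)->server);
            it = channels.erase(it);  // erase returns the next valid iterator
        } else {
            ++it;
        }
    }
    return removed;
}
```

In this sketch, dropping the `state == PeerState::kDown` check would remove an UP channel whenever discovery stops advertising it, which is the kind of premature deletion the commit message identifies as the cause of the core.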

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/5955
Committed: http://github.org/Juniper/contrail-controller/commit/574e6e14fde4080e9cd9836dc032fd6c4088d02d
Submitter: Zuul
Branch: master

commit 574e6e14fde4080e9cd9836dc032fd6c4088d02d
Author: Nipa Kumar <email address hidden>
Date: Fri Dec 19 10:53:48 2014 -0800

Cleanup stale servers only if marked DOWN, as the hints from discovery are
to be honored only if server is detected DOWN by agent.

Core is seen due to access of deleted AgentXmppChannel. AgentXmppChannel
was deleted as discovery sent new set of services even though the vrouter
did not detect the server as DOWN. (This happens only during
provisioning, as the system is loaded and the heartbeats were not
received by discovery, so it detects the control-node as DOWN.)

Change-Id: I5b588baf59d1550d2a359dd5529074a16c873c02
Closes-Bug:1403295
