control-node crash seen when LLGR timer expired for XMPP Agent

Bug #1645520 reported by Ananth Suryanarayana
Affects             Status          Importance  Assigned to           Milestone
Juniper Openstack   Status tracked in Trunk
  R3.2              Fix Committed   High        Ananth Suryanarayana
  Trunk             Fix Committed   High        Ananth Suryanarayana

Bug Description

#0 0x00007fd31625ac37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fd31625e028 in __GI_abort () at abort.c:89
#2 0x00007fd316253bf6 in __assert_fail_base (fmt=0x7fd3163a43b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x119a2a8 "prs->action() == NONE",
    file=file@entry=0x119a1f8 "controller/src/bgp/bgp_membership.cc", line=line@entry=114,
    function=function@entry=0x119b440 <BgpMembershipManager::Register(IPeer*, BgpTable*, RibExportPolicy const&, int)::__PRETTY_FUNCTION__> "virtual void BgpMembershipManager::Register(IPeer*, BgpTable*, const RibExportPolicy&, int)") at assert.c:92
#3 0x00007fd316253ca2 in __GI___assert_fail (assertion=0x119a2a8 "prs->action() == NONE", file=0x119a1f8 "controller/src/bgp/bgp_membership.cc", line=114,
    function=0x119b440 <BgpMembershipManager::Register(IPeer*, BgpTable*, RibExportPolicy const&, int)::__PRETTY_FUNCTION__> "virtual void BgpMembershipManager::Register(IPeer*, BgpTable*, const RibExportPolicy&, int)") at assert.c:101
#4 0x0000000000a0c039 in BgpMembershipManager::Register (this=0x2f221c0, peer=0x7fd2d801b650, table=0x7fd300023e70, policy=..., instance_id=2)
    at controller/src/bgp/bgp_membership.cc:114
#5 0x0000000000c9d1f1 in BgpXmppChannel::RegisterTable (this=0x7fd2d8031430, line=2132, table=0x7fd300023e70, instance_id=2) at controller/src/bgp/bgp_xmpp_channel.cc:1628
#6 0x0000000000caa9d1 in BgpXmppChannel::ProcessSubscriptionRequest (this=0x7fd2d8031430, vrf_name="default-domain:admin:mgmt-vn:mgmt-vn", iq=0x7fd284080ac0,
    add_change=true) at controller/src/bgp/bgp_xmpp_channel.cc:2132
#7 0x0000000000cb28fd in BgpXmppChannel::ReceiveUpdate (this=0x7fd2d8031430, msg=0x7fd284080ac0) at controller/src/bgp/bgp_xmpp_channel.cc:2326
#8 0x0000000000cd79c2 in boost::_mfi::mf1<void, BgpXmppChannel, XmppStanza::XmppMessage const*>::operator() (this=0x7fd2f77fc358, p=0x7fd2d8031430, a1=0x7fd284080ac0)
    at /usr/include/boost/bind/mem_fn_template.hpp:165
#9 0x0000000000cd5e2b in boost::_bi::list2<boost::_bi::value<BgpXmppChannel*>, boost::arg<1> >::operator()<boost::_mfi::mf1<void, BgpXmppChannel, XmppStanza::XmppMessage const*>, boost::_bi::list2<XmppStanza::XmppMessage const*&, xmps::PeerState&> > (this=0x7fd2f77fc368, f=..., a=...) at /usr/include/boost/bind/bind.hpp:313
#10 0x0000000000cd3960 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, BgpXmppChannel, XmppStanza::XmppMessage const*>, boost::_bi::list2<boost::_bi::value<BgpXmppChannel*>, boost::arg<1> > >::operator()<XmppStanza::XmppMessage const*, xmps::PeerState> (this=0x7fd2f77fc358, a1=@0x7fd2f77fc2b0: 0x7fd284080ac0,
    a2=@0x7fd2f77fc2ac: xmps::READY) at /usr/include/boost/bind/bind_template.hpp:61
#11 0x0000000000cd065f in boost::detail::function::void_function_obj_invoker2<boost::_bi::bind_t<void, boost::_mfi::mf1<void, BgpXmppChannel, XmppStanza::XmppMessage const*>, boost::_bi::list2<boost::_bi::value<BgpXmppChannel*>, boost::arg<1> > >, void, XmppStanza::XmppMessage const*, xmps::PeerState>::invoke (function_obj_ptr=...,
    a0=0x7fd284080ac0, a1=xmps::READY) at /usr/include/boost/function/function_template.hpp:153
#12 0x0000000000efbf44 in boost::function2<void, XmppStanza::XmppMessage const*, xmps::PeerState>::operator() (this=0x7fd2f77fc350, a0=0x7fd284080ac0, a1=xmps::READY)
    at /usr/include/boost/function/function_template.hpp:767
#13 0x0000000000efa99b in XmppChannelMux::ProcessXmppMessage (this=0x2fd1cb0, msg=0x7fd284080ac0) at controller/src/xmpp/xmpp_channel_mux.cc:200
#14 0x0000000000e89bbb in XmppConnection::ProcessXmppIqMessage (this=0x2fd1750, msg=0x7fd284080ac0) at controller/src/xmpp/xmpp_connection.cc:619
#15 0x0000000000ebc047 in xmsm::XmppStreamEstablished::react (this=0x7fd28c095ac0, event=...) at controller/src/xmpp/xmpp_state_machine.cc:1018
#16 0x0000000000ece5e0 in boost::statechart::custom_reaction<xmsm::EvXmppIqReceive>::react<xmsm::XmppStreamEstablished, boost::statechart::event_base, void const*> (
    stt=..., evt=..., eventType=@0x7fd2f77fc4a8: 0x19ccd30 <boost::statechart::detail::id_holder<xmsm::EvXmppIqReceive>::idProvider_>)
    at /usr/include/boost/statechart/custom_reaction.hpp:42

Ananth Suryanarayana (anantha-l) wrote :

core and binary are in /cs-shared/bugs/1645520/files.tgz

tags: added: contrail-control graceful-restart
Changed in juniperopenstack:
status: New → In Progress
Changed in juniperopenstack:
milestone: none → r3.2.0.0-fcs
no longer affects: juniperopenstack
Ananth Suryanarayana (anantha-l) wrote :

The membership_mgr_available_ flag is required because Register/Unregister can be called back via the sweep trigger from PeerCloseManager into BgpXmppChannel. This should not happen, because there could be pending subscribes/unsubscribes that need to be processed first, before the unsubscribe that is part of the sweep process is executed.

(gdb) p routingtable_membership_request_map_
$1 = std::map with 1 elements = {
  ["instance2.ermvpn.0"] = {
    current_req = BgpXmppChannel::UNSUBSCRIBE,
    instance_id = -1,
    pending_req = BgpXmppChannel::UNSUBSCRIBE
  }
}

(gdb) bt
#0 0x00002aaaad013c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00002aaaad017028 in __GI_abort () at abort.c:89
#2 0x00002aaaad00cbf6 in __assert_fail_base (fmt=0x2aaaad15d3b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x1da4000 "close_manager_->IsMembershipInUse()", file=file@entry=0x1da3a18 "controller/src/bgp/bgp_xmpp_channel.cc", line=line@entry=1641, function=function@entry=0x1da6780 <BgpXmppChannel::UnregisterTable(int, BgpTable*)::__PRETTY_FUNCTION__> "void BgpXmppChannel::UnregisterTable(int, BgpTable*)") at assert.c:92
#3 0x00002aaaad00cca2 in __GI___assert_fail (assertion=0x1da4000 "close_manager_->IsMembershipInUse()", file=0x1da3a18 "controller/src/bgp/bgp_xmpp_channel.cc", line=1641, function=0x1da6780 <BgpXmppChannel::UnregisterTable(int, BgpTable*)::__PRETTY_FUNCTION__> "void BgpXmppChannel::UnregisterTable(int, BgpTable*)") at assert.c:101
#4 0x0000000001714874 in BgpXmppChannel::UnregisterTable (this=0x2aaadc0099c0, line=2157, table=0x2aaabc003e30) at controller/src/bgp/bgp_xmpp_channel.cc:1641
#5 0x00000000017223c1 in BgpXmppChannel::ProcessSubscriptionRequest (this=0x2aaadc0099c0, vrf_name="instance2", iq=0x0, add_change=false) at controller/src/bgp/bgp_xmpp_channel.cc:2157
#6 0x000000000171b4c6 in BgpXmppChannel::SweepCurrentSubscriptions (this=0x2aaadc0099c0) at controller/src/bgp/bgp_xmpp_channel.cc:1930
#7 0x0000000001751f58 in BgpXmppPeerClose::GracefulRestartSweep (this=0x2aaadc0086d0) at controller/src/bgp/bgp_xmpp_peer_close.cc:70
#8 0x0000000001547c9f in PeerCloseManager::TriggerSweepStateActions (this=0x2aaadc009e10) at controller/src/bgp/peer_close_manager.cc:405
#9 0x0000000001551b39 in PeerCloseManager::MembershipRequestCallback (this=0x2aaadc009e10, event=0x2aaacc10a2f0) at controller/src/bgp/peer_close_manager.cc:591
#10 0x00000000015533d8 in PeerCloseManager::EventCallback (this=0x2aaadc009e10, event=0x2aaacc10a2f0) at controller/src/bgp/peer_close_manager.cc:759
#11 0x0000000001559242 in boost::_mfi::mf1<bool, PeerCloseManager, PeerCloseManager::Event*>::operator() (this=0x2aaab4ce8b18, p=0x2aaadc009e10, a1=0x2aaacc10a2f0) at /usr/include/boost/bind/mem_fn_template.hpp:165
#12 0x0000000001558a74 in boost::_bi::list2<boost::_bi::value<PeerCloseManager*>, boost::arg<1> >::operator()<bool, boost::_mfi::mf1<bool, PeerCloseManager, PeerCloseManager::Event*>, boost::_bi::list1<PeerCloseManager::Event*&> > (this=0x2aaab4ce8b28, f=..., a=...) at /usr/include/boost/bind/bind.hpp:303
#13 0x0000000001557fda in boost::_bi::bind_t<bool, boost::_mfi::mf1<bool, PeerCloseManager, PeerCloseMan...


OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/26554
Submitter: Ananth Suryanarayana (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/26555
Submitter: Ananth Suryanarayana (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/26555
Committed: http://github.org/Juniper/contrail-controller/commit/d8e1a25ef700c2894a75374489437696c19445b4
Submitter: Zuul (<email address hidden>)
Branch: master

commit d8e1a25ef700c2894a75374489437696c19445b4
Author: Ananth Suryanarayana <email address hidden>
Date: Mon Nov 28 17:30:16 2016 -0800

Do not process subscribe/unsubscribe if membership manager is in use

We already check this using the flag BgpXmppChannel::membership_mgr_available_.
This is convoluted and only takes care of a corner case, wherein we wanted to
ensure that we never call Register() or Unregister() before all the deferred
subscriptions are fully processed. This could have happened off
SweepCurrentSubscriptions() called via
PeerCloseManager::TriggerSweepStateActions() before PeerCloseManager called
peer_close_->MembershipRequestCallbackComplete().

A stack trace showing this has been added to the bug for future reference.

Instead of doing such a convoluted check, just make sure that PeerCloseManager
sets its membership_state to NONE after notifying clients via calls such as
IPeerClose::GracefulRestartSweep(). This makes clients' lives easier, as they
can always consult PeerCloseManager::IsMembershipInUse() and get an accurate
view of the current state.

To cover the usual case, we have to check with
PeerCloseManager::IsMembershipInUse() instead. For example, the close manager
may have started using the membership manager (say, due to GR timer expiry)
while the membership manager has not yet called
IPeer::MembershipResponseHandler() because a walk is still in progress. If new
subscriptions come in during this time, we must defer them until the close
manager is done using the membership manager.

Change-Id: I45a2c0acafa7fa0ca45e29f937316689d050c5f2
Closes-Bug: #1645520
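
The deferral rule described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual contrail-controller code: ToyCloseManager, ToyChannel, Request, StartUsingMembership(), DoneUsingMembership(), ProcessDeferred(), applied() and deferred_count() are hypothetical names invented for the sketch. Only IsMembershipInUse() and ProcessSubscriptionRequest() are names taken from the report, and their real signatures differ.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for PeerCloseManager. It tracks only whether the
// close process is currently using the membership manager.
class ToyCloseManager {
public:
    bool IsMembershipInUse() const { return in_use_; }
    void StartUsingMembership() { in_use_ = true; }   // e.g. GR timer expiry
    void DoneUsingMembership() { in_use_ = false; }   // walk has completed
private:
    bool in_use_ = false;
};

// Hypothetical record of one subscribe/unsubscribe request.
struct Request {
    std::string vrf;
    bool subscribe;
};

// Hypothetical stand-in for BgpXmppChannel. Requests that arrive while the
// close manager is using the membership manager are queued, not applied.
class ToyChannel {
public:
    explicit ToyChannel(const ToyCloseManager *cm) : cm_(cm) {}

    void ProcessSubscriptionRequest(const std::string &vrf, bool subscribe) {
        if (cm_->IsMembershipInUse()) {
            deferred_.push_back(Request{vrf, subscribe});
            return;
        }
        Apply(vrf, subscribe);
    }

    // Drain deferred requests in arrival order. Callers are expected to
    // invoke this only after the close manager is done.
    void ProcessDeferred() {
        assert(!cm_->IsMembershipInUse());
        while (!deferred_.empty()) {
            Request r = deferred_.front();
            deferred_.pop_front();
            Apply(r.vrf, r.subscribe);
        }
    }

    const std::vector<std::string> &applied() const { return applied_; }
    std::size_t deferred_count() const { return deferred_.size(); }

private:
    // Stands in for the real Register()/Unregister() calls.
    void Apply(const std::string &vrf, bool subscribe) {
        applied_.push_back((subscribe ? "subscribe " : "unsubscribe ") + vrf);
    }

    const ToyCloseManager *cm_;
    std::deque<Request> deferred_;
    std::vector<std::string> applied_;
};
```

In this sketch, a request received between StartUsingMembership() and DoneUsingMembership() lands in the deferred queue and is applied only when ProcessDeferred() runs, so the channel never needs a separate availability flag of its own.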

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/26554
Committed: http://github.org/Juniper/contrail-controller/commit/e87f6ba9719250e2e50b98043f69fbb8d481179f
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit e87f6ba9719250e2e50b98043f69fbb8d481179f
Author: Ananth Suryanarayana <email address hidden>
Date: Mon Nov 28 17:30:16 2016 -0800
