contrail-control crashed at IFMapServer::ClientRegister

Bug #1646407 reported by Sandip Dey
This bug affects 2 people
Affects             Status          Importance   Assigned to            Milestone
Juniper Openstack   Status tracked in Trunk
  R3.1              Fix Committed   High         Ananth Suryanarayana
  R3.2              Fix Committed   High         Ananth Suryanarayana
  R4.0              Fix Committed   High         Ananth Suryanarayana
  Trunk             Fix Committed   High         Ananth Suryanarayana

Bug Description

3.2.0.0-1~mitaka

contrail-control crashed in the solution testbed.

Logs/core saved at http://10.204.216.50/Docs/bugs/<bug-id>

BT
===
#0 0x00007f9af4000078 in ?? ()
#1 0x00000000004a2ebf in IFMapServer::ClientRegister (this=0x7ffc81de5ba0, client=0x7f9af472afe0) at controller/src/ifmap/ifmap_server.cc:224
#2 0x00000000004a394d in IFMapServer::ProcessClientWork (this=0x7ffc81de5ba0, add=add@entry=true, client=0x7f9af472afe0) at controller/src/ifmap/ifmap_server.cc:262
#3 0x00000000004c73a2 in IFMapXmppChannel::ProcessVrSubscribe (this=0x7f9af43a6d00, identifier=...) at controller/src/ifmap/ifmap_xmpp.cc:209
#4 0x00000000004d4e4d in ChannelEventProcTask::Run (this=<optimized out>) at controller/src/ifmap/ifmap_xmpp.cc:72
#5 0x00000000006dbf4f in TaskImpl::execute (this=0x7f9b321a9240) at controller/src/base/task.cc:262
#6 0x00007f9b397bcb3a in ?? () from /usr/lib/libtbb.so.2
#7 0x00007f9b397b8816 in ?? () from /usr/lib/libtbb.so.2
#8 0x00007f9b397b7f4b in ?? () from /usr/lib/libtbb.so.2
#9 0x00007f9b397b40ff in ?? () from /usr/lib/libtbb.so.2
#10 0x00007f9b397b42f9 in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f9b399d8184 in start_thread (arg=0x7f9b1bbfe700) at pthread_create.c:312
#12 0x00007f9b38aa937d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Jeba Paulaiyan (jebap)
Changed in juniperopenstack:
importance: Undecided → High
assignee: Ashish Ranjan (aranjan-n) → Nischal Sheth (nsheth)

Tapan Karwa (tkarwa) wrote :

Very strange core.

It says:
Program terminated with signal SIGSEGV, Segmentation fault.

#1 0x00000000004a2ebf in IFMapServer::ClientRegister (this=0x7ffc81de5ba0, client=
    0x7f9af472afe0) at controller/src/ifmap/ifmap_server.cc:224

The code is:
    cm_ret = client_map_.insert(make_pair(client->identifier(), client));
But everything in this line of code looks fine in the core, so I can't see how it can crash here with a SIGSEGV.

One wild guess is that this could be related to bug 1643486, which Ananth is fixing, since this code hasn't changed for years and there is no reason for it to crash at the location shown by the bt.

Here,
(gdb) p *client
$12 = {
  _vptr.IFMapClient = 0x7f9af49945a0,
  static kIndexInvalid = -1,
  index_ = -192835104, <<<< not set yet since we are at line 224
  exporter_ = 0x0,
  msgs_sent_ = 0,
  msgs_blocked_ = 0,
  bytes_sent_ = 0,
  update_nodes_sent_ = 0,
  delete_nodes_sent_ = 0,
  update_links_sent_ = 0,
  delete_links_sent_ = 0,
  send_is_blocked_ = false,
  vm_map_ = std::map with 0 elements,
  name_ = "nodei27:192.168.1.8",
  created_at_ = 1480566217126441
}

(gdb) p client_map_
$13 = std::map with 5 elements = {
  ["default-global-system-config:nodei11"] = 0x7f9afc739f00,
  ["default-global-system-config:nodei37"] = 0x7f9b008cacb0,
  ["default-global-system-config:nodei6"] = 0x7f9ab49cb750,
  ["default-global-system-config:nodei8"] = 0x7f9ac497db80,
  ["default-global-system-config:nodel7"] = 0x7f9b047de870
}

(gdb) p *(IFMapXmppChannel::IFMapSender *) 0x7f9af472afe0
$10 = {
  <IFMapClient> = {
    _vptr.IFMapClient = 0x7f9af49945a0,
    static kIndexInvalid = -1,
    index_ = -192835104,
    exporter_ = 0x0,
    msgs_sent_ = 0,
    msgs_blocked_ = 0,
    bytes_sent_ = 0,
    update_nodes_sent_ = 0,
    delete_nodes_sent_ = 0,
    update_links_sent_ = 0,
    delete_links_sent_ = 0,
    send_is_blocked_ = false,
    vm_map_ = std::map with 0 elements,
    name_ = "nodei27:192.168.1.8",
    created_at_ = 1480566217126441
  },
  members of IFMapXmppChannel::IFMapSender:
  parent_ = 0x7f9af43a6d00,
  hostname_ = "nodei38",
  identifier_ = "default-global-system-config:nodei38"
}

Tapan Karwa (tkarwa) wrote :

Sandip,
Can you please double-check the build?
I used the following:

/github-build/R3.2/1/ubuntu-14-04/mitaka/store/sandbox/

Sandip Dey (sandipd) wrote :

Hi Tapan

The build on which we got the core was R3.2 build 1, Ubuntu 14.04, Mitaka.

-Sandip

Jeba Paulaiyan (jebap)
tags: added: crashes

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/33434
Submitter: Ananth Suryanarayana (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/33435
Submitter: Ananth Suryanarayana (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/33436
Submitter: Ananth Suryanarayana (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/33437
Submitter: Ananth Suryanarayana (<email address hidden>)

Ananth Suryanarayana (anantha-l) wrote :

The actual reason for the crash is indeed that the IFMapClient object (client) had already been deleted; in particular, its vtable is no longer valid. Since there is no entry (or rather, perhaps an invalid entry) for the identifier() method from the derived class, a SIGSEGV is generated, and rightly so.

Just for the record (from another core with a similar signature):

(gdb) info vtbl client IFMapXmppChannel::IFMapSender
vtable for 'IFMapClient' @ 0x7f83c08ef3b0 (subobject @ 0x7f83c34cb570):
[0]: 0x7f83c2ca7f70
[1]: 0x111
[2]: 0x7f83c3140bf0
[3]: 0x7f83c34cb560
[4]: 0x693c0a3e3f22302e
[5]: 0xdc7370 <vtable for pugi::xml_writer+16>
(gdb)

Not good. The identifier() method is not present in the vtable, and hence we can
conclude that client has already been deleted.

A good one looks like this (for an object of another type, just as an example):

(gdb) f 8
#8 0x0000000000b53a3d in XmppSession::OnRead (this=0x2417ad0, buffer=...) at controller/src/xmpp/xmpp_session.cc:308
308 controller/src/xmpp/xmpp_session.cc: No such file or directory.
(gdb) info vtbl this
vtable for 'XmppSession' @ 0xdff3b0 (subobject @ 0x2417ad0):
[0]: 0xd0b020 <TcpSession::Send(unsigned char const*, unsigned long, unsigned long*)>
[1]: 0xd09c20 <TcpSession::Connected(boost::asio::ip::basic_endpoint<boost::asio::ip::tcp>)>
[2]: 0xd09820 <TcpSession::Accepted()>
[3]: 0x905550 <TcpSession::ToString() const>
[4]: 0xcefd10 <SslSession::socket() const>
[5]: 0xd05bf0 <TcpSession::ReleaseBuffer(boost::asio::const_buffer)>
[6]: 0xb53b10 <XmppSession::GetSessionInstance() const>
[7]: 0xd0ecc0 <TcpSession::SetSocketOptions()>
[8]: 0xd09640 <TcpSession::AsyncReadStart()>
[9]: 0xd05740 <TcpSession::SetDeferReader(bool)>
[10]: 0x9054e0 <TcpSession::IsReaderDeferred() const>
[11]: 0xcf0ad0 <SslSession::AsyncReadHandlerProcess(boost::asio::mutable_buffer, unsigned long&, boost::system::error_code&)>
[12]: 0xcefd30 <SslSession::CreateReaderTask(boost::asio::mutable_buffer, unsigned long)>
[13]: 0xb52730 <XmppSession::~XmppSession()>
[14]: 0xb52910 <XmppSession::~XmppSession()>
[15]: 0xb53890 <XmppSession::OnRead(boost::asio::const_buffer)>
[16]: 0xb522d0 <XmppSession::WriteReady(boost::system::error_code const&)>
[17]: 0xceffa0 <SslSession::AsyncReadSome(boost::asio::mutable_buffer)>
[18]: 0xcf0b20 <SslSession::WriteSome(unsigned char const*, unsigned long, boost::system::error_code&)>
[19]: 0xcf12a0 <SslSession::AsyncWrite(unsigned char const*, unsigned long)>
[20]: 0x9054f0 <TcpSession::reader_task_id() const>
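
To make the vtable argument concrete, here is a minimal sketch of what the statement at ifmap_server.cc:224 evaluates. The classes Client, Sender and Registry are hypothetical stand-ins, not the real IFMapClient, IFMapSender or IFMapServer; only the shape of the insert call is taken from the line quoted earlier in this bug. The point it tries to show is that client->identifier() is a virtual call made before the map is touched, so a freed client would fault on the call itself, which would be consistent with client_map_ looking healthy in the core.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for IFMapClient; not the real class.
class Client {
public:
    virtual ~Client() {}
    // Resolved through the object's vtable pointer at run time.
    virtual const std::string &identifier() const = 0;
};

// Hypothetical stand-in for IFMapXmppChannel::IFMapSender.
class Sender : public Client {
public:
    explicit Sender(const std::string &id) : identifier_(id) {}
    const std::string &identifier() const override { return identifier_; }
private:
    std::string identifier_;
};

// Hypothetical stand-in for the client_map_ bookkeeping in IFMapServer.
class Registry {
public:
    // Same shape as the statement at line 224. The virtual call
    // client->identifier() runs first; if *client had already been freed,
    // that call would go through a stale vtable pointer (undefined
    // behaviour) before the map insert is attempted.
    bool Register(Client *client) {
        assert(client != nullptr);
        return clients_.insert(
            std::make_pair(client->identifier(), client)).second;
    }
    bool Unregister(const std::string &id) { return clients_.erase(id) == 1; }
    std::size_t size() const { return clients_.size(); }
private:
    std::map<std::string, Client *> clients_;
};

// Usage: the owner keeps the client alive for as long as it is registered.
inline std::size_t RegisterLiveClient() {
    Registry registry;
    std::unique_ptr<Sender> sender(
        new Sender("default-global-system-config:nodei38"));
    registry.Register(sender.get());
    const std::size_t registered = registry.size();
    registry.Unregister(sender->identifier());
    return registered;
}
```

The sketch only exercises the safe path. It deliberately does not reproduce the use-after-free, since that is undefined behaviour and would not fail reliably.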

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/33437
Committed: http://github.com/Juniper/contrail-controller/commit/61e987480ad807b3e5e6a9de685c6ddcd83b60c2
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit 61e987480ad807b3e5e6a9de685c6ddcd83b60c2
Author: Ananth Suryanarayana <email address hidden>
Date: Wed Jul 5 19:43:00 2017 -0700

Ensure that XmppChannel down event is invoked from XmppStateMachine task

Clients such as IFMapXmppChannel rely on serialization of events such
as VR Subscribe, Client delete, etc. via the XmppStateMachine task.

Currently, XmppConnection::Shutdown() code directly invoked channel down
notification, breaking this assumption. This can possibly explain why the
map was deleted (from another thread, when the channel itself was deleted) when
VrSubscribe was processed off the XmppStateMachine task.

Change-Id: If9524c6bf8b946a48d181a90eebfaa76519d02b8
Closes-Bug: 1646407
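
The serialization assumption described in this commit message can be sketched as follows. This is an illustrative model only: SerialQueue, ChannelState, PostChannelDown and PostSubscribe are hypothetical names and do not correspond to the contrail Task or XmppStateMachine APIs. The idea shown is that a channel-down notification is posted to the same single-consumer queue as subscribe events, rather than being run directly from whichever thread detects the shutdown.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <set>
#include <string>
#include <utility>

// Hypothetical single-consumer event queue standing in for the
// XmppStateMachine task; not the contrail Task/WorkQueue API.
class SerialQueue {
public:
    // May be called from any thread: it only records the work.
    void Enqueue(std::function<void()> event) {
        std::lock_guard<std::mutex> lock(mutex_);
        events_.push(std::move(event));
    }

    // Called only from the one consumer context; runs events in FIFO order
    // and returns how many were run.
    std::size_t Drain() {
        std::size_t ran = 0;
        for (;;) {
            std::function<void()> event;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (events_.empty()) return ran;
                event = std::move(events_.front());
                events_.pop();
            }
            event();  // Run outside the lock so events may enqueue more work.
            ++ran;
        }
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()> > events_;
};

// Hypothetical channel state that is touched only by queued events.
struct ChannelState {
    ChannelState() : up(true) {}
    std::set<std::string> subscribed;
    bool up;
};

// Channel-down is posted to the queue instead of mutating state directly.
inline void PostChannelDown(SerialQueue &queue, ChannelState &state) {
    queue.Enqueue([&state] {
        state.up = false;
        state.subscribed.clear();
    });
}

// A subscribe event re-checks the channel state when it actually runs.
inline void PostSubscribe(SerialQueue &queue, ChannelState &state,
                          const std::string &id) {
    queue.Enqueue([&state, id] {
        if (state.up) state.subscribed.insert(id);
    });
}
```

With this arrangement a subscribe that is queued after a channel-down sees the channel as down and does nothing, because both events run one after the other in the same context. Invoking the channel-down logic directly from another thread would bypass that ordering, which is the assumption the commit message says was being broken.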

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/33435
Committed: http://github.com/Juniper/contrail-controller/commit/46d877b1d6e83f84364de4b4e72fb93471dd78b1
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit 46d877b1d6e83f84364de4b4e72fb93471dd78b1
Author: Ananth Suryanarayana <email address hidden>
Date: Wed Jul 5 19:43:00 2017 -0700

Ensure that XmppChannel down event is invoked from XmppStateMachine task

Clients such as IFMapXmppChannel rely on serialization of events such
as VR Subscribe, Client delete, etc. via the XmppStateMachine task.

Currently, XmppConnection::Shutdown() code directly invoked channel down
notification, breaking this assumption. This can possibly explain why the
map was deleted (from another thread, when the channel itself was deleted) when
VrSubscribe was processed off the XmppStateMachine task.

Change-Id: If9524c6bf8b946a48d181a90eebfaa76519d02b8
Closes-Bug: 1646407

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/33434
Committed: http://github.com/Juniper/contrail-controller/commit/db47ee5169867bb9b14c26f4f28785cd8799ae24
Submitter: Zuul (<email address hidden>)
Branch: master

commit db47ee5169867bb9b14c26f4f28785cd8799ae24
Author: Ananth Suryanarayana <email address hidden>
Date: Wed Jul 5 19:43:00 2017 -0700

Ensure that XmppChannel down event is invoked from XmppStateMachine task

Clients such as IFMapXmppChannel rely on serialization of events such
as VR Subscribe, Client delete, etc. via the XmppStateMachine task.

Currently, XmppConnection::Shutdown() code directly invoked channel down
notification, breaking this assumption. This can possibly explain why the
map was deleted (from another thread, when the channel itself was deleted) when
VrSubscribe was processed off the XmppStateMachine task.

Change-Id: If9524c6bf8b946a48d181a90eebfaa76519d02b8
Closes-Bug: 1646407

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/33436
Committed: http://github.com/Juniper/contrail-controller/commit/ff533e89d4128be37ff1476896a606ff909a64dd
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit ff533e89d4128be37ff1476896a606ff909a64dd
Author: Ananth Suryanarayana <email address hidden>
Date: Wed Jul 5 19:43:00 2017 -0700

Ensure that XmppChannel down event is invoked from XmppStateMachine task

Clients such as IFMapXmppChannel rely on serialization of events such
as VR Subscribe, Client delete, etc. via the XmppStateMachine task.

Currently, XmppConnection::Shutdown() code directly invoked channel down
notification, breaking this assumption. This can possibly explain why the
map was deleted (from another thread, when the channel itself was deleted) when
VrSubscribe was processed off the XmppStateMachine task.

Change-Id: If9524c6bf8b946a48d181a90eebfaa76519d02b8
Closes-Bug: 1646407

information type: Proprietary → Public