R2.0 Centos6.5-havana-build-16: contrail-control core during control node restart

Bug #1402785 reported by shajuvk
This bug affects 1 person
Affects: Juniper Openstack (status tracked in Trunk)

Series   Status         Importance   Assigned to     Milestone
R2.0     Fix Released   High         Nischal Sheth
Trunk    Fix Released   High         Nischal Sheth

Bug Description

Test case: test_control_node_restart_and_validate_status_of_the_service produced a contrail-control core during the sanity run.
Sanity console: http://anamika.englab.juniper.net:8080/job/centos65_havana_Multi_Node_Sanity/81/consoleFull

Complete logs and cores location: /cs-shared/shaju/bugs/bug-1402785

Bt:
===

Core was generated by `/usr/bin/contrail-control'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000757d6d in BgpPath::IsVrfOriginated() const ()
#0 0x0000000000757d6d in BgpPath::IsVrfOriginated() const ()
#1 0x00000000007f594d in BgpXmppMessage::GetVirtualNetwork(BgpRoute const*) const ()
#2 0x00000000007f483b in BgpXmppMessage::AddEnetReach(BgpRoute const*, RibOutAttr const*) ()
#3 0x00000000007f4e74 in BgpXmppMessage::AddEnetRoute(BgpRoute const*, RibOutAttr const*) ()
#4 0x00000000007f39f0 in BgpXmppMessage::Start(RibOutAttr const*, BgpRoute const*) ()
#5 0x00000000007f5a17 in BgpXmppMessageBuilder::Create(BgpTable const*, RibOutAttr const*, BgpRoute const*) const ()
#6 0x0000000000724c23 in RibOutUpdates::DequeueCommon(UpdateMarker*, RouteUpdate*, RibPeerSet*) ()
#7 0x000000000072504a in RibOutUpdates::TailDequeue(int, RibPeerSet const&, RibPeerSet*) ()
#8 0x00000000007b5427 in SchedulingGroup::UpdateRibOut(RibOut*, int) ()
#9 0x00000000007b8157 in SchedulingGroup::Worker::Run() ()
#10 0x0000000000c9fbda in TaskImpl::execute() ()
#11 0x00002b87bd0dc18a in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x2b8888000900, parent=...,
    child=0x2b8890001040)
    at /ecbuilds/PipeLine/sb/third_party/tbb40_20111130oss/src/tbb/custom_scheduler.h:449
#12 0x00002b87bd0d3033 in tbb::internal::arena::process (this=0x1932d80, s=...)
    at /ecbuilds/PipeLine/sb/third_party/tbb40_20111130oss/src/tbb/arena.cpp:99
#13 0x00002b87bd0d1906 in tbb::internal::market::process (this=0x192ed80,
    j=...)
    at /ecbuilds/PipeLine/sb/third_party/tbb40_20111130oss/src/tbb/market.cpp:393
#14 0x00002b87bd0cc4bc in tbb::internal::rml::private_worker::run (
    this=0x1930600)
    at /ecbuilds/PipeLine/sb/third_party/tbb40_20111130oss/src/tbb/private_server.cpp:263
#15 0x00002b87bd0cc362 in tbb::internal::rml::private_worker::thread_routine (
    arg=0x1930600)
    at /ecbuilds/PipeLine/sb/third_party/tbb40_20111130oss/src/tbb/private_server.cpp:231
#16 0x00002b87bce939d1 in start_thread () from /lib64/libpthread.so.0
#17 0x00002b87bdd8fb5d in clone () from /lib64/libc.so.6

shajuvk (shajuvk)
description: updated
shajuvk (shajuvk)
summary: - R2.0 Centos6.5-havana-build-16: contrail-contol core during control node
- restart
+ R2.0 Centos6.5-havana-build-16: contrail-control core during control
+ node restart
tags: added: blocker
Changed in juniperopenstack:
milestone: r2.0-fcs → none
information type: Proprietary → Public
Ashish Ranjan (aranjan-n) wrote :

Hi,

From the core analysis, it looks like the control node crashed while trying to
access a deleted path data structure. It is not yet clear why that happened.

I have checked with Shaju. Apparently this does not happen consistently, and
many regressions have passed since. We have to continue debugging, but this
need not be a blocker for R2.0, IMO.

I also checked that the recent ribout change is not part of this build, so that
change could not have caused this either. Not much has changed in the BGP code
in the last 2 months, so this is more likely a corner-case bug.

Regards,
Ananth

Nischal Sheth (nsheth)
Changed in juniperopenstack:
assignee: nobody → Nischal Sheth (nsheth)
status: New → In Progress
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/5786
Committed: http://github.org/Juniper/contrail-controller/commit/a22c89cec951b6bad04cea9ef6d1e2347822d6a1
Submitter: Zuul
Branch: R2.0

commit a22c89cec951b6bad04cea9ef6d1e2347822d6a1
Author: Nischal Sheth <email address hidden>
Date: Thu Dec 18 15:45:08 2014 -0800

Fix concurrency issue in XmppMessageBuilder

Do not access BgpPath from bgp::SendTask because the path can get
deleted from db::DBTable task which could be running in parallel.
Store vrf_originated_ attribute in RibOutAttr instead of accessing
the best path to figure out if the route is local Vrf Originated.

Change-Id: I47e45efa28ca51d68f1233c77606a2c499b0678c
Closes-Bug: 1402785
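
The idea in the commit message above can be illustrated with a small sketch. It is not the actual contrail-controller code: the classes below are heavily simplified stand-ins, and only the names BgpPath, IsVrfOriginated(), RibOutAttr and vrf_originated_ come from this report. BgpRoute's InsertPath()/DeletePath() and the DescribeOrigin() helper are hypothetical. The sketch assumes the attribute is copied into RibOutAttr while the path is still valid, so the send side never has to dereference the path afterwards. It runs single-threaded and only shows the ownership pattern, not the real task scheduling.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Simplified stand-in for the real class; only the names come from the report.
class BgpPath {
public:
    explicit BgpPath(bool vrf_originated) : vrf_originated_(vrf_originated) {}
    bool IsVrfOriginated() const { return vrf_originated_; }
private:
    bool vrf_originated_;
};

// Hypothetical route that owns its best path. In the real system the path
// can be deleted by the table task while the send task is running.
class BgpRoute {
public:
    void InsertPath(bool vrf_originated) {
        best_path_.reset(new BgpPath(vrf_originated));
    }
    void DeletePath() { best_path_.reset(); }
    const BgpPath *BestPath() const { return best_path_.get(); }
private:
    std::unique_ptr<BgpPath> best_path_;
};

// Snapshot of what the send side needs. The flag is copied at construction
// time, so no pointer to the path is kept.
class RibOutAttr {
public:
    explicit RibOutAttr(const BgpPath *path)
        : vrf_originated_(ReadFlag(path)) {}
    bool vrf_originated() const { return vrf_originated_; }
private:
    static bool ReadFlag(const BgpPath *path) {
        assert(path != nullptr);  // must be built while the path is alive
        return path->IsVrfOriginated();
    }
    bool vrf_originated_;
};

// Hypothetical send-side helper: it consults only the snapshot and never
// touches BgpPath, so a concurrent path deletion cannot affect it.
std::string DescribeOrigin(const RibOutAttr &attr) {
    return attr.vrf_originated() ? "local-vrf" : "remote";
}
```

With this shape, a route's path can be removed after the RibOutAttr has been built and the send-side helper still returns a valid answer, whereas calling IsVrfOriginated() through a stale path pointer would be the kind of access seen in frame #0 of the backtrace.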

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/5785
Committed: http://github.org/Juniper/contrail-controller/commit/fd99d0d8e0fef67881c009f13a4352a46dbb5946
Submitter: Zuul
Branch: master

commit fd99d0d8e0fef67881c009f13a4352a46dbb5946
Author: Nischal Sheth <email address hidden>
Date: Thu Dec 18 15:45:08 2014 -0800

Fix concurrency issue in XmppMessageBuilder

Do not access BgpPath from bgp::SendTask because the path can get
deleted from db::DBTable task which could be running in parallel.
Store vrf_originated_ attribute in RibOutAttr instead of accessing
the best path to figure out if the route is local Vrf Originated.

Change-Id: I47e45efa28ca51d68f1233c77606a2c499b0678c
Closes-Bug: 1402785

Nischal Sheth (nsheth)
no longer affects: juniperopenstack/r2.1