Segmentation fault from SchedulingGroup::UpdateRibOut

Bug #1464106 reported by Nischal Sheth
This bug affects 1 person
Affects              Status           Importance  Assigned to    Milestone
Juniper Openstack    (status tracked in Trunk)
  R2.20              Fix Committed    Medium      Nischal Sheth
  Trunk              Fix Committed    Medium      Nischal Sheth

Bug Description

Segmentation fault with the following backtrace was observed when
running bgp_stress_test:

#0 0x0000000000f3dc84 in std::vector<unsigned long, std::allocator<unsigned long> >::size (this=0x10) at /usr/include/c++/4.6/bits/stl_vector.h:571
#1 0x00000000014e081e in BitSet::find_first (this=0x10) at controller/src/base/bitset.cc:243
#2 0x000000000107f5f4 in SchedulingGroup::RibState::begin (this=0x0, indexmap=...) at controller/src/bgp/scheduling_group.cc:320
#3 0x000000000107c12c in SchedulingGroup::BuildSyncUnsyncBitSet (this=0x38ae760, ribout=0x3919a40, rs=0x0, msync=0x7fcbd88beb50, munsync=0x7fcbd88beb70) at controller/src/bgp/scheduling_group.cc:927
#4 0x000000000107c8df in SchedulingGroup::UpdateRibOut (this=0x38ae760, ribout=0x3919a40, queue_id=1) at controller/src/bgp/scheduling_group.cc:1063
#5 0x000000000107f86c in SchedulingGroup::Worker::Run (this=0x3afbef0) at controller/src/bgp/scheduling_group.cc:444
#6 0x00000000014f66e9 in TaskImpl::execute (this=0x3ab9240) at controller/src/base/task.cc:238
#7 0x00007fcbdf84cece in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x3a3d480, parent=..., child=0x3ab9240)
    at /build/nsheth/default/third_party/tbb40_20111130oss/src/tbb/custom_scheduler.h:449
#8 0x00007fcbdf843e0b in tbb::internal::arena::process (this=0x395cd00, s=...) at /build/nsheth/default/third_party/tbb40_20111130oss/src/tbb/arena.cpp:99
#9 0x00007fcbdf8426f2 in tbb::internal::market::process (this=0x38d0280, j=...) at /build/nsheth/default/third_party/tbb40_20111130oss/src/tbb/market.cpp:393
#10 0x00007fcbdf83d3ce in tbb::internal::rml::private_worker::run (this=0x3956400) at /build/nsheth/default/third_party/tbb40_20111130oss/src/tbb/private_server.cpp:263
#11 0x00007fcbdf83d270 in tbb::internal::rml::private_worker::thread_routine (arg=0x3956400) at /build/nsheth/default/third_party/tbb40_20111130oss/src/tbb/private_server.cpp:231
#12 0x00007fcbdfa76e9a in start_thread (arg=0x7fcbd88bf700) at pthread_create.c:308
#13 0x00007fcbde5e42ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#14 0x0000000000000000 in ?? ()

This happens because the RibOut in question has already been removed
from the SchedulingGroup by the time the WorkRibOut item is processed,
so there is no RibState for it (note rs=0x0 in frame #3 and this=0x0
in frame #2).

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11513
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11515
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11513
Committed: http://github.org/Juniper/contrail-controller/commit/b4966807e252521fa6521a1db6ab69b23f67b47c
Submitter: Zuul
Branch: master

commit b4966807e252521fa6521a1db6ab69b23f67b47c
Author: Nischal Sheth <email address hidden>
Date: Wed Jun 10 22:05:03 2015 -0700

Fix race condition in SchedulingGroup::Worker

The WorkQueue in the SchedulingGroup contains naked pointers to RibOuts
and IPeerUpdates without any reference counting. If the IPeerUpdate or
RibOut is removed from the SchedulingGroup while there are corresponding
WorkPeer or WorkRibOut entries in the WorkQueue, the Worker::Run method
crashes when it processes the WorkPeer or WorkRibOut.

Fix is to invalidate the relevant WorkQueue entries when an IPeerUpdate
or RibOut is removed from the SchedulingGroup.

Change-Id: I496fc91b85c3c55786676639c0a146cbda2d730c
Closes-Bug: 1464106

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/11515
Committed: http://github.org/Juniper/contrail-controller/commit/aae4b23a629dec070e63a6c32b7a49252ca68e83
Submitter: Zuul
Branch: R2.20

commit aae4b23a629dec070e63a6c32b7a49252ca68e83
Author: Nischal Sheth <email address hidden>
Date: Thu Jun 11 11:17:41 2015 -0700

Fix race condition in SchedulingGroup::Worker

The WorkQueue in the SchedulingGroup contains naked pointers to RibOuts
and IPeerUpdates without any reference counting. If the IPeerUpdate or
RibOut is removed from the SchedulingGroup while there are corresponding
WorkPeer or WorkRibOut entries in the WorkQueue, the Worker::Run method
crashes when it processes the WorkPeer or WorkRibOut.

Fix is to invalidate the relevant WorkQueue entries when an IPeerUpdate
or RibOut is removed from the SchedulingGroup.

Change-Id: I496fc91b85c3c55786676639c0a146cbda2d730c
Closes-Bug: 1464106

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11566
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11567
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11566
Committed: http://github.org/Juniper/contrail-controller/commit/945e7157875ce584902e6cc30bf1b3fe0e6fbc55
Submitter: Zuul
Branch: master

commit 945e7157875ce584902e6cc30bf1b3fe0e6fbc55
Author: Nischal Sheth <email address hidden>
Date: Fri Jun 12 10:28:06 2015 -0700

Additional verification for WorkRibOut/WorkPeer invalidate logic

The new tests identified an issue where WorkRibOuts/WorkPeers get
invalidated when splitting a SchedulingGroup.

Fix this by making sure that the invalidation logic kicks in only
when RibOut or IPeerUpdate is removed from SchedulingGroupManager,
not when it's removed from a SchedulingGroup since the latter may
happen because of a split/merge.

Change-Id: I9fde07df65b6dec3c1cb58376c761350b64d0ae2
Closes-Bug: 1464106

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/11567
Committed: http://github.org/Juniper/contrail-controller/commit/10ac9a8e9c6a1e69031784c3b0fb1d65ff89482f
Submitter: Zuul
Branch: R2.20

commit 10ac9a8e9c6a1e69031784c3b0fb1d65ff89482f
Author: Nischal Sheth <email address hidden>
Date: Fri Jun 12 10:28:06 2015 -0700

Additional verification for WorkRibOut/WorkPeer invalidate logic

The new tests identified an issue where WorkRibOuts/WorkPeers get
invalidated when splitting a SchedulingGroup.

Fix this by making sure that the invalidation logic kicks in only
when RibOut or IPeerUpdate is removed from SchedulingGroupManager,
not when it's removed from a SchedulingGroup since the latter may
happen because of a split/merge.

Change-Id: I9fde07df65b6dec3c1cb58376c761350b64d0ae2
Closes-Bug: 1464106
