Concurrency issue in RibUpdateMonitor::MergeUpdate

Bug #1451306 reported by Nischal Sheth
This bug affects 2 people
Affects: Juniper Openstack (status tracked in Trunk)

Series  Status         Importance  Assigned to    Milestone
R1.1    Fix Committed  High        Nischal Sheth
R2.0    Fix Committed  High        Nischal Sheth
R2.1    Fix Committed  High        Nischal Sheth
R2.20   Fix Committed  High        Nischal Sheth
Trunk   Fix Committed  High        Nischal Sheth

Bug Description

Noticed that a peer Join to a table occasionally does not download all
routes in the table to the peer. Updates for some routes get stuck in
the bulk update queue.

The root cause is a concurrency issue in RibUpdateMonitor::MergeUpdate.

If there's no DBState for the route, EnqueueUpdateUnlocked is called
without locking the monitor mutex. If another Task (bgp::SendTask) is in
GetNextUpdate at the same time, EnqueueUpdateUnlocked can return false
even though the last UpdateEntry on the queue is already being dequeued
by that Task. This happens because UpdateQueue::NextUpdate and
UpdateQueue::MoveMarker are two separate operations, so the enqueue can
land between them.

As a consequence, BgpExport::Join will not kick the SchedulingGroup
to start a tail dequeue for the bulk queue.
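
The interleaving described above can be replayed deterministically with
a toy model. This is only a sketch under stated assumptions: ToyQueue,
HasNext, Outcome, Stuck and the three scenario functions are hypothetical
names invented for illustration, not the contrail-controller API, and
each ToyQueue method is assumed to be atomic on its own (as if guarded by
the queue's internal mutex). The scenarios are single-threaded replays of
one possible ordering, not real Tasks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy queue; NOT the contrail-controller UpdateQueue.
// Entries before marker_ are done; the entry at marker_ (if any) is the
// one the consumer is currently processing.
class ToyQueue {
public:
    // Appends an entry. Returns true when nothing was pending, meaning
    // the consumer is idle and the caller must kick it.
    bool Enqueue(int value) {
        bool need_kick = (marker_ == entries_.size());
        entries_.push_back(value);
        return need_kick;
    }

    // Dequeue step 1: is there an entry after the one being processed?
    bool HasNext() const { return marker_ + 1 < entries_.size(); }

    // Dequeue step 2: move the marker past the entry being processed.
    void MoveMarker() {
        assert(marker_ < entries_.size());
        ++marker_;
    }

    std::size_t Pending() const { return entries_.size() - marker_; }

private:
    std::vector<int> entries_;
    std::size_t marker_ = 0;
};

struct Outcome {
    bool kicked;              // producer was told to kick the consumer
    bool consumer_continues;  // consumer saw more work and keeps running
    std::size_t pending;      // entries at or after the marker
};

// An entry is stuck when it is pending but nobody will process it.
bool Stuck(const Outcome &o) {
    return o.pending > 0 && !o.kicked && !o.consumer_continues;
}

// Producer without the shared mutex: the enqueue lands between the two
// dequeue steps.
Outcome EnqueueBetweenSteps() {
    ToyQueue q;
    q.Enqueue(1);                // consumer is now processing entry 1
    bool more = q.HasNext();     // step 1: false, consumer will stop
    bool kicked = q.Enqueue(2);  // false: entry 1 still looks pending
    q.MoveMarker();              // step 2
    return Outcome{kicked, more, q.Pending()};
}

// Producer and consumer share one mutex, so the enqueue can only happen
// before step 1 ...
Outcome EnqueueBeforeSteps() {
    ToyQueue q;
    q.Enqueue(1);
    bool kicked = q.Enqueue(2);
    bool more = q.HasNext();     // true: consumer keeps going
    q.MoveMarker();
    return Outcome{kicked, more, q.Pending()};
}

// ... or after step 2.
Outcome EnqueueAfterSteps() {
    ToyQueue q;
    q.Enqueue(1);
    bool more = q.HasNext();
    q.MoveMarker();
    bool kicked = q.Enqueue(2);  // true: caller kicks the consumer
    return Outcome{kicked, more, q.Pending()};
}
```

In this model only EnqueueBetweenSteps leaves an entry stuck. The other
two orderings are the only ones possible once the producer and the
consumer serialize on the same mutex, and in both of them either the
consumer sees the new entry or the producer is told to kick it.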

OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/9872
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/9873
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : R2.1

Review in progress for https://review.opencontrail.org/9874
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : R2.0

Review in progress for https://review.opencontrail.org/9875
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : R1.10

Review in progress for https://review.opencontrail.org/9876
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/9873
Committed: http://github.org/Juniper/contrail-controller/commit/8401953c0d6d0ff487b7adc9221415815188e25f
Submitter: Zuul
Branch: R2.20

commit 8401953c0d6d0ff487b7adc9221415815188e25f
Author: Nischal Sheth <email address hidden>
Date: Sun May 3 19:31:48 2015 -0700

Fix concurrency issue in RibUpdateMonitor::MergeUpdate

The return value of RibUpdateMonitor::EnqueueUpdateUnlocked could be
incorrect if it's called without holding the monitor mutex. That in
turn prevents BgpExport::Join from kicking SchedulingGroup to start
a tail dequeue for the bulk UpdateQueue.

Change-Id: I1d2a4418346bb0e90eaed68a3a535d9bc741825e
Closes-Bug: #1451306

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/9872
Committed: http://github.org/Juniper/contrail-controller/commit/4227f27167d96df7a4a92e48e8b1bccd375624b3
Submitter: Zuul
Branch: master

commit 4227f27167d96df7a4a92e48e8b1bccd375624b3
Author: Nischal Sheth <email address hidden>
Date: Sun May 3 19:31:48 2015 -0700

Fix concurrency issue in RibUpdateMonitor::MergeUpdate

The return value of RibUpdateMonitor::EnqueueUpdateUnlocked could be
incorrect if it's called without holding the monitor mutex. That in
turn prevents BgpExport::Join from kicking SchedulingGroup to start
a tail dequeue for the bulk UpdateQueue.

Change-Id: I1d2a4418346bb0e90eaed68a3a535d9bc741825e
Closes-Bug: #1451306

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/9874
Committed: http://github.org/Juniper/contrail-controller/commit/a814f8a0b3d2de15b8bec98c0ac6010715043645
Submitter: Zuul
Branch: R2.1

commit a814f8a0b3d2de15b8bec98c0ac6010715043645
Author: Nischal Sheth <email address hidden>
Date: Sun May 3 19:31:48 2015 -0700

Fix concurrency issue in RibUpdateMonitor::MergeUpdate

The return value of RibUpdateMonitor::EnqueueUpdateUnlocked could be
incorrect if it's called without holding the monitor mutex. That in
turn prevents BgpExport::Join from kicking SchedulingGroup to start
a tail dequeue for the bulk UpdateQueue.

Change-Id: I1d2a4418346bb0e90eaed68a3a535d9bc741825e
Closes-Bug: #1451306

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/9875
Committed: http://github.org/Juniper/contrail-controller/commit/90bad7def4a060f0fc5e7cd77a9e707777ff8d20
Submitter: Zuul
Branch: R2.0

commit 90bad7def4a060f0fc5e7cd77a9e707777ff8d20
Author: Nischal Sheth <email address hidden>
Date: Sun May 3 19:31:48 2015 -0700

Fix concurrency issue in RibUpdateMonitor::MergeUpdate

The return value of RibUpdateMonitor::EnqueueUpdateUnlocked could be
incorrect if it's called without holding the monitor mutex. That in
turn prevents BgpExport::Join from kicking SchedulingGroup to start
a tail dequeue for the bulk UpdateQueue.

Change-Id: I1d2a4418346bb0e90eaed68a3a535d9bc741825e
Closes-Bug: #1451306

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/9876
Committed: http://github.org/Juniper/contrail-controller/commit/9810b260da993e09dc510c43abf7b1a0b37a5c6d
Submitter: Zuul
Branch: R1.10

commit 9810b260da993e09dc510c43abf7b1a0b37a5c6d
Author: Nischal Sheth <email address hidden>
Date: Sun May 3 19:31:48 2015 -0700

Fix concurrency issue in RibUpdateMonitor::MergeUpdate

The return value of RibUpdateMonitor::EnqueueUpdateUnlocked could be
incorrect if it's called without holding the monitor mutex. That in
turn prevents BgpExport::Join from kicking SchedulingGroup to start
a tail dequeue for the bulk UpdateQueue.

Change-Id: I1d2a4418346bb0e90eaed68a3a535d9bc741825e
Closes-Bug: #1451306
