Make SchedulingGroupManager::Leave more efficient

Bug #1461322 reported by Nischal Sheth
Affects            Status          Importance  Assigned to    Milestone
Juniper Openstack  Status tracked in Trunk
  R2.20            Fix Committed   High        Nischal Sheth
  Trunk            Fix Committed   High        Nischal Sheth

Bug Description

Scaling tests for OVSDB have identified SchedulingGroupManager::Leave
as a heavy CPU user. The tests have a small number of agents that Join
a very large number of VNs: 4 TOR-Agents that subscribe to 2K VNs each
and 1 TSN-Agent that subscribes to 8K VNs. The system takes 23+ minutes
to settle down when all these agents are killed. Profiling data shows
that the bulk of the time is spent in SchedulingGroupManager::Leave.

The numbers will be much worse when the number of [TOR|TSN]-Agents
and/or VNs goes up.

This bug tracks improvements to handle this scenario more efficiently.

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11209
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11210
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11210
Committed: http://github.org/Juniper/contrail-controller/commit/28ced7c439bc5c1a9fce628514683afac2b80d82
Submitter: Zuul
Branch: R2.20

commit 28ced7c439bc5c1a9fce628514683afac2b80d82
Author: Nischal Sheth <email address hidden>
Date: Tue Jun 2 13:28:41 2015 -0700

Make SchedulingGroup::Leave processing more efficient

Maintain a BitSet of advertised RibOut indices in the PeerState
so that PeerState::IsMember can use BitSet::test operation which
is O(1), instead of map::count operation which is O(Log(N)).

Time required to execute the new unit test went from 11.x seconds
to 2.x seconds.

Change-Id: Ia6daf9242fbd91793938897d7cc3bec8ac02df04
Partial-Bug: 1461322
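
The idea in the commit message above can be illustrated with a minimal sketch. It assumes a hypothetical PeerStateSketch class that keeps a bit vector alongside an ordered map; the class name, its methods and the use of std::vector<bool> in place of the project's BitSet are all assumptions made for illustration, not the actual contrail-controller code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for PeerState; the real class differs.
class PeerStateSketch {
public:
    // Record that the RibOut with this index is advertised to the peer.
    void Add(std::size_t ribout_index) {
        ribs_[ribout_index] = true;  // ordered map, O(log N) insert
        if (ribout_index >= bits_.size()) {
            bits_.resize(ribout_index + 1, false);
        }
        bits_[ribout_index] = true;  // mirror the membership in the bit vector
        // Invariant: both structures agree on membership.
        assert(IsMember(ribout_index) == IsMemberSlow(ribout_index));
    }

    // Forget the RibOut with this index; a no-op if it was never added.
    void Remove(std::size_t ribout_index) {
        ribs_.erase(ribout_index);
        if (ribout_index < bits_.size()) {
            bits_[ribout_index] = false;
        }
    }

    // Membership test through the map: O(log N).
    bool IsMemberSlow(std::size_t ribout_index) const {
        return ribs_.count(ribout_index) != 0;
    }

    // Membership test through the bit vector: O(1).
    bool IsMember(std::size_t ribout_index) const {
        return ribout_index < bits_.size() && bits_[ribout_index];
    }

private:
    std::map<std::size_t, bool> ribs_;  // placeholder payload
    std::vector<bool> bits_;            // stands in for the project's BitSet
};
```

The trade-off in this sketch is a small amount of extra memory and bookkeeping on Add and Remove in exchange for a constant-time lookup on the hot path.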

Nischal Sheth (nsheth)
summary: - Make SchedulingGroup::Leave more efficient
+ Make SchedulingGroupManager::Leave more efficient
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11250
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11209
Committed: http://github.org/Juniper/contrail-controller/commit/2157047e72e98aac0ed6313a5bda66f3088d33ce
Submitter: Zuul
Branch: master

commit 2157047e72e98aac0ed6313a5bda66f3088d33ce
Author: Nischal Sheth <email address hidden>
Date: Tue Jun 2 13:28:41 2015 -0700

Make SchedulingGroup::Leave processing more efficient

Maintain a BitSet of advertised RibOut indices in the PeerState
so that PeerState::IsMember can use BitSet::test operation which
is O(1), instead of map::count operation which is O(Log(N)).

Time required to execute the new unit test went from 11.x seconds
to 2.x seconds.

Change-Id: Ia6daf9242fbd91793938897d7cc3bec8ac02df04
Partial-Bug: 1461322

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11260
Submitter: Nischal Sheth (<email address hidden>)

Nischal Sheth (nsheth)
description: updated
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11250
Committed: http://github.org/Juniper/contrail-controller/commit/4a0d8ad4357d466994ca4ee4a246497d70d42f82
Submitter: Zuul
Branch: R2.20

commit 4a0d8ad4357d466994ca4ee4a246497d70d42f82
Author: Nischal Sheth <email address hidden>
Date: Tue Jun 2 20:57:35 2015 -0700

More improvements to SchedulingGroupManager::Leave processing

Following changes are implemented:

- Make GetPeerRibList build a list of RibStates instead of RibOuts
since building a RibStateList is much cheaper and the RibOutList
is needed only if the group can be split (which is very rare).
- Tweak the code to check for overlap in the peers for advertised
and not-advertised RibStateLists to improve performance.
- Add couple more tests to exercise Leave code with large number of
RibOuts.

Change-Id: Iea5458b24c1f4e4be5acd70409b733bd141eda58
Partial-Bug: 1461322
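
The overlap check described in the commit message above could look roughly like the following sketch. It assumes each RibState carries a bitset of the peers that joined it; RibStateSketch, kMaxPeers, PeerUnion and PeersOverlap are hypothetical names chosen for illustration. If any peer appears on both the advertised and the not-advertised side, the group cannot be split, so the more expensive RibOutList never needs to be built.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed upper bound on peer indices, for illustration only.
constexpr std::size_t kMaxPeers = 64;

// Hypothetical stand-in for RibState: just the set of joined peers.
struct RibStateSketch {
    std::bitset<kMaxPeers> peers;
};

using RibStateList = std::vector<const RibStateSketch*>;

// Union of the peer sets of every RibState in the list.
std::bitset<kMaxPeers> PeerUnion(const RibStateList& list) {
    std::bitset<kMaxPeers> all;
    for (const RibStateSketch* rs : list) {
        assert(rs != nullptr);
        all |= rs->peers;
    }
    return all;
}

// Returns true if some peer is present on both sides, which means the
// two lists cannot be separated into independent groups.
bool PeersOverlap(const RibStateList& advertised,
                  const RibStateList& not_advertised) {
    const std::bitset<kMaxPeers> lhs = PeerUnion(advertised);
    for (const RibStateSketch* rs : not_advertised) {
        assert(rs != nullptr);
        if ((lhs & rs->peers).any()) {
            return true;  // stop at the first shared peer
        }
    }
    return false;
}
```

Returning at the first shared peer keeps the common case (a group that cannot be split) cheap, which matches the commit's remark that a split is very rare.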

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11297
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11260
Committed: http://github.org/Juniper/contrail-controller/commit/d7524ab300236c8ea7f1469185ea6f4f73115bb5
Submitter: Zuul
Branch: master

commit d7524ab300236c8ea7f1469185ea6f4f73115bb5
Author: Nischal Sheth <email address hidden>
Date: Tue Jun 2 20:57:35 2015 -0700

More improvements to SchedulingGroupManager::Leave processing

Following changes are implemented:

- Make GetPeerRibList build a list of RibStates instead of RibOuts
since building a RibStateList is much cheaper and the RibOutList
is needed only if the group can be split (which is very rare).
- Tweak the code to check for overlap in the peers for advertised
and not-advertised RibStateLists to improve performance.
- Add couple more tests to exercise Leave code with large number of
RibOuts.

Change-Id: Iea5458b24c1f4e4be5acd70409b733bd141eda58
Partial-Bug: 1461322

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11316
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11297
Committed: http://github.org/Juniper/contrail-controller/commit/15c9b95e5b01f3d00d2ae61e69f31d895aed7361
Submitter: Zuul
Branch: R2.20

commit 15c9b95e5b01f3d00d2ae61e69f31d895aed7361
Author: Nischal Sheth <email address hidden>
Date: Thu Jun 4 14:23:09 2015 -0700

Implement heuristic to disable SchedulingGroup splitting

It's not worth trying to split a group with a very large number of
(ribout, peer) members. The chances that such a group can be split
are pretty low and the amount of effort spent in figuring it out is
quite large.

Implement a heuristic/hack to make a group split ineligible once it
has more than a certain number of members. This is a sticky property
i.e. the group can never be split once this property gets set.

If/when more parallelism is required on the xmpp send side, it can
be achieved by creating multiple RibOuts for the same tables based
on a hash of the agent name/address.

Change-Id: I601f9aadc4e21a835b7e320f8aae430b5589929d
Partial-Bug: 1461322
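
The sticky heuristic described in the commit message above can be sketched as follows. The class SchedulingGroupSketch, its methods and the threshold parameter are hypothetical; the commit message does not give the actual member limit or say how members are counted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for SchedulingGroup, tracking only the member
// count and the split-eligibility flag.
class SchedulingGroupSketch {
public:
    // max_split_members is an assumed, caller-chosen threshold.
    explicit SchedulingGroupSketch(std::size_t max_split_members)
        : max_split_members_(max_split_members) {}

    // Add one (ribout, peer) member. Crossing the threshold disables
    // splitting for the rest of the group's lifetime.
    void AddMember() {
        ++member_count_;
        if (member_count_ > max_split_members_) {
            split_disabled_ = true;
        }
    }

    // Remove one member. The flag is deliberately not cleared here,
    // which is what makes the property sticky.
    void RemoveMember() {
        assert(member_count_ > 0);
        --member_count_;
    }

    // Leave processing would skip the split analysis when this is false.
    bool CanAttemptSplit() const { return !split_disabled_; }

    std::size_t member_count() const { return member_count_; }

private:
    const std::size_t max_split_members_;
    std::size_t member_count_ = 0;
    bool split_disabled_ = false;
};
```

With a threshold of 2, for example, adding a third member disables splitting, and the group stays ineligible even after all members are removed.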

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11316
Committed: http://github.org/Juniper/contrail-controller/commit/47836432d6d9af8386835ae3e716a04f024015b7
Submitter: Zuul
Branch: master

commit 47836432d6d9af8386835ae3e716a04f024015b7
Author: Nischal Sheth <email address hidden>
Date: Thu Jun 4 14:23:09 2015 -0700

Implement heuristic to disable SchedulingGroup splitting

It's not worth trying to split a group with a very large number of
(ribout, peer) members. The chances that such a group can be split
are pretty low and the amount of effort spent in figuring it out is
quite large.

Implement a heuristic/hack to make a group split ineligible once it
has more than a certain number of members. This is a sticky property
i.e. the group can never be split once this property gets set.

If/when more parallelism is required on the xmpp send side, it can
be achieved by creating multiple RibOuts for the same tables based
on a hash of the agent name/address.

Change-Id: I601f9aadc4e21a835b7e320f8aae430b5589929d
Partial-Bug: 1461322
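
The last paragraph of the commit message above mentions spreading agents over multiple RibOuts by hashing the agent name or address. A minimal sketch of that selection step, assuming a hypothetical RibOutShardFor helper and a fixed shard count, might look like this. Nothing here is taken from the actual code, and the commit only describes this as a possible future approach.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: pick which of shard_count RibOuts (for the same
// table) an agent should use, based on a hash of its name or address.
std::size_t RibOutShardFor(const std::string& agent_name,
                           std::size_t shard_count) {
    assert(shard_count > 0);
    return std::hash<std::string>()(agent_name) % shard_count;
}
```

Note that std::hash is only guaranteed to be consistent within a single program execution, so a real implementation that needs stable placement across restarts or across nodes would need a fixed hash function instead.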

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11423
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11425
Submitter: Nischal Sheth (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11423
Committed: http://github.org/Juniper/contrail-controller/commit/c495b9673f50bf364e4e076cd79fdccce260d004
Submitter: Zuul
Branch: master

commit c495b9673f50bf364e4e076cd79fdccce260d004
Author: Nischal Sheth <email address hidden>
Date: Tue Jun 9 09:20:37 2015 -0700

Fix issue with signed and unsigned comparison

Change-Id: I1b7f43a77da4425b92a9e8429a35e2bcb7fea8fc
Partial-Bug: 1461322
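
The commit message above does not say which comparison was affected. As a generic illustration of this class of bug, the sketch below compares a container size (unsigned) against a signed limit; the function names and the meaning given to a negative limit are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Buggy pattern, with the implicit conversion written out explicitly:
// a negative limit becomes a huge unsigned value, so the size never
// appears to exceed it.
bool ExceedsLimitBuggy(const std::vector<int>& members, int limit) {
    return members.size() > static_cast<std::size_t>(limit);
}

// Safer pattern: handle the negative case before converting.
bool ExceedsLimit(const std::vector<int>& members, int limit) {
    if (limit < 0) {
        return true;  // assumed meaning: any size exceeds a negative limit
    }
    return members.size() > static_cast<std::size_t>(limit);
}
```

Compilers typically flag the implicit form of this comparison with a sign-compare warning, which is often how such issues surface after a change has already merged.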

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11425
Committed: http://github.org/Juniper/contrail-controller/commit/a507e838000efdf33f8fe2a6597ce32d47d192eb
Submitter: Zuul
Branch: R2.20

commit a507e838000efdf33f8fe2a6597ce32d47d192eb
Author: Nischal Sheth <email address hidden>
Date: Tue Jun 9 09:20:37 2015 -0700

Fix issue with signed and unsigned comparison

Change-Id: I1b7f43a77da4425b92a9e8429a35e2bcb7fea8fc
Partial-Bug: 1461322
