Implement EndOfRib heuristic for control-node to irond connection

Bug #1446869 reported by Nischal Sheth
This bug affects 1 person
Affects            Status                   Importance  Assigned to  Milestone
Juniper Openstack  Status tracked in Trunk
  R2.20            Fix Committed            Wishlist    Tapan Karwa
  Trunk            Fix Committed            Wishlist    Tapan Karwa

Bug Description

The existing implementation uses a fixed timeout to clean up stale ifmap
node/link entries when the control-node connection to irond flaps. The
timer is started when the new connection is established.

The issue with using a fixed timer is that it's too conservative when the
configuration database is small and too short when the database is large.
In the latter case, the timer expires before the download completes, so
entries get cleaned up prematurely, and vRouters also see premature
deletion of configuration.

The proposal is to implement an EndOfRib heuristic for the control-node to
irond connection as follows. Start a 5 second timer when the first
PollRequest is sent. If a PollResponse with a searchResult is received, the
timer is restarted and a new PollRequest is sent (this last part is already
part of the current state machine). Expiration of the timer triggers
cleanup of stale entries.
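
The proposed timer behaviour can be sketched roughly as follows. This is a minimal illustration, not the control-node code: `RestartableTimer`, `PollStateMachine`, and the explicit millisecond clock are hypothetical names introduced here so the logic can be run deterministically.

```python
from typing import Callable, Optional

END_OF_RIB_TIMEOUT_MS = 5000  # the 5 second value proposed above


class RestartableTimer:
    """One-shot timer driven by an explicit clock so the sketch is deterministic."""

    def __init__(self, timeout_ms: int, on_expire: Callable[[], None]) -> None:
        self.timeout_ms = timeout_ms
        self.on_expire = on_expire
        self.deadline_ms: Optional[int] = None  # None means "not running"

    def start(self, now_ms: int) -> None:
        # Starting a running timer simply pushes the deadline out (a restart).
        self.deadline_ms = now_ms + self.timeout_ms

    def tick(self, now_ms: int) -> None:
        # Fire at most once per start().
        if self.deadline_ms is not None and now_ms >= self.deadline_ms:
            self.deadline_ms = None
            self.on_expire()


class PollStateMachine:
    """Hypothetical stand-in for the control-node side of the irond connection."""

    def __init__(self) -> None:
        self.poll_requests_sent = 0
        self.stale_entries_cleaned = False
        self.timer = RestartableTimer(END_OF_RIB_TIMEOUT_MS, self._cleanup_stale)

    def _cleanup_stale(self) -> None:
        self.stale_entries_cleaned = True

    def send_poll_request(self, now_ms: int) -> None:
        self.poll_requests_sent += 1
        if self.poll_requests_sent == 1:
            self.timer.start(now_ms)  # timer starts with the first PollRequest

    def on_poll_response(self, now_ms: int, has_search_result: bool) -> None:
        if has_search_result:
            self.timer.start(now_ms)  # more config is still arriving
        self.send_poll_request(now_ms)  # the state machine always polls again
```

With responses carrying searchResults at 50, 100, and 150 ms, calling `tick(5000)` does nothing because the deadline has moved to 5150 ms; `tick(5150)` triggers the cleanup.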

The reasoning here is that the heuristic only depends on the latency of a
PollRequest + PollResponse transaction, which should typically be ~50 ms.
Using a 5 second timer should ensure that we don't get false positives.
Note that the heuristic does not depend on the size of the configuration
database.
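
A small back-of-the-envelope calculation may make this concrete. The model below is an assumption made for illustration: the download is treated as a sequence of poll transactions of about 50 ms each, and both the number of transactions and the 60 second fixed timeout are hypothetical figures, since the description does not say how irond batches results or what the existing timeout is.

```python
POLL_RTT_MS = 50              # typical PollRequest + PollResponse latency (from the description)
END_OF_RIB_TIMEOUT_MS = 5000  # proposed heuristic timer


def download_time_ms(num_transactions: int, rtt_ms: int = POLL_RTT_MS) -> int:
    """Time to receive all searchResults under the sequential-poll assumption."""
    return num_transactions * rtt_ms


def fixed_timer_is_premature(num_transactions: int, fixed_timeout_ms: int) -> bool:
    """A fixed timer cleans up too early if the download outlasts it."""
    return download_time_ms(num_transactions) > fixed_timeout_ms


def heuristic_cleanup_time_ms(num_transactions: int) -> int:
    """The heuristic fires one quiet period after the last searchResult."""
    return download_time_ms(num_transactions) + END_OF_RIB_TIMEOUT_MS


def false_positive_margin(rtt_ms: int = POLL_RTT_MS) -> float:
    """How many typical round trips fit inside the quiet period."""
    return END_OF_RIB_TIMEOUT_MS / rtt_ms
```

Under these assumptions, 100 transactions finish in 5 s and a 60 s fixed timer waits far longer than needed, while 10,000 transactions take 500 s and the same fixed timer fires long before the download ends. The heuristic's overhead stays at one 5 s quiet period in both cases, and a single response would have to be about 100 times slower than typical to cause a false positive.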

As an optimization, the timer can be expired immediately if a PollResponse
with updateResult or deleteResult is received. This indicates that there
are no more searchResults that irond has to send.
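
One way to express this optimization is as a small decision function over the kinds of results found in a PollResponse. The names `TimerAction` and `timer_action` are hypothetical. Giving updateResult/deleteResult precedence when a response also contains a searchResult is an assumption of this sketch, not something the description specifies.

```python
from enum import Enum
from typing import Iterable


class TimerAction(Enum):
    RESTART = "restart"          # more searchResults may follow
    EXPIRE_NOW = "expire_now"    # delta results seen: initial download is over
    LEAVE_ALONE = "leave_alone"  # nothing relevant in this response


def timer_action(result_kinds: Iterable[str]) -> TimerAction:
    """Decide what to do with the EndOfRib timer for one PollResponse."""
    kinds = set(result_kinds)
    # Assumption: a delta result wins even if a searchResult is also present.
    if kinds & {"updateResult", "deleteResult"}:
        return TimerAction.EXPIRE_NOW
    if "searchResult" in kinds:
        return TimerAction.RESTART
    return TimerAction.LEAVE_ALONE
```

For example, `timer_action(["searchResult"])` yields `TimerAction.RESTART`, while `timer_action(["updateResult"])` yields `TimerAction.EXPIRE_NOW`.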

This EndOfRib heuristic can also be used by control-node to determine when
the XmppServer is published to discovery. Note that this would be a
separate effort, i.e. not covered by this bug.

Nischal Sheth (nsheth)
description: updated
summary: - Implement EndOfRib heuristics for control-node to irond connection
+ Implement EndOfRib heuristic for control-node to irond connection
Nischal Sheth (nsheth)
information type: Proprietary → Public
Tapan Karwa (tkarwa) wrote :

From the TNC IFMap document:

The first time a pollResult contains search results for a new subscription, the search results MUST consist of the complete set of identifiers, links, and metadata for the subscription as specified in section 3.7.2.1. The complete search results are sent to the MAP Client in a searchResult element. Subsequent pollResults (known as “delta pollResults”) MUST contain updates in updateResult elements as metadata is added and deletes in deleteResult elements as metadata is removed. The updateResult and deleteResult elements returned by the server MUST reflect ALL of the updates which have occurred since the last poll which affect the client’s subscriptions.

A metadata change may introduce or remove a link between two subgraphs that match a subscription. When this happens after an initial searchResult message has been sent, the MAP Server informs the MAP Client about the metadata that was added or removed to the subscription using updateResult and deleteResult elements.
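
To illustrate how a client might tell an initial download apart from a delta pollResult, here is a rough sketch using Python's standard `xml.etree.ElementTree`. The helper names are hypothetical, and the sketch matches elements by local name only, ignoring the real IF-MAP namespaces and SOAP envelope details; it is not how control-node parses responses.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix, if any."""
    return tag.rsplit("}", 1)[-1]


def result_kinds(response_xml: str) -> List[str]:
    """Return the local names of the children of the first pollResult element."""
    root = ET.fromstring(response_xml)
    for elem in root.iter():  # iter() also visits the root itself
        if local_name(elem.tag) == "pollResult":
            return [local_name(child.tag) for child in elem]
    return []


def is_delta_poll_result(response_xml: str) -> bool:
    """True if the pollResult carries updateResult or deleteResult elements."""
    kinds = set(result_kinds(response_xml))
    return bool(kinds & {"updateResult", "deleteResult"})
```

A response whose pollResult holds only searchResult children would be treated as part of the initial download; one holding an updateResult or deleteResult would be treated as a delta.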

tags: added: quench
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/13105
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/13121
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/13105
Committed: http://github.org/Juniper/contrail-controller/commit/e48172f41d940bc3d89c2ab70f5ff8015b0f2573
Submitter: Zuul
Branch: master

commit e48172f41d940bc3d89c2ab70f5ff8015b0f2573
Author: Tapan Karwa <email address hidden>
Date: Mon Aug 3 08:29:19 2015 -0700

Change the stale cleanup functionality and add endOfRib detection logic.

The existing implementation uses a fixed timeout to clean up stale ifmap
node/link entries when the control-node connection to irond flaps. The ifmap
protocol sends only SearchResults when a connection comes up. Subsequent
adds/deletes are sent via UpdateResults and DeleteResults. We use this fact to
decide we have reached end of rib, i.e. all existing config has been received.

We start/restart a timer for each SearchResult received. Since SearchResults
will keep coming in until all data has been downloaded, the timer will make us
wait before we clean up entries that became stale while the connection was down.
Using a static timer value will not work well for large configs but this scheme
will.

Also, adding logic to detect endOfRib when the daemon first comes up. We use
the same logic as above to detect that we have received all available config.
This is done only when the daemon comes up for the first time. In contrast, the
stale timer functionality is used only when the connection goes down. Detecting
that all config has been received will be used to advertise the control-node to
Discovery only after complete config download from irond.

Also, some name changes for consistency.

Change-Id: Ib14f45f489591679fa406d4af41a846777f4bb28
Closes-Bug: #1446869
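
The split described in this commit message, end-of-rib detection on first startup versus stale cleanup after a reconnect, might look something like the sketch below. `ConfigDownloadTracker` and its attributes are hypothetical names used only for illustration. The real change lives in the contrail-controller C++ code and may be structured differently.

```python
class ConfigDownloadTracker:
    """Hypothetical model of what the quiet-period expiry means in each case."""

    def __init__(self) -> None:
        self.connections_established = 0
        self.end_of_rib_seen = False
        self.advertised_to_discovery = False
        self.stale_cleanups_run = 0

    def on_connection_up(self) -> None:
        self.connections_established += 1

    def on_quiet_period_elapsed(self) -> None:
        """Called when the timer expires with no new SearchResult."""
        if self.connections_established == 1:
            # First time the daemon comes up: all config has been received,
            # so it is now reasonable to advertise the control-node.
            if not self.end_of_rib_seen:
                self.end_of_rib_seen = True
                self.advertised_to_discovery = True
        elif self.connections_established > 1:
            # Reconnect after a flap: remove entries that went stale
            # while the connection was down.
            self.stale_cleanups_run += 1
```

On the first connection, the expiry marks end of rib and advertises; on any later connection it runs a stale cleanup instead.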

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/13121
Committed: http://github.org/Juniper/contrail-controller/commit/0dd2befb5d5fa0c18b350ea9f153aa4ec796a81d
Submitter: Zuul
Branch: R2.20

commit 0dd2befb5d5fa0c18b350ea9f153aa4ec796a81d
Author: Tapan Karwa <email address hidden>
Date: Mon Aug 3 08:29:19 2015 -0700

Change the stale cleanup functionality and add endOfRib detection logic.

The existing implementation uses a fixed timeout to clean up stale ifmap
node/link entries when the control-node connection to irond flaps. The ifmap
protocol sends only SearchResults when a connection comes up. Subsequent
adds/deletes are sent via UpdateResults and DeleteResults. We use this fact to
decide we have reached end of rib, i.e. all existing config has been received.

We start/restart a timer for each SearchResult received. Since SearchResults
will keep coming in until all data has been downloaded, the timer will make us
wait before we clean up entries that became stale while the connection was down.
Using a static timer value will not work well for large configs but this scheme
will.

Also, adding logic to detect endOfRib when the daemon first comes up. We use
the same logic as above to detect that we have received all available config.
This is done only when the daemon comes up for the first time. In contrast, the
stale timer functionality is used only when the connection goes down. Detecting
that all config has been received will be used to advertise the control-node to
Discovery only after complete config download from irond.

Also, some name changes for consistency.

Change-Id: Ib14f45f489591679fa406d4af41a846777f4bb28
Closes-Bug: #1446869

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/13447
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/13490
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/13490
Committed: http://github.org/Juniper/contrail-controller/commit/d55a1bc2ae2afbe0e6a9078c8401b9bf84252914
Submitter: Zuul
Branch: R2.20

commit d55a1bc2ae2afbe0e6a9078c8401b9bf84252914
Author: Tapan Karwa <email address hidden>
Date: Mon Aug 31 14:11:21 2015 -0700

Move the stale cleanup timer to IFMapChannel

Also, stop the end of rib timer if it's running and the connection goes down.

Change-Id: I877b91815f7d89e09c50b8fd2fd0ff011b84dc70
Closes-Bug: #1446869
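
A rough sketch of the "stop the end of rib timer when the connection goes down" behaviour follows. `SimpleTimer` and `ChannelSketch` are hypothetical stand-ins and do not reflect the actual IFMapChannel interface. The motivation given in the comments (avoiding a false end-of-rib while disconnected) is an inference, not something the commit message states.

```python
class SimpleTimer:
    """Minimal timer stub that only tracks whether it is running."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def cancel(self) -> None:
        self.running = False


class ChannelSketch:
    """Hypothetical owner of both timers, loosely modelled on the commit text."""

    def __init__(self) -> None:
        self.connected = False
        self.end_of_rib_timer = SimpleTimer()
        self.stale_cleanup_timer = SimpleTimer()

    def on_connection_up(self, first_time: bool) -> None:
        self.connected = True
        if first_time:
            self.end_of_rib_timer.start()
        else:
            self.stale_cleanup_timer.start()

    def on_connection_down(self) -> None:
        self.connected = False
        # Presumably, letting this timer expire while disconnected would
        # signal end of rib before all config has actually been downloaded.
        if self.end_of_rib_timer.running:
            self.end_of_rib_timer.cancel()
```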

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/13447
Committed: http://github.org/Juniper/contrail-controller/commit/3ce95d2a3e0812979748630e86459e1609d8eb30
Submitter: Zuul
Branch: master

commit 3ce95d2a3e0812979748630e86459e1609d8eb30
Author: Tapan Karwa <email address hidden>
Date: Mon Aug 31 14:11:21 2015 -0700

Move the stale cleanup timer to IFMapChannel

Also, stop the end of rib timer if it's running and the connection goes down.

Change-Id: I877b91815f7d89e09c50b8fd2fd0ff011b84dc70
Closes-Bug: #1446869

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22-dev

Review in progress for https://review.opencontrail.org/13927
Submitter: Vinay Vithal Mahuli (<email address hidden>)
