[R2.20]DM: Make DM code robust to handle failure scenarios

Bug #1469986 reported by amit surana
This bug affects 1 person
Affects              Status         Importance  Assigned to      Milestone
Juniper Openstack (status tracked in Trunk)
  R2.20              Fix Committed  High        Suresh Balineni
  Trunk              Fix Committed  High        Suresh Balineni

Bug Description

This is a catch-all bug documenting several exception conditions that DM should handle gracefully.

1. If a commit fails for any reason, DM gets stuck and does not attempt to push configs again until it is restarted.
2. Commits can fail if the irb/si-* interfaces created by DM already exist on the MX. DM needs to be able to handle this (1469991).
3. DM tries to bulk-push large config blobs. This can slow down the netconf process on the MX and cause some configs to be missed. Configs need to be pushed reliably in smaller chunks.
4. There is a timing issue when deleting configs via the VNC API: even after the physical router is deleted, the __contrail__ groups config is still present on the MX.
5. DM is unaware of the status of the SSH netconf connection with the MX and fails to retry periodically (and sync the config) when the connection goes down (1469366).
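
For item 3, one possible shape of "smaller chunks" is sketched below. This is only an illustration under assumptions: the function name chunk_by_size, the max_bytes limit, and the idea that the config is available as a list of independent text snippets are all hypothetical and are not taken from the DM code.

```python
def chunk_by_size(snippets, max_bytes):
    """Group config snippets into batches of at most max_bytes each.

    Hypothetical helper: a single snippet larger than max_bytes is
    emitted as a batch of its own rather than being split.
    """
    batch, batch_size = [], 0
    for snippet in snippets:
        size = len(snippet.encode("utf-8"))
        # Close the current batch if adding this snippet would overflow it.
        if batch and batch_size + size > max_bytes:
            yield batch
            batch, batch_size = [], 0
        batch.append(snippet)
        batch_size += size
    if batch:
        yield batch
```

Each batch could then be pushed and committed separately, so that a failure affects only one batch instead of the whole blob.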

amit surana (asurana-t)
description: updated
Nischal Sheth (nsheth) wrote :

Should fix issues 1, 4 and 5 first so that we can get DM working reliably
for small to medium sized configs.

Nischal Sheth (nsheth) wrote :

Issue 2 is already tracked via a separate bug.
Issue 3 can be handled later as part of scale improvements.

tags: added: quench
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/14455
Submitter: Suresh Balineni (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/14455
Committed: http://github.org/Juniper/contrail-controller/commit/24fe008efeb62cad7147d4970e9f6cf01015eed2
Submitter: Zuul
Branch: R2.20

commit 24fe008efeb62cad7147d4970e9f6cf01015eed2
Author: sbalineni <email address hidden>
Date: Thu Oct 15 14:46:10 2015 -0700

DM: Periodic push of config on failure scenario

- DM tries to re-push the configuration periodically after a failure
- Default interval is set to 16 seconds, but can be provisioned
- Cleanup of some code

Change-Id: I5cc84844eebb18d27902ff69a4dbd4f4a7744fc5
Closes-Bug: #1469986

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/14622
Submitter: Suresh Balineni (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/14622
Committed: http://github.org/Juniper/contrail-controller/commit/b68d61caf88b68c5646836e8a00027fb1b4f5253
Submitter: Zuul
Branch: master

commit b68d61caf88b68c5646836e8a00027fb1b4f5253
Author: sbalineni <email address hidden>
Date: Fri Oct 23 10:42:04 2015 -0700

DM: Periodic push of config and control the push interval

- On commit failure, re-push the config at increasing intervals:
  start with a 15-second delay and double it until it reaches the
  configured maximum, which is 600 seconds, then continue to re-push
  at that interval. On successful commit, reset the delay to 15 seconds.

  The default start and maximum intervals are daemon config parameters.

- On successful commit, wait for some time before pushing the next commit.
  This is mainly to avoid sending too many commits in short intervals.

  The delay is computed from the previous commit size.
  Default value: 1 second of delay per 100kb.

  There is a daemon knob to disable this feature.

- Cleaned up some code

- Sometimes the BGP session parameter is seen with config in which
  all values are empty, for example:

    "session": [
        {
            "attributes": [
                {
                    "address_families": {
                        "family": []
                    },
                    "auth_data": null,
                    "bgp_router": null
                }
            ],
            "uuid": null
        }
    ]

  DM should check that a family value is actually present.
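
The family check described above could look roughly like the following. This is a sketch rather than the committed code: the function name has_address_families is hypothetical, the dictionary layout is taken from the sample in this commit message, and "inet-vpn" in the usage lines is only an illustrative value.

```python
def has_address_families(session):
    """Return True if any attribute entry of a BGP session lists a family.

    Hypothetical helper: tolerates None or missing keys at every level,
    as in the all-empty session shown in the commit message.
    """
    for attr in (session or {}).get("attributes") or []:
        families = (attr or {}).get("address_families") or {}
        if families.get("family"):
            return True
    return False


# The all-empty session from the commit message.
empty_session = {
    "attributes": [
        {
            "address_families": {"family": []},
            "auth_data": None,
            "bgp_router": None,
        }
    ],
    "uuid": None,
}
configured_session = {
    "attributes": [{"address_families": {"family": ["inet-vpn"]}}],
    "uuid": None,
}
```

With these inputs, has_address_families(empty_session) is False and has_address_families(configured_session) is True, so a caller could skip sessions that carry no real family value.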

Closes-Bug: #1469986
Closes-Bug: #1499187

Change-Id: I708648e02e6dfd8bbdcef0bdb708882245e64c7e
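
The retry and pacing behaviour described in this commit message can be sketched as follows. The class and parameter names are hypothetical and not taken from the DM source; only the numbers (15 seconds start, 600 seconds maximum, 1 second per 100kb) come from the commit message, and "kb" is assumed here to mean 1024 bytes.

```python
class PushScheduler:
    """Tracks how long to wait before the next config push (hypothetical)."""

    def __init__(self, start_interval=15, max_interval=600,
                 delay_per_kb=1.0 / 100, delay_enabled=True):
        self.start_interval = start_interval
        self.max_interval = max_interval
        self.delay_per_kb = delay_per_kb    # 1 second per 100 KB
        self.delay_enabled = delay_enabled  # stands in for the daemon knob
        self.retry_interval = start_interval

    def on_commit_failure(self):
        """Return the wait before the next retry, then double it up to the cap."""
        wait = self.retry_interval
        self.retry_interval = min(self.retry_interval * 2, self.max_interval)
        return wait

    def on_commit_success(self, commit_size_bytes):
        """Reset the retry interval and return the pause before the next commit."""
        self.retry_interval = self.start_interval
        if not self.delay_enabled:
            return 0.0
        # Assumption: "kb" is treated as 1024 bytes.
        return (commit_size_bytes / 1024.0) * self.delay_per_kb
```

A push loop would sleep for the value returned by on_commit_failure() before retrying, and for the value returned by on_commit_success() before sending the next commit. Repeated failures wait 15, 30, 60, 120, 240, 480 and then 600 seconds each time.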

Nischal Sheth (nsheth)
information type: Proprietary → Public