contrail-schema failing to start when policy count threshold crossed

Bug #1643846 reported by Adrian Smith
This bug affects 1 person
Affects: Juniper Openstack (status tracked in Trunk)

Series    Status         Importance  Assigned to      Milestone
R2.21.x   Fix Committed  High        Sahil Sabharwal
R3.0      Fix Committed  High        Sahil Sabharwal
R3.1      Fix Committed  High        Sahil Sabharwal
R3.2      Fix Committed  High        Sahil Sabharwal
Trunk     Fix Committed  High        Sahil Sabharwal

Bug Description

On November 9th contrail-schema failed to start in one of our clusters (contrail 2.21.3-47.el6). The problem is due to the number of network policies in the cluster.

To work around the issue we applied the following patch to /usr/lib/python2.6/site-packages/cfgm_common/zkclient.py.

def _zk_listener(self, state):
        if state == KazooState.CONNECTED:
            if self._election:
                self._election.cancel()
            # Update connection info
            self._sandesh_connection_info_update(status='UP', message='')
        elif state == KazooState.LOST:
            # Lost the session with ZooKeeper Server
            # Best of option we have is to exit the process and restart all
            # over again
            self._sandesh_connection_info_update(status='DOWN',
                                      message='Connection to Zookeeper lost')
            if self._lost_cb:
                self._lost_cb()
            else:
-                os._exit(2)
+                pass
        elif state == KazooState.SUSPENDED:
            # Update connection info
            self._sandesh_connection_info_update(status='INIT',
                message = 'Connection to zookeeper lost. Retrying')

This allows contrail-schema to start but, needless to say, it's a very short-term fix.

The problem can be recreated as follows:

1. Create 2 networks.

2. Create 2 policies with 100 rules each and attach to the networks created in step 1. (see note 1 below)

3. Stop and start the contrail-schema service. I started it interactively rather than using the service as it gave me a little more control.

  /usr/bin/python /usr/bin/contrail-schema --conf_file /etc/contrail/contrail-schema.conf --conf_file /etc/contrail/contrail-keystone-auth.conf

4. Note the number of rules returned from /access-control-lists (see note 2) and the time it took contrail-schema to start (if it did actually start).

5. Repeat these steps until contrail-schema fails to start.

In my testing contrail-schema failed to start when these numbers were reached:
- 263 virtual networks
- 717 network policies
- 98412 network policy rules as returned from /access-control-lists
- 2:40 elapsed before contrail-schema failed

*Note 1: Policy Details*
Each policy has 100 rules. Each rule looked like this:
 source: local
 source port: any
 dest: 10.0.0.$i/32
 dest port: any
 protocol: tcp
 direction: <>
 action: pass

$i runs from 1 to 100.
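
For anyone scripting the reproduction, the rule list described in Note 1 could be generated along these lines. This is only a sketch: the function name and the dictionary keys are made up for illustration and are not the field names used by the Contrail API, so the output would still have to be mapped onto whatever client creates the policies.

```python
def build_policy_rules(count=100):
    """Return `count` rule dicts shaped like the rule listed in Note 1.

    The keys are illustrative labels copied from the note, not Contrail
    API field names (that mapping is an assumption left to the reader).
    """
    rules = []
    for i in range(1, count + 1):  # $i runs from 1 to `count`, inclusive
        rules.append({
            "source": "local",
            "source_port": "any",
            "dest": "10.0.0.%d/32" % i,
            "dest_port": "any",
            "protocol": "tcp",
            "direction": "<>",
            "action": "pass",
        })
    return rules
```

Calling build_policy_rules() once per policy gives the 100 rules per policy used in step 2.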

*Note 2: Counting rules*
curl -s -o acls.json -H "X-Auth-Token: $TOKEN" "http://[CONTRAIL CONTROLLER IP]:9100/access-control-lists?count=False&detail=True"
grep -o rule_uuid acls.json | wc -l
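
The same count can be taken in Python once acls.json has been downloaded. The sketch below counts dictionary keys named rule_uuid anywhere in the parsed JSON; it assumes nothing about the layout of the response, and the helper names are hypothetical. Unlike grep -o, it ignores occurrences of the string inside values, so the two numbers could differ if rule_uuid ever appears as data rather than as a key.

```python
import json


def count_keys(node, key="rule_uuid"):
    """Recursively count how many times `key` appears as a dict key."""
    if isinstance(node, dict):
        return sum((1 if name == key else 0) + count_keys(value, key)
                   for name, value in node.items())
    if isinstance(node, list):
        return sum(count_keys(item, key) for item in node)
    return 0  # scalars contain no keys


def count_rules_in_file(path, key="rule_uuid"):
    """Load a JSON file (e.g. acls.json from the curl above) and count `key`."""
    with open(path) as handle:
        return count_keys(json.load(handle), key)
```

Usage would be count_rules_in_file("acls.json") after running the curl command.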

Jeba Paulaiyan (jebap)
tags: added: config
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/26678
Submitter: <email address hidden> (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/26679
Submitter: <email address hidden> (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/26679
Committed: http://github.org/Juniper/contrail-controller/commit/0ba5b5eb2408da2132289057ca0522de00ed6867
Submitter: Zuul (<email address hidden>)
Branch: master

commit 0ba5b5eb2408da2132289057ca0522de00ed6867
Author: Sahil Sabharwal <email address hidden>
Date: Thu Dec 1 14:23:13 2016 -0800

Added Zookeeper timeout config knob in ST

Config knob "zk_timeout" for Zookeeper timeout is added in Schema Transformer.
ZookeeperClient now takes an argument for zk_timeout, for which the default has been
set to 400.

Change-Id: Ia1a5569292f4957dba4b3f64aaee997d7db9f9da
Closes-Bug: 1643846
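
As a rough illustration of what the commit message describes (not the actual change, which is in the linked commit), a client wrapper that accepts a zk_timeout argument with a default of 400 and forwards it to the underlying ZooKeeper client might look like the sketch below. The class name and the client_factory parameter are hypothetical; the factory stands in for something like kazoo.client.KazooClient, whose constructor accepts a host string and a timeout keyword. The unit of 400 is not stated in the commit message, so none is assumed here.

```python
class ZookeeperClientSketch(object):
    """Hypothetical wrapper showing a timeout knob with a default value."""

    DEFAULT_ZK_TIMEOUT = 400  # default quoted in the commit message

    def __init__(self, server_list, client_factory,
                 zk_timeout=DEFAULT_ZK_TIMEOUT):
        # Keep the value so callers and logs can report what was used.
        self.zk_timeout = zk_timeout
        # client_factory is an assumption: any callable taking
        # (hosts, timeout=...), e.g. kazoo.client.KazooClient.
        self.client = client_factory(server_list, timeout=zk_timeout)
```

Injecting the factory keeps the sketch runnable without a ZooKeeper server; a caller who omits zk_timeout gets the default, and a deployment with many policies can pass a larger value.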

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/26678
Committed: http://github.org/Juniper/contrail-controller/commit/72328963ff1b1cb3047ed677e0a8b4921b9dfb1b
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit 72328963ff1b1cb3047ed677e0a8b4921b9dfb1b
Author: Sahil Sabharwal <email address hidden>
Date: Thu Dec 1 14:23:13 2016 -0800

Added Zookeeper timeout config knob in ST

Config knob "zk_timeout" for Zookeeper timeout is added in Schema Transformer.
ZookeeperClient now takes an argument for zk_timeout, for which the default has been
set to 400.

Change-Id: Ia1a5569292f4957dba4b3f64aaee997d7db9f9da
Closes-Bug: 1643846

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/26912
Submitter: <email address hidden> (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/26914
Submitter: <email address hidden> (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/26912
Committed: http://github.org/Juniper/contrail-controller/commit/e99fd6a24196c922f6e31b2e0c765cd63425155a
Submitter: Zuul (<email address hidden>)
Branch: R3.0

commit e99fd6a24196c922f6e31b2e0c765cd63425155a
Author: Sahil Sabharwal <email address hidden>
Date: Thu Dec 1 14:23:13 2016 -0800

Added Zookeeper timeout config knob in ST

Config knob "zk_timeout" for Zookeeper timeout is added in Schema Transformer.
ZookeeperClient now takes an argument for zk_timeout, for which the default has been
set to 400.

Change-Id: Ia1a5569292f4957dba4b3f64aaee997d7db9f9da
Closes-Bug: 1643846

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/26914
Committed: http://github.org/Juniper/contrail-controller/commit/793fffdc20da9ad50e0a190c7936995896bfd1f9
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit 793fffdc20da9ad50e0a190c7936995896bfd1f9
Author: Sahil Sabharwal <email address hidden>
Date: Thu Dec 1 14:23:13 2016 -0800

Added Zookeeper timeout config knob in ST

Config knob "zk_timeout" for Zookeeper timeout is added in Schema Transformer.
ZookeeperClient now takes an argument for zk_timeout, for which the default has been
set to 400.

Change-Id: Ia1a5569292f4957dba4b3f64aaee997d7db9f9da
Closes-Bug: 1643846

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/26968
Submitter: <email address hidden> (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/26968
Committed: http://github.org/Juniper/contrail-controller/commit/ee3b0f9d171b90a43556aca5a74fb2cc3554c789
Submitter: Zuul (<email address hidden>)
Branch: R2.21.x

commit ee3b0f9d171b90a43556aca5a74fb2cc3554c789
Author: Sahil Sabharwal <email address hidden>
Date: Thu Dec 1 14:23:13 2016 -0800

Added Zookeeper timeout config knob in ST

Config knob "zk_timeout" for Zookeeper timeout is added in Schema Transformer.
ZookeeperClient now takes an argument for zk_timeout, for which the default has been
set to 400.

Change-Id: Ia1a5569292f4957dba4b3f64aaee997d7db9f9da
Closes-Bug: 1643846
(cherry picked from commit 793fffdc20da9ad50e0a190c7936995896bfd1f9)

Fawad (fshaikh)
tags: added: wpc
Revision history for this message
Sachin Bansal (sbansal) wrote :

Also committed the following fix to add a knob in schema transformer config file to reduce the number of ACL rules if logical routers are disabled: https://review.opencontrail.org/27285

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/28353
Submitter: Sachin Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/28354
Submitter: Sachin Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/28355
Submitter: Sachin Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/28356
Submitter: Sachin Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/28357
Submitter: Sachin Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/28354
Committed: http://github.org/Juniper/contrail-controller/commit/f3adcbb0f5e6ca9b3a8f060602452a499a1b459b
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit f3adcbb0f5e6ca9b3a8f060602452a499a1b459b
Author: Sachin Bansal <email address hidden>
Date: Wed Feb 1 17:45:04 2017 -0800

zk_timeout option must be read as integer

Change-Id: I679d75da227248a81e6841851ed26ad1eb390a1e
Closes-Bug: 1643846
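
The class of problem this follow-up addresses can be shown with the standard library alone: values read from an INI-style file come back as strings unless an integer accessor is used. The sketch below uses Python 3's configparser; the DEFAULTS section name, the helper name, and the sample file contents are assumptions made for illustration and are not taken from the commit.

```python
import configparser


def read_zk_timeout(config_text, section="DEFAULTS", default=400):
    """Return (raw, typed) views of the zk_timeout option.

    raw   -- what a plain get() yields: a string, or None if absent
    typed -- what getint() yields: an int, or `default` if absent
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser.get(section, "zk_timeout", fallback=None)
    typed = parser.getint(section, "zk_timeout", fallback=default)
    return raw, typed
```

With a file containing zk_timeout = 60, the raw value is the string "60" while the typed value is the integer 60; passing the string on to code that expects a number is the kind of mistake an integer read avoids.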

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/28355
Committed: http://github.org/Juniper/contrail-controller/commit/3dec299db50d7052c3621b65b3d9e759a4582a30
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit 3dec299db50d7052c3621b65b3d9e759a4582a30
Author: Sachin Bansal <email address hidden>
Date: Wed Feb 1 17:45:04 2017 -0800

zk_timeout option must be read as integer

Change-Id: I679d75da227248a81e6841851ed26ad1eb390a1e
Closes-Bug: 1643846

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/28353
Committed: http://github.org/Juniper/contrail-controller/commit/6d18f14d97e3358a0defb75bd159db99e2196bdb
Submitter: Zuul (<email address hidden>)
Branch: master

commit 6d18f14d97e3358a0defb75bd159db99e2196bdb
Author: Sachin Bansal <email address hidden>
Date: Wed Feb 1 17:45:04 2017 -0800

zk_timeout option must be read as integer

Change-Id: I679d75da227248a81e6841851ed26ad1eb390a1e
Closes-Bug: 1643846

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/28357
Committed: http://github.org/Juniper/contrail-controller/commit/fd7d3c5cb2942d8968ef2ff186022e5e025ea29f
Submitter: Zuul (<email address hidden>)
Branch: R2.21.x

commit fd7d3c5cb2942d8968ef2ff186022e5e025ea29f
Author: Sachin Bansal <email address hidden>
Date: Wed Feb 1 17:48:44 2017 -0800

zk_timeout option must be read as integer

Change-Id: I679d75da227248a81e6841851ed26ad1eb390a1e
Closes-Bug: 1643846

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/28356
Committed: http://github.org/Juniper/contrail-controller/commit/5175b5de31bcc69965c49634399a2df2d2fbfe6d
Submitter: Zuul (<email address hidden>)
Branch: R3.0

commit 5175b5de31bcc69965c49634399a2df2d2fbfe6d
Author: Sachin Bansal <email address hidden>
Date: Wed Feb 1 17:45:04 2017 -0800

zk_timeout option must be read as integer

Change-Id: I679d75da227248a81e6841851ed26ad1eb390a1e
Closes-Bug: 1643846

information type: Proprietary → Public
Jim Reilly (jpreilly)
tags: added: att-aic-contrail