Alarmgen is unstable/slow in scale setup (80,000 VNs)

Bug #1596795 reported by Anish Mehta
This bug affects 2 people
Affects            Status         Importance  Assigned to  Milestone
Juniper Openstack  Status tracked in Trunk
  R3.0             Fix Committed  Medium      Anish Mehta
  R3.0.2.x         Fix Committed  Medium      Anish Mehta
  Trunk            Fix Committed  Medium      Anish Mehta

Bug Description

On the scale setup, the following issues were observed in Alarmgen:

1. After a restart, Alarmgen takes more than 12 minutes to reflect the states of Node UVEs.
2. During HA events, Alarmgen sometimes exits because it takes too long to give up partition ownership.

Anish Mehta (amehta00)
tags: added: scale
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/21480
Submitter: Anish Mehta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/21481
Submitter: Anish Mehta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/21482
Submitter: Anish Mehta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/21483
Submitter: Anish Mehta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/21482
Committed: http://github.org/Juniper/contrail-provisioning/commit/91f9905bed0303fc687ceee4a06ab7b2d7d1f9e5
Submitter: Zuul
Branch: R3.0

commit 91f9905bed0303fc687ceee4a06ab7b2d7d1f9e5
Author: Anish Mehta <email address hidden>
Date: Mon Jun 27 23:14:21 2016 -0700

Ensure that the number of partitions doesn't change across an upgrade.
Closes-Bug: 1596795

Change-Id: I929a99253b3dd6e3253a8a393054dc7a4abc5d09
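The intent of this provisioning change can be sketched roughly as follows. This is a minimal illustration, not the actual contrail-provisioning code: the function name `partitions_for_upgrade`, the `"partitions"` settings key, and the use of a plain dict for the existing configuration are all assumptions made for the example. The idea shown is only that a node which already has a partition count keeps it on upgrade, while a fresh install picks up the default.

```python
DEFAULT_PARTITIONS = 30  # assumed default, matching the "15 to 30" change in the Alarmgen fix


def partitions_for_upgrade(existing_settings, default=DEFAULT_PARTITIONS):
    """Return the partition count that provisioning should write out.

    `existing_settings` is a hypothetical dict of values read from the
    configuration already present on the node (empty on a fresh install).
    """
    value = existing_settings.get("partitions")
    if value is None:
        # Fresh install: nothing to preserve, so use the default.
        return default
    # Upgrade: keep whatever the deployment is already running with.
    return int(value)


fresh = partitions_for_upgrade({})
upgraded = partitions_for_upgrade({"partitions": "15"})
```

Under these assumptions, a cluster deployed with 15 partitions stays at 15 after the upgrade, and only new installs start with 30.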

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/21483
Committed: http://github.org/Juniper/contrail-provisioning/commit/9653b07e60cf95c21715dd2e6147f6b652630473
Submitter: Zuul
Branch: master

commit 9653b07e60cf95c21715dd2e6147f6b652630473
Author: Anish Mehta <email address hidden>
Date: Mon Jun 27 23:14:21 2016 -0700

Ensure that the number of partitions doesn't change across an upgrade.
Closes-Bug: 1596795

Change-Id: I929a99253b3dd6e3253a8a393054dc7a4abc5d09

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0.2.x

Review in progress for https://review.opencontrail.org/21611
Submitter: Anish Mehta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/21611
Committed: http://github.org/Juniper/contrail-provisioning/commit/c4b8b2dbb44ce7ce7c17500e32d842c4c130f3c0
Submitter: Zuul
Branch: R3.0.2.x

commit c4b8b2dbb44ce7ce7c17500e32d842c4c130f3c0
Author: Anish Mehta <email address hidden>
Date: Mon Jun 27 23:14:21 2016 -0700

Ensure that the number of partitions doesn't change across an upgrade.
Closes-Bug: 1596795

Change-Id: I929a99253b3dd6e3253a8a393054dc7a4abc5d09

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0.2.x

Review in progress for https://review.opencontrail.org/21623
Submitter: Anish Mehta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/21480
Committed: http://github.org/Juniper/contrail-controller/commit/02e4b3b53332dafb8c53c402905c6849337e3f87
Submitter: Zuul
Branch: master

commit 02e4b3b53332dafb8c53c402905c6849337e3f87
Author: Anish Mehta <email address hidden>
Date: Mon Jun 27 22:33:57 2016 -0700

Scale fixes in Alarmgen.
- Timeout for releasing partition ownership is extended from 60s to 120s
- Took out printing of UVE Keys in log messages - these lists can get too big
- We only process up to 200 UVE updates at a time. Unlimited processing can starve other partitions
- Changed log levels of some libpartition logs
- Fixed discovery client ID of alarmgen to accommodate multiple alarmgens per analytics node
- Increased partitions from 15 to 30
Closes-Bug: 1596795

Change-Id: I32b57c3aba9c2760748da684cd3dff63e55121fd
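The 200-update cap described in this commit message can be illustrated with a small sketch. It is not the Alarmgen implementation: the per-partition `deque` queues, the round-robin loop, and the name `drain_round_robin` are assumptions chosen to show why a bounded batch keeps one busy partition from starving the others.

```python
from collections import deque

MAX_UPDATES_PER_PASS = 200  # limit taken from the commit message


def drain_round_robin(queues, limit=MAX_UPDATES_PER_PASS):
    """Yield (partition, batch) pairs until every queue is empty.

    `queues` is a hypothetical mapping of partition number to a deque of
    pending UVE updates. Each partition contributes at most `limit`
    updates per pass before the next partition gets a turn.
    """
    while any(queues.values()):
        for partition, pending in queues.items():
            take = min(limit, len(pending))
            batch = [pending.popleft() for _ in range(take)]
            if batch:
                yield partition, batch


# Partition 0 has a large backlog; partition 1 has only a few updates.
queues = {0: deque(range(450)), 1: deque(range(3))}
order = [(p, len(b)) for p, b in drain_round_robin(queues)]
```

In this example partition 1 is served right after the first 200 updates of partition 0, instead of waiting behind all 450.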

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/21481
Committed: http://github.org/Juniper/contrail-controller/commit/fc29689147db7fff31b4d23d35c91e1aee88e076
Submitter: Zuul
Branch: R3.0

commit fc29689147db7fff31b4d23d35c91e1aee88e076
Author: Anish Mehta <email address hidden>
Date: Mon Jun 27 22:33:57 2016 -0700

Scale fixes in Alarmgen.
- Timeout for releasing partition ownership is extended from 60s to 120s
- Took out printing of UVE Keys in log messages - these lists can get too big
- We only process up to 200 UVE updates at a time. Unlimited processing can starve other partitions
- Changed log levels of some libpartition logs
- Fixed discovery client ID of alarmgen to accommodate multiple alarmgens per analytics node
- Increased partitions from 15 to 30
Closes-Bug: 1596795
Change-Id: Id4a8f9a7c411e8e1a9c223afd7be3455abfff855

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/21623
Committed: http://github.org/Juniper/contrail-controller/commit/377a688c9e10186facfe270bc10502a03061ded9
Submitter: Zuul
Branch: R3.0.2.x

commit 377a688c9e10186facfe270bc10502a03061ded9
Author: Anish Mehta <email address hidden>
Date: Mon Jun 27 22:33:57 2016 -0700

Scale fixes in Alarmgen.
- Timeout for releasing partition ownership is extended from 60s to 120s
- Took out printing of UVE Keys in log messages - these lists can get too big
- We only process up to 200 UVE updates at a time. Unlimited processing can starve other partitions
- Changed log levels of some libpartition logs
- Fixed discovery client ID of alarmgen to accommodate multiple alarmgens per analytics node
- Increased partitions from 15 to 30
Closes-Bug: 1596795

Change-Id: I32b57c3aba9c2760748da684cd3dff63e55121fd

Jim Reilly (jpreilly)
information type: Proprietary → Public
tags: added: att-aic-contrail
information type: Public → Private
information type: Private → Public