[3.2.0.0-8~mitaka ] contrail-alarm-gen failed in scale setup

Bug #1648338 reported by chhandak
This bug affects 1 person
Affects: Juniper Openstack (status tracked in Trunk)

Series     Status         Importance  Assigned to
R3.0       Fix Committed  High        Anish Mehta
R3.0.2.x   Fix Committed  High        Anish Mehta
R3.0.3.x   Fix Committed  High        Anish Mehta
R3.1       Fix Committed  High        Anish Mehta
R3.2       Fix Committed  High        Anish Mehta
Trunk      Fix Committed  High        Anish Mehta

Bug Description

In a scale setup, contrail-alarm-gen is in the failed state and contrail-analytics-api is stuck in the initializing state on both collector nodes.

== Contrail Analytics ==
supervisor-analytics: active
contrail-alarm-gen:0 failed
contrail-analytics-api initializing (UvePartitions:UVE-Aggregation[Partitions:0] connection down)
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active
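On a scale setup with many nodes, it helps to pick the unhealthy services out of a status block like the one above mechanically. The following is a minimal sketch, assuming each service line has the shape shown here (service name, state, optional parenthesized detail). The helper `unhealthy_services` is hypothetical and not part of Contrail.

```python
import re

# Assumed line shape, taken from the status block above:
#   <service>[:] <state> [(<detail>)]
# e.g. "contrail-alarm-gen:0 failed" or
#      "contrail-analytics-api initializing (UvePartitions:... connection down)"
LINE_RE = re.compile(
    r"^(?P<name>\S+?):?\s+(?P<state>\w+)(?:\s+\((?P<detail>.*)\))?\s*$"
)


def unhealthy_services(status_text):
    """Return {service: (state, detail)} for every listed service that is not 'active'."""
    result = {}
    for raw in status_text.splitlines():
        line = raw.strip()
        # Skip blank lines and section headers such as "== Contrail Analytics ==".
        if not line or line.startswith("=="):
            continue
        m = LINE_RE.match(line)
        if m and m.group("state") != "active":
            result[m.group("name")] = (m.group("state"), m.group("detail"))
    return result


STATUS = """\
== Contrail Analytics ==
supervisor-analytics: active
contrail-alarm-gen:0 failed
contrail-analytics-api initializing (UvePartitions:UVE-Aggregation[Partitions:0] connection down)
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active
"""

print(unhealthy_services(STATUS))
```

On this block it reports only contrail-alarm-gen:0 (failed, no detail) and contrail-analytics-api (initializing, with the UvePartitions detail).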

Alarm Gen Traceback
----------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 375, in _notify_links
    link(self)
  File "/usr/lib/python2.7/dist-packages/gevent/threading.py", line 22, in _cleanup
    __threading__._active.pop(id(g))
KeyError: 140509057985424
(<function _cleanup at 0x7fcad3bb2e60>, <ServicePoller at 0x7fcad07d2b90>) failed with KeyError

# Fabric testbed definition for this setup
from fabric.api import env

host1 = 'root@10.87.121.68'
host2 = 'root@10.87.121.69'
host3 = 'root@10.87.121.70'
host4 = 'root@10.87.121.71'
host5 = 'root@10.87.121.72'
host6 = 'root@10.87.121.73'
host7 = 'root@10.87.121.74'
host8 = 'root@10.87.121.75'
host9 = 'root@10.87.121.76'
host10 = 'root@10.87.121.77'

# host_build is defined elsewhere in the testbed file (not included in this report).
env.roledefs = {
    'all': [host1, host2, host3, host4, host5, host6, host7, host8, host9, host10],
    'cfgm': [host1, host2, host3],
    'openstack': [host1, host2, host3],
    'webui': [host1],
    'control': [host2, host3],
    'compute': [host4, host5, host6, host7, host8, host9, host10],
    'tsn': [host4, host5, host6, host7],
    'toragent': [host4, host5, host6, host7],
    'collector': [host2, host3],
    'database': [host1, host2, host3],
    'build': [host_build],
}
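The description says both collector nodes were affected. Inverting the role map shows which hosts those are and what else they run. In this testbed the collectors (10.87.121.69 and .70) are also config, control, database and OpenStack nodes. The following is a minimal sketch in plain Python without Fabric. The host list is rebuilt from the IP range above, and `roles_by_host` is a hypothetical helper.

```python
from collections import defaultdict

# Re-expression of the testbed above without Fabric: host1..host10 are
# root@10.87.121.68 .. root@10.87.121.77, so hosts[0] is host1.
hosts = [f"root@10.87.121.{last}" for last in range(68, 78)]

roledefs = {
    "cfgm": hosts[0:3],
    "openstack": hosts[0:3],
    "webui": hosts[0:1],
    "control": hosts[1:3],
    "compute": hosts[3:10],
    "tsn": hosts[3:7],
    "toragent": hosts[3:7],
    "collector": hosts[1:3],
    "database": hosts[0:3],
    # 'all' and 'build' are omitted: 'all' adds no role information and
    # host_build is not given in the report.
}


def roles_by_host(roledefs):
    """Invert {role: [hosts]} into {host: sorted [roles]}."""
    by_host = defaultdict(set)
    for role, members in roledefs.items():
        for host in members:
            by_host[host].add(role)
    return {host: sorted(roles) for host, roles in by_host.items()}


for host in roledefs["collector"]:
    print(host, roles_by_host(roledefs)[host])
```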

Tags: analytics
chhandak (chhandak) wrote :

Logs copied in /auto/cores/1648338

information type: Proprietary → Public
Anish Mehta (amehta00) wrote :

From: Anish Mehta <email address hidden>
Date: Thursday, December 8, 2016 at 11:41 AM
To: Chhandak Mukherjee <email address hidden>, Raj Reddy <email address hidden>
Subject: Re: [Bug 1648338] [NEW] [3.2.0.0-8~mitaka ] contrail-alarm-gen failed in scale setup

I have recovered the system by restarting alarmgens, and transferred the logs.
You can use the system now.

Alarmgen saw an unexpected sequence of UVE structure deletes of some XMPP peer objects, and exited.
I’ll investigate.

- Anish
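Anish's explanation describes a consistency check. When alarmgen sees a sequence of UVE structure deletes it does not expect, it treats the condition as fatal and exits. The report does not show how alarmgen tracks this state. The following is therefore a hypothetical model, assuming "unexpected" means a delete for a structure that is not currently present. `UveStructCache`, `process`, and the event tuples are invented for illustration. The sketch only shows the pattern of turning such an inconsistency into a non-zero exit status.

```python
import sys


class UveStructCache:
    """Hypothetical cache of which structures each UVE key currently has.

    Illustrative model only, not alarmgen's real data structure.
    """

    def __init__(self):
        self.structs = {}  # UVE key -> set of structure names

    def update(self, key, struct):
        self.structs.setdefault(key, set()).add(struct)

    def delete(self, key, struct):
        present = self.structs.get(key)
        if not present or struct not in present:
            # Out-of-order or repeated delete: cached state can no longer be trusted.
            raise RuntimeError(f"unexpected delete of {struct!r} for {key!r}")
        present.discard(struct)
        if not present:
            del self.structs[key]


def process(events, cache):
    """Apply (op, key, struct) events in order; return a process exit status."""
    try:
        for op, key, struct in events:
            if op == "update":
                cache.update(key, struct)
            elif op == "delete":
                cache.delete(key, struct)
            else:
                raise RuntimeError(f"unknown op {op!r}")
    except RuntimeError as exc:
        print(f"fatal: {exc}", file=sys.stderr)
        return 1  # non-zero: the failure is visible to a process supervisor
    return 0
```

A service entry point built this way would end with `sys.exit(process(events, UveStructCache()))`. The inconsistency then surfaces as a non-zero exit status rather than a silent stop.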

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/27238
Submitter: Anish Mehta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/27239
Submitter: Anish Mehta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/27239
Committed: http://github.org/Juniper/contrail-controller/commit/3bb348e925189c8fb66a7f788f6a8a17c85f4909
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit 3bb348e925189c8fb66a7f788f6a8a17c85f4909
Author: Anish Mehta <email address hidden>
Date: Tue Dec 13 22:29:46 2016 -0800

Exiting alarmgen with errorcode, so it will automatically restart.
Closes-Bug:1648338

Change-Id: I1d7bf76271bc53625a396d214684f87e1062ca5e
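The status block lists `supervisor-analytics`, so the analytics daemons appear to run under supervisord. Under supervisord's default `autorestart=unexpected`, a program that exits on its own is restarted only if its exit code is not listed in `exitcodes`. The default for `exitcodes` is `0` in supervisor 4.x and `0,2` in 3.x. That is presumably why exiting with an error code lets alarmgen restart automatically.

The following is a minimal model of that decision. It assumes the process had already reached the RUNNING state; an exit within `startsecs` goes through the separate `startretries`/BACKOFF path instead. The actual alarmgen program section is not shown in the report.

```python
def supervisor_would_restart(exit_code, autorestart="unexpected", exitcodes=(0,)):
    """Model supervisord's autorestart decision for a program that exited on
    its own after reaching the RUNNING state.

    autorestart takes the config values "false", "unexpected" or "true";
    exitcodes lists the exit codes supervisord treats as expected.
    """
    if autorestart == "true":
        return True
    if autorestart == "false":
        return False
    if autorestart == "unexpected":
        # Restart only when the exit code is not one of the expected codes.
        return exit_code not in exitcodes
    raise ValueError(f"invalid autorestart value: {autorestart!r}")
```

For example, `supervisor_would_restart(0)` is `False`, while `supervisor_would_restart(1)` is `True`.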

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/27238
Committed: http://github.org/Juniper/contrail-controller/commit/5c328ce0f146eda7aabbf9f38d0ea51e02d8ea65
Submitter: Zuul (<email address hidden>)
Branch: master

commit 5c328ce0f146eda7aabbf9f38d0ea51e02d8ea65
Author: Anish Mehta <email address hidden>
Date: Tue Dec 13 22:29:46 2016 -0800

Exiting alarmgen with errorcode, so it will automatically restart.
Closes-Bug:1648338

Change-Id: I1d7bf76271bc53625a396d214684f87e1062ca5e

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/28202
Submitter: Anish Mehta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/28203
Submitter: Anish Mehta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0.2.x

Review in progress for https://review.opencontrail.org/28204
Submitter: Anish Mehta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0.3.x

Review in progress for https://review.opencontrail.org/28205
Submitter: Anish Mehta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/28205
Committed: http://github.org/Juniper/contrail-controller/commit/6f270be131287cbc401daa5c67598b7bed71acd6
Submitter: Zuul (<email address hidden>)
Branch: R3.0.3.x

commit 6f270be131287cbc401daa5c67598b7bed71acd6
Author: Anish Mehta <email address hidden>
Date: Tue Dec 13 22:29:46 2016 -0800

Exiting alarmgen with errorcode, so it will automatically restart.
Closes-Bug:1648338

Change-Id: I1d7bf76271bc53625a396d214684f87e1062ca5e

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/28202
Committed: http://github.org/Juniper/contrail-controller/commit/1029c1dbdfcf523bc11b1f5a424828c8848b8ba4
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit 1029c1dbdfcf523bc11b1f5a424828c8848b8ba4
Author: Anish Mehta <email address hidden>
Date: Tue Dec 13 22:29:46 2016 -0800

Exiting alarmgen with errorcode, so it will automatically restart.
Closes-Bug:1648338

Change-Id: I1d7bf76271bc53625a396d214684f87e1062ca5e

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/28203
Committed: http://github.org/Juniper/contrail-controller/commit/c835a851728ad24df92da125328835061d38e6bc
Submitter: Zuul (<email address hidden>)
Branch: R3.0

commit c835a851728ad24df92da125328835061d38e6bc
Author: Anish Mehta <email address hidden>
Date: Tue Dec 13 22:29:46 2016 -0800

Exiting alarmgen with errorcode, so it will automatically restart.
Closes-Bug:1648338

Change-Id: I1d7bf76271bc53625a396d214684f87e1062ca5e

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/28204
Committed: http://github.org/Juniper/contrail-controller/commit/4182b2780ab32cef1b7134c6ceed4714459b8185
Submitter: Zuul (<email address hidden>)
Branch: R3.0.2.x

commit 4182b2780ab32cef1b7134c6ceed4714459b8185
Author: Anish Mehta <email address hidden>
Date: Tue Dec 13 22:29:46 2016 -0800

Exiting alarmgen with errorcode, so it will automatically restart.
Closes-Bug:1648338

Change-Id: I1d7bf76271bc53625a396d214684f87e1062ca5e
