cc13-OSP13:rhel-queens-5.0-145 analytics-alarm gen service timeout

Bug #1782291 reported by shajuvk on 2018-07-18
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Juniper Openstack | Status tracked in Trunk | | |
R5.0 | Fix Committed | High | alexey-mr |
Trunk | Fix Committed | High | alexey-mr |

Bug Description

== Contrail analytics ==
snmp-collector: active
query-engine: active
api: active
alarm-gen: timeout
nodemgr: active
collector: active
topology: active

logs:
==
[root@overcloudey3-ca-0 heat-admin]# sudo docker logs -f 47eb3ae0350b
07/17/2018 07:15:50 PM [contrail-alarm-gen] [INFO]: SANDESH: CONNECT TO COLLECTOR: True
07/17/2018 07:15:50 PM [contrail-alarm-gen] [ERROR]: Failed to import package "sandesh"
07/17/2018 07:15:50 PM [contrail-alarm-gen] [ERROR]: Failed to import package "sandesh"
07/17/2018 07:15:50 PM [contrail-alarm-gen] [INFO]: SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/gevent/greenlet.py", line 375, in _notify_links
    link(self)
  File "/usr/lib64/python2.7/site-packages/gevent/threading.py", line 22, in _cleanup
    __threading__._active.pop(id(g))
KeyError: 140701627214896
(<function _cleanup at 0x7ff7acbedf50>, <AnalyticsDiscovery at 0x7ff7a6826c30>) failed with KeyError

===

07/17/2018 09:47:11 PM [kafka.producer.sender] [ERROR]: Uncaught error in kafka producer I/O thread
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/kafka/producer/sender.py", line 60, in run
    self.run_once()
  File "/usr/lib/python2.7/site-packages/kafka/producer/sender.py", line 159, in run_once
    self._client.poll(poll_timeout_ms, sleep=True)
  File "/usr/lib/python2.7/site-packages/kafka/client_async.py", line 509, in poll
    responses.extend(self._poll(timeout, sleep=sleep))
  File "/usr/lib/python2.7/site-packages/kafka/client_async.py", line 526, in _poll
    ready = self._selector.select(timeout)
  File "/usr/lib/python2.7/site-packages/kafka/vendor/selectors34.py", line 340, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
TypeError: select() takes at most 4 arguments (5 given)
07/17/2018 09:47:11 PM [kafka.producer.sender] [ERROR]: Uncaught error in kafka producer I/O thread

==

shajuvk (shajuvk) on 2018-07-18
information type: Proprietary → Public
shajuvk (shajuvk) on 2018-07-18
summary: - cc13-OSP13:rhel-queens-5.0-145 analytics-alarm gen serviced failed
+ cc13-OSP13:rhel-queens-5.0-145 analytics-alarm gen service timeout
Jack Jonnalagadda (jackjvs) wrote :

(analytics-alarm-gen)[root@overcloudey3-ca-0 /root]$ tail -f /var/log/contrail/contrail-alarm-gen.log

07/19/2018 12:09:29 PM [kafka.producer.sender] [ERROR]: Uncaught error in kafka producer I/O thread
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/kafka/producer/sender.py", line 60, in run
    self.run_once()
  File "/usr/lib/python2.7/site-packages/kafka/producer/sender.py", line 159, in run_once
    self._client.poll(poll_timeout_ms, sleep=True)
  File "/usr/lib/python2.7/site-packages/kafka/client_async.py", line 509, in poll
    responses.extend(self._poll(timeout, sleep=sleep))
  File "/usr/lib/python2.7/site-packages/kafka/client_async.py", line 526, in _poll
    ready = self._selector.select(timeout)
  File "/usr/lib/python2.7/site-packages/kafka/vendor/selectors34.py", line 340, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
TypeError: select() takes at most 4 arguments (5 given)
07/19/2018 12:09:29 PM [kafka.producer.sender] [ERROR]: Uncaught error in kafka producer I/O thread
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/kafka/producer/sender.py", line 60, in run
    self.run_once()
  File "/usr/lib/python2.7/site-packages/kafka/producer/sender.py", line 159, in run_once
    self._client.poll(poll_timeout_ms, sleep=True)
  File "/usr/lib/python2.7/site-packages/kafka/client_async.py", line 509, in poll
    responses.extend(self._poll(timeout, sleep=sleep))
  File "/usr/lib/python2.7/site-packages/kafka/client_async.py", line 526, in _poll
    ready = self._selector.select(timeout)
  File "/usr/lib/python2.7/site-packages/kafka/vendor/selectors34.py", line 340, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
TypeError: select() takes at most 4 arguments (5 given)

Jack Jonnalagadda (jackjvs) wrote :

[root@overcloudey3-ca-0 heat-admin]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f7622b5b46c3 ci-repo.englab.juniper.net:5010/contrail-analytics-collector:rhel-queens-5.0-145 "/entrypoint.sh /u..." 11 hours ago Up 11 hours contrail_analytics_collector
9b20c9fa768f ci-repo.englab.juniper.net:5010/contrail-analytics-snmp-collector:rhel-queens-5.0-145 "/entrypoint.sh /u..." 11 hours ago Up 11 hours contrail_analytics_snmp_collector
6d4a908267a9 ci-repo.englab.juniper.net:5010/contrail-analytics-api:rhel-queens-5.0-145 "/entrypoint.sh /u..." 11 hours ago Up 11 hours contrail_anlytics_api
53f6ed1e4059 ci-repo.englab.juniper.net:5010/contrail-nodemgr:rhel-queens-5.0-145 "/entrypoint.sh /b..." 11 hours ago Up 11 hours contrail_analytics_nodemgr
cde85e305a03 ci-repo.englab.juniper.net:5010/contrail-analytics-topology:rhel-queens-5.0-145 "/entrypoint.sh /u..." 11 hours ago Up 11 hours contrail_analytics_topology
91d410f1980b ci-repo.englab.juniper.net:5010/contrail-analytics-alarm-gen:rhel-queens-5.0-145 "/entrypoint.sh /u..." 11 hours ago Up 11 hours contrail_analytics_alarmgen
dee1d3e1be93 ci-repo.englab.juniper.net:5010/contrail-analytics-query-engine:rhel-queens-5.0-145 "/entrypoint.sh /u..." 11 hours ago Up 11 hours contrail_analytics_queryengine
938ab93ecba1 192.0.2.1:8787/rhosp13/openstack-cron:13.0-43 "kolla_start" 11 hours ago Up 11 hours logrotate_crond
aead33803706 docker.io/redis "docker-entrypoint..." 11 hours ago Up 11 hours contrail_redis
[root@overcloudey3-ca-0 heat-admin]#
[root@overcloudey3-ca-0 heat-admin]#
[root@overcloudey3-ca-0 heat-admin]# sudo docker logs -f 91d410f1980b
07/19/2018 12:35:21 AM [contrail-alarm-gen] [INFO]: SANDESH: CONNECT TO COLLECTOR: True
07/19/2018 12:35:21 AM [contrail-alarm-gen] [ERROR]: Failed to import package "sandesh"
07/19/2018 12:35:21 AM [contrail-alarm-gen] [ERROR]: Failed to import package "sandesh"
07/19/2018 12:35:21 AM [contrail-alarm-gen] [INFO]: SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/gevent/greenlet.py", line 375, in _notify_links
    link(self)
  File "/usr/lib64/python2.7/site-packages/gevent/threading.py", line 22, in _cleanup
    __threading__._active.pop(id(g))
KeyError: 139984610413616
(<function _cleanup at 0x7f50bb36bf50>, <AnalyticsDiscovery at 0x7f50b4fa4c30>) failed with KeyError
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/gevent...

Jack Jonnalagadda (jackjvs) wrote :

(analytics-alarm-gen)[root@overcloudey3-ca-0 /]$ python
Python 2.7.5 (default, May 31 2018, 09:41:32)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pip
>>> installed_packages = pip.get_installed_distributions()
>>> installed_packages_list = sorted(["%s==%s" % (i.key, i.version)
... for i in installed_packages])
>>> print(installed_packages_list)
['amqp==2.1.4', 'anyjson==0.3.3', 'asn1crypto==0.23.0', 'backports.ssl-match-hostname==3.5.0.1', 'bitarray==0.8.1', 'blist==1.3.6', 'bottle==0.12.13', 'cassandra-driver==3.9.0', 'cffi==1.11.2', 'cfgm-common==0.1.dev0',
 'chardet==2.2.1', 'consistent-hash==1.0', 'contrail-api-client==5.0.1', 'contrail-snmp-collector==0.2.0', 'contrail-topology==0.1.0', 'contrailanalyticscli==0.1', 'contrailcli==0.1', 'contrailprovisioning==0.1.dev0',
 'cryptography==2.1.4', 'decorator==3.4.0', 'enum34==1.0.4', 'ethtool==0.8', 'futures==3.1.1', 'gevent==1.0', 'geventhttpclient==1.0a0', 'greenlet==0.4.12', 'idna==2.5', 'iniparse==0.4', 'ipaddress==1.0.16', 'ipy==0.75',
 'kafka-python==1.3.1', 'kazoo==2.2.1', 'kitchen==1.1.1', 'kombu==4.0.2', 'libpartition==0.1.dev0', 'lxml==3.2.1', 'netaddr==0.7.19', 'netifaces==0.10.4', 'netsnmp-python==1.0a1', 'opserver==0.1.dev0', 'pbr==3.1.1', 'pip==8.1.2', 'ply==3.4',
 'policycoreutils-default-encoding==0.1', 'prettytable==0.7.2', 'psutil==5.2.2', 'pycassa==1.10.0', 'pycparser==2.14', 'pycurl==7.19.0', 'pygobject==3.22.0', 'pygpgme==0.3', 'pyinotify==0.9.4', 'pyliblzma==0.5.3',
 'pyopenssl==17.3.0', 'pysocks==1.5.6', 'python-dateutil==2.6.1', 'python-dmidecode==3.10.13', 'pyxattr==0.5.1', 'redis==2.10.3', 'requests==2.14.2', 'sandesh-common==0.1.dev0', 'sandesh==0.1.dev0', 'scales==1.0.5', 'seobject==0.1',
 'sepolicy==1.1', 'setuptools==0.9.8', 'simplejson==3.5.3', 'six==1.10.0', 'sqlalchemy==1.2.2', 'sseclient==0.0.11', 'stevedore==1.28.0', 'subscription-manager==1.20.11', 'thrift==0.9.1', 'urlgrabber==3.10', 'urllib3==1.21.1',
 'vine==1.1.3', 'xmltodict==0.7.0', 'yum-metadata-parser==1.1.4', 'zope.interface==4.0.5']
>>>

Jack Jonnalagadda (jackjvs) wrote :

root@a5s2:~# python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pip
>>> installed_packages = pip.get_installed_distributions()
>>> installed_packages_list = sorted(["%s==%s" % (i.key, i.version)
... for i in installed_packages])
>>> print(installed_packages_list)
['boto==2.12.0', 'contrail-fabric-utils==0.1.dev0', 'contrailprovisioning==0.1.dev0', 'euca2ools==0.0.0', 'fabric==1.7.5', 'gevent==1.1.0', 'greenlet==0.4.14', 'servermanagercli==4.1.2.0.132']
>>>
root@a5s2:~#
root@a5s2:~#
root@a5s2:~# pip install gevent==1.1
Requirement already satisfied (use --upgrade to upgrade): gevent==1.1 in /usr/local/lib/python2.7/dist-packages
Requirement already satisfied (use --upgrade to upgrade): greenlet>=0.4.9 in /usr/local/lib/python2.7/dist-packages (from gevent==1.1)
Cleaning up...
root@a5s2:~# ls /usr/local/lib/python2.7/dist-packages
boto contrail_fabric_utils-0.1dev.egg-info euca2ools fabric gevent-1.1.0.egg-info servermanagercli-4.1.2.0.132.egg-info
boto-2.12.0.egg-info contrail_provisioning euca2ools-0.0.0.egg-info Fabric-1.7.5.egg-info greenlet-0.4.14.egg-info smgrcliapp
contrail_fabric_utils ContrailProvisioning-0.1dev.egg-info fabfile gevent greenlet.so
root@a5s2:~#
root@a5s2:~#
root@a5s2:~#
root@a5s2:~# python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pip
>>> installed_packages = pip.get_installed_distributions()
>>> installed_packages_list = sorted(["%s==%s" % (i.key, i.version)
... for i in installed_packages])
>>> print(installed_packages_list)
['boto==2.12.0', 'contrail-fabric-utils==0.1.dev0', 'contrailprovisioning==0.1.dev0', 'euca2ools==0.0.0', 'fabric==1.7.5', 'gevent==1.1.0', 'greenlet==0.4.14', 'servermanagercli==4.1.2.0.132']

Jack Jonnalagadda (jackjvs) wrote :

On a working system, gevent 1.1.2 is used:
(analytics-alarm-gen)[root@a1s27 /]$ pip freeze | grep geven
You are using pip version 8.1.2, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
gevent==1.1.2
geventhttpclient==1.0a0
(analytics-alarm-gen)[root@a1s27 /]$ pip freeze | grep kaf
You are using pip version 8.1.2, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
kafka-python==1.0.2
(analytics-alarm-gen)[root@a1s27 /]$

(pip was installed with "yum install python-pip"; the listing above is the output of pip freeze.)

Jack Jonnalagadda (jackjvs) wrote :

On this deployment:
'gevent==1.0', 'geventhttpclient==1.0a0', 'greenlet==0.4.12'
'kafka-python==1.3.1'
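
The comparison between the working and the failing deployment can be sketched as a small script. This is only an illustration: the helper names parse_freeze and version_differences are hypothetical, and the version strings are the ones quoted in this report (kafka-python 1.3.1 comes from the full package listing earlier in the thread).

```python
def parse_freeze(lines):
    """Map package name -> version from 'name==version' lines.

    Lines without '==' (for example pip's upgrade notices) are skipped.
    """
    packages = {}
    for line in lines:
        name, sep, version = line.strip().partition("==")
        if sep:
            packages[name.lower()] = version
    return packages


def version_differences(working, failing):
    """Return {name: (working_version, failing_version)} for packages
    that are present in both environments with different versions."""
    return {
        name: (working[name], failing[name])
        for name in sorted(set(working) & set(failing))
        if working[name] != failing[name]
    }


# Versions quoted in this report.
working = parse_freeze([
    "gevent==1.1.2",
    "geventhttpclient==1.0a0",
    "kafka-python==1.0.2",
])
failing = parse_freeze([
    "gevent==1.0",
    "geventhttpclient==1.0a0",
    "greenlet==0.4.12",
    "kafka-python==1.3.1",
])

for name, (good, bad) in version_differences(working, failing).items():
    print("%s: working=%s failing=%s" % (name, good, bad))
```

With these inputs the packages that differ are gevent and kafka-python; geventhttpclient is identical on both systems.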

Jack Jonnalagadda (jackjvs) wrote :

The alarm-gen issue on this setup requires python-gevent 1.1 or higher. However, the gevent version on the system appears to be 1.0 (see the package listing included in this report). On one of the working Red Hat deployments [Linux version 3.10.0-862.6.3.el7.x86_64 (<email address hidden>) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP Tue Jun 26 16:32:21 UTC 2018], the Python gevent version in use seems to be 1.2.

Locally, upgrading the gevent package in the container and restarting it did not work (the container is stuck in the restarting state).

Depending on how the required libraries are packaged, the install should ensure that it uses gevent 1.1 or higher.
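
A minimal sketch of such a check, assuming the version is available as a plain string (for example from pip freeze). The helper names version_tuple and gevent_is_new_enough are hypothetical and not part of any Contrail tooling.

```python
import re


def version_tuple(version):
    """Return the leading numeric components of a version string as ints.

    Suffixes such as 'rc5' or 'a0' are ignored, so '1.1rc5' becomes (1, 1).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def gevent_is_new_enough(version, minimum=(1, 1)):
    """True if the numeric part of `version` is at least `minimum`."""
    return version_tuple(version) >= minimum


# Versions mentioned in this report.
for candidate in ("1.0", "1.1.2", "1.2"):
    print(candidate, gevent_is_new_enough(candidate))
```

Ignoring pre-release suffixes is a simplification: a release candidate such as 1.1rc5 is treated here as 1.1, although strictly it precedes the final 1.1 release.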

The alarm-gen error on this setup matches the following kafka-python issue:
https://github.com/dpkp/kafka-python/issues/702
Resolution:
https://pypi.org/project/selectors34/ (Note that this is no longer an issue with the stdlib selectors module on Gevent 1.1 and later.)
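
One plausible reading of the TypeError, which the thread itself does not spell out: the selector class keeps select() as a class attribute. The C builtin is not bound when accessed through an instance, but a replacement written in Python is, so the instance is passed as an extra first argument and the call ends up with five arguments instead of four. The sketch below reproduces that effect without gevent or kafka-python; patched_select and SelectorSketch are hypothetical stand-ins, and the link to gevent 1.0 is an assumption.

```python
def patched_select(rlist, wlist, xlist, timeout=None):
    """Hypothetical stand-in for a select() replacement written in Python."""
    return rlist, wlist, xlist


class SelectorSketch(object):
    # A plain Python function stored as a class attribute becomes a method,
    # so access through an instance binds `self` as an extra first argument.
    _select = patched_select

    def select(self, timeout=None):
        # Same call shape as the selectors34.py line in the traceback.
        return self._select([], [], [], timeout)


try:
    SelectorSketch().select(0)
except TypeError as exc:
    # The exact wording of the message differs between Python 2 and 3.
    print("TypeError: %s" % exc)
```

Calling patched_select([], [], [], 0) directly works; only the call through the instance fails, which is consistent with the "takes at most 4 arguments (5 given)" message in the logs.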

Jack Jonnalagadda (jackjvs) wrote :

Once pip is installed in the container, start the Python interpreter there and run the following to list all installed packages:
(analytics-alarm-gen)[root@overcloudey3-ca-0 /]$ python
>>> import pip
>>> installed_packages = pip.get_installed_distributions()
>>> installed_packages_list = sorted(["%s==%s" % (i.key, i.version)
... for i in installed_packages])
>>> print(installed_packages_list)

Jack Jonnalagadda (jackjvs) wrote :

[yum install python-pip if necessary]
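
As far as I know, pip.get_installed_distributions() is no longer available from pip 10 onwards, so the snippet above only works with older pip releases such as the 8.1.2 seen in these containers. A hedged alternative that does not rely on pip is sketched below; installed_packages is a hypothetical helper, and the fallback branch assumes setuptools (pkg_resources) is installed.

```python
def installed_packages():
    """Return sorted 'name==version' strings for the current interpreter."""
    try:
        # Standard library on Python 3.8 and later.
        from importlib.metadata import distributions
        pairs = [(dist.metadata.get("Name"), dist.version)
                 for dist in distributions()]
    except ImportError:
        # Fallback for older interpreters such as Python 2.7; needs setuptools.
        import pkg_resources
        pairs = [(dist.project_name, dist.version)
                 for dist in pkg_resources.working_set]
    # Skip entries whose metadata has no usable name.
    return sorted("%s==%s" % (name.lower(), version)
                  for name, version in pairs if name)


if __name__ == "__main__":
    for entry in installed_packages():
        print(entry)
```

The output format matches the "name==version" entries shown in the listings in this report, so the results can be compared directly.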

alexey-mr (alexey-morlang) wrote :

Shaju, is there a newer build?

RedHat repos have the latest 1.0-3 version for python-gevent (python-gevent-1.0-3.el7.x86_64.rpm)

Now when I'm building my rhel-based containers I have in my RPM repo:
[alexm@localhost ~]$ ls /var/www/5.1.0-198-queens/|grep geve
python-gevent-1.1rc5-1contrail1.el7.x86_64.rpm
python-geventhttpclient-1.0a-0contrail.el7.x86_64.rpm

alexey-mr (alexey-morlang) wrote :

AFAIK, these RPMs have been added to the TPC repo, so a new build should have the correct versions.
Could you please test and reopen this bug if needed?
