contrail-database-nodemgr EXITED with traceback while calculating cpu/memory utilization of an analytics process

Bug #1596867 reported by Sandip Dey
This bug affects 3 people
Affects              Status         Importance  Assigned to    Milestone
Juniper Openstack    (status tracked in Trunk)
  R3.0               Fix Committed  Medium      Santosh Gupta
  R3.0.2.x           Fix Committed  Medium      Santosh Gupta
  R3.0.3.x           Invalid        Medium      Santosh Gupta
  R3.1               Invalid        Medium      Santosh Gupta
  Trunk              Fix Committed  Medium      Santosh Gupta

Bug Description

Build: R3.02 52 Kilo Ubuntu14.04

Logs saved at: http://10.204.216.50/Docs/bugs/<bug-id>

1. The setup had 3 database nodes: 'nodei27', 'nodei28', 'nodei35'.

On all 3 nodes, contrail-database-nodemgr exited with the traceback below.
From the code in /usr/lib/python2.7/dist-packages/nodemgr/common/event_manager.py, it looks like nodemgr calculates the cpu/memory utilization of a list of analytics processes.

If one of those processes dies during this calculation, contrail-database-nodemgr dies as well.

In this setup, cassandra died due to
https://bugs.launchpad.net/juniperopenstack/+bug/1596803

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/usr/lib/python2.7/dist-packages/nodemgr/database_nodemgr/database_event_manager.py", line 343, in runforever
    prev_current_time = self.event_tick_60(prev_current_time)
  File "/usr/lib/python2.7/dist-packages/nodemgr/common/event_manager.py", line 511, in event_tick_60
    process_mem_cpu_usage = self.get_all_processes_mem_cpu_usage()
  File "/usr/lib/python2.7/dist-packages/nodemgr/common/event_manager.py", line 396, in get_all_processes_mem_cpu_usage
    process_mem_cpu = mem_cpu_usage_data.get_process_mem_cpu_info()
  File "/usr/lib/python2.7/dist-packages/nodemgr/common/cpuinfo.py", line 86, in get_process_mem_cpu_info
    process_mem_cpu.cpu_share = self._process.get_cpu_percent(interval=0.1)/psutil.NUM_CPUS
  File "/usr/lib/python2.7/dist-packages/psutil/__init__.py", line 709, in get_cpu_percent
    pt2 = self._platform_impl.get_cpu_times()
  File "/usr/lib/python2.7/dist-packages/psutil/_pslinux.py", line 470, in wrapper
    raise NoSuchProcess(self.pid, self._process_name)
NoSuchProcess: process no longer exists (pid=3876)
<Greenlet at 0x7ffbc3e39870: <bound method DatabaseEventManager.runforever of <nodemgr.database_nodemgr.database_event_manager.DatabaseEventManager object at 0x7ffbc40f0e50>>> failed with NoSuchProcess

Sandip Dey (sandipd)
description: updated
Raj Reddy (rajreddy)
Changed in juniperopenstack:
assignee: Raj Reddy (rajreddy) → Santosh Gupta (sangupta)
importance: Undecided → Medium
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/21635
Submitter: Santosh Gupta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/21635
Committed: http://github.org/Juniper/contrail-controller/commit/1d5db12f5f7614d5567cac902fc4e5f9a244643c
Submitter: Zuul
Branch: master

commit 1d5db12f5f7614d5567cac902fc4e5f9a244643c
Author: Santosh Gupta <email address hidden>
Date: Fri Jul 1 13:35:18 2016 -0700

Catch exception while calling psutils mem/cpu apis

A process might exit or crash between the time it is checked and the actual call made to get mem/cpu details.
Added a check for that.

Change-Id: Ib0b6f3614db88274939d54ef591666cf59448038
Closes-Bug: 1596867
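
The actual patch is not reproduced in this report, so the following is only a minimal sketch of the kind of guard the commit message describes, not the committed code. It assumes the fix amounts to catching the "no such process" error per process and skipping that process. `NoSuchProcess` here is a local stand-in for `psutil.NoSuchProcess` so the sketch runs without psutil installed; `FakeProcess`, `collect_cpu_usage`, and the "kafka" entry are hypothetical names used for illustration.

```python
class NoSuchProcess(Exception):
    """Stand-in for psutil.NoSuchProcess (assumption, not the real class)."""

    def __init__(self, pid):
        Exception.__init__(self, "process no longer exists (pid=%d)" % pid)
        self.pid = pid


class FakeProcess(object):
    """Hypothetical stand-in for a psutil.Process handle."""

    def __init__(self, pid, alive=True):
        self.pid = pid
        self.alive = alive

    def cpu_percent(self):
        # Mimic the failure in the traceback: the process vanished
        # between being listed and being sampled.
        if not self.alive:
            raise NoSuchProcess(self.pid)
        return 1.5


def collect_cpu_usage(processes):
    """Sample every process, skipping any that exited in the meantime."""
    usage = {}
    for name, proc in processes.items():
        try:
            usage[name] = proc.cpu_percent()
        except NoSuchProcess:
            # Skip the dead process instead of letting the exception
            # propagate and terminate the caller's periodic loop.
            continue
    return usage


processes = {
    "cassandra": FakeProcess(3876, alive=False),  # already exited
    "kafka": FakeProcess(4001),                   # still running
}
print(collect_cpu_usage(processes))  # {'kafka': 1.5}
```

Catching the exception inside the loop, rather than around the whole collection call, means one dead process costs only its own sample and the remaining processes are still reported.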

information type: Proprietary → Public
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/23219
Submitter: Santosh Gupta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/23219
Committed: http://github.org/Juniper/contrail-controller/commit/45e961a34817c44496ef0551c43d27dffb5aa056
Submitter: Zuul
Branch: R3.0

commit 45e961a34817c44496ef0551c43d27dffb5aa056
Author: Santosh Gupta <email address hidden>
Date: Thu Aug 11 12:30:52 2016 -0700

Catch exception while calling psutils mem/cpu apis

A process might exit or crash between the time it is checked and the actual call made to get mem/cpu details.
Added a check for that.

Change-Id: I6e422fdd82a86f6c81b12c5991b7f6eb0668c35d
Closes-Bug: 1596867

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0.2.x

Review in progress for https://review.opencontrail.org/25543
Submitter: Santosh Gupta (<email address hidden>)

Jim Reilly (jpreilly)
tags: added: att-aic-contrail
Santosh Gupta (sangupta) wrote :

Fix already present.

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/25543
Committed: http://github.org/Juniper/contrail-controller/commit/c4d2bc6b456fdab070c4c0a3336d437104ae76f9
Submitter: Zuul
Branch: R3.0.2.x

commit c4d2bc6b456fdab070c4c0a3336d437104ae76f9
Author: Santosh Gupta <email address hidden>
Date: Thu Aug 11 12:30:52 2016 -0700

Catch exception while calling psutils mem/cpu apis

A process might exit or crash between the time it is checked and the actual call made to get mem/cpu details.
Added a check for that.

Change-Id: I6e422fdd82a86f6c81b12c5991b7f6eb0668c35d
Closes-Bug: 1596867
(cherry picked from commit 45e961a34817c44496ef0551c43d27dffb5aa056)

Jim Reilly (jpreilly)
information type: Public → Private
Umamaheshwar (urao)
tags: added: sscc-contrail
information type: Private → Public
Jim Reilly (jpreilly)
information type: Public → Private
information type: Private → Public