[HDP] ganglia monitoring daemon is not running

Bug #1215769 reported by Kim Kwangjin
This bug affects 1 person
Affects: Sahara
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: 0.3

Bug Description

When I created a Hadoop cluster with the HDP plugin, the Ganglia monitoring daemon was not running.

I suspect the gmond process fails because of a problem in its generated configuration files:

  /etc/ganglia/hdp/HDPJobTracker/gmond.core.conf - line 42
          tcp_accept_channel {
            bind = localhost
            port = 8662
          }

   /etc/ganglia/hdp/HDPJobTracker/conf.d/gmond.master.conf - line 10
           tcp_accept_channel {
             port = 8662
           }

In this case, the ganglia monitoring process ends up with two definitions of tcp_accept_channel.

HDPNameNode and HDPSlaves have the same problem.
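
As a quick check (a hedged sketch using standard grep, not something the plugin itself provides), the duplicate definitions can be seen directly in the generated files on the affected node:

    # Show every tcp_accept_channel declaration in the JobTracker gmond configs
    # (paths taken from this report; other components have analogous directories).
    grep -n "tcp_accept_channel" \
        /etc/ganglia/hdp/HDPJobTracker/gmond.core.conf \
        /etc/ganglia/hdp/HDPJobTracker/conf.d/*.conf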

I have one idea for solving this problem.

In /usr/libexec/hdp/ganglia/gmondLib.sh, line 158 is the block shown in ref 2 below.

Changing this code to ref 3 (commenting the block out) removes the duplicate channel.

thank you.

ref 2)
          /* You can specify as many tcp_accept_channels as you like to share
           * an XML description of the state of the cluster.
           *
           * At the very least, every gmond must expose its XML state to
           * queriers from localhost.
           */
          tcp_accept_channel {
            bind = localhost
            port = ${gmondPort}
          }

ref 3)
          /* You can specify as many tcp_accept_channels as you like to share
           * an XML description of the state of the cluster.
           *
           * At the very least, every gmond must expose its XML state to
           * queriers from localhost.
          tcp_accept_channel {
            bind = localhost
            port = ${gmondPort}
          }
           */
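
If gmond.core.conf pulls in the conf.d directory (which the duplicate-definition symptom suggests), then with ref 3 in place the only remaining tcp_accept_channel is the one in conf.d/gmond.master.conf, so gmond would bind a single channel on port 8662 instead of trying to bind the same port twice. This is a reading of the configs above, not verified against the plugin code.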

Tags: plugin.hdp
ruhe (ruhe)
Changed in savanna:
assignee: nobody → John Speidel (jspeidel)
summary: - ganglia monitoring daemon is not running
+ [HDP] ganglia monitoring daemon is not running
tags: added: plugin.hdp
Revision history for this message
John Speidel (jspeidel) wrote :

I have seen issues where the Ganglia monitor doesn't run when /etc/hosts is not written properly.
Please provide the /etc/hosts file for a host where the Ganglia monitor fails to start.
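
(For anyone hitting the same symptom, a hedged sketch of how the relevant name-resolution details could be collected on the affected node, using standard Linux tools rather than anything specific to this report:)

    # Gather what the node knows about its own name resolution.
    cat /etc/hosts
    hostname -f                      # fully qualified hostname as the node sees it
    getent hosts "$(hostname -f)"    # address the FQDN resolves to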

Revision history for this message
Kim Kwangjin (mandu23) wrote :

/etc/hosts

[root@newhdpcluster01-master-001 ~]# cat /etc/hosts
127.0.0.1 localhost
10.0.0.2 newhdpcluster01-slave-001.novalocal newhdpcluster01-slave-001
10.0.0.4 newhdpcluster01-slave-002.novalocal newhdpcluster01-slave-002
10.0.0.6 newhdpcluster01-master-001.novalocal newhdpcluster01-master-001

The ganglia monitor failed on newhdpcluster01-master-001.
===========
With the default configuration, I ran the following command to start the ganglia daemon:

[root@newhdpcluster01-master-001 ~]# /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPJobTracker/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPJobTracker/gmond.pid -f
Unable to create tcp_accept_channel. Exiting.
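
I think this matches the duplicate definition described above: the first tcp_accept_channel binds port 8662, the second definition then fails to bind the same port, and gmond exits (my reading of the error, not confirmed from the gmond source).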

thank you.

ruhe (ruhe)
Changed in savanna:
status: New → Incomplete
importance: Undecided → Low
Changed in savanna:
importance: Low → Undecided
Changed in savanna:
assignee: John Speidel (jspeidel) → nobody
Revision history for this message
John Speidel (jspeidel) wrote :

There was a very small window where this error was possible when the logic of writing /etc/hosts was removed from the HDP plugin and handled solely by the savanna controller.

This issue was resolved by the following patch:

commit 683f3e88b5d7c030c5dcb3c55cba08467f0d5f1b
Author: John Speidel <email address hidden>
Date: Fri Aug 2 12:06:42 2013 -0400

    Fix Ganglia service start failure

    Ganglia Service now starts properly with Savanna updates to /etc/hosts

    Fixes: bug #1207819

    Change-Id: I2d683b6f2213fdeacc21d3ad33ff029b4682c8ec

Changed in savanna:
status: Incomplete → Fix Released
Changed in savanna:
milestone: none → 0.3