[HDP] ganglia monitoring daemon is not running

Bug #1215769 reported by Kim Kwangjin
This bug affects 1 person
Affects: Sahara
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: 0.3

Bug Description

When I created a Hadoop cluster with the HDP plugin, the Ganglia monitoring daemon was not running.

I suspect the gmond process fails because of a problem in its generated configuration files:

  /etc/ganglia/hdp/HDPJobTracker/gmond.core.conf - line 42
          tcp_accept_channel {
            bind = localhost
            port = 8662
          }

   /etc/ganglia/hdp/HDPJobTracker/conf.d/gmond.master.conf - line 10
           tcp_accept_channel {
             port = 8662
           }

In this case, the ganglia monitoring process ends up with two definitions of tcp_accept_channel.

HDPNameNode and HDPSlaves have the same problem.
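
As a quick check (a hedged sketch using standard grep, not something the plugin itself provides), the duplicate definitions can be seen directly in the generated files on the affected node:

    # Show every tcp_accept_channel declaration in the JobTracker gmond configs
    # (paths taken from this report; other components have analogous directories).
    grep -n "tcp_accept_channel" \
        /etc/ganglia/hdp/HDPJobTracker/gmond.core.conf \
        /etc/ganglia/hdp/HDPJobTracker/conf.d/*.conf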

I have one idea for solving this problem.

In /usr/libexec/hdp/ganglia/gmondLib.sh, line 158 is the block shown in ref 2 below.

Changing this code to ref 3 (commenting the block out) removes the duplicate channel.

thank you.

ref 2)
          /* You can specify as many tcp_accept_channels as you like to share
           * an XML description of the state of the cluster.
           *
           * At the very least, every gmond must expose its XML state to
           * queriers from localhost.
           */
          tcp_accept_channel {
            bind = localhost
            port = ${gmondPort}
          }

ref 3)
          /* You can specify as many tcp_accept_channels as you like to share
           * an XML description of the state of the cluster.
           *
           * At the very least, every gmond must expose its XML state to
           * queriers from localhost.
          tcp_accept_channel {
            bind = localhost
            port = ${gmondPort}
          }
           */
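
If gmond.core.conf pulls in the conf.d directory (which the duplicate-definition symptom suggests), then with ref 3 in place the only remaining tcp_accept_channel is the one in conf.d/gmond.master.conf, so gmond would bind a single channel on port 8662 instead of trying to bind the same port twice. This is a reading of the configs above, not verified against the plugin code.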

Tags: plugin.hdp
ruhe (ruhe)
Changed in savanna:
assignee: nobody → John Speidel (jspeidel)
summary: - ganglia monitoring daemon is not running
+ [HDP] ganglia monitoring daemon is not running
tags: added: plugin.hdp
Revision history for this message
John Speidel (jspeidel) wrote :

I have seen issues where the Ganglia monitor doesn't run when /etc/hosts is not written properly.
Please provide the /etc/hosts file for a host where the Ganglia monitor fails to start.
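
(For anyone hitting the same symptom, a hedged sketch of how the relevant name-resolution details could be collected on the affected node, using standard Linux tools rather than anything specific to this report:)

    # Gather what the node knows about its own name resolution.
    cat /etc/hosts
    hostname -f                      # fully qualified hostname as the node sees it
    getent hosts "$(hostname -f)"    # address the FQDN resolves to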

Revision history for this message
Kim Kwangjin (mandu23) wrote :

/etc/hosts

[root@newhdpcluster01-master-001 ~]# cat /etc/hosts
127.0.0.1 localhost
10.0.0.2 newhdpcluster01-slave-001.novalocal newhdpcluster01-slave-001
10.0.0.4 newhdpcluster01-slave-002.novalocal newhdpcluster01-slave-002
10.0.0.6 newhdpcluster01-master-001.novalocal newhdpcluster01-master-001

The ganglia monitor failed on newhdpcluster01-master-001.
===========
With the default configuration, I ran the following command to start the ganglia daemon:

[root@newhdpcluster01-master-001 ~]# /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPJobTracker/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPJobTracker/gmond.pid -f
Unable to create tcp_accept_channel. Exiting.
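
I think this matches the duplicate definition described above: the first tcp_accept_channel binds port 8662, the second definition then fails to bind the same port, and gmond exits (my reading of the error, not confirmed from the gmond source).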

thank you.

ruhe (ruhe)
Changed in savanna:
status: New → Incomplete
importance: Undecided → Low
Changed in savanna:
importance: Low → Undecided
Changed in savanna:
assignee: John Speidel (jspeidel) → nobody
Revision history for this message
John Speidel (jspeidel) wrote :

There was a very small window where this error was possible when the logic of writing /etc/hosts was removed from the HDP plugin and handled solely by the savanna controller.

This issue was resolved by the following patch:

commit 683f3e88b5d7c030c5dcb3c55cba08467f0d5f1b
Author: John Speidel <email address hidden>
Date: Fri Aug 2 12:06:42 2013 -0400

    Fix Ganglia service start failure

    Ganglia Service now starts properly with Savanna updates to /etc/hosts

    Fixes: bug #1207819

    Change-Id: I2d683b6f2213fdeacc21d3ad33ff029b4682c8ec

Changed in savanna:
status: Incomplete → Fix Released
Changed in savanna:
milestone: none → 0.3