Comment 1 for bug 1434179

Steve Varnau (steve-varnau) wrote :

http://logs.trafodion.org/26/1326/2/gate/core-regress-core-ahw2.2/7691eba/Install_Start.log

The log shows that the HDFS start command failed, but the installer kept polling for the command result forever. The test job eventually timed out.
Combing through the Ambari logs, I see that request 1079 failed. Ideally the installer should report that and retrieve the stdout/stderr of the command request. The Ambari logs show it was specifically the NameNode start that failed.
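
A minimal sketch of the kind of polling the installer could do instead, so that a failed request ends the wait rather than hanging. This is not the installer's actual code: wait_for_request and get_status are hypothetical names, and the status strings (COMPLETED, FAILED, ABORTED, TIMEDOUT) are assumed from Ambari's request states and should be checked against the Ambari version in use.

```python
import time

# Assumed terminal failure states for an Ambari request (verify for your version).
TERMINAL_FAILURES = {"FAILED", "ABORTED", "TIMEDOUT"}


def wait_for_request(get_status, timeout_s=600, interval_s=10,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the request completes, fails, or times out.

    get_status is a hypothetical callable returning the current status
    string of the request, for example by querying the Ambari REST API.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "COMPLETED":
            return status
        if status in TERMINAL_FAILURES:
            # Stop polling and surface the failure to the caller.
            raise RuntimeError("request ended with status %s" % status)
        if clock() >= deadline:
            raise TimeoutError("request still %s after %s seconds"
                               % (status, timeout_s))
        sleep(interval_s)
```

On RuntimeError the caller would then fetch and print the stdout/stderr of the failed tasks before exiting non-zero.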

Ambari is not much more helpful about the ultimate failure, but at least it times out after 10 minutes, after retrying this command every 10 seconds:
2015-03-19 08:43:03,106 - Retrying after 10 seconds. Reason: Execution of 'su -s /bin/bash - hdfs -c 'export PATH=$PATH:/usr/hdp/current/hadoop-client/bin ; hdfs --config /etc/hadoop/conf dfsadmin -safemode get' | grep 'Safe mode is OFF'' returned 1.

Digging down into the HDFS logs, the namenode could not contact the datanode. The datanode log shows this error:
2015-03-19 08:33:14,884 FATAL datanode.DataNode (DataNode.java:secureMain(2385)) - Exception in secureMain
java.net.BindException: Problem binding to [0.0.0.0:50010] java.net.BindException: Address already in use

So perhaps when Ambari tells us HDFS is shut down, there are sometimes still processes hanging around that prevent a proper start-up. That seems to be an Ambari bug.
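
As a possible workaround on our side, the installer could check that the datanode port is actually free before asking Ambari to start HDFS. A rough sketch, assuming port 50010 from the BindException above; port_is_free and wait_for_port_free are hypothetical helper names, not part of the installer or Ambari. The check is conservative: it binds without SO_REUSEADDR, so it may also report the port as busy while old connections are still winding down.

```python
import socket
import time


def port_is_free(port, host="0.0.0.0"):
    """Return True if we can bind to (host, port), False if it is in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically "Address already in use": some process still holds it.
            return False
        return True


def wait_for_port_free(port, host="0.0.0.0", timeout_s=60, interval_s=2):
    """Wait up to timeout_s for the port to be released; return the final state."""
    deadline = time.monotonic() + timeout_s
    while not port_is_free(port, host):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
    return True
```

If wait_for_port_free(50010) returns False after the stop request completes, the installer could report the leftover process instead of issuing a start that is bound to fail.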