[HDP] Cluster log doesn't have an error message after timeout

Bug #1263068 reported by Sergey Galkin
This bug affects 2 people
Affects: Sahara
Status: Fix Released
Importance: High
Assigned to: Dmitry Mescheryakov

Bug Description

Found during investigation of https://bugs.launchpad.net/savanna/+bug/1259836
Steps to reproduce:

1. Create a lab with a slow Internet connection.
2. Start an HDP cluster with 1 master node (['JOBTRACKER', 'NAMENODE', 'SECONDARY_NAMENODE', 'GANGLIA_SERVER', 'NAGIOS_SERVER', 'AMBARI_SERVER']) and 1 worker node (['TASKTRACKER', 'DATANODE', 'HDFS_CLIENT', 'MAPREDUCE_CLIENT']).

The setup actually fails with a timeout, but the savanna logs contain only: "_execute_command took 300.0 seconds to complete _log_command"

2013-12-19 15:30:20.846 14737 DEBUG savanna.utils.remote [-] [hdp-master-001] Executing "ambari-server setup -s > /dev/null 2>&1" _log_command /usr/lib/python2.6/site-packages/savanna/utils/remote.py:299
......
2013-12-19 15:35:20.846 14737 DEBUG savanna.utils.remote [-] [hdp-master-001] _execute_command took 300.0 seconds to complete _log_command /usr/lib/python2.6/site-packages/savanna/utils/remote.py:299

2013-12-19 15:35:21.935 14737 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 172.18.92.90
2013-12-19 15:35:21.936 14737 INFO savanna.plugins.hdp.versions.1_3_2.versionhandler [-] Waiting to connect to ambari server ...
2013-12-19 15:35:26.938 14737 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 172.18.92.90
2013-12-19 15:35:26.939 14737 INFO savanna.plugins.hdp.versions.1_3_2.versionhandler [-] Waiting to connect to ambari server ...
2013-12-19 15:35:31.944 14737 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 172.18.92.90
2013-12-19 15:35:31.946 14737 INFO savanna.plugins.hdp.versions.1_3_2.versionhandler [-] Waiting to connect to ambari server ...
2013-12-19 15:35:36.947 14737 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 172.18.92.90
2013-12-19 15:35:36.948 14737 INFO savanna.plugins.hdp.versions.1_3_2.versionhandler [-] Waiting to connect to ambari server ...

Sergey Lukjanov (slukjanov) wrote :

John, please take a look at it.

Changed in savanna:
assignee: nobody → John Speidel (jspeidel)
Dmitry Mescheryakov (dmitrymex) wrote :

Sergey, I've actually made a fix already:
https://review.openstack.org/#/c/63393/

Not sure why it didn't attach here.

Changed in savanna:
assignee: John Speidel (jspeidel) → nobody
assignee: nobody → Dmitry Mescheryakov (dmitrymex)
Sergey Galkin (sgalkin)
Changed in savanna:
status: New → Fix Committed
Changed in savanna:
status: Fix Committed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to savanna (master)

Reviewed: https://review.openstack.org/63393
Committed: https://git.openstack.org/cgit/openstack/savanna/commit/?id=a679194419193eec5cce56fd6b73e97fadb2ebac
Submitter: Jenkins
Branch: master

commit a679194419193eec5cce56fd6b73e97fadb2ebac
Author: Dmitry Mescheryakov <email address hidden>
Date: Fri Dec 20 15:01:27 2013 +0400

    Properly catch timeout exception raised in thread

    The Timeout exception raised by eventlet is subclass of
    BaseException, not Exception. So we need to catch the former.

    Fixes bug: #1263068

    Change-Id: I3d63493a2ff99585436aa68e510047d7f4e61345
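
The commit message's point can be illustrated with a small stdlib-only sketch. It does not use eventlet or the actual savanna code: `FakeTimeout`, `run_step`, `slow_step` and `probe` are hypothetical names made up for this example, with `FakeTimeout` standing in for eventlet's Timeout by deriving from BaseException as the commit message describes. It shows why an `except Exception` handler never sees such a timeout, so nothing gets logged, while `except BaseException` does.

```python
class FakeTimeout(BaseException):
    """Hypothetical stand-in for eventlet's Timeout (a BaseException subclass)."""


def slow_step():
    # Simulates a provisioning step that is interrupted by a timeout.
    raise FakeTimeout("command timed out")


def run_step(step, handler_base):
    """Run step() and return the error caught by `except handler_base`."""
    try:
        step()
    except handler_base as exc:
        return exc
    return None


def probe(handler_base):
    """Report whether a handler for `handler_base` gets to log the timeout."""
    try:
        caught = run_step(slow_step, handler_base)
        return "logged: %s" % caught
    except BaseException:
        # The timeout bypassed the handler inside run_step entirely.
        return "escaped the handler, nothing logged"


narrow = probe(Exception)      # handler is too narrow for FakeTimeout
broad = probe(BaseException)   # handler catches FakeTimeout

print(narrow)
print(broad)
```

With the narrow handler the timeout propagates past the code that would have logged it, which matches the symptom in this report: the log shows the 300-second command and then nothing about the failure.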

Changed in savanna:
status: In Progress → Fix Committed
Changed in savanna:
importance: Undecided → High
milestone: none → icehouse-2
Thierry Carrez (ttx)
Changed in savanna:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in sahara:
milestone: icehouse-2 → 2014.1