[Sahara] Cluster fails to deploy, CDH 5.4.0, Ubuntu

Bug #1564345 reported by Fabrizio Soppelsa
This bug affects 2 people
Affects: Mirantis OpenStack
Status: Won't Fix
Importance: Critical
Assigned to: MOS Sahara

Bug Description

Detailed bug description: A CDH 5.4.0 cluster on Ubuntu fails to deploy with: "Provision Hadoop Cluster: Command aborted because of exception: Command timed-out after 90 seconds." With a CentOS image the cluster was reported to deploy successfully (not reproduced in Mirantis labs).
Steps to reproduce: Create a typical CDH 5.4.0 cluster template and create a cluster from it using an Ubuntu base image
Expected results: Cluster deploys
Actual result: Cluster fails to deploy
Reproducibility: Yes
Workaround: None known yet
Impact: Impossible to deploy a cluster
Description of the environment:
- Operating system: Ubuntu 14.04
- Versions of components: MOS 7.0
- Reference architecture: HA, Neutron DVR, l2pop (if that matters)
- Network model: VXLAN (if that matters)
- Related projects installed: Sahara
Additional information:

2016-03-31 10:00:00.074 2219 INFO sahara.utils.general [-] Cluster status has been changed: id=c2d1cfe3-d1a0-4d9f-82f4-4df086e9b960, New status=Starting
2016-03-31 10:05:16.352 2219 ERROR sahara.service.ops [-] Error during operating on cluster test-ubuntu (reason: Failed to Provision Hadoop Cluster: Command aborted because of exception: Command timed-out after 90 seconds
Error ID: d3d03a37-b6f3-40bb-b32e-ea13fa9de5ee)
2016-03-31 10:05:17.078 2219 INFO sahara.utils.general [-] Cluster status has been changed: id=c2d1cfe3-d1a0-4d9f-82f4-4df086e9b960, New status=Error

We have a hardware lab where this error is reproduced, with logs; contact me for access.

Revision history for this message
Nikita Konovalov (nkonovalov) wrote :

This is a customer-found issue

tags: added: customer-found support
Revision history for this message
masafumi_ohta (masafumi-ohta) wrote :

I have been looking into the issue and may have found the root cause: /etc/hosts is not updated with the instance name when Ubuntu instances are launched, even though /etc/hostname is changed.
The workaround is to use cloud-init to force /etc/hosts to be managed, by setting 'manage_etc_hosts: true'.

For more detail, see http://community.cloudera.com/t5/Cloudera-Manager-Installation/CDH-cluster-command-timed-out-after-90-seconds/td-p/39167
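
For reference, a minimal sketch of that workaround, assuming the guest image uses cloud-init; the fragment file name 99-manage-etc-hosts.cfg below is only an illustration and is not part of Sahara or the stock image:

# Add a cloud-init config fragment that tells cloud-init to manage /etc/hosts,
# so the instance gets an /etc/hosts entry matching its hostname on each boot.
cat <<'EOF' | sudo tee /etc/cloud/cloud.cfg.d/99-manage-etc-hosts.cfg
manage_etc_hosts: true
EOF
# Rebuild or re-snapshot the guest image with this fragment in place.

In principle the same 'manage_etc_hosts: true' key can also be supplied via cloud-config user data, but since Sahara generates the instance user data itself, baking the setting into the image is the simpler route.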

Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Removed from 7.0-mu-4 per feedback from the Sahara team that the solution from the Cloudera forum doesn't actually resolve this issue. More time is needed to reproduce the problem and produce a fix.

Changed in mos:
milestone: 7.0-mu-4 → 7.0-updates
Revision history for this message
Fabrizio Soppelsa (fsoppelsa) wrote :

Guys, any news on this? Can we target it for MU5?

Changed in mos:
milestone: 7.0-updates → 7.0-mu-5
Revision history for this message
Nikita Konovalov (nkonovalov) wrote :

This issue occurred in the customer's environment, which uses a custom internal DNS.
The issue reproduced repeatedly when the Ubuntu-based guest image was used.

The root cause was never found, since the timeout and the actual requests made by Cloudera Manager are in the proprietary codebase.

There is a workaround built specifically for the described case. It temporarily disables all DNS resolution on the VMs by modifying /etc/resolv.conf for the duration of provisioning, then returns the file to its original state.

The workaround is available here: https://review.fuel-infra.org/#/c/19099/2
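
The actual change is in the review above; roughly, the idea amounts to the following sketch (the commands and the backup file name are illustrative, not taken from the patch):

# Park the resolver configuration so Cloudera Manager cannot query the custom internal DNS
sudo mv /etc/resolv.conf /etc/resolv.conf.sahara-backup
sudo touch /etc/resolv.conf    # empty resolver config: only /etc/hosts entries resolve
# ... Cloudera Manager provisions the cluster ...
sudo mv /etc/resolv.conf.sahara-backup /etc/resolv.conf    # restore the original configuration

Note that on Ubuntu 14.04 /etc/resolv.conf is usually a symlink managed by resolvconf, so the real patch has to take care when restoring it.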

Revision history for this message
Denis Meltsaykin (dmeltsaykin) wrote :

Closing as Won't Fix, as there is a workaround and no fix.

Changed in mos:
milestone: 7.0-mu-5 → 7.0-updates
status: Confirmed → Won't Fix
Revision history for this message
Fabrizio Soppelsa (fsoppelsa) wrote :

Nikita, is the workaround intended to stay there with the -2, or is there an upcoming discussion in the community on this DNS thing?
