[Vanilla2] Failed to scale cluster because of hive config

Bug #1399822 reported by Andrew Lazarev
Affects: Sahara
Status: Fix Released
Importance: High
Assigned to: Andrew Lazarev

Bug Description

Steps to reproduce:

1. Create a vanilla2 cluster. I used 1*["namenode", "resourcemanager", "historyserver", "oozie"] + 3*["datanode", "nodemanager"].
2. Scale the cluster. I tried to add 1*["datanode"].

Stacktrace:
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops Traceback (most recent call last):
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops File "/Users/alazarev/openstack/sahara/sahara/service/ops.py", line 141, in wrapper
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops f(cluster_id, *args, **kwds)
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops File "/Users/alazarev/openstack/sahara/sahara/service/ops.py", line 272, in _provision_scaled_cluster
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops plugin.scale_cluster(cluster, instances)
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops File "/Users/alazarev/openstack/sahara/sahara/plugins/vanilla/plugin.py", line 60, in scale_cluster
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops cluster.hadoop_version).scale_cluster(cluster, instances)
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops File "/Users/alazarev/openstack/sahara/sahara/plugins/vanilla/v2_4_1/versionhandler.py", line 106, in scale_cluster
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops sc.scale_cluster(self.pctx, cluster, instances)
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops File "/Users/alazarev/openstack/sahara/sahara/plugins/vanilla/hadoop2/scaling.py", line 31, in scale_cluster
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops config.configure_instances(pctx, instances)
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops File "/Users/alazarev/openstack/sahara/sahara/plugins/vanilla/hadoop2/config.py", line 57, in configure_instances
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops _provisioning_configs(pctx, instance)
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops File "/Users/alazarev/openstack/sahara/sahara/plugins/vanilla/hadoop2/config.py", line 63, in _provisioning_configs
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops _push_xml_configs(instance, xmls)
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops File "/Users/alazarev/openstack/sahara/sahara/plugins/vanilla/hadoop2/config.py", line 261, in _push_xml_configs
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops _push_configs_to_instance(instance, xml_confs)
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops File "/Users/alazarev/openstack/sahara/sahara/plugins/vanilla/hadoop2/config.py", line 268, in _push_configs_to_instance
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops r.write_file_to(fl, data, run_as_root=True)
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops File "/Users/alazarev/openstack/sahara/sahara/utils/ssh_remote.py", line 578, in write_file_to
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops self._run_s(_write_file_to, timeout, remote_file, data, run_as_root)
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops File "/Users/alazarev/openstack/sahara/sahara/utils/ssh_remote.py", line 643, in _run_s
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops return self._run_with_log(func, timeout, *args, **kwargs)
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops File "/Users/alazarev/openstack/sahara/sahara/utils/ssh_remote.py", line 517, in _run_with_log
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops return self._run(func, *args, **kwargs)
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops File "/Users/alazarev/openstack/sahara/sahara/utils/ssh_remote.py", line 640, in _run
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops return procutils.run_in_subprocess(self.proc, func, args, kwargs)
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops File "/Users/alazarev/openstack/sahara/sahara/utils/procutils.py", line 53, in run_in_subprocess
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops SubprocessException: RemoteCommandException: Error during command execution: "mv temp-file-b3401ce5-2fa2-441b-ab39-3ecaa189a648 /opt/hive/conf/hive-site.xml"
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops Return code: 1
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops STDERR:
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops mv: cannot move temp-file-b3401ce5-2fa2-441b-ab39-3ecaa189a648 to /opt/hive/conf/hive-site.xml: No such file or directory
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops Error ID: 059ca13b-3d1a-414e-96fe-3cad80c80376
2014-12-05 15:12:25.769 97815 TRACE sahara.service.ops
2014-12-05 15:12:25.911 97815 INFO sahara.utils.general [-] Cluster status has been changed: id=93d765de-4bf4-4a89-ad3a-75c5d3b14888, New status=Error

Reproduced on both Ubuntu and Fedora Juno images.
Note: the stack trace in master is hidden because of https://bugs.launchpad.net/sahara/+bug/1399490

It looks like the problem was introduced by https://review.openstack.org/#/c/133186/

description: updated
Andrew Lazarev (alazarev) wrote :

The problem is that the instance config after merging contains {'Hive': {}}.
The code that decides whether to place a service config uses a simple "if service in service_to_conf_map.keys()" check. An empty config is considered reason enough to place the config file, and the new node (with only the datanode process) doesn't have a folder for the Hive config.
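
A minimal sketch of why the key check fires on an empty config. The function names and the HDFS setting are hypothetical, chosen only for illustration; only the `service_to_conf_map` name and the {'Hive': {}} value come from the comment above. The second function is one possible guard, not necessarily how the issue was fixed.

```python
# Hypothetical helpers; they only illustrate the membership test described above.

def services_selected_by_key(service_to_conf_map, known_services):
    """Select a service whenever its key exists, even if its config is empty."""
    return [s for s in known_services if s in service_to_conf_map.keys()]


def services_selected_if_non_empty(service_to_conf_map, known_services):
    """Select a service only when it has at least one config value."""
    return [s for s in known_services if service_to_conf_map.get(s)]


# Merged instance config as described in the comment: Hive is present but empty.
# The HDFS entry is an assumed example value.
merged = {"HDFS": {"dfs.replication": "3"}, "Hive": {}}
known = ["HDFS", "Hive"]

by_key = services_selected_by_key(merged, known)
non_empty = services_selected_if_non_empty(merged, known)
print(by_key)     # ['HDFS', 'Hive']
print(non_empty)  # ['HDFS']
```

With the key-based check, Hive is selected although there is nothing to write for it, which matches the failure in the stack trace.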

Andrew Lazarev (alazarev) wrote :

The problem is broader than Hive alone. Sahara tries to write all available configs to a node that runs only datanode. Hive is just the one that fails, because of the missing directory.
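
To illustrate why only the Hive file fails while the other unnecessary files are written silently, here is a small local sketch. It does not use Sahara's remote code; `push_configs` is a hypothetical helper, and the Hadoop directory is an assumed path. Only the Hive path is taken from the stack trace.

```python
import os
import tempfile


def push_configs(root, configs):
    """Write each config file under root; return the relative paths that failed."""
    failed = []
    for rel_path, data in configs.items():
        target = os.path.join(root, rel_path)
        try:
            with open(target, "w") as f:
                f.write(data)
        except FileNotFoundError:
            # The parent directory does not exist on this "node".
            failed.append(rel_path)
    return failed


# Simulate a datanode-only node: a Hadoop conf dir exists (assumed path),
# but there is no /opt/hive/conf directory.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "opt/hadoop/etc/hadoop"))

configs = {
    "opt/hadoop/etc/hadoop/hdfs-site.xml": "<configuration/>",
    "opt/hive/conf/hive-site.xml": "<configuration/>",
}

failed = push_configs(root, configs)
print(failed)  # ['opt/hive/conf/hive-site.xml']
```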

Luigi Toscano (ltoscano) wrote :

Does it affect master/kilo only (as I guess from the linked review)?

Changed in sahara:
milestone: none → kilo-2
Andrew Lazarev (alazarev) wrote :

@Luigi It looks like it affects Juno as well, but it is not as critical there as on master.
For Juno: Sahara writes all configs to the node even if the service is not installed.
For Kilo: same as Juno, but it fails to write the config for Hive and the cluster ends up in the Error state.

Changed in sahara:
assignee: nobody → Andrew Lazarev (alazarev)
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to sahara (master)

Fix proposed to branch: master
Review: https://review.openstack.org/140137

Changed in sahara:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to sahara (master)

Reviewed: https://review.openstack.org/140137
Committed: https://git.openstack.org/cgit/openstack/sahara/commit/?id=79587dfe28f6118f9c620ba5bd10fb684e765d18
Submitter: Jenkins
Branch: master

commit 79587dfe28f6118f9c620ba5bd10fb684e765d18
Author: Andrew Lazarev <email address hidden>
Date: Mon Dec 8 12:09:31 2014 -0800

    Fixed configs generation for vanilla2

    Plugin has convention that config should contain configs only for
    existing services. Initialization with empty dict leads to copy of
    all configuration files.

    Change-Id: I53090f95c657ef5ef215a3654476ffebf4826cf4
    Closes-Bug: #1399822
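
The convention named in the commit message can be sketched as follows. This is not the code from the commit; both merge functions, the service list, and the sample setting are hypothetical, and they only contrast pre-initializing every service with an empty dict against creating an entry when a source actually provides one.

```python
def merge_with_empty_init(all_services, *sources):
    """Start with an empty dict per known service, then merge the sources in."""
    merged = {service: {} for service in all_services}
    for source in sources:
        for service, confs in source.items():
            merged.setdefault(service, {}).update(confs)
    return merged


def merge_existing_only(*sources):
    """Create an entry only for services that appear in some source."""
    merged = {}
    for source in sources:
        for service, confs in source.items():
            merged.setdefault(service, {}).update(confs)
    return merged


# Assumed example data: the node only carries HDFS settings.
all_services = ["HDFS", "YARN", "Hive"]
node_confs = {"HDFS": {"dfs.datanode.data.dir": "/mnt/hdfs"}}

with_init = merge_with_empty_init(all_services, node_confs)
existing_only = merge_existing_only(node_confs)
print(sorted(with_init))      # ['HDFS', 'Hive', 'YARN']
print(sorted(existing_only))  # ['HDFS']
```

Under a key-based check, the first result would cause config files for every service to be pushed, while the second keeps the set limited to services the node actually has settings for.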

Changed in sahara:
status: In Progress → Fix Committed
Changed in sahara:
milestone: kilo-2 → kilo-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in sahara:
milestone: kilo-1 → 2015.1.0