fail to start vanilla cluster with image built by latest sahara-image-element script

Bug #1727757 reported by Shu Yingya
This bug affects 3 people
Affects: Sahara
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

When a vanilla image is built with the latest sahara-image-elements project (latest commit: f3f5613238a58a43281c917a49ca23eae65db16e), a cluster using that image fails to start.

sahara-engine.log:
Error ID: a777b7bc-2940-4c0a-838d-d6480f9fa866
Error ID: 19960099-615b-40b1-8267-6bd7732626ea
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/sahara/context.py", line 167, in _wrapper
    func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/sahara/utils/cluster_progress_ops.py", line 139, in handler
    add_fail_event(instance, e)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/site-packages/sahara/utils/cluster_progress_ops.py", line 136, in handler
    value = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/sahara/plugins/vanilla/hadoop2/run_scripts.py", line 61, in _start_processes
    'sudo su - -c "hadoop-daemon.sh start datanode" hadoop')
  File "/usr/lib/python2.7/site-packages/sahara/utils/ssh_remote.py", line 855, in execute_command
    cmd, run_as_root, get_stderr, raise_when_error)
  File "/usr/lib/python2.7/site-packages/sahara/utils/ssh_remote.py", line 955, in _run_s
    return self._run_with_log(func, timeout, description, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/sahara/utils/ssh_remote.py", line 766, in _run_with_log
    return self._run(func, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/sahara/utils/ssh_remote.py", line 951, in _run
    return procutils.run_in_subprocess(self.proc, func, args, kwargs)
  File "/usr/lib/python2.7/site-packages/sahara/utils/procutils.py", line 54, in run_in_subprocess
    raise exceptions.SubprocessException(result['exception'])
SubprocessException: RemoteCommandException: Error during command execution: "sudo su - -c "hadoop-daemon.sh start datanode" hadoop"
Return code: 1
STDOUT:
starting datanode, logging to /mnt/log/hadoop/hadoop/hadoop-hadoop-datanode-origin-vanilla-vw-1.novalocal.out

The datanode log at /mnt/log/hadoop/hadoop/hadoop-hadoop-datanode-origin-vanilla-vw-1.novalocal.log
is available at http://paste.openstack.org/show/624724/
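
For anyone reproducing this, the `.log` file next to the `.out` file named in STDOUT is the place to look for the cause. A minimal sketch for pulling the likely failure lines out of such a log, assuming it uses the usual log4j level names (ERROR, FATAL) and Java exception class names; `show_log_errors` is a hypothetical helper, not part of Sahara or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: print lines that look like errors in a Hadoop daemon log.
# Assumption: the log contains log4j-style levels (ERROR, FATAL) or Java
# exception class names in plain text.
show_log_errors() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # -n prefixes each match with its line number in the log.
    # grep exits non-zero when nothing matches, and so does this function.
    grep -n -E 'ERROR|FATAL|Exception' "$log_file"
}
```

On the affected instance this could be called as `show_log_errors /mnt/log/hadoop/hadoop/hadoop-hadoop-datanode-origin-vanilla-vw-1.novalocal.log`.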

Shu Yingya (felixshu)
summary: - fail to start cluster with latest sahara-image-elements script
+ fail to start vanilla cluster with image built by latest sahara-image-element script
Shi Yan (yanshi-403) wrote :

I have the exact same issue when using the latest script to build the vanilla image.

Furthermore, the old vanilla image (Ocata) works fine.

Shu Yingya (felixshu)
Changed in sahara:
assignee: nobody → Shu Yingya (felixshu)
assignee: Shu Yingya (felixshu) → nobody
Shu Yingya (felixshu) wrote :

If you would like to build the latest image, please revert the change a77a9a978a655044a0b58a299df965c89391090d until this issue has been fixed.

Luigi Toscano (ltoscano) wrote :

Uhm, the commit a77a9a978a655044a0b58a299df965c89391090d is "Add S3 jar to Hadoop classpath"; Jeremy, could it be the change to DIB_HDFS_LIB_DIR?

If anyone has some bandwidth to test it, can you please try to only change back the value of DIB_HDFS_LIB_DIR to

export DIB_HDFS_LIB_DIR="/opt/hadoop/share/hadoop/tools/lib"

in diskimage-create/diskimage-create.sh, keeping the rest of the change?
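
The suggestion above amounts to a one-line edit. A minimal sketch of it, assuming the script sets the variable on a single line of the form `export DIB_HDFS_LIB_DIR="..."` and that the new path contains no `|` or `&` characters; `set_hdfs_lib_dir` is a hypothetical helper, not something shipped with sahara-image-elements.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the DIB_HDFS_LIB_DIR export line in a script.
# Assumption: the script sets it as: export DIB_HDFS_LIB_DIR="<path>"
set_hdfs_lib_dir() {
    script="$1"
    new_dir="$2"
    # Fail early if the expected line is absent, rather than silently doing nothing.
    grep -q '^[[:space:]]*export DIB_HDFS_LIB_DIR=' "$script" || {
        echo "no DIB_HDFS_LIB_DIR export found in $script" >&2
        return 1
    }
    tmp=$(mktemp) || return 1
    # '|' is the sed delimiter because the path contains '/'.
    # \1 keeps whatever indentation the original line had.
    sed "s|^\([[:space:]]*\)export DIB_HDFS_LIB_DIR=.*|\1export DIB_HDFS_LIB_DIR=\"$new_dir\"|" \
        "$script" > "$tmp" &&
        cat "$tmp" > "$script"   # overwrite in place so the file mode is preserved
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

From a sahara-image-elements checkout this could be run as `set_hdfs_lib_dir diskimage-create/diskimage-create.sh /opt/hadoop/share/hadoop/tools/lib`, after which the vanilla image is rebuilt as usual.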

Shu Yingya (felixshu) wrote :

Thanks for your hint, Luigi. Let me try it on my env.

Shi Yan (yanshi-403) wrote :

Hi, Luigi and Yingya

After rebuilding the vanilla image with only the DIB_HDFS_LIB_DIR change that Luigi suggested here, I tested it and it works well!

OpenStack Infra (hudson-openstack) wrote : Fix merged to sahara-image-elements (master)

Reviewed: https://review.openstack.org/522990
Committed: https://git.openstack.org/cgit/openstack/sahara-image-elements/commit/?id=3ee20cbc09fc0fcfb91c81382747470de7050a03
Submitter: Zuul
Branch: master

commit 3ee20cbc09fc0fcfb91c81382747470de7050a03
Author: Jeremy Freudberg <email address hidden>
Date: Sun Nov 26 21:47:27 2017 +0000

    Revise s3_hadoop

    * Handle Hadoop classpath better
    * Include proper support for Spark classpath
    * Formally limit element's use to Vanilla and Spark

    Change-Id: I65abd7e375dba11599a4ab943d24f878235cd71d
    Closes-Bug: #1727757
    Closes-Bug: #1728061

Changed in sahara:
status: New → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to sahara-image-elements (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/542369

OpenStack Infra (hudson-openstack) wrote : Fix merged to sahara-image-elements (stable/pike)

Reviewed: https://review.openstack.org/542369
Committed: https://git.openstack.org/cgit/openstack/sahara-image-elements/commit/?id=779c2f57047a9e326ef4c8175f319f8940eef489
Submitter: Zuul
Branch: stable/pike

commit 779c2f57047a9e326ef4c8175f319f8940eef489
Author: Jeremy Freudberg <email address hidden>
Date: Sun Nov 26 21:47:27 2017 +0000

    Revise s3_hadoop

    * Handle Hadoop classpath better
    * Include proper support for Spark classpath
    * Formally limit element's use to Vanilla and Spark

    Change-Id: I65abd7e375dba11599a4ab943d24f878235cd71d
    Closes-Bug: #1727757
    Closes-Bug: #1728061

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/sahara-image-elements 8.0.0.0rc1

This issue was fixed in the openstack/sahara-image-elements 8.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/sahara-image-elements 7.0.2

This issue was fixed in the openstack/sahara-image-elements 7.0.2 release.
