[HDP] Image fails to connect to repository

Bug #1285133 reported by wondra
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
Sahara | Fix Released | Medium | Michael McCune |

Bug Description

With both images provided on the website, the plugin fails to configure a cluster because a PGP key cannot be downloaded (verified with a browser). Log:

2014-02-26 13:19:06.563 30108 ERROR savanna.context [-] Thread 'hdp-provision-instance-hdp-smallcluster-test-hdp-slave-001' fails with exception: 'RemoteCommandException: Error during command execution: "yum -y install epel-release"
Return code: 1
STDERR:
Repository HDP-UTILS-1.1.0.16 is listed more than once in the configuration
http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.16/repos/centos6/repodata/repomd.xml: [Errno 12] Timeout on http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.16/repos/centos6/repodata/repomd.xml: (28, 'Operation too slow. Less than 1 bytes/sec transfered the last 30 seconds')
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: HDP-UTILS-1.1.0.16. Please verify its path and try again

STDOUT:
Loaded plugins: fastestmirror
Determining fastest mirrors
 * base: mirror.hexageek.com
 * extras: mirror.hexageek.com
 * updates: mirror.hexageek.com
'
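The first STDERR line says that HDP-UTILS-1.1.0.16 is defined more than once in the yum configuration. A minimal Python sketch for spotting such duplicate repository IDs across .repo files; it assumes the files live in /etc/yum.repos.d/ (the CentOS default) and only scans the `[section]` headers. The function name is hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Matches INI-style section headers such as "[HDP-UTILS-1.1.0.16]" on their own line.
SECTION_RE = re.compile(r"^[ \t]*\[([^\]]+)\][ \t]*$", re.MULTILINE)


def duplicate_repo_ids(repo_texts):
    """Return, sorted, the repo IDs whose [section] header appears more than once.

    repo_texts is an iterable of .repo file contents. Duplicates are counted
    across all files together, since yum reads every .repo file in its
    repository directory.
    """
    counts = Counter()
    for text in repo_texts:
        counts.update(match.strip() for match in SECTION_RE.findall(text))
    return sorted(repo_id for repo_id, n in counts.items() if n > 1)


if __name__ == "__main__":
    repo_dir = Path("/etc/yum.repos.d")  # assumed default location on CentOS 6
    if repo_dir.is_dir():
        texts = [p.read_text() for p in sorted(repo_dir.glob("*.repo"))]
        print(duplicate_repo_ids(texts))
```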

Revision history for this message
Erik Bergenholtz (ebergenholtz) wrote :

This is not a bug, but rather a symptom of extremely poor network connectivity. Please use one of the [new] images that have the HDP packages pre-populated:

- https://s3.amazonaws.com/public-repo-1.hortonworks.com/savanna/images/centos-6_4-64-hdp-1_3_2.qcow2
- https://s3.amazonaws.com/public-repo-1.hortonworks.com/savanna/images/centos-6_4-64-hdp-2_0_6.qcow2

Changed in savanna:
status: New → Invalid
Revision history for this message
wondra (wondra) wrote :

The plugin still fails on the repository timeout. It works outside the OpenStack virtual environment, even on the bare-metal servers. I am running Quantum with GRE and KVM virtualization.
However, when I delete ambari.repo, I can install software without trouble. There must be something wrong with your Amazon S3 repository.
Concretely, when I set the MTU in the virtual machine to 1454 (estimated with tracepath), the download works. At 1500 it times out.
I have ICMP -1 -1 allowed in the security group, so Fragmentation Needed messages should pass.
I do see them in tcpdump on my network node:
00:43:17.300808 IP 46.255.224.xxx > server-54-230-15-23.ams1.r.cloudfront.net: ICMP 46.255.224.xxx unreachable - need to frag (mtu 1454), length 556
I see the same messages with the other repositories, and those servers react to them.
Does Amazon block ICMP?
It is probably a problem with OpenStack Neutron networking.
How do I clone the yum repo? I'm a Debian user...
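The MTU figures in this comment can be checked with simple arithmetic. A sketch, assuming IPv4 and TCP headers without options (20 bytes each); the 46-byte figure is only 1500 minus the 1454 reported by tracepath, not a measured breakdown of the GRE overhead.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tcp_mss(mtu):
    """Largest TCP payload per segment that fits in one IPv4 packet of `mtu` bytes."""
    return mtu - IPV4_HEADER - TCP_HEADER


def needs_fragmentation(packet_size, path_mtu):
    """True if a packet of this size is larger than the path MTU."""
    return packet_size > path_mtu


guest_mtu = 1500
path_mtu = 1454  # value found with tracepath in the comment above
print("MSS at guest MTU:", tcp_mss(guest_mtu))                  # 1460
print("MSS at path MTU:", tcp_mss(path_mtu))                    # 1414
print("headroom lost on the path:", guest_mtu - path_mtu)       # 46
print("full-size packet fits:", not needs_fragmentation(guest_mtu, path_mtu))  # False
```

One possible reading, offered as a hypothesis rather than a diagnosis: if the CloudFront side never acts on the "need to frag" messages, it keeps sending full 1500-byte packets that cannot cross the 1454-byte hop, so small exchanges succeed while the metadata download stalls until yum gives up, which would match the 30-second timeout in the original log.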

Changed in savanna:
status: Invalid → New
Revision history for this message
Erik Bergenholtz (ebergenholtz) wrote :

Which image are you using? If you are using one of the above images, there should be no need to install any packages at deployment time. If this is not the case, then that should be considered a bug in the image.

You cannot ping s3.amazonaws.com (ICMP is not allowed).

Revision history for this message
wondra (wondra) wrote :

Then consider that some OpenStack installations will not be able to connect to it.

I used the first of the above images.
The problem is just that the custom repo is enabled: the Savanna provisioning script calls yum at some point, and yum fails to download repomd.xml for these repos.

How does one clone a repo for personal use?

Revision history for this message
wondra (wondra) wrote :

OK, something is wrong in Savanna HDP plugin itself.
I deleted the .repo files from the image and Savanna created them again, with enabled=1, whereas it is set to 0 in the image. No amount of image hacking or repo cloning will help here.
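The enabled flag mentioned here is an ordinary key in the INI-style .repo file, and yum treats a repository as enabled when the key is missing. A minimal sketch, using Python's configparser, that lists the enabled repositories in a .repo file and produces a copy with every repository disabled; the function names are hypothetical. As the comment notes, the plugin writes these files at provisioning time, so editing the copies baked into the image is not enough on its own.

```python
import configparser
import io


def _parser():
    # interpolation=None so "%" and "$basearch"-style values are left alone;
    # optionxform=str keeps the original key spelling when writing back.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str
    return parser


def enabled_repos(repo_text):
    """Return the repo IDs whose 'enabled' key is true (missing key counts as enabled)."""
    parser = _parser()
    parser.read_string(repo_text)
    return [s for s in parser.sections()
            if parser[s].getboolean("enabled", fallback=True)]


def disable_all(repo_text):
    """Return the .repo text with enabled=0 set on every repository."""
    parser = _parser()
    parser.read_string(repo_text)
    for section in parser.sections():
        parser[section]["enabled"] = "0"
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```

configparser drops comments when it writes the file back, which is acceptable for generated files like ambari.repo but worth knowing before pointing it at hand-maintained ones.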

Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Wondra,

The HDP plugin places a .repo file on the host during instance initialization. The file origin is hardcoded here:
https://github.com/openstack/savanna/blob/stable/0.3/savanna/plugins/hdp/hadoopserver.py#L22

As far as I recall, at some point during the 0.2 development timeframe it was configurable in a cluster template. Later the option was lost during one of the refactorings.

As a hack, I would suggest changing it directly in the code. You can also find how to create Ambari mirrors here:
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_reference/content/deployinghdp_appendix_chap4_3.html

Revision history for this message
wondra (wondra) wrote :

There is an interesting line in /savanna-venv/lib/python2.7/site-packages/savanna/plugins/hadoopserver.py

#TODO(jspeidel): based on image type, use correct command
rpm_cmd = 'curl -s -o /etc/yum.repos.d/ambari.repo %s' % \
        self.ambari_rpm
where:
AMBARI_RPM = 'http://s3.amazonaws.com/public-repo-1.hortonworks.com/' \
             'ambari/centos6/1.x/updates/1.2.5.17/ambari.repo'
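The snippet above interpolates a hardcoded URL into a shell string. As a sketch of the "change it right in the code" hack suggested earlier, the same curl command can take an optional mirror URL and quote it with shlex.quote; the `repo_url` override and the function name are hypothetical, not part of Savanna.

```python
import shlex

# Default copied from the hadoopserver.py excerpt above.
AMBARI_RPM = ('http://s3.amazonaws.com/public-repo-1.hortonworks.com/'
              'ambari/centos6/1.x/updates/1.2.5.17/ambari.repo')


def ambari_repo_cmd(repo_url=None, dest='/etc/yum.repos.d/ambari.repo'):
    """Build the curl command that fetches ambari.repo, optionally from a mirror."""
    url = repo_url or AMBARI_RPM
    # Quote both arguments so URLs containing '&' or '?' stay one shell word.
    return 'curl -s -o %s %s' % (shlex.quote(dest), shlex.quote(url))


print(ambari_repo_cmd())
print(ambari_repo_cmd('http://mirror.example.local/ambari/ambari.repo'))  # hypothetical mirror
```

shlex.quote leaves strings made only of shell-safe characters unchanged, so the default case produces the same command string as the original code.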

I have commented out all calls to yum. Now provisioning probably hangs on this:

[root@hdp-smallcluster-test-hdp-master-001 ~]# ambari-server setup
Using python /usr/bin/python2.6
Initializing...
Setup ambari-server
Checking SELinux...
SELinux status is 'enabled'
SELinux mode is 'permissive'
WARNING: SELinux is set to 'permissive' mode and temporarily disabled.
OK to continue [y/n] (y)? y
Ambari-server daemon is configured to run under user 'root'. Change this setting [y/n] (n)? n
Adjusting ambari-server permissions and ownership...
Checking iptables...
iptables is disabled now. please reenable later.
Checking JDK...
JDK already exists, using /var/lib/ambari-server/resources/jdk-6u31-linux-x64.bin
To install the Oracle JDK you must accept the license terms found at http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u21-license-159167.txt. Not accepting will cancel the Ambari Server setup.
Do you accept the Oracle Binary Code License Agreement [y/n] (y)? y
Installing JDK to /usr/jdk64
/usr/sbin/ambari-server.py:1584: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  print "Installation of JDK has failed: %s\n" % e.message
Installation of JDK has failed:

JDK found at /var/lib/ambari-server/resources/jdk-6u31-linux-x64.bin. Would you like to re-download the JDK [y/n] (y)? n
ERROR: Exiting with exit code 1. Reason: Downloading or installing JDK failed: 'Fatal exception: Unable to install JDK. Please remove JDK file found at /var/lib/ambari-server/resources/jdk-6u31-linux-x64.bin and re-run Ambari Server setup, exit code 1'. Exiting.
[root@hdp-smallcluster-test-hdp-master-001 ~]# rm /var/lib/ambari-server/resources/jdk-6u31-linux-x64.bin
[root@hdp-smallcluster-test-hdp-master-001 ~]# ambari-server setup
Using python /usr/bin/python2.6
Initializing...
Setup ambari-server
Checking SELinux...
SELinux status is 'enabled'
SELinux mode is 'permissive'
WARNING: SELinux is set to 'permissive' mode and temporarily disabled.
OK to continue [y/n] (y)? y
Ambari-server daemon is configured to run under user 'root'. Change this setting [y/n] (n)?
Adjusting ambari-server permissions and ownership...
Checking iptables...
iptables is disabled now. please reenable later.
Checking JDK...
Downloading JDK from http://public-repo-1.hortonworks.com/ARTIFACTS/jdk-6u31-linux-x64.bin to /var/lib/ambari-server/resources/jdk-6u31-linux-x64.bin
JDK distribution size is 85581913 bytes
jdk-6u31-linux-x64.bin... 100% (81.6 MB of 81.6 MB)
Successfully downloaded JDK distribution to /var/lib/ambari-server/resources/jdk-6u31-linux-x64.bin
To install the Oracle JDK you must accept the license terms found at http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u21-license-159167.txt. Not accepting will cancel the...

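In the first run, setup failed on a JDK installer that was already present; after the file was deleted, setup downloaded 85581913 bytes and got past the download step. A minimal sketch of a pre-check that compares a cached installer with that expected size before reusing it; a mismatch would point to an incomplete copy, while a match is a weaker guarantee than a checksum. The function name is hypothetical.

```python
import os

EXPECTED_JDK_SIZE = 85581913  # bytes, as reported by ambari-server setup above


def cached_file_looks_complete(path, expected_size=EXPECTED_JDK_SIZE):
    """Return True if `path` exists and is exactly `expected_size` bytes long."""
    try:
        return os.path.getsize(path) == expected_size
    except OSError:  # missing or unreadable file
        return False


print("%.1f MB" % (EXPECTED_JDK_SIZE / 1024.0 / 1024.0))  # 81.6 MB, as in the log
```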

Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Are you using Savanna 0.3? I've found a documentation issue here. We have different docs for 0.3 and current trunk:
http://savanna.readthedocs.org/en/0.3/userdoc/hdp_plugin.html
http://savanna.readthedocs.org/en/latest/userdoc/hdp_plugin.html

The problem is that both reference the same images. We made several changes to the image during the Icehouse timeframe, and I am afraid that not all of them are backward compatible. This means that working HDP images for Savanna 0.3 are not currently available.

The ambari-server in trunk is set up in a different way to support a pre-installed JRE. See here:
https://github.com/openstack/savanna/blob/master/savanna/plugins/hdp/hadoopserver.py#L92

Revision history for this message
wondra (wondra) wrote :

So, Ambari also uses the Hortonworks repository. It is referenced in /etc/ambari-server/conf/ambari.properties. I switched to:
jce_policy.url=ftp://ftp.su.se/pub/buildit/source-archive/jce_policy-6.zip
jdk.url=http://mrplus.googlecode.com/files/jdk-6u31-linux-x64.bin

I then manually ran the server and the agent on the master (whose configuration thread is probably still hanging on ambari-server setup), and all nodes registered. Then the other repo file, HDP.repo, got created and Ambari hung. The cluster went into the Error state.

It is referenced in the file default-cluster.template. Deleting the line...
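The two overrides above are plain key=value lines in ambari.properties. A minimal sketch for setting such keys in the file's text while keeping comments and unrelated lines; it assumes simple key=value syntax without line continuations, and the helper name and mirror URLs are hypothetical.

```python
def set_properties(text, updates):
    """Return `text` with each key in `updates` set to its new value.

    Every existing 'key=value' line for a key is rewritten in place (so a
    later duplicate cannot win with the old value); keys not present are
    appended. Comment lines starting with '#' or '!' are left untouched.
    """
    seen = set()
    out = []
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        key = key.strip()
        if sep and not key.startswith(("#", "!")) and key in updates:
            out.append("%s=%s" % (key, updates[key]))
            seen.add(key)
        else:
            out.append(line)
    out.extend("%s=%s" % (k, v) for k, v in updates.items() if k not in seen)
    return "\n".join(out) + "\n"


print(set_properties(
    "# ambari.properties excerpt\njdk.url=http://old.example/jdk.bin\n",
    {"jdk.url": "http://mirror.example.local/jdk-6u31-linux-x64.bin",
     "jce_policy.url": "http://mirror.example.local/jce_policy-6.zip"}))
```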

Revision history for this message
wondra (wondra) wrote :

OK, I am patching the setup execution instead. I take back the first paragraph of my last comment.

Ambari still doesn't install Hadoop. Is there any way to reach the web interface manually? admin/admin doesn't work.

Revision history for this message
wondra (wondra) wrote :

Update: I patched it to use --jdk_arg='--java-home=/opt/jdk64/jdk1.6.0_31/'. Ambari seems to ignore the -j <path> argument.

Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Hmm, as far as I recall, admin/admin are the default credentials for Ambari. Maybe that is a sign that the setup didn't complete successfully?

To work around the image not working with 0.3, you might try using Savanna from master. It still has the issue that mirrors cannot be configured via a template, but the image should work with it.

Revision history for this message
wondra (wondra) wrote :

I also needed to change jce_policy.url; I used plugins/hdp/versions/1_3_2/resources/ambari-config-resource.json.
And I know why it doesn't install Hadoop: Puppet installs the HDP.repo repository file. On the VM:
/var/lib/ambari-agent/data/HDP-1.3.2-8.pp contains the URL
This file is generated by the ambari-server, which has the config data in
/var/lib/ambari-server/resources/stacks/HDPLocal/1.3.2/repos/repoinfo.xml, /var/lib/ambari-server/resources/stacks/HDP/1.3.2/repos/repoinfo.xml
How do I get rid of this?
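The repository URL that ends up in HDP.repo comes from repoinfo.xml on the Ambari server. A minimal sketch that points every <baseurl> in such a file at a local mirror, assuming the usual <reposinfo>/<os>/<repo>/<baseurl> layout and a mirror that keeps the same path structure under its own root; the mirror host and function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit


def rebase_baseurls(xml_text, mirror_root):
    """Point every <baseurl> at mirror_root, keeping the original URL path.

    With mirror_root "http://mirror.local/hdp", the URL
    http://public-repo-1.hortonworks.com/HDP/centos6/... becomes
    http://mirror.local/hdp/HDP/centos6/...
    """
    root = ET.fromstring(xml_text)
    mirror = urlsplit(mirror_root)
    prefix = mirror.path.rstrip("/")
    for elem in root.iter("baseurl"):
        if not elem.text:
            continue
        old = urlsplit(elem.text.strip())
        elem.text = urlunsplit(
            (mirror.scheme, mirror.netloc, prefix + old.path, old.query, old.fragment))
    return ET.tostring(root, encoding="unicode")
```

The result would be written back over repoinfo.xml; note that ElementTree does not preserve XML comments or the XML declaration.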

Revision history for this message
wondra (wondra) wrote :

Deleting it makes things much worse. At least it fails fast :-).

Revision history for this message
wondra (wondra) wrote :

Hmm, and changing plugins/hdp/versions/1_3_2/resources/ambari-config-resource.json doesn't do anything either. Still no automatic Ambari setup. Enough for today.

Revision history for this message
Michael McCune (mimccune) wrote :

This bug is related to the issue that the HDP plugin needs access to the internet to work properly.

https://bugs.launchpad.net/sahara/+bug/1320991

Although the Ambari rpms, JDK, and Hadoop Swift integration rpms are on the image, the plugin always instructs the instance to download the software from the internet.

There is also a patch currently in the works to help fix this.

https://review.openstack.org/#/c/94460/

Changed in sahara:
status: New → Confirmed
Changed in sahara:
importance: Undecided → Medium
assignee: nobody → Michael McCune (mimccune)
milestone: none → juno-1
Revision history for this message
Sergey Lukjanov (slukjanov) wrote :
Changed in sahara:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to sahara (master)

Reviewed: https://review.openstack.org/94460
Committed: https://git.openstack.org/cgit/openstack/sahara/commit/?id=6ae7af25fadd272cab45d1ba72f7b71a7d28d717
Submitter: Jenkins
Branch: master

commit 6ae7af25fadd272cab45d1ba72f7b71a7d28d717
Author: Michael McCune <email address hidden>
Date: Tue May 20 15:43:59 2014 -0400

    Adding disconnected mode fixes to hdp plugin

    Changes
    * Adding function to determine if package epel-release is installed
    * Adding logic to determine if hadoop-swift integration package is
    installed
    * Changing behavior of provision_ambari to check for installed rpm if it
    fails to download files from the internet
    * Changing behavior of install_swift_integration to check for installed
    rpm if it fails to download rpm from the internet
    * Adding variable for epel-release name
    * Adding variable hadoop-swift local rpm
    * Removing -v -h from rpm command
    * Adding --quiet to rpm command
    * Adding a variable for yum command to install swift integration rpm
    * Adding log messages for local fallbacks

    Closes-Bug: #1320991
    Closes-Bug: #1285133
    Change-Id: I7fbbdf131fcee698684c39c7241bb919cd904e26
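The fallback described in the commit message (try the download first, then accept an RPM that is already on the image) can be sketched as follows. This is a simplified illustration, not the code from the merged change: the command runner is injected so the logic can be exercised without a real instance, and the function names are hypothetical. It relies only on `rpm -q`, which exits with status 0 when the package is installed.

```python
import logging
import subprocess

LOG = logging.getLogger(__name__)


def run_local(cmd):
    """Run a shell command locally and return its exit status."""
    return subprocess.call(cmd, shell=True)


def ensure_package(package, install_cmd, runner=run_local):
    """Try the online install first; fall back to a package already on the image.

    Returns "installed" if install_cmd succeeded, "preinstalled" if it failed
    but `rpm -q` finds the package, and raises RuntimeError otherwise.
    """
    if runner(install_cmd) == 0:
        return "installed"
    LOG.warning("Online install of %s failed, checking for a local copy", package)
    if runner("rpm -q --quiet %s" % package) == 0:
        return "preinstalled"
    raise RuntimeError("%s could not be downloaded and is not installed" % package)
```

For example, `ensure_package("epel-release", "yum -y install epel-release")` would turn the failure in the original log into a local fallback, provided the image already ships epel-release.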

Changed in sahara:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in sahara:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in sahara:
milestone: juno-1 → 2014.2