NameNode HA for HDP2 does not set up Oozie correctly

Bug #1470841 reported by Luigi Toscano
Affects        Status        Importance  Assigned to    Milestone
Sahara         Fix Released  High        Elise Gafford
Kilo           Fix Released  High        Elise Gafford

Bug Description

The NameNode HA feature for HDP, described here:
http://specs.openstack.org/openstack/sahara-specs/specs/kilo/hdp-plugin-enable-hdfs-ha.html
does turn a cluster into a NameNode HA configuration, but it does not update the configuration of Oozie, which still points directly at one of the two NameNodes. If it points at the standby node, job execution does not even start, failing with a misleading error:

2015-07-01 13:46:58.882 15549 WARNING sahara.service.edp.job_manager [-] Can't run job execution 437c1c6a-72e8-4b86-b036-6fa4b5657538 (reason: type Status report
 message
 description This request requires HTTP authentication. )

and Keystone reports:
2015-07-01 13:47:46.604 31419 WARNING keystone.token.controllers [-] User 0545bfa11fc444bb8782acb14f3e871e is unauthorized for tenant bd133d1e161345a69a15778cf7a580ca
2015-07-01 13:47:46.605 31419 WARNING keystone.common.wsgi [-] Authorization failed. The request you have made requires authentication. from x.y.z.t

which translates to "User admin is unauthorized for tenant services".

These errors, whose wording could be improved, seem to be a red herring. The real issue is that Oozie fails with an exception:
2015-07-01 09:50:07,693 INFO BaseJobServlet:539 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] AuthorizationException
org.apache.oozie.service.AuthorizationException: E0501: Could not perform authorization operation, Operation category READ is not supported in state standby
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
        [...]
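
The standby NameNode rejects read operations by design, which is exactly what "Operation category READ is not supported in state standby" means. Which node is currently active can be checked with "hdfs haadmin -getServiceState <serviceId>", where the service IDs come from dfs.ha.namenodes.<nameservice> in hdfs-site.xml.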

This is the cluster configuration used, detailed by node group and the number of nodes in each:

* master-ha-common (1 node)
   - AMBARI_SERVER
   - HISTORYSERVER
   - OOZIE_SERVER
   - RESOURCEMANAGER
   - SECONDARY_NAMENODE

* master-ha-nn (2 nodes)
   - NAMENODE
   - ZOOKEEPER_SERVER
   - JOURNALNODE

* master-ha-node (1 node)
   - ZOOKEEPER_SERVER
   - JOURNALNODE

* worker-ha (3 nodes)
   - DATANODE
   - HDFS_CLIENT
   - MAPREDUCE2_CLIENT
   - NODEMANAGER
   - OOZIE_CLIENT
   - PIG
   - YARN_CLIENT
   - ZOOKEEPER_CLIENT

The configuration key hdfs.nnha is set to true, as described in the documentation.
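
For reference, a minimal sketch of the relevant part of the cluster template (the HDFSHA section name and the payload shape are assumptions based on the spec linked above, not a verbatim template):

    # Sketch only; assumes the HA flag is exposed as hdfs.nnha in an
    # "HDFSHA" cluster_configs section, per the Kilo spec above.
    cluster_template = {
        "name": "my-hdp2ha",
        "plugin_name": "hdp",
        "hadoop_version": "2.0.6",
        "cluster_configs": {
            "HDFSHA": {"hdfs.nnha": True},
        },
        # node_groups omitted; see the node group layout above
    }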

I tested with a beta version of RHEL-OSP 7 (so essentially Kilo), but the relevant code has not changed in master:
openstack-sahara-common-2015.1.0-4.el7ost.noarch
openstack-sahara-engine-2015.1.0-4.el7ost.noarch
openstack-sahara-api-2015.1.0-4.el7ost.noarch

Revision history for this message
Elise Gafford (egafford) wrote :

From http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.3/bk_using_Ambari_book/content/install-ha_2x.html:

   If you are using Oozie, you need to use the Nameservice URI instead of the NameNode URI in your workflow files. For example, where the Nameservice ID is mycluster:

  <workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
      <start to="mr-node"/>
      <action name="mr-node">
          <map-reduce>
              <job-tracker>${jobTracker}</job-tracker>
              <name-node>hdfs://mycluster</name-node>
              [...]

From http://172.24.4.230:8080/#/main/hosts/my-hdp2ha-b090a511-master-ha-common-f2ebfbc6-001.novalocal/configs (which pulls from hdfs-site.xml):

  dfs.nameservices: my-hdp2ha-b090a511

From your cluster definition (which should be valid):
  info: {u'HDFS': {u'NameNode': u'hdfs://172.24.4.229:8020',
                   u'Web UI': u'http://172.24.4.229:50070'},
         u'JobFlow': {u'Oozie': u'http://172.24.4.230:11000'},
         u'MapReduce2': {u'Web UI': u'http://172.24.4.230:19888',
                         u'History Server': u'172.24.4.230:10020'},
         u'Yarn': {u'Web UI': u'http://172.24.4.230:8088',
                   u'ResourceManager': u'172.24.4.230:8050'},
         u'Ambari Console': {u'Web UI': u'http://172.24.4.230:8080'}}

From sahara/sahara/service/edp/oozie/engine.py:
        nn_path = self.get_name_node_uri(self.cluster)
        ...
        job_parameters = {
            "jobTracker": rm_path,
            "nameNode": nn_path,
            "user.name": hdfs_user,
            oozie_libpath_key: oozie_libpath,
            app_path: "%s%s" % (nn_path, path_to_workflow),
            "oozie.use.system.libpath": "true"}

From sahara/sahara/plugins/hdp/edp_engine.py:
    def get_name_node_uri(self, cluster):
        return cluster['info']['HDFS']['NameNode']

It seems that we've succeeded in setting up a highly available cluster, but we've hard-coded ourselves into using Oozie through only one of the NameNodes rather than through the nameservice: [info][HDFS][NameNode] is fixed at cluster creation time, and the HA feature never made Sahara aware of the nameservice.
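
For illustration, a minimal sketch of what a nameservice-aware override could look like (hypothetical code, not the merged patch; the actual fix, linked below, likewise directs EDP jobs to the nameservice, which defaults to the cluster's name as sent to HDP):

    # Hypothetical sketch only: the _is_nnha() helper and the HDFSHA
    # config section name are assumptions based on the Kilo spec linked
    # in the description, not quotes from the Sahara codebase.
    def _is_nnha(cluster):
        return cluster.cluster_configs.get('HDFSHA', {}).get('hdfs.nnha')

    def get_name_node_uri(self, cluster):
        if _is_nnha(cluster):
            # Return the logical nameservice (dfs.nameservices); HDFS
            # client-side failover resolves it to the active NameNode.
            # Note: the nameservice is the cluster name as sent to HDP,
            # which may carry a generated suffix (my-hdp2ha-b090a511 above).
            return 'hdfs://%s' % cluster.name
        return cluster['info']['HDFS']['NameNode']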

Changed in sahara:
assignee: nobody → Ethan Gafford (egafford)
Changed in sahara:
status: New → In Progress
Changed in sahara:
milestone: none → liberty-2
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to sahara (master)

Reviewed: https://review.openstack.org/198895
Committed: https://git.openstack.org/cgit/openstack/sahara/commit/?id=aced37e13f1357a44b7d6f8367a4f49f559ad6c7
Submitter: Jenkins
Branch: master

commit aced37e13f1357a44b7d6f8367a4f49f559ad6c7
Author: Ethan Gafford <email address hidden>
Date: Mon Jul 6 17:13:17 2015 -0400

    [HDP] Nameservice awareness for NNHA case

    With the addition of NNHA in Kilo, Oozie continued to be pointed at only
    one namenode's IP. This change directs EDP jobs to the nameservice,
    which defaults to the cluster's name as sent to HDP.

    As it is intended primarily for backport, to allow Sahara's EDP to
    function in the NNHA case, this change takes a minimal-path approach to
    resolving this issue, which can be supplemented or replaced by a more
    robust solution for nameservice configuration and load balancing for all
    components as time permits.

    Change-Id: Icc937fcb534427f752d6db788ac10a934dfbfd4c
    Closes-bug: 1470841

Changed in sahara:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to sahara (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/200167

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to sahara (stable/kilo)

Reviewed: https://review.openstack.org/200167
Committed: https://git.openstack.org/cgit/openstack/sahara/commit/?id=26f3b091ddcf50a5bdee06017862349a514066e7
Submitter: Jenkins
Branch: stable/kilo

commit 26f3b091ddcf50a5bdee06017862349a514066e7
Author: Ethan Gafford <email address hidden>
Date: Mon Jul 6 17:13:17 2015 -0400

    [HDP] Nameservice awareness for NNHA case

    With the addition of NNHA in Kilo, Oozie continued to be pointed at only
    one namenode's IP. This change directs EDP jobs to the nameservice,
    which defaults to the cluster's name as sent to HDP.

    As it is intended primarily for backport, to allow Sahara's EDP to
    function in the NNHA case, this change takes a minimal-path approach to
    resolving this issue, which can be supplemented or replaced by a more
    robust solution for nameservice configuration and load balancing for all
    components as time permits.

    Closes-bug: 1470841
    (cherry picked from commit aced37e13f1357a44b7d6f8367a4f49f559ad6c7)

    Change-Id: I14ccd5b32c68d0f989b12c4485598094e8841474

Thierry Carrez (ttx)
Changed in sahara:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in sahara:
milestone: liberty-2 → 3.0.0