Scale down cluster error (cdh5.5)

Bug #1597701 reported by yacine
Affects: Sahara
Status: Fix Released
Importance: High
Assigned to: Vitalii Gridnev
Milestone: newton-3

Bug Description

Steps:

1) Create a CDH 5.5 cluster (1 NameNode, 1 Secondary NameNode, 3 DataNodes)
2) Scale it up (add 1 slave)
3) Try to scale it down (remove 1 slave)
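
For reference, steps 2 and 3 correspond to two cluster scale requests. The sketch below only builds the request bodies, in the `resize_node_groups` shape used by the Sahara cluster scaling API. The node group name "worker" and the assumption that the slave group is the one holding the 3 DataNodes are illustrative, not taken from the cluster in this report.

```python
import json


def resize_request(node_group_name, count):
    """Build a scale request body that resizes one existing node group."""
    return {"resize_node_groups": [{"name": node_group_name, "count": count}]}


# "worker" is an assumed node group name; the report only says "slave".
scale_up = resize_request("worker", 4)    # step 2: 3 -> 4 slaves
scale_down = resize_request("worker", 3)  # step 3: 4 -> 3 slaves

print(json.dumps(scale_up))
print(json.dumps(scale_down))
```

The second request is the one that triggers the decommission path shown in the traceback.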

Result:

2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [req-88e226aa-6939-4f9c-bbf0-58cc3e8bb6f1 4891080eb4294707a76cb37fce2d60cb ca3109264b23446283933a1bfc7d4d67 - - -] [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] Error during operating on cluster (reason: CM API error: HTTP Error 400: Bad Request
Error ID: 4bd38f4e-e4d0-42c3-b28a-9b930b8cb716)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] Traceback (most recent call last):
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/service/ops.py", line 192, in wrapper
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] f(cluster_id, *args, **kwds)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/service/ops.py", line 332, in _provision_scaled_cluster
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] plugin.decommission_nodes(cluster, instances_to_delete)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/plugins/cdh/plugin.py", line 63, in decommission_nodes
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] cluster.hadoop_version).decommission_nodes(cluster, instances)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/plugins/cdh/abstractversionhandler.py", line 114, in decommission_nodes
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] self.deploy.decommission_cluster(cluster, instances)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/plugins/cdh/v5_5_0/deploy.py", line 127, in decommission_cluster
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] CU.delete_instances(cluster, instances)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/utils/cluster_progress_ops.py", line 139, in handler
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] add_fail_event(instance, e)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] self.force_reraise()
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] six.reraise(self.type_, self.value, self.tb)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/utils/cluster_progress_ops.py", line 136, in handler
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] value = func(*args, **kwargs)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/plugins/cdh/cloudera_utils.py", line 105, in delete_instances
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] cm_cluster.remove_host(host.hostId)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/plugins/cdh/client/clusters.py", line 166, in remove_host
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] return self._delete("hosts/" + hostId, types.ApiHostRef, api_version=3)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/plugins/cdh/client/types.py", line 383, in _delete
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] params, api_version)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/plugins/cdh/client/types.py", line 411, in _call
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] api_version)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/plugins/cdh/client/types.py", line 150, in call
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] ret = method(path, params=params)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/plugins/cdh/client/resource.py", line 137, in delete
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] return self.invoke("DELETE", relpath, params)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/plugins/cdh/client/resource.py", line 79, in invoke
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] headers=headers)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] File "/home/sahara/sahara/.venv/local/lib/python2.7/site-packages/sahara/plugins/cdh/client/http_client.py", line 135, in execute
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] raise self._exc_class(message)
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] CMApiException: CM API error: HTTP Error 400: Bad Request
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26] Error ID: 4bd38f4e-e4d0-42c3-b28a-9b930b8cb716
2016-06-30 10:37:59.463 9837 ERROR sahara.service.ops [instance: none, cluster: 64bd71a4-4b8b-4ff5-a33b-16bcfcbf1f26]

Changed in sahara:
status: New → Triaged
importance: Undecided → High
milestone: none → newton-3
Changed in sahara:
assignee: nobody → Vitaly Gridnev (vgridnev)
Changed in sahara:
status: Triaged → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to sahara (master)

Fix proposed to branch: master
Review: https://review.openstack.org/342266

Changed in sahara:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to sahara (master)

Reviewed: https://review.openstack.org/342266
Committed: https://git.openstack.org/cgit/openstack/sahara/commit/?id=ea44774c506232051fe7d2794fa5e09d22886914
Submitter: Jenkins
Branch: master

commit ea44774c506232051fe7d2794fa5e09d22886914
Author: Vitaly Gridnev <email address hidden>
Date: Thu Jul 14 19:14:52 2016 +0300

    improved scaling for cdh plugin

    this fixes issues with decommissioning nodes,
    when gateway nodes are included. additionally,
    restarting of stale services are required to
    for redeployment of configs.

    Change-Id: I9d439936c0e2f851054735f0defba8efc592c84d
    Closes-bug: 1597701

Changed in sahara:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/sahara 5.0.0.0b3

This issue was fixed in the openstack/sahara 5.0.0.0b3 development milestone.
