[Heat] Cluster cannot be scaled

Bug #1376829 reported by Yaroslav Lobankov
This bug affects 1 person
Affects: Sahara
Status: Fix Released
Importance: High
Assigned to: Sergey Reshetnyak

Bug Description

Environment:
Devstack with Heat and Neutron

How to reproduce:
1. Create a cluster.
2. Wait for "Active" status of the cluster.
3. Try to scale the cluster.
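Step 3 can also be issued directly against the Sahara REST API instead of the dashboard. The following is a minimal sketch, assuming the v1.1 scaling call is a PUT on the cluster URL with a `resize_node_groups` body. The endpoint host and port, the node group name "dn" and the target count are hypothetical; only the tenant and cluster IDs are taken from the log in this report. The helper only builds the request and does not send it.

```python
import json


def build_scale_request(endpoint, tenant_id, cluster_id, node_group, count):
    """Return (method, url, body) for a Sahara v1.1 cluster scaling call."""
    url = "{}/v1.1/{}/clusters/{}".format(
        endpoint.rstrip("/"), tenant_id, cluster_id)
    # Resize one existing node group to the requested instance count.
    body = {"resize_node_groups": [{"name": node_group, "count": count}]}
    return "PUT", url, json.dumps(body, sort_keys=True)


method, url, body = build_scale_request(
    "http://sahara.example.com:8386",        # hypothetical endpoint
    "3a58ddb1a8ed4881a07b37a40bb0e952",      # tenant ID from the log
    "681414c2-5ebb-4d79-9ef5-fc056a79139a",  # cluster ID from the log
    "dn",                                    # hypothetical node group name
    0,                                       # hypothetical target count
)
print(method, url)
print(body)
```

Sending the request would additionally need a valid Keystone token in the `X-Auth-Token` header.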

Expected result:
The cluster can be successfully scaled.

Observed result:
The cluster failed to scale.

The problem is that, when the cluster is scaled, Heat removes the floating and fixed IPs of cluster nodes that are supposed to remain in the cluster.
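One way to confirm this symptom is to compare the node-to-IP mapping before and after the scaling operation. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, the IP addresses are made-up documentation-range values, and the mappings would in practice have to be collected from Nova or Neutron. The node names follow the naming seen in this report.

```python
def unexpectedly_released(ips_before, ips_after, nodes_to_remove):
    """Return {node: ip} for nodes that should survive scaling but lost their IP."""
    lost = {}
    for node, ip in ips_before.items():
        if node in nodes_to_remove:
            continue  # releasing the IP of a removed node is expected
        if ips_after.get(node) != ip:
            lost[node] = ip
    return lost


# Hypothetical floating IPs (192.0.2.0/24 is reserved for documentation).
before = {
    "sahara-cluster-1542150812-jt-nn-ooz-sec-nn-001": "192.0.2.10",
    "sahara-cluster-1542150812-tt-dn-001": "192.0.2.11",
    "sahara-cluster-1542150812-dn-001": "192.0.2.12",
    "sahara-cluster-1542150812-tt-001": "192.0.2.13",
}
after = {}  # every floating IP is gone after the scale-down
removed = {
    "sahara-cluster-1542150812-dn-001",
    "sahara-cluster-1542150812-tt-001",
}

lost = unexpectedly_released(before, after, removed)
for node in sorted(lost):
    print(node, lost[node])
```

A non-empty result means that nodes which were not part of the scale-down lost their addresses, which matches the "No route to host" errors in the log.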

2014-10-02 11:53:35.388 INFO sahara.cli.sahara_all [-] 172.18.78.27 - - [02/Oct/2014 11:53:35] "GET /v1.1/3a58ddb1a8ed4881a07b37a40bb0e952/clusters/681414c2-5ebb-4d79-9ef5-fc056a79139a HTTP/1.1" 200 5204 0.121090
2014-10-02 11:53:37.518 ERROR sahara.context [-] Thread 'configure-instance-sahara-cluster-1542150812-jt-nn-ooz-sec-nn-001' fails with exception: 'error: [Errno 113] No route to host'
2014-10-02 11:53:37.518 TRACE sahara.context Traceback (most recent call last):
2014-10-02 11:53:37.518 TRACE sahara.context File "/opt/stack/sahara/sahara/context.py", line 153, in _wrapper
2014-10-02 11:53:37.518 TRACE sahara.context func(*args, **kwargs)
2014-10-02 11:53:37.518 TRACE sahara.context File "/opt/stack/sahara/sahara/service/engine.py", line 135, in _configure_instance
2014-10-02 11:53:37.518 TRACE sahara.context with instance.remote() as r:
2014-10-02 11:53:37.518 TRACE sahara.context File "/opt/stack/sahara/sahara/utils/ssh_remote.py", line 322, in __enter__
2014-10-02 11:53:37.518 TRACE sahara.context _release_remote_semaphore()
2014-10-02 11:53:37.518 TRACE sahara.context File "/usr/local/lib/python2.7/dist-packages/oslo/utils/excutils.py", line 82, in __exit__
2014-10-02 11:53:37.518 TRACE sahara.context six.reraise(self.type_, self.value, self.tb)
2014-10-02 11:53:37.518 TRACE sahara.context File "/opt/stack/sahara/sahara/utils/ssh_remote.py", line 318, in __enter__
2014-10-02 11:53:37.518 TRACE sahara.context self.bulk = BulkInstanceInteropHelper(self.instance)
2014-10-02 11:53:37.518 TRACE sahara.context File "/opt/stack/sahara/sahara/utils/ssh_remote.py", line 471, in __init__
2014-10-02 11:53:37.518 TRACE sahara.context procutils.shutdown_subprocess(self.proc, _cleanup)
2014-10-02 11:53:37.518 TRACE sahara.context File "/usr/local/lib/python2.7/dist-packages/oslo/utils/excutils.py", line 82, in __exit__
2014-10-02 11:53:37.518 TRACE sahara.context six.reraise(self.type_, self.value, self.tb)
2014-10-02 11:53:37.518 TRACE sahara.context File "/opt/stack/sahara/sahara/utils/ssh_remote.py", line 468, in __init__
2014-10-02 11:53:37.518 TRACE sahara.context self._get_conn_params())
2014-10-02 11:53:37.518 TRACE sahara.context File "/opt/stack/sahara/sahara/utils/procutils.py", line 52, in run_in_subprocess
2014-10-02 11:53:37.518 TRACE sahara.context raise SubprocessException(result['exception'])
2014-10-02 11:53:37.518 TRACE sahara.context SubprocessException: error: [Errno 113] No route to host
2014-10-02 11:53:37.518 TRACE sahara.context
2014-10-02 11:53:37.584 ERROR sahara.context [-] Thread 'configure-instance-sahara-cluster-1542150812-tt-dn-001' fails with exception: 'error: [Errno 113] No route to host'
2014-10-02 11:53:37.584 TRACE sahara.context Traceback (most recent call last):
2014-10-02 11:53:37.584 TRACE sahara.context File "/opt/stack/sahara/sahara/context.py", line 153, in _wrapper
2014-10-02 11:53:37.584 TRACE sahara.context func(*args, **kwargs)
2014-10-02 11:53:37.584 TRACE sahara.context File "/opt/stack/sahara/sahara/service/engine.py", line 135, in _configure_instance
2014-10-02 11:53:37.584 TRACE sahara.context with instance.remote() as r:
2014-10-02 11:53:37.584 TRACE sahara.context File "/opt/stack/sahara/sahara/utils/ssh_remote.py", line 322, in __enter__
2014-10-02 11:53:37.584 TRACE sahara.context _release_remote_semaphore()
2014-10-02 11:53:37.584 TRACE sahara.context File "/usr/local/lib/python2.7/dist-packages/oslo/utils/excutils.py", line 82, in __exit__
2014-10-02 11:53:37.584 TRACE sahara.context six.reraise(self.type_, self.value, self.tb)
2014-10-02 11:53:37.584 TRACE sahara.context File "/opt/stack/sahara/sahara/utils/ssh_remote.py", line 318, in __enter__
2014-10-02 11:53:37.584 TRACE sahara.context self.bulk = BulkInstanceInteropHelper(self.instance)
2014-10-02 11:53:37.584 TRACE sahara.context File "/opt/stack/sahara/sahara/utils/ssh_remote.py", line 471, in __init__
2014-10-02 11:53:37.584 TRACE sahara.context procutils.shutdown_subprocess(self.proc, _cleanup)
2014-10-02 11:53:37.584 TRACE sahara.context File "/usr/local/lib/python2.7/dist-packages/oslo/utils/excutils.py", line 82, in __exit__
2014-10-02 11:53:37.584 TRACE sahara.context six.reraise(self.type_, self.value, self.tb)
2014-10-02 11:53:37.584 TRACE sahara.context File "/opt/stack/sahara/sahara/utils/ssh_remote.py", line 468, in __init__
2014-10-02 11:53:37.584 TRACE sahara.context self._get_conn_params())
2014-10-02 11:53:37.584 TRACE sahara.context File "/opt/stack/sahara/sahara/utils/procutils.py", line 52, in run_in_subprocess
2014-10-02 11:53:37.584 TRACE sahara.context raise SubprocessException(result['exception'])
2014-10-02 11:53:37.584 TRACE sahara.context SubprocessException: error: [Errno 113] No route to host
2014-10-02 11:53:37.584 TRACE sahara.context
2014-10-02 11:53:37.645 ERROR sahara.service.ops [-] Error during rollback of cluster 'sahara-cluster-1542150812' (reason: An error occurred in thread 'configure-instance-sahara-cluster-1542150812-jt-nn-ooz-sec-nn-001': error: [Errno 113] No route to host)
2014-10-02 11:53:37.645 TRACE sahara.service.ops Traceback (most recent call last):
2014-10-02 11:53:37.645 TRACE sahara.service.ops File "/opt/stack/sahara/sahara/service/ops.py", line 130, in wrapper
2014-10-02 11:53:37.645 TRACE sahara.service.ops if _rollback_cluster(cluster, ex):
2014-10-02 11:53:37.645 TRACE sahara.service.ops File "/opt/stack/sahara/sahara/service/ops.py", line 153, in _rollback_cluster
2014-10-02 11:53:37.645 TRACE sahara.service.ops return INFRA.rollback_cluster(cluster, reason)
2014-10-02 11:53:37.645 TRACE sahara.service.ops File "/opt/stack/sahara/sahara/service/heat_engine.py", line 108, in rollback_cluster
2014-10-02 11:53:37.645 TRACE sahara.service.ops cluster, rollback_count, target_count, reason)
2014-10-02 11:53:37.645 TRACE sahara.service.ops File "/opt/stack/sahara/sahara/service/heat_engine.py", line 175, in _rollback_cluster_scaling
2014-10-02 11:53:37.645 TRACE sahara.service.ops launcher.launch_instances(cluster, rollback_count)
2014-10-02 11:53:37.645 TRACE sahara.service.ops File "/opt/stack/sahara/sahara/service/heat_engine.py", line 226, in launch_instances
2014-10-02 11:53:37.645 TRACE sahara.service.ops self._configure_instances(cluster)
2014-10-02 11:53:37.645 TRACE sahara.service.ops File "/opt/stack/sahara/sahara/service/engine.py", line 130, in _configure_instances
2014-10-02 11:53:37.645 TRACE sahara.service.ops self._configure_instance, instance, hosts_file)
2014-10-02 11:53:37.645 TRACE sahara.service.ops File "/opt/stack/sahara/sahara/context.py", line 224, in __exit__
2014-10-02 11:53:37.645 TRACE sahara.service.ops self._wait()
2014-10-02 11:53:37.645 TRACE sahara.service.ops File "/opt/stack/sahara/sahara/context.py", line 217, in _wait
2014-10-02 11:53:37.645 TRACE sahara.service.ops raise ex.ThreadException(self.failed_thread, self.exc)
2014-10-02 11:53:37.645 TRACE sahara.service.ops ThreadException: An error occurred in thread 'configure-instance-sahara-cluster-1542150812-jt-nn-ooz-sec-nn-001': error: [Errno 113] No route to host
2014-10-02 11:53:37.645 TRACE sahara.service.ops
2014-10-02 11:53:37.735 INFO sahara.utils.general [-] Cluster status has been changed: id=681414c2-5ebb-4d79-9ef5-fc056a79139a, New status=Error
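
The log above names each failing configuration thread on its own ERROR line. As a rough aid for triaging similar logs, the following sketch extracts the thread name and error text from such lines. It assumes the exact "Thread '...' fails with exception: '...'" wording shown here; the function and pattern names are hypothetical and not part of Sahara.

```python
import re

# Matches the sahara.context ERROR lines shown in the log above.
THREAD_FAILURE = re.compile(
    r"ERROR sahara\.context \[-\] Thread '(?P<thread>[^']+)' "
    r"fails with exception: '(?P<error>.*)'$"
)


def failed_threads(log_lines):
    """Return a list of (thread_name, error_text) for each failed thread."""
    result = []
    for line in log_lines:
        match = THREAD_FAILURE.search(line.rstrip())
        if match:
            result.append((match.group("thread"), match.group("error")))
    return result


sample = [
    "2014-10-02 11:53:37.518 ERROR sahara.context [-] Thread "
    "'configure-instance-sahara-cluster-1542150812-jt-nn-ooz-sec-nn-001' "
    "fails with exception: 'error: [Errno 113] No route to host'",
    "2014-10-02 11:53:37.518 TRACE sahara.context Traceback (most recent call last):",
    "2014-10-02 11:53:37.584 ERROR sahara.context [-] Thread "
    "'configure-instance-sahara-cluster-1542150812-tt-dn-001' "
    "fails with exception: 'error: [Errno 113] No route to host'",
]

failures = failed_threads(sample)
for thread, error in failures:
    print(thread, "->", error)
```

Both failing threads belong to nodes that were meant to stay in the cluster, which is consistent with their IPs having been removed.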

Screenshot 1: Cluster is in "Active" status and is ready for scaling.

Screenshots 2-3: The cluster is being scaled. Nodes sahara-cluster-1542150812-dn-001 and sahara-cluster-1542150812-tt-001 are the ones to be deleted.

Screenshot 4: Nodes sahara-cluster-1542150812-dn-001 and sahara-cluster-1542150812-tt-001 were deleted. Heat also deleted the floating IPs of cluster nodes that should have been kept.

Yaroslav Lobankov (ylobankov) wrote :
tags: added: juno-rc-potential
Changed in sahara:
assignee: nobody → Sergey Reshetnyak (sreshetniak)
status: New → Triaged
importance: Undecided → High
milestone: none → kilo-1
Changed in sahara:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to sahara (master)

Reviewed: https://review.openstack.org/125900
Committed: https://git.openstack.org/cgit/openstack/sahara/commit/?id=4e9c29facbf6898047539a5a9405fd0a775ccfd7
Submitter: Jenkins
Branch: master

commit 4e9c29facbf6898047539a5a9405fd0a775ccfd7
Author: Sergey Reshetnyak <email address hidden>
Date: Fri Oct 3 10:58:47 2014 +0400

    Fix scaling with Heat and Neutron

    Closes-bug: #1376829

    Change-Id: Icbc950cc9e5f31871ea96dd1c7846fafdad444f4

Changed in sahara:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in sahara:
milestone: kilo-1 → juno-rc2
tags: removed: juno-rc-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to sahara (proposed/juno)

Fix proposed to branch: proposed/juno
Review: https://review.openstack.org/126386

OpenStack Infra (hudson-openstack) wrote : Fix merged to sahara (proposed/juno)

Reviewed: https://review.openstack.org/126386
Committed: https://git.openstack.org/cgit/openstack/sahara/commit/?id=e092be6b135f79bf35f0835681c3764db056095f
Submitter: Jenkins
Branch: proposed/juno

commit e092be6b135f79bf35f0835681c3764db056095f
Author: Sergey Reshetnyak <email address hidden>
Date: Fri Oct 3 10:58:47 2014 +0400

    Fix scaling with Heat and Neutron

    Closes-bug: #1376829

    Change-Id: Icbc950cc9e5f31871ea96dd1c7846fafdad444f4
    (cherry picked from commit 4e9c29facbf6898047539a5a9405fd0a775ccfd7)

Thierry Carrez (ttx)
Changed in sahara:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in sahara:
milestone: juno-rc2 → 2014.2
OpenStack Infra (hudson-openstack) wrote : Fix proposed to sahara (master)

Fix proposed to branch: master
Review: https://review.openstack.org/128889

OpenStack Infra (hudson-openstack) wrote : Fix merged to sahara (master)

Reviewed: https://review.openstack.org/128889
Committed: https://git.openstack.org/cgit/openstack/sahara/commit/?id=2217fb27ecf8c5b4e4c4673b5f22f0f16016b677
Submitter: Jenkins
Branch: master

commit 3630ccffb25f66e2efc9297b0ecb852f8d932363
Author: Trevor McKay <email address hidden>
Date: Wed Oct 1 17:23:29 2014 -0400

    Fix HDFS url description, and other various edits

    HDFS url description is wrong as a result of code changes. This was
    the major motivation for this CR.

    Additional changes

    * formatted for 80 characters
    * consistent use of '.' at the end of bullets
    * added mention of Spark
    * adding '.sahara' suffix is no longer necessary
    * some other minor changes

    Closes-Bug: 1376457
    Change-Id: I72134bcdf6c42911d07e65952a9a56331d896699
    (cherry picked from commit a718ec7ddf85ef2e1e17868f6e2cd05b1c2762cd)

commit ff3bf76318821336810709eb1ff4b88cf94b67c7
Author: Trevor McKay <email address hidden>
Date: Wed Oct 1 13:16:57 2014 -0400

    Remove line saying that scaling and EDP are not supported for Spark

    Closes-Bug: 1376364
    Change-Id: I82249f8b9fb932c206876c2f6652c0a0b9e0650b
    (cherry picked from commit e385e3ed02bddf4db3f0b82c800b2cc0e2c056ba)

commit 4f23cfefa18332274d88475984491facd79b85f3
Author: Trevor McKay <email address hidden>
Date: Wed Oct 1 12:34:14 2014 -0400

    Description of job config hints in new doc page is wrong

    The 'configs' field is not a dictionary, it is actually
    a list of dictionaries. Update the description.

    Closes-Bug: #1357615
    Change-Id: I540abe050f1d81e36f4b5dcca547a7e5c3514c84
    (cherry picked from commit 61be4ece04d6370086d8b5b9bea4224010ec0d15)

commit 0d94b67fca6b0c5776ddcfe0f3e5b489afe376ea
Author: Michael McCune <email address hidden>
Date: Wed Oct 1 11:25:41 2014 -0400

    Removing extraneous Swift information from Features

    Changes
    * removing repeated information from Features page for Swift integration
    * refactoring features.rst to 80 columns

    Change-Id: Ib37e4476258cc4547d4a27847c89a9611bff05bc
    Closes-Bug: #1376309
    (cherry picked from commit eb529ca4f2dd153d494c4e02dd302998b3d6f43b)

commit 9e3fbb654d3530b11d3e6c1fb652028e631e5859
Author: Trevor McKay <email address hidden>
Date: Tue Sep 30 16:08:15 2014 -0400

    Update the Elastic Data Processing (EDP) documentation page

    * Add description of MapReduce.Streaming job type
    * Add description of Spark job type
    * Add reference to advanced configuration for Swfit proxy
    * Note that .sahara suffix is added to Swift URLs automatically
    * A few minor changes

    Closes-Bug: 1374574
    Closes-Bug: 1374606
    Change-Id: Ie53888975ce436439cc808b2fdc45dff66bae1a9
    (cherry picked from commit 7973db35e61b0c2d686798cb2de50d281713b03b)

commit 360aedfb323fb888acc4745b262eb7746d14ef27
Author: Trevor McKay <email address hidden>
Date: Tue Sep 30 12:51:08 2014 -0400

    Add documentation on the EDP job engine SPI

    Closes-Bug: 1357615
    Change-Id: I57dae10da9460deb2a332025cc3a0ea37ae233ee
    (cherry picked from commit 62ba37a8c415f1c422f010c96c0d553ff788d343)

commit 9fa0c5473d29c5eeeef3a23e7...
