[DOC] Restart of sahara-engine can leave cluster in "Starting" state

Bug #1185909 reported by Matthew Farrellee
This bug affects 1 person
Affects: Sahara
Status: Fix Released
Importance: Medium
Assigned to: Michael Ionkin

Bug Description

Theme - Savanna should be tolerant of its own faults

Cluster creation runs in an eventlet green thread executing api.py:_cluster_creation_job within the savanna-api process. If the savanna-api process is restarted during cluster creation[0], the cluster creation job is not restarted.

As a result, every cluster that was "Starting" at the time of the restart stays "Starting" forever, i.e. it never transitions to "Active".

The expectation is that cluster startup is resumed once the savanna-api process is restarted.

[0] A long-lived operation, because it waits for all instances to be up and accessible
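
The expected behavior could be sketched as a startup hook that re-queues every cluster left in a non-final state. This is only an illustration under stated assumptions, not Savanna code: the `Cluster` record, `resume_interrupted_clusters`, the `start_job` callback, and the set of final states are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Assumption: states in which no provisioning work is pending.
FINAL_STATES = {"Active", "Error"}


@dataclass
class Cluster:
    """Hypothetical stand-in for a persisted cluster record."""
    name: str
    status: str


def resume_interrupted_clusters(
    clusters: Iterable[Cluster],
    start_job: Callable[[Cluster], None],
) -> List[str]:
    """Re-submit a creation job for every cluster not in a final state.

    Meant to run once at service startup. Returns the names of the
    clusters that were re-queued.
    """
    resumed = []
    for cluster in clusters:
        if cluster.status not in FINAL_STATES:
            start_job(cluster)
            resumed.append(cluster.name)
    return resumed
```

A real implementation would also need the creation job to be safe to re-run from any intermediate step, which is the hard part the comments below discuss.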

Tags: docs
Revision history for this message
Sergey Lukjanov (slukjanov) wrote :

We are working on the next phase, which adds pluggable provisioning support. Because plugins will be able to deploy management consoles, the cluster startup process will become very complicated. In general I think we should support resuming cluster creation after a savanna-api restart, but it is a very good question how that should work.

I think a blueprint should be created for this problem, describing several ways to solve it.

Revision history for this message
Sergey Lukjanov (slukjanov) wrote :

BTW it is an extremely good point and it would be really cool to solve this problem as well as the problem with allocated resource management.

Changed in savanna:
importance: Undecided → Wishlist
status: New → Confirmed
tags: added: 0.2
tags: removed: 0.2
Changed in savanna:
milestone: none → 0.2.1
tags: added: 0.2.1
tags: removed: 0.2.1
Changed in savanna:
milestone: 0.2.1 → none
Changed in savanna:
milestone: none → next
status: Confirmed → Triaged
Changed in savanna:
milestone: next → none
Revision history for this message
Andrew Lazarev (alazarev) wrote :

I think this one has a pretty simple solution: move non-Active clusters to 'Error' after some period of time. This solution is not ideal, but it is much better than leaving a cluster in 'Starting' forever.
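
A minimal sketch of that timeout idea, assuming each cluster record carries a last-update timestamp. The `ClusterRecord` type and `expire_stuck_clusters` function are hypothetical names used for illustration, not part of Sahara.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class ClusterRecord:
    """Hypothetical cluster record with a last-update timestamp."""
    name: str
    status: str
    updated_at: datetime


def expire_stuck_clusters(
    clusters: Iterable[ClusterRecord],
    now: datetime,
    timeout: timedelta,
) -> List[str]:
    """Move clusters that are neither Active nor Error to Error
    once they have gone longer than `timeout` without an update.

    Returns the names of the clusters that were changed.
    """
    expired = []
    for cluster in clusters:
        stuck = cluster.status not in ("Active", "Error")
        if stuck and now - cluster.updated_at > timeout:
            cluster.status = "Error"
            expired.append(cluster.name)
    return expired
```

Such a check would have to run periodically, and the timeout must exceed the longest legitimate creation time, otherwise healthy clusters would be marked as failed.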

Changed in sahara:
importance: Wishlist → Low
Revision history for this message
Sergey Lukjanov (slukjanov) wrote :

This bug is > 180 days without activity. We are unsetting assignee and milestone and setting status to Incomplete in order to allow its expiry in 60 days.

If the bug is still valid, then update the bug status.

Changed in sahara:
status: Triaged → Incomplete
Revision history for this message
Sergey Reshetnyak (sreshetniak) wrote :

This bug is > 60 days without activity. We are unsetting assignee and milestone and setting status to Incomplete in order to allow its expiry in 60 days.

If the bug is still valid, then update the bug status.

Revision history for this message
Luigi Toscano (ltoscano) wrote : Re: Restart of sahara-engine can leave cluster in "Starting" state

This is still true: if you restart sahara-engine during cluster creation, the creation is not resumed, whether it was in Progress, Configuring or Starting.
On the other hand, it is possible to configure cleanup_time_for_incomplete_clusters to automatically clean up clusters in non-final states, but it is not enabled by default. This bug is about the default behavior, so I think it is not solved.
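
To illustrate the "not enabled by default" point, here is a small sketch that reads the option from configuration text and treats a missing or zero value as "cleanup disabled". The `[DEFAULT]` section, the hours unit, and the meaning of 0 are assumptions to verify against the Sahara configuration reference; `cleanup_threshold` and `SAMPLE_CONF` are hypothetical names.

```python
import configparser
from datetime import timedelta
from typing import Optional

# Assumed layout: option in [DEFAULT], value in hours, 0 = disabled.
SAMPLE_CONF = """
[DEFAULT]
cleanup_time_for_incomplete_clusters = 0
"""


def cleanup_threshold(conf_text: str) -> Optional[timedelta]:
    """Return the cleanup threshold, or None when cleanup is disabled."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    hours = parser.getint(
        "DEFAULT", "cleanup_time_for_incomplete_clusters", fallback=0
    )
    # A missing option or a non-positive value means no automatic cleanup.
    return timedelta(hours=hours) if hours > 0 else None
```

Under these assumptions, an operator who wants stuck clusters removed has to set a positive value explicitly; leaving the option out keeps the behavior described in this bug.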

summary: - Restart of savanna-api can leave cluster in "Starting" state
+ Restart of sahara-engine can leave cluster in "Starting" state
Changed in sahara:
status: Incomplete → Confirmed
Revision history for this message
Vitalii Gridnev (vgridnev) wrote :

It's hard to predict what value should be used by default. I think we should document that feature well. Adding the docs tag and moving to Medium priority + newton-3.

tags: added: docs
summary: - Restart of sahara-engine can leave cluster in "Starting" state
+ [DOCS] Restart of sahara-engine can leave cluster in "Starting" state
summary: - [DOCS] Restart of sahara-engine can leave cluster in "Starting" state
+ [DOC] Restart of sahara-engine can leave cluster in "Starting" state
Changed in sahara:
importance: Low → Medium
assignee: nobody → Michael Ionkin (msionkin)
milestone: none → newton-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to sahara (master)

Fix proposed to branch: master
Review: https://review.openstack.org/340422

Changed in sahara:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to sahara (master)

Reviewed: https://review.openstack.org/340422
Committed: https://git.openstack.org/cgit/openstack/sahara/commit/?id=c90a1da596411743c8a11310112b420dbc1380ef
Submitter: Jenkins
Branch: master

commit c90a1da596411743c8a11310112b420dbc1380ef
Author: Michael Ionkin <email address hidden>
Date: Mon Jul 11 18:00:16 2016 +0300

    [DOC] Cleanup time for incomplete clusters

    Cleanup time for incomplete clusters feature documented.
    closes-bug: 1185909

    Change-Id: I785b25b30e88cf0604236d34aaf3170b990b21d3

Changed in sahara:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/sahara 5.0.0.0b3

This issue was fixed in the openstack/sahara 5.0.0.0b3 development milestone.
