Can't ignore updates to OS::Nova::Server

Bug #1539541 reported by Steven Hardy
Affects         Status        Importance  Assigned to   Milestone
OpenStack Heat  Fix Released  High        huangtianhua
tripleo         Fix Released  Undecided   Steve Baker

Bug Description

When managing large groups of OS::Nova::Server resources inside, e.g., a ResourceGroup, you may want to change a property such that it only affects new instances of the resource.

Examples:
 - adding node-specific data to a mapping written via user_data
 - changing the image to point to a new version; existing nodes are updated in-place

Currently we have image_update_policy, to which we could add an IGNORE option to cater for the latter example.

For the first example, we need a user_data_update_policy with a similar IGNORE option.

The use-case for this is allowing non-destructive stack updates for TripleO, but it's a general problem for maintaining large groups of servers over a long period of time, where a forced replacement or even a rolling update is not desired.
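The behaviour requested above can be illustrated with a small toy model. This is only a sketch, not Heat's implementation: the `Server` dataclass, the `update_group` helper, the node names and the `generation` counter are all hypothetical, and `IGNORE` is the option this report proposes. It shows a group update where existing members keep the user_data they booted with, while members added by scaling out get the new value.

```python
from dataclasses import dataclass

REPLACE = "REPLACE"  # replace the server when user_data changes
IGNORE = "IGNORE"    # option proposed in this report: leave existing servers alone


@dataclass
class Server:
    name: str
    user_data: str       # user_data the server was booted with
    generation: int = 1  # bumped each time the server is recreated


def update_group(servers, new_user_data, new_count, policy):
    """Toy model of updating a group of servers; not Heat's actual code."""
    updated = []
    for server in servers[:new_count]:
        if server.user_data != new_user_data and policy == REPLACE:
            # Destructive path: the node is thrown away and recreated.
            updated.append(Server(server.name, new_user_data, server.generation + 1))
        else:
            # IGNORE (or no change): the running node is left untouched.
            updated.append(server)
    # Members added by scaling out always boot with the new user_data.
    for index in range(len(servers), new_count):
        updated.append(Server(f"node-{index}", new_user_data))
    return updated


group = [Server(f"node-{i}", "v1") for i in range(3)]
ignored = update_group(group, "v2", 4, IGNORE)
replaced = update_group(group, "v2", 4, REPLACE)
print([(s.name, s.user_data, s.generation) for s in ignored])
print([(s.name, s.user_data, s.generation) for s in replaced])
```

With IGNORE, only the newly added fourth node carries the new user_data; with REPLACE, the three existing nodes are recreated, which is the destructive update the report wants to avoid.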

Tags: tripleo
Steven Hardy (shardy)
Changed in heat:
assignee: nobody → Steven Hardy (shardy)
Steven Hardy (shardy)
Changed in heat:
status: New → Triaged
importance: Undecided → High
milestone: none → mitaka-3
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/274101

Changed in heat:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/274149

Zane Bitter (zaneb) wrote :

This problem is a consequence of two (IMHO) incorrect design decisions:

1) Not having separate Launch Configs for Autoscaling groups; and
2) Allowing ResourceGroup to exist

The good news is that Senlin corrects these mistakes.

The bad news is that apparently we are going to hack around it in the short term by adding non-local state that completely changes how Heat behaves in ways that increasingly remind one of http://eev.ee/blog/2012/04/09/php-a-fractal-of-bad-design/#philosophy (search for "action at a distance").

Steven Hardy (shardy) wrote :

@Zane: I can appreciate your viewpoint, but given that ResourceGroup does exist, what would you suggest?

This is a blocker bug for TripleO because it's impossible to ever change anything in user_data for existing deployments, or you destroy every.single.node, which is invariably *never* what the operator actually wants.

What I have proposed is a simple and low-risk workaround for folks getting bitten by the incorrect design decisions you reference, which I see as a reasonable interim fix while we get to the point where folks could feasibly migrate to an alternative solution like Senlin.

AFAICT Senlin isn't a magic bullet here. Firstly, AIUI it's not really ready for production usage; you have to roll out an entirely new service and modify your templates, and even then I can't see how we'd migrate anything from existing ASG/RG resources anyway, so you're back to square one, where the only option is to redeploy the world.

tags: added: tripleo
Steve Baker (steve-stevebaker) wrote :

I'm fine with a user_data_update_policy, it's a perfectly reasonable thing to put in the template author's control.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (master)

Change abandoned by Steven Hardy (<email address hidden>) on branch: master
Review: https://review.openstack.org/274101
Reason: Heh, this has been described as pure-evil by zaneb and stevebaker doesn't like it either, so I'm going to abandon it in favour of one of their proposed alternatives.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Steven Hardy (<email address hidden>) on branch: master
Review: https://review.openstack.org/274149

Marios Andreou (marios-b) wrote :

Hi all, I was wondering what the status is here. I see the two reviews by shardy at https://review.openstack.org/274101 and https://review.openstack.org/274149 have been abandoned (though sbaker has restored the dependent latter review).

From those reviews and the discussion above I'm led to believe there is some debate around the approach being taken(!). I'm mainly interested in how we move forward. This bug blocks tripleo upgrades because of the user_data example shardy gives in the description (because we do this https://review.openstack.org/#/c/220057/ ).

Do you think it is possible to change shardy's existing reviews into something palatable to everyone? I can't see much actual discussion on the reviews (at least not of the sort to justify the abandonment) so given the discussion happened on irc/wherever else, I'm not sure how far the differing viewpoints diverge.

Assuming there is an alternative proposal, can you please include a link here? Thanks for any info you can provide!

Steven Hardy (shardy) wrote :

Hi Marios - I debated this with Zane and Steve Baker, and was under the impression Steve was going to propose an alternative approach, and that my config-file approach was being rejected, hence abandoning the reviews.

My understanding is that Steve had an idea about a DiB element which would create the user, including retrieving the SSH key from the nova metadata (such as is done via cloud-init with the userdata approach).

I'm fine with doing that, but don't have bandwidth to work on that alternate solution, so I've assigned this to stevebaker.

I do think this is a heat issue though: we removed the heat-admin user from our internal user-data, admittedly with a deprecation period, but we failed to provide any migration path for those relying on the feature.

Changed in heat:
assignee: Steven Hardy (shardy) → nobody
assignee: nobody → Steve Baker (steve-stevebaker)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/283311

OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (master)

Change abandoned by Steve Baker (<email address hidden>) on branch: master
Review: https://review.openstack.org/283311
Reason: This is likely better done by the subclass building its own modified properties_schema

Changed in tripleo:
status: New → In Progress
assignee: nobody → Steve Baker (steve-stevebaker)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to instack-undercloud (master)

Fix proposed to branch: master
Review: https://review.openstack.org/283379

Marios Andreou (marios-b) wrote :

sbaker: for clarity, if we were to land https://review.openstack.org/#/c/283379/ "Override OS::Nova::Server for user_data updates" into instack-undercloud, which makes Server.USER_DATA update_allowed, would that fix the user_data issue by itself? I understand you're also going to try to land https://review.openstack.org/#/c/274149/3 "Add user_data_update_policy property to OS::Nova::Server" for Mitaka in heat, but AFAICS we can just use the former patch at #/c/283379, which fixes the USER_DATA issue itself.

WRT using the subclass of OS::Nova::Server in #/c/283379: AFAICS this is then always aliased via def resource_mapping(): return {'OS::Nova::Server': ServerUpdateAllowed}. If we have an existing overcloud with 'normal' OS::Nova::Server resources and then update after adding this to the undercloud, will the existing servers survive? (I'll add a note to the review too.)

thanks again, marios
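For readers unfamiliar with the override mechanism discussed in the comment above, here is a minimal self-contained sketch. It does not import Heat: the `Server` class is a stand-in with a deliberately simplified `properties_schema` (a plain dict rather than Heat's schema objects), and `needs_replacement` is a hypothetical helper. Only the shape of `resource_mapping()` and the `ServerUpdateAllowed` name come from the thread.

```python
import copy


class Server:
    """Stand-in for Heat's OS::Nova::Server resource class (not the real one)."""

    # Simplified schema: property name -> whether an in-place update is allowed.
    properties_schema = {
        "user_data": {"update_allowed": False},
        "flavor": {"update_allowed": True},
    }

    @classmethod
    def needs_replacement(cls, changed_properties):
        # A change to any property that cannot be updated in place
        # forces the resource to be replaced.
        return any(
            not cls.properties_schema[name]["update_allowed"]
            for name in changed_properties
        )


class ServerUpdateAllowed(Server):
    """Subclass that builds its own modified properties_schema."""

    # Deep copy so the parent's schema is left untouched.
    properties_schema = copy.deepcopy(Server.properties_schema)
    properties_schema["user_data"]["update_allowed"] = True


def resource_mapping():
    # Same shape as the mapping quoted in the comment: the standard type
    # name now resolves to the subclass.
    return {"OS::Nova::Server": ServerUpdateAllowed}


server_class = resource_mapping()["OS::Nova::Server"]
print(Server.needs_replacement(["user_data"]))        # stock stand-in
print(server_class.needs_replacement(["user_data"]))  # overridden class
```

In this model a user_data change forces replacement for the stock class but not for the class returned by the mapping. Whether servers created under the original class survive the switch in a real deployment is exactly the open question raised above, and this sketch does not answer it.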

OpenStack Infra (hudson-openstack) wrote : Change abandoned on instack-undercloud (master)

Change abandoned by Steve Baker (<email address hidden>) on branch: master
Review: https://review.openstack.org/283379
Reason: See https://review.openstack.org/#/c/283832/

Steve Baker (steve-stevebaker) wrote :

I'm proposing that https://review.openstack.org/#/c/283832/ be the fix for stable/liberty tripleo, and that tripleo-heat-templates specifies a user_data_update_policy when a heat release contains this change https://review.openstack.org/#/c/274149/

Changed in heat:
milestone: mitaka-3 → mitaka-rc1
Changed in tripleo:
status: In Progress → Fix Released
Changed in heat:
assignee: Steve Baker (steve-stevebaker) → huangtianhua (huangtianhua)
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/274149
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=d4188127a14686f6d9844180b977cf5fa05aa024
Submitter: Jenkins
Branch: master

commit d4188127a14686f6d9844180b977cf5fa05aa024
Author: Steve Baker <email address hidden>
Date: Wed Feb 24 16:25:16 2016 +1300

    Add user_data_update_policy property to OS::Nova::Server

    This may be set to either 'REPLACE' (default) or 'IGNORE'.
    This allows template authors to choose the desired behaviour when the
    user_data property is changed.

    Co-Authored-By: Steve Hardy <email address hidden>

    Change-Id: I3239c7252a2c329330283b86181abd52aee9e967
    Closes-Bug: #1539541
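As a rough illustration of how a template author might opt in to the behaviour described in the commit message, the sketch below builds a template as a Python dict and prints it as JSON (which should also be readable as YAML). The resource name, image, flavor and user_data script are placeholders, and the template has not been validated against a Heat deployment; only the `user_data_update_policy` property and its 'REPLACE'/'IGNORE' values come from the commit.

```python
import json

# Hypothetical template: resource name, image, flavor and script are placeholders.
template = {
    "heat_template_version": "2016-04-08",
    "resources": {
        "server": {
            "type": "OS::Nova::Server",
            "properties": {
                "image": "example-image",
                "flavor": "example-flavor",
                "user_data": "#!/bin/sh\necho configured\n",
                # Per the commit message: 'REPLACE' is the default;
                # 'IGNORE' keeps the existing server when user_data changes.
                "user_data_update_policy": "IGNORE",
            },
        }
    },
}

print(json.dumps(template, indent=2))
```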

Changed in heat:
status: In Progress → Fix Released
Marios Andreou (marios-b) wrote :

Update - the landed fix above worked fine for the user-data replacement issue but as discussed at https://bugzilla.redhat.com/show_bug.cgi?id=1314429 the problem persists because of updates to other properties of the OS::Nova::Server resource.

Steve Baker has a new review at https://review.openstack.org/#/c/288273/ "Prevent any property change from replacing OS::Nova::Server" to prevent any property update from replacing the server (for now).

Thierry Carrez (ttx) wrote : Fix included in openstack/heat 6.0.0.0rc1

This issue was fixed in the openstack/heat 6.0.0.0rc1 release candidate.
