ResourceGroup count is causing increment on stack-update

Bug #1383709 reported by Ladislav Smola
Affects         Status        Importance  Assigned to    Milestone
OpenStack Heat  Invalid       Undecided   Unassigned     -
tripleo         Fix Released  High        Steven Hardy   -

Bug Description

The expected behavior of ResourceGroup count when running heat stack-update is that the number of resources should end up matching that count.

Right now it is getting incremented for me.

So if I update the stack with the same parameters I created it with, the number of resources is incremented by the count instead of staying the same.

Tags: tripleo
Revision history for this message
Steven Hardy (shardy) wrote :

I've tried to reproduce this on latest master and I can't; see my attached reproduce attempt.

Can you try the same, and verify if you see the same behavior or incremented RandomString resources?
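A minimal reproducer along these lines (a hypothetical sketch, not the attached template; the RandomString group mirrors the reproduce attempt described above) is:

```yaml
# Hypothetical minimal reproducer: a ResourceGroup of RandomString
# resources. Updating the stack with identical parameters should
# leave the number of group members unchanged.
heat_template_version: 2013-05-23

parameters:
  group_count:
    type: number
    default: 2

resources:
  random_group:
    type: OS::Heat::ResourceGroup
    properties:
      count: {get_param: group_count}
      resource_def:
        type: OS::Heat::RandomString
```

Creating this stack and then running heat stack-update with unchanged parameters should leave exactly group_count RandomString resources in place.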

Changed in heat:
status: New → Incomplete
Steven Hardy (shardy) wrote :

FWIW, in future we need much more info than this to effectively investigate a bug:

- What version of heat (you've told me instack on IRC, so Juno right?!)

- The template you used, ideally reduced to a minimal reproducer

- The related heat CLI output (e.g. in this case heat resource-list on the nested stack, or with --nested-depth set to show the nested resources)

- A cut/paste of the heat-engine.log with debugging enabled

Without this basic information, we stand very little chance of solving the bug unless you've confirmed it's easy to reproduce on latest master before submitting the bug.

Steven Hardy (shardy) wrote :

Ok, I've now seen this on a (not latest master) devtest TripleO environment.

Still need more info, so leaving incomplete for now and assigning to myself while I figure out what's happening.

Changed in heat:
assignee: nobody → Steven Hardy (shardy)
importance: Undecided → High
Steven Hardy (shardy) wrote :

Update: the ResourceGroup itself isn't causing the increment. It seems the update is triggering a replacement of the ResourceGroup, and the apparent increment is because we create the new resources before deleting the old ones:

$ heat resource-list overcloud | grep ResourceGroup
| Controller | bc3c047d-f951-4494-9209-7082f13dc335 | OS::Heat::ResourceGroup | CREATE_COMPLETE | 2014-10-23T15:19:17Z |
| Compute | d4f3b141-e8bb-4ae5-8c9d-2ec385ae4e27 | OS::Heat::ResourceGroup | CREATE_COMPLETE | 2014-10-23T15:19:18Z |

<do a stack update, without changing anything>

$ heat resource-list overcloud | grep ResourceGroup
| Compute | 086c398c-f2ef-4f5a-ba5f-7f17071b6bfc | OS::Heat::ResourceGroup | CREATE_IN_PROGRESS | 2014-10-23T16:20:50Z |
| Controller | a6212af5-7a3e-4287-a1a2-14bd93eb41d1 | OS::Heat::ResourceGroup | CREATE_IN_PROGRESS | 2014-10-23T16:20:53Z |

Both IDs are different, and the nova instances are duplicated:
$ nova list
+--------------------------------------+-------------------------------------------------------+--------+------------+-------------+---------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------------------------------------------------------+--------+------------+-------------+---------------------+
| b10821dd-3edc-446f-b817-c4c2197eccc3 | ov-56swikovh7m-0-6qig4k2hlldz-Controller-4wm3wkupmbpr | BUILD | spawning | NOSTATE | ctlplane=192.0.2.12 |
| bbb2d1a9-56ef-4ab5-91f4-912fc42bb212 | ov-72r43zeuhw-0-7dsvnuujefsw-NovaCompute-xcf4wjohc6iq | BUILD | spawning | NOSTATE | ctlplane=192.0.2.10 |
| 8d58e247-c900-4d56-8086-9968f7ac0d44 | ov-72r43zeuhw-1-o76yax6xvbjc-NovaCompute-dejavowtpml5 | BUILD | spawning | NOSTATE | ctlplane=192.0.2.11 |
| 98c867d0-f2fc-42aa-bad7-1bd291c80c6f | ov-hhgns72yzh-0-it6zqgdcqnvg-NovaCompute-rp5kcn6gqmxo | ACTIVE | - | Running | ctlplane=192.0.2.6 |
| 28f2dcd2-7a28-43ef-99f6-d6b50ebfd989 | ov-hhgns72yzh-1-eqweuh63vcrl-NovaCompute-uszpglj4svss | ACTIVE | - | Running | ctlplane=192.0.2.7 |
| e9f7d3b4-7726-4ced-8e0d-c2cb3e4d59f8 | ov-pghiisxo4ea-0-hk477xeasx5h-Controller-ioo5hg7xvlkh | ACTIVE | - | Running | ctlplane=192.0.2.5 |
+--------------------------------------+-------------------------------------------------------+--------+------------+-------------+---------------------+

Obviously, if the template hasn't changed, we shouldn't be updating anything at all.

Changed in heat:
status: Incomplete → Confirmed
milestone: none → kilo-1
Steven Hardy (shardy)
tags: added: tripleo
Steve Baker (steve-stevebaker) wrote :

bug #1381834 might be a duplicate of this, but it is about AutoScalingGroup

Shunli Zhou (shunliz) wrote :

bug #1381834's template changed: the AutoScalingGroup size changed from 11 to 8, but 10 instances were created after the stack update.

Steven Hardy (shardy) wrote :

Yeah, I'm not sure this is the same issue as #1381834 but I'll keep it in mind. The problem here is something in the TripleO usage of ResourceGroup is triggering replacement of the entire ResourceGroup, which AIUI is not quite the same problem as described for AutoScalingGroup.

Steven Hardy (shardy) wrote :

So, it appears to be the replacement of one (or more) OS::Neutron::Port resources which is causing this:

(seed)[shardy@localhost tripleo]$ cat resource_list_preupdate.txt | grep ControlVirtualIP
| ControlVirtualIP | e96070d4-aea1-4d9a-9c52-8bb5bbd9eb8a | OS::Neutron::Port | CREATE_COMPLETE | 2014-10-28T16:02:22Z |
(seed)[shardy@localhost tripleo]$ cat resource_list_postupdate.txt | grep ControlVirtualIP
| ControlVirtualIP | 8db57bf6-f1ea-4621-b5c3-9635808428d3 | OS::Neutron::Port | CREATE_COMPLETE | 2014-10-28T17:28:07Z |

Basically ControlVirtualIP gets replaced, then because it's referenced in the ResourceGroup properties, that is replaced too.

Still investigating, but both those things seem wrong - we shouldn't replace the port, and even if we did, the resource group update should recurse and update the resources, not replace the whole group.
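A simplified sketch of the pattern involved (resource and parameter names here are illustrative, not taken from the actual tripleo-heat-templates):

```yaml
resources:
  ControlVirtualIP:
    type: OS::Neutron::Port
    properties:
      network_id: {get_param: ctlplane_network}  # hypothetical parameter

  Controller:
    type: OS::Heat::ResourceGroup
    properties:
      count: {get_param: controller_count}       # hypothetical parameter
      resource_def:
        type: controller.yaml                    # illustrative nested template
        properties:
          # The group's properties reference the port; if the port is
          # replaced on update, this value changes, which can cascade
          # into replacement of the whole group.
          virtual_ip: {get_attr: [ControlVirtualIP, fixed_ips, 0, ip_address]}
```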

Steven Hardy (shardy) wrote :

ML thread started about flipping the default replacement_policy for OS::Neutron::Port:

http://lists.openstack.org/pipermail/openstack-dev/2014-October/049376.html

I suspect, but have not yet proven, that this will solve the issue. If there's some barrier to changing the default in heat, we'll have to change the property value in the tripleo-heat-templates (maybe we should do that anyway, for compatibility with older versions of heat...)

Steven Hardy (shardy) wrote :

Ok, confirmed, adding replacement_policy: AUTO to the templates solves the issue, so I'll propose that while we discuss the most sane default.
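The workaround amounts to setting that property explicitly on each port definition, along the lines of this sketch (surrounding properties are illustrative):

```yaml
resources:
  ControlVirtualIP:
    type: OS::Neutron::Port
    properties:
      # AUTO lets Heat update the existing port in place rather than
      # unconditionally replacing it on every stack-update.
      replacement_policy: AUTO
      network_id: {get_param: ctlplane_network}  # hypothetical parameter
```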

Steven Hardy (shardy)
Changed in tripleo:
assignee: nobody → Steven Hardy (shardy)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/131538

Changed in tripleo:
importance: Undecided → High
Steven Hardy (shardy) wrote :

Ok, so following the ML discussion with stevebaker, I'm proposing we do the following:

1. Get the AUTO workaround into tripleo-heat-templates to mitigate the immediate problem

2. Wait for https://blueprints.launchpad.net/heat/+spec/rich-network-prop to land, then convert to using that, so we get less undesirable behaviour in the various update patterns AUTO doesn't handle so well (ref http://lists.openstack.org/pipermail/openstack-dev/2014-October/049491.html, which contains a very nice summary of the current pros/cons of each replacement_policy setting).

Accordingly, I think we can close the heat part of this; until rich-network-prop lands, this can probably be categorized as a known issue rather than a bug.

I've associated the rich-network-prop blueprint with this bug anyway, so we have some connection between the two issues.

Steven Hardy (shardy)
Changed in heat:
status: Confirmed → Invalid
importance: High → Undecided
assignee: Steven Hardy (shardy) → nobody
milestone: kilo-1 → none
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/131538
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=9cf073075c4331ebd323768a11deda30e975d22d
Submitter: Jenkins
Branch: master

commit 9cf073075c4331ebd323768a11deda30e975d22d
Author: Steve Hardy <email address hidden>
Date: Tue Oct 28 19:22:18 2014 +0000

    Don't replace OS::Neutron::Port on update

    Due to an unusual interface to OS::Neutron::Port resources,
    it's necessary to specify replacement_policy: AUTO, or the
    resource is unconditionally replaced on every stack update.

    I've started discussion re possibly changing the default in
    Heat, but right now, we need this or we have the bad outcome
    of replacing all (!) compute and controller nodes on every
    stack-update, even if the templates are unmodified.

    Passing the AUTO value should be safe regardless of any
    potential change of default value in Heat.

    Change-Id: I6dd02ae17407f8f4c81ae418e5027f4f38ae4e9b
    Closes-Bug: #1383709

Changed in tripleo:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/133435

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/133435
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=906d7cd40bc04c8ddf36a625ea0a5f3db89ddf94
Submitter: Jenkins
Branch: master

commit 906d7cd40bc04c8ddf36a625ea0a5f3db89ddf94
Author: James Polley <email address hidden>
Date: Mon Nov 10 11:43:51 2014 +0100

    Don't replace OS::Neutron::Port on update of undercloud

    This change is congruent with I6dd02ae17407f8f4c81ae418e5027f4f38ae4e9b
    but applies to undercloud configs rather than overcloud configs.

    I've listed this as closing #1383709 even though that bug didn't talk
    about the undercloud, as this seems like it's another instance of the
    same issue seen there.

    Change-Id: I3ee80043bb455460991e78525fa4310934df4697
    Closes-Bug: #1383709

Derek Higgins (derekh)
Changed in tripleo:
status: Fix Committed → Fix Released