Any error in a Heat output persists and prevents the use/update of the overcloud.

Bug #1712280 reported by Sofer Athlan-Guyot
This bug affects 2 people
Affects         Status         Importance  Assigned to  Milestone
OpenStack Heat  Fix Released   High        Zane Bitter
  Pike          Fix Committed  High        Zane Bitter
  Queens        Fix Released   High        Zane Bitter
tripleo         Fix Released   Medium      Unassigned

Bug Description

Hi,

Summary: any failure in an output persists and prevents further use of the overcloud.

Longer version:

During testing of the update, one of the patches had a typo in it:

  https://review.openstack.org/#/c/486567/32..34/docker/services/pacemaker/haproxy.yaml

DockerHaproxyImage instead of DockerHAProxyImage

It made the update fail:

    The Parameter (DockerHaproxyImage) was not provided.

Then we updated the template locally with the right parameter and even added the DockerHaproxyImage definition to the docker.yaml image definition file.

The problem is that it was impossible to update the stack, because the output error was remembered and made the deploy command fail.

Actually, even this command fails:

    $ openstack stack output show overcloud --all > output.dump
    ERROR: Error in 38 output role_data: The Parameter (DockerHaproxyImage) was not provided.

Doing a deploy (with corrected templates):

HTTPInternalServerError: ERROR: Error in 38 output role_data: The Parameter (DockerHAproxyImage) was not provided.
clean_up DeployOvercloud: ERROR: Error in 38 output role_data: The Parameter (DockerHAproxyImage) was not provided.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/osc_lib/shell.py", line 134, in run
    ret_val = super(OpenStackShell, self).run(argv)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 279, in run
    result = self.run_subcommand(remainder)
  File "/usr/lib/python2.7/site-packages/osc_lib/shell.py", line 169, in run_subcommand
    ret_value = super(OpenStackShell, self).run_subcommand(argv)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 400, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python2.7/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python2.7/site-packages/cliff/command.py", line 137, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 891, in take_action
    stack = utils.get_stack(self.orchestration_client, parsed_args.stack)
  File "/usr/lib/python2.7/site-packages/tripleoclient/utils.py", line 359, in get_stack
    stack = orchestration_client.stacks.get(stack_name)
  File "/usr/lib/python2.7/site-packages/heatclient/v1/stacks.py", line 280, in get
    resp = self.client.get('/stacks/%s' % stack_id, **kwargs)
  File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 288, in get
    return self.request(url, 'GET', **kwargs)
  File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 317, in request
    raise exc.from_response(resp)
HTTPInternalServerError: ERROR: Error in 38 output role_data: The Parameter (DockerHaproxyImage) was not provided.

The get_stack method in utils (/usr/lib/python2.7/site-packages/tripleoclient/utils.py, line 359) calls orchestration with output validation, which makes this fail.

I tried passing resolve_outputs=True to the get method (stacks.py) from get_stack (utils.py), but it just failed at a later point. I'm assuming that get_stack is not the only caller of get (stacks.py), and that at some point it fails before the output is changed.

So we were stuck and we had to delete and recreate the stack.
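
A minimal sketch of the get_stack change attempted above. It assumes that skipping output resolution means calling heatclient's stacks.get() with resolve_outputs=False, and that the installed heatclient accepts that keyword. FakeStackManager and OutputResolutionError are hypothetical stand-ins so the example runs without an undercloud; the status value is invented for illustration.

```python
class OutputResolutionError(Exception):
    """Hypothetical stand-in for the HTTP 500 that Heat returned here."""


class FakeStackManager:
    """Hypothetical stand-in for heatclient's stack manager."""

    def get(self, stack_id, resolve_outputs=True):
        # Mimic the reported behaviour: resolving outputs trips over the
        # broken output, while skipping resolution succeeds.
        if resolve_outputs:
            raise OutputResolutionError(
                "Error in 38 output role_data: "
                "The Parameter (DockerHaproxyImage) was not provided.")
        # Illustrative status only, not taken from the report.
        return {"stack_name": stack_id, "stack_status": "UPDATE_FAILED"}


def get_stack(stacks, stack_name, resolve_outputs=False):
    """Fetch a stack, skipping output resolution unless explicitly asked."""
    return stacks.get(stack_name, resolve_outputs=resolve_outputs)


stack = get_stack(FakeStackManager(), "overcloud")
print(stack["stack_name"], stack["stack_status"])
```

As the report notes, changing this one call was not sufficient on its own, since other callers still resolved outputs.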

Revision history for this message
Steven Hardy (shardy) wrote :

I reproduced this like:

(undercloud) [stack@undercloud tripleo-heat-templates]$ git diff
diff --git a/puppet/services/keystone.yaml b/puppet/services/keystone.yaml
index 218ba74..5f3242d 100644
--- a/puppet/services/keystone.yaml
+++ b/puppet/services/keystone.yaml
@@ -369,6 +369,7 @@ outputs:
             keystone::cron::token_flush::maxdelay: {get_param: KeystoneCronTokenFlushMaxDelay}
             keystone::cron::token_flush::destination: {get_param: KeystoneCronTokenFlushDestination}
             keystone::cron::token_flush::user: {get_param: KeystoneCronTokenFlushUser}
+            shtest: {get_param: NoexistParamSHDEBUG}
           -
             if:
             - keystone_ldap_domain_enabled
(undercloud) [stack@undercloud tripleo-heat-templates]$ openstack stack show overcloud
ERROR: Error in 50 output role_data: The Parameter (NoexistParamSHDEBUG) was not provided.

However it's possible to work around it via:

(undercloud) [stack@undercloud tripleo-heat-templates]$ openstack stack show overcloud --no-resolve-outputs | head -n 20
+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------...

Changed in heat:
status: New → Triaged
importance: Undecided → Medium
milestone: none → queens-1
Revision history for this message
Steven Hardy (shardy) wrote :

My previous comment was truncated. I think we probably need two changes:

1. In tripleoclient change all get_stack calls to not resolve outputs, as I don't think we actually need them? This should improve performance and reduce the overhead of the polling too I think?

2. In heat, have a fallback that catches any error when resolving outputs is enabled, and re-tries with resolving outputs disabled, so stack show works, but in a degraded mode with an error in the outputs section?
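
Option 2 could look roughly like the sketch below. This is not Heat code: resolve_outputs(), show_stack(), OutputError and the dictionary layout are all hypothetical names chosen to illustrate the idea of degrading to an error in the outputs section instead of failing the whole request.

```python
class OutputError(Exception):
    """Hypothetical error raised when an output cannot be resolved."""


def resolve_outputs(outputs, params):
    """Resolve each output by looking up the parameter it refers to."""
    resolved = {}
    for name, param in outputs.items():
        if param not in params:
            raise OutputError(
                "Error in output %s: The Parameter (%s) was not provided."
                % (name, param))
        resolved[name] = params[param]
    return resolved


def show_stack(stack, resolve=True):
    """Build the stack view; never let a bad output abort the request."""
    view = {"stack_name": stack["name"]}
    if not resolve:
        return view
    try:
        view["outputs"] = resolve_outputs(stack["outputs"], stack["params"])
    except Exception as exc:
        # Degraded mode: report the failure in the outputs section only.
        view["outputs"] = None
        view["outputs_error"] = str(exc)
    return view


broken = {
    "name": "overcloud",
    "outputs": {"role_data": "DockerHaproxyImage"},
    "params": {"DockerHAProxyImage": "haproxy:latest"},
}
view = show_stack(broken)
print(view["outputs_error"])
```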

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

My vote goes for #1.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/496196

Changed in heat:
assignee: nobody → Steven Hardy (shardy)
status: Triaged → In Progress
Revision history for this message
Steven Hardy (shardy) wrote :

I pushed a WIP patch to heat showing how we might do (2)

Changed in tripleo:
status: Confirmed → Triaged
importance: High → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/506678

Changed in heat:
assignee: Steven Hardy (shardy) → Thomas Herve (therve)
Changed in heat:
assignee: Thomas Herve (therve) → Zane Bitter (zaneb)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/507249

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (master)

Change abandoned by Steven Hardy (<email address hidden>) on branch: master
Review: https://review.openstack.org/496196
Reason: https://review.openstack.org/#/c/506678/ is the preferred alternative

Changed in tripleo:
milestone: queens-1 → queens-2
Zane Bitter (zaneb)
tags: added: pike-backport-potential
Changed in heat:
importance: Medium → High
Rico Lin (rico-lin)
Changed in heat:
milestone: queens-1 → queens-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/506678
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=4f4932c7900ae0e88e9a44f06b952c882474729e
Submitter: Zuul
Branch: master

commit 4f4932c7900ae0e88e9a44f06b952c882474729e
Author: Thomas Herve <email address hidden>
Date: Fri Sep 22 16:44:51 2017 +0200

    Defer exceptions in calculating node_data()

    When generating the node_data() for a resource, catch and store any
    exceptions (other than InvalidTemplateAttribute) encountered while
    getting attributes. Re-raise the exception at the point where we try to
    read the attribute value, including where we try to serialise the
    NodeData object to store in the database.

    In convergence, we generate and immediately serialise the NodeData, so
    this should result in no substantial change in behaviour there.

    In other situations (e.g. when we're just loading the data to show the
    stack), this prevents an error in attribute calculation from aborting
    the whole operation. The exception will still be raised if (and only if)
    the erroneous attribute is accessed, but may be handled more
    appropriately. For example, errors in calculating output values are
    handled by reporting an error only for that particular output.

    Change-Id: Idc97aee87405cc13e83be3373078b52e725850ea
    Co-Authored-By: Zane Bitter <email address hidden>
    Closes-Bug: #1712280
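
The pattern the commit message describes (store the exception, re-raise it only when the value is read or serialised) can be sketched as follows. This is an illustration rather than Heat's NodeData implementation: the DeferredAttributes class and its method names are hypothetical, and the special handling of InvalidTemplateAttribute is left out.

```python
class DeferredAttributes:
    """Holds attribute values, or the exception raised while computing them."""

    def __init__(self, getters):
        self._values = {}
        for name, getter in getters.items():
            try:
                self._values[name] = getter()
            except Exception as exc:
                # Defer: keep the exception in place of the value.
                self._values[name] = exc

    def attribute(self, name):
        """Return the value, re-raising any stored exception on access."""
        value = self._values[name]
        if isinstance(value, Exception):
            raise value
        return value

    def as_dict(self):
        """Serialise all attributes; a stored exception surfaces here too."""
        return {name: self.attribute(name) for name in self._values}


def broken_getter():
    raise ValueError("The Parameter (DockerHaproxyImage) was not provided.")


data = DeferredAttributes({
    "ip_address": lambda: "192.0.2.10",
    "role_data": broken_getter,
})
print(data.attribute("ip_address"))
```

Building the object never fails, so a caller that only reads ip_address is unaffected by the broken role_data getter.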

Changed in heat:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/515454

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/pike)

Reviewed: https://review.openstack.org/515454
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=31175a5641035abeec58c3f135ad09d3f231ac41
Submitter: Zuul
Branch: stable/pike

commit 31175a5641035abeec58c3f135ad09d3f231ac41
Author: Thomas Herve <email address hidden>
Date: Fri Sep 22 16:44:51 2017 +0200

    Defer exceptions in calculating node_data()

    When generating the node_data() for a resource, catch and store any
    exceptions (other than InvalidTemplateAttribute) encountered while
    getting attributes. Re-raise the exception at the point where we try to
    read the attribute value, including where we try to serialise the
    NodeData object to store in the database.

    In convergence, we generate and immediately serialise the NodeData, so
    this should result in no substantial change in behaviour there.

    In other situations (e.g. when we're just loading the data to show the
    stack), this prevents an error in attribute calculation from aborting
    the whole operation. The exception will still be raised if (and only if)
    the erroneous attribute is accessed, but may be handled more
    appropriately. For example, errors in calculating output values are
    handled by reporting an error only for that particular output.

    Change-Id: Idc97aee87405cc13e83be3373078b52e725850ea
    Co-Authored-By: Zane Bitter <email address hidden>
    Closes-Bug: #1712280
    (cherry picked from commit 4f4932c7900ae0e88e9a44f06b952c882474729e)

Changed in tripleo:
milestone: queens-2 → queens-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/507249
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=af0feeb18a4f5fb2c20fffb6d85617d1775e5844
Submitter: Zuul
Branch: master

commit af0feeb18a4f5fb2c20fffb6d85617d1775e5844
Author: Zane Bitter <email address hidden>
Date: Mon Sep 25 14:32:13 2017 -0400

    Ignore errors in OS::stack_id output

    If a provider stack contained an OS::stack_id output and there was an error
    in the output, we would raise TemplateOutputError when trying to calculate
    the reference ID of the facade resource. Since we do that in many API
    calls, such an error could render the stack effectively unusable.

    If we encounter such an error, log it and fall back to the default
    reference ID.

    Change-Id: I1bc921fe74c54eb0999541ef36afc42b9c19e9bc
    Partial-Bug: #1712280
    Related-Bug: #1719333
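
A rough sketch of the fallback described in this commit message, under the assumption that outputs can be modelled as callables keyed by name. TemplateOutputError is redefined locally as a stand-in for the Heat exception, and reference_id() is a hypothetical helper, not the function Heat actually uses.

```python
import logging

LOG = logging.getLogger(__name__)


class TemplateOutputError(Exception):
    """Local stand-in for the Heat exception named in the commit message."""


def reference_id(outputs, default_id):
    """Return the OS::stack_id output if usable, else the default ID."""
    getter = outputs.get("OS::stack_id")
    if getter is None:
        return default_id
    try:
        return getter()
    except TemplateOutputError as exc:
        # Log and fall back rather than failing every API call that
        # needs the reference ID.
        LOG.info("Error in OS::stack_id output, using default: %s", exc)
        return default_id


def broken_output():
    raise TemplateOutputError("The Parameter (NoexistParam) was not provided.")


print(reference_id({"OS::stack_id": broken_output}, "default-ref-id"))
```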

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/523137

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 10.0.0.0b2

This issue was fixed in the openstack/heat 10.0.0.0b2 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 9.0.2

This issue was fixed in the openstack/heat 9.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/pike)

Reviewed: https://review.openstack.org/523137
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=6a27419521bae9cf6b0ca033d04ef268abebe1bd
Submitter: Zuul
Branch: stable/pike

commit 6a27419521bae9cf6b0ca033d04ef268abebe1bd
Author: Zane Bitter <email address hidden>
Date: Mon Sep 25 14:32:13 2017 -0400

    Ignore errors in OS::stack_id output

    If a provider stack contained an OS::stack_id output and there was an error
    in the output, we would raise TemplateOutputError when trying to calculate
    the reference ID of the facade resource. Since we do that in many API
    calls, such an error could render the stack effectively unusable.

    If we encounter such an error, log it and fall back to the default
    reference ID.

    Change-Id: I1bc921fe74c54eb0999541ef36afc42b9c19e9bc
    Partial-Bug: #1712280
    Related-Bug: #1719333
    (cherry picked from commit af0feeb18a4f5fb2c20fffb6d85617d1775e5844)

tags: added: in-stable-pike
Changed in tripleo:
milestone: queens-3 → queens-rc1
Revision history for this message
Zane Bitter (zaneb) wrote :

I think this can be closed in TripleO now that all of the patches are in Heat?

Zane Bitter (zaneb)
tags: removed: pike-backport-potential
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Zane Bitter (zaneb)
Changed in tripleo:
status: Triaged → Fix Released