Fluentd-client not found during P->Q upgrade.

Bug #1758406 reported by Sofer Athlan-Guyot
This bug affects 2 people
Affects   Status         Importance   Assigned to     Milestone
tripleo   Fix Released   High         Jiří Stránský
Changed in tripleo:
status: Confirmed → Triaged
importance: Critical → High
Jose Luis Franco (jfrancoa) wrote :

@Pradeep, we are getting this when upgrading from Pike to Queens because the fluentd-client.yaml service has been removed from tht. Could you have a look at it?

Changed in tripleo:
assignee: nobody → Pradeep Kilambi (pkilambi)
Jose Luis Franco (jfrancoa) wrote :

Since the service has been renamed, the old resource has to be mapped to OS::Heat::None in the resource_registry before the templates are rendered.
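A minimal sketch of such an override environment, built as a Python dict. The resource name OS::TripleO::Services::FluentdClient is an assumption (the logs only show the file path docker/services/fluentd-client.yaml), and the helper noop_override_env is hypothetical. Heat environment files are normally written in YAML; JSON is printed here only to stay within the standard library, and since YAML 1.2 is designed as a superset of JSON, the output should be readable as YAML too.

```python
import json

NOOP = "OS::Heat::None"


def noop_override_env(resource_names):
    """Build an environment dict mapping each named resource to OS::Heat::None."""
    return {"resource_registry": {name: NOOP for name in resource_names}}


# Assumed name of the Pike-era resource backed by docker/services/fluentd-client.yaml.
env = noop_override_env(["OS::TripleO::Services::FluentdClient"])
print(json.dumps(env, indent=2))
```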

Changed in tripleo:
assignee: Pradeep Kilambi (pkilambi) → nobody
Jose Luis Franco (jfrancoa) wrote :

Mistral executor logs:

2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates [req-bf096fbf-a9d1-4c10-9047-43c9f8e42eeb fe3644319b4947b4a6e1f4c5ea76806f f8bc1de6bb4344599e8514f7c47364e8 - default default] Error occurred while processing plan files.: HTTPError: 404 Client Error: Not Found for url: http://192.168.24.1:8080/v1/AUTH_f8bc1de6bb4344599e8514f7c47364e8/overcloud/docker/services/fluentd-client.yaml
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates Traceback (most recent call last):
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates File "/usr/lib/python2.7/site-packages/tripleo_common/actions/templates.py", line 442, in run
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates object_request=_object_request))
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates File "/usr/lib/python2.7/site-packages/heatclient/common/template_utils.py", line 258, in process_multiple_environments_and_files
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates include_env_in_files=include_env_in_files)
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates File "/usr/lib/python2.7/site-packages/heatclient/common/template_utils.py", line 307, in process_environment_and_files
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates env_base_url, is_object=True, object_request=object_request)
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates File "/usr/lib/python2.7/site-packages/heatclient/common/template_utils.py", line 357, in resolve_environment_urls
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates is_object=is_object, object_request=object_request)
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates File "/usr/lib/python2.7/site-packages/heatclient/common/template_utils.py", line 159, in get_file_contents
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates file_content = object_request('GET', str_url)
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates File "/usr/lib/python2.7/site-packages/tripleo_common/actions/templates.py", line 431, in _object_request
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates response.raise_for_status()
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates File "/usr/lib/python2.7/site-packages/requests/models.py", line 928, in raise_for_status
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates raise HTTPError(http_error_msg, response=self)
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates HTTPError: 404 Client Error: Not Found for url: http://192.168.24.1:8080/v1/AUTH_f8bc1de6bb4344599e8514f7c47364e8/overcloud/docker/services/fluentd-client.yaml
2018-03-22 20:20:17.616 2354 ERROR tripleo_common.actions.templates

Logs: http://logs.openstack.org/80/552080/13/experimental/tripleo-ci-centos-7-scenario001-multinode-oc-upgrade/a17d01c/logs/undercloud/var/log/mistral/executor.log.txt.gz#_2018-03-22_20_20_17_616

Jiří Stránský (jistr) wrote :

So here's the root cause:

We start with a user-env file which points to a file that doesn't exist in the templates we've just uploaded to swift. The templates are already new, but the env is persisted from the previous deployment. We try to fix the env by merging it with an env file which overrides the mapping to OS::Heat::None, and thus points to something that exists (but is a no-op).

I'd expect that this would work, but it looks like we:

1) resolve all environments *incl. fetching contents of what they point to*

2) merge resource_registry from the results of ^

So we can never fix this just by appending a correct env file, as long as we process the envs via heatclient's process_multiple_environments_and_files and only then merge the results. We break in step 1 and never get to merging the envs in step 2.
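The ordering problem can be reproduced with a small toy model. This is a sketch under stated assumptions, not heatclient's actual code: the swift container is a plain dict, NotFound stands in for the 404 HTTPError from the logs, the resource name is assumed, and resolve_then_merge / merge_then_resolve are hypothetical helpers contrasting the current order with a merge-first order.

```python
# Toy model of the resolve-then-merge ordering; not heatclient's real code.
NOOP = "OS::Heat::None"
RES = "OS::TripleO::Services::FluentdClient"  # assumed resource name

# Files present in the new (Queens) plan container; fluentd-client.yaml is gone.
plan_files = {"docker/services/fluentd.yaml": "# new service template"}

user_env = {"resource_registry": {RES: "docker/services/fluentd-client.yaml"}}
fix_env = {"resource_registry": {RES: NOOP}}


class NotFound(Exception):
    """Stands in for the 404 HTTPError raised when swift lacks the object."""


def fetch(path):
    if path not in plan_files:
        raise NotFound(path)
    return plan_files[path]


def resolve(env):
    """Step 1: fetch the contents of every file the env points to."""
    return {t: fetch(t) for t in env["resource_registry"].values() if t != NOOP}


def resolve_then_merge(envs):
    """Current order: each env is resolved before its registry is merged."""
    registry, files = {}, {}
    for env in envs:
        files.update(resolve(env))  # raises on user_env...
        registry.update(env["resource_registry"])  # ...so step 2 is never reached
    return registry, files


def merge_then_resolve(envs):
    """Alternative order: merge registries first, resolve only the result."""
    registry = {}
    for env in envs:
        registry.update(env["resource_registry"])
    return registry, resolve({"resource_registry": registry})


try:
    resolve_then_merge([user_env, fix_env])
except NotFound as exc:
    print("404 for", exc)
print(merge_then_resolve([user_env, fix_env])[0][RES])
```

Appending fix_env cannot help in the first order because the failure happens while user_env itself is still being resolved.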

Some solution suggestions:

A) We could make sure that when we persist/restore the plan/user-environment, it gets cleaned of references to files which aren't present in the plan container anymore. I'm a bit worried that this might hide some legitimate problems though, like a file disappearing from the plan by accident rather than on purpose: instead of failing early, we'd just drop it from resource_registry and carry on.
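Option A could look roughly like the sketch below. prune_missing is a hypothetical helper, the registry is assumed to contain only flat string mappings, and treating only `*.yaml` values as file paths is a simplification. Returning the dropped entries lets a caller at least warn about them, which partly addresses the concern about silently hiding an accidental removal.

```python
NOOP = "OS::Heat::None"


def prune_missing(env, plan_files):
    """Drop resource_registry entries whose target file is absent from the plan.

    Returns (cleaned_env, dropped) so a caller could at least warn about what
    was removed instead of silently carrying on.
    """
    kept, dropped = {}, {}
    for name, target in env.get("resource_registry", {}).items():
        # Simplification: treat only *.yaml values as file paths; type aliases
        # such as OS::Heat::None are always kept.
        if target.endswith(".yaml") and target not in plan_files:
            dropped[name] = target
        else:
            kept[name] = target
    return dict(env, resource_registry=kept), dropped


# Hypothetical persisted user environment and plan contents.
user_env = {"resource_registry": {
    "OS::TripleO::Services::FluentdClient": "docker/services/fluentd-client.yaml",
    "OS::TripleO::Services::Fluentd": "docker/services/fluentd.yaml",
    "OS::TripleO::Services::Example": NOOP,
}}
plan_files = {"docker/services/fluentd.yaml"}

cleaned, dropped = prune_missing(user_env, plan_files)
for name, target in dropped.items():
    print("WARNING: dropping %s -> %s (not in plan)" % (name, target))
```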

B) We might reimplement part of the heatclient logic for merging env files. Perhaps it would be possible to give OS::Heat::None a special treatment:

Walk through all the env files on the command line, pick out resource_registry, find all OS::Heat::None mappings, and apply them onto the user-env first. Essentially we'd apply no-ops first, and only then pass the whole thing to the heatclient machinery for correctly resolving everything.

AFAICT this should be safe even if one env file no-ops and a subsequent env file sets the resource to something real, as the heatclient machinery would still have the last word in resolving what the resource really maps to. We'd just help it by "cleaning" the user-env before it tries to resolve the resources there.
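A sketch of option B, assuming flat string mappings. apply_noops_first and the last-wins merge helper are hypothetical stand-ins (merge only mimics the "later env wins" behaviour, it is not the heatclient function), and the resource name is assumed. The second call shows the safety argument: a later env mapping the resource to something real still has the last word.

```python
NOOP = "OS::Heat::None"


def apply_noops_first(user_env, cli_envs):
    """Set every resource that any CLI env no-ops to NOOP in a copy of user_env.

    The CLI envs are passed through unchanged, so a later env that maps the
    resource to something real still wins in the normal merge.
    """
    noops = {
        name
        for env in cli_envs
        for name, target in env.get("resource_registry", {}).items()
        if target == NOOP
    }
    registry = dict(user_env.get("resource_registry", {}))
    for name in noops:
        registry[name] = NOOP
    return [dict(user_env, resource_registry=registry)] + list(cli_envs)


def merge(envs):
    """Last-wins merge of resource_registry sections (toy stand-in)."""
    registry = {}
    for env in envs:
        registry.update(env.get("resource_registry", {}))
    return registry


RES = "OS::TripleO::Services::FluentdClient"  # assumed resource name
user_env = {"resource_registry": {RES: "docker/services/fluentd-client.yaml"}}
noop_env = {"resource_registry": {RES: NOOP}}
real_env = {"resource_registry": {RES: "docker/services/fluentd.yaml"}}

print(merge(apply_noops_first(user_env, [noop_env])))
print(merge(apply_noops_first(user_env, [noop_env, real_env])))
```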

C) We might stop trying to persist user-env and plan-env and go back to requiring users to pass all their env files to the upgrade/update commands. (aka "I told you so!" :D Sorry, I just couldn't help myself here...)

Jiří Stránský (jistr) wrote :

D) Just hard-code into the client the names of resources we want to drop from the registry. Essentially like A), but scoped to a static list. It only solves what we ship in t-h-t, though; it wouldn't work for any custom user resources which might have gone through the same thing (renaming, removing...).
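A sketch of option D. The DROPPED_RESOURCES list, its single entry, drop_known_removed, and the custom resource name are all hypothetical. The example also shows the limitation mentioned above: a renamed custom resource that is not on the static list passes through untouched.

```python
# Hypothetical static list of resources removed from t-h-t between releases.
DROPPED_RESOURCES = frozenset({"OS::TripleO::Services::FluentdClient"})


def drop_known_removed(env, dropped=DROPPED_RESOURCES):
    """Return a copy of env without registry entries named in the static list."""
    registry = {
        name: target
        for name, target in env.get("resource_registry", {}).items()
        if name not in dropped
    }
    return dict(env, resource_registry=registry)


user_env = {"resource_registry": {
    "OS::TripleO::Services::FluentdClient": "docker/services/fluentd-client.yaml",
    # Hypothetical user resource whose file was renamed; the list cannot know it.
    "OS::TripleO::Services::MyCustomService": "custom/old-name.yaml",
}}

print(drop_known_removed(user_env)["resource_registry"])
```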

Jiří Stránský (jistr) wrote :

We probably found the least painful way to solve this: just reintroduce fluentd-client.yaml (now as an empty file) into Queens t-h-t so that the heatclient template processor is happy. It will not get used in the stack; it will just allow us to progress towards mapping the resource to OS::Heat::None.

17:45 <jistr> anyway to clarify the possible solution (for myself too)
17:46 <jistr> 1) in pike we have fluentd-client.yaml and it's used
17:46 <chem> jistr: oki, so back to the "dirty" it happen we have to find a way solution :)
17:46 <chem> ack
17:46 <jistr> 2) in queens we put back fluentd-client.yaml but it's empty. Only there to be there, but not any content.
17:46 <chem> (provided the files aren't checked)
17:47 <jistr> 3) that ^ allows us to set the resource_registry mapping to OS::Heat::None during P->Q upgrade
17:47 <chem> but ack (it's a detail at that point)
17:47 <jistr> 4) in rocky we can safely drop the file as it cannot be referenced from anywhere anymore
17:47 <chem> ack
17:48 <chem> ack
17:48 <chem> oki done!
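Steps 2 to 4 of the plan above can be illustrated with the same kind of toy model. process_envs is a hypothetical resolve-then-merge stand-in (a KeyError plays the role of the 404), the resource name is assumed, and the template contents are placeholders. With the empty file restored, every fetch succeeds and the merged mapping still ends up as OS::Heat::None; once the persisted env already says None, nothing fetches the old path at all.

```python
# Toy resolve-then-merge model; the names are hypothetical, not heatclient's.
NOOP = "OS::Heat::None"
RES = "OS::TripleO::Services::FluentdClient"  # assumed resource name
OLD = "docker/services/fluentd-client.yaml"

# Step 2: the Queens plan container carries the old path again, but empty.
queens_files = {"docker/services/fluentd.yaml": "# new service template", OLD: ""}


def process_envs(envs, files):
    """Fetch every file target of every env, then merge (later envs win)."""
    fetched, registry = {}, {}
    for env in envs:
        for target in env["resource_registry"].values():
            if target != NOOP:
                fetched[target] = files[target]  # KeyError plays the role of the 404
        registry.update(env["resource_registry"])
    return registry, fetched


persisted_user_env = {"resource_registry": {RES: OLD}}  # left over from Pike
noop_env = {"resource_registry": {RES: NOOP}}

# Step 3: P->Q upgrade; the empty file satisfies the fetch, the mapping ends as None.
registry, fetched = process_envs([persisted_user_env, noop_env], queens_files)
print(registry[RES])

# Step 4: in Rocky the persisted env already says None, so nothing fetches OLD.
rocky_registry, _ = process_envs([noop_env], {})
print(rocky_registry[RES])
```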

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/559365

Changed in tripleo:
status: Triaged → In Progress
assignee: nobody → Jiří Stránský (jistr)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-tripleoclient (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/559746

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-tripleoclient (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/559753

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/queens)

Reviewed: https://review.openstack.org/559365
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f30d13561e2c9a8745ab2668c3d69f3c44549b74
Submitter: Zuul
Branch: stable/queens

commit f30d13561e2c9a8745ab2668c3d69f3c44549b74
Author: Jiri Stransky <email address hidden>
Date: Fri Apr 6 18:07:54 2018 +0200

    [queens] Put back (now empty) fluentd-client.yaml

    This will allow us to get around a tricky error in heatclient template
    processing with persisted user-environment.yaml on
    upgrade. Interestingly, we need the old file to exist in the plan so
    that we can map the resource to OS::Heat::None during the upgrade
    process (heatclient tries to fetch it even though a subsequent env
    file would map the resource to None), and all alternative solutions
    are considerably worse than just putting the file back in place.

    We will not need this file in Rocky as the resource will be mapped to
    None before upgrading Q->R. This commit is specific for P->Q upgrade.

    Change-Id: I39573cca4dc16a024582de562b850ecfa9bb1dfd
    Co-Authored-By: Jose Luis Franco <email address hidden>
    Co-Authored-By: Sofer Athlan-Guyot <email address hidden>
    Closes-Bug: #1758406

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Related fix merged to python-tripleoclient (master)

Reviewed: https://review.openstack.org/559746
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=18efb193c1162b93a25761d70433d9b18b39c7bf
Submitter: Zuul
Branch: master

commit 18efb193c1162b93a25761d70433d9b18b39c7bf
Author: Jiri Stransky <email address hidden>
Date: Mon Apr 9 16:04:19 2018 +0200

    Stop persisting previous configuration on update/upgrade prepare

    We've been getting a number of errors related to this persistence
    (e.g. LP#1758406 or rhbz#1541024). It doesn't seem feasible to
    continue to fix them one by one, we should go back to requiring users
    pass their -e files to the prepare command.

    Related-Bug: #1758406
    Change-Id: I9db2c9256ed20d4d0b74bb467ee6ae0a9633bcc8

OpenStack Infra (hudson-openstack) wrote : Related fix merged to python-tripleoclient (stable/queens)

Reviewed: https://review.openstack.org/559753
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=10e30a03ac2562b014233f5d7844f92af46a74c3
Submitter: Zuul
Branch: stable/queens

commit 10e30a03ac2562b014233f5d7844f92af46a74c3
Author: Jiri Stransky <email address hidden>
Date: Mon Apr 9 16:04:19 2018 +0200

    Stop persisting previous configuration on update/upgrade prepare

    We've been getting a number of errors related to this persistence
    (e.g. LP#1758406 or rhbz#1541024). It doesn't seem feasible to
    continue to fix them one by one, we should go back to requiring users
    pass their -e files to the prepare command.

    Related-Bug: #1758406
    Change-Id: I9db2c9256ed20d4d0b74bb467ee6ae0a9633bcc8
    (cherry picked from commit 18efb193c1162b93a25761d70433d9b18b39c7bf)

Changed in tripleo:
milestone: rocky-1 → rocky-2
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 8.0.2

This issue was fixed in the openstack/tripleo-heat-templates 8.0.2 release.

Changed in tripleo:
milestone: rocky-2 → rocky-3
Sofer Athlan-Guyot (sofer-athlan-guyot) wrote :

Corrected status to Fix Released.

Changed in tripleo:
status: In Progress → Fix Released