DeploymentFailuresAction uses incorrect execution id to look for errors file

Bug #1779093 reported by Jiri Tomasek
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: James Slagle
Milestone: (none listed)

Bug Description

Currently the config-download deployment files are stored at /var/lib/mistral/<config_download_deploy workflow id>. DeploymentFailuresAction reads the execution id from deployment_status.yaml to locate the correct ansible-errors.json file. The problem is that the execution id stored in deployment_status.yaml is not the config_download_deploy workflow execution id, so it cannot be used reliably to find the correct directory. In addition, the execution id is not persisted forever. As a result, the action looks for a nonexistent file and the failures are not retrieved.

Possible solution:
Since there is a symlink to the latest config download, we could skip looking up the execution entirely and just load failures from /var/lib/mistral/config-download-latest/ansible-errors.json

The issue with this approach is that the config-download data stored in /var/lib/mistral/ is not scoped by plan name, which we would need in order to identify failures for different plans. So, in addition, we need to change the file structure so that the config-download deployment files are stored in
/var/lib/mistral/<planName>/<config-download-deploy workflow execution id>/

Then we can reliably load the failures by looking at

/var/lib/mistral/<planName>/config-download-latest/ansible-errors.json
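
As a rough illustration of the lookup proposed above, here is a minimal Python sketch that loads the failures file from the per-plan path. Only the path layout comes from this report; the function name `load_failures`, its `work_dir` parameter, and the returned message text are hypothetical and are not taken from tripleo-common.

```python
import json
import os

# Base directory named in this report.
WORK_DIR = "/var/lib/mistral"


def load_failures(plan, work_dir=WORK_DIR):
    """Return (failures, message) for a plan under the proposed layout.

    Hypothetical helper: it reads
    <work_dir>/<plan>/config-download-latest/ansible-errors.json
    and needs neither an execution id nor deployment_status.yaml.
    """
    path = os.path.join(
        work_dir, plan, "config-download-latest", "ansible-errors.json")
    try:
        with open(path) as f:
            return json.load(f), ""
    except IOError:
        # Missing or unreadable file: report it instead of raising.
        return {}, "No failures file found at %s" % path
    except ValueError:
        # The file exists but does not contain valid JSON.
        return {}, "Could not parse failures file at %s" % path
```

Returning a message rather than raising keeps a missing file distinguishable from "no failures", which the caller can then surface to the user.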

We also need to make sure that /var/lib/mistral/<planName> is deleted when the deployment plan is deleted, to avoid name collisions if another deployment plan with the same name is created later.
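
The cleanup step could look roughly like the sketch below. It is an assumption about how such a step might be written, not code from tripleo-common: `remove_plan_dir` and its `work_dir` parameter are hypothetical names, and the check that the plan directory is a direct child of the base directory is an added safety measure that this report does not ask for.

```python
import os
import shutil


def remove_plan_dir(plan, work_dir="/var/lib/mistral"):
    """Delete <work_dir>/<plan>; return True if a directory was removed.

    Hypothetical helper, intended to run when a deployment plan is deleted.
    """
    base = os.path.realpath(work_dir)
    plan_dir = os.path.realpath(os.path.join(work_dir, plan))
    # Refuse names such as "", "../x" or "a/b" that would resolve to
    # anything other than a direct child of the base directory.
    if os.path.dirname(plan_dir) != base:
        raise ValueError("Unexpected plan directory: %s" % plan_dir)
    if os.path.isdir(plan_dir):
        shutil.rmtree(plan_dir)
        return True
    return False
```

With the directory removed, a plan created later under the same name starts from an empty location and cannot pick up a stale ansible-errors.json.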

related discussion:
https://review.openstack.org/#/c/567318/9/tripleo_common/actions/deployment.py

Tags: workflows
Revision history for this message
James Slagle (james-slagle) wrote :

For tripleoclient, the execution_id actually is that of config_download_deploy.

What do you mean by the execution_id not being persisted forever? The execution record isn't persisted forever in the mistral db, but we don't actually use that for anything. We are looking up the execution_id from deployment_status.yaml in the <plan name>-messages swift container. All of that should be persisted forever, unless a user-triggered delete is done.

We can consider namespacing by plan name under /var/lib/mistral; I'm not entirely sure that is needed though. I may also look into explicitly setting something like config_download_deploy_execution_id within deployment_status.yaml so that it gives us a consistent way to get the execution id across the client and UI.

Revision history for this message
Jiri Tomasek (jtomasek) wrote :

Sorry, you're right, execution persistence does not affect this action at all.

TripleO UI uses the deploy_plan workflow with the config_download: true input, which makes the config_download_deploy workflow run as a sub-workflow. As a result, the deploy_plan workflow message is the last one to update deployment_status.yaml. tripleoclient currently probably calls these two workflows independently.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.openstack.org/579635

Changed in tripleo:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/583293

Changed in tripleo:
assignee: James Slagle (james-slagle) → Jiri Tomasek (jtomasek)
Changed in tripleo:
assignee: Jiri Tomasek (jtomasek) → James Slagle (james-slagle)
Changed in tripleo:
assignee: James Slagle (james-slagle) → Jiri Tomasek (jtomasek)
Changed in tripleo:
assignee: Jiri Tomasek (jtomasek) → James Slagle (james-slagle)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/579635
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=81b022ce03a83f4764e88b7fbf0e002d67f79546
Submitter: Zuul
Branch: master

commit 81b022ce03a83f4764e88b7fbf0e002d67f79546
Author: James Slagle <email address hidden>
Date: Mon Jul 2 12:37:13 2018 -0400

    Use /var/lib/mistral/<plan-name> as config-download dir

    Use a consistent directory under /var/lib/mistral (which defaults to the
    plan name) as the working and config-download directory in the
    config_download_deploy workflow. Since the config-download directory
    is now managed by git, we can re-use the same dir and preserve the
    history.

    Change-Id: Id639d0a99aa1103f6f9cc54de676ea8ba6111332
    Closes-Bug: #1779093

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.openstack.org/583293
Reason: tripleo gate is now broken, I'm clearing the gate. Please do not recheck or restore this patch, I'll take care of it when things work again.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 9.2.0

This issue was fixed in the openstack/tripleo-common 9.2.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/583293
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=eec131ee5e7c3070bfef441d143202bf9255b2e2
Submitter: Zuul
Branch: master

commit eec131ee5e7c3070bfef441d143202bf9255b2e2
Author: Jiri Tomasek <email address hidden>
Date: Tue Jul 17 17:43:24 2018 +0200

    Update failures listing to use latest ansible-errors.json location

    Since the config-download files are stored per plan name, we don't need
    to reach for deployment status object and execution any more.

    This change also updates get_deployment_failures workflow to set status
    FAILED when an error message is returned from the action.

    Closes-Bug: 1779093
    Change-Id: I49805f1e2ce845b1cebc04df813261cca12ec431

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 9.3.0

This issue was fixed in the openstack/tripleo-common 9.3.0 release.
