paunch succeeds if a container's volumes fail validation

Bug #1855444 reported by James Slagle
This bug affects 2 people
Affects   Status         Importance   Assigned to      Milestone
tripleo   Fix Released   High         Emilien Macchi

Bug Description

If a container's requested volumes fail validation in paunch (for example, they don't exist on the host filesystem or aren't readable), paunch still exits 0 even though the container was not started. The deployment therefore continues and typically fails in an unclear way at a later step, because not all of the needed containers exist.
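To make the failing check concrete, here is a minimal Python sketch of the kind of volume-source validation described above. It is not paunch's actual code: the helper name `volume_errors` is hypothetical, and it assumes each volume entry is a bind mount written as `source:destination[:options]`. It reports a source that is missing or unreadable, using the same message format as the paunch log line quoted further down.

```python
import os
import tempfile


def volume_errors(volumes):
    """Return one error string per volume whose host source is unusable.

    Hypothetical helper (not paunch's API). Each entry is assumed to be a
    bind mount of the form "source:destination[:options]".
    """
    errors = []
    for volume in volumes:
        # The host-side path is everything before the first colon.
        source = volume.split(":", 1)[0]
        # A source that is missing or unreadable cannot be bind-mounted.
        if not os.path.exists(source) or not os.access(source, os.R_OK):
            errors.append("%s is not a valid volume source" % source)
    return errors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as existing:
        vols = [existing + ":/data", "/nonexistent/nova:/var/log/nova:z"]
        for err in volume_errors(vols):
            print(err)
```

Only the missing path is reported. The important point for this bug is what the caller does with a non-empty error list.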

Example output where everything was reported as "ok" by Ansible:

2019-12-05 22:14:09,796 p=375841 u=centos | TASK [Start containers for step 2 using paunch] **********************************************************************************************************************************************************************************
2019-12-05 22:14:09,796 p=375841 u=centos | Thursday 05 December 2019 22:14:09 +0000 (0:00:01.117) 0:17:18.803 *****
2019-12-05 22:14:10,482 p=375841 u=centos | ok: [compute-19]

2019-12-05 22:14:12,245 p=375841 u=centos | TASK [Debug output for task: Start containers for step 2] ************************************************************************************************************************************************************************
2019-12-05 22:14:12,245 p=375841 u=centos | Thursday 05 December 2019 22:14:12 +0000 (0:00:02.449) 0:17:21.253 *****

2019-12-05 22:14:12,466 p=375841 u=centos | ok: [compute-19] => {
    "failed_when_result": false,
    "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": []
}

You don't get any stdout/stderr from paunch explaining the error, either, because paunch only returns the container's stdout/stderr, not its own log messages.

However, the paunch log on the managed host (compute-19) shows:

2019-12-05 22:26:37.609 28170 ERROR paunch [ ] /var/log/containers/nova is not a valid volume source

James Slagle (james-slagle) wrote :

The fix here would be for paunch to exit 1 when validation fails. There's no reason to continue the deployment if not all containers can be started at each step.
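A rough Python sketch of the proposed exit-code behavior follows. It is hypothetical and not paunch's implementation: `run_step`, its arguments, and the config shape are invented for illustration, and "starting" a container is reduced to recording its name. The point it shows is that a container skipped because of validation errors must turn the step's return code into 1 instead of being silently dropped.

```python
def run_step(configs, validate):
    """Hypothetical step runner returning (rc, started, errors).

    `configs` maps container names to config dicts; `validate` returns a
    list of error strings for one config (empty when the config is valid).
    """
    started, errors = [], []
    for name, config in configs.items():
        problems = validate(config)
        if problems:
            # Skip this container, but keep the reason so the caller sees it.
            errors.extend("%s: %s" % (name, p) for p in problems)
            continue
        started.append(name)  # stand-in for actually starting the container
    # Returning 0 unconditionally here would reproduce the reported bug.
    rc = 1 if errors else 0
    return rc, started, errors


if __name__ == "__main__":
    def fake_validate(config):
        return [] if config.get("ok") else ["volume source missing"]

    rc, started, errors = run_step({"a": {"ok": True}, "b": {}}, fake_validate)
    print(rc, started, errors)
```

Collecting every error before deciding the return code, rather than stopping at the first failure, means a single run reports all of the broken containers for that step.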

We also need to make sure that paunch returns the error message in its stdout/stderr, rather than only logging it locally on the managed host.
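One way to surface the tool's own log messages to the caller is to attach a temporary `logging.StreamHandler` that writes into an in-memory buffer, then append that buffer to whatever stderr is being returned. This is a minimal sketch under stated assumptions: `run_with_captured_log`, the logger name, and the `action(log) -> (rc, container_stderr)` convention are all hypothetical, and paunch may pass errors back differently.

```python
import io
import logging


def run_with_captured_log(action, logger_name="paunch_sketch"):
    """Run action(log) and return (rc, stderr) including captured log errors.

    Hypothetical convention: `action` receives a logger and returns
    (rc, container_stderr).
    """
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setLevel(logging.ERROR)  # only forward errors to the caller
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    log = logging.getLogger(logger_name)
    log.addHandler(handler)
    try:
        rc, container_stderr = action(log)
    finally:
        # Detach the handler so later runs don't write into a stale buffer.
        log.removeHandler(handler)
    # Put the tool's own errors first, then whatever the container printed.
    return rc, buffer.getvalue() + container_stderr


if __name__ == "__main__":
    def failing_step(log):
        log.error("/var/log/containers/nova is not a valid volume source")
        return 1, ""

    print(run_with_captured_log(failing_step))
```

With this, an Ansible task wrapping the command would get both a non-zero rc and the validation message in stderr, instead of an empty "ok" result.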

Changed in tripleo:
status: New → In Progress
importance: Undecided → High
assignee: nobody → James Slagle (james-slagle)
milestone: none → ussuri-1
OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (master)

Fix proposed to branch: master
Review: https://review.opendev.org/697666

Changed in tripleo:
assignee: James Slagle (james-slagle) → Emilien Macchi (emilienm)
OpenStack Infra (hudson-openstack) wrote : Fix merged to paunch (master)

Reviewed: https://review.opendev.org/697666
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=3ab0936c03dfdd8b12fb931d3a3480982a76574e
Submitter: Zuul
Branch: master

commit 3ab0936c03dfdd8b12fb931d3a3480982a76574e
Author: James Slagle <email address hidden>
Date: Fri Dec 6 08:37:40 2019 -0500

    Exit 1 if a container fails and return the error

    When a container's volumes failed validation, paunch still exited 0.
    This caused the deployment to continue running even though not all
    containers had been started.

    This patch changes the rc to 1 when a container's volumes fail
    validation and the container can't be started. The error message is also
    returned in stderr so that it's available to the paunch ansible module
    and will be seen in the deployment output.

    Depends-On: I1f062b8b9f936e6fbf2febf64244e91b59b8ba1b
    Change-Id: I67860a79572c0ff4dcaca9ec9597c41f56792fca
    Closes-Bug: #1855444

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/698570

OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/698571

OpenStack Infra (hudson-openstack) wrote : Fix merged to paunch (stable/train)

Reviewed: https://review.opendev.org/698570
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=e0f444396309cc0275e58900d6b398deeb44935d
Submitter: Zuul
Branch: stable/train

commit e0f444396309cc0275e58900d6b398deeb44935d
Author: James Slagle <email address hidden>
Date: Fri Dec 6 08:37:40 2019 -0500

    Exit 1 if a container fails and return the error

    When a container's volumes failed validation, paunch still exited 0.
    This caused the deployment to continue running even though not all
    containers had been started.

    This patch changes the rc to 1 when a container's volumes fail
    validation and the container can't be started. The error message is also
    returned in stderr so that it's available to the paunch ansible module
    and will be seen in the deployment output.

    Depends-On: I1f062b8b9f936e6fbf2febf64244e91b59b8ba1b
    Change-Id: I67860a79572c0ff4dcaca9ec9597c41f56792fca
    Closes-Bug: #1855444
    (cherry picked from commit 3ab0936c03dfdd8b12fb931d3a3480982a76574e)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix merged to paunch (stable/stein)

Reviewed: https://review.opendev.org/698571
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=65f063f946662bb85308b030922caac086ab4e5f
Submitter: Zuul
Branch: stable/stein

commit 65f063f946662bb85308b030922caac086ab4e5f
Author: James Slagle <email address hidden>
Date: Fri Dec 6 08:37:40 2019 -0500

    Exit 1 if a container fails and return the error

    When a container's volumes failed validation, paunch still exited 0.
    This caused the deployment to continue running even though not all
    containers had been started.

    This patch changes the rc to 1 when a container's volumes fail
    validation and the container can't be started. The error message is also
    returned in stderr so that it's available to the paunch ansible module
    and will be seen in the deployment output.

    Depends-On: I1f062b8b9f936e6fbf2febf64244e91b59b8ba1b
    Change-Id: I67860a79572c0ff4dcaca9ec9597c41f56792fca
    Closes-Bug: #1855444
    (cherry picked from commit 3ab0936c03dfdd8b12fb931d3a3480982a76574e)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/paunch 5.3.1

This issue was fixed in the openstack/paunch 5.3.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/paunch 6.0.1

This issue was fixed in the openstack/paunch 6.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/paunch stein-eol

This issue was fixed in the openstack/paunch stein-eol release.
