Message collection size is too large for Zaqar

Bug #1812172 reported by Adriano Petrich on 2019-01-17
This bug affects 1 person
Affects: tripleo
Importance: High
Assigned to: Adriano Petrich

Bug Description

Description of problem:
I tried to deploy 3 controllers + 2 computes + 3 ceph with network isolation in a virtual test environment from the GUI (OSP14). My only configuration mistake was that I forgot to change the Ceph defaults which give too many pages and pools, so the deployment failed.

From the GUI, I clicked open the failure details dialog - but the workflow got stuck. We see this in the executor.log:

# /var/log/containers/mistral/executor.log

ZaqarAction.queue_post failed: Error response from Zaqar. Code: 400. Title: Invalid API request. Description: Message collection size is too large. Max size 1048576.: ActionException: ZaqarAction.queue_post failed: Error response from Zaqar. Code: 400. Title: Invalid API request. Description: Message collection size is too large. Max size 1048576.
2019-01-07 09:37:29.806 1 ERROR mistral.executors.default_executor Traceback (most recent call last):
2019-01-07 09:37:29.806 1 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 114, in run_action
2019-01-07 09:37:29.806 1 ERROR mistral.executors.default_executor result = action.run(action_ctx)
2019-01-07 09:37:29.806 1 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/base.py", line 130, in run
2019-01-07 09:37:29.806 1 ERROR mistral.executors.default_executor (self.__class__.__name__, self.client_method_name, str(e))
2019-01-07 09:37:29.806 1 ERROR mistral.executors.default_executor ActionException: ZaqarAction.queue_post failed: Error response from Zaqar. Code: 400. Title: Invalid API request. Description: Message collection size is too large. Max size 1048576.

Version-Release number of selected component (if applicable):
openstack-zaqar-7.0.1-0.20180917132250.5932b8f.el7ost.noarch

How reproducible:
unknown

Steps to Reproduce:
1. Deploy a setup as described above

Proposed solution:

In general, we can remove the deprecated 'execution' item from the Zaqar message payload [1]. This item can potentially contain a lot of data.

Specifically for this bug, "deployment_failures" [2] can be quite large, so we should either reduce it or omit it from the message.

[1] https://github.com/openstack/tripleo-common/blob/master/workbooks/messaging.yaml#L34
[2] https://github.com/openstack/tripleo-common/blob/master/workbooks/deployment.yaml#L939
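
A rough sketch of the idea, not the actual tripleo-common change: drop the deprecated 'execution' item and, if the payload is still over the limit, replace "deployment_failures" with a short marker. The helper names and the payload layout are assumptions for illustration. The limit is the value from the Zaqar error above, and since Zaqar checks the whole message collection rather than a single payload, the size check here is only an approximation.

```python
import json

# Limit taken from the Zaqar error message in the log above.
MAX_POST_SIZE = 1048576


def encoded_size(obj):
    """Return the size in bytes of obj serialized as UTF-8 JSON."""
    return len(json.dumps(obj).encode("utf-8"))


def shrink_payload(payload, limit=MAX_POST_SIZE):
    """Return a copy of payload reduced to fit within limit bytes.

    Hypothetical helper: the key names mirror those mentioned in this
    bug report, but the real message layout may differ.
    """
    trimmed = dict(payload)
    # The deprecated 'execution' item can carry the whole workflow execution.
    trimmed.pop("execution", None)
    if encoded_size(trimmed) > limit and "deployment_failures" in trimmed:
        # Replace the bulky failure details with a short marker.
        trimmed["deployment_failures"] = {"truncated": True}
    return trimmed
```

The original payload is left untouched, so the full failure details can still be stored or retrieved some other way.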

python-tripleoclient also needs to be changed (before the changes to tripleo-common can be merged), but the client changes must be backwards compatible: losing the ability to talk to older versions could seriously impact users, for example those using rdo-cloud.
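
One way the client could stay backwards compatible is to accept both message shapes when looking up the execution. This is only a sketch: the function name and the "execution_id" field are assumptions, not taken from the python-tripleoclient source.

```python
def get_execution_id(payload):
    """Extract the workflow execution ID from either message format.

    Assumed formats (hypothetical field names):
      new: {"execution_id": "<id>", ...}
      old: {"execution": {"id": "<id>", ...}, ...}
    """
    if "execution_id" in payload:
        return payload["execution_id"]
    # Fall back to the old format, tolerating a missing or null item.
    execution = payload.get("execution") or {}
    return execution.get("id")
```

Checking the new field first means a newer tripleo-common is handled without inspecting the large 'execution' item at all, while messages from older versions still resolve through the fallback.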

Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
milestone: none → stein-3

Reviewed: https://review.openstack.org/630970
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=cad7916ce8d21295992c2efe0e18657c6e56604d
Submitter: Zuul
Branch: master

commit cad7916ce8d21295992c2efe0e18657c6e56604d
Author: apetrich <email address hidden>
Date: Tue Jan 15 14:16:00 2019 +0100

    Remove execution from workflow message send

    Serializing all the execution in a message can make the message too big.
    This change was done in tripleo-common. this supports that change
    This change still supports the old format and is backwards compatible.

    Partial-Bug: #1812172
    Change-Id: I40ee028366222f38f5ba1db58d171f98be75d009

Changed in tripleo:
milestone: stein-3 → stein-rc1
Changed in tripleo:
milestone: stein-rc1 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3

Reviewed: https://review.opendev.org/663657
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=99ac7f9e119c5366ed0b1ebb112c7ee90be863ec
Submitter: Zuul
Branch: stable/rocky

commit 99ac7f9e119c5366ed0b1ebb112c7ee90be863ec
Author: apetrich <email address hidden>
Date: Tue Jan 15 14:16:00 2019 +0100

    Remove execution from workflow message send

    Serializing all the execution in a message can make the message too big.
    This change was done in tripleo-common. this supports that change
    This change still supports the old format and is backwards compatible.

    Partial-Bug: #1812172
    Change-Id: I40ee028366222f38f5ba1db58d171f98be75d009
    (cherry picked from commit cad7916ce8d21295992c2efe0e18657c6e56604d)

tags: added: in-stable-rocky

Reviewed: https://review.opendev.org/663688
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=092449a9c6da8311914b2b13f1fdc5f0a5f68f0c
Submitter: Zuul
Branch: stable/queens

commit 092449a9c6da8311914b2b13f1fdc5f0a5f68f0c
Author: apetrich <email address hidden>
Date: Tue Jan 15 14:16:00 2019 +0100

    Remove execution from workflow message send

    Serializing all the execution in a message can make the message too big.
    This change was done in tripleo-common. this supports that change
    This change still supports the old format and is backwards compatible.

    Partial-Bug: #1812172
    Change-Id: I40ee028366222f38f5ba1db58d171f98be75d009
    (cherry picked from commit cad7916ce8d21295992c2efe0e18657c6e56604d)

tags: added: in-stable-queens
Changed in tripleo:
milestone: train-3 → ussuri-1