Retry flows fail with "Data too long for column 'failure' at row 1"

Bug #1959243 reported by Pavlo Shchelokovskyy
This bug affects 1 person
Affects: taskflow
Status: Fix Released
Importance: Undecided
Assigned to: Pavlo Shchelokovskyy
Milestone: (none)

Bug Description

This is very similar to https://bugs.launchpad.net/taskflow/+bug/1838015 and https://bugs.launchpad.net/taskflow/+bug/1926304, but this time for the 'failure' column in the atomdetails table.

Octavia (Victoria release) + Taskflow 4.5.1 + SQLAlchemy 1.3.19 + Python 3.6 + MariaDB 10.4.17

While attaching interfaces to an amphora, a call to the Nova API raises a ConnectionResetError from inside urllib3.
When taskflow then attempts to retry, it fails with a StorageFailure; the telltale sign in the log is:

2022-01-26 09:11:04,773.773 45 ERROR taskflow.conductors.backends.impl_executor sqlalchemy.exc.DataError: (pymysql.err.DataError) (1406, "Data too long for column 'failure' at row 1")
2022-01-26 09:11:04,773.773 45 ERROR taskflow.conductors.backends.impl_executor [SQL: UPDATE atomdetails SET updated_at=%(updated_at)s, meta=%(meta)s, name=%(name)s, version=%(version)s, state=%(state)s, uuid=%(uuid)s, failure=%(failure)s, results=%(results)s, revert_results=%(revert_results)s, revert_failure=%(revert_failure)s, intention=%(intention)s WHERE atomdetails.uuid = %(uuid_1)s]
2022-01-26 09:11:04,773.773 45 ERROR taskflow.conductors.backends.impl_executor [parameters: {'updated_at': datetime.datetime(2022, 1, 26, 9, 11, 4, 626699), 'meta': '{"progress": 0.0}', 'name': 'STANDALONE-octavia-plug-net-subflow-octavia-amp-plug-vip', 'version': '1.0', 'state': 'FAILURE', 'uuid': '26e600f7-8ed1-4433-a1fc-b97328906b81', 'failure': '{"exception_str": "Error plugging amphora (compute_id: d7611c1b-db58-4663-b269-84fe86275789) into vip network 554111c5-1f63-4242-b133-1411183a763a.", ... (92518 characters truncated) ... s": ["ConnectionResetError", "ConnectionError", "OSError", "Exception"], "version": 1, "exc_args": [104, "Connection reset by peer"], "causes": []}]}', 'results': None, 'revert_results': None, 'revert_failure': None, 'intention': 'EXECUTE', 'uuid_1': '26e600f7-8ed1-4433-a1fc-b97328906b81'}]
2022-01-26 09:11:04,773.773 45 ERROR taskflow.conductors.backends.impl_executor (Background on this error at: http://sqlalche.me/e/13/9h9h)

The full trace is too big to upload to paste.opendev.org :-/ (about 95KB), so I am attaching it here.

It has many nested traces with

The above exception was the direct cause of the following exception:
or
During handling of the above exception, another exception occurred:

due to how exceptions are handled now in Python3 :-)
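
The nesting comes from Python 3 exception chaining. A minimal sketch of how a three-level chain produces both separator lines in one formatted traceback is shown here; the function names are hypothetical stand-ins, not Octavia or taskflow code:

```python
import traceback


def plug_interface():
    # Hypothetical stand-in for the low-level network error.
    raise ConnectionResetError(104, "Connection reset by peer")


def call_compute_api():
    try:
        plug_interface()
    except OSError as exc:
        # Explicit chaining ("raise ... from ...") sets __cause__.
        raise RuntimeError("Error plugging amphora into vip network") from exc


def run_task():
    try:
        call_compute_api()
    except RuntimeError:
        # Implicit chaining: raising inside an except block sets __context__.
        raise ValueError("task failed")


def formatted_failure():
    """Return the full formatted traceback, including every chained exception."""
    try:
        run_task()
    except ValueError as exc:
        lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
        return "".join(lines)


if __name__ == "__main__":
    print(formatted_failure())
```

Each link in the chain contributes its own complete traceback, so the serialized failure grows with the depth of the chain.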

The current column type for failure is TEXT, which holds at most 64KB (65,535 bytes, so fewer characters with a multi-byte encoding), and judging by the "(92518 characters truncated)" in the log, this example overshoots it by a wide margin.
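
A rough way to check whether a serialized failure would fit is to compare its encoded byte length against the TEXT limit. This is only a sketch: it assumes the column stores UTF-8 text, and the helper names are made up for illustration rather than taken from taskflow:

```python
# MySQL/MariaDB TEXT holds at most 65,535 bytes (2**16 - 1).
MYSQL_TEXT_MAX_BYTES = 2**16 - 1


def utf8_size(serialized):
    """Return the number of bytes the string occupies when encoded as UTF-8."""
    return len(serialized.encode("utf-8"))


def fits_in_text(serialized):
    """Return True if the serialized value fits in a TEXT column."""
    return utf8_size(serialized) <= MYSQL_TEXT_MAX_BYTES


if __name__ == "__main__":
    small = '{"exception_str": "boom"}'
    # The log elides 92518 characters, so the real payload is at least this long.
    large = "x" * 92518
    print(fits_in_text(small), fits_in_text(large))
```

The limit is in bytes, not characters, which is why a string of 40,000 two-byte characters would also be rejected.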

As with the other issues mentioned above, we need to widen this JSON field to LONGTEXT.
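
For illustration, the sketch below picks the smallest MySQL/MariaDB text type for a given payload size and builds the raw statements that would widen the affected columns. It is not the migration that was merged for this bug: the helper names are hypothetical, and the generated MODIFY COLUMN statements assume nullable columns with no other attributes to preserve:

```python
# Maximum byte lengths of the MySQL/MariaDB text types, smallest first.
TEXT_TYPES = [
    ("TINYTEXT", 2**8 - 1),
    ("TEXT", 2**16 - 1),
    ("MEDIUMTEXT", 2**24 - 1),
    ("LONGTEXT", 2**32 - 1),
]


def smallest_text_type(n_bytes):
    """Return the name of the smallest text type that can hold n_bytes."""
    for name, limit in TEXT_TYPES:
        if n_bytes <= limit:
            return name
    raise ValueError("value too large for any MySQL text type")


def widen_statements(table, columns, sql_type="LONGTEXT"):
    """Build one ALTER TABLE statement per column (raw SQL, hypothetical helper)."""
    return [
        "ALTER TABLE {} MODIFY COLUMN {} {}".format(table, column, sql_type)
        for column in columns
    ]


if __name__ == "__main__":
    print(smallest_text_type(92518))
    for stmt in widen_statements("atomdetails", ["failure", "revert_failure"]):
        print(stmt)
```

A payload of this size would already fit in MEDIUMTEXT; going straight to LONGTEXT keeps the column consistent with the earlier fixes referenced above and leaves headroom for deeper exception chains.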

Pavlo Shchelokovskyy (pshchelo) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to taskflow (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/taskflow/+/826722

Changed in taskflow:
status: New → In Progress
Changed in taskflow:
assignee: nobody → Pavlo Shchelokovskyy (pshchelo)
OpenStack Infra (hudson-openstack) wrote : Fix merged to taskflow (master)

Reviewed: https://review.opendev.org/c/openstack/taskflow/+/826722
Committed: https://opendev.org/openstack/taskflow/commit/83dfa6581eec1b9d32d519592c4212e6195998a3
Submitter: "Zuul (22348)"
Branch: master

commit 83dfa6581eec1b9d32d519592c4212e6195998a3
Author: Pavlo Shchelokovskyy <email address hidden>
Date: Thu Jan 27 18:20:06 2022 +0200

    Fix atomdetails failure column size

    failure and revert_failure fields in atomdetails is defined as a JSON type,
    but its data type is 'text' in mysql, which is limited to 64kbytes.
    JSON data type should have the same size as a LONGTEXT.

    Closes-Bug: #1959243
    Change-Id: I65b6a6d896d3e8aad871dc19b0f8d0eddf48bdd6

Changed in taskflow:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/taskflow 4.7.0

This issue was fixed in the openstack/taskflow 4.7.0 release.
