StorageFailure in sqlalchemy backend due to entity not serializable

Bug #1935957 reported by Pavlo Shchelokovskyy
This bug affects 1 person
Affects   Status       Importance  Assigned to           Milestone
taskflow  In Progress  Undecided   Pavlo Shchelokovskyy

Bug Description

The sqlalchemy backend uses JSONType from sqlalchemy_utils, which calls plain json.dumps on the object without any serialization fallback. As a result, unserializable entities can slip in and cause a StorageFailure.
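The failure mode described above can be reproduced with the standard library alone. This is a minimal sketch, not taskflow code: the MaxRetryError class below is a hypothetical stand-in for the urllib3 exception, and the payload shape is an assumption made only for illustration.

```python
import json


class MaxRetryError(Exception):
    """Hypothetical stand-in for urllib3's MaxRetryError."""


# Assumed payload shape: a record that carries a live exception object.
payload = {"failure": MaxRetryError("connection refused")}

# Plain json.dumps has no fallback, so the exception object raises TypeError.
try:
    json.dumps(payload)
    plain_error = None
except TypeError as exc:
    plain_error = str(exc)

# With default=str, json calls str() on any object it cannot encode itself.
safe = json.dumps(payload, default=str)
```

With a fallback in place the value is stored as text. The original exception object is not recoverable on load, only its string form.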

As an example, this is a traceback from Masakari failing to call Nova, with MaxRetryError bubbling all the way up from urllib3:

2021-07-06 10:24:01,257 1 DEBUG masakari.compute.nova [req-93e72eca-fe30-4561-a722-084cda651369 admin - - - -] Creating a Nova client using "admin" user novaclient /var/lib/openstack/lib/python3.6/site-packages/masakari/compute/nova.py:103
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder [req-93e72eca-fe30-4561-a722-084cda651369 admin - - - -] Engine 'DisableComputeServiceTask==1.0' atom post-completion failed: tenacity.RetryError: RetryError[<Future at 0x7f1affd3b5f8 state=finished raised StorageFailure>]
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder Traceback (most recent call last):
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder File "/var/lib/openstack/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1204, in _execute_context
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder context = constructor(dialect, self, conn, *args)
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder File "/var/lib/openstack/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 865, in _init_compiled
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder for key in compiled_params
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder File "/var/lib/openstack/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 865, in <genexpr>
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder for key in compiled_params
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder File "/var/lib/openstack/lib/python3.6/site-packages/sqlalchemy/sql/type_api.py", line 1230, in process
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder return impl_processor(process_param(value, dialect))
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder File "/var/lib/openstack/lib/python3.6/site-packages/sqlalchemy_utils/types/json.py", line 80, in process_bind_param
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder value = six.text_type(json.dumps(value))
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder File "/usr/lib/python3.6/json/__init__.py", line 231, in dumps
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder return _default_encoder.encode(obj)
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder chunks = self.iterencode(o, _one_shot=True)
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder return _iterencode(o, 0)
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder File "/usr/lib/python3.6/json/encoder.py", line 180, in default
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder o.__class__.__name__)
2021-07-06 10:24:18,047 1 ERROR taskflow.engines.action_engine.builder TypeError: Object of type 'MaxRetryError' is not JSON serializable

I am not sure that sanitizing inputs for the library is solely the job of taskflow's consumers; I think taskflow could do somewhat better here too.

Pavlo Shchelokovskyy (pshchelo) wrote :

patch proposed https://review.opendev.org/c/openstack/taskflow/+/800609

Subclass JSONType from sqlalchemy_utils and override a couple of its methods to use oslo_serialization.jsonutils for JSON dumping and loading.

This should be more reliable than sqla_utils.JSONType, as it handles complex data better and at least provides 'str' as the default fallback function when serializing.
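The shape of that override can be sketched without any third-party packages. This is an illustration under stated assumptions, not the merged patch: FallbackJSONType is a hypothetical plain class rather than a real subclass of the sqlalchemy_utils JSONType, and the standard json module with default=str stands in for oslo_serialization.jsonutils. The method name process_bind_param comes from the traceback above; process_result_value is its usual SQLAlchemy counterpart for values read back from the database.

```python
import json


class FallbackJSONType:
    """Hypothetical, dependency-free stand-in for a JSON column type."""

    def process_bind_param(self, value, dialect):
        # Called on the way into the database: serialize to text, using
        # str() as the fallback for objects json cannot encode.
        if value is not None:
            value = json.dumps(value, default=str)
        return value

    def process_result_value(self, value, dialect):
        # Called on the way out of the database: parse the stored text.
        if value is not None:
            value = json.loads(value)
        return value


column_type = FallbackJSONType()
stored = column_type.process_bind_param(
    {"failure": ValueError("boom")}, dialect=None)
loaded = column_type.process_result_value(stored, dialect=None)
```

The dialect argument is unused in this sketch and is kept only to match the method signatures.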

Changed in taskflow:
status: New → In Progress
assignee: nobody → Pavlo Shchelokovskyy (pshchelo)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to taskflow (master)

Reviewed: https://review.opendev.org/c/openstack/taskflow/+/800609
Committed: https://opendev.org/openstack/taskflow/commit/3e1f150926029b6a553cbef959f370e39ce6bb5a
Submitter: "Zuul (22348)"
Branch: master

commit 3e1f150926029b6a553cbef959f370e39ce6bb5a
Author: Pavlo Shchelokovskyy <email address hidden>
Date: Tue Jul 13 12:06:41 2021 +0300

    Use custom JSONType columns

    the JSONType from sqlalchemy_utils is quite brittle as it only does
    primitive json.dumps on values. This leads to various sorts of
    StorageFailure exceptions in taskflow when, for example, an unserializable
    exception bubbles up to the 'failure' field of AtomDetails.

    This patch subclasses the JSONType from sqlalchemy_utils and overrides
    two of its methods that do (de)serialization to work via
    oslo.serialization functions. They deal with such occurrences much
    better, for example, by providing 'str' as a fallback default.

    Change-Id: I3b9e9498b155199a4e707006a0cf22cda0567c06
    Related-Bug: #1935957
