WSREP deadlock with OperationalError is not detected correctly

Bug #2057987 reported by melanie witt
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
oslo.db
Fix Released
Undecided
Unassigned

Bug Description

This is the same issue as https://bugs.launchpad.net/oslo.db/+bug/1648818 was except the exception is raising OperationalError instead of InternalError. Maybe something to do with newer versions of things having changed or added exceptions.

This variant of the issue was reported downstream recently:

https://bugzilla.redhat.com/show_bug.cgi?id=2269548

2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 79, in wrapped
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server function_name, call_dict, binary, tb)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server self.force_reraise()
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server raise value
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 69, in wrapped
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 191, in decorated_function
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server "Error: %s", e, instance=instance)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server self.force_reraise()
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server raise value
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 161, in decorated_function
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/utils.py", line 1386, in decorated_function
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server with EventReporter(context, event_name, host, instance_uuid):
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/utils.py", line 1356, in __enter__
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server host=self.host)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 177, in wrapper
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server args, kwargs)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/conductor/rpcapi.py", line 241, in object_class_action_versions
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server args=args, kwargs=kwargs)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server transport_options=self.transport_options)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server transport_options=transport_options)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 674, in send
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server transport_options=transport_options)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 664, in _send
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server raise result
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server oslo_messaging.rpc.client.RemoteError: Remote error: OperationalError (pymysql.err.OperationalError) (1213, 'Deadlock: wsrep aborted transaction')
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server [SQL: UPDATE instance_actions SET updated_at=%(updated_at)s WHERE instance_actions.id = %(instance_actions_id)s]
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server [parameters: {'updated_at': datetime.datetime(2024, 3, 13, 22, 15, 12, 801841), 'instance_actions_id': 426}]
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server (Background on this error at: http://sqlalche.me/e/e3q8)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server ['Traceback (most recent call last):\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context\n cursor, statement, parameters, context\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute\n cursor.execute(statement, parameters)\n', ' File "/usr/lib/python3.6/site-packages/pymysql/cursors.py", line 163, in execute\n result = self._query(query)\n', ' File "/usr/lib/python3.6/site-packages/pymysql/cursors.py", line 321, in _query\n conn.query(q)\n', ' File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 505, in query\n self._affected_rows = self._read_query_result(unbuffered=unbuffered)\n', ' File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 724, in _read_query_result\n result.read()\n', ' File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 1069, in read\n first_packet = self.connection._read_packet()\n', ' File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 676, in _read_packet\n packet.raise_for_error()\n', ' File "/usr/lib/python3.6/site-packages/pymysql/protocol.py", line 223, in raise_for_error\n err.raise_mysql_exception(self._data)\n', ' File "/usr/lib/python3.6/site-packages/pymysql/err.py", line 107, in raise_mysql_exception\n raise errorclass(errno, errval)\n', "pymysql.err.OperationalError: (1213, 'Deadlock: wsrep aborted transaction')\n", '\nThe above exception was the direct cause of the following exception:\n\n', 'Traceback (most recent call last):\n', ' File "/usr/lib/python3.6/site-packages/nova/conductor/manager.py", line 135, in _object_dispatch\n return getattr(target, method)(*args, **kwargs)\n', ' File "/usr/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 184, in wrapper\n result = fn(cls, context, *args, **kwargs)\n', ' File "/usr/lib/python3.6/site-packages/nova/objects/instance_action.py", line 201, in event_start\n db_event = db.action_event_start(context, values)\n', ' File "/usr/lib/python3.6/site-packages/nova/db/api.py", line 1691, in action_event_start\n return IMPL.action_event_start(context, values)\n', ' File "/usr/lib/python3.6/site-packages/oslo_db/api.py", line 154, in wrapper\n ectxt.value = e.inner_exc\n', ' File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n self.force_reraise()\n', ' File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n six.reraise(self.type_, self.value, self.tb)\n', ' File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise\n raise value\n', ' File "/usr/lib/python3.6/site-packages/oslo_db/api.py", line 142, in wrapper\n return f(*args, **kwargs)\n', ' File "/usr/lib/python3.6/site-packages/nova/db/sqlalchemy/api.py", line 224, in wrapped\n return f(context, *args, **kwargs)\n', ' File "/usr/lib/python3.6/site-packages/nova/db/sqlalchemy/api.py", line 5290, in action_event_start\n action.save(context.session)\n', ' File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/models.py", line 50, in save\n session.flush()\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 2451, in flush\n self._flush(objects)\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 2589, in _flush\n transaction.rollback(_capture_exception=True)\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__\n compat.reraise(exc_type, exc_value, exc_tb)\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 129, in reraise\n raise value\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 2549, in _flush\n flush_context.execute()\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute\n rec.execute(self)\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute\n uow,\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 236, in save_obj\n update,\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 978, in _emit_update_statements\n statement, multiparams\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 988, in execute\n return meth(self, multiparams, params)\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection\n return connection._execute_clauseelement(self, multiparams, params)\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement\n distilled_params,\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context\n e, statement, parameters, cursor, context\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1464, in _handle_dbapi_exception\n util.raise_from_cause(newraise, exc_info)\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 383, in raise_from_cause\n reraise(type(exception), exception, tb=exc_tb, cause=cause)\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 128, in reraise\n raise value.with_traceback(tb)\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context\n cursor, statement, parameters, context\n', ' File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute\n cursor.execute(statement, parameters)\n', ' File "/usr/lib/python3.6/site-packages/pymysql/cursors.py", line 163, in execute\n result = self._query(query)\n', ' File "/usr/lib/python3.6/site-packages/pymysql/cursors.py", line 321, in _query\n conn.query(q)\n', ' File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 505, in query\n self._affected_rows = self._read_query_result(unbuffered=unbuffered)\n', ' File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 724, in _read_query_result\n result.read()\n', ' File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 1069, in read\n first_packet = self.connection._read_packet()\n', ' File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 676, in _read_packet\n packet.raise_for_error()\n', ' File "/usr/lib/python3.6/site-packages/pymysql/protocol.py", line 223, in raise_for_error\n err.raise_mysql_exception(self._data)\n', ' File "/usr/lib/python3.6/site-packages/pymysql/err.py", line 107, in raise_mysql_exception\n raise errorclass(errno, errval)\n', "sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1213, 'Deadlock: wsrep aborted transaction')\n[SQL: UPDATE instance_actions SET updated_at=%(updated_at)s WHERE instance_actions.id = %(instance_actions_id)s]\n[parameters: {'updated_at': datetime.datetime(2024, 3, 13, 22, 15, 12, 801841), 'instance_actions_id': 426}]\n(Background on this error at: http://sqlalche.me/e/e3q8)\n"].
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.db (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/oslo.db/+/913279

Changed in oslo.db:
status: New → In Progress
melanie witt (melwitt)
description: updated
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.db (master)

Reviewed: https://review.opendev.org/c/openstack/oslo.db/+/913279
Committed: https://opendev.org/openstack/oslo.db/commit/8e1f869910e2fee3d567565b28b9cd520b07bacf
Submitter: "Zuul (22348)"
Branch: master

commit 8e1f869910e2fee3d567565b28b9cd520b07bacf
Author: melanie witt <email address hidden>
Date: Fri Mar 15 01:19:55 2024 +0000

    exc_filters: Handle OperationalError for MariaDB/Galera

    Currently InternalError is being handled for detecting MariaDB/Galera
    deadlocks but recently we have seen a variant that raises
    OperationalError instead. Because it's not being detected, usage of the
    @retry_on_deadlock decorator is not performing retries in those cases.

    This adds handling of OperationalError for detecting this deadlock.

    Closes-Bug: #2057987

    Change-Id: I6ff3667b35ea38a2d3c258f810a55eda9abe465e

Changed in oslo.db:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.db (stable/2024.1)

Fix proposed to branch: stable/2024.1
Review: https://review.opendev.org/c/openstack/oslo.db/+/913600

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.db (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/oslo.db/+/913601

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.db (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/oslo.db/+/913603

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.db (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/oslo.db/+/913604

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.db (stable/2024.1)

Reviewed: https://review.opendev.org/c/openstack/oslo.db/+/913600
Committed: https://opendev.org/openstack/oslo.db/commit/3a314786ac6f0d4a153ad9c5adc0b283ef7eb7fa
Submitter: "Zuul (22348)"
Branch: stable/2024.1

commit 3a314786ac6f0d4a153ad9c5adc0b283ef7eb7fa
Author: melanie witt <email address hidden>
Date: Fri Mar 15 01:19:55 2024 +0000

    exc_filters: Handle OperationalError for MariaDB/Galera

    Currently InternalError is being handled for detecting MariaDB/Galera
    deadlocks but recently we have seen a variant that raises
    OperationalError instead. Because it's not being detected, usage of the
    @retry_on_deadlock decorator is not performing retries in those cases.

    This adds handling of OperationalError for detecting this deadlock.

    Closes-Bug: #2057987

    Change-Id: I6ff3667b35ea38a2d3c258f810a55eda9abe465e
    (cherry picked from commit 8e1f869910e2fee3d567565b28b9cd520b07bacf)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.db (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/oslo.db/+/913601
Committed: https://opendev.org/openstack/oslo.db/commit/8fb36e949d3f01bb5f69596828211800525136ee
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 8fb36e949d3f01bb5f69596828211800525136ee
Author: melanie witt <email address hidden>
Date: Fri Mar 15 01:19:55 2024 +0000

    exc_filters: Handle OperationalError for MariaDB/Galera

    Currently InternalError is being handled for detecting MariaDB/Galera
    deadlocks but recently we have seen a variant that raises
    OperationalError instead. Because it's not being detected, usage of the
    @retry_on_deadlock decorator is not performing retries in those cases.

    This adds handling of OperationalError for detecting this deadlock.

    Closes-Bug: #2057987

    Change-Id: I6ff3667b35ea38a2d3c258f810a55eda9abe465e
    (cherry picked from commit 8e1f869910e2fee3d567565b28b9cd520b07bacf)
    (cherry picked from commit 3a314786ac6f0d4a153ad9c5adc0b283ef7eb7fa)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.db (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/oslo.db/+/913603
Committed: https://opendev.org/openstack/oslo.db/commit/be0515daa260811ad77b5eb5b78670c77eda2df6
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit be0515daa260811ad77b5eb5b78670c77eda2df6
Author: melanie witt <email address hidden>
Date: Fri Mar 15 01:19:55 2024 +0000

    exc_filters: Handle OperationalError for MariaDB/Galera

    Currently InternalError is being handled for detecting MariaDB/Galera
    deadlocks but recently we have seen a variant that raises
    OperationalError instead. Because it's not being detected, usage of the
    @retry_on_deadlock decorator is not performing retries in those cases.

    This adds handling of OperationalError for detecting this deadlock.

    Closes-Bug: #2057987

    Change-Id: I6ff3667b35ea38a2d3c258f810a55eda9abe465e
    (cherry picked from commit 8e1f869910e2fee3d567565b28b9cd520b07bacf)
    (cherry picked from commit 3a314786ac6f0d4a153ad9c5adc0b283ef7eb7fa)
    (cherry picked from commit 8fb36e949d3f01bb5f69596828211800525136ee)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.db (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/oslo.db/+/913604
Committed: https://opendev.org/openstack/oslo.db/commit/8620d4d3af8aabb4da15a9190aaf96d3fdb516be
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 8620d4d3af8aabb4da15a9190aaf96d3fdb516be
Author: melanie witt <email address hidden>
Date: Fri Mar 15 01:19:55 2024 +0000

    exc_filters: Handle OperationalError for MariaDB/Galera

    Currently InternalError is being handled for detecting MariaDB/Galera
    deadlocks but recently we have seen a variant that raises
    OperationalError instead. Because it's not being detected, usage of the
    @retry_on_deadlock decorator is not performing retries in those cases.

    This adds handling of OperationalError for detecting this deadlock.

    Closes-Bug: #2057987

    Change-Id: I6ff3667b35ea38a2d3c258f810a55eda9abe465e
    (cherry picked from commit 8e1f869910e2fee3d567565b28b9cd520b07bacf)
    (cherry picked from commit 3a314786ac6f0d4a153ad9c5adc0b283ef7eb7fa)
    (cherry picked from commit 8fb36e949d3f01bb5f69596828211800525136ee)
    (cherry picked from commit be0515daa260811ad77b5eb5b78670c77eda2df6)

tags: added: in-stable-zed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.