finish_resize fails with DBDeadlock on migrations table

Bug #1642537 reported by Miguel Angel Ajo on 2016-11-17
This bug affects 2 people
Affects                     Importance   Assigned to
OpenStack Compute (nova)    Medium       Matt Riedemann
  Queens                    Medium       Unassigned
  Rocky                     Medium       Unassigned
  Stein                     Medium       Matt Riedemann

Bug Description

http://logs.openstack.org/00/352200/14/check/gate-tempest-dsvm-neutron-linuxbridge-ubuntu-xenial/6ba61d3/console.html

2016-11-17 05:06:33.890498 | Captured traceback:
2016-11-17 05:06:33.890507 | ~~~~~~~~~~~~~~~~~~~
2016-11-17 05:06:33.890534 | Traceback (most recent call last):
2016-11-17 05:06:33.890566 | File "tempest/api/compute/servers/test_server_actions.py", line 289, in test_resize_server_confirm
2016-11-17 05:06:33.890586 | self._test_resize_server_confirm(stop=False)
2016-11-17 05:06:33.890613 | File "tempest/api/compute/servers/test_server_actions.py", line 271, in _test_resize_server_confirm
2016-11-17 05:06:33.890624 | 'VERIFY_RESIZE')
2016-11-17 05:06:33.890644 | File "tempest/common/waiters.py", line 75, in wait_for_server_status
2016-11-17 05:06:33.890655 | server_id=server_id)
2016-11-17 05:06:33.890686 | tempest.exceptions.BuildErrorException: Server 00876f9f-2f30-4507-93c0-f0f5a1699565 failed to build and is in ERROR status
2016-11-17 05:06:33.890761 | Details: {u'code': 500, u'message': u"Remote error: DBDeadlock (pymysql.err.InternalError) (1213, u'Deadlock found when trying to get lock; try restarting transaction') [SQL: u'UPDATE migrations SET updated_at=%(updated_at)s, status=%(status)s WHERE migrations.id = %(migrations_id)s'] [parame", u'created': u'2016-11-17T04:43:56Z'}
2016-11-17 05:06:33.890767 |

Tags: db
Matt Riedemann (mriedem) wrote :

2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [req-027e93fd-83f6-4c9a-8e42-bc3544a52402 tempest-ServerActionsTestJSON-1155389376 tempest-ServerActionsTestJSON-1155389376] [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] Setting instance vm_state to ERROR
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] Traceback (most recent call last):
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/opt/stack/new/nova/nova/compute/manager.py", line 3963, in finish_resize
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] disk_info, image_meta)
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/opt/stack/new/nova/nova/compute/manager.py", line 3932, in _finish_resize
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] migration.save()
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 210, in wrapper
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] ctxt, self, fn.__name__, args, kwargs)
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/opt/stack/new/nova/nova/conductor/rpcapi.py", line 241, in object_action
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] objmethod=objmethod, args=args, kwargs=kwargs)
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 169, in call
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] retry=self.retry)
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 97, in _send
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] timeout=timeout, retry=retry)
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 467, in send
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] retry=retry)
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 458, in _send
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] raise result
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] RemoteError: Remote error: DB...


Changed in nova:
status: New → Confirmed
summary: - Request fails with DBDeadlock
+ finish_resize fails with DBDeadlock on migrations table
Matt Riedemann (mriedem) wrote :

Don't really see this in nova logs in upstream CI anymore:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Deadlock%20found%20when%20trying%20to%20get%20lock%3B%20try%20restarting%20transaction'%5C%22&from=7d

^ Those hits are in neutron. But I have heard operators say they hit migration_update deadlocks downstream, so it's still probably worth putting a retry decorator on that method.
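
For illustration, the kind of retry decorator being suggested already exists in oslo.db as wrap_db_retry. The snippet below is a minimal, self-contained sketch rather than nova's DB API code: migration_update here is a stand-in that deliberately raises DBDeadlock on its first call, just to show that the decorator re-invokes the function instead of letting the error propagate to the caller.

    # Minimal sketch of the retry-on-deadlock pattern; not nova code.
    # Requires oslo.db (the library nova's DB layer already depends on).
    from oslo_db import api as oslo_db_api
    from oslo_db import exception as db_exc

    _attempts = {"count": 0}


    @oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
    def migration_update(context, migration_id, values):
        """Stand-in for a DB API update; the first call simulates a deadlock."""
        _attempts["count"] += 1
        if _attempts["count"] == 1:
            # oslo.db raises DBDeadlock when MySQL reports error 1213,
            # "Deadlock found when trying to get lock".
            raise db_exc.DBDeadlock()
        return {"id": migration_id, "status": values.get("status")}


    # The first attempt deadlocks, the decorator sleeps briefly and retries,
    # and the second attempt succeeds.
    print(migration_update(None, 42, {"status": "finished"}))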

tags: added: db
Matt Riedemann (mriedem) wrote :

Just saw it again in the gate here:

http://logs.openstack.org/40/627540/8/gate/tempest-full-py3/b25fd48/controller/logs/screen-n-api.txt.gz?level=TRACE#_Jan_24_19_53_01_166399

Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi pymysql.err.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi The above exception was the direct cause of the following exception:
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi Traceback (most recent call last):
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/wsgi.py", line 671, in wrapped
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi return f(*args, **kwargs)
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/compute/servers.py", line 763, in _action_revert_resize
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi self.compute_api.revert_resize(context, instance)
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/api.py", line 206, in inner
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi return function(self, context, instance, *args, **kwargs)
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/api.py", line 214, in _wrapped
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi return fn(self, context, instance, *args, **kwargs)
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/api.py", line 154, in inner
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi return f(self, context, instance, *args, **kw)
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/api.py", line 3455, in revert_resize
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi migration.save()
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api...


Changed in nova:
importance: Undecided → Medium
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/633075
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=74f9bba9c7ac27df8f73d11fc317649ae22040fb
Submitter: Zuul
Branch: master

commit 74f9bba9c7ac27df8f73d11fc317649ae22040fb
Author: Matt Riedemann <email address hidden>
Date: Thu Jan 24 17:11:06 2019 -0500

    Add retry_on_deadlock to migration_update DB API

    Lots of different things can be updating the same migration
    record at the same time (different computes, the API, etc)
    and we see DBDeadlock errors on migration_update at times
    (and operators that do a lot of migrations have mentioned it
    also) so this change simply adds the retry_on_deadlock decorator
    to the migration_update DB API method.

    Change-Id: I98bbe96e4be5326e18ef2c2ac50c0d24993d52ec
    Closes-Bug: #1642537

Changed in nova:
status: In Progress → Fix Released
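
For readers who want to see the shape of that change, here is a rough, runnable sketch under simplifying assumptions (an in-memory SQLite engine, a cut-down migrations table, and SQLAlchemy Core in place of nova's models); it is not the actual patch, but it shows the key point: each attempt runs the UPDATE in one short transaction, and the retry-on-deadlock decorator simply re-runs that attempt if the database reports a deadlock.

    # Illustrative only; assumptions: in-memory SQLite and a cut-down schema.
    import sqlalchemy as sa
    from oslo_db import api as oslo_db_api

    engine = sa.create_engine("sqlite://")
    metadata = sa.MetaData()
    migrations = sa.Table(
        "migrations", metadata,
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("status", sa.String(255)),
    )
    metadata.create_all(engine)
    with engine.begin() as conn:
        conn.execute(migrations.insert().values(id=1, status="post-migrating"))


    @oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
    def migration_update(migration_id, values):
        # One short transaction per attempt, so a retried call starts from a
        # clean state instead of re-entering a half-finished update.
        with engine.begin() as conn:
            conn.execute(
                migrations.update()
                .where(migrations.c.id == migration_id)
                .values(**values)
            )
            row = conn.execute(
                sa.select(migrations).where(migrations.c.id == migration_id)
            ).one()
        return dict(row._mapping)


    print(migration_update(1, {"status": "finished"}))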

Reviewed: https://review.openstack.org/648428
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=99b3f44eb138b02bd5e967cbbb1574ebcdbc6cc5
Submitter: Zuul
Branch: stable/stein

commit 99b3f44eb138b02bd5e967cbbb1574ebcdbc6cc5
Author: Matt Riedemann <email address hidden>
Date: Thu Jan 24 17:11:06 2019 -0500

    Add retry_on_deadlock to migration_update DB API

    Lots of different things can be updating the same migration
    record at the same time (different computes, the API, etc)
    and we see DBDeadlock errors on migration_update at times
    (and operators that do a lot of migrations have mentioned it
    also) so this change simply adds the retry_on_deadlock decorator
    to the migration_update DB API method.

    Change-Id: I98bbe96e4be5326e18ef2c2ac50c0d24993d52ec
    Closes-Bug: #1642537
    (cherry picked from commit 74f9bba9c7ac27df8f73d11fc317649ae22040fb)

This issue was fixed in the openstack/nova 19.0.1 release.

