finish_resize fails with DBDeadlock on migrations table

Bug #1642537 reported by Miguel Angel Ajo
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Matt Riedemann
Queens
Fix Released
Medium
Matt Riedemann
Rocky
Fix Released
Medium
Matt Riedemann
Stein
Fix Committed
Medium
Matt Riedemann

Bug Description

http://logs.openstack.org/00/352200/14/check/gate-tempest-dsvm-neutron-linuxbridge-ubuntu-xenial/6ba61d3/console.html

2016-11-17 05:06:33.890498 | Captured traceback:
2016-11-17 05:06:33.890507 | ~~~~~~~~~~~~~~~~~~~
2016-11-17 05:06:33.890534 | Traceback (most recent call last):
2016-11-17 05:06:33.890566 | File "tempest/api/compute/servers/test_server_actions.py", line 289, in test_resize_server_confirm
2016-11-17 05:06:33.890586 | self._test_resize_server_confirm(stop=False)
2016-11-17 05:06:33.890613 | File "tempest/api/compute/servers/test_server_actions.py", line 271, in _test_resize_server_confirm
2016-11-17 05:06:33.890624 | 'VERIFY_RESIZE')
2016-11-17 05:06:33.890644 | File "tempest/common/waiters.py", line 75, in wait_for_server_status
2016-11-17 05:06:33.890655 | server_id=server_id)
2016-11-17 05:06:33.890686 | tempest.exceptions.BuildErrorException: Server 00876f9f-2f30-4507-93c0-f0f5a1699565 failed to build and is in ERROR status
2016-11-17 05:06:33.890761 | Details: {u'code': 500, u'message': u"Remote error: DBDeadlock (pymysql.err.InternalError) (1213, u'Deadlock found when trying to get lock; try restarting transaction') [SQL: u'UPDATE migrations SET updated_at=%(updated_at)s, status=%(status)s WHERE migrations.id = %(migrations_id)s'] [parame", u'created': u'2016-11-17T04:43:56Z'}
2016-11-17 05:06:33.890767 |

Tags: db
Revision history for this message
Matt Riedemann (mriedem) wrote :
Download full text (3.5 KiB)

2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [req-027e93fd-83f6-4c9a-8e42-bc3544a52402 tempest-ServerActionsTestJSON-1155389376 tempest-ServerActionsTestJSON-1155389376] [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] Setting instance vm_state to ERROR
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] Traceback (most recent call last):
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/opt/stack/new/nova/nova/compute/manager.py", line 3963, in finish_resize
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] disk_info, image_meta)
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/opt/stack/new/nova/nova/compute/manager.py", line 3932, in _finish_resize
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] migration.save()
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 210, in wrapper
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] ctxt, self, fn.__name__, args, kwargs)
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/opt/stack/new/nova/nova/conductor/rpcapi.py", line 241, in object_action
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] objmethod=objmethod, args=args, kwargs=kwargs)
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 169, in call
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] retry=self.retry)
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 97, in _send
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] timeout=timeout, retry=retry)
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 467, in send
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] retry=retry)
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 458, in _send
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] raise result
2016-11-17 04:43:55.673 3598 ERROR nova.compute.manager [instance: 00876f9f-2f30-4507-93c0-f0f5a1699565] RemoteError: Remote error: DB...

Read more...

Changed in nova:
status: New → Confirmed
summary: - Request fails with DBDeadlock
+ finish_resize fails with DBDeadlock on migrations table
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :
Revision history for this message
Matt Riedemann (mriedem) wrote :

Don't really see this in nova logs in upstream CI anymore:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Deadlock%20found%20when%20trying%20to%20get%20lock%3B%20try%20restarting%20transaction'%5C%22&from=7d

^ are hitting in neutron. But I have heard operators say they hit migration_update deadlocks downstream so it's still probably worth putting a retry decorator on that method.

tags: added: db
Revision history for this message
Matt Riedemann (mriedem) wrote :
Download full text (5.5 KiB)

Just saw it again in the gate here:

http://logs.openstack.org/40/627540/8/gate/tempest-full-py3/b25fd48/controller/logs/screen-n-api.txt.gz?level=TRACE#_Jan_24_19_53_01_166399

Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi pymysql.err.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi The above exception was the direct cause of the following exception:
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi Traceback (most recent call last):
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/wsgi.py", line 671, in wrapped
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi return f(*args, **kwargs)
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/compute/servers.py", line 763, in _action_revert_resize
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi self.compute_api.revert_resize(context, instance)
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/api.py", line 206, in inner
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi return function(self, context, instance, *args, **kwargs)
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/api.py", line 214, in _wrapped
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi return fn(self, context, instance, *args, **kwargs)
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/api.py", line 154, in inner
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi return f(self, context, instance, *args, **kw)
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/compute/api.py", line 3455, in revert_resize
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api.openstack.wsgi migration.save()
Jan 24 19:53:01.166399 ubuntu-bionic-ovh-bhs1-0002105626 <email address hidden>[9264]: ERROR nova.api...

Read more...

Changed in nova:
importance: Undecided → Medium
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/633075
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=74f9bba9c7ac27df8f73d11fc317649ae22040fb
Submitter: Zuul
Branch: master

commit 74f9bba9c7ac27df8f73d11fc317649ae22040fb
Author: Matt Riedemann <email address hidden>
Date: Thu Jan 24 17:11:06 2019 -0500

    Add retry_on_deadlock to migration_update DB API

    Lots of different things can be updating the same migration
    record at the same time (different computes, the API, etc)
    and we see DBDeadlock errors on migration_update at times
    (and operators that do a lot of migrations have mentioned it
    also) so this change simply adds the retry_on_deadlock decorator
    to the migration_update DB API method.

    Change-Id: I98bbe96e4be5326e18ef2c2ac50c0d24993d52ec
    Closes-Bug: #1642537

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.openstack.org/648428

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.openstack.org/648428
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=99b3f44eb138b02bd5e967cbbb1574ebcdbc6cc5
Submitter: Zuul
Branch: stable/stein

commit 99b3f44eb138b02bd5e967cbbb1574ebcdbc6cc5
Author: Matt Riedemann <email address hidden>
Date: Thu Jan 24 17:11:06 2019 -0500

    Add retry_on_deadlock to migration_update DB API

    Lots of different things can be updating the same migration
    record at the same time (different computes, the API, etc)
    and we see DBDeadlock errors on migration_update at times
    (and operators that do a lot of migrations have mentioned it
    also) so this change simply adds the retry_on_deadlock decorator
    to the migration_update DB API method.

    Change-Id: I98bbe96e4be5326e18ef2c2ac50c0d24993d52ec
    Closes-Bug: #1642537
    (cherry picked from commit 74f9bba9c7ac27df8f73d11fc317649ae22040fb)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.1

This issue was fixed in the openstack/nova 19.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.0.0rc1

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.

Revision history for this message
Ronan Lanore (ronan-lanore) wrote :

Hi, are they any workarround for Queens ?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/711461

Revision history for this message
Matt Riedemann (mriedem) wrote :

> Hi, are they any workarround for Queens ?

Not that I'm aware of but we can backport the fix through to Queens so you could pick it up from there. I just posted a backport to stable/rocky and then once that is merged we can backport it to stable/queens.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/711461
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1f83d773a436c41709b5d6b30a51c979bb935fa2
Submitter: Zuul
Branch: stable/rocky

commit 1f83d773a436c41709b5d6b30a51c979bb935fa2
Author: Matt Riedemann <email address hidden>
Date: Thu Jan 24 17:11:06 2019 -0500

    Add retry_on_deadlock to migration_update DB API

    Lots of different things can be updating the same migration
    record at the same time (different computes, the API, etc)
    and we see DBDeadlock errors on migration_update at times
    (and operators that do a lot of migrations have mentioned it
    also) so this change simply adds the retry_on_deadlock decorator
    to the migration_update DB API method.

    Change-Id: I98bbe96e4be5326e18ef2c2ac50c0d24993d52ec
    Closes-Bug: #1642537
    (cherry picked from commit 74f9bba9c7ac27df8f73d11fc317649ae22040fb)
    (cherry picked from commit 99b3f44eb138b02bd5e967cbbb1574ebcdbc6cc5)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/712212

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/712212
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f223c443b5b64f4385f128fa3b93cac25ea3e1f5
Submitter: Zuul
Branch: stable/queens

commit f223c443b5b64f4385f128fa3b93cac25ea3e1f5
Author: Matt Riedemann <email address hidden>
Date: Thu Jan 24 17:11:06 2019 -0500

    Add retry_on_deadlock to migration_update DB API

    Lots of different things can be updating the same migration
    record at the same time (different computes, the API, etc)
    and we see DBDeadlock errors on migration_update at times
    (and operators that do a lot of migrations have mentioned it
    also) so this change simply adds the retry_on_deadlock decorator
    to the migration_update DB API method.

    Change-Id: I98bbe96e4be5326e18ef2c2ac50c0d24993d52ec
    Closes-Bug: #1642537
    (cherry picked from commit 74f9bba9c7ac27df8f73d11fc317649ae22040fb)
    (cherry picked from commit 99b3f44eb138b02bd5e967cbbb1574ebcdbc6cc5)
    (cherry picked from commit 1f83d773a436c41709b5d6b30a51c979bb935fa2)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova queens-eol

This issue was fixed in the openstack/nova queens-eol release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova rocky-eol

This issue was fixed in the openstack/nova rocky-eol release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.