DBDeadlock during router creation

Bug #1477096 reported by Anastasia Kuznetsova
This bug affects 2 people
Affects: Mirantis OpenStack
Status: Fix Released
Importance: High
Assigned to: Oleg Bondarev
Milestone: (not listed)

Bug Description

Heat OSTF test failed in http://jenkins-product.srt.mirantis.net:8080/view/7.0_swarm/job/7.0.system_test.ubuntu.services_ha/35/ during router creation.

Here is the error from the Heat test logs:
2015-07-22 05:15:28 ERROR (nose_storage_plugin) fuel_health.tests.tests_platform.test_heat.HeatSmokeTests.test_update
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/unittest2/case.py", line 340, in run
    testMethod()
  File "/usr/lib/python2.6/site-packages/fuel_health/tests/tests_platform/test_heat.py", line 421, in test_update
    parameters['network'], _ = self.create_network_resources()
  File "/usr/lib/python2.6/site-packages/fuel_health/nmanager.py", line 901, in create_network_resources
    router = self._create_router(router_name, ext_net)
  File "/usr/lib/python2.6/site-packages/fuel_health/nmanager.py", line 987, in _create_router
    router = self.neutron_client.create_router(router_body)['router']
  File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 98, in with_params
    ret = self.function(instance, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 402, in create_router
    return self.post(self.routers_path, body=body)
  File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 1325, in post
    headers=headers, params=params)
  File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 1251, in do_request
    self._handle_fault_response(status_code, replybody)
  File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 1216, in _handle_fault_response
    exception_handler_v20(status_code, des_error_body)
  File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 66, in exception_handler_v20
    status_code=status_code)
InternalServerError: Request Failed: internal server error while processing your request.

Here is the error from the neutron server log:
DBDeadlock: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'SELECT ml2_network_segments.id AS ml2_network_segments_id, ml2_network_segments.network_id AS ml2_network_segments_network_id, ml2_network_segments.network_type AS ml2_network_segments_network_type, ml2_network_segments.physical_network AS ml2_network_segments_physical_network, ml2_network_segments.segmentation_id AS ml2_network_segments_segmentation_id, ml2_network_segments.is_dynamic AS ml2_network_segments_is_dynamic, ml2_network_segments.segment_index AS ml2_network_segments_segment_index \nFROM ml2_network_segments \nWHERE ml2_network_segments.network_id = %s AND ml2_network_segments.is_dynamic = 0 ORDER BY ml2_network_segments.segment_index' ('9bcf42de-ab57-4270-900b-447500f5885b',)

Full traceback http://paste.openstack.org/show/399624/
Full logs snapshot will be attached below.

Environment configuration:
ISO fuel-7.0-65-2015-07-21, UBUNTU, NEUTRON GRE, 3 controllers

Tags: neutron
Anastasia Kuznetsova (akuznetsova) wrote :
description: updated
Changed in mos:
assignee: nobody → MOS Neutron (mos-neutron)
Changed in mos:
milestone: none → 7.0
tags: added: neutron
Changed in mos:
status: New → Confirmed
Changed in mos:
importance: Undecided → Medium
Changed in mos:
assignee: MOS Neutron (mos-neutron) → Oleg Bondarev (obondarev)
Oleg Bondarev (obondarev) wrote :

I inspected the neutron server logs on all controllers and found that the deadlock usually happens when a router port is created in parallel with DHCP port creation on other servers; in general, it is triggered by simultaneous port creation. Port creation involves locking the 'ports' and 'binding' tables in the get_locked_port_and_binding() ML2 DB method, which essentially does:
        port = (session.query(models_v2.Port).
                enable_eagerloads(False).
                filter_by(id=port_id).
                with_lockmode('update').
                one())
        binding = (session.query(models.PortBinding).
                   enable_eagerloads(False).
                   filter_by(port_id=port_id).
                   with_lockmode('update').
                   one())

I'm not sure how exactly this leads to a deadlock. It probably happens due to the specifics of Galera working in active-active mode: it throws deadlock errors when it fails to validate a change with other members of the cluster.

I'm going to apply a fix similar to https://review.openstack.org/#/c/180466/. Though it's more of a workaround, it should fix the issue, with the only downside being a slight delay in port creation in very rare circumstances.
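
The retry workaround can be sketched roughly as follows. This is a stdlib-only illustration, not the actual patch: later comments in this thread refer to an oslo db retry decorator, whereas `DBDeadlock`, `retry_on_deadlock`, and the toy `create_port` below are hypothetical stand-ins. The short sleep between attempts is the "slight delay" mentioned above.

```python
import functools
import time


class DBDeadlock(Exception):
    """Hypothetical stand-in for the deadlock error raised by the DB layer."""


def retry_on_deadlock(max_retries=3, retry_interval=0.01):
    """Re-run the wrapped callable when it raises DBDeadlock."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    attempt += 1
                    if attempt > max_retries:
                        raise  # give up and surface the deadlock
                    # Wait a little longer before each further retry.
                    time.sleep(retry_interval * attempt)
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on_deadlock(max_retries=3)
def create_port(name):
    """Toy operation (assumption): deadlocks twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise DBDeadlock("Deadlock found when trying to get lock")
    return {"name": name, "attempts": calls["count"]}


port = create_port("router-gw")
```

A retry like this only helps if each attempt starts from a clean transaction; it does not remove the underlying lock contention.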

Oleg Bondarev (obondarev) wrote :

Raising to High as it happens pretty often during CI & test runs

Changed in mos:
importance: Medium → High
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (openstack-ci/fuel-7.0/2015.1.0)

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/9822

Oleg Bondarev (obondarev) wrote :

Yet another occurrence, traceback slightly differs: http://paste.openstack.org/show/405057/

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (openstack-ci/fuel-7.0/2015.1.0)

Reviewed: https://review.fuel-infra.org/9822
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-7.0/2015.1.0

Commit: 4405a45b5af3c3fce510130990b9cf5de2be969c
Author: Oleg Bondarev <email address hidden>
Date: Thu Jul 23 15:08:00 2015

Retry ML2 create_port() on Deadlock

ML2 create_port operation currently involves locking ports
and bindings tables which may lead to DBDeadlock errors in certain
cases when several ports are created concurrently.
That may happen due to specifics of Galera working in active-active
mode: it may throw deadlock errors when it fails to validate
a change with other members of the cluster.
The fix adds retries to create port operation to overcome such
deadlocks.
This also moves wrapper from _create_port_db() as it doesn't make
much sense to have both methods wrapped.

Closes-Bug: #1477096
Change-Id: I3d68a07beee63cc52762b109586115b9bf2a4de5

Changed in mos:
status: Confirmed → In Progress
status: In Progress → Fix Committed
Anastasia Kuznetsova (akuznetsova) wrote :

The DBDeadlock problem was reproduced again on ISO 108 during the Heat autoscaling test (failed job: http://jenkins-product.srt.mirantis.net:8080/view/7.0_swarm/job/7.0.system_test.ubuntu.services_ha/43/).

Traceback in tests:
urllib3.connectionpool: DEBUG: "POST http://10.109.2.3:9696/v2.0/routers.json HTTP/1.1" 500 150
neutronclient.client: DEBUG: RESP:500 CaseInsensitiveDict({'content-length': '150', 'via': '1.1 apache_api_proxy:9696', 'server': 'Apache/2.4.7 (Ubuntu)', 'connection': 'close', 'date': 'Thu, 30 Jul 2015 06:57:38 GMT', 'content-type': 'application/json; charset=UTF-8', 'x-openstack-request-id': 'req-1a3a3e1e-fda4-4a62-96a3-b4958592a611'}) {"NeutronError": {"message": "Request Failed: internal server error while processing your request.", "type": "HTTPInternalServerError", "detail": ""}}

neutronclient.v2_0.client: DEBUG: Error message: {"NeutronError": {"message": "Request Failed: internal server error while processing your request.", "type": "HTTPInternalServerError", "detail": ""}}
--------------------- >> end captured logging << ---------------------

Changed in mos:
status: Fix Committed → Confirmed
Oleg Bondarev (obondarev) wrote :

The reason deadlocks still happen in this scenario is that create_port() is called inside create_router()'s transaction. This prevents the db_retry decorator (added to create_port() by the previous commit) from working as expected. Basically we get:
  InvalidRequestError: This Session's transaction has been rolled back by a nested rollback() call. To begin a new transaction, issue Session.rollback() first
for the parent transaction.

We need to refactor create_router() so that it does not create the port from inside a transaction.
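
A toy model may help show why the retry is defeated. It assumes a session in which nested blocks share one transaction and a failure in a nested block invalidates the parent; `ToySession`, the exception classes, and the helper functions are all hypothetical and only imitate the behaviour described above, they are not SQLAlchemy or neutron code.

```python
import contextlib


class DBDeadlock(Exception):
    """Hypothetical stand-in for the database deadlock error."""


class InvalidRequestError(Exception):
    """Hypothetical stand-in for the session error quoted above."""


class ToySession:
    """Tiny model of a session whose nested blocks share one transaction."""

    def __init__(self):
        self.depth = 0
        self.poisoned = False

    @contextlib.contextmanager
    def begin(self):
        if self.poisoned:
            raise InvalidRequestError("transaction has been rolled back")
        self.depth += 1
        try:
            yield
        except Exception:
            # A failure in a nested block invalidates the enclosing transaction.
            if self.depth > 1:
                self.poisoned = True
            raise
        finally:
            self.depth -= 1
            if self.depth == 0:
                self.poisoned = False  # outermost block ended: clean slate


def retry_on_deadlock(func, max_retries=3):
    def wrapper(*args, **kwargs):
        for attempt in range(max_retries + 1):
            try:
                return func(*args, **kwargs)
            except DBDeadlock:
                if attempt == max_retries:
                    raise
    return wrapper


attempts = {"count": 0}


@retry_on_deadlock
def create_port(session):
    with session.begin():
        attempts["count"] += 1
        if attempts["count"] == 1:
            raise DBDeadlock("deadlock on first attempt")
        return "port"


def create_router_nested(session):
    with session.begin():            # parent transaction
        return create_port(session)  # the retried call runs inside it


session = ToySession()
try:
    create_router_nested(session)
    nested_result = "ok"
except InvalidRequestError:
    nested_result = "retry defeated"

attempts["count"] = 0
standalone_result = create_port(session)  # no parent transaction: retry works
```

In the nested case the second attempt never reaches the database, because the parent transaction is already unusable; called on its own, the same function succeeds on its second attempt.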

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (openstack-ci/fuel-7.0/2015.1.0)

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/9989

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (openstack-ci/fuel-7.0/2015.1.0)

Reviewed: https://review.fuel-infra.org/9989
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-7.0/2015.1.0

Commit: b4d0b20d00339f058586a2d7a360eac3a0ea2245
Author: Oleg Bondarev <email address hidden>
Date: Thu Jul 30 15:40:48 2015

Do not create gateway port inside a transaction

Currently if router is created with external gateway,
l3 plugin starts a transaction and creates both router
and gw port inside it. This is generally wrong because
core plugin's create_port() may go to backends so should
always be called outside a db transaction to prevent deadlocks.

ML2 create port is also wrapped with db_retry decorator.
Calling it from a transaction just nullifies db_retry.

The fix makes gw port creation outside a transaction

Closes-Bug: #1477096
Change-Id: Ice4565c75f8b96c414c71dff4e97def8b1f5581f
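
The restructuring described in the commit message above might look something like the sketch below. The actual patch is not shown in this thread, so this is only one possible shape under stated assumptions: `ToyDB`, the three-step split, and the manual cleanup on failure are all hypothetical.

```python
import contextlib


class ToyDB:
    """In-memory stand-in for the database, tracking transaction state."""

    def __init__(self):
        self.in_transaction = False
        self.routers = {}
        self.ports = {}

    @contextlib.contextmanager
    def transaction(self):
        self.in_transaction = True
        try:
            yield
        finally:
            self.in_transaction = False


def create_port(db, port_id, network_id):
    """Stand-in for the core plugin call; must run outside a transaction."""
    if db.in_transaction:
        raise RuntimeError("create_port() called inside a transaction")
    db.ports[port_id] = {"network_id": network_id}
    return port_id


def create_router(db, router_id, ext_net_id=None):
    # Step 1: persist the router row in its own short transaction.
    with db.transaction():
        db.routers[router_id] = {"gw_port_id": None}

    if ext_net_id is None:
        return db.routers[router_id]

    # Step 2: create the gateway port with no transaction open, so a retry
    # wrapper around create_port() could safely re-run it.
    try:
        gw_port_id = create_port(db, router_id + "-gw", ext_net_id)
    except Exception:
        # Undo step 1 by hand, since it was already committed.
        with db.transaction():
            del db.routers[router_id]
        raise

    # Step 3: link the port to the router in a second short transaction.
    with db.transaction():
        db.routers[router_id]["gw_port_id"] = gw_port_id
    return db.routers[router_id]


db = ToyDB()
router = create_router(db, "r1", ext_net_id="ext-net")
```

The trade-off in a layout like this is that router creation is no longer atomic, so a failure in the middle needs explicit cleanup instead of a rollback.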

Changed in mos:
status: Confirmed → Fix Committed
Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix proposed to openstack/neutron (openstack-ci/fuel-7.0/2015.1.0)

Related fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/10264

Fuel Devops McRobotson (fuel-devops-robot) wrote :

Related fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/10265

Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix merged to openstack/neutron (openstack-ci/fuel-7.0/2015.1.0)

Reviewed: https://review.fuel-infra.org/10264
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-7.0/2015.1.0

Commit: 8c5cca7646d4583af719f4ee8e2ae2a2812f9fbf
Author: Oleg Bondarev <email address hidden>
Date: Mon Aug 10 13:59:29 2015

Add oslo db retry decorator to non-CRUD actions

The previously added decorators to the create and update handlers
in the API layer only applied to actions that followed the standard
create/update path. However, for API operations like add_router_interface,
a different path is followed that wasn't covered by a retry decorator.
This patch adds the decorator to handle deadlocks in those operations as
well.

Related-Bug: #1477096
Closes-Bug: #1475218
Change-Id: Ib354074e6a3f68cedb95fd774f905d94ca16a830

Fuel Devops McRobotson (fuel-devops-robot) wrote :

Reviewed: https://review.fuel-infra.org/10265
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-7.0/2015.1.0

Commit: 6767c640a73e74bb927fa72842422e0acbe1cb01
Author: Oleg Bondarev <email address hidden>
Date: Mon Aug 10 16:43:07 2015

Add oslo db retry decorator to the RPC handlers

The decorator was previously added at the API layer.
However some RPC handlers are also dealing with port
create/update/delete operations, like dhcp ports for example.
We need to cover these cases too.

Also remove db retry from ml2 plugin delete_port()
as it's not needed once we retry at the API and RPC layers.
(there is already a unit test on this)

Upstream review: https://review.openstack.org/207532
Related-Bug: #1477096
Closes-Bug: #1479738
Closes-Bug: #1470615
Change-Id: I7793a8f7c37ca542b8bc12372168aaaa0826ac4c
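
The idea of retrying once at the entry point (API or RPC handler) rather than on individual plugin methods can be illustrated with a small counter. The commit message above says only that the inner retry is no longer needed; the multiplication of attempts shown here is an added illustration, and `with_retry`, `count_attempts`, and the toy `delete_port` are hypothetical names.

```python
class DBDeadlock(Exception):
    """Hypothetical stand-in for the database deadlock error."""


def with_retry(func, max_retries=2):
    """Wrap func so that DBDeadlock triggers up to max_retries re-runs."""
    def wrapper(*args, **kwargs):
        for attempt in range(max_retries + 1):
            try:
                return func(*args, **kwargs)
            except DBDeadlock:
                if attempt == max_retries:
                    raise
    return wrapper


def count_attempts(wrap_inner):
    """Count how often an always-deadlocking operation runs under each layout."""
    attempts = {"count": 0}

    def delete_port():
        attempts["count"] += 1
        raise DBDeadlock("always deadlocks")

    plugin_method = with_retry(delete_port) if wrap_inner else delete_port
    api_handler = with_retry(plugin_method)  # retry at the entry point
    try:
        api_handler()
    except DBDeadlock:
        pass
    return attempts["count"]


outer_only = count_attempts(wrap_inner=False)   # 3 attempts
both_layers = count_attempts(wrap_inner=True)   # 3 * 3 = 9 attempts
```

With retries at both layers, a persistent deadlock is attempted nine times instead of three, which is one practical argument for keeping the decorator at a single layer.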

Changed in mos:
status: Fix Committed → Fix Released
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/13303

Fuel Devops McRobotson (fuel-devops-robot) wrote :

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/13308

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/neutron (openstack-ci/fuel-8.0/liberty)

Change abandoned by Ann Kamyshnikova <email address hidden> on branch: openstack-ci/fuel-8.0/liberty
Review: https://review.fuel-infra.org/13303
Reason: Not needed; fixed in https://review.openstack.org/#/c/214424/

Fuel Devops McRobotson (fuel-devops-robot) wrote :

Change abandoned by Elena Ezhova <email address hidden> on branch: openstack-ci/fuel-8.0/liberty
Review: https://review.fuel-infra.org/13308
Reason: This change is not needed, as DB retry decorators were removed from the ML2 plugin in https://review.openstack.org/#/c/238994/
