Neutron network deletion fails: "DBDeadlock: (OperationalError)"

Bug #1392386 reported by Artem Panchenko
This bug affects 3 people
Affects              Status      Importance  Assigned to        Milestone
Mirantis OpenStack   Invalid     Undecided   Oleg Bondarev
  6.0.x              Won't Fix   Medium      Eugene Nikanorov
  6.1.x              Invalid     Medium      Eugene Nikanorov
  7.0.x              Invalid     Undecided   Oleg Bondarev

Bug Description

api: '1.0'
astute_sha: ad1a56c4a872ac1f73d79bb2d54a04bed1bc3dc3
auth_required: true
build_id: 2014-11-13_11-49-47
build_number: '100'
feature_groups:
- mirantis
fuellib_sha: b9d4068bcb9058abe9ca88785bf4e9cf2b42a2e1
fuelmain_sha: 144262798c38c9186ad2d9c71c60a795b1474014
nailgun_sha: 677dc548a1fed866ae26a4e421bcd4bf31409b0e
ostf_sha: 720cc1308c3a7081736edd167e7928ca61914aaa
production: docker
release: '6.0'

Sometimes Neutron network deletion fails with the following error:

"InternalServerError: Request Failed: internal server error while processing your request."

This issue was reproduced during build verification tests on CI; the OSTF test 'Check network connectivity from instance via floating IP (failure)' failed:

http://jenkins-product.srt.mirantis.net:8080/job/6.0.centos.bvt_1/82/testReport/junit/(root)/deploy_neutron_vlan_ha/deploy_neutron_vlan_ha/

I wasn't able to reproduce it manually on the same environment.

Here is the error and traceback from Neutron logs (node-3):

http://paste.openstack.org/show/132839/

Full diagnostic snapshot is attached.

Tags: neutron
Revision history for this message
Artem Panchenko (apanchenko-8) wrote :
Changed in fuel:
importance: Undecided → High
milestone: none → 6.0
Changed in mos:
assignee: MOS Neutron (mos-neutron) → Eugene Nikanorov (enikanorov)
no longer affects: fuel
Changed in mos:
status: New → Confirmed
Revision history for this message
Elena Ezhova (eezhova) wrote :

It seems that the problem arises because one transaction attempts to delete a network, while another tries to update a port which belongs to that network, and both transactions try to update the table 'ports'.

http://paste.openstack.org/show/134710/
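
A schematic sketch of the two conflicting transactions described above (not Neutron code; the connection URL and the simplified `ports` table are assumptions for illustration). Run concurrently against MySQL/InnoDB, both transactions take row locks in `ports`, and the engine may abort one of them with a deadlock error, which oslo.db surfaces as DBDeadlock:

    import threading
    import sqlalchemy as sa

    # Hypothetical database URL; a real environment would point at the Neutron DB.
    engine = sa.create_engine("mysql+pymysql://neutron:secret@127.0.0.1/neutron")

    metadata = sa.MetaData()
    # Simplified stand-in for Neutron's ports table.
    ports = sa.Table(
        "ports", metadata,
        sa.Column("id", sa.String(36), primary_key=True),
        sa.Column("network_id", sa.String(36)),
        sa.Column("status", sa.String(16)),
    )

    def delete_network_ports(network_id):
        # Transaction A: network deletion removes all ports of the network.
        with engine.begin() as conn:
            conn.execute(ports.delete().where(ports.c.network_id == network_id))

    def update_port_status(port_id, status):
        # Transaction B: a port update touches a row on the same network.
        with engine.begin() as conn:
            conn.execute(ports.update()
                         .where(ports.c.id == port_id)
                         .values(status=status))

    # Both threads lock rows in `ports`; InnoDB may kill one transaction with a
    # deadlock error, which the server logs as "DBDeadlock: (OperationalError)".
    threading.Thread(target=delete_network_ports, args=("net-uuid",)).start()
    threading.Thread(target=update_port_status, args=("port-uuid", "DOWN")).start()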

Revision history for this message
Elena Ezhova (eezhova) wrote :

There is an upstream patch on review https://review.openstack.org/#/c/115360/9 that may help.

Revision history for this message
Alexander Ignatov (aignatov) wrote :

Decreasing importance since this issue is intermittent and occurs very rarely.

Changed in mos:
importance: High → Medium
milestone: 6.0 → 6.1
no longer affects: mos/6.1.x
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Still reproduced; adding one more snapshot:
http://jenkins-product.srt.mirantis.net:8080/view/6.0_swarm/job/6.0_fuelmain.system_test.centos.ha_neutron/53/

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.0"
  api: "1.0"
  build_number: "49"
  build_id: "2014-12-09_22-41-06"
  astute_sha: "16b252d93be6aaa73030b8100cf8c5ca6a970a91"
  fuellib_sha: "2c99931072d951301d395ebd5bf45c8d401301bb"
  ostf_sha: "a9afb68710d809570460c29d6c3293219d3624d4"
  nailgun_sha: "22bd43b89a17843f9199f92d61fc86cb0f8772f1"
  fuelmain_sha: "3aab16667f47dd8384904e27f70f7a87ba15f4ee"

Steps:
Deploy cluster in HA mode with Neutron VLAN and public network assigned to all nodes

        Scenario:
            1. Create cluster
            2. Add 3 nodes with controller role
            3. Add 2 nodes with compute role
            4. Enable assign public networks to all nodes option
            5. Deploy the cluster
            6. Check that public network was assigned to all nodes
            7. Run network verification
            8. Run OSTF

Actual result:
The OSTF 'check network connectivity' test failed at the 'delete network' step.

urllib3.connectionpool: DEBUG: "DELETE http://10.108.13.2:9696//v2.0/networks/cd0d89f4-4432-4f89-84e8-7cf9bae2b273.json HTTP/1.1" 500 150
neutronclient.client: DEBUG: RESP:500 CaseInsensitiveDict({'content-length': '150', 'via': '1.1 node-3.test.domain.local:9696', 'connection': 'close', 'date': 'Fri, 12 Dec 2014 06:38:31 GMT', 'content-type': 'application/json; charset=UTF-8', 'x-openstack-request-id': 'req-2c0ffc59-2412-4fb8-b7f0-b8c6857262fd'}) {"NeutronError": {"message": "Request Failed: internal server error while processing your request.", "type": "HTTPInternalServerError", "detail": ""}}

neutronclient.v2_0.client: DEBUG: Error message: {"NeutronError": {"message": "Request Failed: internal server error while processing your request.", "type": "HTTPInternalServerError", "detail": ""}}
fuel_health.common.test_mixins: DEBUG: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/fuel_health/common/test_mixins.py", line 186, in verify
    result = func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/fuel_health/neutronmanager.py", line 117, in _remove_network
    self.neutron_client.delete_network(network['id'])
  File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 98, in with_params
    ret = self.function(instance, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 359, in delete_network
    return self.delete(self.network_path % (network))
  File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 1316, in delete
    headers=headers, params=params)
  File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 1297, in retry_request
    headers=headers, params=params)
  File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 1251, in do_request
    self._handle_fault_response(status_code, replybody)
  File "/usr/lib/python2.6/site-packages/neutronclient/v2_0/client.py", line 1216, in _handle_fault_response
    exception_handler_v20(status_code, des_error_body)
  Fil...

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
tags: added: scale
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Currently there is no proven fix that we can apply.

tags: removed: scale
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Changed 6.0 to Won't Fix because Medium-importance fixes are not backported.

Revision history for this message
Elena Ezhova (eezhova) wrote :

The upstream patch that may fix the problem, https://review.openstack.org/#/c/115360/, has merged.

Changed in mos:
status: Confirmed → Won't Fix
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

The code that caused the traceback has been changed in 6.1: in particular, the with_lockmode condition was removed, so the code no longer tries to acquire a DB row lock.
So the issue is obsolete now.
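
A minimal, self-contained sketch of that change in query shape (not the actual Neutron diff), using an in-memory SQLite database and a simplified Port model; with_for_update() is the modern spelling of the old with_lockmode('update'), and SQLite ignores the row lock, so the example only illustrates the ORM calls:

    import sqlalchemy as sa
    from sqlalchemy import orm
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Port(Base):
        # Simplified stand-in for Neutron's Port model.
        __tablename__ = "ports"
        id = sa.Column(sa.String(36), primary_key=True)
        status = sa.Column(sa.String(16))

    engine = sa.create_engine("sqlite://")
    Base.metadata.create_all(engine)
    session = orm.Session(bind=engine)
    session.add(Port(id="port-1", status="ACTIVE"))
    session.commit()

    # 6.0-era shape: SELECT ... FOR UPDATE, i.e. an explicit row lock that could
    # deadlock against a concurrent port/network deletion.
    locked = session.query(Port).filter_by(id="port-1").with_for_update().first()

    # 6.1-era shape: plain read, no row lock requested, so no lock wait or deadlock.
    plain = session.query(Port).filter_by(id="port-1").first()
    print(plain.status)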

Revision history for this message
Artem Panchenko (apanchenko-8) wrote :

Reproduced again on Fuel 7.0 (Kilo); it caused a BVT failure:

<163>Jul 30 14:07:58 node-4 neutron-server 2015-07-30 14:07:58.147 20498 ERROR oslo_db.api [req-2eeaf233-afa4-4a41-9661-a4b169a67afc ] DB error.
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api Traceback (most recent call last):
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 131, in wrapper
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api return f(*args, **kwargs)
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 1284, in delete_port
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api context, id, do_notify=False)
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api File "/usr/lib/python2.7/dist-packages/neutron/db/l3_dvr_db.py", line 267, in disassociate_floatingips
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api context, port_id=port_id)
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api File "/usr/lib/python2.7/dist-packages/neutron/db/l3_dvr_db.py", line 261, in _get_floatingip_on_port
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api return floating_ip.first()
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2341, in first
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api ret = list(self[0:1])
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2208, in __getitem__
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api return list(res)
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2412, in __iter__
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api return self._execute_and_instances(context)
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2427, in _execute_and_instances
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api result = conn.execute(querycontext.statement, self._params)
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 729, in execute
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api return meth(self, multiparams, params)
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api File "/usr/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py", line 321, in _execute_on_connection
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api return connection._execute_clauseelement(self, multiparams, params)
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 826, in _execute_clauseelement
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api compiled_sql, distilled_params
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 958, in _execute_context
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api context)
2015-07-30 14:07:58.147 20498 TRACE oslo_db.api File "/usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/compat/handle_error.py", line 261, in _handle_...

Revision history for this message
Artem Panchenko (apanchenko-8) wrote :
Revision history for this message
Nastya Urlapova (aurlapova) wrote :

Reproduced twice on BVT; increasing importance to Critical.

Revision history for this message
Oleg Bondarev (obondarev) wrote :

The traceback from comment 11 is not the cause of the network delete failure; rather, it shows that the DB retry works. Logs from the snapshot show that the port failed to be deleted on the first attempt (DBDeadlock) but was deleted on the second attempt, thanks to the DB retry.
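
A minimal sketch of that retry behavior, assuming oslo.db's wrap_db_retry decorator (the wrapper visible in the tracebacks at oslo_db/api.py); the function and counter are illustrative, not Neutron code:

    from oslo_db import api as oslo_db_api
    from oslo_db import exception as db_exc

    attempts = {"count": 0}

    # delete_port in the ML2 plugin is wrapped by a similar decorator; the
    # parameters used here are illustrative.
    @oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
    def delete_port_like_operation():
        attempts["count"] += 1
        if attempts["count"] == 1:
            # First attempt fails the way the logs show: a deadlock from the DB.
            raise db_exc.DBDeadlock()
        return "deleted on attempt %d" % attempts["count"]

    # The decorator catches DBDeadlock, waits, and calls the function again, so
    # the second attempt succeeds -- matching what the snapshot logs show.
    print(delete_port_like_operation())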

The network deletion failure was caused by an ObjectDeletedError, so this is not a regression; I'd suggest opening another bug for it.

There is a race on network deletion where the DHCP ports of the network may be deleted concurrently: by the DHCP agent issuing a port deletion on the server side, and by the neutron server itself while deleting the network. This can lead to an ObjectDeletedError on one of the neutron servers (a sketch follows the traceback below). Traceback:

2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin Traceback (most recent call last):
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 681, in _delete_ports
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin self.delete_port(context, port.id)
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 131, in wrapper
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin return f(*args, **kwargs)
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 1288, in delete_port
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin super(Ml2Plugin, self).delete_port(context, id)
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin File "/usr/lib/python2.7/dist-packages/neutron/db/db_base_plugin_v2.py", line 1915, in delete_port
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin self._delete_port(context, id)
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin File "/usr/lib/python2.7/dist-packages/neutron/db/db_base_plugin_v2.py", line 1938, in _delete_port
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin query.delete()
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2670, in delete
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin delete_op.exec_()
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 896, in exec_
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin self._do_pre_synchronize()
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 958, in _do_pre_synchronize
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin eval_condition(obj)]
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/evaluator.py", line 115, in evaluate
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin left_val = eval_left(obj)
2015-07-30 14:07:59.476 20498 TRACE neutron.plugins.ml2.plugin File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/evaluator.py", line 72, in <lambda>
2015-07-3...
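
A minimal, self-contained sketch of that race (assuming a simplified Port model): two SQLAlchemy sessions stand in for the two concurrent deleters, the DHCP-agent-triggered delete_port and the port cleanup inside delete_network:

    import sqlalchemy as sa
    from sqlalchemy import orm
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import exc as orm_exc

    Base = declarative_base()

    class Port(Base):
        # Simplified stand-in for Neutron's Port model.
        __tablename__ = "ports"
        id = sa.Column(sa.String(36), primary_key=True)
        device_owner = sa.Column(sa.String(32))

    engine = sa.create_engine("sqlite://")
    Base.metadata.create_all(engine)

    setup = orm.Session(bind=engine)
    setup.add(Port(id="dhcp-port", device_owner="network:dhcp"))
    setup.commit()
    setup.close()

    session_a = orm.Session(bind=engine)  # server deleting the network
    session_b = orm.Session(bind=engine)  # concurrent delete triggered by the DHCP agent

    # Session A has already loaded the DHCP port as part of deleting the network.
    port = session_a.query(Port).filter_by(id="dhcp-port").one()

    # Session B deletes the same port first and commits.
    session_b.query(Port).filter_by(id="dhcp-port").delete()
    session_b.commit()

    # When session A needs the row again, the reload finds it gone and the ORM
    # raises ObjectDeletedError -- the error that failed the network deletion.
    session_a.expire(port)
    try:
        _ = port.device_owner
    except orm_exc.ObjectDeletedError as exc:
        print("ObjectDeletedError: %s" % exc)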

Revision history for this message
Oleg Bondarev (obondarev) wrote :

The snapshot from comment 12 points to bug https://bugs.launchpad.net/mos/+bug/1454423. Marking this one as Invalid.
