[Pluggable IPAM] Original exception is replaced on rollback failure

Bug #1571666 reported by Pavel Bondar
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
Medium
Pavel Bondar

Bug Description

During concurent allocation/deallocation/update multiple ips in ipam_pluggable_backend deadlock can be raised [1]. It causes rollback actions to be performed in [2], but because of raising deadlock exception database is not in operatable state unless full transaction restart. So it cause rollback action to fail and raise another exception 'InternalError'.
As a result original exception DeadLock was replaced with InternalError and db_retry wrapper does not restart transaction in this case.

Resulting exception as seen in logs[3]:

2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters [req-15281429-cfa8-4b9f-a904-0105fc947933 admin -] DBAPIError exception wrapped from (pymysql.err.InternalError) (1305, u'SAVEPOINT sa_savepoint_3 does not exist') [SQL: u'ROLLBACK TO SAVEPOINT sa_savepoint_3']
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters context)
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 450, in do_execute
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters cursor.execute(statement, parameters)
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/cursors.py", line 158, in execute
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters result = self._query(query)
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/cursors.py", line 308, in _query
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters conn.query(q)
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 820, in query
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 1002, in _read_query_result
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters result.read()
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 1285, in read
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters first_packet = self.connection._read_packet()
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 966, in _read_packet
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters packet.check_error()
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 394, in check_error
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters err.raise_mysql_exception(self._data)
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/err.py", line 120, in raise_mysql_exception
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters _check_mysql_exception(errinfo)
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/err.py", line 115, in _check_mysql_exception
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters raise InternalError(errno, errorvalue)
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters InternalError: (1305, u'SAVEPOINT sa_savepoint_3 does not exist')
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters
2016-04-13 12:15:04.687 18294 DEBUG oslo_messaging._drivers.amqpdriver [req-56495b4e-29c9-4360-8609-f5b3ded6ed8f - -] sending reply msg_id: ee2f4dd536fb48e79bee32a668e3cc1d reply queue: reply_d2a1789c66c141888838b059e322350c time elapsed: 0.006736069s _send_reply /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:74
/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py:68: SAWarning: An exception has occurred during handling of a previous exception. The previous exception is:
 <class 'oslo_db.exception.DBDeadlock'> (pymysql.err.InternalError) (1213, u'Deadlock found when trying to get lock; try restarting transaction') [SQL: u'INSERT INTO ipamallocations (ip_address, status, ipam_subnet_id) VALUES (%(ip_address)s, %(status)s, %(ipam_subnet_id)s)'] [parameters: {'status': 'ALLOCATED', 'ip_address': u'2001:db8::4e', 'ipam_subnet_id': u'1c084441-cec0-4865-848a-a75f01711e36'}]

  "is:\n %s %s\n" % (self._exc_info[0], self._exc_info[1]))
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters [req-15281429-cfa8-4b9f-a904-0105fc947933 admin -] DBAPIError exception wrapped from (pymysql.err.InternalError) (1305, u'SAVEPOINT sa_savepoint_3 does not exist') [SQL: u'ROLLBACK TO SAVEPOINT sa_savepoint_3']
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters context)
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 450, in do_execute
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters cursor.execute(statement, parameters)
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/cursors.py", line 158, in execute
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters result = self._query(query)
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/cursors.py", line 308, in _query
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters conn.query(q)
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 820, in query
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 1002, in _read_query_result
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters result.read()
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 1285, in read
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters first_packet = self.connection._read_packet()
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 966, in _read_packet
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters packet.check_error()
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 394, in check_error
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters err.raise_mysql_exception(self._data)
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/err.py", line 120, in raise_mysql_exception
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters _check_mysql_exception(errinfo)
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/local/lib/python2.7/dist-packages/pymysql/err.py", line 115, in _check_mysql_exception
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters raise InternalError(errno, errorvalue)
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters InternalError: (1305, u'SAVEPOINT sa_savepoint_3 does not exist')
2016-04-13 12:15:04.688 18292 ERROR oslo_db.sqlalchemy.exc_filters

[1] https://github.com/openstack/neutron/blob/2a305c563073a3066aac3f07aab3c895ec2cd2fb/neutron/db/ipam_pluggable_backend.py#L59
[2] https://github.com/openstack/neutron/blob/2a305c563073a3066aac3f07aab3c895ec2cd2fb/neutron/db/ipam_pluggable_backend.py#L69
[3] http://logs.openstack.org/23/181023/64/check/gate-tempest-dsvm-neutron-dvr/f503538/logs/screen-q-svc.txt.gz#_2016-04-13_12_15_04_685

Tags: l3-ipam-dhcp
Pavel Bondar (pasha117)
Changed in neutron:
assignee: nobody → Pavel Bondar (pasha117)
tags: added: l3-ipam-dhcp
Revision history for this message
Pavel Bondar (pasha117) wrote :

Looks like my initial analysis in not fully correct. So now it looks like this error is observed due to using nested transaction in allocate block of reference ipam driver [1]:

Resulting error:
2016-04-13 12:15:04.685 18292 ERROR oslo_db.sqlalchemy.exc_filters [req-15281429-cfa8-4b9f-a904-0105fc947933 admin -] DBAPIError exception wrapped from (pymysql.err.InternalError) (1305, u'SAVEPOINT sa_savepoint_3 does not exist') [SQL: u'ROLLBACK TO SAVEPOINT sa_savepoint_3']

Original error:
 <class 'oslo_db.exception.DBDeadlock'> (pymysql.err.InternalError) (1213, u'Deadlock found when trying to get lock; try restarting transaction') [SQL: u'INSERT INTO ipamallocations (ip_address, status, ipam_subnet_id) VALUES (%(ip_address)s, %(status)s, %(ipam_subnet_id)s)'] [parameters: {'status': 'ALLOCATED', 'ip_address': u'2001:db8::4e', 'ipam_subnet_id': u'1c084441-cec0-4865-848a-a75f01711e36'}]

When DBDeadlock is raised inside nested transaction block, nested transaction wrapper tries to rollback to safepoint where nested transaction was initiated. But looks like in DBDeadlock case this is not possible.

[1] https://github.com/openstack/neutron/blob/2dc9d67f6e0fbec8272dc1dfdb594d05378536d5/neutron/ipam/drivers/neutrondb_ipam/driver.py#L338

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/307329

Changed in neutron:
status: New → In Progress
Changed in neutron:
importance: Undecided → Medium
Revision history for this message
Pavel Bondar (pasha117) wrote :
Download full text (12.8 KiB)

After applying fix [1], still seeing error where original exception is replaced with another one in rollback block [2].
This is where original analysis(from ticket description) applies. So fix [3] is still needed.

2016-04-19 00:50:30.716 18062 ERROR neutron.db.ipam_pluggable_backend [req-0ed5f0d8-42a0-47e4-89e2-bdf6fef3e1dc tempest-PortsTestJSON-447574898 -] IP deallocation failed on external system for 10.100.0.5, 10.100.0.20
2016-04-19 00:50:30.726 18062 ERROR root [req-0ed5f0d8-42a0-47e4-89e2-bdf6fef3e1dc tempest-PortsTestJSON-447574898 -] Original exception being dropped: ['Traceback (most recent call last):\n', ' File "/opt/stack/new/neutron/neutron/db/ipam_pluggable_backend.py", line 310, in update_port_with_ips\n ip[\'subnet_id\'], ip[\'ip_address\'])\n', ' File "/opt/stack/new/neutron/neutron/db/db_base_plugin_common.py", line 92, in _delete_ip_allocation\n subnet_id=subnet_id).delete()\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 3048, in delete\n delete_op.exec_()\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 1125, in exec_\n self._do_pre()\n', ' File "<string>", line 1, in <lambda>\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 1163, in _do_pre\n session._autoflush()\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1293, in _autoflush\n self.flush()\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 2019, in flush\n self._flush(objects)\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 2137, in _flush\n transaction.rollback(_capture_exception=True)\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__\n compat.reraise(exc_type, exc_value, exc_tb)\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 2101, in _flush\n flush_context.execute()\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute\n rec.execute(self)\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute\n uow\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 174, in save_obj\n mapper, table, insert)\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 767, in _emit_insert_statements\n execute(statement, multiparams)\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 914, in execute\n return meth(self, multiparams, params)\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection\n return connection._execute_clauseelement(self, multiparams, params)\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement\n compiled_sql, distilled_params\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context\n context)\n', ' File "/usr/local/lib/py...

Pavel Bondar (pasha117)
tags: added: liberty-backport-potential mitaka-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/306519
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1f5008564de03bcd5d03743bdbce004bb27aaad1
Submitter: Jenkins
Branch: master

commit 1f5008564de03bcd5d03743bdbce004bb27aaad1
Author: Pavel Bondar <email address hidden>
Date: Fri Apr 15 18:29:19 2016 +0300

    Do not rewrite original exception for IPAM

    If deadlock exception is raised during allocation of multiple ips,
    this exception gets rewritten by rollback mechnism. Rollback tries
    to deallocate already allocated ips, but can't do it because database
    is not operatable unless entire transaction is restarted. So another
    exception is generated and it prevents retry wrappers from restarting
    transaction correctly.

    To fix this behavior rollback actions are covered with try-except block.
    This way any exception during rollback operations do not replace
    original exception.

    Partial-Bug: #1571666
    Change-Id: Ifdba7c49dc4a774be5221338b003d371750374d0

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/307329
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ad2d7d3a8fa9d498475a963f3a2d0a344ae63ee2
Submitter: Jenkins
Branch: master

commit ad2d7d3a8fa9d498475a963f3a2d0a344ae63ee2
Author: Pavel Bondar <email address hidden>
Date: Mon Apr 18 17:56:08 2016 +0300

    Remove nested transaction from ipam driver

    Using nested transaction in reference ipam driver causes rewriting
    original exception due failure to restore to safepoint in case of
    DeadLock.

    DBDeadlock exception gets replaced with DBAPIError. It prevents db_retry
    wrapper from working correctly.

    This patch removed nested transaction block from reference ipam driver.

    Change-Id: Ib710e87b0132aae3cd3afd12c5448961f1a3b25c
    Partial-Bug: #1571666

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/332098

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/332098
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1d8eabaff1cff9efbc122b68a54b94745e001cd6
Submitter: Jenkins
Branch: master

commit 1d8eabaff1cff9efbc122b68a54b94745e001cd6
Author: Pavel Bondar <email address hidden>
Date: Tue Jun 21 14:34:57 2016 +0300

    Do not rewrite original exception for IPAM part 2

    If subnet allocation fails on _save_subnet stage, then rollback mechnism
    is executed. Currently if exception is raised on rollback stage it
    replaces original exception. But original exception is needed by
    retry wrappers to work properly also it is more informative for user.

    Using _safe_rollback method to prevent replacing original exception with
    one from rollback stage.

    Closes-Bug: #1571666
    Change-Id: I388272820451bb117281948fef0000bb46a951d4

Changed in neutron:
status: In Progress → Fix Released
tags: added: neutron-proactive-backport-potential
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 9.0.0.0b2

This issue was fixed in the openstack/neutron 9.0.0.0b2 development milestone.

tags: removed: neutron-proactive-backport-potential
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

Not critical for Mitaka that is now critical/CVE only.

tags: removed: liberty-backport-potential
tags: removed: mitaka-backport-potential
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.