ipallocation instances live between retries

Bug #1556178 reported by Kevin Benton
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
Medium
Kevin Benton

Bug Description

The retry decorator doesn't clear the session between each retry. This is normally not an issue; however, if some called code holds onto a reference a reference of a DB object that will conflict with a newly created one, we will get errors like the following:

FlushError: New instance <IPAllocation at 0x7fd98001ce50> with identity key (<class 'neutron.db.models_v2.IPAllocation'>, (u'10.0.0.2', u'70dfccfd-f18a-423b-9323-095a38b301a9', u'4e0d6054-6f90-450d-87d6-fb86fa194a91')) conflicts with persistent instance <IPAllocation at 0x7fd9805aa910>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/291795

Changed in neutron:
status: New → In Progress
tags: added: db
Changed in neutron:
importance: High → Low
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Kevin Benton (<email address hidden>) on branch: master
Review: https://review.openstack.org/291795

Revision history for this message
Kevin Benton (kevinbenton) wrote : Re: retry decorator doesn't clear session in context

appears to be just an error with a specific unit test set that relies on DB events

Changed in neutron:
milestone: mitaka-rc1 → newton-1
Revision history for this message
Kevin Benton (kevinbenton) wrote : Re: sqlalchemy weakrefs live between retries in unit tests

For an example, check out the test disabled by this patch: https://review.openstack.org/#/c/291765/

summary: - retry decorator doesn't clear session in context
+ sqlalchemy weakrefs live between retries in unit tests
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/291765
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3ab6f8fb4539136fc193c722be98926701da6b32
Submitter: Jenkins
Branch: master

commit 3ab6f8fb4539136fc193c722be98926701da6b32
Author: Kevin Benton <email address hidden>
Date: Fri Mar 11 06:57:01 2016 -0800

    Extend dicts with original model in create/update

    This adjusts the logic in ML2 create port and create
    network to use the original DB object created rather
    than attempting to retrieve it from the DB again.
    This eliminates the race window where the object could
    be deleted out from underneath the create operations
    resulting in a 404.

    To achieve this it adds a create_port_db and
    create_network_db method to db_base_plugin_v2 that
    returns the original model instead of returning a
    dictionary.

    This also adds a skip to a unit test that was using DB
    events to trigger a rollback and retry. This test was
    causing failures with a FlushError conflict between an
    SQL alchemy new instance and a persistent instance in
    the session. This bug is fixed in the follow-up patch.

    Closes-Bug: #1556139
    Related-Bug: #1556178
    Change-Id: I3c4b4cb4f63236fca24abaf58881879c2d5657fa

Revision history for this message
Mike Bayer (zzzeek) wrote : Re: sqlalchemy weakrefs live between retries in unit tests

Here's a demonstration of how the SQLAlchemy Session works during a rollback condition. Note that any row that was INSERTed as part of the transaction is automatically expunged; there is no need to call expunge_all():

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)

e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

s = Session(e, autocommit=True)

try:
    with s.begin():
        a1 = A(id=1)
        s.add(a1)

        s.flush()

        # a1 is now persistent
        assert inspect(a1).persistent

        # exception is raised
        raise Exception("something broke")
except Exception:
    # context manager has called rollback()

    # a1 is no longer persistent, it's bumped out
    assert a1 not in s
    assert inspect(a1).transient

    # retry - no problem with a2
    with s.begin():
        a2 = A(id=1)
        s.add(a2)

    assert inspect(a1).transient
    assert inspect(a2).persistent

Revision history for this message
Kevin Benton (kevinbenton) wrote :

Here's a demonstration of how it is failing in Neutron. It seems to be an issue with the relationship and it being referenced.

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class B(Base):
    __tablename__ = 'b'
    a_id = Column(Integer, ForeignKey('a.id', ondelete="CASCADE"),
                  nullable=True)
    address = Column(String(36), nullable=False, primary_key=True)

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    b_recs = relationship(B, backref='A', lazy='joined',
                          passive_deletes='all')

e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

s = Session(e, autocommit=True)

try:
    with s.begin():
        a1 = A(id=1)
        s.add(a1)

        s.flush()
        s.add(B(a_id=1, address='1234'))

        # other code references relationship
        assert a1.b_recs

        # a1 is now persistent
        assert inspect(a1).persistent

        # exception is raised
        raise Exception("something broke")
except Exception:
    # context manager has called rollback()

    # a1 is no longer persistent, it's bumped out
    assert a1 not in s
    assert inspect(a1).transient

    # retry - no problem with a2
    with s.begin():
        a2 = A(id=1)
        s.add(a2)
        s.add(B(a_id=1, address='1234'))

    assert inspect(a1).transient
    assert inspect(a2).persistent

summary: - sqlalchemy weakrefs live between retries in unit tests
+ ipallocation instances live between retries in unit tests
Revision history for this message
Kevin Benton (kevinbenton) wrote : Re: ipallocation instances live between retries in unit tests

After a discussion with Michael on IRC[1], I found out this is being caused by the original IPAllocation object being immediately garbage collected after the _store_ip_allocation function returns. Then when the fixed_ips relationship is referenced in the port creation it results in a new persistent object being created that SQLAlchemy doesn't know was a result of the same transaction. So on the rollback the object stays in the session and conflicts with the retry operation.

1. http://eavesdrop.openstack.org/irclogs/%23openstack-oslo/%23openstack-oslo.2016-03-14.log.html#t2016-03-14T19:44:01

summary: - ipallocation instances live between retries in unit tests
+ ipallocation instances live between retries
Changed in neutron:
importance: Low → Medium
Revision history for this message
Kevin Benton (kevinbenton) wrote :

This could potentially happen outside of a unit test if the port creation fails for a reason other than a duplicate allocation record so I've upgraded the priority to a medium.

Revision history for this message
Mike Bayer (zzzeek) wrote :

upstream issue is fixed for SQLAlchemy 1.1. for now, Neutron will store a reference to the rolled-back object in Session.info. upstream issue: https://bitbucket.org/zzzeek/sqlalchemy/issues/3677/double-check-conflicting-row-switch

Changed in neutron:
milestone: newton-1 → mitaka-rc1
Changed in neutron:
milestone: mitaka-rc1 → newton-1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/291795
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7d9169967fca3d81076cf60eb772f4506735a218
Submitter: Jenkins
Branch: master

commit 7d9169967fca3d81076cf60eb772f4506735a218
Author: Kevin Benton <email address hidden>
Date: Sun Mar 13 20:52:09 2016 -0700

    Add IPAllocation object to session info to stop GC

    This adds the IPAllocation object created in the _store_ip_allocation
    method to the session info dictionary to prevent it from being
    immediately garbage collected. This is necessary because otherwise a
    new persistent object will be created when the fixed_ips relationship
    is referenced during the rest of the port create/update opertions.
    This persistent object will then interfere with a retry operation
    that uses the same session if it tries to create a conflicting record.

    By preventing the object from being garbage collected, the reference
    to fixed IPs will re-use the newly created sqlalchemy object instead
    which will properly be cleaned up on a rollback.

    This also removes the 'passive_delete' option from the fixed_ips
    relationship on ports because IPAllocation objects would now be
    left in the session after port deletes. At first glance, this might
    look like a performance penalty because fixed_ips would be looked
    up before port deletes; however, we already do that in the IPAM
    code as well as the ML2 code so this relationship is already being
    loaded on the delete_port operation.

    Closes-Bug: #1556178
    Change-Id: Ieee1343bb90cf111c55e00b9cabc27943b46c350

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/295517

tags: added: mitaka-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/296509

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/mitaka)

Reviewed: https://review.openstack.org/295517
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=aafa702d2f389c0a4f3679df65be07940095ec29
Submitter: Jenkins
Branch: stable/mitaka

commit aafa702d2f389c0a4f3679df65be07940095ec29
Author: Kevin Benton <email address hidden>
Date: Sun Mar 13 20:52:09 2016 -0700

    Add IPAllocation object to session info to stop GC

    This adds the IPAllocation object created in the _store_ip_allocation
    method to the session info dictionary to prevent it from being
    immediately garbage collected. This is necessary because otherwise a
    new persistent object will be created when the fixed_ips relationship
    is referenced during the rest of the port create/update opertions.
    This persistent object will then interfere with a retry operation
    that uses the same session if it tries to create a conflicting record.

    By preventing the object from being garbage collected, the reference
    to fixed IPs will re-use the newly created sqlalchemy object instead
    which will properly be cleaned up on a rollback.

    This also removes the 'passive_delete' option from the fixed_ips
    relationship on ports because IPAllocation objects would now be
    left in the session after port deletes. At first glance, this might
    look like a performance penalty because fixed_ips would be looked
    up before port deletes; however, we already do that in the IPAM
    code as well as the ML2 code so this relationship is already being
    loaded on the delete_port operation.

    Closes-Bug: #1556178
    Change-Id: Ieee1343bb90cf111c55e00b9cabc27943b46c350
    (cherry picked from commit 7d9169967fca3d81076cf60eb772f4506735a218)

tags: added: in-stable-mitaka
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 8.1.0

This issue was fixed in the openstack/neutron 8.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/314250

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/296509
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b30eb16bd0d35a007baa68c87b85186111bb591a
Submitter: Jenkins
Branch: stable/liberty

commit b30eb16bd0d35a007baa68c87b85186111bb591a
Author: Kevin Benton <email address hidden>
Date: Sun Mar 13 20:52:09 2016 -0700

    Add IPAllocation object to session info to stop GC

    This adds the IPAllocation object created in the _store_ip_allocation
    method to the session info dictionary to prevent it from being
    immediately garbage collected. This is necessary because otherwise a
    new persistent object will be created when the fixed_ips relationship
    is referenced during the rest of the port create/update opertions.
    This persistent object will then interfere with a retry operation
    that uses the same session if it tries to create a conflicting record.

    By preventing the object from being garbage collected, the reference
    to fixed IPs will re-use the newly created sqlalchemy object instead
    which will properly be cleaned up on a rollback.

    This also removes the 'passive_delete' option from the fixed_ips
    relationship on ports because IPAllocation objects would now be
    left in the session after port deletes. At first glance, this might
    look like a performance penalty because fixed_ips would be looked
    up before port deletes; however, we already do that in the IPAM
    code as well as the ML2 code so this relationship is already being
    loaded on the delete_port operation.

    Closes-Bug: #1556178
    Change-Id: Ieee1343bb90cf111c55e00b9cabc27943b46c350
    (cherry picked from commit 7d9169967fca3d81076cf60eb772f4506735a218)

tags: added: in-stable-liberty
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)
Download full text (36.9 KiB)

Reviewed: https://review.openstack.org/314250
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3bf73801df169de40d365e6240e045266392ca63
Submitter: Jenkins
Branch: master

commit a323769143001d67fd1b3b4ba294e59accd09e0e
Author: Ryan Moats <email address hidden>
Date: Tue Oct 20 15:51:37 2015 +0000

    Revert "Improve performance of ensure_namespace"

    This reverts commit 81823e86328e62850a89aef9f0b609bfc0a6dacd.

    Unneeded optimization: this commit only improves execution
    time on the order of milliseconds, which is less than 1% of
    the total router update execution time at the network node.

    This also

    Closes-bug: #1574881

    Change-Id: Icbcdf4725ba7d2e743bb6761c9799ae436bd953b

commit 7fcf0253246832300f13b0aa4cea397215700572
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Apr 21 07:05:16 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I9e930750dde85a9beb0b6f85eeea8a0962d3e020

commit 643b4431606421b09d05eb0ccde130adbf88df64
Author: OpenStack Proposal Bot <email address hidden>
Date: Tue Apr 19 06:52:48 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I52d7460b3265b5460b9089e1cc58624640dc7230

commit 1ffea42ccdc14b7a6162c1895bd8f2aae48d5dae
Author: OpenStack Proposal Bot <email address hidden>
Date: Mon Apr 18 15:03:30 2016 +0000

    Updated from global requirements

    Change-Id: Icb27945b3f222af1d9ab2b62bf2169d82b6ae26c

commit b970ed5bdac60c0fa227f2fddaa9b842ba4f51a7
Author: Kevin Benton <email address hidden>
Date: Fri Apr 8 17:52:14 2016 -0700

    Clear DVR MAC on last agent deletion from host

    Once all agents are deleted from a host, the DVR MAC generated
    for that host should be deleted as well to prevent a buildup of
    pointless flows generated in the OVS agent for hosts that don't
    exist.

    Closes-Bug: #1568206
    Change-Id: I51e736aa0431980a595ecf810f148ca62d990d20
    (cherry picked from commit 92527c2de2afaf4862fddc101143e4d02858924d)

commit eee9e58ed258a48c69effef121f55fdaa5b68bd6
Author: Mike Bayer <email address hidden>
Date: Tue Feb 9 13:10:57 2016 -0500

    Add an option for WSGI pool size

    Neutron currently hardcodes the number of
    greenlets used to process requests in a process to 1000.
    As detailed in
    http://lists.openstack.org/pipermail/openstack-dev/2015-December/082717.html

    this can cause requests to wait within one process
    for available database connection while other processes
    remain available.

    By adding a wsgi_default_pool_size option functionally
    identical to that of Nova, we can lower the number of
    greenlets per process to be more in line with a typical
    max database connection pool size.

    DocImpact: a previously unused configuration value
               wsgi_default_pool_size is now used to a...

tags: added: neutron-proactive-backport-potential
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 7.1.0

This issue was fixed in the openstack/neutron 7.1.0 release.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 9.0.0.0b1

This issue was fixed in the openstack/neutron 9.0.0.0b1 development milestone.

tags: removed: neutron-proactive-backport-potential
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

This resurfaced lately. A possible fix is: https://review.openstack.org/#/c/485328/

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.