bagpipe: IPAllocation DB query failing, engine facade mismatch

Bug #1746996 reported by Thomas Morin
16
This bug affects 1 person
Affects Status Importance Assigned to Milestone
BaGPipe
Fix Released
Undecided
Unassigned
networking-bgpvpn
Fix Released
Undecided
Unassigned
neutron
Fix Released
High
Unassigned

Bug Description

networking_bgpvpn.tests.unit.services.bagpipe.test_bagpipe.TestBagpipeServiceDriverV2RPCs.test_router_itf_event_network_assoc is currently failing

Traceback (most recent call last):
  File "/home/teom7365/prog/openstack/networking-bgpvpn/networking_bgpvpn/neutron/services/service_drivers/bagpipe/bagpipe_v2.py", line 260, in registry_router_interface_before_delete
    context, router_id, network_id)
  File "/home/teom7365/prog/openstack/networking-bgpvpn/.tox/py27/local/lib/python2.7/site-packages/oslo_log/helpers.py", line 67, in wrapper
    return method(*args, **kwargs)
  File "/home/teom7365/prog/openstack/networking-bgpvpn/networking_bgpvpn/neutron/services/service_drivers/bagpipe/bagpipe_v2.py", line 218, in notify_router_interface_deleted
    network_id=net_id) +
  File "neutron/objects/base.py", line 561, in get_objects
    return [cls._load_object(context, db_obj) for db_obj in db_objs]
  File "neutron/objects/base.py", line 496, in _load_object
    obj.from_db_object(db_obj)
  File "/home/teom7365/prog/openstack/networking-bgpvpn/.tox/py27/src/networking-bagpipe/networking_bagpipe/objects/bgpvpn.py", line 156, in from_db_object
    self._load_subnets(obj)
  File "/home/teom7365/prog/openstack/networking-bgpvpn/.tox/py27/src/networking-bagpipe/networking_bagpipe/objects/bgpvpn.py", line 150, in _load_subnets
    subnets_info = _get_subnets_info(self.obj_context, self.network_id)
  File "/home/teom7365/prog/openstack/networking-bgpvpn/.tox/py27/src/networking-bagpipe/networking_bagpipe/objects/bgpvpn.py", line 62, in _get_subnets_info
    for subnet in subnets
  File "/home/teom7365/prog/openstack/networking-bgpvpn/.tox/py27/src/networking-bagpipe/networking_bagpipe/objects/bgpvpn.py", line 47, in _get_gateway_mac_by_subnet
    port = Port.get_object(obj_context, id=ip_allocation.port_id)
  File "neutron/objects/base.py", line 538, in get_object
    return cls._load_object(context, db_obj)
  File "neutron/objects/base.py", line 496, in _load_object
    obj.from_db_object(db_obj)
  File "neutron/objects/ports.py", line 409, in from_db_object
    super(Port, self).from_db_object(db_obj)
  File "neutron/objects/base.py", line 440, in from_db_object
    self.load_synthetic_db_fields(db_obj)
  File "neutron/objects/base.py", line 737, in load_synthetic_db_fields
    for obj in synth_db_objs]
  File "neutron/objects/base.py", line 498, in _load_object
    context.session.expunge(obj.db_obj)
  File "/home/teom7365/prog/openstack/networking-bgpvpn/.tox/py27/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1650, in expunge
    state_str(state))
InvalidRequestError: Instance <IPAllocation at 0x7f276b935910> is not present in this Session

"git bisect" show that neutron commit 906eda44d2c230be99fc5c61daa0f359ab46a3ad [1] introduces the issue

[1] https://review.openstack.org/#/c/536913/

Revision history for this message
Thomas Morin (tmmorin-orange) wrote :

The issue we have seems to be related to the fact that this unit test mixes:
- old engine facade code from networking-bagpipe OVO code
- new engine facade code for IPAllocations

This issue is strongly related to bug 1744829 .

My current understanding is that:
- [2] resulted in the use of OVO for IPAllocations
- networking-bgpvpn then started to make use of OVO, resulting in IPAllocation in OVO objects construction (both old engine facade, so all good)
- when [1] merged to revert [2] because of bug 174829, IPAllocation went back to non-OVO, hence
using the new engine facade, while we still use OVO to query IPAllocations in networking-bagpipe OVO objects

[1] https://review.openstack.org/#/c/536913
[2] https://review.openstack.org/#/c/407868

Miguel Lavalle (minsel)
Changed in neutron:
milestone: none → queens-rc1
Changed in bgpvpn:
status: New → Confirmed
Changed in neutron:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Thomas Morin (tmmorin-orange) wrote :

https://review.openstack.org/#/c/540441/ is an attempt at addressing this bug
the operation that fails above, now succeeds with this change.
However I've got many OVO tests that now fail with this kind of error:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such savepoint: sa_savepoint_1 [SQL: u'ROLLBACK TO SAVEPOINT sa_savepoint_1'] (Background on this error at: http://sqlalche.me/e/e3q8)

( http://logs.openstack.org/41/540441/2/check/openstack-tox-py27/7c414dd/testr_results.html.gz )

I don't know how to solve these errors.

The other approach we discussed with Ihar was to try to see if we could find the code using the new-DB-facade for IPAllocations that gets triggered in the codepath trigerring the notifications that lead to this bug, and selectively switch this code back to using the old-DB-facade. After some research I see the following: assuming the code triggering the event callback is the L3 DB code add_interface_by_subnet, this code ends up triggering an IPAllocation by calling the core_plugin.create_port with a fixed IP, code which is using the new DB facade. I guess it is not reasonable to talk about "selectively switching back to old-DB-facade" if we talk about create_port ...

I'm currenly out of ideas on how to tackle this bug, except having the base Neutron OVO code changed to use the new DB facade.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/541512

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-bagpipe (master)

Change abandoned by Thomas Morin (<email address hidden>) on branch: master
Review: https://review.openstack.org/540441
Reason: in favor of Ie4ad57f6357c59f2a978467099642b08e730133e

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-bgpvpn (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/541775

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/542589

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/541512
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6f83466307fb21aee5bb596974644d457ae1fa60
Submitter: Zuul
Branch: master

commit 6f83466307fb21aee5bb596974644d457ae1fa60
Author: Ihar Hrachyshka <email address hidden>
Date: Tue Feb 6 18:12:38 2018 -0800

    Allow objects to opt in new engine facade

    New facade is enabled by setting new_facade = True for the object of
    interest. With new_facade on, all OVO actions will use the new reader /
    writer decorator to activate sessions.

    There are two new facade decorators added to OVO: db_context_reader and
    db_context_write that should be used instead of explicit
    autonested_transaction / reader.using / writer.using in OVO context.

    All neutron.objects.db.api helpers now receive OVO classes / objects
    instead of model classes, since they need to know which type of engine
    facade to use for which object. While it means we change signatures for
    those helper functions, they are not used anywhere outside neutron tree
    except vmware-nsx unit tests, and the latter pass anyway because the
    tests completely mock out them disregarding their signatures.

    This patch also adds several new OVO objects to be able to continue
    using neutron.objects.db.api helpers to persist models that previously
    didn't have corresponding OVO classes.

    Finally, the patch adds registration for missing options in
    neutron/tests/unit/extensions/test_qos_fip.py to be able to debug
    failures in those unit tests. Strictly speaking, this change doesn't
    belong to the patch, but I include it nevertheless to speed up merge in
    time close to release.

    There are several non-obvious changes included, specifically:

    - in neutron.objects.base, decorator() that refreshes / expunges models
    from the active session now opens a subtransaction for the whole span of
    call / refresh / expunge, so that we can safely refresh model regardless
    of whether caller opened another parent subtransaction (it was not the
    case for create_subnetpool in base db plugin code).

    - in neutron.db.l3_fip_qos, removed code that updates obj.db_model
    relationship directly after corresponding insertions for child policy
    binding model. This code is not needed because the only caller to the
    _process_extra_fip_qos_update method refetches latest state of floating
    ip OVO object anyway, and this code triggers several unit test failures.

    - unit tests checking that a single commit happens for get_object and
    get_objects are no longer valid for new facade objects that use reader
    decorator that doesn't commit but close. This change is as intended, so
    unit tests were tweaked to check close for new facade objects.

    Change-Id: I15ec238c18a464f977f7d1079605b82965052311
    Related-Bug: #1746996

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-bagpipe (master)

Reviewed: https://review.openstack.org/541513
Committed: https://git.openstack.org/cgit/openstack/networking-bagpipe/commit/?id=53e073f1b8b8de4105762140fcc5aaa2645ec752
Submitter: Zuul
Branch: master

commit 53e073f1b8b8de4105762140fcc5aaa2645ec752
Author: Ihar Hrachyshka <email address hidden>
Date: Tue Feb 6 18:13:37 2018 -0800

    Use new facade for OVO objects

    This is to accommodate for neutron plugin code issuing callbacks from
    under new facade contexts.

    Depends-On: I15ec238c18a464f977f7d1079605b82965052311
    Change-Id: Ie4ad57f6357c59f2a978467099642b08e730133e
    Closes-Bug: #1746996

Changed in networking-bagpipe:
status: New → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-bgpvpn (master)

Reviewed: https://review.openstack.org/541775
Committed: https://git.openstack.org/cgit/openstack/networking-bgpvpn/commit/?id=cc671c8871a896d09f598235741f7f938b9f4b8f
Submitter: Zuul
Branch: master

commit cc671c8871a896d09f598235741f7f938b9f4b8f
Author: Thomas Morin <email address hidden>
Date: Wed Feb 7 17:10:41 2018 +0100

    switch to use new DB facade

    Beyond obvious reasons to try to migrate to the new facade, bug
    17446996 is leading us into using the new DB facade for BGPVPN
    OVObjects used by bagpipe driver.

    However, unless we switched all DB operations to the new DB facade,
    we have a situation where we mix both old and new facade, which
    is a known source of issues.

    This change switches all DB operation to use the new DB facade.

    Change-Id: I5a395560ea0573c92a3c615d7dbcb343d6983a78
    Depends-On: Ie4ad57f6357c59f2a978467099642b08e730133e
    Related-Bug: 1746996

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-bagpipe 8.0.0.0rc1

This issue was fixed in the openstack/networking-bagpipe 8.0.0.0rc1 release candidate.

Miguel Lavalle (minsel)
Changed in neutron:
status: Confirmed → Fix Released
Changed in bgpvpn:
status: Confirmed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/542589
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=adbef046291f40f59e673f26b9126522f5d2d6f1
Submitter: Zuul
Branch: master

commit adbef046291f40f59e673f26b9126522f5d2d6f1
Author: Ihar Hrachyshka <email address hidden>
Date: Thu Feb 8 20:53:58 2018 -0800

    Remove redundant get_object call when creating fip qos binding

    We try to fetch binding (and doing it wrong by passing incorrect OVO
    class), and then return the result to callers. But no callers care about
    the result, so it's useless work (and bad code).

    TrivialFix

    Change-Id: I68b436c9a0665d276bcda30f414a79f4d83e3758
    Related-Bug: #1746996

tags: added: neutron-proactive-backport-potential
tags: removed: neutron-proactive-backport-potential
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.