ObjectDeleted error when network already removed during rescheduling

Bug #1424593 reported by Eugene Nikanorov
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
High
Eugene Nikanorov
Kilo
New
Undecided
Unassigned

Bug Description

In some cases when concurrent rescheduling occurs, the following trace is observed:

ERROR neutron.openstack.common.loopingcall [-] in fixed duration looping call
TRACE neutron.openstack.common.loopingcall Traceback (most recent call last):
TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/neutron/openstack/common/loopingcall.py", line 76, in _inner
TRACE neutron.openstack.common.loopingcall self.f(*self.args, **self.kw)
TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/neutron/db/agentschedulers_db.py", line 269, in remove_networks_from_down_agents
TRACE neutron.openstack.common.loopingcall {'net': binding.network_id,
TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 239, in __get__
TRACE neutron.openstack.common.loopingcall return self.impl.get(instance_state(instance), dict_)
TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 589, in get
TRACE neutron.openstack.common.loopingcall value = callable_(state, passive)
TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/state.py", line 424, in __call__
TRACE neutron.openstack.common.loopingcall self.manager.deferred_scalar_loader(self, toload)
TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/loading.py", line 614, in load_scalar_attributes
TRACE neutron.openstack.common.loopingcall raise orm_exc.ObjectDeletedError(state)
TRACE neutron.openstack.common.loopingcall ObjectDeletedError: Instance '<NetworkDhcpAgentBinding at 0x52b1850>' has been deleted, or its row is otherwise not present.

Need to avoid accessing db object after it has been deleted from db as attribute access may trigger this exception.
This issue terminates periodic task of rescheduling networks.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/158678

Changed in neutron:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/158678
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c8c07d6f47099bbe6070369db2d70a6294a9be33
Submitter: Jenkins
Branch: master

commit c8c07d6f47099bbe6070369db2d70a6294a9be33
Author: Eugene Nikanorov <email address hidden>
Date: Tue Feb 24 15:35:22 2015 +0300

    Avoid ObjectDeletedError while accessing deleted binding

    In some cases access to db object's attribute triggers its refresh.
    When the object has been already removed from the database, it could
    lead to ObjectDeletedError being thrown.
    Unit test was added to cover code path where the issue occurs
    however issue doesn't seem to be reproducible with sqlite.

    Change-Id: I0d6f9a91572c89cf7da1b66b880aeaa6b4b1987e
    Closes-Bug: #1424593

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :
Download full text (3.5 KiB)

Another manifestation of the bug has been discovered.
It's not severe now, but leaves traces in the log file which better be avoided:

2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db Traceback (most recent call last):
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db File "/usr/lib/python2.7/dist-packages/neutron/db/agentschedulers_db.py", line 269, in remove_networks_from_down_agents
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db notify=False)
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db File "/usr/lib/python2.7/dist-packages/neutron/db/agentschedulers_db.py", line 351, in remove_network_from_dhcp_agent
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db query.delete()
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2670, in delete
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db delete_op.exec_()
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 896, in exec_
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db self._do_pre_synchronize()
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 958, in _do_pre_synchronize
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db eval_condition(obj)]
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/evaluator.py", line 90, in evaluate
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db value = sub_evaluate(obj)
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/evaluator.py", line 115, in evaluate
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db left_val = eval_left(obj)
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/evaluator.py", line 72, in <lambda>
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db return lambda obj: get_corresponding_attr(obj)
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 239, in __get__
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db return self.impl.get(instance_state(instance), dict_)
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 589, in get
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db value = callable_(state, passive)
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/state.py", line 424, in __call__
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedulers_db self.manager.deferred_scalar_loader(self, toload)
2015-04-07 15:41:44.857 19946 TRACE neutron.db.agentschedule...

Read more...

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Apparently that is caused by .one() method populating object into the active session.
When calling query.delete() this object is deleted from the session, and already is absent in the db which leads the the trace above.

Need to avoid synchronizing session on delete.

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Changing importance to medium since now it doesn't lead to periodic agent check failure.

Changed in neutron:
importance: High → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/171588

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Changing back to high as it was discovered another condition which could lead autorescheduling loop to fail

Changed in neutron:
status: Fix Released → In Progress
importance: Medium → High
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (openstack-ci/fuel-6.1/2014.2)

Reviewed: https://review.fuel-infra.org/5581
Submitter: Andrey Epifanov <email address hidden>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: 06e256fc7012f8eba8cc79e275812c65c1d93e66
Author: Eugene Nikanorov <email address hidden>
Date: Fri Apr 10 05:58:18 2015

Avoid synchronizing session when deleting networkdhcpagentbinding

Synchronizing session on delete leads to traces in neutron-server
logs when such binding is deleted concurrently.
Also, catch and ignore ObjectDeletedError during iterating over
bindings, that is possible since the code is not within a transaction.

Change-Id: I7a2c9a8a59ce313c7d242230eeb5da69986bfbd4
Closes-Bug: #1424593
Closes-Bug: #1442334

Changed in neutron:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/171588
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=29dd67ff754eba3064549886c0906a83500d1879
Submitter: Jenkins
Branch: master

commit 29dd67ff754eba3064549886c0906a83500d1879
Author: Eugene Nikanorov <email address hidden>
Date: Wed Apr 8 00:15:43 2015 +0400

    Avoid synchronizing session when deleting networkdhcpagentbinding

    Synchronizing session on delete leads to traces in neutron-server
    logs when such binding is deleted concurrently.
    Also, catch and ignore ObjectDeletedError during iterating over
    bindings, that is possible since the code is not within a transaction.

    Change-Id: I7a2c9a8a59ce313c7d242230eeb5da69986bfbd4
    Closes-Bug: #1424593

Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-3 → 2015.1.0
Thierry Carrez (ttx)
Changed in neutron:
milestone: 2015.1.0 → none
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (neutron-pecan)

Fix proposed to branch: neutron-pecan
Review: https://review.openstack.org/185072

Thierry Carrez (ttx)
Changed in neutron:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/198816

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/kilo)

Reviewed: https://review.openstack.org/198816
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c2f2281bc5c1283f4ec424346c44a46b4f0c0cae
Submitter: Jenkins
Branch: stable/kilo

commit c2f2281bc5c1283f4ec424346c44a46b4f0c0cae
Author: Eugene Nikanorov <email address hidden>
Date: Wed Apr 8 00:15:43 2015 +0400

    Avoid synchronizing session when deleting networkdhcpagentbinding

    Synchronizing session on delete leads to traces in neutron-server
    logs when such binding is deleted concurrently.
    Also, catch and ignore ObjectDeletedError during iterating over
    bindings, that is possible since the code is not within a transaction.

    Change-Id: I7a2c9a8a59ce313c7d242230eeb5da69986bfbd4
    Closes-Bug: #1424593
    (cherry picked from commit 29dd67ff754eba3064549886c0906a83500d1879)

tags: added: in-stable-kilo
Anna Babich (ababich)
tags: added: on-verification
Revision history for this message
Anna Babich (ababich) wrote :

Verify on 7.0:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2014.2.2-7.0"
  api: "1.0"
  build_number: "62"
  build_id: "2015-07-21_03-17-05"
  nailgun_sha: "0ffd672e798010cf40add8ebcdb81d1648106520"
  python-fuelclient_sha: "57713c96a30831d33609e6e812f8d4eb6054d04e"
  fuel-agent_sha: "4e9fe5563e39e15c9c4451c38de8184d92e66ac6"
  astute_sha: "b1f37a988e097175cbbd14338286017b46b584c3"
  fuel-library_sha: "eb44f21e2f48aafb914bedb1a70ea77193a18434"
  fuel-ostf_sha: "1c447f2cf81bd26bdf056aceb7f695e9b97925d8"
  fuelmain_sha: "c0bd3d3dc536e55db563de04d643a38e60e68e1d"

The bug has been verified on 50 networks by multiple banning and clearing dhcp agents in various combinations. No errors related with bug description appeared in logs during verification process.

for i in {1..30}
        do
                pcs resource ban p_neutron-dhcp-agent ${NODE1} ${NODE2}
                sleep 10
                pcs resource clear p_neutron-dhcp-agent ${NODE1} ${NODE2}
                sleep 10
                pcs resource ban p_neutron-dhcp-agent ${NODE2} ${NODE3}
                sleep 10
                pcs resource clear p_neutron-dhcp-agent ${NODE2} ${NODE3}
                sleep 10
                pcs resource ban p_neutron-dhcp-agent ${NODE1} ${NODE3}
                sleep 10
                pcs resource clear p_neutron-dhcp-agent ${NODE1} ${NODE3}
                sleep 10
                pcs resource ban p_neutron-dhcp-agent ${NODE1} ${NODE2} ${NODE3}
                sleep 10
                pcs resource clear p_neutron-dhcp-agent ${NODE1} ${NODE2} ${NODE3}
                sleep 10
        done

tags: removed: on-verification
Revision history for this message
Anna Babich (ababich) wrote :

Sorry, the previous comment is related to https://bugs.launchpad.net/mos/+bug/1442334

Thierry Carrez (ttx)
Changed in neutron:
milestone: liberty-1 → 7.0.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.