KeyError exception while syncing routers

Bug #1232525 reported by Stephen Ma
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: haruka tanizawa

Bug Description

Sometimes the following exception is raised while the L3 agent's _rpc_loop periodic task is running:

30748 ERROR quantum.agent.l3_agent [-] Failed synchronizing routers
30748 TRACE quantum.agent.l3_agent Traceback (most recent call last):
30748 TRACE quantum.agent.l3_agent File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 725, in _rpc_loop
30748 TRACE quantum.agent.l3_agent self._process_router_delete()
30748 TRACE quantum.agent.l3_agent File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 733, in _process_router_delete
30748 TRACE quantum.agent.l3_agent self._router_removed(router_id)
30748 TRACE quantum.agent.l3_agent File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 319, in _router_removed
30748 TRACE quantum.agent.l3_agent ri = self.router_info[router_id]
30748 TRACE quantum.agent.l3_agent KeyError: u'0f9ec689-c094-44c7-bae7-b995830d405f'

0f9ec689-c094-44c7-bae7-b995830d405f is a router that has just been deleted.
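The failing line is a plain dictionary lookup, self.router_info[router_id], which raises KeyError once the router's entry is gone. The sketch below is hypothetical: FakeL3Agent and both of its methods are invented for illustration and are not Neutron's API. It contrasts that strict lookup with a tolerant variant built on dict.pop(key, None). Note that the fix eventually merged for this bug took a different route (it removed the unreferenced updated_routers and removed_routers sets).

```python
import logging

LOG = logging.getLogger(__name__)


class FakeL3Agent:
    """Hypothetical, heavily simplified stand-in for the L3 agent.

    Only router_info (router ID -> per-router state) is modelled.
    """

    def __init__(self):
        self.router_info = {}

    def router_removed_strict(self, router_id):
        # Mirrors the line in the traceback: a plain lookup raises
        # KeyError if the router was already removed.
        ri = self.router_info[router_id]
        del self.router_info[router_id]
        return ri

    def router_removed_tolerant(self, router_id):
        # Hypothetical defensive variant: treat an unknown router as
        # "already removed" and log a warning instead of raising.
        ri = self.router_info.pop(router_id, None)
        if ri is None:
            LOG.warning("Router %s is not managed by this agent; "
                        "skipping removal", router_id)
        return ri
```

Tolerating the missing key only hides the symptom; it does not explain why the same router is removed twice.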

Stephen Ma (stephen-ma)
Changed in neutron:
assignee: nobody → Stephen Ma (stephen-ma)
ZhiQiang Fan (aji-zqfan) wrote :
Changed in neutron:
status: New → Confirmed
ZhiQiang Fan (aji-zqfan) wrote :

I think the root cause is this test in neutron/tests/unit/test_l3_agent.py (line 542):

    def test_removed_from_agent(self):
        agent = l3_agent.L3NATAgent(HOSTNAME, self.conf)
        agent.router_removed_from_agent(None, {'router_id': FAKE_ID})
        # verify that will set fullsync
        self.assertIn(FAKE_ID, agent.removed_routers)

So this unit test calls router_removed_from_agent with a fake ID, which gets stored in the agent's removed_routers. _rpc_loop removes those routers one by one, but the fake ID was never cached in router_info, so the KeyError is raised.

So I think a mock is needed in this unit test.

ZhiQiang Fan (aji-zqfan) wrote :

I'm wrong: _rpc_loop is already mocked in setUp. Sorry.

Stephen Ma (stephen-ma) wrote :

A router could be deleted twice. _rpc_loop calls _process_routers, which spawns threads that run _router_removed for every router in a set difference: the routers the agent currently manages minus those recorded in the database.

Then _rpc_loop calls _process_router_delete, which calls _router_removed on each router in the removed_routers set.

Routers in removed_routers can also be in the difference set used by _process_routers, so some routers get deleted twice.
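The double delete can be reproduced in isolation with a small simulation. This is a hypothetical sketch: run_one_pass is an invented helper, router_info is a plain dict, and both removal steps run sequentially, whereas the real agent spawns threads for the first step. It mimics only the ordering described above.

```python
def run_one_pass(router_info, routers_in_db, removed_routers):
    """Simulate one sequential pass of a simplified _rpc_loop.

    router_info: dict of router ID -> state for routers the agent
        manages (mutated in place).
    routers_in_db: IDs the server still reports.
    removed_routers: IDs queued by router-removed notifications.
    Returns the IDs whose second removal hit a KeyError.
    """
    # Like _process_routers: drop every managed router the DB no longer has.
    stale = set(router_info) - set(routers_in_db)
    for router_id in sorted(stale):
        del router_info[router_id]

    # Like _process_router_delete: drop every queued router.
    failed = []
    for router_id in sorted(removed_routers):
        try:
            del router_info[router_id]
        except KeyError:
            # Already removed in the first step: the double delete.
            failed.append(router_id)
    return failed


router_info = {"r1": "state-1", "r2": "state-2"}
failed = run_one_pass(router_info, routers_in_db={"r2"}, removed_routers={"r1"})
# failed == ["r1"]; router_info == {"r2": "state-2"}
```

Router r1 is both missing from the database and queued in removed_routers, so the second step finds it already gone.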

Also, consider _process_router_delete:

    def _process_router_delete(self):
        current_removed_routers = list(self.removed_routers)
        for router_id in current_removed_routers:
            self._router_removed(router_id)
            self.removed_routers.remove(router_id)

Exceptions can also occur in _router_removed while this loop runs. An exception aborts the loop, so the remaining routers are not processed; and because removed_routers.remove() is called only after _router_removed succeeds, the failing router also stays queued.
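One way to keep a single failure from aborting the rest of the batch is to catch exceptions per router. This is a hypothetical sketch, not the Neutron code: process_router_delete is a free function that takes the removed-router set and a removal callable, standing in for the method and self._router_removed.

```python
import logging

LOG = logging.getLogger(__name__)


def process_router_delete(removed_routers, router_removed):
    """Hypothetical sketch: keep going when one removal fails.

    removed_routers: set of router IDs queued for removal (mutated in place).
    router_removed: callable that removes one router by ID.
    """
    # Iterate over a snapshot so the set can be mutated inside the loop.
    for router_id in list(removed_routers):
        try:
            router_removed(router_id)
        except Exception:
            # Log and leave the ID queued so a later pass can retry it,
            # instead of abandoning the remaining routers.
            LOG.exception("Failed to remove router %s", router_id)
            continue
        removed_routers.discard(router_id)
```

Catching the broad Exception is deliberate here, since the goal is to isolate each router; IDs that fail stay in the set and are retried on the next pass.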

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/49300

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/124304

Changed in neutron:
assignee: Stephen Ma (stephen-ma) → haruka tanizawa (h-tanizawa)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/124304
Reason: This can be abandoned as it's superseded by:

https://review.openstack.org/#/c/126789/

Carl Baldwin (carl-baldwin) wrote :
Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (proposed/juno)

Fix proposed to branch: proposed/juno
Review: https://review.openstack.org/128288

Thierry Carrez (ttx)
Changed in neutron:
milestone: none → juno-rc3
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (proposed/juno)

Reviewed: https://review.openstack.org/128288
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b28eda57223e492924edb731e24c2e4f64cc0de5
Submitter: Jenkins
Branch: proposed/juno

commit b28eda57223e492924edb731e24c2e4f64cc0de5
Author: Carl Baldwin <email address hidden>
Date: Wed Oct 8 03:22:49 2014 +0000

    Remove two sets that are not referenced

    The code no longer references the updated_routers and removed_routers
    sets. This should have been cleaned up before but was missed.

    Closes-bug: #1232525

    Change-Id: I0396e13d2f7c3789928e0c6a4c0a071b02d5ff17
    (cherry picked from commit edb26bfcddf9d9a0e95955a6590d11fa7245ea2b)

Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: juno-rc3 → 2014.2
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/128913

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/128913
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=71df7c80b9efa84f2ef87a2299600066816870b4
Submitter: Jenkins
Branch: master

commit b28eda57223e492924edb731e24c2e4f64cc0de5
Author: Carl Baldwin <email address hidden>
Date: Wed Oct 8 03:22:49 2014 +0000

    Remove two sets that are not referenced

    The code no longer references the updated_routers and removed_routers
    sets. This should have been cleaned up before but was missed.

    Closes-bug: #1232525

    Change-Id: I0396e13d2f7c3789928e0c6a4c0a071b02d5ff17
    (cherry picked from commit edb26bfcddf9d9a0e95955a6590d11fa7245ea2b)

commit 9cce0bfdb713c2b975b289d90de6d57b68ca3854
Author: Mark McClain <email address hidden>
Date: Thu Oct 9 13:29:48 2014 +0000

    Add Juno release milestone

    Change-Id: Iea584b00329d9474c14847db958f8743d4058525
    Closes-Bug: #1378855
    (cherry picked from commit 4e8a5b7de71ba6f8c050c424613c025310498940)

commit 8e76cccb1ed9a248439b1188d1d805649169e46b
Author: Mark McClain <email address hidden>
Date: Wed Oct 8 18:49:20 2014 +0000

    Add database relationship between router and ports

    Add an explicit schema relationship between a router and its ports. This
    change ensures referential integrity among the entities and prevents orphaned
    ports.

    Change-Id: I09e8a694cdff7f64a642a39b45cbd12422132806
    Closes-Bug: #1378866
    (cherry picked from commit 93012915a3445a8ac8a0b30b702df30febbbb728)

commit 5610343d5aab876480cbe15c8d77631e67d6142f
Author: Henry Gessau <email address hidden>
Date: Tue Oct 7 20:38:38 2014 -0400

    Disable PUT for IPv6 subnet attributes

    In Juno we are not ready for allowing the IPv6 attributes on a subnet
    to be updated after the subnet is created, because:
    - The implementation for supporting updates is incomplete.
    - Perceived lack of usefulness, no good use cases known yet.
    - Allowing updates causes more complexity in the code.
    - Have not tested that radvd, dhcp, etc. behave OK after update.

    Therefore, for now, we set 'allow_put' to False for the two IPv6
    attributes, ipv6_ra_mode and ipv6_address_mode. This prevents the
    modes from being updated via the PUT:subnets API.

    Closes-bug: #1378952

    Change-Id: Id6ce894d223c91421b62f82d266cfc15fa63ed0e
    (cherry picked from commit 8a08a3cb47d0dd69d4aa2e8fa661d04054fe95ae)

commit 54be5a9e977ea344cc53addb87635ddba0cfd815
Author: Sean M. Collins <email address hidden>
Date: Mon Oct 6 15:47:24 2014 -0400

    Skip IPv6 Tests in the OpenContrail plugin

    Similar to the way we are skipping tests in the OneConvergence plugin,
    introduced by Kevin Benton in 9294de441e684a81f6e802ba0564083f1ad319d6.

    Partial-Bug: #1378952

    Change-Id: I1650b0708af73ce63e92c55bc842607bb69efe60
    (cherry picked from commit 67962943969bc737a3f680a0defc2fc9df03c429)

commit aefc12ec552afe32f0d1d6f7c8c588afac956988
Author: Ihar Hrachyshka <email address hidden>
Date: Thu Aug 7 22:27:23 2014 +0200

    Removed kombu from requirements

    Since we've replaced oslo-incubator RPC layer with...

