HA router in l3 dvr_snat/legacy agent has no ha_port

Bug #1607381 reported by LIU Yulong
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
neutron | Fix Released | High | John Schwarz |

Bug Description

This is a successor to https://bugs.launchpad.net/neutron/+bug/1533441.

An HA router cannot be deleted in the L3 agent after a race between HA router creation and deletion.

Exceptions:
1. Unable to process HA router %s without HA port (HA router initialize)
2. AttributeError: 'NoneType' object has no attribute 'config' (HA router deleting procedure)

http://paste.openstack.org/show/523757/

The absence of ha_port may also cause an infinite-loop trace, which is now tracked in a new LP bug, https://bugs.launchpad.net/neutron/+bug/1606844:
http://paste.openstack.org/show/528407/

LIU Yulong (dragon889)
summary: - HA router in l3 dvr_snat/legacy agent has not ha_port
+ HA router in l3 dvr_snat/legacy agent has no ha_port
Changed in neutron:
assignee: nobody → LIU Yulong (dragon889)
status: New → In Progress
LIU Yulong (dragon889) wrote :
LIU Yulong (dragon889)
description: updated
John Schwarz (jschwarz)
tags: added: l3-ha
venkata anil (anil-venkata) wrote :

Some experiments we could try to resolve races on the agent side:
1) In router_processing_queue, if multiple RouterUpdate objects for the same router_id exist in the queue, the RouterUpdate object with the delete action should take precedence and the others should be discarded, so that threads reading from the queue process only the "delete" action,
i.e. PRIORITY_ROUTER_DELETE = 0
PRIORITY_RPC = 1

This avoids processing router updates on the agent after the router delete action.

2) _process_router_update should run with a lock on router_id, so that agent threads do not process the same router simultaneously and race with each other.

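    from oslo_concurrency import lockutils

    def _process_router_update(self):
        for rp, update in self._queue.each_update_to_next_router():
            # Serialize processing per router ID across agent threads.
            with lockutils.lock(update.id, lock_file_prefix='process-router', external=True):
                ...  # the existing per-update processing would go here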

I might be wrong but sharing some thoughts on this.
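
Suggestion 1 above can be sketched as a small standalone queue. This is only an illustration under stated assumptions, not the agent's actual router_processing_queue: the class name DeleteFirstQueue and its add/pop methods are hypothetical, and only the two priority constants come from the comment above.

```python
import heapq
import itertools

PRIORITY_ROUTER_DELETE = 0
PRIORITY_RPC = 1


class DeleteFirstQueue:
    """Hypothetical toy queue: a queued delete supersedes other updates."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order
        self._pending_delete = set()  # router IDs with a delete queued

    def add(self, router_id, action):
        if router_id in self._pending_delete:
            return  # a delete is already queued; discard this update
        if action == "delete":
            self._pending_delete.add(router_id)
            # Drop updates already queued for the router being deleted.
            self._heap = [e for e in self._heap if e[2] != router_id]
            heapq.heapify(self._heap)
            priority = PRIORITY_ROUTER_DELETE
        else:
            priority = PRIORITY_RPC
        entry = (priority, next(self._counter), router_id, action)
        heapq.heappush(self._heap, entry)

    def pop(self):
        """Return the next (router_id, action), or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, router_id, action = heapq.heappop(self._heap)
        if action == "delete":
            self._pending_delete.discard(router_id)
        return router_id, action
```

With this shape, a worker thread that pops from the queue never sees an update for a router whose delete is still pending, which is the property the suggestion is after.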

LIU Yulong (dragon889) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/365653

Changed in neutron:
assignee: LIU Yulong (dragon889) → John Schwarz (jschwarz)
John Schwarz (jschwarz)
Changed in neutron:
importance: Undecided → High
milestone: none → newton-rc1
John Schwarz (jschwarz) wrote :

As per a discussion with Kevin on IRC [1], I've set this to RC1.

[1]: http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/latest.log.html#t2016-09-05T11:45:30

Changed in neutron:
assignee: John Schwarz (jschwarz) → LIU Yulong (dragon889)
assignee: LIU Yulong (dragon889) → John Schwarz (jschwarz)
Carl Baldwin (carl-baldwin) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/365653
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=29cec0345617627b64a73b9de35c46bccdc4ffa3
Submitter: Jenkins
Branch: master

commit 29cec0345617627b64a73b9de35c46bccdc4ffa3
Author: John Schwarz <email address hidden>
Date: Mon Sep 5 16:34:44 2016 +0300

    l3 ha: don't send routers without '_ha_interface'

    Change I22ff5a5a74527366da8f82982232d4e70e455570 changed
    get_ha_sync_data_for_host such that if an agent requests a router's
    details, then it is always returned, even when it doesn't have the key
    '_ha_interface'. Further changes to this change tried to put this check
    back in (Ie38baf061d678fc5d768195b25241efbad74e42f), but this patch
    failed to do so for the case where no bindings were returned (possible
    when the router has been concurrently deleted). This patch puts this
    check back in.

    Closes-Bug: #1607381
    Change-Id: I047e53ea9b3e20a21051f29d0a44624e2a31c83c
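
The check described in the commit message can be illustrated roughly as follows. This is a minimal sketch, not the merged patch: the helper name filter_ha_routers is hypothetical, routers are assumed to be plain dicts, and the 'ha' flag is an assumed key; only '_ha_interface' is taken from the commit message.

```python
def filter_ha_routers(routers):
    """Drop HA routers that lack '_ha_interface' before sending them to an agent.

    Hypothetical helper: non-HA routers pass through untouched, while an HA
    router is kept only if it carries a non-empty '_ha_interface' entry.
    """
    return [
        router for router in routers
        if not router.get("ha") or router.get("_ha_interface")
    ]
```

For example, given one HA router with an '_ha_interface' entry, one HA router without it (as could happen when the router is concurrently deleted), and one non-HA router, only the first and the third would be returned.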

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 9.0.0.0rc1

This issue was fixed in the openstack/neutron 9.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by LIU Yulong (<email address hidden>) on branch: master
Review: https://review.openstack.org/265672
Reason: Will restore this if needed.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/440799

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/mitaka)

Reviewed: https://review.openstack.org/440799
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8c77ee6b20dd38cc0246e854711cb91cffe3a069
Submitter: Jenkins
Branch: stable/mitaka

commit 8c77ee6b20dd38cc0246e854711cb91cffe3a069
Author: John Schwarz <email address hidden>
Date: Mon Sep 5 16:34:44 2016 +0300

    l3 ha: don't send routers without '_ha_interface'

    Change I22ff5a5a74527366da8f82982232d4e70e455570 changed
    get_ha_sync_data_for_host such that if an agent requests a router's
    details, then it is always returned, even when it doesn't have the key
    '_ha_interface'. Further changes to this change tried to put this check
    back in (Ie38baf061d678fc5d768195b25241efbad74e42f), but this patch
    failed to do so for the case where no bindings were returned (possible
    when the router has been concurrently deleted). This patch puts this
    check back in.

    Closes-Bug: #1607381
    Closes-bug: #1668410
    Change-Id: I047e53ea9b3e20a21051f29d0a44624e2a31c83c
    (cherry picked from commit 29cec0345617627b64a73b9de35c46bccdc4ffa3)

tags: added: in-stable-mitaka