HA router can not be deleted in L3 agent after race between HA router creating and deleting

Bug #1533441 reported by LIU Yulong
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
neutron   Fix Released   Medium       LIU Yulong
Kilo      Invalid        Undecided    Unassigned

Bug Description

An HA router cannot be deleted in the L3 agent after a race between HA router creation and deletion.

Exceptions:
1. Unable to process HA router %s without HA port (during HA router initialization)

2. AttributeError: 'NoneType' object has no attribute 'config' (during HA router deletion)

With the latest neutron code, I found an infinite loop in _safe_router_removed.
Consider an HA router without an HA port that has been placed in the L3 agent,
usually because of the race condition.

Infinite loop steps:
1. An HA router delete RPC arrives.
2. The L3 agent removes the router.
3. The RouterInfo deletes the router namespace (self.router_namespace.delete()).
4. The HaRouter runs ha_router.delete(), which raises the AttributeError: 'NoneType' or some other error.
5. _safe_router_removed returns False.
6. self._resync_router(update) is called.
7. The router namespace no longer exists, so a RuntimeError is raised; go to step 5. Steps 5 - 7 repeat forever.
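
The steps above can be modelled with a small toy sketch. This is not neutron code: ToyHaRouter, safe_router_removed and process_deletes are hypothetical names invented for illustration, and the retry cap exists only so the demo terminates. It shows why a removal that fails once keeps failing: the first attempt deletes the namespace and then hits the AttributeError, and every retry fails earlier on the missing namespace.

```python
from collections import deque


class ToyHaRouter:
    """Hypothetical stand-in for a partially built HA router (no HA port)."""

    def __init__(self):
        self.namespace_exists = True
        self.keepalived_manager = None  # assumed: never set up without an HA port

    def delete(self):
        if not self.namespace_exists:
            # Step 7: the namespace was already removed by the first attempt.
            raise RuntimeError("namespace does not exist")
        # Step 3: the namespace is deleted first ...
        self.namespace_exists = False
        # Step 4: ... then HA cleanup dereferences None and raises AttributeError.
        self.keepalived_manager.config


def safe_router_removed(router):
    """Return True on success, False if the removal raised (step 5)."""
    try:
        router.delete()
    except Exception:
        return False
    return True


def process_deletes(router, max_attempts):
    """Requeue the delete on every failure (step 6), up to max_attempts."""
    queue = deque(["delete"])
    attempts = 0
    while queue and attempts < max_attempts:
        queue.popleft()
        attempts += 1
        if not safe_router_removed(router):
            queue.append("delete")  # the same update comes straight back
    return attempts, bool(queue)
```

Calling process_deletes(ToyHaRouter(), 5) uses all five attempts and still leaves a delete pending; without the cap the loop would never exit.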

LIU Yulong (dragon889)
summary: - HA router can not be deleted after race between HA router creating and
- deleting
+ HA router can not be deleted in L3 agent after race between HA router
+ creating and deleting
description: updated
Changed in neutron:
assignee: nobody → LIU Yulong (dragon889)
status: New → In Progress
LIU Yulong (dragon889)
description: updated
tags: added: kilo-backport-potential liberty-backport-potential
LIU Yulong (dragon889)
tags: added: l3-ha
LIU Yulong (dragon889) wrote :
Changed in neutron:
assignee: LIU Yulong (dragon889) → Kevin Benton (kevinbenton)
Changed in neutron:
assignee: Kevin Benton (kevinbenton) → Assaf Muller (amuller)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/285572
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=046be0b8f30291cd029e6e97a4c6c5a1717a8bd1
Submitter: Jenkins
Branch: master

commit 046be0b8f30291cd029e6e97a4c6c5a1717a8bd1
Author: Kevin Benton <email address hidden>
Date: Wed Feb 24 13:30:24 2016 -0800

    Filter HA routers without HA interface and state

    This patch adjusts the sync method to exclude any HA
    routers from the response that are missing necessary
    HA fields (the HA interface and the HA state).

    This prevents the agent from every receiving a partially
    formed router.

    Co-Authored-By: Ann Kamyshnikova <email address hidden>

    Related-Bug: #1499647
    Closes-Bug: #1533441
    Change-Id: Iadb5a69d4cbc2515fb112867c525676cadea002b
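
The idea in this commit message can be illustrated with a minimal sketch. It is not the merged patch: the function name filter_fully_formed and the dictionary keys "ha", "ha_interface" and "ha_state" are assumptions chosen for readability, and the real sync code may use different names and structures.

```python
def filter_fully_formed(routers):
    """Drop HA routers that lack an HA interface or an HA state.

    The keys "ha", "ha_interface" and "ha_state" are illustrative
    assumptions, not necessarily the keys used by neutron.
    """
    kept = []
    for router in routers:
        is_ha = router.get("ha")
        has_ha_fields = router.get("ha_interface") and router.get("ha_state")
        if is_ha and not has_ha_fields:
            # Partially formed HA router: leave it out of the sync response.
            continue
        kept.append(router)
    return kept
```

Non-HA routers pass through untouched; an HA router is returned only once both HA fields are present, so the agent never sees the half-built state described in this bug.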

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/286065

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/286074

Assaf Muller (amuller) wrote :

Patches proposed to stable/kilo and liberty. High priority backports from my perspective.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by LIU Yulong (<email address hidden>) on branch: master
Review: https://review.openstack.org/265672
Reason: The following patch filters HA routers so that every router in the L3 agent is fully built: https://review.openstack.org/#/c/285572/

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/286065
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e2ca10e7bbe287cc6f3af791e968d31369d77ab4
Submitter: Jenkins
Branch: stable/liberty

commit e2ca10e7bbe287cc6f3af791e968d31369d77ab4
Author: Kevin Benton <email address hidden>
Date: Wed Feb 24 13:30:24 2016 -0800

    Filter HA routers without HA interface and state

    This patch adjusts the sync method to exclude any HA
    routers from the response that are missing necessary
    HA fields (the HA interface and the HA state).

    This prevents the agent from every receiving a partially
    formed router.

    Co-Authored-By: Ann Kamyshnikova <email address hidden>

    Related-Bug: #1499647
    Closes-Bug: #1533441
    Change-Id: Iadb5a69d4cbc2515fb112867c525676cadea002b
    (cherry picked from commit 046be0b8f30291cd029e6e97a4c6c5a1717a8bd1)

tags: added: in-stable-liberty
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 8.0.0.0b3

This issue was fixed in the openstack/neutron 8.0.0.0b3 development milestone.

Assaf Muller (amuller)
tags: removed: liberty-backport-potential
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 7.0.4

This issue was fixed in the openstack/neutron 7.0.4 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/kilo)

Reviewed: https://review.openstack.org/286074
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9d924efe138fd8f18a3470134d2f1c04b925e88c
Submitter: Jenkins
Branch: stable/kilo

commit 9d924efe138fd8f18a3470134d2f1c04b925e88c
Author: Kevin Benton <email address hidden>
Date: Wed Feb 24 13:30:24 2016 -0800

    Filter HA routers without HA interface and state

    This patch adjusts the sync method to exclude any HA
    routers from the response that are missing necessary
    HA fields (the HA interface and the HA state).

    This prevents the agent from every receiving a partially
    formed router.

    Co-Authored-By: Ann Kamyshnikova <email address hidden>

    Related-Bug: #1499647
    Closes-Bug: #1533441
    Change-Id: Iadb5a69d4cbc2515fb112867c525676cadea002b
    (cherry picked from commit 046be0b8f30291cd029e6e97a4c6c5a1717a8bd1)

tags: added: in-stable-kilo
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 2015.1.4

This issue was fixed in the openstack/neutron 2015.1.4 release.

LIU Yulong (dragon889) wrote :
Changed in neutron:
status: Fix Released → New
LIU Yulong (dragon889)
Changed in neutron:
assignee: Assaf Muller (amuller) → LIU Yulong (dragon889)
LIU Yulong (dragon889) wrote :

Here is the infinite loop trace:
http://paste.openstack.org/show/528407/

It appears after this log:
http://paste.openstack.org/show/523757/

Changed in neutron:
status: New → In Progress
Ann Taraday (akamyshnikova) wrote :

I hit this issue in a scale environment. After a rally run that creates and deletes routers, the agent logs contain many occurrences of http://paste.openstack.org/show/525987/, and the server logs show http://paste.openstack.org/show/525175/.

Changed in neutron:
importance: Undecided → Medium
Ann Taraday (akamyshnikova) wrote :
John Schwarz (jschwarz) wrote :

I've gone through the 2 errors initially reported:

1. Concurrency issues with HA ports: fixed by https://review.openstack.org/#/c/257059/ (introduction of the ALLOCATING status for routers)

2. AttributeError: already referenced by https://bugs.launchpad.net/neutron/+bug/1605546

So this bug can be closed.

Changed in neutron:
status: In Progress → Invalid
LIU Yulong (dragon889) wrote :

I've marked this back to 'Fix Released',
since we use a successor bug to deal with it:
https://bugs.launchpad.net/neutron/+bug/1605546

Changed in neutron:
status: Invalid → Fix Released
OpenStack Infra (hudson-openstack) wrote :

This issue was fixed in the openstack/neutron 2015.1.4 release.
