Deleting an HA router with interfaces attached leaves the DB in an inconsistent state

Bug #1402698 reported by Assaf Muller
This bug affects 2 people
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Yoni Shafrir

Bug Description

Steps to reproduce:
1. Create an HA router (observe that HA ports were created).
2. Attach an interface.
3. Delete the router.

Result:
The deletion fails because the router has an interface attached. However, all of its HA ports have already been deleted, leaving the system in an inconsistent state. Any L3 agent that attempts to configure this HA router will fail because the router no longer has HA ports.

Expected behavior:
Deletion validations should occur before deleting any resources. The deletion should fail and the router should continue to exist properly, with HA ports and all.

Reason:
As per https://github.com/openstack/neutron/blob/master/neutron/db/l3_hamode_db.py#L411:

When deleting a router, the call first reaches the L3 HA DB mixin, which deletes the HA-specific resources in its own transaction and commits it. It then calls the superclass, which eventually tries to delete the router object itself. If any deletion validation fails, the router object is not deleted, but the HA ports and their VRID allocations have already been deleted and that transaction committed.
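
The ordering problem described above can be reproduced with a minimal in-memory sketch. All names here (`FakeDB`, `BaseRouterDB`, `BuggyHARouterDB`, `RouterInUse`) are hypothetical stand-ins, not the actual Neutron classes, and the dictionaries only model rows that have already been committed.

```python
class RouterInUse(Exception):
    """Raised when a router still has interfaces attached (hypothetical)."""


class FakeDB:
    """In-memory stand-in for the Neutron DB (hypothetical)."""

    def __init__(self):
        self.routers = {}   # router_id -> set of attached interface ids
        self.ha_ports = {}  # router_id -> list of HA port ids


class BaseRouterDB:
    def __init__(self, db):
        self.db = db

    def delete_router(self, router_id):
        # Deletion validation: refuse if interfaces are still attached.
        if self.db.routers[router_id]:
            raise RouterInUse(router_id)
        del self.db.routers[router_id]


class BuggyHARouterDB(BaseRouterDB):
    def delete_router(self, router_id):
        # HA resources are removed (and, in effect, committed) first ...
        self.db.ha_ports.pop(router_id, None)
        # ... and only then does the base class validate the deletion.
        super().delete_router(router_id)


db = FakeDB()
db.routers["r1"] = {"iface-1"}
db.ha_ports["r1"] = ["ha-port-1", "ha-port-2"]

plugin = BuggyHARouterDB(db)
try:
    plugin.delete_router("r1")
except RouterInUse:
    pass

# The router survives, but its HA ports are gone: the inconsistent state.
router_exists = "r1" in db.routers
ha_ports_exist = "r1" in db.ha_ports
print(router_exists, ha_ports_exist)
```

Because the HA cleanup runs before the validation, a failed validation cannot undo it. This is the state the bug report describes.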

Yoni Shafrir (yshafrir)
Changed in neutron:
assignee: nobody → Yoni (yshafrir)
Miguel Angel Ajo (mangelajo) wrote :

Once the system is in this state, the l3-agent will be blocked when trying to update routers, with an error like this:

2014-12-15 10:35:41.271 6046 ERROR neutron.agent.l3_agent [-] 'NoneType' object has no attribute 'config'
2014-12-15 10:35:41.271 6046 TRACE neutron.agent.l3_agent Traceback (most recent call last):
2014-12-15 10:35:41.271 6046 TRACE neutron.agent.l3_agent File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 341, in call
2014-12-15 10:35:41.271 6046 TRACE neutron.agent.l3_agent return func(*args, **kwargs)
2014-12-15 10:35:41.271 6046 TRACE neutron.agent.l3_agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3_agent.py", line 899, in process_router
2014-12-15 10:35:41.271 6046 TRACE neutron.agent.l3_agent self.internal_network_added(ri, p)
2014-12-15 10:35:41.271 6046 TRACE neutron.agent.l3_agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3_agent.py", line 1526, in internal_network_added
2014-12-15 10:35:41.271 6046 TRACE neutron.agent.l3_agent self._add_vip(ri, internal_cidr, interface_name)
2014-12-15 10:35:41.271 6046 TRACE neutron.agent.l3_agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3_ha_agent.py", line 169, in _add_vip
2014-12-15 10:35:41.271 6046 TRACE neutron.agent.l3_agent instance = ri.keepalived_manager.config.get_instance(ri.ha_vr_id)
2014-12-15 10:35:41.271 6046 TRACE neutron.agent.l3_agent AttributeError: 'NoneType' object has no attribute 'config'
2014-12-15 10:35:41.271 6046 TRACE neutron.agent.l3_agent
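
The last frame of the traceback fails because `ri.keepalived_manager` is `None` when the attribute chain is evaluated. A minimal sketch of that failure mode follows. `RouterInfo` and `add_vip` are hypothetical stand-ins for the agent code, and the assumption that the manager is unset because HA setup was skipped for a router without HA ports is an inference from this report, not something confirmed by the source.

```python
class RouterInfo:
    """Hypothetical stand-in for the agent's per-router state."""

    def __init__(self, ha_vr_id, keepalived_manager=None):
        self.ha_vr_id = ha_vr_id
        # Assumed to stay None when HA setup never ran for this router.
        self.keepalived_manager = keepalived_manager


def add_vip(ri):
    # Mirrors the attribute chain on the failing line of the traceback.
    return ri.keepalived_manager.config.get_instance(ri.ha_vr_id)


ri = RouterInfo(ha_vr_id=1)
message = None
try:
    add_vip(ri)
except AttributeError as exc:
    message = str(exc)

print(message)
```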

Miguel Angel Ajo (mangelajo) wrote :

My previous comment was wrong: that error didn't block the l3-agent, it just prevented the
specific router from being created properly.

So it's not that critical, yet needs to be fixed, of course.

Miguel Angel Ajo (mangelajo) wrote :

The actual error message for this issue is:

2014-12-14 19:38:18.663 19705 ERROR neutron.agent.l3_ha_agent [-] Unable to process HA router 275af32a-3ada-4374-88a7-69a1191ddde6 without ha port

Changed in neutron:
importance: Undecided → Medium
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/144260

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/144681

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Yoni Shafrir (<email address hidden>) on branch: master
Review: https://review.openstack.org/144681
Reason: As suggested in several reviews, this change is not desired. A different patch will be sent that does the opposite: it will allow the user to remove a router from the same agent twice without any error message (today this returns a 409 Conflict).

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/144260
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7632f5dba619bbd9286708f59c31fa392b9c419c
Submitter: Jenkins
Branch: master

commit 7632f5dba619bbd9286708f59c31fa392b9c419c
Author: Yoni Shafrir <email address hidden>
Date: Mon Dec 22 11:25:39 2014 +0200

    Deleting HA router with attached port causes DB inconsistencies

    When an HA router is deleted with 'python-neutronclient'
    while it has an attached interface, the deletion fails because
    the router is in use. The order in which the deletion
    is done is: first remove the HA interfaces from the DB and
    then delete the router. In this case the HA interfaces were
    indeed deleted but the router itself was not (router is in use).
    This leaves the DB inconsistent: an HA router
    exists in the DB while its ports were removed from the DB.

    This patch simply deletes the router first, and then
    we know it's safe to remove its HA interfaces as
    well. If the router is in use and deletion fails,
    the HA interfaces remain intact.

    Closes-Bug: #1402698

    Change-Id: I956d0094ae6e2412e859d79feeb4003941d2bb4b
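
The reordering described in the commit message can be illustrated with a small in-memory sketch. The names (`BaseRouterDB`, `FixedHARouterDB`, `RouterInUse`) are hypothetical and are not taken from the merged patch. The sketch only shows the idea of running the validating router deletion first and cleaning up HA resources afterwards.

```python
class RouterInUse(Exception):
    """Raised when a router still has interfaces attached (hypothetical)."""


class BaseRouterDB:
    def __init__(self):
        self.routers = {}   # router_id -> set of attached interface ids
        self.ha_ports = {}  # router_id -> list of HA port ids

    def delete_router(self, router_id):
        # Deletion validation: refuse if interfaces are still attached.
        if self.routers[router_id]:
            raise RouterInUse(router_id)
        del self.routers[router_id]


class FixedHARouterDB(BaseRouterDB):
    def delete_router(self, router_id):
        # Delete the router first; this raises if the router is in use.
        super().delete_router(router_id)
        # Reached only after a successful deletion, so cleanup is safe.
        self.ha_ports.pop(router_id, None)


plugin = FixedHARouterDB()
plugin.routers["r1"] = {"iface-1"}
plugin.ha_ports["r1"] = ["ha-port-1", "ha-port-2"]

# First attempt: the router is in use, so nothing is removed.
try:
    plugin.delete_router("r1")
    first_attempt_failed = False
except RouterInUse:
    first_attempt_failed = True
ha_ports_after_failure = list(plugin.ha_ports["r1"])

# After detaching the interface, deletion removes router and HA ports.
plugin.routers["r1"].clear()
plugin.delete_router("r1")
print(first_attempt_failed, ha_ports_after_failure, plugin.routers, plugin.ha_ports)
```

With this ordering, a failed validation leaves the HA ports untouched. A fuller solution might instead run both steps in a single transaction, but that is outside what the commit message describes.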

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-3 → 2015.1.0