L3 agent restart causes network outage

Bug #1175695 reported by Jack McCann
This bug affects 8 people
Affects    Status         Importance   Assigned to   Milestone
neutron    Fix Released   High         Stephen Ma
Havana     Fix Released   Undecided    Unassigned

Bug Description

When the L3 agent is restarted, it destroys all existing namespaces and then recreates them. This causes a network outage for the affected routers and floating IPs, even if those routers and floating IPs are still valid. We should be able to preserve existing, valid namespaces across an agent restart and avoid the network outage.

Changed in quantum:
status: New → Confirmed
importance: Undecided → Medium
Changed in quantum:
assignee: nobody → Jack McCann (jack-mccann)
tags: added: l3-ipam-dhcp
OpenStack Infra (hudson-openstack) wrote : Fix proposed to quantum (master)

Fix proposed to branch: master
Review: https://review.openstack.org/30988

Changed in quantum:
assignee: Jack McCann (jack-mccann) → stevebma (stephen-ma)
status: Confirmed → In Progress
Carl Baldwin (carl-baldwin) wrote :

I believe that this bug should be prioritized above medium. For any restart of the l3 agent, all namespaces get destroyed and rebuilt. This means that we cannot restart it without causing a momentary outage for every router on the agent. With more routers, the outage time grows longer. This is really bad imo.

I have seen this cause big problems without this fix applied.

Alan Meadows (alan-meadows) wrote :

I completely agree with Carl on all counts.

This has resulted in major, unnecessary outages in production environments. This patch needs attention and needs to get merged.

Changed in neutron:
importance: Medium → High
tags: added: havana-backport-potential
Changed in neutron:
milestone: none → icehouse-3
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/30988
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=011d99f300ea5d5f4ce48023bd04a795a4872287
Submitter: Jenkins
Branch: master

commit 011d99f300ea5d5f4ce48023bd04a795a4872287
Author: Stephen Ma <email address hidden>
Date: Tue May 28 18:52:27 2013 -0700

    L3 Agent restart causes network outage

    When a L3 agent controlling multiple qrouter namespaces
    restarts, it destroys all qrouter namespaces even if
    some of them are still in use. As a result, network
    traffic could be stopped on the VMs that use the
    networks associated with these namespaces.

    So what is needed is for the L3 agent to preserve those
    qrouter namespaces a L3 agent instance recognizes and to
    destroy those it does not know about.

    Closes-Bug: #1175695

    Change-Id: Idae77886bd195d773878c3d212ccfd56269216fb
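
The behaviour described in this commit message can be pictured as a set comparison between the namespaces present on the host and the routers the agent knows about. The following is a minimal sketch, not the code from the merged change: `partition_namespaces` and its arguments are hypothetical names chosen for illustration, and the only detail assumed from neutron is the `qrouter-` namespace prefix.

```python
NS_PREFIX = "qrouter-"  # prefix of router namespaces


def partition_namespaces(host_namespaces, known_router_ids):
    """Split the host's qrouter namespaces into (keep, destroy) sets.

    host_namespaces: names of all namespaces found on the host (hypothetical input).
    known_router_ids: IDs of the routers this agent instance recognizes.
    """
    # Only consider router namespaces; leave anything else untouched.
    router_namespaces = {
        ns for ns in host_namespaces if ns.startswith(NS_PREFIX)
    }
    expected = {NS_PREFIX + router_id for router_id in known_router_ids}

    keep = router_namespaces & expected     # still in use: preserve
    destroy = router_namespaces - expected  # unknown to the agent: clean up
    return keep, destroy
```

With this split, a restart only touches namespaces in the `destroy` set, so routers that are still valid keep forwarding traffic.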

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Liping Mao (limao) wrote :

I think the fix for this bug also needs to be merged into stable/havana.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/havana)

Related fix proposed to branch: stable/havana
Review: https://review.openstack.org/84418

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/havana
Review: https://review.openstack.org/84419

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/84420

Thierry Carrez (ttx)
Changed in neutron:
milestone: icehouse-3 → 2014.1
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/havana)

Reviewed: https://review.openstack.org/84418
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6cc649a8fcc008a392f046eb3d8029a0658ee1e0
Submitter: Jenkins
Branch: stable/havana

commit 6cc649a8fcc008a392f046eb3d8029a0658ee1e0
Author: Carl Baldwin <email address hidden>
Date: Tue Nov 19 17:47:43 2013 +0000

    Call _destroy_metadata_proxy from _destroy_router_namespaces

    Refactor _spawn/destroy_metadata_proxy so that it can be called
    with only the namespace and the router_id.

    Change-Id: Id1c33b22c7c3bd35c54a7c9ad419831bfed8746b
    Closes-Bug: #1252856
    (cherry picked from commit 07d597079781967f5a149f1812ddca3897fa49d9)
    Related-Bug: #1175695

tags: added: in-stable-havana
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/84419
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=91657c1612b4cc037c74b77f4b3548f843c10fcd
Submitter: Jenkins
Branch: stable/havana

commit 91657c1612b4cc037c74b77f4b3548f843c10fcd
Author: Carl Baldwin <email address hidden>
Date: Tue Nov 12 19:31:45 2013 +0000

    Optionally delete namespaces when they are no longer needed

    Adds a configuration option to tell the network agents to delete
    namespaces when they are no longer in use. The option defaults to
    False so that the agent will not attempt to delete namespaces in
    environments where this is not safe.

    This has been working well in deployments where iproute2 has been
    patched with commit 58a3e8270fe72f8ed92687d3a3132c2a708582dd or it is
    new enough to include it without being patched.

    Change-Id: Ice5242c6f0446d16aaaa7ee353d674310297ef72
    Closes-Bug: #1250596
    Related-Bug: #1052535
    (cherry picked from commit 7336f3bd27d138b3d11d601f977a1e3df2a44b3e)
    Related-Bug: #1175695
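
The opt-in deletion described in this commit message can be sketched as a flag that gates the cleanup call. This is an illustration under stated assumptions, not the agent's actual code: `AgentConfig`, `delete_namespaces`, `retire_namespace`, and `delete_fn` are hypothetical names. Only the default of False comes from the commit message.

```python
from dataclasses import dataclass


@dataclass
class AgentConfig:
    # Hypothetical option name. The default of False follows the commit
    # message: deletion is off unless the operator enables it.
    delete_namespaces: bool = False


def retire_namespace(name, config, delete_fn):
    """Delete an unused namespace only when the operator has opted in.

    delete_fn is a caller-supplied callable that performs the removal.
    Returns True if deletion was attempted, False if the namespace was left.
    """
    if not config.delete_namespaces:
        # Default path: leave the namespace in place, which is the safe
        # choice where namespace deletion is not known to work.
        return False
    delete_fn(name)
    return True
```

Keeping the default at False means an upgrade does not change behaviour on hosts whose iproute2 lacks the fix mentioned in the commit message.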
