Restarting l3 agent not spawning keepalived

Bug #1723848 reported by venkata anil
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: venkata anil

Bug Description

When keepalived is killed manually and the L3 agent is then restarted, the L3 agent does not respawn keepalived.

When the L3 agent is restarted, it sets the HA network port status to DOWN because of [1], on the assumption that:

1) the server will notify the L2 agent of the port update,
2) the L2 agent will then rewire the port and set its status to ACTIVE,
3) when the port status is set to ACTIVE, the server will notify the L3 agent, and
4) when the port status is ACTIVE, the L3 agent will spawn keepalived.

But in the Newton code base, step 1 (the server notifying the L2 agent of the port update) does not happen, so the following steps do not happen either and keepalived is never respawned.

In the upstream master code base, step 1 does happen because of OVO, and the remaining steps follow, so keepalived is spawned.
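The four-step sequence, and what happens when step 1 is skipped, can be sketched with a toy model. This is only an illustration under stated assumptions: the classes `Server`, `L2Agent`, `L3Agent` and the flag `notify_on_status_change` are hypothetical names invented for this sketch and do not mirror neutron's actual code or RPC API.

```python
# Toy model of the restart sequence. All names are hypothetical and are
# not taken from neutron's code base.

class Server:
    def __init__(self, notify_on_status_change):
        # False models the stable branches, True models upstream master.
        self.notify_on_status_change = notify_on_status_change
        self.port_status = "ACTIVE"
        self.l2_agent = None
        self.l3_agent = None

    def update_port_status(self, status):
        self.port_status = status
        if status == "DOWN" and self.notify_on_status_change:
            # Step 1: notify the L2 agent of the port update.
            self.l2_agent.port_update(self)
        elif status == "ACTIVE":
            # Step 3: notify the L3 agent that the HA port is ACTIVE.
            self.l3_agent.ha_port_active()


class L2Agent:
    def port_update(self, server):
        # Step 2: rewire the port and report it as ACTIVE.
        server.update_port_status("ACTIVE")


class L3Agent:
    def __init__(self):
        # keepalived was killed manually before the restart.
        self.keepalived_running = False

    def restart(self, server):
        # On start, the agent marks the HA network port DOWN.
        server.update_port_status("DOWN")

    def ha_port_active(self):
        # Step 4: spawn keepalived once the port is ACTIVE.
        self.keepalived_running = True


def simulate(notify_on_status_change):
    server = Server(notify_on_status_change)
    server.l2_agent = L2Agent()
    server.l3_agent = L3Agent()
    server.l3_agent.restart(server)
    return server.port_status, server.l3_agent.keepalived_running


stable_result = simulate(notify_on_status_change=False)
master_result = simulate(notify_on_status_change=True)
```

In this model, skipping step 1 leaves the port DOWN with keepalived not running, while enabling the notification lets the chain complete.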

I posted [2] to the mailing list about notifying the L2 agent of status updates. Kevin's suggestion works on upstream master (because of OVO) but not on the stable branches. So, as a generic fix, the server has to notify the L2 agent when port_update('status') is called.

[2] http://lists.openstack.org/pipermail/openstack-dev/2017-May/117557.html

Changed in neutron:
assignee: nobody → venkata anil (anil-venkata)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/512179

Changed in neutron:
status: New → In Progress
venkata anil (anil-venkata) wrote :

Let me give some background on why we see this issue.

1) Initially we saw this issue: "L3 HA: 2 masters after reboot of controller"
   https://bugs.launchpad.net/neutron/+bug/1597461
   To fix that, Ann proposed https://review.openstack.org/#/c/357458/ . With this change, the L3 agent spawns keepalived only when the HA network port status is ACTIVE.

2) But we still had a corner case, https://bugs.launchpad.net/neutron/+bug/1597461/comments/26 ,
   and https://review.openstack.org/#/c/470905/ fixed it. That patch sets the HA network port to DOWN when the L3 agent starts, assuming the L2 agent will rewire the port: http://lists.openstack.org/pipermail/openstack-dev/2017-May/117557.html

3) But this assumption holds only on the upstream master branch (because of OVO); on the other branches (including Ocata), the server does not notify the L2 agent.
The server already notifies the agent when other fields are updated (i.e. mac_address, port_security, QoS policy, address pairs, extra DHCP opts, port binding, security groups, admin_state_up). In https://review.openstack.org/#/c/512179/ we extend this to notify on status updates as well.
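The idea of extending the set of notifying fields can be sketched as a simple field comparison. This is a minimal sketch, not the code from the review: the function `needs_agent_notification`, the two field sets, and the dictionary keys are hypothetical stand-ins for the attributes listed above and may not match neutron's real attribute names.

```python
# Hypothetical field names standing in for the attributes listed above;
# they are illustrative and may not match neutron's real port attributes.
NOTIFY_FIELDS_BEFORE_FIX = frozenset({
    "mac_address",
    "port_security_enabled",
    "qos_policy_id",
    "allowed_address_pairs",
    "extra_dhcp_opts",
    "binding_host_id",
    "security_groups",
    "admin_state_up",
})

# The fix described above amounts to also watching the status field.
NOTIFY_FIELDS_AFTER_FIX = NOTIFY_FIELDS_BEFORE_FIX | {"status"}


def needs_agent_notification(original, updated, notify_fields):
    """Return True if any watched field differs between the two port dicts."""
    return any(original.get(f) != updated.get(f) for f in notify_fields)


# The L3 agent restart only changes the port status.
original_port = {"status": "ACTIVE", "admin_state_up": True}
updated_port = {"status": "DOWN", "admin_state_up": True}

before = needs_agent_notification(
    original_port, updated_port, NOTIFY_FIELDS_BEFORE_FIX)
after = needs_agent_notification(
    original_port, updated_port, NOTIFY_FIELDS_AFTER_FIX)
```

With the original field set, a status-only change triggers no notification; once "status" is watched, the same change does.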

Changed in neutron:
importance: Undecided → Medium
Ihar Hrachyshka (ihar-hrachyshka) wrote :

The bug makes the ports unwired without an easy way out. I bumped importance to High.

description: updated
Changed in neutron:
importance: Medium → High
tags: added: l3-ha newton-backport-potential ocata-backport-potential pike-backport-potential
tags: removed: newton-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/512179
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4f9a6a8b7661c3792101580d14d401e8426bf72a
Submitter: Zuul
Branch: master

commit 4f9a6a8b7661c3792101580d14d401e8426bf72a
Author: venkata anil <email address hidden>
Date: Mon Oct 16 06:03:35 2017 +0000

    Notify port_update to agent for status change

    Notify agents about port update when "port_update" is called to update
    port's status change. L3 agent will spawn keepalived only when HA
    network port status is ACTIVE. When L3 agent is restarted, it sets HA
    network port status to DOWN (through "port_update"), with the assumption
    that L2 agent will again rewire the port (set status to ACTIVE) allowing
    L3 agent to spawn keepalived. As server is not notifying L2 agent, port
    is remaining in DOWN status and L3 agent was never spawning keepalived
    on L3 agent restart(if keepalived is killed).

    Closes-Bug: 1723848
    Change-Id: I629eeff905bf02ec5f7ee68cccc7c19f1b47d5aa

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/514138

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/514139

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/514140

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/514138
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=11254ef87bb741f74a19a88d2d5c28351939e7e0
Submitter: Zuul
Branch: stable/pike

commit 11254ef87bb741f74a19a88d2d5c28351939e7e0
Author: venkata anil <email address hidden>
Date: Mon Oct 16 06:03:35 2017 +0000

    Notify port_update to agent for status change

    Closes-Bug: 1723848
    Change-Id: I629eeff905bf02ec5f7ee68cccc7c19f1b47d5aa
    (cherry picked from commit 4f9a6a8b7661c3792101580d14d401e8426bf72a)

tags: added: in-stable-pike
tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ocata)

Reviewed: https://review.openstack.org/514139
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d6645ad1713b091181dd1765e0e11e7ccb25f568
Submitter: Zuul
Branch: stable/ocata

commit d6645ad1713b091181dd1765e0e11e7ccb25f568
Author: venkata anil <email address hidden>
Date: Mon Oct 16 06:03:35 2017 +0000

    Notify port_update to agent for status change

    Closes-Bug: 1723848
    Change-Id: I629eeff905bf02ec5f7ee68cccc7c19f1b47d5aa
    (cherry picked from commit 4f9a6a8b7661c3792101580d14d401e8426bf72a)

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/newton)

Change abandoned by venkata anil (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/514140

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.0.0b1

This issue was fixed in the openstack/neutron 12.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.2

This issue was fixed in the openstack/neutron 11.0.2 release.

tags: added: neutron-proactive-backport-potential
tags: removed: neutron-proactive-backport-potential ocata-backport-potential pike-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 10.0.5

This issue was fixed in the openstack/neutron 10.0.5 release.
