L3 HA keepalived config state becomes inconsistent when managing floating ips

Bug #1400217 reported by Sachi King
This bug affects 2 people
Affects    Status         Importance   Assigned to   Milestone
neutron    Fix Released   Medium       Sachi King
Juno       Fix Released   Undecided    Unassigned

Bug Description

The keepalived config becomes inconsistent: when it is regenerated on the current master, some or all floating IP addresses are missing. On failover this leaves floating IP addresses configured on the previously active master.

This occasionally results in network flaps for all addresses attached to that tenant's router.

The Neutron L3 agent checks which IPs are configured on the local interface before it adds the floating IP addresses. Because keepalived holds the configured state, we need to check against keepalived, not the system.
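
A minimal sketch of the comparison described above, not the actual neutron code. The function name `fips_to_add`, the sample addresses, and the assumption that a restarted agent rebuilds its keepalived state from scratch are all illustrative.

```python
def fips_to_add(wanted_cidrs, current_cidrs):
    """Return the floating IP CIDRs that still need to be added."""
    return sorted(set(wanted_cidrs) - set(current_cidrs))


# Floating IPs the router is supposed to have (sample addresses).
wanted = {"203.0.113.10/32", "203.0.113.11/32"}

# On the active master, keepalived has already put the addresses on the
# interface. Assumption for this sketch: after an agent restart the
# regenerated keepalived state starts out with no floating IPs.
on_interface = {"203.0.113.10/32", "203.0.113.11/32"}
in_keepalived = set()

# Comparing against the interface: nothing looks missing, so nothing is
# written into the regenerated config.
from_system = fips_to_add(wanted, on_interface)

# Comparing against keepalived's own state: both addresses are added back.
from_keepalived = fips_to_add(wanted, in_keepalived)

print(from_system)
print(from_keepalived)
```

With the interface as the reference, the master's regenerated config loses its floating IPs, so on failover keepalived has nothing to remove from the old master.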

Sachi King (nakato)
Changed in neutron:
assignee: nobody → Sachi King (nakato)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/139932

Changed in neutron:
status: New → In Progress
Sachi King (nakato)
Changed in neutron:
status: In Progress → New
Assaf Muller (amuller) wrote :

Sachi, could you please write steps to reproduce?

Sachi King (nakato) wrote :

Steps to reproduce:

Requirements: two neutron servers

A)
1: create HA router, etc.
2: add some floating IP addresses and assign them to an instance so they are created
3: Locate which keepalived is master
4: restart neutron-l3-agent
5: check keepalived.conf
Result (the bug): floating IPs are not listed

B) (not as well tested)
1: repeat steps 1-3 above
2: remove and re-add floating IP address
3: check configs
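
The "check keepalived.conf" steps can be done by eye, but a small script makes missing or repeated entries easier to spot. This is a rough sketch that assumes addresses are listed one per line inside `virtual_ipaddress { ... }` or `virtual_ipaddress_excluded { ... }` blocks; the helper names and the sample config are made up for illustration.

```python
import re
from collections import Counter

# Captures the body of "virtual_ipaddress { ... }" and
# "virtual_ipaddress_excluded { ... }" blocks.
VIP_BLOCK = re.compile(r"virtual_ipaddress(?:_excluded)?\s*\{([^}]*)\}")


def list_vips(conf_text):
    """Return every address listed in the VIP blocks, in file order."""
    vips = []
    for body in VIP_BLOCK.findall(conf_text):
        for line in body.splitlines():
            fields = line.split()
            # Skip blank lines and keepalived comments ("#" or "!").
            if fields and fields[0][0] not in "#!":
                vips.append(fields[0])
    return vips


def duplicated_vips(conf_text):
    """Return the addresses that appear more than once."""
    counts = Counter(list_vips(conf_text))
    return sorted(ip for ip, n in counts.items() if n > 1)


# Invented sample, not taken from a real deployment.
SAMPLE = """
vrrp_instance VR_1 {
    state BACKUP
    virtual_ipaddress {
        169.254.0.1/24 dev ha-example
    }
    virtual_ipaddress_excluded {
        203.0.113.10/32 dev qg-example
        203.0.113.10/32 dev qg-example
    }
}
"""

print(list_vips(SAMPLE))
print(duplicated_vips(SAMPLE))
```

To check a real router, read its keepalived.conf into a string and pass it to `list_vips`; an address missing from the master's list or reported by `duplicated_vips` on a standby indicates the inconsistency.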

Changed in neutron:
status: New → In Progress
Assaf Muller (amuller) wrote :

When verifying this bug I encountered another:

The slave compares against the floating IPs currently configured on the system (none, as it's the slave) and keeps adding FIPs to the keepalived conf, so you end up with the same floating IP appearing multiple times.

I think the solution is pretty simple: every time a router update is received, you get all of the floating IPs. In the HA case, just override whatever is currently configured in keepalived.conf with what you got from the update. There's no need to compare to anything, neither what's on the system nor what's in keepalived.
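
A small sketch of the difference between the two behaviours on a standby node. The class and method names are hypothetical and only model the list of VIPs, not neutron's keepalived manager.

```python
class HypotheticalVipConfig:
    """Toy stand-in for the VIP section of a keepalived config."""

    def __init__(self):
        self.vips = []

    def append_missing_from_interface(self, wanted, on_interface):
        # Behaviour described in the comment: anything absent from the
        # interface is appended, whether or not it is already in the config.
        for cidr in wanted:
            if cidr not in on_interface:
                self.vips.append(cidr)

    def replace_all(self, wanted):
        # Proposed behaviour: the update is authoritative, so overwrite.
        self.vips = sorted(set(wanted))


wanted = ["203.0.113.10/32"]
standby_interface = set()  # a standby holds no floating IPs

appended = HypotheticalVipConfig()
replaced = HypotheticalVipConfig()

# Two router updates arrive carrying the same floating IP.
for _ in range(2):
    appended.append_missing_from_interface(wanted, standby_interface)
    replaced.replace_all(wanted)

print(appended.vips)
print(replaced.vips)
```

Overwriting is idempotent, so repeated updates cannot grow the list.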

Sachi King (nakato) wrote :

The uploaded patch will address that as well.

It's again caused by the fact that it was inspecting the interface for addresses to decide what to add.

When "l3_agent" is restarted on the master it does not get any floating IPs configured because they're already "configured" according to the interface, meaning failover will leave them behind.
When a VIP is added or removed on a standby it ends up with duplicates: the agent inspects the interface, finds none "configured", and so adds them all.

For this reason I don't think the right answer is to override what is there, but to check what is configured against the definitive source, the keepalived config.

If we want to do verification at "add_vip" we should make it check if the new IP is a duplicate and if it is, then throw an error or debug message, then return without adding the address. A duplicate address should never reach "add_vip".
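
A sketch of what such a guard could look like, assuming the "throw an error" variant. `DuplicateVipError`, `VipList`, and `sync` are hypothetical names for illustration and are not taken from the neutron tree.

```python
class DuplicateVipError(Exception):
    """Raised when the same VIP is added twice (hypothetical name)."""


class VipList:
    """Toy model of the VIPs tracked for one keepalived instance."""

    def __init__(self):
        self._vips = []

    def current_cidrs(self):
        # The definitive source: what this object has configured.
        return list(self._vips)

    def add_vip(self, cidr):
        # A duplicate reaching this point is a caller bug, so fail loudly.
        if cidr in self._vips:
            raise DuplicateVipError("VIP %s is already configured" % cidr)
        self._vips.append(cidr)

    def sync(self, wanted):
        # Diff against our own state, never against the interface.
        for cidr in sorted(set(wanted) - set(self._vips)):
            self.add_vip(cidr)


vips = VipList()
vips.sync(["203.0.113.10/32", "203.0.113.11/32"])
vips.sync(["203.0.113.10/32", "203.0.113.11/32"])  # no-op, no exception
print(vips.current_cidrs())
```

Because `sync` compares against the object's own list, repeated updates never reach the duplicate check; calling `add_vip` directly with an existing address raises `DuplicateVipError`.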

Changed in neutron:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/142630

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/139932
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ccd5fe83884cf72c7e69091f318628884665f971
Submitter: Jenkins
Branch: master

commit ccd5fe83884cf72c7e69091f318628884665f971
Author: Sachi King <email address hidden>
Date: Mon Dec 8 17:42:48 2014 +1100

    If router is HA, get current_cidrs from keepalived object

    When using L3 HA and keepalived neutron is no longer directly managing
    the floating IP addresses itself. Neutron should not check against
    which addresses are currently configured on the system, but the
    addresses the keepalived object has configured.

    Co-Authored-By: Benoit Page-Guitard <email address hidden>
    Change-Id: I56045ede3a3dc1a7044a22913ee38ed382a81052
    Closes-Bug: #1400217

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/149818

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Kyle Mestery (<email address hidden>) on branch: master
Review: https://review.openstack.org/142630
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/juno)

Reviewed: https://review.openstack.org/149818
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fbabf07a8dcc306661ac8aa95dbe76c8340c8664
Submitter: Jenkins
Branch: stable/juno

commit fbabf07a8dcc306661ac8aa95dbe76c8340c8664
Author: Sachi King <email address hidden>
Date: Mon Dec 8 17:42:48 2014 +1100

    If router is HA, get current_cidrs from keepalived object

    When using L3 HA and keepalived neutron is no longer directly managing
    the floating IP addresses itself. Neutron should not check against
    which addresses are currently configured on the system, but the
    addresses the keepalived object has configured.

    Conflicts:
     neutron/agent/l3/agent.py

    Co-Authored-By: Benoit Page-Guitard <email address hidden>
    Change-Id: I56045ede3a3dc1a7044a22913ee38ed382a81052
    Closes-Bug: #1400217
    (cherry picked from commit ccd5fe83884cf72c7e69091f318628884665f971)

Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-2 → 2015.1.0
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/142630
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=72e388445eb6f6903ccfc5079aa206ac2cbcfd5e
Submitter: Jenkins
Branch: master

commit 72e388445eb6f6903ccfc5079aa206ac2cbcfd5e
Author: Sachi King <email address hidden>
Date: Mon Dec 8 17:42:48 2014 +1100

    Return exception when attempting to add duplicate VIP

    Neutron should never attempt to add a VIP to keepalived's config
    multiple times, and to do so is an error. As such this adds an
    exception if this is ever attempted.

    Change-Id: If1c41c3164e8a998c73f9b7aa566e2ba6570f54b
    Closes-Bug: #1400217

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/pecan)

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/224334

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/224357

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (feature/pecan)

Reviewed: https://review.openstack.org/224357
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fdc3431ccd219accf6a795079d9b67b8656eed8e
Submitter: Jenkins
Branch: feature/pecan

commit fe236bdaadb949661a0bfb9b62ddbe432b4cf5f1
Author: Miguel Angel Ajo <email address hidden>
Date: Thu Sep 3 15:40:12 2015 +0200

    No network devices on network attached qos policies

    Network devices, like internal router legs, or dhcp ports
    should not be affected by bandwidth limiting rules.

    This patch disables application of network attached policies
    to network/neutron owned ports.

    Closes-bug: #1486039
    DocImpact

    Change-Id: I75d80227f1e6c4b3f5fa7762b8dc3b0c0f1abd46

commit db4a06f7caa20a4c7879b58b20e95b223ed8eeaf
Author: Ken'ichi Ohmichi <email address hidden>
Date: Wed Sep 16 10:04:32 2015 +0000

    Use tempest-lib's token_client

    Now tempest-lib provides token_client modules as library and the
    interface is stable. So neutron repogitory doesn't need to contain
    these modules.
    This patch makes neutron use tempest-lib's token_client and removes
    the own modules for the maintenance.

    Change-Id: Ieff7eb003f6e8257d83368dbc80e332aa66a156c

commit 78aed58edbe6eb8a71339c7add491fe9de9a0546
Author: Jakub Libosvar <email address hidden>
Date: Thu Aug 13 09:08:20 2015 +0000

    Fix establishing UDP connection

    Previously, in establish_connection() for UDP protocol data were sent
    but never read on peer socket. That lead to successful read on peer side
    if this connection was filtered. Having constant testing string masked
    this issue as we can't distinguish to which test of connectivity data
    belong.

    This patch makes unique data string per test_connectivity() and
    also makes establish_connection() to create an ASSURED entry in
    conntrack table. Finally, in last test after firewall filter was
    removed, connection is re-established in order to avoid troubles with
    terminated processes or TCP continuing sending packets which weren't
    successfully delivered.

    Closes-Bug: 1478847
    Change-Id: I2920d587d8df8d96dc1c752c28f48ba495f3cf0f

commit e6292fcdd6262434a7b713ad8802db6bc8a6d3dc
Author: YAMAMOTO Takashi <email address hidden>
Date: Wed Sep 16 13:20:51 2015 +0900

    ovsdb: Fix a few docstring

    Change-Id: I53e1e21655b28fe5da60e58aeeb7cbbd103ae014

commit c22949a4449d96a67caa616290cf76b67b182917
Author: fumihiko kakuma <email address hidden>
Date: Wed Sep 16 11:52:59 2015 +0900

    Remove requirements.txt for the ofagent mechanism driver

    It is no longer used.

    Related-Blueprint: core-vendor-decomposition
    https://blueprints.launchpad.net/neutron/+spec/core-vendor-decomposition

    Change-Id: Ib31fb3febf8968e50d86dd66e1e6e1ea2313f8ac

commit d1d4de19d85f961d388c91e70f31b3bafec418c5
Author: Kevin Benton <email address hidden>
Date: Thu Sep 3 20:25:57 2015 -0700

    Always return iterables in L3 get_candidates

    The caller of this function expects iterables.

    Closes-Bug: #1494996
    Change-Id: I3d103e63f4e127a77268502415c0ddb0d804b54a

commit 1ad6ac448067306...

tags: added: in-feature-pecan
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (feature/pecan)

Change abandoned by Doug Wiegley (<email address hidden>) on branch: feature/pecan
Review: https://review.openstack.org/224334
