DVR: Moving FloatingIP between hosts can lead to Floating Agent Gateway Port not being deleted

Bug #1450982 reported by Adolfo Duarte
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Swaminathan Vasudevan

Bug Description

Tests have shown that the following sequence of events on a multinode setup causes the network:floatingip_agent_gateway port to be left behind. It is not possible to delete this port from the CLI, even with admin privileges. Restacking seems to be the only solution.
The problem is further complicated because any public network that has the leftover port as one of its members cannot be deleted.
Here is the list of steps to reproduce. All must be done with admin privileges under the admin tenant.

#+ Test Steps: Get ip address of all nodes
#+ Test Steps:
#+ Step: User, Tenant, Token, and Image Verification
#+ Step: Add security Rules
#+ Step: Router Creations(DVR)
#+ Step: Networks, and Subnet Attachment
#+ Step: spin up vms
#+ Step: Obtain necessary vm information
#+ Step: Create external Network and Subnet
#+ Step: Attach external network to router
#+ Step: Associate FIPs to Vm - 2 RESTful
#+ Step: Verify floating ip plumbing
#+ Step: Verify connectivity to vms from outside world
#+ Step: Verify connectivity to vms from outside world

04-25-2015 15:16:38 : ################################################
04-25-2015 15:16:38 : Removing all resources from system
04-25-2015 15:16:38 : ################################################
04-25-2015 15:16:38 : ***Getting User Token***
04-25-2015 15:16:38 : ***User Token Received***
04-25-2015 15:16:38 :
04-25-2015 15:16:38 : -- Deleting VMs
04-25-2015 15:16:39 : -- Done deleting VMs
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : -- Deleting Security Groups
04-25-2015 15:16:39 : 'default' Security Group deleted
04-25-2015 15:16:39 : -- Done deleting Security Groups
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : --- Deleting Floating IPs
04-25-2015 15:16:39 : --Done deleting Floating IPs
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : -- Clearing all routes from routers
04-25-2015 15:16:39 : -- Done Clearing all routes on routers
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : -- Detaching subnets from routers
04-25-2015 15:16:39 : -- Done detaching subnets from routers
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : -- Deleting Ports
HTTP Error 409: Conflict
04-25-2015 15:16:39 : Error deleting 25337691-d558-46d8-b2a2-96cbdef198db
04-25-2015 15:16:39 : -- Done deleting ports
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : -- Deleting subnets
HTTP Error 409: Conflict
04-25-2015 15:16:39 : Error deleting 1f4059ee-2b6b-48d0-9687-b7e0fead3cae
04-25-2015 15:16:39 : --Done deleting subnets
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : -- Deleting routers
04-25-2015 15:16:39 : --Done deleting routers
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : --Deleting networks
HTTP Error 409: Conflict
04-25-2015 15:16:39 : Error deleting wjs_public_network
04-25-2015 15:16:39 : -- Done deleting networks
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : ------DONE-------
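
The three 409 failures in the cleanup log above form a chain: the port, then its subnet, then the network. As a reading aid, here is a minimal Python sketch that pulls the failed resources out of such a log. It assumes the log format shown above (an "HTTP Error 409: Conflict" line immediately followed by an "Error deleting ..." line); the function name failed_deletes is made up for illustration and is not part of the test tool.

```python
def failed_deletes(log_text):
    """Return resources named on an 'Error deleting' line that directly follows a 409."""
    failed = []
    saw_conflict = False
    for line in log_text.splitlines():
        if line.startswith("HTTP Error 409"):
            saw_conflict = True
        elif saw_conflict and "Error deleting " in line:
            # Keep everything after the marker: a UUID or a resource name.
            failed.append(line.split("Error deleting ", 1)[1].strip())
            saw_conflict = False
        else:
            saw_conflict = False
    return failed


# Lines copied from the cleanup log above.
sample = """\
04-25-2015 15:16:39 : -- Deleting Ports
HTTP Error 409: Conflict
04-25-2015 15:16:39 : Error deleting 25337691-d558-46d8-b2a2-96cbdef198db
04-25-2015 15:16:39 : -- Deleting subnets
HTTP Error 409: Conflict
04-25-2015 15:16:39 : Error deleting 1f4059ee-2b6b-48d0-9687-b7e0fead3cae
04-25-2015 15:16:39 : --Deleting networks
HTTP Error 409: Conflict
04-25-2015 15:16:39 : Error deleting wjs_public_network
"""
print(failed_deletes(sample))
```

The order of the result (port, subnet, network) mirrors the dependency: the network cannot go while its subnet exists, and the subnet cannot go while the leftover port holds an address on it.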

adolfo@devstack:~/dvr$ neutron port-list
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                                                          |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
| 25337691-d558-46d8-b2a2-96cbdef198db |      | fa:16:3e:06:1c:0e | {"subnet_id": "1f4059ee-2b6b-48d0-9687-b7e0fead3cae", "ip_address": "101.0.0.103"} |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
adolfo@devstack:~/dvr$ neutron port-show 25337691-d558-46d8-b2a2-96cbdef198db
+-----------------------+------------------------------------------------------------------------------------+
| Field                 | Value                                                                              |
+-----------------------+------------------------------------------------------------------------------------+
| admin_state_up        | True                                                                               |
| allowed_address_pairs |                                                                                    |
| binding:host_id       | devstack                                                                           |
| binding:profile       | {}                                                                                 |
| binding:vif_details   | {"port_filter": true, "ovs_hybrid_plug": true}                                     |
| binding:vif_type      | ovs                                                                                |
| binding:vnic_type     | normal                                                                             |
| device_id             | d87ddfd4-6acc-48c7-8d0d-9119d6531eb2                                               |
| device_owner          | network:floatingip_agent_gateway                                                   |
| extra_dhcp_opts       |                                                                                    |
| fixed_ips             | {"subnet_id": "1f4059ee-2b6b-48d0-9687-b7e0fead3cae", "ip_address": "101.0.0.103"} |
| id                    | 25337691-d558-46d8-b2a2-96cbdef198db                                               |
| mac_address           | fa:16:3e:06:1c:0e                                                                  |
| name                  |                                                                                    |
| network_id            | b376c1a7-ecfa-4892-88dc-509145e7fbc7                                               |
| security_groups       |                                                                                    |
| status                | DOWN                                                                               |
| tenant_id             |                                                                                    |
+-----------------------+------------------------------------------------------------------------------------+

adolfo@devstack:~/dvr$ neutron net-list
+--------------------------------------+--------------------+---------------------------------------------------+
| id                                   | name               | subnets                                           |
+--------------------------------------+--------------------+---------------------------------------------------+
| b376c1a7-ecfa-4892-88dc-509145e7fbc7 | wjs_public_network | 1f4059ee-2b6b-48d0-9687-b7e0fead3cae 101.0.0.0/24 |
+--------------------------------------+--------------------+---------------------------------------------------+
adolfo@devstack:~/dvr$

Changed in neutron:
assignee: nobody → Adolfo Duarte (adolfo-duarte)
tags: added: l3-dvr-backlog
summary: - DVR: Router Gateway Port cannot be delted in certain circumstances
+ DVR: Router Gateway Port cannot be deleted in certain circumstances
Changed in neutron:
importance: Undecided → Medium
Adolfo Duarte (adolfo-duarte) wrote : Re: DVR: Router Gateway Port cannot be deleted in certain circumstances

After more testing, the steps to reproduce this have been refined to:
#+ Test Steps: Get ip address of all nodes
#+ Test Steps:
#+ Step: User, Tenant, Token, and Image Verification
#+ Step: Add security Rules
#+ Step: Router Creations(DVR)
#+ Step: Networks, and Subnet Attachment
#+ Step: spin up dvr_vms
#+ Step: Obtain necessary vm information
#+ Step: Create external Network and Subnet
#+ Step: Attach external network to router
#+ Step: Associate FIP to Vm1
#+ Step: Verify floating ip plumbing for Vm1
#+ Step: Verify connectivity to vm1 from outside world
#+ Step: Associate FIP currently associated with vm1 to Vm2
#+ Step: Verify: Fip-show should have Vm2's port Id
#+ Step: Verify floating ip plumbing for Vm2
#+ Step: Verify connectivity to vm2 from outside world
#+ Step: Verify ssh to fip from external host and check output of ifconfig has vm2's fixed ip

then

04-25-2015 15:16:38 : ################################################
04-25-2015 15:16:38 : Removing all resources from system
04-25-2015 15:16:38 : ################################################
04-25-2015 15:16:38 : ***Getting User Token***
04-25-2015 15:16:38 : ***User Token Received***
04-25-2015 15:16:38 :
04-25-2015 15:16:38 : -- Deleting VMs
04-25-2015 15:16:39 : -- Done deleting VMs
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : -- Deleting Security Groups
04-25-2015 15:16:39 : 'default' Security Group deleted
04-25-2015 15:16:39 : -- Done deleting Security Groups
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : --- Deleting Floating IPs
04-25-2015 15:16:39 : --Done deleting Floating IPs
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : -- Clearing all routes from routers
04-25-2015 15:16:39 : -- Done Clearing all routes on routers
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : -- Detaching subnets from routers
04-25-2015 15:16:39 : -- Done detaching subnets from routers
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : -- Deleting Ports
HTTP Error 409: Conflict
04-25-2015 15:16:39 : Error deleting 25337691-d558-46d8-b2a2-96cbdef198db
04-25-2015 15:16:39 : -- Done deleting ports
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : -- Deleting subnets
HTTP Error 409: Conflict
04-25-2015 15:16:39 : Error deleting 1f4059ee-2b6b-48d0-9687-b7e0fead3cae
04-25-2015 15:16:39 : --Done deleting subnets
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : -- Deleting routers
04-25-2015 15:16:39 : --Done deleting routers
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : --Deleting networks
HTTP Error 409: Conflict
04-25-2015 15:16:39 : Error deleting wjs_public_network
04-25-2015 15:16:39 : -- Done deleting networks
04-25-2015 15:16:39 :
04-25-2015 15:16:39 : ------DONE-------

Adolfo Duarte (adolfo-duarte) wrote :

Here is a possible explanation for this:

It looks like the cause might be the FIP re-association from vm1 to vm2.
vm1 is instantiated on compute-node-1 (hypervisor-1) while vm2 is instantiated on compute-node-2 (hypervisor-2).
In the above sequence, when the FIP is re-associated to vm2 (moved from vm1 to vm2), it is NOT disassociated from vm1's port first.

It is simply UPDATED. In other words, the command "neutron floatingip-disassociate <fip-id>" is NOT given. The FIP is simply re-associated by doing:
neutron floatingip-associate <fip-id> <vm2-port-id>

This causes the network:floatingip_agent_gateway port to be left alive on hypervisor-1, where vm1 is.

Because the network:floatingip_agent_gateway port is associated with the external network, the external network cannot be deleted.
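
To make the suspected sequence concrete, here is a toy Python model of it. This is only a sketch under the assumption stated above (that the agent gateway port on a host is cleaned up solely by an explicit disassociate); it is not Neutron code, and ToyCloud and leftover_hosts are hypothetical names.

```python
class ToyCloud:
    """Toy model of the behaviour suspected above; not Neutron code."""

    def __init__(self, port_hosts):
        self.port_hosts = port_hosts   # VM port id -> hypervisor name
        self.fip_to_port = {}          # floating IP id -> VM port id
        self.agent_gw_hosts = set()    # hosts that hold an agent gateway port

    def associate(self, fip, port):
        # An update overwrites the mapping; the previous host is never revisited.
        self.fip_to_port[fip] = port
        self.agent_gw_hosts.add(self.port_hosts[port])

    def disassociate(self, fip):
        host = self.port_hosts[self.fip_to_port.pop(fip)]
        still_used = any(self.port_hosts[p] == host
                         for p in self.fip_to_port.values())
        if not still_used:
            self.agent_gw_hosts.discard(host)


def leftover_hosts(disassociate_first):
    cloud = ToyCloud({"vm1-port": "hypervisor-1", "vm2-port": "hypervisor-2"})
    cloud.associate("fip", "vm1-port")
    if disassociate_first:
        cloud.disassociate("fip")
    cloud.associate("fip", "vm2-port")   # move the FIP to vm2
    cloud.disassociate("fip")            # final cleanup, as in the test run
    return sorted(cloud.agent_gw_hosts)


print(leftover_hosts(disassociate_first=False))  # hypervisor-1 keeps its port
print(leftover_hosts(disassociate_first=True))   # nothing is left behind
```

In this model the only difference between the two runs is the explicit disassociate before the move, which matches the observation in the test.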

Adolfo Duarte (adolfo-duarte) wrote :

A workaround for this problem is available through the CLI: associate a FIP with a VM instantiated on the hypervisor that has the lingering port, and then disassociate it.
For example, in the scenario above where hypervisor-1 has a lingering network:floatingip_agent_gateway port, we can instantiate a VM there, associate a FIP, then disassociate the FIP. The network:floatingip_agent_gateway port then gets deleted, and we can remove the external network.
This is not really an acceptable workaround in the long term, but it proves that the problem is caused by re-associating FIPs from hypervisor to hypervisor without first disassociating them.
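
The workaround needs to know which hypervisor holds the lingering port. A minimal Python sketch for picking those out of port data follows. It assumes port records shaped like the port-show output in the bug description (device_owner and binding:host_id fields); the helper name, the hosts_with_fips argument, and the second sample port are made up for illustration.

```python
AGENT_GW_OWNER = "network:floatingip_agent_gateway"


def lingering_agent_gw_ports(ports, hosts_with_fips):
    """Return agent gateway ports bound to hosts that serve no floating IP."""
    return [
        port for port in ports
        if port.get("device_owner") == AGENT_GW_OWNER
        and port.get("binding:host_id") not in hosts_with_fips
    ]


ports = [
    # Values taken from the port-show output in the bug description.
    {"id": "25337691-d558-46d8-b2a2-96cbdef198db",
     "device_owner": AGENT_GW_OWNER,
     "binding:host_id": "devstack"},
    # Hypothetical VM port, included only to show that it is ignored.
    {"id": "hypothetical-vm-port",
     "device_owner": "compute:nova",
     "binding:host_id": "devstack"},
]

stale = lingering_agent_gw_ports(ports, hosts_with_fips=set())
print([p["binding:host_id"] for p in stale])
```

Each host printed here is a candidate for the associate-then-disassociate workaround described above.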

Swaminathan Vasudevan (swaminathan-vasudevan) wrote : Re: DVR: Floating Agent Gateway Port is not deleted on the first host when FloatingIP is moved from one VM to another VM on a different host without disassociating the FloatingIP.

Today, when a floatingip update request comes in, there is no information about the previous private port attached to the floatingIP.
The probable fix for the leftover ports would be to clean them up when the gateway port is removed or deleted from the router.

summary: - DVR: Router Gateway Port cannot be deleted in certain circumstances
+ DVR: Floating Agent Gateway Port is not deleted on the first host when
+ FloatingIP is moved from one VM to another VM on a different host
+ without disassociating the FloatingIP.
Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

I was wrong in my assumption: we do get both the previous and the current private port ID. So we need to check for a host change between the two and, based on that, delete the port.
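
A rough Python sketch of the host-change check described in this comment is below. It only illustrates the idea; it is not the code from any of the proposed patches, and the function name, the port-to-host map, and the per-host FIP counts are assumptions.

```python
def host_to_clean_up(prev_port_id, new_port_id, port_hosts, fips_per_host):
    """Return the old host if the FIP left it and no other FIP remains there.

    port_hosts maps a private port id to its host; fips_per_host counts the
    floating IPs still served by each host after the update. Returns None
    when no cleanup is needed.
    """
    if prev_port_id is None or prev_port_id == new_port_id:
        return None                      # fresh association or no change
    old_host = port_hosts[prev_port_id]
    new_host = port_hosts.get(new_port_id)
    if old_host == new_host:
        return None                      # moved between VMs on the same host
    if fips_per_host.get(old_host, 0) > 0:
        return None                      # old host still serves other FIPs
    return old_host


hosts = {"vm1-port": "hypervisor-1", "vm2-port": "hypervisor-2"}
print(host_to_clean_up("vm1-port", "vm2-port", hosts, {"hypervisor-2": 1}))
```

With the scenario from this bug (FIP moved from vm1 to vm2, nothing else on hypervisor-1), the check points at hypervisor-1 as the host whose agent gateway port should go.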

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/180408

Changed in neutron:
assignee: Adolfo Duarte (adolfo-duarte) → Swaminathan Vasudevan (swaminathan-vasudevan)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/181084

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Swaminathan Vasudevan (<email address hidden>) on branch: master
Review: https://review.openstack.org/181084
Reason: Not required anymore since all the changes are made in the parent patch.

Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → Lynn (lynn-li)
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Swaminathan Vasudevan (<email address hidden>) on branch: master
Review: https://review.openstack.org/180408
Reason: Abandon this change, since this was addressed by another patch.

Changed in neutron:
assignee: Lynn (lynn-li) → Swaminathan Vasudevan (swaminathan-vasudevan)
Ryan Moats (rmoats) wrote : Re: DVR: Floating Agent Gateway Port is not deleted on the first host when FloatingIP is moved from one VM to another VM on a different host without disassociating the FloatingIP.

Launchpad has missed that https://review.openstack.org/#/c/194455 is now the patch for this defect

Ryan Moats (rmoats)
summary: - DVR: Floating Agent Gateway Port is not deleted on the first host when
- FloatingIP is moved from one VM to another VM on a different host
- without disassociating the FloatingIP.
+ DVR: Moving FloatingIP between hosts can lead to Floating Agent Gateway
+ Port not being deleted
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Swaminathan Vasudevan (<email address hidden>) on branch: master
Review: https://review.openstack.org/180408
Reason: Change ID's shown below fixes this problem.
https://review.openstack.org/#/c/194441/
https://review.openstack.org/#/c/194446/
https://review.openstack.org/#/c/194455/

So we can abandon it.
I will update the launchpad bug with the information.

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

Hi Ryan,

The patch that you have shown above has some dependencies.

Here is the complete list of patch sets that would address this problem.

https://review.openstack.org/#/c/194441 (Server-side change that would delete the port only when the gateway is removed)
https://review.openstack.org/#/c/194446 (Client-side and RPC patch: the client, if the port is not required, will remove it using the RPC)
https://review.openstack.org/#/c/194455 (Clean up legacy function)

Changed in neutron:
milestone: none → liberty-rc1
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/194441
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d5aa1659f56601d8f4d5e17273d5ade7a0e202dd
Submitter: Jenkins
Branch: master

commit d5aa1659f56601d8f4d5e17273d5ade7a0e202dd
Author: Swaminathan Vasudevan <email address hidden>
Date: Mon Jun 22 16:33:32 2015 -0700

    Delete FIP agent gateway port with external gw port

    FIP agent gateway ports are associated with external
    networks and specific host.

    Today FIP agent gateway ports are deleted for
    every floatingip associate and disassociate. This
    introduces race conditions in the port delete and also
    un-necessary access to the db.

    This patch will delete the FIP agent gateway port when
    the last gateway port of the external network is deleted.

    The child patch linked to this parent patch will clean
    up the FIP agent gateway port delete when associate,
    disassociate and delete of floatingip happens.

    This should also cover the case when an agent for some
    reason was unable to request agent gw port delete.
    (agent died).

    Related-Bug: #1408855
    Related-Bug: #1468007
    Related-Bug: #1450982

    Change-Id: I6637a771e6a6ce74e848cb74b779043e16a54a84
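
The rule in this commit message ("delete the FIP agent gateway port when the last gateway port of the external network is deleted") can be sketched in a few lines of Python. This is an illustration only, not the code from the patch: the function name and the sample port IDs are hypothetical, and ports are plain dicts with the device_owner and network_id fields seen in the port-show output.

```python
ROUTER_GW = "network:router_gateway"
AGENT_GW = "network:floatingip_agent_gateway"


def agent_gw_ports_to_delete(remaining_ports, ext_net_id):
    """Given the ports left after a router gateway removal, return the ids of
    agent gateway ports on ext_net_id that can now be deleted."""
    on_net = [p for p in remaining_ports if p["network_id"] == ext_net_id]
    if any(p["device_owner"] == ROUTER_GW for p in on_net):
        return []   # another router still has a gateway on this network
    return [p["id"] for p in on_net if p["device_owner"] == AGENT_GW]


ext_net = "b376c1a7-ecfa-4892-88dc-509145e7fbc7"
agent_gw = {"id": "agent-gw-1", "network_id": ext_net, "device_owner": AGENT_GW}
router_gw = {"id": "router-gw-1", "network_id": ext_net, "device_owner": ROUTER_GW}

print(agent_gw_ports_to_delete([agent_gw, router_gw], ext_net))  # gateway remains
print(agent_gw_ports_to_delete([agent_gw], ext_net))             # last gateway gone
```

Tying the cleanup to the last gateway port means a missed per-FIP cleanup on some host no longer blocks deletion of the external network.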

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/194446
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=639f1893dde0d393a97b29ca5309dba716831a7f
Submitter: Jenkins
Branch: master

commit 639f1893dde0d393a97b29ca5309dba716831a7f
Author: Swaminathan Vasudevan <email address hidden>
Date: Mon Jun 22 16:50:43 2015 -0700

    Add RPC command and delete if last FIP on Agent

    Today FloatingIP Agent gateway port is deleted and
    re-created for DVR based routers based on floatingip
    association and disassociation with VMs on compute
    nodes by the plugin.

    This introduces lot more strain on the plugin to
    create and delete these ports when VMs come up and
    get deleted that are associated with FloatingIps.

    This patch will introduce an RPC call for the agent
    to initiate a agent gateway port delete.

    Also the agent will look for the last floatingip that
    it manages, and if condition satisfies, the agent will
    request the server to remove the FloatingIP Agent
    Gateway port.

    Change-Id: I47694b2ee60c363e2fe59ad5f7d168252da08a45
    Related-Bug: #1468007
    Related-Bug: #1408855
    Related-Bug: #1450982
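
The agent-side behaviour described in this commit message (track the floating IPs the agent manages, and ask the server to remove the agent gateway port once the last one is gone) might look roughly like the Python sketch below. It is a toy model and not the patch itself: ToyFipAgent and the request_gw_port_delete callback stand in for the real agent and the RPC call, whose actual names are not given in this report.

```python
class ToyFipAgent:
    """Toy stand-in for an L3 agent on one host; not Neutron code."""

    def __init__(self, host, request_gw_port_delete):
        self.host = host
        self.fips = {}   # external network id -> set of floating IP ids
        # Hypothetical callback standing in for the RPC to the server.
        self.request_gw_port_delete = request_gw_port_delete

    def add_fip(self, ext_net_id, fip_id):
        self.fips.setdefault(ext_net_id, set()).add(fip_id)

    def remove_fip(self, ext_net_id, fip_id):
        managed = self.fips.get(ext_net_id)
        if not managed or fip_id not in managed:
            return                       # nothing to do for unknown FIPs
        managed.remove(fip_id)
        if not managed:
            # Last floating IP on this network: ask the server to clean up.
            del self.fips[ext_net_id]
            self.request_gw_port_delete(self.host, ext_net_id)


requests = []
agent = ToyFipAgent("hypervisor-1", lambda host, net: requests.append((host, net)))
agent.add_fip("ext-net", "fip-a")
agent.add_fip("ext-net", "fip-b")
agent.remove_fip("ext-net", "fip-a")   # one FIP still managed: no request
agent.remove_fip("ext-net", "fip-b")   # last FIP gone: deletion requested
print(requests)
```

Keeping the decision on the agent means the server only hears about the port when there is something to delete.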

Kyle Mestery (mestery) wrote :

Is this gonna make RC1? I'll leave it here for now but it's not clear to me how this will be closed by then.

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

Hi Kyle, there is only one patch that has to go in.
I need one more +2 on that patch and it will go in.

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

I did see that I got both the +2's so this last patch should go in.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/194455
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=cd45f16442b7c56c4876bef527c9c83ea0907c40
Submitter: Jenkins
Branch: master

commit cd45f16442b7c56c4876bef527c9c83ea0907c40
Author: Swaminathan Vasudevan <email address hidden>
Date: Mon Jun 22 17:17:15 2015 -0700

    Cleanup the fip agent gateway port delete routines

    Based on the parent patch, right now the Floatingip
    agent gateway ports will only be deleted when the
    last gateway port associated with the external
    network is deleted.

    The Floatingip agent gateway port will not be deleted
    for every floatingip dis-association and deletion.

    The Floatingip agent gateway port was created on all
    nodes as a substitute for the gateway port. So it
    makes sense to delete those ports only when the last
    gateway port on the external network is deleted.

    The agent should be able to delete the floatingip agent
    gateway port on a given external network when it is not
    required.

    This would substantially reduce the burden on the server
    to validate, read and delete the port form the DB.

    Change-Id: Ie561b19a2e58a2a563d79b75421e9e24c70f36f9
    Closes-Bug: #1468007
    Closes-Bug: #1408855
    Closes-Bug: #1450982

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/pecan)

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/224334

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/224357

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (feature/pecan)

Reviewed: https://review.openstack.org/224357
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fdc3431ccd219accf6a795079d9b67b8656eed8e
Submitter: Jenkins
Branch: feature/pecan

commit fe236bdaadb949661a0bfb9b62ddbe432b4cf5f1
Author: Miguel Angel Ajo <email address hidden>
Date: Thu Sep 3 15:40:12 2015 +0200

    No network devices on network attached qos policies

    Network devices, like internal router legs, or dhcp ports
    should not be affected by bandwidth limiting rules.

    This patch disables application of network attached policies
    to network/neutron owned ports.

    Closes-bug: #1486039
    DocImpact

    Change-Id: I75d80227f1e6c4b3f5fa7762b8dc3b0c0f1abd46

commit db4a06f7caa20a4c7879b58b20e95b223ed8eeaf
Author: Ken'ichi Ohmichi <email address hidden>
Date: Wed Sep 16 10:04:32 2015 +0000

    Use tempest-lib's token_client

    Now tempest-lib provides token_client modules as library and the
    interface is stable. So neutron repogitory doesn't need to contain
    these modules.
    This patch makes neutron use tempest-lib's token_client and removes
    the own modules for the maintenance.

    Change-Id: Ieff7eb003f6e8257d83368dbc80e332aa66a156c

commit 78aed58edbe6eb8a71339c7add491fe9de9a0546
Author: Jakub Libosvar <email address hidden>
Date: Thu Aug 13 09:08:20 2015 +0000

    Fix establishing UDP connection

    Previously, in establish_connection() for UDP protocol data were sent
    but never read on peer socket. That lead to successful read on peer side
    if this connection was filtered. Having constant testing string masked
    this issue as we can't distinguish to which test of connectivity data
    belong.

    This patch makes unique data string per test_connectivity() and
    also makes establish_connection() to create an ASSURED entry in
    conntrack table. Finally, in last test after firewall filter was
    removed, connection is re-established in order to avoid troubles with
    terminated processes or TCP continuing sending packets which weren't
    successfully delivered.

    Closes-Bug: 1478847
    Change-Id: I2920d587d8df8d96dc1c752c28f48ba495f3cf0f

commit e6292fcdd6262434a7b713ad8802db6bc8a6d3dc
Author: YAMAMOTO Takashi <email address hidden>
Date: Wed Sep 16 13:20:51 2015 +0900

    ovsdb: Fix a few docstring

    Change-Id: I53e1e21655b28fe5da60e58aeeb7cbbd103ae014

commit c22949a4449d96a67caa616290cf76b67b182917
Author: fumihiko kakuma <email address hidden>
Date: Wed Sep 16 11:52:59 2015 +0900

    Remove requirements.txt for the ofagent mechanism driver

    It is no longer used.

    Related-Blueprint: core-vendor-decomposition
    https://blueprints.launchpad.net/neutron/+spec/core-vendor-decomposition

    Change-Id: Ib31fb3febf8968e50d86dd66e1e6e1ea2313f8ac

commit d1d4de19d85f961d388c91e70f31b3bafec418c5
Author: Kevin Benton <email address hidden>
Date: Thu Sep 3 20:25:57 2015 -0700

    Always return iterables in L3 get_candidates

    The caller of this function expects iterables.

    Closes-Bug: #1494996
    Change-Id: I3d103e63f4e127a77268502415c0ddb0d804b54a

commit 1ad6ac448067306...

tags: added: in-feature-pecan
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (feature/pecan)

Change abandoned by Doug Wiegley (<email address hidden>) on branch: feature/pecan
Review: https://review.openstack.org/224334

Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: liberty-rc1 → 7.0.0