unable to ping VM after floating ip re-association when using DVR

Bug #1353287 reported by Armando Migliaccio
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Mike Smith

Bug Description

Seen on master.

To reproduce the issue follow these steps:

- Start from the usual devstack setup (with DVR on)
- Boot a VM
- Give it a floating IP
- Verify that ping/ssh succeeds
- Disassociate the floating IP
- Verify that ping/ssh fails
- Associate the floating IP with the VM again
- Verify that ping/ssh succeeds

Observed behavior:

Ping failure (Destination Host Unreachable)

Expected behavior:

Ping success

The steps above are what the following scenario test does (the subtle difference is that the test spawns a new server and moves the FIP over to it):

test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops
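
The reproduction steps above can be sketched as a small script. This is only an illustration, not what the scenario test does internally: it assumes the unified `openstack` command-line client is available (the report does not say which client was used), a Linux `ping`, and hypothetical server and floating IP values supplied by the caller.

```python
import subprocess
from typing import Callable, List, Tuple

# Each step: (description, command, expected reachability after the step).
Step = Tuple[str, List[str], bool]


def fip_reassociation_steps(server: str, fip: str) -> List[Step]:
    """Return the associate / disassociate / re-associate sequence."""
    add = ["openstack", "server", "add", "floating", "ip", server, fip]
    remove = ["openstack", "server", "remove", "floating", "ip", server, fip]
    return [
        ("associate floating IP", add, True),
        ("disassociate floating IP", remove, False),
        # This is the step reported as failing when DVR is enabled.
        ("re-associate floating IP", add, True),
    ]


def ping(host: str) -> bool:
    """Send one ICMP echo request (Linux ping flags); True if it was answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_steps(
    steps: List[Step],
    fip: str,
    run: Callable[[List[str]], object] = subprocess.check_call,
    probe: Callable[[str], bool] = ping,
) -> List[str]:
    """Run every step; return the steps whose reachability was not as expected."""
    failures = []
    for description, command, expected in steps:
        run(command)
        if probe(fip) != expected:
            failures.append(description)
    return failures
```

With this sketch, the behavior described in this report would show up as `run_steps(...)` returning `["re-associate floating IP"]`. A real run would also need to wait between changing the association and probing, since the change is not visible instantly.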

description: updated
summary: - unable to ping VM after floating ip re-association
+ unable to ping VM after floating ip re-association when using DVR
description: updated
Changed in neutron:
importance: Undecided → High
Changed in neutron:
assignee: nobody → Mike Smith (michael-smith6)
Mike Smith (michael-smith6) wrote :

I tried to reproduce the issue and could not, but then I realized I did not have any nodes configured as dvr_snat. I will re-test.

Mike Smith (michael-smith6) wrote :

I was able to reproduce with agent_mode=dvr_snat. It might be a problem related to sharing the SNAT namespace with the FIP.
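
For readers checking their own nodes: `agent_mode` is an L3 agent option, normally set in the `[DEFAULT]` section of `l3_agent.ini`. The sketch below reads it with Python's `configparser`. This is an approximation, since Neutron itself parses its configuration with oslo.config; the sample text and the function name are made up for illustration.

```python
import configparser

# Hypothetical excerpt of an l3_agent.ini on a node that hosts centralized SNAT.
SAMPLE_L3_AGENT_INI = """
[DEFAULT]
agent_mode = dvr_snat
"""


def agent_mode(ini_text: str) -> str:
    """Return the configured agent_mode, or 'legacy' when it is not set."""
    # Interpolation is disabled so that '%' in other option values is harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get("DEFAULT", "agent_mode", fallback="legacy")


if __name__ == "__main__":
    print(agent_mode(SAMPLE_L3_AGENT_INI))
```

A node in `dvr` mode only would not exercise the SNAT path, which matches the earlier failed attempt to reproduce.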

Changed in neutron:
status: New → Confirmed
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/112427

Changed in neutron:
status: Confirmed → In Progress
Mike Smith (michael-smith6) wrote :
description: updated
Armando Migliaccio (armando-migliaccio) wrote :

I did some further testing with your patch and this issue does seem to be load related.

What I found out is that if I run the test 'test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops' on its own, it appears to pass (regardless of whether I have delete_router_namespace set to True or False). So this seems to be an improvement from when I first experienced this issue. However, when I run it in combination with other tests, that's when things go astray.

From the failures I noticed that FIPs are not recycled, and I wonder if this may be related to the problem.

That said I hope this may help you in your investigation. I'll keep digging.

Mike Smith (michael-smith6) wrote :

With the latest patch sets, the original problem of FIP re-association in 'test_network_basic_ops' seems to be resolved, but other failures are seen in the experimental test suite. For example, the following are failing:

test_security_groups_basic_ops.TestSecurityGroupsBasicOps
network.test_floating_ips.FloatingIPTestXML
scenario.test_baremetal_basic_ops
compute.servers.test_create_server

Brian Haley (brian-haley) wrote :

I am also going through the Tempest failures, and one thing I've noticed with 'test_network_basic_ops' is that it takes up to 40 seconds for the FIP to initially become reachable. I will keep looking into that as well, just didn't know if others had seen the same thing.
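
One way to put a number on that delay is to poll until the address answers and record the elapsed time. The helper below is a generic sketch, not code from Tempest; the function name, the default timeout and interval, and the `probe` callable (anything that returns True once the FIP responds) are all assumptions.

```python
import time
from typing import Callable, Optional


def wait_until_reachable(
    probe: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll probe() until it returns True.

    Returns the seconds elapsed before the first success, or None if the
    timeout expired without a successful probe.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

The clock and sleep functions are parameters only so the helper can be exercised without real waiting; in normal use the defaults are enough.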

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/112427
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7205ea5858d5b23662e546885412832536688d51
Submitter: Jenkins
Branch: master

commit 7205ea5858d5b23662e546885412832536688d51
Author: Michael Smith <email address hidden>
Date: Wed Aug 6 15:02:35 2014 -0700

    Fixes an issue with FIP re-association

    When the last FIP is disassociated, the namespace and
    interfaces should be removed. The internal interface
    wasn't removed before without problems, but now the
    namespace cannot be removed with that interface present.
    The fix is to remove the internal FIP interface before
    removing the namespace.

    Change-Id: I021c658ecde584821f67b7a8de0205e8e938bb2d
    Closes-bug: 1353287
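
The ordering described in the commit message can be illustrated with plain `ip` commands. This is a sketch only: the actual fix lives in Neutron's L3 agent code and is not reproduced here, and the namespace and device names passed in are hypothetical placeholders.

```python
from typing import List


def fip_namespace_teardown_commands(
    namespace: str, internal_device: str
) -> List[List[str]]:
    """Return the teardown commands in the order the fix requires.

    The internal FIP interface is deleted first; only then is the
    namespace itself deleted.
    """
    return [
        ["ip", "netns", "exec", namespace, "ip", "link", "delete", internal_device],
        ["ip", "netns", "delete", namespace],
    ]


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    for command in fip_namespace_teardown_commands("fip-example", "fpr-example"):
        print(" ".join(command))
```

Running the two commands in the opposite order corresponds to the failure mode the commit message describes, where the namespace cannot be removed while the interface is still present.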

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → juno-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: juno-3 → 2014.2