VM loses connectivity on floating IP association when using DVR

Bug #1389880 reported by Daniel Gauthier
This bug affects 4 people
Affects         Status        Importance  Assigned to  Milestone
neutron         Fix Released  High        Mike Smith
neutron (Juno)  Fix Released  Undecided   Unassigned

Bug Description

Environment: Juno 2014.2-1 (RDO), Ubuntu 12.04
Open vSwitch version on Ubuntu is 2.0.2

Description:

Whenever a floating IP is associated with a VM, the FIP is propagated to ALL other compute nodes as well: a /32 route is installed in each node's FIP namespace and an IP alias is added on each qrouter interface.
However, iptables is updated correctly, with a DNAT rule only for the fixed IP of the VM actually hosted on that compute node.
This causes proxy ARP in the FIP namespace to answer ARP requests for ALL floating IPs on ALL compute nodes, so nodes that do not host
the VM answer ARPs for it, effectively blackholing traffic to that IP.
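
The stale entries can be spotted by comparing the FIP routes in each node's FIP namespace against the DNAT rules in its qrouter namespace. The following is a diagnostic sketch; the namespace names come from this deployment (qrouter-<router_id>), and the fg- interface name is the one from compute2:

    # Per-FIP /32 routes this node's FIP namespace will proxy-ARP for:
    ip netns exec fip-616a6213-c339-4164-9dff-344ae9e04929 ip route show | grep 'via 169.254'

    # DNAT rules actually installed, i.e. the VMs this node really hosts:
    ip netns exec qrouter-3a90aae6-3107-49e4-a190-92ed37a43b1a iptables -t nat -S | grep DNAT

    # proxy_arp is enabled on the fg- device, so every extra /32 above is
    # answered on behalf of a VM this node does not host:
    ip netns exec fip-616a6213-c339-4164-9dff-344ae9e04929 sysctl net.ipv4.conf.fg-6ede0596-3a.proxy_arp

Any /32 route without a matching local DNAT rule is one of the stale entries described above.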

Here is a demonstration of the problem:

Before adding a VM + FIP on compute4:

    [root@compute2 ~]# ip netns exec fip-616a6213-c339-4164-9dff-344ae9e04929 ip route show
    default via 173.209.44.1 dev fg-6ede0596-3a
    169.254.31.28/31 dev fpr-3a90aae6-3 proto kernel scope link src 169.254.31.29
    173.209.44.0/24 dev fg-6ede0596-3a proto kernel scope link src 173.209.44.6
    173.209.44.4 via 169.254.31.28 dev fpr-3a90aae6-3

    [root@compute3 neutron]# ip netns exec fip-616a6213-c339-4164-9dff-344ae9e04929 ip route show
    default via 173.209.44.1 dev fg-26bef858-6b
    169.254.31.238/31 dev fpr-3a90aae6-3 proto kernel scope link src 169.254.31.239
    173.209.44.0/24 dev fg-26bef858-6b proto kernel scope link src 173.209.44.5
    173.209.44.3 via 169.254.31.238 dev fpr-3a90aae6-3

    [root@compute4 ~]# ip netns exec fip-616a6213-c339-4164-9dff-344ae9e04929 ip route show
    default via 173.209.44.1 dev fg-2919b6be-f4
    173.209.44.0/24 dev fg-2919b6be-f4 proto kernel scope link src 173.209.44.8

After creating a new VM on compute4 and attaching a floating IP to it, we get the following result.
Of course, at this point only the VM on compute4 is able to ping the public network:

    [root@compute2 ~]# ip netns exec fip-616a6213-c339-4164-9dff-344ae9e04929 ip route show
    default via 173.209.44.1 dev fg-6ede0596-3a
    169.254.31.28/31 dev fpr-3a90aae6-3 proto kernel scope link src 169.254.31.29
    173.209.44.0/24 dev fg-6ede0596-3a proto kernel scope link src 173.209.44.6
    173.209.44.4 via 169.254.31.28 dev fpr-3a90aae6-3
    173.209.44.7 via 169.254.31.28 dev fpr-3a90aae6-3

    [root@compute3 neutron]# ip netns exec fip-616a6213-c339-4164-9dff-344ae9e04929 ip route show
    default via 173.209.44.1 dev fg-26bef858-6b
    169.254.31.238/31 dev fpr-3a90aae6-3 proto kernel scope link src 169.254.31.239
    173.209.44.0/24 dev fg-26bef858-6b proto kernel scope link src 173.209.44.5
    173.209.44.3 via 169.254.31.238 dev fpr-3a90aae6-3
    173.209.44.7 via 169.254.31.238 dev fpr-3a90aae6-3

    [root@compute4 ~]# ip netns exec fip-616a6213-c339-4164-9dff-344ae9e04929 ip route show
    default via 173.209.44.1 dev fg-2919b6be-f4
    169.254.30.20/31 dev fpr-3a90aae6-3 proto kernel scope link src 169.254.30.21
    173.209.44.0/24 dev fg-2919b6be-f4 proto kernel scope link src 173.209.44.8
    173.209.44.3 via 169.254.30.20 dev fpr-3a90aae6-3
    173.209.44.4 via 169.254.30.20 dev fpr-3a90aae6-3
    173.209.44.7 via 169.254.30.20 dev fpr-3a90aae6-3

**When we deleted the extra FIP routes from each compute node's FIP namespace, everything started to work just fine.**
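
As a stopgap, the stale /32 routes can be deleted by hand. A sketch of the manual workaround (substitute the namespace, FIP, and fpr- device names for your environment; note that the l3-agent may re-add the route on its next full sync):

    # e.g. on compute2, which does not host the VM behind 173.209.44.7:
    ip netns exec fip-616a6213-c339-4164-9dff-344ae9e04929 \
        ip route del 173.209.44.7 via 169.254.31.28 dev fpr-3a90aae6-3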

Following are the router and floating IP details and the config files:

    +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Field | Value |
    +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | admin_state_up | True |
    | distributed | True |
    | external_gateway_info | {"network_id": "616a6213-c339-4164-9dff-344ae9e04929", "enable_snat": true, "external_fixed_ips": [{"subnet_id": "0077e2d5-3c3d-4cd2-b55c-ee380fba7867", "ip_address": "173.209.44.2"}]} |
    | ha | False |
    | id | 3a90aae6-3107-49e4-a190-92ed37a43b1a |
    | name | admin-router |
    | routes | |
    | status | ACTIVE |
    | tenant_id | 132a585092284807a115f61cd1e3f688 |
    +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

[root@controller1 ~]# neutron floatingip-show 9919c836-532b-44d8-ba9e-8600c59ec1ec

    +---------------------+--------------------------------------+
    | Field | Value |
    +---------------------+--------------------------------------+
    | fixed_ip_address | 10.0.0.11 |
    | floating_ip_address | 173.209.44.3 |
    | floating_network_id | 616a6213-c339-4164-9dff-344ae9e04929 |
    | id | 9919c836-532b-44d8-ba9e-8600c59ec1ec |
    | port_id | 8b875248-0149-4e4f-805e-361b060ac1e4 |
    | router_id | 3a90aae6-3107-49e4-a190-92ed37a43b1a |
    | status | ACTIVE |
    | tenant_id | 132a585092284807a115f61cd1e3f688 |
    +---------------------+--------------------------------------+

[root@controller1 ~]# neutron floatingip-show ab73e133-ae75-4aea-9b5e-a4152bd922e2

    +---------------------+--------------------------------------+
    | Field | Value |
    +---------------------+--------------------------------------+
    | fixed_ip_address | 10.0.0.9 |
    | floating_ip_address | 173.209.44.4 |
    | floating_network_id | 616a6213-c339-4164-9dff-344ae9e04929 |
    | id | ab73e133-ae75-4aea-9b5e-a4152bd922e2 |
    | port_id | 3273aa63-4928-4880-86f7-634139772e36 |
    | router_id | 3a90aae6-3107-49e4-a190-92ed37a43b1a |
    | status | ACTIVE |
    | tenant_id | 132a585092284807a115f61cd1e3f688 |
    +---------------------+--------------------------------------+

[root@controller1 ~]# neutron floatingip-show bf456993-d20a-48b5-b62d-a1e397acfd1d

    +---------------------+--------------------------------------+
    | Field | Value |
    +---------------------+--------------------------------------+
    | fixed_ip_address | 10.0.0.12 |
    | floating_ip_address | 173.209.44.7 |
    | floating_network_id | 616a6213-c339-4164-9dff-344ae9e04929 |
    | id | bf456993-d20a-48b5-b62d-a1e397acfd1d |
    | port_id | 7b3ec99d-6a21-4446-b305-83a7d9bb6534 |
    | router_id | 3a90aae6-3107-49e4-a190-92ed37a43b1a |
    | status | ACTIVE |
    | tenant_id | 132a585092284807a115f61cd1e3f688 |
    +---------------------+--------------------------------------+

    [root@net1 neutron]# cat /etc/neutron/neutron.conf | grep -v ^$ | grep -v ^#
    [DEFAULT]
    verbose = True
    router_distributed = True
    debug = True
    use_syslog = True
    core_plugin = ml2
    service_plugins = router,lbaas
    auth_strategy = keystone
    allow_overlapping_ips = True
    allow_automatic_l3agent_failover = True
    dhcp_agents_per_network = 2
    notify_nova_on_port_status_changes = True
    notify_nova_on_port_data_changes = True
    nova_url = http://nova:8774/v2
    nova_admin_auth_url = http://keystone:35357/v2.0
    nova_region_name = regionOne
    nova_admin_username = nova
    nova_admin_tenant_id = d7e8412b252247eea6474fdad45442c6
    nova_admin_password = secret
    rabbit_port = 5672
    rabbit_password = guest
    rabbit_hosts = queue1:5672, queue2:5672
    rabbit_userid = guest
    rabbit_virtual_host = /
    rabbit_ha_queues = True
    rpc_backend=rabbit
    [matchmaker_redis]
    [matchmaker_ring]
    [quotas]
    [agent]
    [keystone_authtoken]
    auth_uri = http://keystone:5000/v2.0
    identity_uri = http://keystone:35357
    admin_tenant_name = service
    admin_user = neutron
    admin_password = secret
    [database]
    connection = mysql://neutron:secret@db/neutron
    [service_providers]
    service_provider=LOADBALANCER:Haproxy:neutron.services.loadbalancer.drivers.haproxy.plugin_driver.HaproxyOnHostPluginDriver:default
    service_provider=VPN:openswan:neutron.services.vpn.service_drivers.ipsec.IPsecVPNDriver:default

    [root@net1 neutron]# cat /etc/neutron/l3_agent.ini | grep -v ^$ | grep -v ^#
    [DEFAULT]
    interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
    use_namespaces = True
    external_network_bridge = public
    verbose=True
    agent_mode = dvr_snat

    [root@compute1 neutron]# cat /etc/neutron/neutron.conf | grep -v ^$ | grep -v ^#
    [DEFAULT]
    verbose = True
    router_distributed = True
    debug = True
    use_syslog = True
    core_plugin = ml2
    service_plugins = router
    auth_strategy = keystone
    base_mac = fa:16:3e:01:00:00
    dvr_base_mac = fa:16:3f:01:00:00
    allow_overlapping_ips = True
    rabbit_port = 5672
    rabbit_password = guest
    rabbit_hosts = queue1:5672, queue2:5672
    rabbit_userid = guest
    rabbit_virtual_host = /
    rabbit_ha_queues = True
    rpc_backend=rabbit
    [matchmaker_redis]
    [matchmaker_ring]
    [quotas]
    [agent]
    [keystone_authtoken]
    auth_uri = http://keystone:5000/v2.0
    identity_uri = http://keystone:35357
    admin_tenant_name = service
    admin_user = neutron
    admin_password = secret
    [database]
    [service_providers]
    service_provider=LOADBALANCER:Haproxy:neutron.services.loadbalancer.drivers.haproxy.plugin_driver.HaproxyOnHostPluginDriver:default
    service_provider=VPN:openswan:neutron.services.vpn.service_drivers.ipsec.IPsecVPNDriver:default

    [root@compute1 neutron]# cat /etc/neutron/l3_agent.ini | grep -v ^$ | grep -v ^#
    [DEFAULT]
    interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
    use_namespaces = True
    external_network_bridge = public
    verbose=True
    agent_mode = dvr

    [root@net1 neutron]# cat /etc/neutron/plugins/ml2/ml2_conf.ini | grep -v ^$ | grep -v ^#
    [ml2]
    type_drivers = vxlan,vlan,flat
    tenant_network_types = vxlan
    mechanism_drivers = openvswitch,l2population
    [ml2_type_flat]
    flat_networks = public
    [ml2_type_vlan]
    [ml2_type_gre]
    [ml2_type_vxlan]
    vni_ranges = 10000:100000
    [securitygroup]
    enable_security_group = True
    enable_ipset = True
    firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
    [agent]
    l2_population=True
    polling_interval=2
    arp_responder=True
    tunnel_types=vxlan
    enable_distributed_routing = True
    [ovs]
    enable_tunneling=True
    integration_bridge=br-int
    local_ip=10.60.0.3
    tunnel_bridge=br-tun
    bridge_mappings=public:public

tags: added: l3-ipam-dhcp
removed: floating-ip neutron
Changed in neutron:
importance: Undecided → High
tags: added: l3-dvr-backlog
removed: dvr
Revision history for this message
Mike Smith (michael-smith6) wrote :

I can reproduce and have a fix. I'll post a patch next. I believe this snuck in as a regression from some early refactoring (SHA e5ca28e3).

Changed in neutron:
assignee: nobody → Mike Smith (michael-smith6)
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/133580

Changed in neutron:
status: Confirmed → In Progress
tags: added: juno-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/133580
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0a21b909baa11e4655852c27ab282c32e0aa7a94
Submitter: Jenkins
Branch: master

commit 0a21b909baa11e4655852c27ab282c32e0aa7a94
Author: Michael Smith <email address hidden>
Date: Mon Nov 10 15:49:14 2014 -0800

    Fix for FIPs duplicated across hosts for DVR

    For DVR, FIPs should be hosted on the single node
    which hosts the VM assigned with the fixed_ip of the FIP.
    The l3_agent should only take action on the correct FIP per
    host by filtering the FIPs based on the 'host' value
    of the FIP.

    A recent refactor on the l3_agent moved the host filtering logic
    from process_router_floating_ip_addresses() to
    _get_external_device_interface_name(). The local floating_ips var
    was not altered as it was before the refactor.

    This resulted in network disruption across multiple hosts
    since more than one namespace contained the FIP. This problem
    would only be seen in a multi-host environment where the same
    router hosting FIPs was present on more than one node.

    The fix is to return the host filtering logic by adding a
    call to get_floating_ips(). In addition, the unit test
    test_process_router_dist_floating_ip_add() was modified to
    pass two FIPs instead of one. One FIP matches the host
    of the agent, one does not. Only one should be processed,
    not two.

    Change-Id: I67b19f6228af392519fff89b13283b43921552bf
    Closes-bug: #1389880
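
With the fix applied, a quick sanity check (a sketch reusing the namespace and addresses from this report) is that each compute node's FIP namespace carries a /32 route only for floating IPs whose VMs are local:

    # On compute4, which hosts only the VM behind 173.209.44.7, expect
    # exactly one FIP route once the l3-agent has resynced:
    ip netns exec fip-616a6213-c339-4164-9dff-344ae9e04929 \
        ip route show | grep 'via 169.254'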

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-1
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/145283

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/juno)

Reviewed: https://review.openstack.org/145283
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f979dd692cd3c3fc5c6b0c780ed6eae1fd22664d
Submitter: Jenkins
Branch: stable/juno

commit f979dd692cd3c3fc5c6b0c780ed6eae1fd22664d
Author: Michael Smith <email address hidden>
Date: Mon Nov 10 15:49:14 2014 -0800

    Fix for FIPs duplicated across hosts for DVR

    For DVR, FIPs should be hosted on the single node
    which hosts the VM assigned with the fixed_ip of the FIP.
    The l3_agent should only take action on the correct FIP per
    host by filtering the FIPs based on the 'host' value
    of the FIP.

    A recent refactor on the l3_agent moved the host filtering logic
    from process_router_floating_ip_addresses() to
    _get_external_device_interface_name(). The local floating_ips var
    was not altered as it was before the refactor.

    This resulted in network disruption across multiple hosts
    since more than one namespace contained the FIP. This problem
    would only be seen in a multi-host environment where the same
    router hosting FIPs was present on more than one node.

    The fix is to return the host filtering logic by adding a
    call to get_floating_ips(). In addition, the unit test
    test_process_router_dist_floating_ip_add() was modified to
    pass two FIPs instead of one. One FIP matches the host
    of the agent, one does not. Only one should be processed,
    not two.

    cherry-picked from 0a21b909baa11e4655852c27ab282c32e0aa7a94
    Change-Id: I67b19f6228af392519fff89b13283b43921552bf
    Closes-bug: #1389880

tags: added: in-stable-juno
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-1 → 2015.1.0
Revision history for this message
Fernando (fernandom-imedio) wrote :

I'm not sure if my issue is related to this bug, is a new one, or is just a misconfiguration, but I have the same symptoms.

If I create a new router in HA mode (# neutron router-create --ha=True router01), everything works fine.

When I create a new router without the HA flag, if I have an instance with one floating IP and then assign a floating IP to another instance, I lose external connectivity to both instances (no matter the number of instances, I lose external connectivity to all of them) until I connect to one of them via VNC and ping an external/Internet IP, after which everything works fine again.

Sorry, English is not my native language.

Ubuntu 14.04
Open vSwitch 2.3.2
Kilo 2015.1.1

root@network01:/home/administrator# cat /etc/neutron/neutron.conf | grep -v ^$ | grep -v ^#
[DEFAULT]
verbose = False
rpc_backend = rabbit
auth_strategy = keystone
core_plugin = ml2
service_plugins = router
allow_overlapping_ips = True
dhcp_agents_per_network = 2
l3_ha = True
max_l3_agents_per_router = 2
min_l2_agents_per_router = 2
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://10.8.11.120:5000
auth_url = http://10.8.11.120:35357
auth_plugin = password
project_domain_id = default
user_domain_id = default
project_name = service
username = neutron
password = Manolo007
[database]
[nova]
[oslo_concurrency]
lock_path = $state_path/lock
[oslo_policy]
[oslo_messaging_amqp]
[oslo_messaging_qpid]
[oslo_messaging_rabbit]
rabbit_hosts = controller01:5672,controller02:5672
rabbit_userid = openstack
rabbit_password = Manolo007
rabbit_retry_interval = 1
rabbit_retry_backoff = 2
rabbit_max_retries = 0
rabbit_durable_queues = True
rabbit_ha_queues = True

root@network01:/home/administrator# cat /etc/neutron/l3_agent.ini | grep -v ^$ | grep -v ^#
[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
external_network_bridge =
router_delete_namespaces = True

root@network01:/home/administrator# cat /etc/neutron/plugins/ml2/ml2_conf.ini | grep -v ^$ | grep -v ^#
[ml2]
type_drivers = flat,vlan,gre,vxlan
tenant_network_types = gre
mechanism_drivers = openvswitch
[ml2_type_flat]
flat_networks = external
[ml2_type_vlan]
[ml2_type_gre]
tunnel_id_ranges = 1:1000
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True
enable_ipset = True
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
[ovs]
local_ip = 192.168.0.101
bridge_mappings = external:br-ex
[agent]
tunnel_types = gre

root@compute01:/home/ubuntu# cat /etc/neutron/neutron.conf | grep -v ^$ | grep -v ^#
[DEFAULT]
verbose = True
rpc_backend = rabbit
auth_strategy = keystone
core_plugin = ml2
service_plugins = router
allow_overlapping_ips = True
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://10.8.11.120:5000
auth_url = http://10.8.11.120:35357
auth_plugin = password
project_domain_id = default
user_domain_id = default
project_name = service
username = neutron
password = Manolo007
[database]
[nova]
[oslo_concurrency]
lock_pat...

