VM loses connectivity on floating ip association when using DVR
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | High | Mike Smith |
Juno | Fix Released | Undecided | Unassigned |
Bug Description
Environment: Juno 2014.2-1 (RDO), Ubuntu 12.04
Open vSwitch version on Ubuntu: 2.0.2
Description:
Whenever a floating IP is created on a VM, the agent adds, on ALL other compute nodes, a /32 route for that floating IP in the FIP namespace and an IP alias on the qrouter interface.
However, iptables is updated correctly: only the compute node actually hosting the VM gets the DNAT rule for that particular IP.
As a result, the FIP namespace's proxy ARP answers ARP requests for ALL floating IPs on ALL compute nodes, so nodes that do not host
the VM answer ARPs and effectively blackhole traffic to that IP.
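A quick way to confirm this mismatch is to compare the /32 floating IP routes in the FIP namespace against the DNAT rules actually installed locally. The snippet below is a sketch, not verbatim agent output: the route lines are samples from this report, the DNAT line is illustrative, and on a live node both would come from `ip netns exec fip-<net-uuid> ip route` and `ip netns exec qrouter-<router-uuid> iptables -t nat -S`.

```shell
# Sample FIP-namespace routes (from this report) and an illustrative
# local DNAT rule; on a real node, capture these with:
#   ip netns exec fip-<net-uuid> ip route
#   ip netns exec qrouter-<router-uuid> iptables -t nat -S
routes='173.209.44.4 via 169.254.31.28 dev fpr-3a90aae6-3
173.209.44.7 via 169.254.31.28 dev fpr-3a90aae6-3'
dnat='-A neutron-l3-agent-PREROUTING -d 173.209.44.4/32 -j DNAT --to-destination 10.0.0.9'

# Any floating IP routed via the fpr- link but lacking a local DNAT rule
# is a stale entry that makes this node blackhole traffic for that IP.
stale=$(for fip in $(echo "$routes" | awk '/via 169\.254/ {print $1}'); do
  echo "$dnat" | grep -q "$fip/32" || echo "stale route: $fip"
done)
echo "$stale"
```

On the compute2 output shown below, this sketch would flag 173.209.44.7 as stale, since compute2 holds a route for it but not the VM.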
Here is a demonstration of the problem:
Before adding a VM with a floating IP on compute4:
[root@compute2 ~]# ip netns exec fip-616a6213-
default via 173.209.44.1 dev fg-6ede0596-3a
169.
173.209.44.0/24 dev fg-6ede0596-3a proto kernel scope link src 173.209.44.6
173.209.44.4 via 169.254.31.28 dev fpr-3a90aae6-3
[root@compute3 neutron]# ip netns exec fip-616a6213-
default via 173.209.44.1 dev fg-26bef858-6b
169.
173.209.44.0/24 dev fg-26bef858-6b proto kernel scope link src 173.209.44.5
173.209.44.3 via 169.254.31.238 dev fpr-3a90aae6-3
[root@compute4 ~]# ip netns exec fip-616a6213-
default via 173.209.44.1 dev fg-2919b6be-f4
173.209.44.0/24 dev fg-2919b6be-f4 proto kernel scope link src 173.209.44.8
After creating a new VM on compute4 and attaching a floating IP to it, we get the result below.
At this point, of course, only the VM on compute4 is able to ping the public network.
[root@compute2 ~]# ip netns exec fip-616a6213-
default via 173.209.44.1 dev fg-6ede0596-3a
169.
173.209.44.0/24 dev fg-6ede0596-3a proto kernel scope link src 173.209.44.6
173.209.44.4 via 169.254.31.28 dev fpr-3a90aae6-3
173.209.44.7 via 169.254.31.28 dev fpr-3a90aae6-3
[root@compute3 neutron]# ip netns exec fip-616a6213-
default via 173.209.44.1 dev fg-26bef858-6b
169.
173.209.44.0/24 dev fg-26bef858-6b proto kernel scope link src 173.209.44.5
173.209.44.3 via 169.254.31.238 dev fpr-3a90aae6-3
173.209.44.7 via 169.254.31.238 dev fpr-3a90aae6-3
[root@compute4 ~]# ip netns exec fip-616a6213-
default via 173.209.44.1 dev fg-2919b6be-f4
169.
173.209.44.0/24 dev fg-2919b6be-f4 proto kernel scope link src 173.209.44.8
173.209.44.3 via 169.254.30.20 dev fpr-3a90aae6-3
173.209.44.4 via 169.254.30.20 dev fpr-3a90aae6-3
173.209.44.7 via 169.254.30.20 dev fpr-3a90aae6-3
**When we deleted the extra FIP routes from each compute node's namespace, everything started to work just fine.**
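That manual cleanup can be sketched as below. The commands are only printed, not executed (deleting routes requires root and must target each node individually), and the namespace name is a placeholder, not the real one from this report.

```shell
# Placeholder namespace name; the real one is fip-<external-net-uuid>.
ns="fip-EXTERNAL-NET-UUID"

# Floating IPs whose VMs are NOT hosted on this node (per the report,
# 173.209.44.7 is stale on compute2). Print, do not run, the cleanup.
cmds=$(for fip in 173.209.44.7; do
  echo "ip netns exec $ns ip route del $fip"
done)
echo "$cmds"
```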
The router details, floating IP details, and config files follow:
+--
| Field | Value |
+--
| admin_state_up | True |
| distributed | True |
| external_
| ha | False |
| id | 3a90aae6-
| name | admin-router |
| routes | |
| status | ACTIVE |
| tenant_id | 132a58509228480
+--
[root@controller1 ~]# neutron floatingip-show 9919c836-
+--
| Field | Value |
| fixed_ip_address | 10.0.0.11 |
| floating_ip_address | 173.209.44.3 |
| floating_network_id | 616a6213-
| id | 9919c836-
| port_id | 8b875248-
| router_id | 3a90aae6-
| status | ACTIVE |
| tenant_id | 132a58509228480
[root@controller1 ~]# neutron floatingip-show ab73e133-
+--
| Field | Value |
+--
| fixed_ip_address | 10.0.0.9 |
| floating_ip_address | 173.209.44.4 |
| floating_network_id | 616a6213-
| id | ab73e133-
| port_id | 3273aa63-
| router_id | 3a90aae6-
| status | ACTIVE |
| tenant_id | 132a58509228480
+--
[root@controller1 ~]# neutron floatingip-show bf456993-
+--
| Field | Value |
+--
| fixed_ip_address | 10.0.0.12 |
| floating_ip_address | 173.209.44.7 |
| floating_network_id | 616a6213-
| id | bf456993-
| port_id | 7b3ec99d-
| router_id | 3a90aae6-
| status | ACTIVE |
| tenant_id | 132a58509228480
+--
[root@net1 neutron]# cat /etc/neutron/
[DEFAULT]
verbose = True
router_
debug = True
use_syslog = True
core_plugin = ml2
service_plugins = router,lbaas
auth_strategy = keystone
allow_
allow_
dhcp_
notify_
notify_
nova_url = http://
nova_
nova_
nova_
nova_
nova_
rabbit_port = 5672
rabbit_password = guest
rabbit_hosts = queue1:5672, queue2:5672
rabbit_userid = guest
rabbit_
rabbit_
rpc_
[matchmaker
[matchmaker
[quotas]
[agent]
[keystone_
auth_uri = http://
identity_uri = http://
admin_
admin_user = neutron
admin_password = secret
[database]
connection = mysql:/
[service_
service_
service_
[root@net1 neutron]# cat /etc/neutron/
[DEFAULT]
interface_
use_namespaces = True
external_
verbose=True
agent_mode = dvr_snat
[root@compute1 neutron]# cat /etc/neutron/
[DEFAULT]
verbose = True
router_
debug = True
use_syslog = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
base_mac = fa:16:3e:01:00:00
dvr_base_mac = fa:16:3f:01:00:00
allow_
rabbit_port = 5672
rabbit_password = guest
rabbit_hosts = queue1:5672, queue2:5672
rabbit_userid = guest
rabbit_
rabbit_
rpc_
[matchmaker
[matchmaker
[quotas]
[agent]
[keystone_
auth_uri = http://
identity_uri = http://
admin_
admin_user = neutron
admin_password = secret
[database]
[service_
service_
service_
[root@compute1 neutron]# cat /etc/neutron/
[DEFAULT]
interface_
use_namespaces = True
external_
verbose=True
agent_mode = dvr
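For reference, the two l3_agent.ini files above differ in `agent_mode`: a DVR deployment runs `dvr_snat` on the network node and `dvr` on compute nodes. A minimal sketch (all other settings omitted):

```ini
[DEFAULT]
# Network node: distributed routing plus centralized SNAT.
agent_mode = dvr_snat

# Compute node: distributed routing only.
# agent_mode = dvr
```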
[root@net1 neutron]# cat /etc/neutron/
[ml2]
type_drivers = vxlan,vlan,flat
tenant_
mechanism_
[ml2_type_flat]
flat_networks = public
[ml2_type_vlan]
[ml2_type_gre]
[ml2_
vni_ranges = 10000:100000
[securitygroup]
enable_
enable_ipset = True
firewall_driver = neutron.
[agent]
l2_
polling_
arp_
tunnel_
enable_
[ovs]
enable_
integration
local_
tunnel_
bridge_
tags: added: l3-ipam-dhcp; removed: floating-ip, neutron
Changed in neutron: importance: Undecided → High
tags: added: l3-dvr-backlog; removed: dvr
tags: added: juno-backport-potential
Changed in neutron: milestone: none → kilo-1; status: Fix Committed → Fix Released
Changed in neutron: milestone: kilo-1 → 2015.1.0
I can reproduce this and have a fix; I'll post a patch next. I believe this snuck in as a regression from some early refactoring (SHA e5ca28e3).