VM loses connectivity on floating ip association when using l3_ha

Bug #1511722 reported by Fernando
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
neutron   Fix Released   Medium       Stefan Nica

Bug Description

I'm not sure whether my issue is related to bug https://bugs.launchpad.net/neutron/+bug/1389880, is a new one, or is a misconfiguration, but I see the same symptoms.

If I create a new router with HA disabled ( # neutron router-create --ha=False router01), everything works fine.

When I create a new router without the HA flag (which makes it an HA router, because l3_ha = True in neutron.conf), the problem appears: if one instance already has a floating IP and I then assign a floating IP to another instance, I lose external connectivity to both instances. The number of instances doesn't matter; I lose external connectivity to all of them. Connectivity returns only after I connect to any one of them over VNC and ping an external (internet) IP; after that, everything works fine again.

Sorry, English is not my native language.

Ubuntu 14.04
Open vSwitch 2.3.2
Kilo 2015.1.1

root@network01:/home/administrator# cat /etc/neutron/neutron.conf | grep -v ^$ | grep -v ^#
[DEFAULT]
verbose = False
rpc_backend = rabbit
auth_strategy = keystone
core_plugin = ml2
service_plugins = router
allow_overlapping_ips = True
dhcp_agents_per_network = 2
l3_ha = True
max_l3_agents_per_router = 2
min_l3_agents_per_router = 2
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://10.8.11.120:5000
auth_url = http://10.8.11.120:35357
auth_plugin = password
project_domain_id = default
user_domain_id = default
project_name = service
username = neutron
password = secret
[database]
[nova]
[oslo_concurrency]
lock_path = $state_path/lock
[oslo_policy]
[oslo_messaging_amqp]
[oslo_messaging_qpid]
[oslo_messaging_rabbit]
rabbit_hosts = controller01:5672,controller02:5672
rabbit_userid = openstack
rabbit_password = secret
rabbit_retry_interval = 1
rabbit_retry_backoff = 2
rabbit_max_retries = 0
rabbit_durable_queues = True
rabbit_ha_queues = True

root@network01:/home/administrator# cat /etc/neutron/l3_agent.ini | grep -v ^$ | grep -v ^#
[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
external_network_bridge =
router_delete_namespaces = True

root@network01:/home/administrator# cat /etc/neutron/plugins/ml2/ml2_conf.ini | grep -v ^$ | grep -v ^#
[ml2]
type_drivers = flat,vlan,gre,vxlan
tenant_network_types = gre
mechanism_drivers = openvswitch
[ml2_type_flat]
flat_networks = external
[ml2_type_vlan]
[ml2_type_gre]
tunnel_id_ranges = 1:1000
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True
enable_ipset = True
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
[ovs]
local_ip = 192.168.0.101
bridge_mappings = external:br-ex
[agent]
tunnel_types = gre

root@compute01:/home/ubuntu# cat /etc/neutron/neutron.conf | grep -v ^$ | grep -v ^#
[DEFAULT]
verbose = True
rpc_backend = rabbit
auth_strategy = keystone
core_plugin = ml2
service_plugins = router
allow_overlapping_ips = True
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://10.8.11.120:5000
auth_url = http://10.8.11.120:35357
auth_plugin = password
project_domain_id = default
user_domain_id = default
project_name = service
username = neutron
password = secret
[database]
[nova]
[oslo_concurrency]
lock_path = $state_path/lock
[oslo_policy]
[oslo_messaging_amqp]
[oslo_messaging_qpid]
[oslo_messaging_rabbit]
rabbit_hosts = controller01:5672,controller02:5672
rabbit_userid = openstack
rabbit_password = secret
rabbit_retry_interval = 1
rabbit_retry_backoff = 2
rabbit_max_retries = 0
rabbit_durable_queues = True
rabbit_ha_queues = True

root@compute01:/home/ubuntu# cat /etc/neutron/plugins/ml2/ml2_conf.ini | grep -v ^$ | grep -v ^#
[ml2]
type_drivers = flat,vlan,gre,vxlan
tenant_network_types = gre
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
[ml2_type_gre]
tunnel_id_ranges = 1:1000
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True
enable_ipset = True
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
[ovs]
local_ip = 192.168.0.105
[agent]
tunnel_types = gre

tags: added: l3-ha
Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

This is different from #1389880; the underlying mechanisms are different.

It's actually an issue in keepalived: a configuration reload triggers a DNS request to resolve the host name.

See:
     https://bugzilla.redhat.com/show_bug.cgi?id=1181592

When your controller cannot reach the host's configured DNS server from inside the qrouter namespace, the master keepalived blocks waiting for the DNS response on the public interface, and another keepalived instance becomes master in the meantime.

A new parameter was added to the keepalived configuration, and we must now include it in neutron to avoid this issue.

A workaround is to add each of your hostnames to /etc/hosts to avoid the external resolution.
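
To check whether the workaround is in place, a small script can confirm that a hostname is listed in hosts-file text. This is only a sketch: the function name `hosts_file_resolves` and the sample entries are illustrative assumptions, not part of neutron or keepalived, and it only inspects the text it is given rather than performing a real lookup.

```python
def hosts_file_resolves(hosts_text: str, hostname: str) -> bool:
    """Return True if `hostname` is a name or alias in hosts-file text."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        fields = entry.split()
        # fields[0] is the address; the rest are the canonical name and aliases.
        if wanted in (name.lower() for name in fields[1:]):
            return True
    return False


# Illustrative sample only; the address/name pairing is an assumption.
SAMPLE_HOSTS = """\
127.0.0.1      localhost
192.168.0.101  network01 network01.example.local  # assumed entry
# 192.168.0.105 compute01   (commented out, so it does not count)
"""

print(hosts_file_resolves(SAMPLE_HOSTS, "network01"))
print(hosts_file_resolves(SAMPLE_HOSTS, "compute01"))
```

On a real node the same function could be fed the contents of /etc/hosts and the output of `socket.gethostname()`.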

Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

Now, if we provide router_id and notification_email_from in the keepalived config, the resolution and MASTER blocking should not happen.
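
As a rough illustration of what such a configuration fragment could look like, the sketch below renders a keepalived `global_defs` block with the `router_id` and `notification_email_from` keywords set explicitly. The helper name `render_global_defs` and the two values passed to it are placeholders chosen for this example; this is not presented as the code neutron uses.

```python
def render_global_defs(router_id: str, email_from: str) -> str:
    """Render a keepalived global_defs block with explicit identity values.

    Setting both values explicitly means keepalived does not need to
    derive defaults for them from the local hostname.
    """
    lines = [
        "global_defs {",
        f"    notification_email_from {email_from}",
        f"    router_id {router_id}",
        "}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values, assumed for illustration only.
print(render_global_defs("neutron", "neutron@openstack.local"))
```

Any fixed strings would do here; the point is only that neither value is left for keepalived to compute from a hostname lookup.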

Changed in neutron:
importance: Undecided → Medium
assignee: nobody → Miguel Angel Ajo (mangelajo)
Felipe Reyes (freyes)
tags: added: sts
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/302766

Changed in neutron:
assignee: Miguel Angel Ajo (mangelajo) → Felipe Reyes (freyes)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/302766
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Needs a new owner.

Changed in neutron:
status: In Progress → Incomplete
assignee: Felipe Reyes (freyes) → nobody
tags: added: low-hanging-fruit
removed: sts
Changed in neutron:
assignee: nobody → Stefan Nica (stefan.nica)
Changed in neutron:
status: Incomplete → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/343312

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/343312
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b3af52e7388423a5fd3872453512218b00a5c6d7
Submitter: Jenkins
Branch: master

commit b3af52e7388423a5fd3872453512218b00a5c6d7
Author: Stefan Nica <email address hidden>
Date: Sun Jul 17 16:36:08 2016 +0300

    Keepalived global_defs configuration entries required to avoid DNS lookup

    This changeset addresses a particular L3-HA Neutron deployment scenario
    in which the DNS server configured for the management network is not
    also accessible from the virtual router namespace (i.e. over the
    external network).
    Keepalived uses the hostname against getaddrinfo twice to set default
    values for the router_id and notification_email_from global configuration
    attributes. If the hostname cannot be resolved through /etc/hosts and
    if the nameserver is not reachable, long delays are incurred during
    keepalived startup and configuration reload, causing VRRP state flapping
    and dropped traffic over floating IPs.

    Setting router_id and notification_email_from in the keepalived
    configuration avoids unnecessary DNS lookups. However, this solution
    is only effective with keepalived >= 1.2.17. Older versions still
    exhibit the same problem with or without this patch.

    Closes-Bug: #1511722
    Change-Id: If6e31d164bd6ade52997bc0073ef50cdbc99ec93
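
Because the commit message says the fix is only effective with keepalived >= 1.2.17, an operator may want to check the installed version first. The following is a minimal sketch that parses a version string and compares it with that threshold; the function names and the sample strings are assumptions for illustration, and the exact text printed by a given keepalived build may differ.

```python
import re

# Threshold taken from the commit message above.
MIN_EFFECTIVE_VERSION = (1, 2, 17)


def parse_keepalived_version(text: str) -> tuple:
    """Extract the first dotted three-part version number found in `text`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def fix_is_effective(version_text: str) -> bool:
    """True if the version is new enough for the global_defs fix to help."""
    # Tuples compare element by element, so (1, 2, 9) < (1, 2, 17).
    return parse_keepalived_version(version_text) >= MIN_EFFECTIVE_VERSION


# Sample strings are assumed, not captured from a real system.
for sample in ("Keepalived v1.2.13 (example build)", "Keepalived v1.2.19 (example build)"):
    print(sample, "->", fix_is_effective(sample))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "1.2.9" as newer than "1.2.17".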

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/377730

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/newton)

Reviewed: https://review.openstack.org/377730
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=58180e6d90157c71ccb57cd95cc883f2d16e370c
Submitter: Jenkins
Branch: stable/newton

commit 58180e6d90157c71ccb57cd95cc883f2d16e370c
Author: Stefan Nica <email address hidden>
Date: Sun Jul 17 16:36:08 2016 +0300

    Keepalived global_defs configuration entries required to avoid DNS lookup

    This changeset addresses a particular L3-HA Neutron deployment scenario
    in which the DNS server configured for the management network is not
    also accessible from the virtual router namespace (i.e. over the
    external network).
    Keepalived uses the hostname against getaddrinfo twice to set default
    values for the router_id and notification_email_from global configuration
    attributes. If the hostname cannot be resolved through /etc/hosts and
    if the nameserver is not reachable, long delays are incurred during
    keepalived startup and configuration reload, causing VRRP state flapping
    and dropped traffic over floating IPs.

    Setting router_id and notification_email_from in the keepalived
    configuration avoids unnecessary DNS lookups. However, this solution
    is only effective with keepalived >= 1.2.17. Older versions still
    exhibit the same problem with or without this patch.

    Closes-Bug: #1511722
    Change-Id: If6e31d164bd6ade52997bc0073ef50cdbc99ec93
    (cherry picked from commit b3af52e7388423a5fd3872453512218b00a5c6d7)

tags: added: in-stable-newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 9.1.0

This issue was fixed in the openstack/neutron 9.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 10.0.0.0b1

This issue was fixed in the openstack/neutron 10.0.0.0b1 development milestone.
