A race condition between adding and deleting the FIP namespace is seen in the DVR router log. It might cause the FIP functionality to fail.
From the trace log, it seems that when this happens, a number of tests related to FIP functionality fail with an SSH timeout while waiting for a reply.
Here is the trace output that shows the race condition.
Exit code: 0
execute /opt/stack/new/neutron/neutron/agent/linux/utils.py:156
2015-09-29 21:10:33.433 7884 DEBUG neutron.agent.l3.dvr_local_router [-] Removed last floatingip, so requesting the server to delete Floatingip Agent Gateway port:{u'allowed_address_pairs': [], u'extra_dhcp_opts': [], u'device_owner': u'network:floatingip_agent_gateway', u'port_security_enabled': False, u'binding:profile': {}, u'fixed_ips': [{u'subnet_id': u'362e9033-db93-4193-9413-1073215ab326', u'prefixlen': 24, u'ip_address': u'172.24.5.9'}, {u'subnet_id': u'feb3aa76-53b1-4d4e-b136-412c747ffd30', u'prefixlen': 64, u'ip_address': u'2001:db8::a'}], u'id': u'044a8e2f-00eb-4231-b526-13cb46dcc42f', u'security_groups': [], u'binding:vif_details': {u'port_filter': True, u'ovs_hybrid_plug': True}, u'binding:vif_type': u'ovs', u'mac_address': u'fa:16:3e:7a:a6:85', u'status': u'DOWN', u'subnets': [{u'ipv6_ra_mode': None, u'cidr': u'2001:db8::/64', u'gateway_ip': u'2001:db8::2', u'id': u'feb3aa76-53b1-4d4e-b136-412c747ffd30', u'subnetpool_id': None}, {u'ipv6_ra_mode': None, u'cidr': u'172.24.5.0/24', u'gateway_ip': u'172.24.5.1', u'id': u'362e9033-db93-4193-9413-1073215ab326', u'subnetpool_id': None}], u'binding:host_id': u'devstack-trusty-hpcloud-b5-5153724', u'dns_assignment': [{u'hostname': u'host-172-24-5-9', u'ip_address': u'172.24.5.9', u'fqdn': u'host-172-24-5-9.openstacklocal.'}, {u'hostname': u'host-2001-db8--a', u'ip_address': u'2001:db8::a', u'fqdn': u'host-2001-db8--a.openstacklocal.'}], u'device_id': u'646bb18b-da52-4ead-a635-012c72c1ccf1', u'name': u'', u'admin_state_up': True, u'network_id': u'31689320-95d7-44f9-932a-cc82c1bca2b4', u'dns_name': u'', u'binding:vnic_type': u'normal', u'tenant_id': u'', u'extra_subnets': []} floating_ip_removed_dist /opt/stack/new/neutron/neutron/agent/l3/dvr_local_router.py:148
2015-09-29 21:10:34.031 7884 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'delete', 'fip-31689320-95d7-44f9-932a-cc82c1bca2b4'] execute_rootwrap_daemon /opt/stack/new/neutron/neutron/agent/linux/utils.py:101
2015-09-29 21:10:34.043 DEBUG neutron.agent.l3.dvr_local_router [req-33413b07-784c-469e-8a35-0e20312a157e None None] FloatingIP agent gateway port received from the plugin: {u'allowed_address_pairs': [], u'extra_dhcp_opts': [], u'device_owner': u'network:floatingip_agent_gateway', u'port_security_enabled': False, u'binding:profile': {}, u'fixed_ips': [{u'subnet_id': u'362e9033-db93-4193-9413-1073215ab326', u'prefixlen': 24, u'ip_address': u'172.24.5.9'}, {u'subnet_id': u'feb3aa76-53b1-4d4e-b136-412c747ffd30', u'prefixlen': 64, u'ip_address': u'2001:db8::a'}], u'id': u'044a8e2f-00eb-4231-b526-13cb46dcc42f', u'security_groups': [], u'binding:vif_details': {u'port_filter': True, u'ovs_hybrid_plug': True}, u'binding:vif_type': u'ovs', u'mac_address': u'fa:16:3e:7a:a6:85', u'status': u'ACTIVE', u'subnets': [{u'ipv6_ra_mode': None, u'cidr': u'172.24.5.0/24', u'gateway_ip': u'172.24.5.1', u'id': u'362e9033-db93-4193-9413-1073215ab326', u'subnetpool_id': None}, {u'ipv6_ra_mode': None, u'cidr': u'2001:db8::/64', u'gateway_ip': u'2001:db8::2', u'id': u'feb3aa76-53b1-4d4e-b136-412c747ffd30', u'subnetpool_id': None}], u'binding:host_id': u'devstack-trusty-hpcloud-b5-5153724', u'dns_assignment': [{u'hostname': u'host-172-24-5-9', u'ip_address': u'172.24.5.9', u'fqdn': u'host-172-24-5-9.openstacklocal.'}, {u'hostname': u'host-2001-db8--a', u'ip_address': u'2001:db8::a', u'fqdn': u'host-2001-db8--a.openstacklocal.'}], u'device_id': u'646bb18b-da52-4ead-a635-012c72c1ccf1', u'name': u'', u'admin_state_up': True, u'network_id': u'31689320-95d7-44f9-932a-cc82c1bca2b4', u'dns_name': u'', u'binding:vnic_type': u'normal', u'tenant_id': u'', u'extra_subnets': []} create_dvr_fip_interfaces /opt/stack/new/neutron/neutron/agent/l3/dvr_local_router.py:427
2015-09-29 21:10:34.043 DEBUG neutron.agent.l3.dvr_fip_ns [req-33413b07-784c-469e-8a35-0e20312a157e None None] add fip-namespace(fip-31689320-95d7-44f9-932a-cc82c1bca2b4) create /opt/stack/new/neutron/neutron/agent/l3/dvr_fip_ns.py:133
Exit code: 0
execute /opt/stack/new/neutron/neutron/agent/linux/utils.py:156
2015-09-29 21:10:34.053 DEBUG neutron.agent.linux.utils [req-33413b07-784c-469e-8a35-0e20312a157e None None] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'fip-31689320-95d7-44f9-932a-cc82c1bca2b4', 'sysctl', '-w', 'net.ipv4.ip_forward=1'] execute_rootwrap_daemon /opt/stack/new/neutron/neutron/agent/linux/utils.py:101
2015-09-29 21:10:34.084 ERROR neutron.agent.linux.utils [req-33413b07-784c-469e-8a35-0e20312a157e None None]
Command: ['ip', 'netns', 'exec', 'fip-31689320-95d7-44f9-932a-cc82c1bca2b4', 'sysctl', '-w', 'net.ipv4.ip_forward=1']
Exit code: 1
Stdin:
Stdout:
Stderr: seting the network namespace "fip-31689320-95d7-44f9-932a-cc82c1bca2b4" failed: Invalid argument
This leads to a series of failures.
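The ordering in the trace above can be modelled with a small, self-contained sketch. This is a toy model, not Neutron code: FakeHost, NamespaceGone and replay_interleaving are hypothetical names, a Python set stands in for the kernel's list of namespaces, and the exact interleaving is an assumption that is consistent with the log (delete issued, add reported as successful, then the exec fails).

```python
# Toy model only: a set stands in for the kernel's list of namespaces.
# FakeHost, NamespaceGone and replay_interleaving are hypothetical names.


class NamespaceGone(Exception):
    """Raised when a command targets a namespace that no longer exists."""


class FakeHost:
    def __init__(self):
        self.namespaces = set()

    def netns_add(self, name):
        # Adding an already existing namespace is treated as success here.
        self.namespaces.add(name)

    def netns_delete(self, name):
        self.namespaces.discard(name)

    def netns_exec(self, name, cmd):
        if name not in self.namespaces:
            raise NamespaceGone(name)
        return 0  # exit code of the command


def replay_interleaving(host, ns):
    """Run the add, delete and exec steps in the problematic order."""
    events = []
    host.netns_add(ns)  # namespace left over from an earlier floating IP
    # Path B (new floating IP) creates the namespace and sees success.
    host.netns_add(ns)
    events.append("add ok")
    # Path A (last floating IP removed) completes its delete afterwards.
    host.netns_delete(ns)
    events.append("delete ok")
    # Path B now configures the namespace it believes exists.
    try:
        host.netns_exec(ns, ["sysctl", "-w", "net.ipv4.ip_forward=1"])
        events.append("exec ok")
    except NamespaceGone:
        events.append("exec failed")
    return events


print(replay_interleaving(FakeHost(), "fip-example"))
```

In this model the add is reported as successful and only the later exec step fails, which matches the shape of the error in the trace.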
This failure is seen only in the gate.
It can be reproduced by repeatedly adding a floating IP to a private IP and deleting it again, with multiple API worker threads running.
For more information, see the logstash output below.
http://logs.openstack.org/82/228582/8/check/gate-tempest-dsvm-neutron-dvr/9053337/logs/screen-q-l3.txt.gz?level=TRACE#_2015-09-29_21_10_34_084
Reviewed: https://review.openstack.org/229561
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c874f6dadaa983ed4f338808907c8829a7f86031
Submitter: Jenkins
Branch: master
commit c874f6dadaa983ed4f338808907c8829a7f86031
Author: Swaminathan Vasudevan <email address hidden>
Date: Wed Sep 30 11:15:52 2015 -0700
Split the FIP Namespace delete in L3 agent for DVR
Right now we are seeing a race condition in the l3 agent
for DVR routers when a floatingip is deleted and created.
The agent tries to delete the floatingip namespace and,
while the delete is in progress, there is another call to
add the namespace. There is a timing window between these two
calls where the call to create the namespace sometimes succeeds,
but any command then executed in the namespace fails,
since the namespace was deleted concurrently.
Since the fip namespace is associated with an external net
and each node has only one fip namespace for an external net,
we would like to only delete the fip namespace when the
external net is deleted.
The first step is to split the delete functionality into two.
The call to fip_ns.cleanup will only remove the dependency that
the fipnamespace has with the router namespace such as fpr and
rfp veth pairs.
The call to fip_ns.delete will actually delete the
fip namespace and the fg device.
Partial-Bug: #1501873
Change-Id: Ic94625d5a968f554af70c274b2b2c20ab64e2487
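The split described in the commit message can be illustrated with a minimal sketch. It is not the code from the change: FipNamespaceSketch and its attributes are hypothetical, and only the method names cleanup and delete are taken from the commit message. The assumption is that cleanup detaches one router (its rfp/fpr veth pair) and leaves the namespace in place, while delete removes the fg device and the namespace itself.

```python
# Hypothetical stand-in for the agent's FIP namespace object.
# Only the method names cleanup() and delete() come from the commit message.


class FipNamespaceSketch:
    def __init__(self, ext_net_id):
        self.name = "fip-" + ext_net_id
        self.exists = False
        self.has_fg_device = False
        self.router_links = set()  # routers with an rfp/fpr veth pair

    def create(self):
        self.exists = True
        self.has_fg_device = True

    def attach_router(self, router_id):
        self.router_links.add(router_id)

    def cleanup(self, router_id):
        """Drop only the link to one router; keep the namespace."""
        self.router_links.discard(router_id)

    def delete(self):
        """Remove the fg device and the namespace itself."""
        self.router_links.clear()
        self.has_fg_device = False
        self.exists = False


fip_ns = FipNamespaceSketch("example-ext-net")
fip_ns.create()
fip_ns.attach_router("router-1")

# Last floating IP on router-1 removed: only the router link goes away.
fip_ns.cleanup("router-1")
still_usable = fip_ns.exists and fip_ns.has_fg_device

# External network removed: now the namespace is really deleted.
fip_ns.delete()
```

With this split, removing the last floating IP no longer tears the namespace down, so a concurrent add has nothing to race against; the actual delete is left to the less frequent case where the external network goes away.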