Ipset race condition

Bug #1491131 reported by Kris Lindgren on 2015-09-01
42
This bug affects 7 people
Affects Status Importance Assigned to Milestone
neutron
High
Unassigned
Juno
High
Miguel Angel Ajo
Kilo
High
Miguel Angel Ajo

Bug Description

Hello,

We have been using ipsets in neutron since juno. We have upgraded our install to kilo a month or so and we have experienced 3 issues with ipsets.

The issues are as follows:
1.) Iptables attempts to apply rules for an ipset that was not added
2.) iptables attempt to apply rules for an ipset that was removed, but still refrenced in the iptables config
3.) ipset churns trying to remove an ipset that has already been removed.

For issue one and two I am unable to get the logs for these issues because neutron was dumping the full iptables-restore entries to log once every second for a few hours and eventually filled up the disk and we removed the file to get things working again.

For issue 3.) I have the start of the logs here:
2015-08-31 12:17:00.100 6840 INFO neutron.agent.common.ovs_lib [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Port 29355e52-bae1-44b2-ace6-5bc7ce497d32 not present in bridge br-int
2015-08-31 12:17:00.101 6840 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] port_unbound(): net_uuid None not in local_vlan_map
2015-08-31 12:17:00.101 6840 INFO neutron.agent.securitygroups_rpc [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Remove device filter for [u'2aa0f79d-4983-4c7a-b489-e0612c482e36']
2015-08-31 12:17:00.861 4581 INFO neutron.agent.common.ovs_lib [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] Port 2aa0f79d-4983-4c7a-b489-e0612c482e36 not present in bridge br-int
2015-08-31 12:17:00.862 4581 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] port_unbound(): net_uuid None not in local_vlan_map
2015-08-31 12:17:00.862 4581 INFO neutron.agent.securitygroups_rpc [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] Remove device filter for [u'c616328a-e44c-4cf8-bc8e-83058c5635dd']
2015-08-31 12:17:01.499 4581 INFO neutron.agent.securitygroups_rpc [req-b5b95389-52b6-4051-ab35-ae383df56a0b ] Security group member updated [u'b05f4fa6-f1ec-41c0-8ba6-80b859dc23b0']
2015-08-31 12:17:01.500 6840 INFO neutron.agent.securitygroups_rpc [req-b5b95389-52b6-4051-ab35-ae383df56a0b ] Security group member updated [u'b05f4fa6-f1ec-41c0-8ba6-80b859dc23b0']
2015-08-31 12:17:01.608 6840 INFO neutron.agent.common.ovs_lib [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Port 2aa0f79d-4983-4c7a-b489-e0612c482e36 not present in bridge br-int
2015-08-31 12:17:01.609 6840 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] port_unbound(): net_uuid None not in local_vlan_map
2015-08-31 12:17:01.609 6840 INFO neutron.agent.securitygroups_rpc [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Remove device filter for [u'c616328a-e44c-4cf8-bc8e-83058c5635dd']
2015-08-31 12:17:02.358 4581 INFO neutron.agent.common.ovs_lib [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] Port c616328a-e44c-4cf8-bc8e-83058c5635dd not present in bridge br-int
2015-08-31 12:17:02.359 4581 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] port_unbound(): net_uuid None not in local_vlan_map
2015-08-31 12:17:02.359 4581 INFO neutron.agent.securitygroups_rpc [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] Remove device filter for [u'fddff586-9903-47ad-92e1-b334e02e9d1c']
2015-08-31 12:17:03.108 6840 INFO neutron.agent.common.ovs_lib [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Port c616328a-e44c-4cf8-bc8e-83058c5635dd not present in bridge br-int
2015-08-31 12:17:03.109 6840 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] port_unbound(): net_uuid None not in local_vlan_map
2015-08-31 12:17:03.109 6840 INFO neutron.agent.securitygroups_rpc [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Remove device filter for [u'fddff586-9903-47ad-92e1-b334e02e9d1c']
2015-08-31 12:17:03.855 4581 INFO neutron.agent.common.ovs_lib [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] Port fddff586-9903-47ad-92e1-b334e02e9d1c not present in bridge br-int
2015-08-31 12:17:03.855 4581 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] port_unbound(): net_uuid None not in local_vlan_map
2015-08-31 12:17:03.856 4581 INFO neutron.agent.securitygroups_rpc [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] Remove device filter for [u'3f706749-f8bb-41ab-aa4c-a0925dc67bd4']
2015-08-31 12:17:03.919 4581 INFO neutron.agent.securitygroups_rpc [req-1872b212-b537-41cc-96af-0c6ad380824c ] Security group member updated [u'b05f4fa6-f1ec-41c0-8ba6-80b859dc23b0']
2015-08-31 12:17:04.166 6840 INFO neutron.agent.securitygroups_rpc [req-1872b212-b537-41cc-96af-0c6ad380824c ] Security group member updated [u'b05f4fa6-f1ec-41c0-8ba6-80b859dc23b0']
2015-08-31 12:17:04.620 6840 INFO neutron.agent.common.ovs_lib [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Port fddff586-9903-47ad-92e1-b334e02e9d1c not present in bridge br-int
2015-08-31 12:17:04.621 6840 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] port_unbound(): net_uuid None not in local_vlan_map
2015-08-31 12:17:04.621 6840 INFO neutron.agent.securitygroups_rpc [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Remove device filter for [u'3f706749-f8bb-41ab-aa4c-a0925dc67bd4']
2015-08-31 12:17:05.422 4581 INFO neutron.agent.common.ovs_lib [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] Port 3f706749-f8bb-41ab-aa4c-a0925dc67bd4 not present in bridge br-int
2015-08-31 12:17:05.422 4581 INFO neutron.agent.securitygroups_rpc [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] Remove device filter for [u'f075fda0-2dfc-4c1b-b7af-66a556b32434']
2015-08-31 12:17:05.674 4581 INFO neutron.agent.securitygroups_rpc [req-2aa43ed7-6749-4313-a6bd-d46ec8fc4d29 ] Security group member updated [u'7ad253a3-71ae-4539-a973-b9afb0f23e3b']
2015-08-31 12:17:05.679 6840 INFO neutron.agent.securitygroups_rpc [req-2aa43ed7-6749-4313-a6bd-d46ec8fc4d29 ] Security group member updated [u'7ad253a3-71ae-4539-a973-b9afb0f23e3b']
2015-08-31 12:17:06.037 6840 ERROR neutron.agent.linux.utils [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ]
Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ipset', 'destroy', 'NETIPv420e1594a-28bb-4591-b']
Exit code: 1
Stdin:
Stdout:
Stderr: ipset v6.11: The set with the given name does not exist

2015-08-31 12:17:06.040 6840 ERROR neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Error while processing VIF ports
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py", line 1589, in rpc_loop
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.process_deleted_ports(port_info)
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py", line 355, in process_deleted_ports
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.sg_agent.remove_devices_filter([port_id])
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/securitygroups_rpc.py", line 303, in remove_devices_filter
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.firewall.remove_port_filter(device)
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/rh/python27/root/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.gen.next()
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/firewall.py", line 106, in defer_apply
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.filter_defer_apply_off()
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/linux/iptables_firewall.py", line 660, in filter_defer_apply_off
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self._remove_unused_security_group_info()
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/linux/iptables_firewall.py", line 583, in _remove_unused_security_group_info
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self._remove_ipsets_for_remote_sgs(ip_version, remote_sg_ids)
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/linux/iptables_firewall.py", line 642, in _remove_ipsets_for_remote_sgs
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.ipset.destroy(remote_sg_id, ip_version)
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 445, in inner
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent return f(*args, **kwargs)
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/linux/ipset_manager.py", line 101, in destroy
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self._destroy(set_name, forced)
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/linux/ipset_manager.py", line 174, in _destroy
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self._apply(cmd)
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/linux/ipset_manager.py", line 138, in _apply
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.execute(cmd_ns, run_as_root=True, process_input=input)
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 137, in execute
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent raise RuntimeError(m)
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent RuntimeError:
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ipset', 'destroy', 'NETIPv420e1594a-28bb-4591-b']
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Exit code: 1
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Stdin:
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Stdout:
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Stderr: ipset v6.11: The set with the given name does not exist
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent
2015-08-31 12:17:06.040 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent
2015-08-31 12:17:06.042 6840 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Agent out of sync with plugin!
2015-08-31 12:17:06.045 4581 INFO neutron.agent.securitygroups_rpc [req-75496835-97a3-4d3c-a2ad-7fe3ce7d88a2 ] Security group member updated [u'ba07732e-88fb-4bd1-a018-99221d85c283', u'e716ee7f-df77-4722-b117-dbb6907b2319']
2015-08-31 12:17:06.053 6840 INFO neutron.agent.securitygroups_rpc [req-75496835-97a3-4d3c-a2ad-7fe3ce7d88a2 ] Security group member updated [u'ba07732e-88fb-4bd1-a018-99221d85c283', u'e716ee7f-df77-4722-b117-dbb6907b2319']
2015-08-31 12:17:06.488 6840 INFO neutron.agent.securitygroups_rpc [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Remove device filter for [u'0457636d-0ae6-412a-8025-6579e1f8a78a']
2015-08-31 12:17:06.748 4581 INFO neutron.agent.common.ovs_lib [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] Port f075fda0-2dfc-4c1b-b7af-66a556b32434 not present in bridge br-int
2015-08-31 12:17:06.749 4581 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] port_unbound(): net_uuid None not in local_vlan_map
2015-08-31 12:17:06.749 4581 INFO neutron.agent.securitygroups_rpc [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] Remove device filter for [u'cbd00d69-4069-4d65-bc5d-25cfc57ae9b5']
2015-08-31 12:17:07.474 6840 ERROR neutron.agent.linux.utils [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ]
Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ipset', 'destroy', 'NETIPv420e1594a-28bb-4591-b']
Exit code: 1
Stdin:
Stdout:
Stderr: ipset v6.11: The set with the given name does not exist

2015-08-31 12:17:07.475 6840 ERROR neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Error while processing VIF ports
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py", line 1589, in rpc_loop
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.process_deleted_ports(port_info)
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py", line 355, in process_deleted_ports
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.sg_agent.remove_devices_filter([port_id])
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/securitygroups_rpc.py", line 303, in remove_devices_filter
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.firewall.remove_port_filter(device)
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/rh/python27/root/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.gen.next()
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/firewall.py", line 106, in defer_apply
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.filter_defer_apply_off()
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/linux/iptables_firewall.py", line 660, in filter_defer_apply_off
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self._remove_unused_security_group_info()
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/linux/iptables_firewall.py", line 583, in _remove_unused_security_group_info
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self._remove_ipsets_for_remote_sgs(ip_version, remote_sg_ids)
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/linux/iptables_firewall.py", line 642, in _remove_ipsets_for_remote_sgs
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.ipset.destroy(remote_sg_id, ip_version)
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 445, in inner
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent return f(*args, **kwargs)
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/linux/ipset_manager.py", line 101, in destroy
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self._destroy(set_name, forced)
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/linux/ipset_manager.py", line 174, in _destroy
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self._apply(cmd)
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/linux/ipset_manager.py", line 138, in _apply
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.execute(cmd_ns, run_as_root=True, process_input=input)
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 137, in execute
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent raise RuntimeError(m)
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent RuntimeError:
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ipset', 'destroy', 'NETIPv420e1594a-28bb-4591-b']
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Exit code: 1
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Stdin:
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Stdout:
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Stderr: ipset v6.11: The set with the given name does not exist
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent
2015-08-31 12:17:07.475 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent
2015-08-31 12:17:08.042 6840 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Agent out of sync with plugin!
2015-08-31 12:17:08.177 4581 INFO neutron.agent.common.ovs_lib [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] Port cbd00d69-4069-4d65-bc5d-25cfc57ae9b5 not present in bridge br-int
2015-08-31 12:17:08.177 4581 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] port_unbound(): net_uuid None not in local_vlan_map
2015-08-31 12:17:08.178 4581 INFO neutron.agent.securitygroups_rpc [req-4144d47e-0044-47b3-a302-d019e0a67aa0 ] Remove device filter for [u'4ddf7f2a-deec-448e-9378-3fbc95172774']

This bascially churns like this until neutron-openvsiwtch-agent is restarted and then everything is fine.

Miguel Angel Ajo (mangelajo) wrote :

This could be due to a race condition, I'd patch this by adding a WARNING message, and not forcing a resync, as it's going to fail again, deleting should be happy if the set does not exist.

I didn't find this exact issue in logstash, but I found a similar one (trying to delete when still reference, which I believe kevin started tackling once, and not sure if I blocked it :(

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiU3RkZXJyOiBpcHNldFwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDQxMTc5NjM0Njc4fQ==

Kris G. Could you provide details on what were you doing exactly, and how do your security groups look like?

shihanzhang (shihanzhang) wrote :

Kris G, how can we reproduce this problem?

Kris Lindgren (klindgren) wrote :
Download full text (9.0 KiB)

I am not sure how to reproduce this problem. It happens randomly (last time was August 19th). It actually happened on the same server and it also happened on the same request UUID (how is that possible?)

2015-08-19 15:13:04.547 6840 INFO neutron.agent.securitygroups_rpc [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Refresh firewall rules
2015-08-19 15:13:04.778 6840 ERROR neutron.agent.linux.utils [req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ]
Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ipset', 'del', 'NETIPv48a17c661-2071-4527-a', '10.22.248.16']
Exit code: 1
Stdin:
Stdout:
Stderr: ipset v6.11: Element cannot be deleted from the set: it's not added

[req-2505ffdc-85a7-46f9-bb0f-3fb7fe3d3eed ] Error while processing VIF ports
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py", line 1605, in rpc_loop
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent ovs_restarted)
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py", line 1367, in process_network_ports
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent port_info.get('updated', set()))
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/securitygroups_rpc.py", line 375, in setup_port_filters
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.refresh_firewall(updated_devices)
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/securitygroups_rpc.py", line 219, in decorated_function
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent *args, **kwargs)
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/securitygroups_rpc.py", line 331, in refresh_firewall
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent security_groups, security_group_member_ips)
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/rh/python27/root/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.gen.next()
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/opt/openstack/neutron/venv/lib/python2.7/site-packages/neutron/agent/firewall.py", line 106, in defer_apply
2015-08-19 15:13:04.779 6840 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_ag...

Read more...

Kris Lindgren (klindgren) wrote :

When I search for "port_unbound(): net_uuid None not in local_vlan_map". In our elasticsearch I get 1,497,722 count, across every single compute host that I have. If I drill down into a single host I see a repeating request uuid going back since 8/12.

Kris Lindgren (klindgren) wrote :

I should say across every host running neutron-openvsiwtch-agent. Also, each server seems to have one or more req-id's that are repeating, however no servers appear to share the same req-id.

Sam Morrison (sorrison) wrote :

We (NeCTAR) are seeing this in our environment too

Tentatively setting priority to HIGH.
This is difficult to reproduce into our environemnts, but I agree with Miguel's findings.

For the repeated request uuid, it's an unrelated problem. It's always been like that for operations over the RPC channel (and slightly annoying - we should have an open bug for this).

My gut feeling is that we should search the root cause for this issue in the RPC operation we use to retrieve security group rules for devices. There is probably a possibility that the state of the agent changes between the instance in which these info are retrieved and the instant in which they're processed.

Changed in neutron:
importance: Undecided → High
milestone: none → liberty-rc1

the reason for which I too the Liberty (pun intented) of setting milestone and importance is that this is an opeator-reported defect and might cause headaches to operators already deploying neutron or in the process of deploying it.

Kyle Mestery (mestery) wrote :

Changed description and added Kilo series.

summary: - Ipset race condition - Kilo-2015.1.0
+ Ipset race condition
shihanzhang (shihanzhang) wrote :

@Kris G, can you provide some more details:
1. what did you do causing this problem, for example adding a VM to a existing SG, or deleting a VM from a existing SG
2. the port that this IP 10.22.248.16 related is hosting on the problem server?

you can contact me on IRC #openstack-neutron: shihanzhang

shihanzhang (shihanzhang) wrote :

@ajo, there is already a patch for resync failed deiveces #https://review.openstack.org/#/c/211586/

Sam Morrison (sorrison) wrote :

I should add we are using ml2 linux bridge

Changed in neutron:
assignee: nobody → Cedric Brandily (cbrandily)
Sam Morrison (sorrison) wrote :

Some more information: We are triggering this by creating a security group that has a rule which is a source group back to itself. Then launching and deleting multiple instances with that source group.

It doesn't happen every time, in fact works more times than it breaks. With the source secgroup it causes a lot of iptables rule changes on each boot and delete.

Environment is stable/kilo on ubuntu 14.04 with cloud archive.

The main issue we see is that ipset is trying to delete something which is not there, it then just tries over and over until a restart.

shihanzhang (shihanzhang) wrote :

@Sam, thx very much for your information, I will test that as your steps

shihanzhang (shihanzhang) wrote :

@Sam, there is a improvement for security group, if we just add a new member to security group, there is no need to reload iptable rules, it just need to update the ipset set, now the master branch also has this problem, I have submit a patch for it https://review.openstack.org/#/c/122368/, but it has not be merged

Miguel Angel Ajo (mangelajo) wrote :

@shihanzhang, some of the points were already fixed in liberty, is there something deserving a backport?.

I would add a sanity check on ipset delete to make sure we don't permanently blow up the agent, but dumping some sort of ugly error we could trace back via log stash.

Said that, the ipset delete for a non existing ipset didn't show up in log stash last time I looked for it.

Kris Lindgren (klindgren) wrote :

I would agree with the adding a sanity check on ipset delete. If we remove an ipset that doesn't exist, then we should specifically catch that error and treat it as a success in that case - because the desired end state has already been reached. At a minimum we should only attempt to apply this change a few time. I see a change for the OVS agent, but this also seems to impact the linux bridge agent as well.

shihanzhang (shihanzhang) wrote :

@ajo, these four patchs were not backport:
1.https://review.openstack.org/#/c/190991/
2.https://review.openstack.org/#/c/187483/
3.https://review.openstack.org/#/c/177159/
4.https://review.openstack.org/#/c/187433/

and adding a sanity check is ok, but I think we need to find the root causes.

Kyle Mestery (mestery) wrote :

Can someone confirm if this needs to be fixed in master or only have the back ports Shihanzhang suggest done for kilo? It's not clear to me yet.

Kyle Mestery (mestery) on 2015-09-15
Changed in neutron:
status: New → Invalid
milestone: liberty-rc1 → none
Cedric Brandily (cbrandily) wrote :

The trouble is no more raised in neutron master branch according to logstash since a long time but still raised in neutron stable/kilo branch (mustly in gate-tempest-dsvm-neutron-linuxbridge) with a slightly different error:

     Stderr: ipset v6.20.1: Set cannot be destroyed: it is in use by a kernel component

Such error is raised when we try to delete an ipset still used (in iptables).

So i think that for the moment we can consider that this bug concerns stable/kilo not master branchs

Kyle Mestery (mestery) wrote :

Per discussion with Zzelle in channel, this only affects Kilo and Juno. Marking as such.

shihanzhang (shihanzhang) wrote :

hi kyle, in our production, we also use juno, but we have not this problems, so I do not think it affects juno

Miguel Angel Ajo (mangelajo) wrote :

Correct, Cedric, that's what I saw in logstash too.

@shihanzhang, I think (per our IRC conversation) you mean that in your production you use kilo, and you haven't seen it there, so it could be only juno.

Miguel Angel Ajo (mangelajo) wrote :

As per last meeting conversation I'm checking the list of proposed patches to see what applies for backports. master->kilo->juno

Sam Morrison (sorrison) wrote :

Kilo is where the issue is for us. Weren't seeing this in Juno

Miguel Angel Ajo (mangelajo) wrote :

Notes:

Could be related, but wasn't available in Kilo, where shihanzhang hasn't seen it (check log stash for occurrences in kilo):

https://review.openstack.org/#/c/187483/

Not likely to fix this bug but worth backporting because of their own bugs:

https://review.openstack.org/#/c/187433/4 : https://launchpad.net/bugs/1458786

https://review.openstack.org/#/c/177159/16 : https://launchpad.net/bugs/1448022

Not likely to fix this, but may be lower priority (this only leaves unused ipsets on the system, but does not break the agent):

https://review.openstack.org/#/c/190991/

Miguel Angel Ajo (mangelajo) wrote :

To my surprise, when checking today, this error is back to town in master, as of 11/sept/2015.

http://bit.ly/ipset-failures-logstash

It's also affecting kilo and juno.

Miguel Angel Ajo (mangelajo) wrote :

Hmm, most of the kilo failures seem to be related to "gate-tempest-dsvm-neutron-linuxbridge" , where it happens for any run.

Looks like ipset/security groups are broken in combination with linuxbridge.

Miguel Angel Ajo (mangelajo) wrote :

Sorry, no occurrences in master, it's "tempest/master" what I saw, in relation to this job: gate-tempest-dsvm-neutron-full-juno

Miguel Angel Ajo (mangelajo) wrote :

To see where this happens out of stable/kilo linux-brige use this logstash query:

http://bit.ly/ipset-failure-loghstash-without-lb

It seems that the runs are related to stable/juno and stable/kilo grenade (which runs both).

So I'd say

master (LB+OVS) is bugfree
kilo(LB) is broken because of this, or has something that triggers this.
kilo(OVS) is ok
juno(LB?+OVS) has low occurrences of the bug, bug the occurrences break the agent until reboot.

It must have been a patch from kilo what fixed this. Talking to kevin benton, we're considering to include a log.ERROR and continue, or something equivalent to this which removes any related iptable entry: https://review.openstack.org/#/c/173608/

We need someone experienced in linuxbridge to look at stable/kilo.

Miguel Angel Ajo (mangelajo) wrote :

Ok, shihanzhang confirms he doesn't see this in juno,
and the operator report was in kilo.

So, juno seems to be affected on the gate, kilo on at least a live system?...

I'd go on with the "log error, and don't block the agent in a loop" and backport to all stable branches. Since the race condition seems to be fixed in master, and it seems to be a safer fix to backport.

shihanzhang (shihanzhang) wrote :

@ajo, you can try to backport this patch https://review.openstack.org/#/c/187483/ and test it again

Reviewed: https://review.openstack.org/224517
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=114949b789eca7ef88ec4c43f1bb0b7233abd69c
Submitter: Jenkins
Branch: stable/kilo

commit 114949b789eca7ef88ec4c43f1bb0b7233abd69c
Author: shihanzhang <email address hidden>
Date: Mon Jun 1 16:17:37 2015 +0800

    Fix ipset can't be destroyed when last rule is deleted

    when it deletes a security group all rules, it should
    include this sg information in RPC method
    'security_group_info_for_devices', otherwise the ports
    in this sg can't corrcectly update their iptables and
    ipset sets.

    Change-Id: Ibb071ce84590bd46cda2c1e010a566e75e22b4d2
    Related-bug: #1491131
    Closes-bug: #1460562
    (cherry picked from commit 764f018f50ac7cd42c29efeabaccbb5aec21f6f4)

tags: added: in-stable-kilo
Changed in neutron:
assignee: Cedric Brandily (cbrandily) → shihanzhang (shihanzhang)
status: Invalid → In Progress

Reviewed: https://review.openstack.org/224528
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d567d57f72ae9699442ca999ce347074ad8df886
Submitter: Jenkins
Branch: stable/juno

commit d567d57f72ae9699442ca999ce347074ad8df886
Author: shihanzhang <email address hidden>
Date: Mon Jun 1 16:17:37 2015 +0800

    Fix ipset can't be destroyed when last rule is deleted

    when it deletes a security group all rules, it should
    include this sg information in RPC method
    'security_group_info_for_devices', otherwise the ports
    in this sg can't corrcectly update their iptables and
    ipset sets.

    Related-bug: #1491131
    Closes-bug: #1460562
    (cherry picked from commit 764f018f50ac7cd42c29efeabaccbb5aec21f6f4)
    (cherry picked from commit 114949b789eca7ef88ec4c43f1bb0b7233abd69c)

    Conflicts neutron/tests/unit/test_security_groups_rpc.py, tests
    adapted for py26 compatibility.
    Change-Id: Ibb071ce84590bd46cda2c1e010a566e75e22b4d2

tags: added: in-stable-juno

Miguel: fixes were backported for a related bug. Can we assume that the fix was enough to close this bug too?

Kris Lindgren (klindgren) wrote :

I don't see how the backported fix actually solves the problem I am running into? The issue I am running into on kilo is that OVS-agent churns forever trying to delete an ipset that doesn't exist. The error given is not "trying to delete an ipset thats in use", which seem to be what this backport is for?

Changed in neutron:
milestone: none → mitaka-rc1
Kevin Benton (kevinbenton) wrote :

I don't think that the infinite loop can exist anymore. We don't check for failures on the delete attempt after this patch had merged and it's in Kilo and Liberty: https://review.openstack.org/#/q/If67330523d114d6da13d0280851e7138a51d08f7,n,z

Invalid on Mitaka+?

Changed in neutron:
status: In Progress → Won't Fix
milestone: mitaka-rc1 → none
assignee: shihanzhang (shihanzhang) → nobody
status: Won't Fix → Incomplete
Kevin Benton (kevinbenton) wrote :

I'm not completely sure. There are basically 3 bugs in one and I'm not sure which have been addressed over the last few cycles. The infinite loop should be gone. I vaguely recall the one about non-existent sets being referenced by iptables being addressed, but I can't be sure.

I suppose mark as incomplete makes sense for now and we can always re-open if someone can get some reproductions.

Download full text (3.5 KiB)

Hi,

I use OpenStack Kilo 2015.1.3 in Ubuntu 14.0.4 with LB + OVS (3 nodes).

nova list

| d2230162-be17-4a17-89b8-da8f17ff4406 | vm-cirros-sec-global-http4 | ACTIVE | - | Running | demo-net=192.168.1.25, 30.0.0.103 |
| 29a86d69-5058-49dc-bfba-79415082af39 | vm-ubuntu-sec-global-http | ACTIVE | - | Running | demo-net=192.168.1.22, 30.0.0.102 |

ping 30.0.0.102
PING 30.0.0.102 (30.0.0.102) 56(84) bytes of data.
64 bytes from 30.0.0.102: icmp_seq=1 ttl=62 time=3.05 ms
64 bytes from 30.0.0.102: icmp_seq=2 ttl=62 time=2.34 ms

 wget 30.0.0.102
--2016-05-16 15:14:02-- http://30.0.0.102/
Connecting to 30.0.0.102:80... connected.
HTTP request sent, awaiting response...

VM receives the wget request, but does not respond. Apache 2 is installed on the VM.

ssh igoravila@30.0.0.102

VM does not respond SSH.

nova secgroup-list-rules global_http
+-------------+-----------+---------+-----------+--------------+
| IP Protocol | From Port | To Port | IP Range | Source Group |
+-------------+-----------+---------+-----------+--------------+
| tcp | 443 | 443 | 0.0.0.0/0 | |
| udp | 53 | 53 | 0.0.0.0/0 | |
| tcp | 22 | 22 | 0.0.0.0/0 | |
| tcp | 80 | 80 | 0.0.0.0/0 | |
| icmp | -1 | -1 | 0.0.0.0/0 | |
+-------------+-----------+---------+-----------+--------------+

openvswitch-agent.log (netowrk Node):

2016-05-16 14:39:15.872 1103 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-1d9fe87f-8c8a-4d29-afcf-51badf621837 ] Configuration for device aa0ffc40-690a-4861-a20b-a057d60b363d completed.
2016-05-16 14:39:16.097 1103 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-1d9fe87f-8c8a-4d29-afcf-51badf621837 ] Port fed2747d-f68c-4f17-b57e-785f17155254 updated. Details: {u'profile': {}, u'allowed_address_pairs': [], u'admin_state_up': True, u'network_id': u'162fdcc2-74f1-4f9b-b7f1-b782acb38dbf', u'segmentation_id': None, u'device_owner': u'neutron:LOADBALANCER', u'physical_network': u'external', u'mac_address': u'fa:16:3e:25:25:11', u'device': u'fed2747d-f68c-4f17-b57e-785f17155254', u'port_security_enabled': True, u'port_id': u'fed2747d-f68c-4f17-b57e-785f17155254', u'fixed_ips': [{u'subnet_id': u'7641d89b-2356-4bd2-a01d-33679d03c601', u'ip_address': u'30.0.0.107'}], u'network_type': u'flat'}
2016-05-16 14:39:16.099 1103 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-1d9fe87f-8c8a-4d29-afcf-51badf621837 ] Assigning 2 as local vlan for net-id=162fdcc2-74f1-4f9b-b7f1-b782acb38dbf
2016-05-16 14:39:16.582 1103 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-1d9fe87f-8c8a-4d29-afcf-51badf621837 ] Configuration for device fed2747d-f68c-4f17-b57e-785f17155254 completed.
2016-05-16 14:52:08.632 1103 INFO neutron.agent.securitygroups_rpc [req-ad9f1b57-c3d1-41b9-a6c1-f040353ff9ee ] Security group member updated [u'e5d46ebd-67c7-4cad-aa68-b276367b77af']
2016-05-16 14:52:09.279 1103 INFO neutron.agent.common.ovs_lib [req-1d9fe87f-8c8a-4d29-afcf-51badf621837 ] Port 9ba102a6-a9d6-4385-87b0-656e58282a20 not present in bridge br-int
201...

Read more...

Miguel Angel Ajo (mangelajo) wrote :

Igor, I believe you are configuring from & to port in the security groups, configure only the "To" port.

Btw, the security groups settings are handled by the openvswitch-agent in the hypervisor, not on the network node.

This bug is > 180 days without activity. We are unsetting assignee and milestone and setting status to Incomplete in order to allow its expiry in 60 days.

If the bug is still valid, then update the bug status.

Changed in neutron:
status: Incomplete → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers