A tap device can suddenly disappear, e.g. when its instance is destroyed. If the agent is in the middle of processing a device update for that tap device (e.g. due to a security group update), it logs the following errors:
2016-01-07 17:43:52.225 DEBUG neutron.agent.linux.utils [req-07a4fb1d-88fe-40d7-b0fa-f93d1bac8a34 None None] Running command: ['ip', '-o', 'link', 'show', 'tapa0084edd-d4'] create_process /opt/stack/new/neutron/neutron/agent/linux/utils.py:84
2016-01-07 17:43:52.230 DEBUG neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [req-07a4fb1d-88fe-40d7-b0fa-f93d1bac8a34 None None] Tap device: tapa0084edd-d4 does not exist on this host, skipped add_tap_interface /opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py:409
2016-01-07 17:43:52.230 DEBUG neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [req-07a4fb1d-88fe-40d7-b0fa-f93d1bac8a34 None None] Setting admin_state_up to True for port a0084edd-d437-4ff0-b2e7-7cfd93ea34c4 ensure_port_admin_state /opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py:686
2016-01-07 17:43:52.231 DEBUG neutron.agent.linux.utils [req-07a4fb1d-88fe-40d7-b0fa-f93d1bac8a34 None None] Running command (rootwrap daemon): ['ip', 'link', 'set', 'tapa0084edd-d4', 'up'] execute_rootwrap_daemon /opt/stack/new/neutron/neutron/agent/linux/utils.py:100
2016-01-07 17:43:52.263 ERROR neutron.agent.linux.utils [req-07a4fb1d-88fe-40d7-b0fa-f93d1bac8a34 None None] Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot find device "tapa0084edd-d4"
2016-01-07 17:43:52.263 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [req-07a4fb1d-88fe-40d7-b0fa-f93d1bac8a34 None None] Error in agent loop. Devices info: {'current': set(['tap3bbccdeb-0d', 'tap2cbadddb-48', 'tap2ff01acc-16', 'tap92ccd364-e1', 'tap1b585b2d-f7', 'tap6838b208-7e', 'tapf03a19db-48', 'tap294b5031-17', 'tapa0084edd-d4', 'tap6457a7f6-65', 'tap91c29239-c1']), 'removed': set([]), 'added': set([]), 'updated': set([u'tapa0084edd-d4'])}
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent Traceback (most recent call last):
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 1191, in daemon_loop
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent sync = self.process_network_devices(device_info)
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 994, in process_network_devices
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent resync_a = self.treat_devices_added_updated(devices_added_updated)
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 1070, in treat_devices_added_updated
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent device_details['admin_state_up'])
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 689, in ensure_port_admin_state
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent ip_lib.IPDevice(tap_name).link.set_up()
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 461, in set_up
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent return self._as_root([], ('set', self.name, 'up'))
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 321, in _as_root
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent use_root_namespace=use_root_namespace)
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 94, in _as_root
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent log_fail_as_error=self.log_fail_as_error)
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 103, in _execute
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent log_fail_as_error=log_fail_as_error)
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/agent/linux/utils.py", line 140, in execute
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent raise RuntimeError(msg)
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent RuntimeError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot find device "tapa0084edd-d4"
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent
The logs [1][2] show two scenarios where this happens:
- hard reboot of an instance (instance is destroyed and defined again)
- destroy of an instance
After that, the agent triggers a resync and everything returns to normal.
A possible solution is to check for the device's existence in the ensure_port_admin_state method. It may also be worth looking for a more elegant approach that avoids the resync altogether.
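As a rough illustration of the existence check suggested above, here is a minimal standalone sketch. It is not the actual Neutron code or the proposed patch: device_exists, the set_link callable and the sysfs lookup are hypothetical stand-ins chosen so the example runs without Neutron installed. It also assumes that a failing command surfaces as a RuntimeError, as in the traceback above. Because the device can still vanish between the check and the command, the sketch re-checks after a failure instead of letting the error abort the whole agent loop.

```python
import os

# Assumption: on Linux, every network interface has an entry under this path.
SYSFS_NET = "/sys/class/net"


def device_exists(name, sysfs_root=SYSFS_NET):
    """Return True if an interface called `name` is currently present."""
    return os.path.exists(os.path.join(sysfs_root, name))


def ensure_port_admin_state(tap_name, admin_state_up, set_link,
                            exists=device_exists):
    """Set the link state of `tap_name`, tolerating a vanished device.

    `set_link` is a hypothetical callable (name, up) that performs the
    equivalent of `ip link set <name> up|down` and raises RuntimeError
    when the command fails.

    Returns True if the state was applied, False if the device is gone.
    """
    if not exists(tap_name):
        # Device was already removed (e.g. instance destroyed): skip it.
        return False
    try:
        set_link(tap_name, admin_state_up)
    except RuntimeError:
        # The device may have disappeared between the check and the command.
        if not exists(tap_name):
            return False
        # Any other failure is unexpected and is propagated unchanged.
        raise
    return True
```

With this shape, the caller could treat a False return as "nothing to do for this port" rather than as an error that forces a full resync. Whether that is safe in the real agent depends on how removed devices are handled elsewhere.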
[1] http://logs.openstack.org/18/246318/14/check/gate-tempest-dsvm-neutron-linuxbridge/cdded9a/logs/screen-n-cpu.txt.gz#_2016-01-07_17_43_50_991
[2] http://logs.openstack.org/18/246318/14/check/gate-tempest-dsvm-neutron-linuxbridge/cdded9a/logs/screen-q-agt.txt.gz#_2016-01-07_17_43_45_961
Fix proposed to branch: master
Review: https://review.openstack.org/268095