lb: hard reboot or destroy of vm can lead to error log and agent resync

Bug #1532171 reported by Andreas Scheuring
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Andreas Scheuring

Bug Description

A tap device can suddenly disappear, e.g. when an instance is destroyed. If the agent is in the midst of processing a device update for this tap device (e.g. due to a security group update), the agent logs the following errors:

2016-01-07 17:43:52.225 DEBUG neutron.agent.linux.utils [req-07a4fb1d-88fe-40d7-b0fa-f93d1bac8a34 None None] Running command: ['ip', '-o', 'link', 'show', 'tapa0084edd-d4'] create_process /opt/stack/new/neutron/neutron/agent/linux/utils.py:84
2016-01-07 17:43:52.230 DEBUG neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [req-07a4fb1d-88fe-40d7-b0fa-f93d1bac8a34 None None] Tap device: tapa0084edd-d4 does not exist on this host, skipped add_tap_interface /opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py:409
2016-01-07 17:43:52.230 DEBUG neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [req-07a4fb1d-88fe-40d7-b0fa-f93d1bac8a34 None None] Setting admin_state_up to True for port a0084edd-d437-4ff0-b2e7-7cfd93ea34c4 ensure_port_admin_state /opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py:686
2016-01-07 17:43:52.231 DEBUG neutron.agent.linux.utils [req-07a4fb1d-88fe-40d7-b0fa-f93d1bac8a34 None None] Running command (rootwrap daemon): ['ip', 'link', 'set', 'tapa0084edd-d4', 'up'] execute_rootwrap_daemon /opt/stack/new/neutron/neutron/agent/linux/utils.py:100
2016-01-07 17:43:52.263 ERROR neutron.agent.linux.utils [req-07a4fb1d-88fe-40d7-b0fa-f93d1bac8a34 None None] Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot find device "tapa0084edd-d4"

2016-01-07 17:43:52.263 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [req-07a4fb1d-88fe-40d7-b0fa-f93d1bac8a34 None None] Error in agent loop. Devices info: {'current': set(['tap3bbccdeb-0d', 'tap2cbadddb-48', 'tap2ff01acc-16', 'tap92ccd364-e1', 'tap1b585b2d-f7', 'tap6838b208-7e', 'tapf03a19db-48', 'tap294b5031-17', 'tapa0084edd-d4', 'tap6457a7f6-65', 'tap91c29239-c1']), 'removed': set([]), 'added': set([]), 'updated': set([u'tapa0084edd-d4'])}
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent Traceback (most recent call last):
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 1191, in daemon_loop
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent sync = self.process_network_devices(device_info)
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 994, in process_network_devices
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent resync_a = self.treat_devices_added_updated(devices_added_updated)
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 1070, in treat_devices_added_updated
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent device_details['admin_state_up'])
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 689, in ensure_port_admin_state
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent ip_lib.IPDevice(tap_name).link.set_up()
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 461, in set_up
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent return self._as_root([], ('set', self.name, 'up'))
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 321, in _as_root
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent use_root_namespace=use_root_namespace)
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 94, in _as_root
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent log_fail_as_error=self.log_fail_as_error)
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 103, in _execute
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent log_fail_as_error=log_fail_as_error)
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent File "/opt/stack/new/neutron/neutron/agent/linux/utils.py", line 140, in execute
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent raise RuntimeError(msg)
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent RuntimeError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot find device "tapa0084edd-d4"
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent
2016-01-07 17:43:52.263 23166 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent

The logs [1][2] show two scenarios where this happens:
- hard reboot of an instance (instance is destroyed and defined again)
- destroy of an instance

After that, the agent triggers a resync and everything is fine again.

The solution could be to check for the device's existence in the ensure_port_admin_state method. Maybe we should check whether there is a more elegant way to do this that avoids the resync...
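
A minimal sketch of what such an existence check might look like. This is not the merged fix and not the actual neutron code: device_exists and set_link_state are hypothetical callables standing in for whatever the agent uses to query and configure the tap device, and the return value is an assumption made for illustration. Because the device can still vanish between the check and the command, the sketch also tolerates a RuntimeError when the device turns out to be gone.

```python
import logging

LOG = logging.getLogger(__name__)


def ensure_port_admin_state(tap_name, admin_state_up,
                            device_exists, set_link_state):
    """Set the link state of a tap device, tolerating its disappearance.

    device_exists(name) -> bool and set_link_state(name, state) are
    hypothetical injected helpers (assumptions, not neutron APIs).
    Returns True if the state was applied, False if the device was skipped.
    """
    # Skip quietly if the tap device is already gone (e.g. instance destroyed).
    if not device_exists(tap_name):
        LOG.debug("Tap device %s does not exist on this host, skipped",
                  tap_name)
        return False

    try:
        set_link_state(tap_name, "up" if admin_state_up else "down")
    except RuntimeError:
        # The device may vanish between the check and the command.
        # Only swallow the error if the device really is gone now.
        if device_exists(tap_name):
            raise
        LOG.debug("Tap device %s disappeared during update, skipped",
                  tap_name)
        return False
    return True
```

With this shape, a device that disappears mid-update is skipped with a debug message instead of raising into the agent loop, so no error is logged and no resync is triggered for that case.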

[1] http://logs.openstack.org/18/246318/14/check/gate-tempest-dsvm-neutron-linuxbridge/cdded9a/logs/screen-n-cpu.txt.gz#_2016-01-07_17_43_50_991
[2] http://logs.openstack.org/18/246318/14/check/gate-tempest-dsvm-neutron-linuxbridge/cdded9a/logs/screen-q-agt.txt.gz#_2016-01-07_17_43_45_961

Tags: linuxbridge
tags: added: linuxbridge
description: updated
description: updated
Assaf Muller (amuller)
Changed in neutron:
status: New → Confirmed
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/268095

Changed in neutron:
assignee: nobody → Andreas Scheuring (andreas-scheuring)
status: Confirmed → In Progress
Andreas Scheuring (andreas-scheuring) wrote :

https://review.openstack.org/#/c/276519/ has merged now, so I closed the bug.

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Andreas Scheuring (<email address hidden>) on branch: master
Review: https://review.openstack.org/268095
Reason: see https://review.openstack.org/276519

Changed in neutron:
status: Fix Committed → Fix Released