neutron-netns-cleanup explodes when trying to delete an OVS internal port

Bug #1776469 reported by Miguel Angel Ajo
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
Medium
Miguel Angel Ajo

Bug Description

Apparently, the exception is not bubbling up out of privsep, and the cleanup exits instead of retrying with ovsdb del port:

https://github.com/openstack/neutron/blob/100491cec72ecf694cc8cbd6cd17b66a191a5bd7/neutron/cmd/netns_cleanup.py#L124

def unplug_device(conf, device):
    orig_log_fail_as_error = device.get_log_fail_as_error()
    device.set_log_fail_as_error(False)
    try:
        device.link.delete()
    except RuntimeError:
        device.set_log_fail_as_error(orig_log_fail_as_error)
        # Maybe the device is OVS port, so try to delete
        ovs = ovs_lib.BaseOVS()
        bridge_name = ovs.get_bridge_for_iface(device.name)
        if bridge_name:
            bridge = ovs_lib.OVSBridge(bridge_name)
            bridge.delete_port(device.name)
        else:
            LOG.debug('Unable to find bridge for device: %s', device.name)
    finally:
        device.set_log_fail_as_error(orig_log_fail_as_error)

neutron-netns-cleanup --config-file /usr/share/neutron/neutron-dist.conf --config-dir /usr/share/neutron/l3_agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/l3_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-l3-agent --agent-type l3 -d --force

2018-06-12 11:39:26.868 254573 INFO neutron.common.config [-] Logging enabled!
2018-06-12 11:39:26.868 254573 INFO neutron.common.config [-] /usr/bin/neutron-netns-cleanup version 13.0.0.0b2.dev174
2018-06-12 11:39:26.868 254573 DEBUG neutron.common.config [-] command line: /usr/bin/neutron-netns-cleanup --config-file /usr/share/neutron/neutron-dist.conf --config-dir /usr/share/neutron/l3_agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/l3_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-l3-agent --agent-type l3 -d --force setup_logging /usr/lib/python2.7/site-packages/neutron/common/config.py:104
2018-06-12 11:39:26.869 254573 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', '/usr/share/neutron/neutron-dist.conf', '--config-file', '/etc/neutron/neutron.conf', '--config-file', '/etc/neutron/l3_agent.ini', '--config-dir', '/etc/neutron/conf.d/neutron-l3-agent', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpNU7Loh/privsep.sock']
2018-06-12 11:39:27.455 254573 INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap
2018-06-12 11:39:27.456 254573 DEBUG oslo.privsep.daemon [-] Accepted privsep connection to /tmp/tmpNU7Loh/privsep.sock __init__ /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:331
2018-06-12 11:39:27.386 254707 INFO oslo.privsep.daemon [-] privsep daemon starting
2018-06-12 11:39:27.390 254707 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2018-06-12 11:39:27.395 254707 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_NET_ADMIN|CAP_SYS_ADMIN/CAP_NET_ADMIN|CAP_SYS_ADMIN/none
2018-06-12 11:39:27.395 254707 INFO oslo.privsep.daemon [-] privsep daemon running as pid 254707
2018-06-12 11:39:27.458 254707 DEBUG oslo.privsep.daemon [-] privsep: request[140529299646096]: (1,) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443
2018-06-12 11:39:27.458 254707 DEBUG oslo.privsep.daemon [-] privsep: reply[140529299646096]: (2,) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:456
2018-06-12 11:39:27.458 254707 DEBUG oslo.privsep.daemon [-] privsep: request[140529299646096]: (3, 'neutron.privileged.agent.linux.ip_lib.list_netns', (), {}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443
2018-06-12 11:39:27.501 254707 DEBUG oslo.privsep.daemon [-] privsep: reply[140529299646096]: (4, ['qdhcp-64aa11b0-d9ff-47c3-9a44-2906bc22d724', 'qrouter-c24debdc-7bcd-40d7-90b9-32e0ec9bb11a', 'qdhcp-4b523888-7121-4133-b0c1-ff6a81a40dcd']) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:456
2018-06-12 11:39:30.179 254573 DEBUG neutron.agent.linux.utils [-] Unable to access /var/lib/neutron/dhcp/qrouter-c24debdc-7bcd-40d7-90b9-32e0ec9bb11a/pid get_value_from_file /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:254
2018-06-12 11:39:30.179 254707 DEBUG oslo.privsep.daemon [-] privsep: request[140529299646096]: (3, 'neutron.privileged.agent.linux.ip_lib.list_netns', (), {}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443
2018-06-12 11:39:30.179 254707 DEBUG oslo.privsep.daemon [-] privsep: reply[140529299646096]: (4, ['qdhcp-64aa11b0-d9ff-47c3-9a44-2906bc22d724', 'qrouter-c24debdc-7bcd-40d7-90b9-32e0ec9bb11a', 'qdhcp-4b523888-7121-4133-b0c1-ff6a81a40dcd']) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:456
2018-06-12 11:39:30.180 254573 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-c24debdc-7bcd-40d7-90b9-32e0ec9bb11a', 'netstat', '-nlp'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108
2018-06-12 11:39:30.189 254573 DEBUG oslo_rootwrap.client [-] Popen for ['sudo', 'neutron-rootwrap-daemon', '/etc/neutron/rootwrap.conf'] command has been instantiated _initialize /usr/lib/python2.7/site-packages/oslo_rootwrap/client.py:73
2018-06-12 11:39:30.327 254573 INFO oslo_rootwrap.client [-] Spawned new rootwrap daemon process with pid=255039
2018-06-12 11:39:30.411 254573 DEBUG neutron.agent.linux.utils [-] Running command: ['ps', '-o', 'ppid=', '64523'] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:92
2018-06-12 11:39:30.450 254573 DEBUG neutron.agent.linux.utils [-] Running command: ['ps', '-o', 'ppid=', '64507'] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:92
2018-06-12 11:39:30.486 254573 WARNING neutron.cmd.netns_cleanup [-] Killing (15) [64507] /usr/sbin/haproxy -Ds -f /var/lib/neutron/ns-metadata-proxy/c24debdc-7bcd-40d7-9
2018-06-12 11:39:30.486 254573 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['kill', '-15', '64507'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108
2018-06-12 11:39:30.492 254573 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-c24debdc-7bcd-40d7-90b9-32e0ec9bb11a', 'netstat', '-nlp'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108
2018-06-12 11:39:31.543 254573 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-c24debdc-7bcd-40d7-90b9-32e0ec9bb11a', 'netstat', '-nlp'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108
2018-06-12 11:39:31.615 254573 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-c24debdc-7bcd-40d7-90b9-32e0ec9bb11a', 'find', '/sys/class/net', '-maxdepth', '1', '-type', 'l', '-printf', '%f '] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108
2018-06-12 11:39:31.658 254707 DEBUG oslo.privsep.daemon [-] privsep: request[140529299646096]: (3, 'neutron.privileged.agent.linux.ip_lib.delete_interface', (u'qr-5c4dbfdf-2e', 'qrouter-c24debdc-7bcd-40d7-90b9-32e0ec9bb11a'), {}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443
2018-06-12 11:39:31.696 254707 DEBUG oslo.privsep.daemon [-] privsep: Exception during request[140529299646096]: (95, 'Operation not supported') loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:449
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 445, in loop
    reply = self._process_cmd(*msg)
  File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 428, in _process_cmd
    ret = func(*f_args, **f_kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 209, in _wrap
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 242, in delete_interface
    _run_iproute_link("del", ifname, namespace, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 131, in _run_iproute_link
    return ip.link(command, index=idx, **kwargs)
  File "/usr/lib/python2.7/site-packages/pyroute2/iproute.py", line 1310, in link
    msg_flags=msg_flags)
  File "/usr/lib/python2.7/site-packages/pyroute2/netlink/nlsocket.py", line 804, in nlm_request
    return do_try()
  File "/usr/lib/python2.7/site-packages/pyroute2/netlink/nlsocket.py", line 783, in do_try
    callback=callback)
  File "/usr/lib/python2.7/site-packages/pyroute2/netlink/nlsocket.py", line 648, in get
    raise msg['header']['error']
NetlinkError: (95, 'Operation not supported')
2018-06-12 11:39:31.697 254707 DEBUG oslo.privsep.daemon [-] privsep: reply[140529299646096]: (5, 'pyroute2.netlink.exceptions.NetlinkError', (95, 'Operation not supported')) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:456
2018-06-12 11:39:31.709 254573 ERROR neutron.cmd.netns_cleanup [-] Error unable to destroy namespace: qrouter-c24debdc-7bcd-40d7-90b9-32e0ec9bb11a: NetlinkError: (95, 'Operation not supported')
2018-06-12 11:39:31.709 254573 ERROR neutron.cmd.netns_cleanup Traceback (most recent call last):
2018-06-12 11:39:31.709 254573 ERROR neutron.cmd.netns_cleanup File "/usr/lib/python2.7/site-packages/neutron/cmd/netns_cleanup.py", line 251, in destroy_namespace
2018-06-12 11:39:31.709 254573 ERROR neutron.cmd.netns_cleanup unplug_device(conf, device)
2018-06-12 11:39:31.709 254573 ERROR neutron.cmd.netns_cleanup File "/usr/lib/python2.7/site-packages/neutron/cmd/netns_cleanup.py", line 126, in unplug_device
2018-06-12 11:39:31.709 254573 ERROR neutron.cmd.netns_cleanup device.link.delete()
2018-06-12 11:39:31.709 254573 ERROR neutron.cmd.netns_cleanup File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 549, in delete
2018-06-12 11:39:31.709 254573 ERROR neutron.cmd.netns_cleanup privileged.delete_interface(self.name, self._parent.namespace)
2018-06-12 11:39:31.709 254573 ERROR neutron.cmd.netns_cleanup File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 207, in _wrap
2018-06-12 11:39:31.709 254573 ERROR neutron.cmd.netns_cleanup return self.channel.remote_call(name, args, kwargs)
2018-06-12 11:39:31.709 254573 ERROR neutron.cmd.netns_cleanup File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 202, in remote_call
2018-06-12 11:39:31.709 254573 ERROR neutron.cmd.netns_cleanup raise exc_type(*result[2])
2018-06-12 11:39:31.709 254573 ERROR neutron.cmd.netns_cleanup NetlinkError: (95, 'Operation not supported')
2018-06-12 11:39:31.709 254573 ERROR neutron.cmd.netns_cleanup
2018-06-12 11:39:31.712 254573 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=255039

neutron l3-agent is a bit more smart while deleting ports and it will check if br-int is in the system first, then delete the port:

11:11:40.861 85586 DEBUG neutron.agent.l3.router_info [-] removing internal network: port(567309c3-a831-4086-bc77-d3523b149e74) interface(qr-567309c3-a8) internal_network_removed /usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py:478

11:11:40.862 95332 DEBUG oslo.privsep.daemon [-] privsep: request[140596640998416]: (3, 'neutron.privileged.agent.linux.ip_lib.interface_exists', (u'qr-567309c3-a8', u'qrouter-c24debdc-7bcd-40d7-90b9-32e0ec9bb11a'), {}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443
13:11 <ajo> 2018-06-12 11:11:40.877 85586 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): DelPortCommand(if_exists=True, bridge=br-int, port=qr-567309c3-a8) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84

Changed in neutron:
status: New → Incomplete
status: Incomplete → Confirmed
importance: Undecided → Medium
assignee: nobody → Miguel Angel Ajo (mangelajo)
Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

Capturing the NetlinkError exception seems to work, but leaves a very dirty log full of traces. Looking for a different solution.

Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

Oh, wait, the exception only shows up in debug mode. I guess that's ok.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/574712

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/574712
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7458575cfbc00a9bedf4d514a95e9b891639d5e8
Submitter: Zuul
Branch: master

commit 7458575cfbc00a9bedf4d514a95e9b891639d5e8
Author: Miguel Angel Ajo <email address hidden>
Date: Tue Jun 12 14:35:39 2018 +0200

    Convert missing exception on device.link.delete()

    Once we started using oslo.privsep the call to device.link.delete()
    should return RuntimeError when the device can't be handled by ip link
    for example, when it's an ovs internal device.

    Closes-Bug: #1776469

    Change-Id: Ibf4b0bbb54aef38fc569036880668c745cb5c096

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.0.0b3

This issue was fixed in the openstack/neutron 13.0.0.0b3 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.