netns deletion on newer kernels fails with errno 16

Bug #1795280 reported by Tobias Urdin on 2018-09-30
Affects: neutron
Importance: High
Assigned to: Unassigned

Bug Description

This is probably not related to neutron itself, but I need some input on it.

On CentOS 7.5 with a 3.10 kernel, creating a network and then deleting it properly terminates all processes, removes the interfaces and deletes the network namespace.

[root@controller ~]# uname -r
3.10.0-862.11.6.el7.x86_64

With a later kernel such as 4.18, some change causes the namespace deletion to fail with an OSError, errno 16 (Device or resource busy).

Before roughly kernel 3.19 the netns filesystem was provided by proc, but it has since been moved
to its own nsfs. Maybe this has something to do with it, but I haven't seen this issue on Ubuntu before.

[root@controller ~]# mount | grep qdhcp
proc on /run/netns/qdhcp-51e47959-9a2b-4372-a204-aff75de9bd01 type proc (rw,nosuid,nodev,noexec,relatime)
proc on /run/netns/qdhcp-51e47959-9a2b-4372-a204-aff75de9bd01 type proc (rw,nosuid,nodev,noexec,relatime)

[root@controller ~]# uname -r
4.18.8-1.el7.elrepo.x86_64

nsfs on /run/netns/qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb type nsfs (rw,seclabel)
nsfs on /run/netns/qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb type nsfs (rw,seclabel)
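To compare hosts quickly, the filesystem type backing each namespace mount can be pulled out of the `mount` output. This is only a rough sketch: the `netns_mount_types` helper and its `prefix` parameter are hypothetical names, and the regular expression assumes the line layout shown in the two outputs above.

```python
import re

# Assumed layout, taken from the mount output quoted above:
#   <source> on <target> type <fstype> (<options>)
MOUNT_RE = re.compile(
    r"^(?P<source>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<options>[^)]*)\)$"
)


def netns_mount_types(mount_output, prefix="/run/netns/"):
    """Map each mount point under `prefix` to the set of filesystem types seen."""
    result = {}
    for line in mount_output.splitlines():
        match = MOUNT_RE.match(line.strip())
        if match and match.group("target").startswith(prefix):
            result.setdefault(match.group("target"), set()).add(match.group("fstype"))
    return result


# Sample lines copied from the two kernels in this report.
sample = """\
proc on /run/netns/qdhcp-51e47959-9a2b-4372-a204-aff75de9bd01 type proc (rw,nosuid,nodev,noexec,relatime)
nsfs on /run/netns/qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb type nsfs (rw,seclabel)
"""
print(netns_mount_types(sample))
```

On a live host the input could come from `subprocess.check_output(["mount"])` decoded to text.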

Perhaps someone from CentOS or Red Hat can chime in on this.

I can reproduce this every single time:
* Create a network; this spawns dnsmasq, haproxy and the interfaces in a netns
* Delete the network; this terminates all processes and deletes the interfaces, but the netns cannot be deleted and the error below is thrown

Seen on both Queens and Rocky, for what it's worth.

2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent [req-28a9e37f-a2ca-4375-a3f0-8384711414dd - - - - -] Unable to disable dhcp for 1fb24615-fd9e-4804-aade-5668bb2cdecb.: OSError: [Errno 16] Device or resource busy
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 144, in call_driver
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent getattr(driver, action)(**action_kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 241, in disable
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent self._destroy_namespace_and_port()
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 255, in _destroy_namespace_and_port
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent ip_lib.delete_network_namespace(self.network.namespace)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 1105, in delete_network_namespace
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent privileged.remove_netns(namespace, **kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 207, in _wrap
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent return self.channel.remote_call(name, args, kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 202, in remote_call
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent raise exc_type(*result[2])
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent OSError: [Errno 16] Device or resource busy
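For anyone triaging similar logs, the errno on the last traceback line can be decoded with the Python standard library. On Linux, errno 16 is `EBUSY`. The `is_busy_error` helper below is a hypothetical name used only for illustration; it is not part of neutron or oslo.privsep.

```python
import errno
import os


def is_busy_error(exc):
    """Return True if `exc` is an OSError carrying EBUSY (errno 16 on Linux)."""
    return isinstance(exc, OSError) and exc.errno == errno.EBUSY


try:
    # Simulate the error from the traceback; nothing is deleted here.
    raise OSError(errno.EBUSY, os.strerror(errno.EBUSY))
except OSError as exc:
    print(errno.errorcode[exc.errno], is_busy_error(exc))
```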

Tobias Urdin (tobias-urdin) wrote :

[root@controller ~]# rpm -qa | grep pyroute2
python2-pyroute2-0.4.21-1.el7.noarch

Tobias Urdin (tobias-urdin) wrote :

I should note that the namespace can be deleted by running `ip netns delete <namespace>` as the root user, or with a pyroute2 script running as root.

It only fails when neutron does the deletion.
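The pyroute2 script itself is not included in this report. A minimal sketch of what such a script might look like follows, assuming pyroute2's `netns.remove` and `netns.listnetns` functions. The `delete_netns` wrapper and its `remove` and `listing` parameters are hypothetical, added only so the logic can be exercised without root or pyroute2.

```python
def delete_netns(name, remove=None, listing=None):
    """Delete namespace `name`; return True if it is no longer listed afterwards.

    `remove` and `listing` are hypothetical injection points for testing.
    By default they resolve to pyroute2.netns.remove and pyroute2.netns.listnetns.
    """
    if remove is None or listing is None:
        # Imported lazily so the helper can be tested without pyroute2 installed.
        from pyroute2 import netns
        remove = remove or netns.remove
        listing = listing or netns.listnetns
    # An OSError such as errno 16 is deliberately left to propagate to the caller.
    remove(name)
    return name not in listing()
```

Run as root, `delete_netns("qdhcp-<network-id>")` would attempt the removal directly, outside of neutron and its privsep daemon, which is the comparison described above.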

Brian Haley (brian-haley) wrote :

It does look related to the pyroute2 bug linked in #1. I'll follow that as well to see if a fix is released soon. I haven't seen the failure locally, but I don't think I'm running this kernel version either.

Changed in neutron:
importance: Undecided → High
Nate Johnston (nate-johnston) wrote :

Hi! I don't have a system with that kernel version handy at the moment, so I cannot confirm that I see the issue as well. But I am looking for a subject matter expert at Red Hat.

When you run the `ip netns delete` command are you running within a container as neutron does, or are you not in a container at that point? If outside, try running it inside, and compare the pyroute2 versions inside and outside the container.

Tobias Urdin (tobias-urdin) wrote :

Thanks for your replies. I'm not running Neutron in a container.
I tried to see if I could execute the netns deletion using neutron-rootwrap (we are using the rootwrap-daemon) but I don't remember if it worked.

Best regards
