ip netns list starts without root_helper

Bug #1311804 reported by Sergey Vasilenko
26
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Invalid
Undecided
Unassigned
neutron
Fix Released
Medium
Kevin Benton

Bug Description

CENTOS 6.5
Reproduced on typical Openstack installation in any segmentation type with one L3-agent.

In ip_lib in IpNetnsComand losted root_helper.
Without it L3 agent can't create interfaces inside network namespace, because in Centos 'ip netns list' returns empty list if start without root privileges.

[root@node-2 ~]# uname -a
Linux node-2.domain.tld 2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 19:59:55 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@node-2 ~]# ip netns list
haproxy
qrouter-b582586e-70e3-4a38-8b19-039f30ce87a9
[root@node-2 ~]# su -m neutron -c 'ip netns list'
[root@node-2 ~]#

in the log below exception happens because namespace already exists (see full log in attach), but can't detected by ip netns list without root_wrapper.

2014-04-23 16:15:44.760 28240 DEBUG neutron.agent.linux.utils [req-d0f812f6-d987-45f5-9cff-11f1fa52fed6 None] Running command: ['ip', '-o', 'netns', 'list'] create_process /
usr/lib/python2.6/site-packages/neutron/agent/linux/utils.py:48
2014-04-23 16:15:44.781 28240 DEBUG neutron.agent.linux.utils [req-d0f812f6-d987-45f5-9cff-11f1fa52fed6 None]
Command: ['ip', '-o', 'netns', 'list']
Exit code: 0
Stdout: ''
Stderr: '' execute /usr/lib/python2.6/site-packages/neutron/agent/linux/utils.py:74
2014-04-23 16:15:44.782 28240 DEBUG neutron.agent.linux.utils [req-d0f812f6-d987-45f5-9cff-11f1fa52fed6 None] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/roo
twrap.conf', 'ip', 'netns', 'add', 'qrouter-b582586e-70e3-4a38-8b19-039f30ce87a9'] create_process /usr/lib/python2.6/site-packages/neutron/agent/linux/utils.py:48
2014-04-23 16:15:44.864 28240 DEBUG neutron.agent.linux.utils [req-d0f812f6-d987-45f5-9cff-11f1fa52fed6 None]
Command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'add', 'qrouter-b582586e-70e3-4a38-8b19-039f30ce87a9']
Exit code: 255
Stdout: ''
Stderr: 'Could not create /var/run/netns/qrouter-b582586e-70e3-4a38-8b19-039f30ce87a9: File exists\n' execute /usr/lib/python2.6/site-packages/neutron/agent/linux/utils.py:7
4
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/eventlet/greenpool.py", line 80, in _spawn_n_impl
    func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/neutron/agent/l3_agent.py", line 438, in process_router
    p['ip_cidr'], p['mac_address'])
  File "/usr/lib/python2.6/site-packages/neutron/agent/l3_agent.py", line 707, in internal_network_added
    prefix=INTERNAL_DEV_PREFIX)
  File "/usr/lib/python2.6/site-packages/neutron/agent/linux/interface.py", line 195, in plug
    namespace_obj = ip.ensure_namespace(namespace)
  File "/usr/lib/python2.6/site-packages/neutron/agent/linux/ip_lib.py", line 136, in ensure_namespace
    ip = self.netns.add(name)
  File "/usr/lib/python2.6/site-packages/neutron/agent/linux/ip_lib.py", line 446, in add
    self._as_root('add', name, use_root_namespace=True)
  File "/usr/lib/python2.6/site-packages/neutron/agent/linux/ip_lib.py", line 217, in _as_root
    kwargs.get('use_root_namespace', False))
  File "/usr/lib/python2.6/site-packages/neutron/agent/linux/ip_lib.py", line 70, in _as_root
    namespace)
  File "/usr/lib/python2.6/site-packages/neutron/agent/linux/ip_lib.py", line 81, in _execute
    root_helper=root_helper)
  File "/usr/lib/python2.6/site-packages/neutron/agent/linux/utils.py", line 76, in execute
    raise RuntimeError(m)
RuntimeError:
Command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'add', 'qrouter-b582586e-70e3-4a38-8b19-039f30ce87a9']
Exit code: 255

Revision history for this message
Sergey Vasilenko (xenolog) wrote :
description: updated
Changed in neutron:
status: New → Incomplete
description: updated
description: updated
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Can't reproduce that on Centos 6.5 x86_64
ip netns list seems to show correct list of network namespaces being run under regular user

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

I can completely confirm this issue.

[root@node-7 ~]# cat /etc/redhat-release
CentOS release 6.5 (Final)
[root@node-7 ~]# ip netns list
qrouter-0f3f6a72-d209-4ff7-a8fa-7a143f6bcaaf
qdhcp-58e83332-5db8-444c-87f1-fef7c0dcafd9
haproxy
[root@node-7 ~]# su neutron -s '/bin/bash'
bash-4.1$ ip netns list
bash-4.1$

Changed in neutron:
status: Incomplete → Confirmed
Mike Scherbakov (mihgen)
tags: added: icehouse-backport-potential
Revision history for this message
Brian Haley (brian-haley) wrote :

So since people are seeing different behaviors, is there some underlying security setting controlling this? It would be good to know why that difference exists.

I say this because we spent some time in Icehouse, and will be doing more in Juno, to reduce calls using rootwrap to increase performance in Neutron. There is even a design summit session to discuss this:

https://etherpad.openstack.org/p/neutron-agent-exec-performance
http://junodesignsummit.sched.org/event/21b51dfcc1d4098383904f923dfb9a74 (Friday 2:10 timeslot)

And perhaps if there is no other solution, maybe it's possible to detect when we're on such a distro (or have a config flag), and use rootwrap only then, as there migt be other places we also need to do this. I just don't want to take a step backwards and impact all the other distros because of this one if possible.

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Brian, it appears to be a particular deployment setting which resulted in /var/run/netns having 751 mode instead of regular 755. That has caused the behavior described in this bug.

Revision history for this message
Brian Haley (brian-haley) wrote :

Sorry, I didn't get your update since I hadn't subscribed to this bug.

Is this deployment setting something standard with RHEL, or is it something added ? Do you feel this is still a bug or was it an oversight in those settings?

Revision history for this message
Ryan Moe (rmoe) wrote :

This was only a problem on CentOS. The issue was that Pacemaker sets the umask to 026 (this is hard-coded) which gets inherited by all processes that get launched. A umask of 026 is how we ended up with 751 permissions on a bunch of different things.

Changed in neutron:
status: Confirmed → Invalid
no longer affects: fuel/5.0.x
Changed in fuel:
status: New → Invalid
Revision history for this message
Kevin Benton (kevinbenton) wrote :

This bug should not have been closed for neutron. The performance argument is weak because this code is only called during interface creation. The trade-off for a tiny bit of performance in infrequent code for neutron completely breaking in some environments is unacceptable IMO. This already wasted an afternoon on my part any I'm relatively familiar with the neutron code. This would be nearly impossible to debug for a deployer.

Changed in neutron:
status: Invalid → Confirmed
Changed in neutron:
assignee: nobody → Kevin Benton (kevinbenton)
status: Confirmed → In Progress
Revision history for this message
Brian Haley (brian-haley) wrote :

But shouldn't we fix the thing that is changing the permissions? With the right (wrong?) /sbin/ip and sudo and 10K interfaces, there is overhead. That's only my opinion of course.

Revision history for this message
Kevin Benton (kevinbenton) wrote :

Yes, but we don't know what is changing permissions in all cases. We are much better off here programming defensively rather than going for extreme optimization for something that isn't the hot path compared to security group processing, RPC calls, etc.

10k interfaces on one node is pretty unlikely, but even in that case processing 10,000 sudo calls added about 42 seconds to the whole process, which is definitely not going to be the bottleneck in creating that many interfaces simultaneously.

$ python
Python 2.7.3 (default, Feb 27 2014, 19:58:35)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.timeit("import os; os.system('ip netns list')", number=10000)
15.637353897094727
>>> timeit.timeit("import os; os.system('sudo ip netns list')", number=10000)
57.93700289726257
>>>

Revision history for this message
Andrew Woodward (xarses) wrote :

Brian, Kevin,

Forgot about this one. The issue we had was as described related to permissions, In our case the permissions where being mucked up by pacemaker in CentOS having a wrong mask hard-coded into it. As Brian notes the correct solution is fixing permissions unless for some reason the permissions are bad due to other security concerns. I do however think that Neutron should be testing for this during start up and yell at you over it

Revision history for this message
Kevin Benton (kevinbenton) wrote :

Andrew,

I ran into the same issue and I really think we should just have neutron elevate here. Otherwise we have to add a bunch of code to inspect permissions and umask variables all just to have neutron fail with a better error message. Still not very user friendly.

Revision history for this message
Brian Haley (brian-haley) wrote :

Well I can "see" 5K interfaces on one network node here, and that doesn't include the other end of the veth pair that it's plumbing. The problem we were seeing with the sudo (and /sbin/ip) overhead was there was a cumulative effect when the number of network interfaces was high, and when you make 10-15 sudo calls creating a neutron router it starts adding up. That's why I'm pushing back here as we saw this change helped.

Revision history for this message
Kevin Benton (kevinbenton) wrote :

So in this case that would add ~42 seconds to create every single router interface on that node based on my numbers above with my dev VM. Do you have any other numbers to get an idea of how large the percentage of time this would take?

Revision history for this message
Kevin Benton (kevinbenton) wrote :

To clarify ~42 seconds to create all of the router interfaces simultaneously. Not 42 seconds per interface.

Revision history for this message
Kevin Benton (kevinbenton) wrote :

Hi Brian and Andrew,

Can you check the latest patch? (https://review.openstack.org/#/c/109736/)

I've updated it with logic to only use the root_helper for every call if it's determined to be necessary.
Let me know if this doesn't address your concerns.

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/4.1.x
Changed in fuel:
milestone: none → 4.1.1
Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

I agree with Kevin Benton here, being a bit defensive, and at least providing a warning
to the administrator, so it can take action and fix it.

It's better to make it work (even at degraded performance), and WARN the deployer
about the problem, that, not working, and making it extremely difficult to the deployer
to work, because it will blame neutron at the end.

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

AFAIK sudo is not the same as rootwrap in terms of performance, so it's not just 42 seconds. I would expect it to be ~10x times more.

Changed in neutron:
importance: Undecided → Low
Changed in neutron:
importance: Low → Medium
Revision history for this message
Sergey Vasilenko (xenolog) wrote :

Guys, I suggest another solution for this issues.

at the start of *-agent we should create test network namespace (with random name), check, that we can see it without root-helper.
If we can't see newly created net.namespace, but can delete it without error -- we has misconfigured permissions on host system.
In this situation agent shouldn't start and should log error message to the log file.

This solution allows didn't waste system time for run ip-netns-list under root-wrapper. At this time it's allows uniquely determine what error happens and why *-agent don't work properly.

Revision history for this message
Kevin Benton (kevinbenton) wrote :

Sergey,

The patch I proposed will determine if the root_helper is required by checking the output without the root_helper. If it's determined that the helper is required, it uses it. If not, it doesn't.

Do you see a problem with that approach?

Revision history for this message
Sergey Vasilenko (xenolog) wrote :

I think it's a good approach.
may be will be good make configurable option -- use root_wrapper or fail a start. But may be it's a overlogic.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Salvatore Orlando (<email address hidden>) on branch: master
Review: https://review.openstack.org/89907
Reason: This patch has been inactive long enough that I think it's safe to abandon.
The author can resurrect it if needed.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/109736
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9833364fbd4705fc4a563192cf2707ffe8cf763d
Submitter: Jenkins
Branch: master

commit 9833364fbd4705fc4a563192cf2707ffe8cf763d
Author: Kevin Benton <email address hidden>
Date: Fri Jul 25 14:27:00 2014 -0700

    Option for root_helper when checking namespace

    Adds a configuration option to use the root helper in the ip netns list
    command executed by the IP library when checking for the existence of a
    namespace. This prevents an unprivileged l3 agent from erroneously trying
    to create another namespace when one already exists. This is necessary in
    environments with constrained permissions on /var/run/netns via umask or
    other access controls.

    However, due to the overhead incurred by calling sudo every time on systems
    where this restriction isn't in place, this configuration won't be desired
    all of the time. So this patch also adds a sanity check that reports back
    whether or not the root_helper is required for a deployment.

    DocImpact

    Closes-Bug: #1348812
    Closes-Bug: #1311804
    Change-Id: If7560161de3be6066af0d9866e6b5cd7c7247c33

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-2 → 2015.1.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/227589
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=81823e86328e62850a89aef9f0b609bfc0a6dacd
Submitter: Jenkins
Branch: master

commit 81823e86328e62850a89aef9f0b609bfc0a6dacd
Author: John Kasperski <email address hidden>
Date: Thu Sep 24 18:16:18 2015 -0500

    Improve performance of ensure_namespace

    The ensure_namespace method calls IpNetnsCommand.exists to
    determine if the specified namespace exists or not. This is
    accomplished by listing all namespaces with "ip netns list"
    and then looping through the output to determine if the specified
    namespace was included in the output.

    Research of various Linux operating systems has indicated that
    namespaces are represented as files in /var/run/netns and root
    authority is "typically" not required in order to look at the
    files in this subdirectory.

    The existing configuration option "use_helper_for_ns_read"
    will be used to determine if the root-helper should be used to
    to retrieve the list of namespaces. If this configuraton option
    is set to False, the native python os.listdir(/var/run/netns)
    will be used.

    Related-Bug: #1311804
    Closes-Bug: #1497396
    Change-Id: I9da627d07d6cbb6e5ef1a921a5f22963317a04e2

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.