oslo-rootwrap-daemon performing badly in docker containers (centos/fedora)

Bug #1760471 reported by yatin
This bug affects 3 people
Affects: oslo.rootwrap
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Found while investigating multiple issues with SSH timeouts to VMs while running tempest tests:
https://bugs.launchpad.net/tripleo/+bug/1757556, specifically this comment: https://bugs.launchpad.net/tripleo/+bug/1757556/comments/9

- Without containers, l3 agent performance was good.
- Then found that something was wrong with neutron-rootwrap-daemon, which led to oslo-rootwrap-daemon.
- Ran cProfile on one operation that the l3 agent does, on both the host (https://review.rdoproject.org/paste/show/95/) and a container (https://review.rdoproject.org/paste/show/96/); the results show the performance difference.

In case the paste output is lost, copying some of it here. Most of the time on the container was spent in the calls below; on the host the same calls take negligible time:
42 0.409 0.010 0.409 0.010 {method 'recv_into' of '_socket.socket' objects}
1 0.366 0.366 0.366 0.366 {posix.read}
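
For reference, a profile like that can be collected with something along these lines (a sketch, not the exact invocation used for the pastes above; the daemon command, conf path and namespace are illustrative):

    import cProfile
    import pstats

    from oslo_rootwrap import client

    # Talk to a rootwrap daemon the way the l3 agent does; the daemon command
    # and config path here are examples.
    daemon = client.Client(
        ["sudo", "neutron-rootwrap-daemon", "/etc/neutron/rootwrap.conf"])

    # Profile a single privileged command and print the hot spots.
    profiler = cProfile.Profile()
    profiler.runcall(daemon.execute, ["ip", "netns", "exec", "bench_ns", "ip", "a"])
    pstats.Stats(profiler).sort_stats("time").print_stats(10)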

Then found that oslo.rootwrap ships a benchmark, tried that, and could reproduce the issue easily in CentOS and Fedora containers.
I tried CentOS 7 and Fedora 26 hosts with both centos:7 and fedora:27 containers, and all show similar results.

Steps to reproduce in containers:
1) centos:7 container
sudo docker run -it --privileged=true centos:7 /bin/bash
yum -y install sudo git gcc python-devel iproute
git clone https://github.com/openstack/oslo.rootwrap
curl -s https://bootstrap.pypa.io/get-pip.py | sudo python
pip install tox
cd oslo.rootwrap
tox -ebenchmark

2) fedora:27 container
sudo docker run -it --privileged=true fedora:27 /bin/bash
yum -y install sudo git gcc python-devel iproute
git clone https://github.com/openstack/oslo.rootwrap
curl -s https://bootstrap.pypa.io/get-pip.py | sudo python
pip install tox
cd oslo.rootwrap
tox -ebenchmark

3) fedora:27 container with python3
sudo docker run -it --privileged=true fedora:27 /bin/bash
yum -y install sudo git gcc python3-devel iproute
git clone https://github.com/openstack/oslo.rootwrap
curl -s https://bootstrap.pypa.io/get-pip.py | sudo python
pip install tox
cd oslo.rootwrap

Change tox.ini to set basepython=python3 in [testenv:benchmark] and replace python with python3, and fix line 58 of benchmark/benchmark.py so the stderr assertion also accepts empty bytes under Python 3 (see the sketch after the tox command below).

tox -ebenchmark
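
The benchmark.py change described above, shown as a before/after sketch (taken from the description; the surrounding benchmark code is omitted):

    # benchmark/benchmark.py, around line 58
    # before (fails under Python 3 because err is bytes, not str):
    #     assert err == "", "Stderr not empty:\n" + err
    # after (also accept empty bytes):
    assert err == "" or err == b'', "Stderr not empty:\n" + str(err)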

Benchmark results on the host and in centos:7/fedora:27 containers with py2 and py3: http://paste.openstack.org/show/718122/

Copying just some results from the above to show the big difference (columns are min / avg / max / dev):
On host:- daemon.run('ip netns exec bench_ns ip a') : 11.913ms 20.691ms 57.185ms 6.777ms
On container(py2): daemon.run('ip netns exec bench_ns ip a') : 277.928ms 313.805ms 363.981ms 16.139ms
On container(py3): daemon.run('ip netns exec bench_ns ip a') : 13.048ms 21.600ms 37.930ms 5.663ms

So far this has only been observed in containers when running with python2; on the host it is fine with python2.
The exact root cause is still not clear. dmsimard has started with easy reproducers: https://github.com/dmsimard/oslo.rootwrap-benchmark, which is a good starting point for finding the root cause.

Changed in oslo.rootwrap:
status: New → Confirmed
Victor Stinner (vstinner) wrote :

The performance difference likely comes from close_fds=True of subprocess.Popen. On Python 2, Popen calls close(fd) on all file descriptors from 3 to SC_OPEN_MAX. On my Fedora 27 "host", SC_OPEN_MAX is 1,024. But in docker, SC_OPEN_MAX is... 1,048,576: 1,000x larger.

On Python 3, Popen is smarter. On Linux, it lists the content of /proc/self/fd/ to only close open file descriptors. It doesn't depend on the SC_OPEN_MAX value.

The quick workaround is to use close_fds=False, but this change can have an impact on security. close_fds=True is a cheap protection to prevent leaking file descriptors to child processes.

See also my PEP 446 which made file descriptors not inheritable by default in Python 3.4.
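
A minimal way to see the effect described above (not from the comment; run it on the host and inside a container, under Python 2 and Python 3, and compare):

    import os
    import subprocess
    import time

    # SC_OPEN_MAX is typically 1,024 on a host but can be 1,048,576 in docker.
    print("SC_OPEN_MAX: %d" % os.sysconf("SC_OPEN_MAX"))

    # Rough per-spawn cost with and without close_fds; on Python 2 inside
    # docker the close_fds=True case is dramatically slower.
    for close_fds in (True, False):
        start = time.time()
        for _ in range(50):
            subprocess.Popen(["/bin/true"], close_fds=close_fds).wait()
        elapsed_ms = (time.time() - start) * 1000 / 50
        print("close_fds=%s: %.3f ms per spawn" % (close_fds, elapsed_ms))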

Victor Stinner (vstinner) wrote :

By default, docker containers inherit ulimits from the limits of the docker daemon. On Fedora 27, the daemon gets a NOFILE=1048576 limit from LimitNOFILE=1048576 in /usr/lib/systemd/system/docker.service.

This big limit impacts many services. A quick search found two issues unrelated to Python:

* slapd: "Memory usage in container is 2 orders of magnitude higher for slapd."
  https://github.com/moby/moby/issues/8231#issue-43921400
* rpm: "Slow performance when installing RPMs to create new Docker images"
  https://github.com/moby/moby/issues/23137#issuecomment-359140849

One solution is to run the impacted containers with lower NOFILE ulimit:

sudo docker run --ulimit nofile=1024:1024 ...
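
A quick way to check which NOFILE limit a process actually sees (a sketch; run it both on the host and inside the container to compare):

    import resource

    # Returns (soft, hard); expect something like (1024, 4096) on a typical
    # host and (1048576, 1048576) inside a default docker container.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("NOFILE soft=%d hard=%d" % (soft, hard))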

Michele Baldessari (michele) wrote :

Another option would be to lower the global docker NOFILE ulimit in the docker systemd unit, but only when run in CI.

Unless we can easily define A) which containers are affected and B) what is a reasonable FD limit.

Maybe a comparison with the limits we had on baremetal would give us an answer there.

Victor Stinner (vstinner) wrote :

Latest Docker changes on NOFILE ulimit:

* 2014: https://github.com/moby/moby/commit/7abe70c0b1018729006fd5d614ada2dc67edb9b2
* 2016: https://github.com/moby/moby/commit/428d7337e808ec5f4dba1b0aceda002f295cc320

It seems like the 1048576 value (2^20) comes from the maximum allowed value in RHEL 5:
http://stackoverflow.com/a/1213069/1811501

"RHEL 5 has a maximum value of 1048576 (220) for this limit (NR_OPEN in /usr/include/linux/fs.h), and will not accept any larger value including infinity, even for root. So on RHEL 5 you can use this value in /etc/security/limits.conf and that is as close as you are going to get to infinity."

yatin (yatinkarel) wrote :

<< Another option would be to lower the global docker NOFILE ulimit in the docker systemd unit, but only when run in CI.
I think we can start by setting the nofile limit for the neutron agent containers (and other services that rely on oslo-rootwrap-daemon).

yatin (yatinkarel) wrote :

For example, something similar to the below:

# docker run --privileged=true --ulimit nofile=4096 -it dmsimard/oslo.rootwrap-benchmark:centos-7
Thu Apr 5 12:54:13 UTC 2018
/oslo.rootwrap /
Tox version: 2.9.1 imported from /usr/lib/python2.7/site-packages/tox/__init__.pyc
Running 'ip a':
        method : min avg max dev
                   ip a : 3.766ms 3.913ms 5.515ms 292.543us
              sudo ip a : 12.332ms 13.083ms 15.894ms 645.875us
sudo rootwrap conf ip a : 189.039ms 196.875ms 226.927ms 7.126ms
     daemon.run('ip a') : 8.731ms 11.053ms 196.931ms 18.695ms
Running 'ip netns exec bench_ns ip a':
                    method : min avg max dev
              sudo ip netns exec bench_ns ip a : 12.732ms 13.297ms 14.469ms 337.529us
sudo rootwrap conf ip netns exec bench_ns ip a : 190.024ms 195.455ms 220.363ms 4.874ms
     daemon.run('ip netns exec bench_ns ip a') : 9.833ms 10.229ms 11.801ms 303.166us

real 0m44.696s
user 0m25.421s
sys 0m17.029s
/

Thu Apr 5 12:54:58 UTC 2018
# docker run --privileged=true --ulimit nofile=24096 -it dmsimard/oslo.rootwrap-benchmark:centos-7
Thu Apr 5 12:56:06 UTC 2018
/oslo.rootwrap /
Tox version: 2.9.1 imported from /usr/lib/python2.7/site-packages/tox/__init__.pyc
Running 'ip a':
        method : min avg max dev
                   ip a : 3.738ms 3.833ms 5.314ms 163.538us
              sudo ip a : 11.690ms 11.940ms 13.434ms 206.461us
sudo rootwrap conf ip a : 186.247ms 193.332ms 212.408ms 5.487ms
     daemon.run('ip a') : 30.680ms 33.268ms 241.550ms 20.935ms
Running 'ip netns exec bench_ns ip a':
                    method : min avg max dev
              sudo ip netns exec bench_ns ip a : 12.639ms 13.422ms 17.026ms 0.962ms
sudo rootwrap conf ip netns exec bench_ns ip a : 187.968ms 196.106ms 264.668ms 10.389ms
     daemon.run('ip netns exec bench_ns ip a') : 32.311ms 33.184ms 39.456ms 865.649us

real 0m48.815s
user 0m25.134s
sys 0m16.926s

Alan Pevec (apevec) wrote :

> but only when run in CI.

@Michele why is that? It would hit production too, wouldn't it?

Miguel Angel Ajo (mangelajo) wrote :

This should be fixed for production too.

At least all neutron agents need the fd limit lowered back to 1024.

Also I'd look at other OpenStack services that spawn system commands (nova-compute comes to mind, but I guess there will be more).

Miguel Angel Ajo (mangelajo) wrote :

nofile=24096 is still too high for any neutron agent's needs. In the past (baremetal) we were running with 1024, and that is fine; it brings rootwrap-daemon execution back to 3ms. This is a critical factor for many customers. We spent a lot of time coming up with the rootwrap-daemon solution, which helped a lot with scale; otherwise we'd be looking at a regression.

Jiří Stránský (jistr) wrote :

I'd be careful with lowering the file limit globally; I think e.g. MariaDB needs at least 16K for common operation.

What we probably could/should do is support --ulimit on a per-container basis in paunch. The patch would look similar to this [1]. A slight difference though: I think --ulimit is an option that can appear on the command line multiple times, so we should approach it more like the --volume option: an array in the paunch config which is then iterated over [2] (see the rough sketch after the links below).

[1] https://github.com/openstack/paunch/commit/4a4f43ac36c3edc2645c8fff5cf783415ea3f1cf
[2] https://github.com/openstack/paunch/blob/4a4f43ac36c3edc2645c8fff5cf783415ea3f1cf/paunch/builder/compose1.py#L177-L179
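
A rough illustration of that approach (not the actual paunch code; the function and config key names are hypothetical), mirroring how list-type options such as --volume are expanded into repeated command-line arguments:

    # Hypothetical sketch: expand a list of ulimit entries from a container's
    # paunch config into repeated --ulimit arguments for docker run.
    def add_ulimit_args(cmd, container_config):
        for limit in container_config.get('ulimit', []):
            cmd.append('--ulimit')
            cmd.append(limit)

    # e.g. container_config = {'ulimit': ['nofile=1024', 'nproc=1024']}
    # would yield: --ulimit nofile=1024 --ulimit nproc=1024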

yatin (yatinkarel) wrote :

<<< nofile=24096 is still too high for any neutron agent's needs. In the past (baremetal) we were running with 1024, and that is fine; it brings rootwrap-daemon execution back to 3ms. This is a critical factor for many customers. We spent a lot of time coming up with the rootwrap-daemon solution, which helped a lot with scale; otherwise we'd be looking at a regression.
Yes, 24096 is too high; I just used this number to show the performance difference with the reproducer.
1024 should be enough, at least for the agents, as they used to run with this, and it can be done as jistr mentioned.

I tried locally what jistr mentioned, handling it like volumes, since --ulimit can be passed multiple times to docker run. So we can start with the paunch and Heat template changes.

The following should also be considered if the default limit is being changed:
the glance and cinder package changes that increased the file and process ulimits to 131072, with the reasoning why they need it: https://review.rdoproject.org/r/#/c/1360/, https://review.rdoproject.org/r/#/c/1364/

Alan Pevec (apevec) wrote :

> For glance and cinder package changes to increase the ulimit file and process to 131072
> with reasoning why they need it

FWIW those were lost when switching to containers since systemd service files are not used
to start containerized services in TripleO.

Alan Pevec (apevec) wrote :

Related THT review (the Related-Bug magic comment didn't link it here for some reason): https://review.openstack.org/559268 "Set ulimit for neutron agent containers"
