oslo-rootwrap-daemon performing badly in docker containers(centos/fedora)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
oslo.rootwrap | Confirmed | Undecided | Unassigned |
Bug Description
Found while investigating multiple SSH timeout issues to VMs during tempest test runs:
https:/
- Without containers, l3 agent performance was good.
- Then found that something is wrong with neutron-
- Ran cProfile on one operation that the l3 agent performs, on both host(https:/
In case the paste output is lost, copying some of it here. Most of the time in the container was spent in the calls below; on the host the same calls take negligible time:
42 0.409 0.010 0.409 0.010 {method 'recv_into' of '_socket.socket' objects}
1 0.366 0.366 0.366 0.366 {posix.read}
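A profile like the one above can be collected with a few lines of Python. This is only a sketch for Python 3: `run_once` is a hypothetical stand-in for the l3 agent operation (it just spawns a trivial child process), not the code that was actually profiled.

```python
import cProfile
import io
import pstats
import subprocess
import sys


def run_once():
    # Hypothetical stand-in for the profiled operation: spawn one child.
    subprocess.check_call([sys.executable, "-c", "pass"], close_fds=True)


def profile_top(func, limit=5):
    """Profile func() and return the top entries sorted by own time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()


report = profile_top(run_once)
print(report)
```

Running the same script on the host and inside the container and comparing the top entries is one way to see where the extra time goes.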
Then found that oslo.rootwrap ships a benchmark, so tried that and could reproduce the problem easily in CentOS and Fedora containers:
I tried on CentOS 7 and Fedora 26 hosts with both centos:7 and fedora:27 containers, and all gave similar results.
Steps to reproduce on containers:-
1) centos:7 container
sudo docker run -it --privileged=true centos:7 /bin/bash
yum -y install sudo git gcc python-devel iproute
git clone https:/
curl -s https:/
pip install tox
cd oslo.rootwrap
tox -ebenchmark
2) fedora:27 container
sudo docker run -it --privileged=true fedora:27 /bin/bash
yum -y install sudo git gcc python-devel iproute
git clone https:/
curl -s https:/
pip install tox
cd oslo.rootwrap
tox -ebenchmark
3) fedora:27 container with python3
sudo docker run -it --privileged=true fedora:27 /bin/bash
yum -y install sudo git gcc python3-devel iproute
git clone https:/
curl -s https:/
pip install tox
cd oslo.rootwrap
Change tox.ini to set basepython=python3 in [testenv:benchmark], change python --> python3, and apply a fix on line 58 of benchmark/
tox -ebenchmark
benchmark results on host, centos7/fedora27 container with py2 and py3: http://
Copying just some results here from above to show the big difference:-
On host:- daemon.run('ip netns exec bench_ns ip a') : 11.913ms 20.691ms 57.185ms 6.777ms
On container(py2): daemon.run('ip netns exec bench_ns ip a') : 277.928ms 313.805ms 363.981ms 16.139ms
On container(py3): daemon.run('ip netns exec bench_ns ip a') : 13.048ms 21.600ms 37.930ms 5.663ms
So far this only happens in containers running Python 2; on the host, Python 2 is fine.
The exact root cause is still not clear; dmsmirad has started working on easy reproducers: https:/
Changed in oslo.rootwrap:
status: New → Confirmed
The performance difference likely comes from close_fds=True of subprocess.Popen. On Python 2, Popen calls close(fd) on all file descriptors from 3 to SC_OPEN_MAX. On my Fedora 27 "host", SC_OPEN_MAX is 1,024. But in docker, SC_OPEN_MAX is... 1,048,576: 1,024x larger.
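To compare the limit between host and container, something like the following can be run in both places. It is a rough sketch: `close_attempts_per_spawn` is a hypothetical helper that only estimates how many descriptors a Python 2 style "close everything from 3 up to SC_OPEN_MAX" pass would visit; it does not close anything.

```python
import os
import resource

FIRST_NON_STD_FD = 3  # 0, 1 and 2 are stdin, stdout and stderr


def fd_limits():
    """Return (SC_OPEN_MAX, soft RLIMIT_NOFILE, hard RLIMIT_NOFILE)."""
    sc_open_max = os.sysconf("SC_OPEN_MAX")
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return sc_open_max, soft, hard


def close_attempts_per_spawn(sc_open_max):
    # Estimate only: descriptors visited by a close pass over
    # the range 3 .. SC_OPEN_MAX - 1 for every spawned child.
    return max(sc_open_max - FIRST_NON_STD_FD, 0)


sc_open_max, soft, hard = fd_limits()
print("SC_OPEN_MAX=%d soft=%d hard=%d" % (sc_open_max, soft, hard))
print("close attempts per spawn: %d" % close_attempts_per_spawn(sc_open_max))
```

With the values quoted above, that is 1,021 attempts per child on the host versus 1,048,573 in the container.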
On Python 3, Popen is smarter. On Linux, it lists the content of /proc/self/fd/ to only close open file descriptors. It doesn't depend on SC_OPEN_MAX value.
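The idea can be illustrated in pure Python. Python 3 does this in C inside the subprocess machinery; the helpers below (`open_fds`, `fds_to_close`) are hypothetical names used only to show the technique, and they only compute the list rather than closing anything.

```python
import os


def open_fds(proc_fd_dir="/proc/self/fd"):
    """List the file descriptors currently open in this process (Linux)."""
    fds = []
    for name in os.listdir(proc_fd_dir):
        if name.isdigit():
            fds.append(int(name))
    # Note: listing the directory briefly opens a descriptor of its own,
    # which may show up in the result even though it is closed afterwards.
    return sorted(fds)


def fds_to_close(keep=(0, 1, 2)):
    """Descriptors a child would need closed: open ones not in keep."""
    return [fd for fd in open_fds() if fd not in keep]


if os.path.isdir("/proc/self/fd"):
    print("open fds:", open_fds())
    print("would close:", fds_to_close())
```

The work is proportional to the number of descriptors actually open, not to SC_OPEN_MAX.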
The quick workaround is to use close_fds=False, but this change can have an impact on security. close_fds=True is a cheap protection to prevent leaking file descriptors to child processes.
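A small timing script can show whether close_fds is the cost on a given interpreter. This is a minimal sketch, not the oslo.rootwrap benchmark: `time_popen` is a hypothetical helper that spawns a trivial child a few times and averages the wall-clock time. It is written to run on both Python 2 and Python 3.

```python
import subprocess
import sys
import timeit


def time_popen(close_fds, repeat=5):
    """Average seconds to spawn a trivial child with the given close_fds."""
    cmd = [sys.executable, "-c", "pass"]
    start = timeit.default_timer()
    for _ in range(repeat):
        subprocess.check_call(cmd, close_fds=close_fds)
    return (timeit.default_timer() - start) / repeat


with_close = time_popen(True)
without_close = time_popen(False)
print("close_fds=True : %.3f ms" % (with_close * 1000))
print("close_fds=False: %.3f ms" % (without_close * 1000))
```

If the explanation above is right, the two numbers should be far apart on Python 2 inside the container and close together on Python 3.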
See also my PEP 446 which made file descriptors not inheritable by default in Python 3.4.
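The PEP 446 behaviour can be checked directly on Python 3.4 or later. A minimal sketch: descriptors created by `os.pipe()` start out non-inheritable, and inheritance has to be requested explicitly with `os.set_inheritable()`.

```python
import os

read_fd, write_fd = os.pipe()

# Python 3.4+: new descriptors are non-inheritable by default.
default_inheritable = os.get_inheritable(read_fd)

# Inheritance must be opted into per descriptor.
os.set_inheritable(write_fd, True)
explicit_inheritable = os.get_inheritable(write_fd)

os.close(read_fd)
os.close(write_fd)

print("default:", default_inheritable)
print("after set_inheritable(True):", explicit_inheritable)
```

Python 2 has no such default, which is why close_fds=True matters more there.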