Too many PIPEs are created when subprocess.Popen fails

Bug #1565752 reported by Nguyen Truong Son
This bug affects 1 person
Affects: neutron
Status: Invalid
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

1. How to reproduce:

Set the maximum number of processes (soft and hard limits) for a particular user.

Example: add the following to /etc/security/limits.conf:
hunters hard nproc 70
hunters soft nproc 70

Then start neutron-openvswitch-agent as this user.
Start many other applications to use up the remaining process slots; the agent will then log the error.

As root, check the open files of the neutron-openvswitch-agent service:
# ps -ef | grep neutron-openvswitch
501 29401 1 2 Mar30 ? 03:13:53 /usr/bin/python /usr/bin/neutron-openvswitch-agent ...
# lsof -p 29401
neutron-o 29401 openstack 10r FIFO 0,8 0t0 3849643462 pipe
neutron-o 29401 openstack 11w FIFO 0,8 0t0 3849643462 pipe
neutron-o 29401 openstack 12r FIFO 0,8 0t0 3849643463 pipe
neutron-o 29401 openstack 13w FIFO 0,8 0t0 3849643463 pipe
neutron-o 29401 openstack 14r FIFO 0,8 0t0 3849643464 pipe
neutron-o 29401 openstack 15w FIFO 0,8 0t0 3849643464 pipe
...

Too many PIPEs are created.
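
The same check can be scripted instead of reading the lsof output by eye. A minimal sketch, assuming a Linux host with /proc mounted and permission to read the target process's fd directory; the helper name count_pipe_fds is hypothetical and not part of neutron:

```python
import os


def count_pipe_fds(pid):
    """Count file descriptors of `pid` that refer to anonymous pipes.

    Linux-only: relies on /proc/<pid>/fd, where each pipe end is a
    symlink whose target looks like "pipe:[<inode>]".
    """
    fd_dir = "/proc/%d/fd" % pid
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("pipe:["):
            count += 1
    return count
```

Calling count_pipe_fds() with the agent's PID at intervals would show whether the number of pipe descriptors keeps growing.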

2. Summary:

At the weekend, when the server is under high load (log rotation or something else), neutron-openvswitch-agent logs this error:

2016-04-04 18:05:33.942 7817 ERROR neutron.agent.common.ovs_lib [req-42b082b1-2fbf-48a2-b2f3-3b7d774141f0 - - - - -] Unable to execute ['ovs-ofctl', 'dump-flows', 'br-int', 'table=23']. Exception: [Errno 11] Resource temporarily unavailable
2016-04-04 18:05:33.944 7817 ERROR neutron.agent.common.ovs_lib [req-42b082b1-2fbf-48a2-b2f3-3b7d774141f0 - - - - -] Traceback (most recent call last):
File "/home/hunters/neutron-7.0.0/neutron/agent/common/ovs_lib.py", line 226, in run_ofctl
process_input=process_input)
File "/home/hunters/neutron-7.0.0/neutron/agent/linux/utils.py", line 120, in execute
addl_env=addl_env)
File "/home/hunters/neutron-7.0.0/neutron/agent/linux/utils.py", line 89, in create_process
stderr=subprocess.PIPE)
File "/home/hunters/neutron-7.0.0/neutron/common/utils.py", line 199, in subprocess_popen
close_fds=close_fds, env=env)
File "/home/hunters/neutron-7.0.0/.venv/local/lib/python2.7/site-packages/eventlet/green/subprocess.py", line 53, in __init__
subprocess_orig.Popen.__init__(self, args, 0, *argss, **kwds)
File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1223, in _execute_child
self.pid = os.fork()
OSError: [Errno 11] Resource temporarily unavailable

After that, the PIPEs are never closed. About 700 PIPEs are created. After 2 weeks, the agent throws the error "Too many open files" and neutron-openvswitch-agent stops.

Tags: ovs
description: updated
tags: added: ovs
Changed in neutron:
importance: Undecided → High
j_king (james-agentultra) wrote :

This error can be triggered in a devstack env with:

    ```
    prlimit --pid <neutron-ovsagent pid> --nproc=70:70
    ```

This assumes the util-linux package, which provides prlimit, is installed for your OS.

It seems that a value of at least 200 will allow the agent to continue without hitting the error.
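
The same limit can be inspected or applied from Python. A minimal sketch, assuming Python 3.4+ on Linux, where the standard library exposes resource.prlimit; the helper names are hypothetical, and changing another process's limits requires the appropriate privileges:

```python
import resource


def get_nproc_limit(pid):
    """Return the (soft, hard) RLIMIT_NPROC pair of `pid` without changing it."""
    return resource.prlimit(pid, resource.RLIMIT_NPROC)


def set_nproc_limit(pid, soft, hard):
    """Set RLIMIT_NPROC for `pid`.

    Similar in intent to `prlimit --pid <pid> --nproc=<soft>:<hard>`.
    """
    resource.prlimit(pid, resource.RLIMIT_NPROC, (soft, hard))
```

For example, set_nproc_limit(agent_pid, 70, 70) would mirror the command above, with agent_pid standing in for the agent's PID.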

Is there any more information about the system, configuration, etc that could help here? I'm not able to convince the agent to leak descriptors (on the master branch).

Changed in neutron:
assignee: nobody → j_king (james-agentultra)
status: New → Confirmed
Nguyen Truong Son (hunters1094) wrote :

My current servers run CentOS 6.5 with Python 2.6.6, and have a soft limit of 1024 and a hard limit of 2066276 on the number of processes.

I also reproduced it on my laptop with Ubuntu 14.04 and Python 2.7.6.

The problem is that the PIPEs, once created, are not closed when the error occurs. It may be a Python bug, and I have seen some efforts to resolve it.
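
The failure mode can be illustrated without the real subprocess module. A minimal sketch, assuming a POSIX system: the pipes are created first, then the step that can fail (os.fork() in the traceback above) runs, and every descriptor is closed if that step raises. The names open_pipes_then_run and failing_fork are hypothetical stand-ins for illustration, not the actual internals of Popen:

```python
import errno
import os


def open_pipes_then_run(start_child):
    """Create stdin/stdout/stderr pipes, then call `start_child(fds)`.

    If `start_child` raises (as os.fork() does with EAGAIN when the process
    limit is reached), close every pipe end before re-raising so that no
    descriptor is leaked in the parent.
    """
    fds = []
    try:
        for _ in range(3):
            fds.extend(os.pipe())  # each call adds a (read, write) pair
        return start_child(fds)
    except Exception:
        for fd in fds:
            try:
                os.close(fd)
            except OSError:
                pass  # already closed; nothing more to do
        raise


def failing_fork(fds):
    """Stand-in for os.fork() failing under an exhausted nproc limit."""
    raise OSError(errno.EAGAIN, "Resource temporarily unavailable")
```

Without the except branch, each failed call would leave six descriptors open in the parent, which matches the pattern of paired pipe entries in the lsof output.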

j_king (james-agentultra) wrote :

What version of neutron?

Nguyen Truong Son (hunters1094) wrote :

My servers run the Havana release.

In my test environment I checked with Ubuntu 14.04 and neutron 7.0.0.

j_king (james-agentultra) wrote :

This definitely does not affect neutron 8.0.1.dev239 pre-release 63e8c6d on Ubuntu 15.10; the agent will retry a small number of times before crashing the agent process.

I haven't been able to turn up a patch with backport potential that fixes this particular issue (the patches that fixed leaked FDs from subprocess seem to have landed in grizzly): https://review.openstack.org/#/c/22486/ and https://bugs.launchpad.net/neutron/+bug/1130735

Nguyen Truong Son (hunters1094) wrote :

Hi,

I can fix it with this patch; it seems to be a Python bug.

File: /usr/lib64/python2.6/subprocess.py
I have tried it on one server and it was OK. I will try other servers this weekend and let you know. Thanks.

#change code
        if mswindows:
            if p2cwrite is not None:
                p2cwrite = msvcrt.open_osfhandle(p2cwrite.Detach(), 0)
            if c2pread is not None:
                c2pread = msvcrt.open_osfhandle(c2pread.Detach(), 0)
            if errread is not None:
                errread = msvcrt.open_osfhandle(errread.Detach(), 0)

        if p2cwrite is not None:
            self.stdin = os.fdopen(p2cwrite, 'wb', bufsize)
        if c2pread is not None:
            if universal_newlines:
                self.stdout = os.fdopen(c2pread, 'rU', bufsize)
            else:
                self.stdout = os.fdopen(c2pread, 'rb', bufsize)
        if errread is not None:
            if universal_newlines:
                self.stderr = os.fdopen(errread, 'rU', bufsize)
            else:
                self.stderr = os.fdopen(errread, 'rb', bufsize)

        try:
            self._execute_child(args, executable, preexec_fn, close_fds,
                                cwd, env, universal_newlines,
                                startupinfo, creationflags, shell,
                                p2cread, p2cwrite,
                                c2pread, c2pwrite,
                                errread, errwrite)

        except Exception:
            # Preserve original exception in case os.close raises.
            exc_type, exc_value, exc_trace = sys.exc_info()

            for f in filter(None, (self.stdin, self.stdout, self.stderr)):
                try:
                    f.close()
                except OSError:
                    pass # Ignore EBADF or other errors.

            for fd in filter(None, (p2cread, c2pwrite, errwrite)):
                try:
                    if mswindows:
                        fd.Close()
                    else:
                        os.close(fd)
                except EnvironmentError:
                    pass
            raise exc_type, exc_value, exc_trace
#end add

# original code
# self._execute_child(args, executable, preexec_fn, close_fds,
# cwd, env, universal_newlines,
# startupinfo, creationflags, shell,
# p2cread, p2cwrite,
# c2pread, c2pwrite,
# errread, errwrite)

# if mswindows:
# if p2cwrite is not None:
# p2cwrite = msvcrt.open_osfhandle(p2cwrite.Detach(), 0)
# if c2pread is not None:
# c2pread = msvcrt.open_osfhandle(c2pread.Detach(), 0)
# if errread is not None:
# errread = msvcrt.open_osfhandle(errread.Detach(), 0)

# if p2cwrite is not None:
# self.stdin = os.fdopen(p2cwrite, 'wb', bufsize)
# if c2pread is not None:
# if universal_newlines:
# self.stdout = os.fdopen(c2pread, 'rU', bufsize)
# else:
# self.stdout = os.fdopen(c2pread, 'rb', bufsize)
# if er...


Nguyen Truong Son (hunters1094) wrote :

Hi.

With the patch applied, the leftover PIPEs are no longer created.

Thanks.

Changed in neutron:
status: Confirmed → Invalid
assignee: j_king (james-agentultra) → nobody