Too many pipes are created when subprocess.Popen fails
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Invalid | High | Unassigned |
Bug Description
1. How to reproduce:
Set the maximum number of processes (soft and hard limits) for a particular user.
Example: modify the file /etc/security/
```
hunters hard nproc 70
hunters soft nproc 70
```
Then start neutron-
Start many other applications to use up all the free process slots; the error shown below is then logged.
As root, check the number of files currently open by neutron-
```
# ps -ef | grep neutron-openvswitch
501 29401 1 2 Mar30 ? 03:13:53 /usr/bin/python /usr/bin/
# lsof -p 29401
neutron-o 29401 openstack 10r FIFO 0,8 0t0 3849643462 pipe
neutron-o 29401 openstack 11w FIFO 0,8 0t0 3849643462 pipe
neutron-o 29401 openstack 12r FIFO 0,8 0t0 3849643463 pipe
neutron-o 29401 openstack 13w FIFO 0,8 0t0 3849643463 pipe
neutron-o 29401 openstack 14r FIFO 0,8 0t0 3849643464 pipe
neutron-o 29401 openstack 15w FIFO 0,8 0t0 3849643464 pipe
...
```
Too many pipes are created.
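The `lsof -p` check above can also be scripted. The sketch below is an illustration only, assuming Linux, where each entry in `/proc/<pid>/fd` is a symlink and pipe ends resolve to `pipe:[<inode>]` (what `lsof` lists as `FIFO ... pipe`). The helper name `count_pipe_fds` is hypothetical and not part of neutron; reading another process's descriptors, such as the agent's, generally requires root, as in the report.

```python
import os


def count_pipe_fds(pid="self"):
    """Count the descriptors of a process that refer to pipes (Linux only).

    Each entry in /proc/<pid>/fd is a symlink; pipe ends resolve to
    targets of the form "pipe:[<inode>]". Pass "self" for the current
    process or an integer PID for another one.
    """
    fd_dir = "/proc/{}/fd".format(pid)
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("pipe:"):
            count += 1
    return count
```

Sampling `count_pipe_fds(29401)` periodically (using the agent PID from the `ps` output) would show whether the number of pipe descriptors keeps growing after each fork failure; each `os.pipe()` accounts for two of them, one read end and one write end.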
2. Summary:
Over the weekend, when the server runs under high load (log rotation or something similar), neutron-
```
2016-04-04 18:05:33.942 7817 ERROR neutron.
2016-04-04 18:05:33.944 7817 ERROR neutron.
File "/home/
process_
File "/home/
addl_env=addl_env)
File "/home/
stderr=
File "/home/
close_fds=
File "/home/
subprocess_
File "/usr/lib/
errread, errwrite)
File "/usr/lib/
self.pid = os.fork()
OSError: [Errno 11] Resource temporarily unavailable
```
After that, the pipes are not closed, and about 700 of them accumulate. After two weeks, the agent fails with "Too many open files" and then neutron-
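Because `EAGAIN` from `fork()` ("Resource temporarily unavailable") is transient, one possible caller-side mitigation is to retry the spawn with exponential backoff. This is a hypothetical sketch, not neutron's code or an upstream fix: `retry_on_eagain` and its parameters are illustrative names. It does not by itself guarantee that pipes from a failed attempt are closed; that depends on the `subprocess` implementation in use.

```python
import errno
import time


def retry_on_eagain(func, attempts=5, initial_delay=0.5, sleep=time.sleep):
    """Call func(), retrying with exponential backoff if it raises EAGAIN.

    fork() fails with EAGAIN when the user's process limit (RLIMIT_NPROC)
    is reached, so a later attempt may succeed once other processes exit.
    Any other OSError, or EAGAIN on the last attempt, is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except OSError as exc:
            if exc.errno != errno.EAGAIN or attempt == attempts:
                raise
            # Wait before retrying, doubling the delay each time.
            sleep(delay)
            delay *= 2
```

For example, `retry_on_eagain(lambda: subprocess.check_output(cmd))`, where `cmd` stands for whatever command the agent runs. The injectable `sleep` argument exists only so the backoff can be tested without real delays.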
tags: added: ovs
Changed in neutron:
importance: Undecided → High
Changed in neutron:
status: Confirmed → Invalid
assignee: j_king (james-agentultra) → nobody
This error can be triggered in a devstack environment with:
```
prlimit --pid <neutron-ovsagent pid> --nproc=70:70
```
This assumes the util-linux package, which provides prlimit, is installed for your OS.
It seems that a limit of at least 200 allows the agent to keep running without hitting the error.
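The `prlimit` command above can also be applied from Python 3.4+ on Linux through the standard library's `resource.prlimit`. The helper names below are hypothetical wrappers around that call. Changing another process's limits generally needs root (or a matching user), and raising a hard limit needs `CAP_SYS_RESOURCE`.

```python
import resource


def get_nproc_limit(pid):
    """Return the (soft, hard) RLIMIT_NPROC of a process; pid 0 means the caller."""
    return resource.prlimit(pid, resource.RLIMIT_NPROC)


def set_nproc_limit(pid, soft, hard):
    """Set RLIMIT_NPROC on a process, like `prlimit --pid PID --nproc=SOFT:HARD`.

    Returns the previous (soft, hard) pair, as resource.prlimit does.
    """
    return resource.prlimit(pid, resource.RLIMIT_NPROC, (soft, hard))
```

For instance, `set_nproc_limit(agent_pid, 70, 70)` would reproduce the failing setting, and `set_nproc_limit(agent_pid, 200, 200)` the value reported to avoid the error; `agent_pid` is a placeholder for the agent's PID.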
Is there any more information about the system, configuration, etc. that could help here? I am not able to make the agent leak descriptors on the master branch.