Filedescriptor leak and zombie processes

Bug #718390 reported by Soren Hansen
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Soren Hansen
Milestone: 2011.2

Bug Description

Something seems to be wrong with the way we run subprocesses. After running a lot of tests, everything started failing because I had run out of file descriptors. Looking at the process table, there were a bajillion zombie shells, so apparently we're not reaping the child processes properly. :-/

Soren Hansen (soren)
Changed in nova:
status: New → In Progress
assignee: nobody → Soren Hansen (soren)
Soren Hansen (soren) wrote :

Turns out to be an eventlet problem.

Eventlet monkey-patches os so that, in the standard Python subprocess module, wait is called with the WNOHANG flag. The subprocess module wasn't built for that: if the timing is right, we call wait before the child has terminated, and then the child never gets reaped. The solution is to use eventlet's subprocess module instead, but that's broken for us: https://bitbucket.org/which_linden/eventlet/issue/77/subprocess-module-fails-if-os-module-is
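
To illustrate the timing issue, here is a minimal sketch using only the standard library on a POSIX system. It is not eventlet's or nova's code; the helper names spawn_sleeper and reap_with_polling are made up for this example. It shows that a single non-blocking wait issued too early returns without reaping anything, and that the caller has to keep polling until the child is actually collected.

```python
import os
import sys
import time


def spawn_sleeper(seconds):
    """Start a child Python process that sleeps, and return its pid.

    Hypothetical helper for this example only.
    """
    code = "import time; time.sleep(%s)" % seconds
    return os.spawnv(os.P_NOWAIT, sys.executable, [sys.executable, "-c", code])


def reap_with_polling(pid, poll_interval=0.01):
    """Poll with WNOHANG until the child has really been reaped.

    Hypothetical helper: a caller that expects a blocking wait but gets a
    non-blocking one must loop like this, or the child stays a zombie.
    """
    while True:
        reaped_pid, status = os.waitpid(pid, os.WNOHANG)
        if reaped_pid == pid:
            return status
        time.sleep(poll_interval)


pid = spawn_sleeper(0.5)

# A non-blocking wait issued while the child is still running returns
# (0, 0): nothing was reaped. Code that treats this as "done" leaks the child.
first_try = os.waitpid(pid, os.WNOHANG)

# Polling until waitpid reports our pid is what actually reaps the child.
final_status = reap_with_polling(pid)
print("first try:", first_try)
print("exited cleanly:", os.WIFEXITED(final_status))
```

If the loop is skipped and the result of the first call is taken as final, the child becomes a zombie when it exits, and any pipes kept open for it stay open, which matches the symptoms in the bug description.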

Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → High
Vish Ishaya (vishvananda) wrote : Re: [Bug 718390] Re: Filedescriptor leak and zombie processes

So, is the best option to put a patched version of eventlet into the PPA and switch to using eventlet's subprocess module? I think this is blocking us from moving our staging into production.

Vish

On Feb 14, 2011, at 2:47 AM, Soren Hansen wrote:

> Turns out to be an eventlet problem.
>
> Eventlet monkey patches os, so that wait is called with the NOHANG flag
> in the standard python subprocess module, but it wasn't built for that,
> so if the timing is right, we'll call wait before the child has
> terminated, and then it never gets reaped. The solution is to use
> eventlet's subprocess module instead, but that's broken for us:
> https://bitbucket.org/which_linden/eventlet/issue/77/subprocess-module-fails-if-os-module-is
>
> --
> You received this bug notification because you are a member of Nova Bug
> Team, which is subscribed to OpenStack Compute (nova).
> https://bugs.launchpad.net/bugs/718390

Soren Hansen (soren) wrote :

Yes, that's the solution.

I don't understand how it's blocking you, though. I have no reason to believe it's a new problem.

Vish Ishaya (vishvananda) wrote :

We are just moving to Bexar, which means our currently deployed code is still pre-eventlet. :) Thanks for the fix. You rock!

On Feb 14, 2011, at 1:38 PM, Soren Hansen wrote:

> Yes, that's the solution.
>
> I don't understand how it's blocking you, though. I have no reason to
> believe it's a new problem.

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
tags: added: bexar-post-release
Thierry Carrez (ttx)
Changed in nova:
milestone: none → 2011.2
status: Fix Committed → Fix Released