On 2/15/2011 3:41 PM, Michael Hudson-Doyle wrote:
> I guess it would have been good to have run lsof on the process when it
> was running out of file handles :/
>
> I also suppose it might be good to call setproctitle in the forking
> service's children to record some information that way.
I'm pretty sure it does, since that is done already in the lp-serve
context. I'm not sure why "ps" doesn't see it, but it appears that top
does...
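
For reference, a minimal sketch of what labelling a forked child could look like. It assumes the third-party setproctitle package and falls back to doing nothing if that is missing; child_title, label_child and the title format are made up for illustration and are not what lp-serve actually does.

```python
try:
    # Third-party package; assumed available, but not required for the sketch.
    from setproctitle import setproctitle
except ImportError:
    setproctitle = None


def child_title(user, command):
    """Build a short, greppable title for a forked child (hypothetical format)."""
    return "forking-service child [user=%s cmd=%s]" % (user, command)


def label_child(user, command):
    """Set the process title if setproctitle is installed; return the title."""
    title = child_title(user, command)
    if setproctitle is not None:
        setproctitle(title)
    return title
```

Calling label_child() right after the fork, in the child only, would leave the master's title alone.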
>
> Apart from load, which jam seems to have covered, I can't think of any
> meaningful difference between staging and production here. Maybe we
> should hold some connections open for longer in the load testing?
>
Yeah, I'm working on that in my script now. "--request-count" will set
how many times a given connection says "hello", waits for a response,
and says it again.
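
Roughly, the per-connection loop could be sketched like this. It is not the actual load-test script: say_hello and parse_args are illustrative names, and it assumes a line-oriented exchange where each "hello\n" gets exactly one line back.

```python
import argparse
import socket


def say_hello(sock, request_count, message=b"hello\n"):
    """Send `message` request_count times on one connection.

    Waits for one reply line after each send (assumed protocol) and
    returns the list of replies.
    """
    reader = sock.makefile("rb")
    replies = []
    try:
        for _ in range(request_count):
            sock.sendall(message)
            reply = reader.readline()
            if not reply:
                raise ConnectionError("peer closed the connection before replying")
            replies.append(reply)
    finally:
        reader.close()  # closes the file wrapper only, not the socket
    return replies


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="hold connections open (sketch)")
    parser.add_argument("--request-count", type=int, default=1,
                        help="hello/response round trips per connection")
    return parser.parse_args(argv)
```

A larger --request-count keeps each connection alive for longer, which is the point of the change.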

I've seen the script start crashing for various reasons (they look like
bad messages, rejections, etc.), and it doesn't handle failure very well
(it tends to spew 100s of lines to the terminal). So once I've sorted
that out, I'll see what's up.

SPM noticed that there were about 50 'zombie' launchpad-forking-service
processes still running on Crowberry. So I'm guessing they forcibly
killed the master process before the 300s timeout, which then prevented
it from forcibly killing all of its children.
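
To make the failure mode concrete, here is a sketch of the general "ask nicely, then force" pattern. It is an assumption about the shape of the logic, not the service's real code: shutdown_children is a made-up name, and the choice of SIGTERM followed by SIGKILL is mine.

```python
import os
import signal
import time


def shutdown_children(pids, timeout, poll_interval=0.05):
    """Send SIGTERM to each child, then SIGKILL whatever outlives `timeout`.

    Returns the set of pids that had to be killed forcibly.
    """
    remaining = set(pids)
    for pid in remaining:
        os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while remaining and time.monotonic() < deadline:
        for pid in list(remaining):
            done, _status = os.waitpid(pid, os.WNOHANG)
            if done:
                remaining.discard(pid)  # child exited and has been reaped
        if remaining:
            time.sleep(poll_interval)
    killed = set(remaining)
    for pid in killed:
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)  # reap so no defunct entry is left behind
    return killed
```

If the master itself is killed while it is still inside the waiting loop, the final SIGKILL pass never runs, and any child that ignored the first signal just keeps going.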

I would really like to have known what state those 50 processes were in.
Perhaps they hit a failure that wasn't being handled, which somehow left
them in an "I'm still running" state as far as the Conch server was
concerned, so that it held those handles open indefinitely.
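
Next time, something along these lines could capture what a stuck process is holding open, much like the lsof run Michael mentioned. It is a Linux-only sketch that reads /proc; open_fds is a made-up helper, and looking at another user's process needs the right permissions.

```python
import os


def open_fds(pid="self"):
    """Map each open descriptor number of `pid` to its target (Linux /proc only)."""
    fd_dir = "/proc/%s/fd" % pid
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result
```

Calling open_fds(some_pid) on each leftover child would show which sockets and pipes it still has.
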
John
=:->