Comment 14 for bug 717345

Revision history for this message
John A Meinel (jameinel) wrote :

I'm attaching the script I was using to load test qastaging. It should be usable to load test a local instance.

The way I hammered qastaging to cause failure was with:
$ python ~/hammer_ssh.py --load=200 --delay-after-spawn=1 --run-time=3600 --server=bazaar.qastaging.launchpad.net --request-count=1000

Which is:

1) Try to keep 200 connections active concurrently.
2) Wait 1s between spawning each request, hopefully so that you don't spawn 200, they all finish, you spawn another 200, etc. This could certainly be tweaked to 0.1s, or whatever makes sense.
3) Run the overall script for 1 hour
4) For each process, have it say "hello" 1000 times, waiting for a response after each.
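The attached script isn't reproduced in this comment, but the pattern in steps 1-4 can be sketched roughly like this. This is only an illustration, not the real hammer_ssh.py: `do_request` is a stub standing in for the actual SSH round trip, and the `--run-time` wall-clock loop is replaced by a fixed total worker count to keep it short.

```python
import threading
import time

def hammer(load, delay_after_spawn, request_count, total_workers, do_request):
    """Keep up to `load` workers active concurrently, spawning one every
    `delay_after_spawn` seconds; each worker issues `request_count`
    requests, waiting for each response before sending the next."""
    sem = threading.Semaphore(load)  # caps concurrent workers at `load`
    lock = threading.Lock()
    threads = []
    results = []

    def worker():
        try:
            for _ in range(request_count):
                resp = do_request("hello")
                with lock:
                    results.append(resp)
        finally:
            sem.release()  # free a slot for the next spawn

    for _ in range(total_workers):
        sem.acquire()  # blocks while `load` workers are already running
        t = threading.Thread(target=worker)
        t.start()
        threads.append(t)
        time.sleep(delay_after_spawn)  # stagger spawns (step 2)

    for t in threads:
        t.join()
    return results

# Stub in place of a real SSH "hello" round trip.
results = hammer(load=3, delay_after_spawn=0, request_count=2,
                 total_workers=6, do_request=lambda msg: msg.upper())
```

With the command line above this would be `load=200`, `delay_after_spawn=1`, `request_count=1000`, and spawning capped by the one-hour run time rather than a worker count.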

We *should* be able to satisfy these 200 connections without running out of file handles. We can always increase that to 300, etc. Of course, the *local* process may also have trouble with too many file handles, though I haven't seen anything that specifically points to that. It should only be 2 new handles per child (stdin and stdout; the child shares stderr with the master).
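To make the accounting concrete, here's a back-of-the-envelope budget check. The `base` allowance for the master's own sockets, logs, etc. is a guess on my part, not a measured number:

```python
def fds_needed(children, per_child=2, base=32):
    """Rough file-handle budget for the master process: each child
    costs ~2 new fds (pipes for the child's stdin and stdout; stderr
    is shared with the master). `base` is an assumed allowance for
    the master's listening socket, logs, and so on."""
    return base + children * per_child

# 200 children should fit comfortably under a typical default
# `ulimit -n` of 1024; even 300 leaves plenty of headroom.
assert fds_needed(200) <= 1024
assert fds_needed(300) <= 1024
```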

It sounds like the Conch server isn't killing off children that get only partially initialized. Say we get a new connection request while only 2 file handles are available: we connect stdin and stdout, but then fail to connect stderr. At that point the child process hangs waiting for us to connect, while the master process goes on its merry way. The master still *knows* about the child, so it never tries to kill it, and it holds stdin and stdout open in the fruitless hope that it will be able to talk to the child some day.
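The failure mode can be simulated without a real SSH server. This is a hypothetical sketch, not Conch's actual code: it wires up three pipes under a fake fd budget, and the `except` branch shows the cleanup that the description above suggests is missing (without it, the stdin/stdout pipes opened before the failure would stay open forever).

```python
import os

def connect_child(fd_budget):
    """Simulate wiring up stdin, stdout, and stderr pipes to a child.
    If the fd budget runs out mid-setup, the pipes opened so far are
    leaked unless we tear them down on failure -- which is the cleanup
    the partially-initialized children appear to be missing."""
    opened = []
    try:
        for name in ("stdin", "stdout", "stderr"):
            if fd_budget < 2:
                raise OSError("out of file handles while connecting %s" % name)
            fd_budget -= 2
            opened.append(os.pipe())  # each pipe consumes two fds
    except OSError:
        # The fix: close everything we managed to open, so the master
        # doesn't hold stdin/stdout for a child it can never talk to.
        for r, w in opened:
            os.close(r)
            os.close(w)
        raise
    return opened
```

With a budget of 4 fds, stdin and stdout connect and stderr fails, matching the scenario above; the cleanup then releases the two pipes instead of leaking them.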