Comment 4 for bug 1420057

Cheryl Jennings (cherylj) wrote :

I've been working on a recreate for this problem and I *might* have one. I have an EC2 environment up and running with seven state servers and two non-state servers. Two of the state servers are down, and the others are repeatedly killing jujud every minute or so. jujud on the non-state server I'm monitoring started with about 40 sockets in CLOSE_WAIT, and after about 2 hours it's up to 100.

The total number of sockets in CLOSE_WAIT seems to vary wildly: previous recreate attempts saw it climb to about 100 and drop back down to 40 at various stages. I'm going to let this run overnight to confirm whether I am also seeing the leaked file handles.
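For anyone who wants to track the same number over time, here is a minimal sketch of one way to count CLOSE_WAIT sockets held by a single process on Linux. It is not how the figures above were collected. It assumes the standard /proc/net/tcp layout (state in the fourth column, where 08 means CLOSE_WAIT, and the socket inode in the tenth) and permission to read /proc/<pid>/fd. The function names are made up for illustration.

```python
import os

CLOSE_WAIT = "08"  # Linux kernel TCP state code for CLOSE_WAIT


def count_close_wait(proc_net_tcp_text, inodes=None):
    """Count CLOSE_WAIT rows in /proc/net/tcp-style text.

    If `inodes` is given, only rows whose socket inode is in that set count.
    The header row is skipped naturally because its state column is "st".
    """
    count = 0
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        if len(fields) < 10 or fields[3] != CLOSE_WAIT:
            continue
        if inodes is None or fields[9] in inodes:
            count += 1
    return count


def socket_inodes(pid):
    """Return the socket inode numbers (as strings) open in process `pid`."""
    found = set()
    fd_dir = "/proc/%d/fd" % pid
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the fd was closed while we were scanning
        if target.startswith("socket:[") and target.endswith("]"):
            found.add(target[len("socket:["):-1])
    return found


def close_wait_for_pid(pid):
    """Count CLOSE_WAIT sockets (IPv4 and IPv6) owned by process `pid`."""
    inodes = socket_inodes(pid)
    total = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        if os.path.exists(path):
            with open(path) as f:
                total += count_close_wait(f.read(), inodes)
    return total
```

Calling close_wait_for_pid with the jujud pid once a minute and logging the result alongside the size of /proc/<pid>/fd should show whether the CLOSE_WAIT count and the open file handle count grow together.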

In between recreate attempts, I have been walking various code paths to see if I could spot a case of an API connection not being closed, but I haven't found one yet.