Comment 5 for bug 1420057

Cheryl Jennings (cherylj) wrote :

I think I was able to recreate a file handle leak by setting up an EC2
environment with one mysql machine and 7 state servers. I manually
shut down two of the state servers, and ran a script on the others that
killed jujud every 1-2 minutes.

After running overnight, I saw that there were 163 sockets belonging to
jujud in the CLOSE_WAIT state as reported by lsof.

The current suspicion is that there is a problem in the go.net websocket
library when we try to close the websocket:

// Close implements the io.Closer interface.
func (ws *Conn) Close() error {
        err := ws.frameHandler.WriteClose(ws.defaultCloseStatus)
        if err != nil {
                return err
        }
        return ws.rwc.Close()
}

I have confirmed that we are getting an EOF error from WriteClose, so
Close returns early and never calls ws.rwc.Close(). Closing rwc even when
WriteClose fails seems to eliminate the extra sockets lying around in
CLOSE_WAIT, although this is based only on initial testing. However, the
change makes the local juju/juju tests explode, and we need to work on
figuring out why.