jujud leaking file handles

Bug #1454697 reported by John A Meinel
Affects    Status        Importance  Assigned to      Milestone
juju-core  Fix Released  High        Cheryl Jennings
1.22       Fix Released  Critical    Cheryl Jennings
1.23       Fix Released  High        Cheryl Jennings
1.24       Fix Released  High        Cheryl Jennings

Bug Description

The root cause still needs further investigation. However, we are seeing 250,000 open file handles (according to lsof) for jujud on a production server. It is currently failing to connect to the API server because of "too many open file handles".

From what we can tell, machine-1.log starts with "failure to connect" errors caused by too many open file handles. So the likely operational issue is that while the API servers are down, we slowly leak file descriptors. After long enough, we can no longer allocate new ones, so every connection attempt fails.

The lsof output shows ~99% of the open file handles stuck in CLOSE_WAIT, which is the TCP state meaning "the remote side has closed the connection, but you haven't closed it yet."

Cheryl Jennings (cherylj) wrote :

Appears to be a duplicate of bug #1420057. Taking a look...

Cheryl Jennings (cherylj) wrote :

There were two causes of the leaked file descriptors. One in go.net, and one in the jujud machine agent code. Bug #1420057 tracked the changes to go.net. I'll use this bug for the changes to jujud's machine agent.

The good news is that this problem is far less likely to occur on 1.22 and later. 1.20, where the problem was originally seen, had a significantly larger timing window for the leak, and the changes made to the agent code in 1.22 narrowed it considerably.

John A Meinel (jameinel) wrote :

Assigned across the versions to Cheryl. Feel free to hand it off to someone else, but someone should be responsible for making sure your change lands in the various releases.

Changed in juju-core:
milestone: none → 1.25.0
assignee: nobody → Cheryl Jennings (cherylj)
Changed in juju-core:
status: Triaged → In Progress
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released