Comment 7 for bug 2025160

Ponnuvel Palaniyappan (pponnuvel) wrote:

Thanks, jameinel, for taking a look. You seem to agree that there is definitely an issue, just not on what it could be. As noted in the description, this wasn't the only instance where it happened.

The user has since reported a further increase (there's currently one controller with 24K+ open fds); the one you looked at had only about 15K. I don't have many bright ideas on how to extract relevant info from the user's environment: lsof, /proc/<jujud>/fd, juju_metrics, etc. all say "there's a problem" but not much more. I am currently looking into Delve [0], but I don't know whether the user would be willing to run it on a production deployment.
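One low-risk way to get a bit more out of /proc/<jujud>/fd than a raw count might be to group the descriptors by kind (socket, pipe, regular file, and so on) and compare snapshots over time. Below is a minimal sketch of that idea, not anything that ships with Juju; the helper names classify, summarize and readTargets are made up for illustration, and it assumes a Linux host and enough privileges to read the target process's fd directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// classify maps a /proc/<pid>/fd symlink target to a coarse kind.
// (Hypothetical helper, not part of Juju.)
func classify(target string) string {
	switch {
	case strings.HasPrefix(target, "socket:"):
		return "socket"
	case strings.HasPrefix(target, "pipe:"):
		return "pipe"
	case strings.HasPrefix(target, "anon_inode:"):
		return "anon_inode"
	case strings.HasPrefix(target, "/"):
		return "file"
	default:
		return "other"
	}
}

// summarize counts how many targets fall into each kind.
func summarize(targets []string) map[string]int {
	counts := make(map[string]int)
	for _, t := range targets {
		counts[classify(t)]++
	}
	return counts
}

// readTargets resolves every symlink under /proc/<pid>/fd.
func readTargets(pid string) ([]string, error) {
	dir := filepath.Join("/proc", pid, "fd")
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var targets []string
	for _, e := range entries {
		// An fd may be closed between ReadDir and Readlink; skip those.
		t, err := os.Readlink(filepath.Join(dir, e.Name()))
		if err != nil {
			continue
		}
		targets = append(targets, t)
	}
	return targets, nil
}

func main() {
	// Default to the current process; pass the jujud PID as the first argument.
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	targets, err := readTargets(pid)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read fds:", err)
		os.Exit(1)
	}
	counts := summarize(targets)
	kinds := make([]string, 0, len(counts))
	for k := range counts {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds)
	for _, k := range kinds {
		fmt.Printf("%-12s %d\n", k, counts[k])
	}
}
```

Running this against the jujud PID a few hours apart would at least show which kind of descriptor is growing, which may help decide where to look next. It still says nothing about which code path opened them.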

If you have any suggestions on what info would help narrow down the problem, that'd be useful too.
I looked into all the "agent introspection" tools and none seems particularly good for this issue.
Otherwise I'll check whether Delve is an option.

[0] https://github.com/go-delve/delve