DNS lookups in web UI leak UDP sockets until resource exhaustion
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Network Administration Visualized | Fix Released | High | Morten Brekkevold |
Bug Description
After moving all our client installations from mod_wsgi under Apache to uWSGI, we became aware of recurring, complete hangs in the web interface: on some installations, the uWSGI workers stop responding to requests and need to be killed or restarted manually.
This is NAV 4.4.3.
Some debugging reveals that these uWSGI workers leak UDP file descriptors until the process reaches the system ulimit on open files, at which point it stops responding. Since the worker process is no longer serving requests, it never reaches its configured maximum number of requests, and so it is never respawned by the uWSGI master process.
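A minimal sketch of how such a leak can be observed from inside a Python process, assuming a Unix-like system that exposes per-process descriptors under /proc/self/fd or /dev/fd. The helper names open_socket_fds and soft_fd_limit are made up for illustration and are not part of NAV or uWSGI.

```python
import os
import resource
import socket
import stat


def open_socket_fds():
    """Return the descriptors of this process that refer to sockets."""
    # Assumption: a Unix-like system exposing per-process descriptors here.
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    found = []
    for name in os.listdir(fd_dir):
        try:
            fd = int(name)
            if stat.S_ISSOCK(os.fstat(fd).st_mode):
                found.append(fd)
        except (ValueError, OSError):
            # Descriptor vanished (e.g. the one used for the listing itself).
            continue
    return sorted(found)


def soft_fd_limit():
    """Return the soft limit on open files (what `ulimit -n` reports)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    before = len(open_socket_fds())
    # Simulate a leak: UDP sockets that are created but never closed.
    leaked = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(5)]
    after = len(open_socket_fds())
    print(f"sockets: {before} -> {after}, soft limit: {soft_fd_limit()}")
    for sock in leaked:
        sock.close()
```

In a leaking worker, the socket count would keep climbing towards the soft limit between requests instead of returning to its baseline.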
There are two ways the NAV web code could be leaking UDP file descriptors: through SNMP communication or through DNS lookups. It appears the latter is the case.
In fact, a large NAV installation will be hit by this fairly quickly if it uses the WatchDog tool or widgets, since these issue a DNS request for each of the IP Devices registered in NAV. It could also be triggered by repeated Machine Tracker searches for large subnets (with the DNS checkbox ticked).
NAV uses its own nav.asyncdns module for these requests, which employs the twisted.names framework in perverted ways (since the web code is not asynchronous, whereas Twisted is). It manually iterates the Twisted reactor until all outstanding DNS requests have either been answered or timed out, but it seems that at least one extra iteration is required for the resolver code to clean up its sockets afterwards.
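The suspected failure mode can be illustrated with a toy model. This is not the nav.asyncdns code and does not use Twisted's API: ToyReactor, ToyResolver and drain are hypothetical names, and the model simply assumes what is observed above, namely that socket cleanup is scheduled for the iteration after the one that delivers the answer.

```python
import socket
from collections import deque


class ToyReactor:
    """Hypothetical stand-in for an event loop; not Twisted's reactor."""

    def __init__(self):
        self._calls = deque()

    def call_later(self, func):
        self._calls.append(func)

    def iterate(self):
        # Run only the calls that were queued before this iteration began.
        for _ in range(len(self._calls)):
            self._calls.popleft()()


class ToyResolver:
    """Hypothetical resolver that opens one UDP socket per lookup."""

    def __init__(self, reactor):
        self.reactor = reactor
        self.pending = 0
        self.results = {}
        self.open_sockets = []

    def lookup(self, name):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.open_sockets.append(sock)
        self.pending += 1
        self.reactor.call_later(lambda: self._answer(name, sock))

    def _answer(self, name, sock):
        self.results[name] = "answered"
        self.pending -= 1
        # Assumption: cleanup is deferred to a later reactor iteration.
        self.reactor.call_later(lambda: self._close(sock))

    def _close(self, sock):
        sock.close()
        self.open_sockets.remove(sock)


def drain(resolver, extra_iterations=0):
    """Iterate until no lookups are pending, plus optional extra rounds."""
    while resolver.pending:
        resolver.reactor.iterate()
    for _ in range(extra_iterations):
        resolver.reactor.iterate()
```

In this model, calling drain(resolver) after a batch of lookups leaves every socket open, because the loop stops as soon as the last answer arrives. Calling drain(resolver, extra_iterations=1) gives the queued cleanup calls a chance to run.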
Changed in nav:
status: Fix Committed → Fix Released
fix here: https://nav.uninett.no/hg/stable/rev/c5e9d99dee1c
and here: https://nav.uninett.no/hg/stable/rev/b04aa484f3b3