Constant sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received

Bug #1827923 reported by Andres Rodriguez
This bug affects 1 person
Affects: bind9 (Ubuntu)
Status: Triaged
Importance: Low
Assigned to: Unassigned

Bug Description

I see this in the bind logs constantly. The machine has 88 processors and about 132 GB of RAM. https://kb.isc.org/docs/aa-00508 sheds some light on the issue, which requires re-compilation of bind9 to address.

Apr 07 12:01:54 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:01:59 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:01 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:02 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:03 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:03 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:04 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:05 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:05 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:06 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:06 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:07 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:07 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:08 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:09 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:09 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:02:23 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:03:10 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:03:12 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:03:20 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:03:27 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:03:33 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:03:57 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:03:58 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:03:59 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received
Apr 07 12:04:00 maas2 named[373]: sockmgr 0x7fa0b8b0a010: maximum number of FD events (64) received

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

https://kb.isc.org/docs/aa-00716 is also helpful:
"""
BIND 9.9.0 introduced multiple UDP listeners to improve performance, which means that in a multi-threaded BIND installation, there are effectively more sockets being monitored for client-side queries. If you are also listening on multiple interfaces (real or virtual), this might mean that # interfaces x # UDP listeners exceeds 64. The outcome of this is that on a busy server (particularly a busy recursive server that is also opening connections to make iterative queries of authoritative servers), the default number of FD events is already too small.
"""

There are two options:
a) rebuild with a higher limit
b) reduce the number of UDP listeners
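
For reference, option (a) is what the ISC KB articles describe: raise the compile-time event limit and rebuild. A minimal sketch of what that might look like, assuming the limit is the ISC_SOCKET_MAXEVENTS define the KB refers to and that BIND's configure picks up extra defines from STD_CDEFINES (both are assumptions, not verified against the Ubuntu package):

    # Assumed rebuild recipe following the ISC KB article; the define name
    # and the STD_CDEFINES mechanism are not confirmed for this source tree.
    STD_CDEFINES="-DISC_SOCKET_MAXEVENTS=128" ./configure
    make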

(b) is achieved with the -U command line option:

       -U #listeners
           Use #listeners worker threads to listen for incoming UDP packets on each address. If not specified, named will calculate a default value based on the number of detected CPUs: 1 for 1 CPU, and the number of detected CPUs minus one for machines with more than 1 CPU. This cannot be increased to a value higher than the number of CPUs. If -n has been set to a higher value than the number of detected CPUs, then -U may be increased as high as that value, but no higher. On Windows, the number of UDP listeners is hardwired to 1 and this option has no effect.

88 processors is probably not very common, and according to the above KB link, having more than 32 listeners might not increase performance at all. Could MAAS perhaps limit the number of listeners on such machines via this parameter? Note that this value is per listening address: with 88 CPUs, named defaults to 87 UDP listeners on each address, so even a single listening address already exceeds the default of 64 FD events.
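
A minimal sketch of what that might look like on such a machine, assuming named's startup flags are read from OPTIONS in /etc/default/bind9 (/etc/default/named on newer releases; neither the path nor its existing contents are confirmed here):

    # /etc/default/bind9 (assumed path) - cap UDP listeners at 32 per
    # address; "-u bind" is assumed to be the stock run-as-user option.
    OPTIONS="-u bind -U 32"

followed by a service restart (systemctl restart bind9, or named.service on newer releases).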

tags: added: server-triage-discuss
Revision history for this message
Robie Basak (racb) wrote :

I wonder what the downside is of rebuilding with a higher limit. If there is none, why isn't upstream doing this already?

Robie Basak (racb)
tags: removed: server-triage-discuss
Changed in bind9 (Ubuntu):
assignee: nobody → Robie Basak (racb)
Revision history for this message
Robie Basak (racb) wrote :

This is already reported upstream at https://gitlab.isc.org/isc-projects/bind9/issues/220

Revision history for this message
Robie Basak (racb) wrote :

Not much else to do pending upstream feedback. This doesn't seem important enough to warrant patching without upstream's lead. Please let me know if this turns out to be more pressing; patching it shouldn't be difficult.

Changed in bind9 (Ubuntu):
assignee: Robie Basak (racb) → nobody
status: New → Triaged
importance: Undecided → Low
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks for the link, Robie. I updated the upstream bug with some info that was requested. Let's see how the discussion there ends.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

I'm updating bind9 to 9.14, which the upstream bug report mentions in the phrase "With 9.14 the problem should be much less prominent"; I have asked for clarification on that.
