When using --bindto and/or --allowedhosts nagios-statd stops accepting connections after a while

Bug #463795 reported by Rhomboid on 2009-10-29
This bug affects 3 people
Affects: nagios-statd (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Ubuntu 9.04 server:

nagios-statd-server:
  Installed: 3.12-1
  Candidate: 3.12-1
  Version table:
 *** 3.12-1 0
        500 http://us.archive.ubuntu.com jaunty/universe Packages
        100 /var/lib/dpkg/status

This is now happening on 3 machines. Seven others run the same version of Ubuntu on the same hardware but do not need either of these options because they are only on a private network; those 7 work properly with the same Nagios server. The 3 machines using --bindto stop accepting connections after some period of time. Each of them has a public and a private interface, and I'm trying to bind to the private interface.

There does not seem to be any logging to view, so for lack of anything else here's an strace of the daemon during a connection attempt made with telnet. The other end just gets an immediately dropped connection.

select(4, [3], [], [], {0, 500000}) = 0 (Timeout)
select(4, [3], [], [], {0, 500000}) = 0 (Timeout)
select(4, [3], [], [], {0, 500000}) = 1 (in [3], left {0, 248772})
accept(3, {sa_family=AF_INET, sin_port=htons(41078), sin_addr=inet_addr("10.2.2.246")}, [8547832101339136016]) = 4
stat("/usr/lib/python2.6/SocketServer.py", {st_mode=S_IFREG|0644, st_size=21922, ...}) = 0
stat("/usr/lib/python2.6/SocketServer.py", {st_mode=S_IFREG|0644, st_size=21922, ...}) = 0
stat("/usr/lib/python2.6/SocketServer.py", {st_mode=S_IFREG|0644, st_size=21922, ...}) = 0
close(4) = 0
select(4, [3], [], [], {0, 500000}) = 0 (Timeout)
select(4, [3], [], [], {0, 500000}) = 0 (Timeout)

Yet the port appears to be listening:

Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.2.2.1:1040 0.0.0.0:* LISTEN
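
The symptom above (port shown as LISTEN, connection accepted and then closed at once) can be checked without strace. The following is a minimal sketch in Python 3, not part of nagios-statd: the function name `probe` and its return labels are made up for illustration, and it assumes a healthy daemon keeps the connection open while waiting for a command.

```python
import socket


def probe(host, port, timeout=3.0):
    """Connect to host:port and classify what the server does next.

    Returns one of (labels are hypothetical, chosen for this sketch):
      "unreachable" - the TCP connection could not be established
      "dropped"     - connection accepted, then closed without sending data
      "data"        - the server sent at least one byte
      "open"        - connection stayed open and silent until the timeout
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    with sock:
        try:
            chunk = sock.recv(1)
        except socket.timeout:
            # Nothing arrived, but the peer did not hang up either.
            return "open"
        except OSError:
            # For example, the peer reset the connection.
            return "dropped"
        # An empty read means the peer closed its end cleanly.
        return "data" if chunk else "dropped"
```

Calling `probe("10.2.2.1", 1040)` from the Nagios server against one of the affected machines would be expected to return "dropped" once the daemon is in the failed state described above.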

AvaCam (cameron-pierce) wrote :

I have been experiencing this issue as well. I've managed to work around it by disabling the service and configuring nagios-statd to operate through xinetd.

Rhomboid (rhomboid) wrote :

Looks like this might actually be a Python problem, or nagios-statd needs to be updated for newer versions of Python. It happens with the vanilla source version of statd from twoevils.org, going all the way back to the "good" version in Debian stable (3.09).

Rhomboid (rhomboid) wrote :

I'm probably going to use xinetd as well, but it bugs me because it's yet another thing I have to watch and keep secure on a couple of machines that are really minimized and locked down.

AvaCam (cameron-pierce) wrote :

I agree. Doing this on one system isn't too bad, but for an entire server farm it's a major annoyance. Sure, it can be scripted, but these tools should work out of the box.

Here's my xinetd entry if anyone wants to simplify their work down the road when dealing with the workaround:
service nagios-statd
{
        port = 1040
        socket_type = stream
        wait = no
        instances = 1
        only_from = 172.20.0.0
        user = root
        server = /usr/sbin/nagios-statd
        server_args = -i
        log_on_failure += USERID
        disable = no
}

Rhomboid (rhomboid) wrote :

Found a better workaround: Install python2.4 and edit the /usr/sbin/nagios-statd shebang to use it (#!/usr/bin/python2.4).

This bug is even worse in 9.10. Now the process locks up one core at 100% CPU when it stops responding.

Suggested fix: Ship the nagios-statd-server package with a dependency on python2.4 and update the shebang by default.
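
For anyone applying the shebang workaround on several machines, the edit could be scripted along these lines. This is only a sketch in Python 3: the helper name `set_shebang` is hypothetical, it must be run with enough privileges to write the target file, and a package upgrade would presumably restore the original first line.

```python
def set_shebang(path, interpreter):
    """Replace the shebang line of the script at `path`.

    `interpreter` is the interpreter path without the leading "#!",
    e.g. "/usr/bin/python2.4". Returns True if the file was rewritten,
    False if it already used that interpreter. Raises ValueError if the
    file does not start with a shebang line.
    """
    new_first = b"#!" + interpreter.encode() + b"\n"
    with open(path, "rb") as f:
        first = f.readline()
        rest = f.read()
    if not first.startswith(b"#!"):
        raise ValueError("%s does not start with a shebang line" % path)
    if first == new_first:
        return False
    # Rewriting in place keeps the existing file mode (executable bit).
    with open(path, "wb") as f:
        f.write(new_first + rest)
    return True
```

With the paths from the comment above, the call would be `set_shebang("/usr/sbin/nagios-statd", "/usr/bin/python2.4")`, followed by a restart of the daemon.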

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nagios-statd (Ubuntu):
status: New → Confirmed

Marc Pignat (swid) wrote :

Ubuntu 10.04.3 LTS is also affected.
