stop doesn't stop only the host's process on an LXC server

Bug #600941 reported by Tamas Papp on 2010-07-02
This bug affects 2 people
Affects                 Importance   Assigned to
nagios-nrpe (Ubuntu)    High         Stéphane Graber
  Hardy                 Undecided    Unassigned
  Lucid                 Undecided    Stéphane Graber
  Maverick              Undecided    Stéphane Graber
  Natty                 Undecided    Stéphane Graber
  Oneiric               Undecided    Stéphane Graber
  Precise               High         Stéphane Graber

Bug Description

On an LXC host, /etc/init.d/nagios-nrpe-server stop|reload|restart also affects the nrpe processes running inside the containers.

Description: Ubuntu 10.04 LTS
Release: 10.04

This really small patch fixes it:

--- nagios-nrpe-server.orig 2010-07-02 10:11:26.000000000 +0200
+++ nagios-nrpe-server 2010-07-02 10:13:45.000000000 +0200
@@ -61,12 +61,12 @@
        ;;
   stop)
        log_daemon_msg "Stopping $DESC" "$NAME"
- start-stop-daemon --stop --quiet --oknodo --exec $DAEMON
+ start-stop-daemon --stop --quiet --oknodo --pidfile $PIDDIR/nrpe.pid --exec $DAEMON
        log_end_msg $?
        ;;
   reload|force-reload)
        log_daemon_msg "Reloading $DESC configuration files" "$NAME"
- start-stop-daemon --stop --signal HUP --quiet --exec $DAEMON
+ start-stop-daemon --stop --signal HUP --quiet --pidfile $PIDDIR/nrpe.pid --exec $DAEMON
        log_end_msg $?
        ;;
   restart)
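To make the difference concrete: with `--exec $DAEMON`, start-stop-daemon acts on every process whose executable is $DAEMON, and on an LXC host the containers' nrpe processes show up in the host's process table (as the `ps` listing in this report shows). With `--pidfile`, the match is restricted to the PID that the host's own daemon recorded. The function below is a hypothetical sketch of that pidfile-based selection using plain `kill`; it is not start-stop-daemon's implementation, and the name `stop_by_pidfile` is made up for illustration.

```shell
# Hypothetical helper illustrating pidfile-based selection; it is not part of
# start-stop-daemon or of the nagios-nrpe init script.
# stop_by_pidfile PIDFILE [SIGNAL]: signal only the PID recorded in PIDFILE,
# never "every process running this binary".
stop_by_pidfile() {
    pidfile=$1
    signal=${2:-TERM}
    # Missing pidfile: nothing to stop (in the spirit of --oknodo).
    [ -r "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    # Ignore an empty or non-numeric pidfile.
    case $pid in
        ''|*[!0-9]*) return 0 ;;
    esac
    # Stale pidfile: the recorded process no longer exists.
    kill -0 "$pid" 2>/dev/null || return 0
    kill -s "$signal" "$pid"
}
```

Called as `stop_by_pidfile "$PIDDIR/nrpe.pid"` (or with `HUP` as the second argument for the reload case), it would leave any other process running the same binary untouched, because only the recorded PID is ever signalled.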

Changed in nagios-nrpe (Ubuntu):
status: New → Triaged
Changed in nagios-nrpe (Ubuntu Lucid):
status: New → Triaged
Changed in nagios-nrpe (Ubuntu Maverick):
status: New → Triaged
Changed in nagios-nrpe (Ubuntu Natty):
status: New → Triaged
Changed in nagios-nrpe (Ubuntu Oneiric):
status: New → Triaged
Changed in nagios-nrpe (Ubuntu Lucid):
assignee: nobody → Stéphane Graber (stgraber)
Changed in nagios-nrpe (Ubuntu Maverick):
assignee: nobody → Stéphane Graber (stgraber)
Changed in nagios-nrpe (Ubuntu Natty):
assignee: nobody → Stéphane Graber (stgraber)
Changed in nagios-nrpe (Ubuntu Oneiric):
assignee: nobody → Stéphane Graber (stgraber)
Stéphane Graber (stgraber) wrote :

I based my diff on Debian's change from http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=639523 and uploaded all of them to -proposed.

Changed in nagios-nrpe (Ubuntu Lucid):
status: Triaged → Fix Committed
Changed in nagios-nrpe (Ubuntu Maverick):
status: Triaged → Fix Committed
Changed in nagios-nrpe (Ubuntu Natty):
status: Triaged → Fix Committed
Changed in nagios-nrpe (Ubuntu Oneiric):
status: Triaged → Fix Committed
Chris Halse Rogers (raof) wrote :

What's the status of this in Precise?

Clint Byrum (clint-fewbar) wrote :

Chris, we can pocket copy this to precise after it hits oneiric-updates.

tags: added: verification-needed

Hello Tamas, or anyone else affected,

Accepted nagios-nrpe into oneiric-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Clint Byrum (clint-fewbar) wrote :

Hello Tamas, or anyone else affected,

Accepted nagios-nrpe into natty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Clint Byrum (clint-fewbar) wrote :

Hello Tamas, or anyone else affected,

Accepted nagios-nrpe into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Clint Byrum (clint-fewbar) wrote :

Hello Tamas, or anyone else affected,

Accepted nagios-nrpe into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Stéphane Graber (stgraber) wrote :

Hi Clint and Chris, I didn't bother to prepare an upload for Precise as the fix is copy/pasted from Debian.

I'm assuming we'll either merge or sync from Debian which will give us the fix for Precise.
Pocket copying to Precise wouldn't hurt but I don't think it'd be particularly useful either.

Stéphane Graber (stgraber) wrote :

I can confirm that the proposed fix works as shown below:

root@athos:~# ps aux | grep "nrpe -c" | grep -v grep
nagios 15412 0.0 0.0 56800 2072 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 16299 0.0 0.0 56800 2068 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 17111 0.0 0.0 56800 2068 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 19966 0.0 0.0 56800 2068 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 22111 0.0 0.0 56800 2068 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 22295 0.0 0.0 56800 2076 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 23139 0.0 0.0 56800 2072 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 26368 0.0 0.0 56800 2076 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 26794 0.0 0.0 56800 2080 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 27077 0.0 0.0 56800 2076 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 29085 0.0 0.0 56800 2076 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 31481 0.0 0.0 56800 2076 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 31595 0.0 0.0 56800 2076 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 31717 0.0 0.0 56800 2076 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
root@athos:~# /etc/init.d/nagios-nrpe-server start
 * Starting nagios-nrpe nagios-nrpe [ OK ]
root@athos:~# ps aux | grep "nrpe -c" | grep -v grep
nagios 7208 0.0 0.0 19212 748 ? Ss 10:56 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 15412 0.0 0.0 56800 2072 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 16299 0.0 0.0 56800 2068 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 17111 0.0 0.0 56800 2068 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 19966 0.0 0.0 56800 2068 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 22111 0.0 0.0 56800 2068 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 22295 0.0 0.0 56800 2076 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 23139 0.0 0.0 56800 2072 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 26368 0.0 0.0 56800 2076 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 26794 0.0 0.0 56800 2080 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 27077 0.0 0.0 56800 2076 ? Ss 10:55 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios 29085 0.0 0.0 56800 2076 ? Ss 10:55 0:00 /usr...


Martin Pitt (pitti) on 2011-10-18
tags: added: verification-done
removed: verification-needed
Martin Pitt (pitti) wrote :

Stephane, for which release did you test?

Stéphane Graber (stgraber) wrote :

I tried Lucid and Oneiric.

Stéphane Graber (stgraber) wrote :

I didn't bother testing Maverick and Natty, as Lucid, Maverick and Natty all have exactly the same package.

Michael Jeanson (mjeanson) wrote :

Debdiff for the same issue on Hardy; however, I kept Hardy's default pidfile location, which was '/var/run/nrpe.pid'.

Stéphane Graber (stgraber) wrote :

Looks good, changes:
 - Changed the version number to be -0ubuntu0.1
 - Updated the maintainer

I uploaded it so it will hopefully get into hardy-proposed soonish.

Changed in nagios-nrpe (Ubuntu Hardy):
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nagios-nrpe - 2.12-4ubuntu1.10.04.1

---------------
nagios-nrpe (2.12-4ubuntu1.10.04.1) lucid-proposed; urgency=low

  * Use pidfile for start-stop-daemon and fix pidfile deletion (LP: #600941)
 -- Stephane Graber <email address hidden> Fri, 14 Oct 2011 10:29:15 +0100

Changed in nagios-nrpe (Ubuntu Lucid):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nagios-nrpe - 2.12-4ubuntu1.10.10.1

---------------
nagios-nrpe (2.12-4ubuntu1.10.10.1) maverick-proposed; urgency=low

  * Use pidfile for start-stop-daemon and fix pidfile deletion (LP: #600941)
 -- Stephane Graber <email address hidden> Fri, 14 Oct 2011 10:32:51 +0100

Changed in nagios-nrpe (Ubuntu Maverick):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nagios-nrpe - 2.12-4ubuntu1.11.04.1

---------------
nagios-nrpe (2.12-4ubuntu1.11.04.1) natty-proposed; urgency=low

  * Use pidfile for start-stop-daemon and fix pidfile deletion (LP: #600941)
 -- Stephane Graber <email address hidden> Fri, 14 Oct 2011 10:39:21 +0100

Changed in nagios-nrpe (Ubuntu Natty):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nagios-nrpe - 2.12-4ubuntu3.1

---------------
nagios-nrpe (2.12-4ubuntu3.1) oneiric-proposed; urgency=low

  * Use pidfile for start-stop-daemon and fix pidfile deletion (LP: #600941)
 -- Stephane Graber <email address hidden> Fri, 14 Oct 2011 10:40:16 +0100

Changed in nagios-nrpe (Ubuntu Oneiric):
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

Stephane, please fix in precise as well. I can't copy the SRU as precise has a newer version.

Changed in nagios-nrpe (Ubuntu Precise):
assignee: nobody → Stéphane Graber (stgraber)
importance: Undecided → High
milestone: none → precise-alpha-1
Martin Pitt (pitti) wrote :

Hello Tamas, or anyone else affected,

Accepted nagios-nrpe into hardy-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: removed: verification-done
tags: added: verification-needed
Stéphane Graber (stgraber) wrote :

The reason I didn't upload to Precise was because Debian already had the fix.
The version that just got merged now brings that fix in Precise, so I'm going to mark the task as Fix released (Debian 2.12-5 is where I cherry-picked the patch from and Precise now has 2.12-5ubuntu1).

Changed in nagios-nrpe (Ubuntu Precise):
status: Triaged → Fix Released
Michael Jeanson (mjeanson) wrote :

I installed the package from hardy-proposed and it works as expected in my environment.

Martin Pitt (pitti) on 2011-10-26
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nagios-nrpe - 2.8.1-1ubuntu0.1

---------------
nagios-nrpe (2.8.1-1ubuntu0.1) hardy-proposed; urgency=low

  * Use pidfile for start-stop-daemon and fix pidfile deletion (LP: #600941)
 -- Michael Jeanson <email address hidden> Mon, 24 Oct 2011 16:35:38 -0400

Changed in nagios-nrpe (Ubuntu Hardy):
status: Fix Committed → Fix Released

Regrettably, start-stop-daemon is a C program that resulted from translating a Perl script, and it shows. There are far too many global variables in start-stop-daemon.c, and although there are some nice blocks of comments, there are far too few comments explaining the intent behind variable names such as "schedule" (the string "schedule" appears in many contexts). The manual page for start-stop-daemon strongly suggests that, unless you use the "--retry" option, start-stop-daemon may return before the process it signalled has actually called exit() and terminated; however, it never states this in plain English.

The obtuseness of start-stop-daemon is most likely why the "fixed" version of the script includes a dubious comment saying that the pid file sometimes has not been removed. I can easily see how someone would fail to notice that he had left out the --retry option. This bug also causes failures when the script is given the "restart" option: the nrpe daemon shuts down but does not start up again.
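The `--retry` option takes a schedule such as `TERM/30/KILL/5` (the form used in Debian's `/etc/init.d/skeleton`): send TERM, wait up to 30 seconds for the process to exit, then send KILL and wait up to 5 more seconds. Without it, the stop can return while the daemon is still shutting down, which is the behaviour blamed above for the restart failure. The functions below are a hypothetical shell sketch of such a schedule, not start-stop-daemon's code; `stop_with_retry`, `wait_for_exit` and `is_running` are illustrative names, and reading `/proc/<pid>/stat` to treat zombies as exited is a Linux-specific assumption.

```shell
# Hypothetical sketch of a TERM/<t1>/KILL/<t2> retry schedule; this is not
# start-stop-daemon's implementation, and the function names are made up.

# Succeeds while PID is alive. A zombie (state Z in /proc/<pid>/stat) has
# already exited and is only waiting to be reaped, so it counts as gone.
# Reading /proc is Linux-specific.
is_running() {
    kill -0 "$1" 2>/dev/null || return 1
    state=$(sed 's/.*) \(.\).*/\1/' "/proc/$1/stat" 2>/dev/null) || return 1
    [ "$state" != "Z" ]
}

# wait_for_exit PID TIMEOUT: poll once per second; fail if PID outlives TIMEOUT.
wait_for_exit() {
    waited=0
    while is_running "$1"; do
        if [ "$waited" -ge "$2" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# stop_with_retry PID TERM_TIMEOUT KILL_TIMEOUT: send TERM, wait, and only
# escalate to KILL if the process is still running after TERM_TIMEOUT.
# Returns non-zero if the process survives the whole schedule.
stop_with_retry() {
    pid=$1 term_timeout=$2 kill_timeout=$3
    is_running "$pid" || return 0
    kill -s TERM "$pid" 2>/dev/null
    wait_for_exit "$pid" "$term_timeout" && return 0
    kill -s KILL "$pid" 2>/dev/null
    wait_for_exit "$pid" "$kill_timeout"
}
```

In the init script itself, the equivalent change would presumably be adding `--retry=TERM/30/KILL/5` to the start-stop-daemon call in the `stop)` branch, so that start-stop-daemon does the waiting and a following start does not race the old daemon.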

Did I mention that since applying the update our nrpe servers started crashing over and over? Repairing this is why I am doing this work.

I have been working on remediating the fix. You can see my current work in this PPA.
https://launchpad.net/~nutznboltz/+archive/nrpe-unbreak-lp-600941

$ grep -i sometimes /etc/init.d/nagios-nrpe-server
    #sometimes deleting the pidfile fails. cleanup afterwards.

That got past QA? It looks like a red flag to me.

Stéphane Graber (stgraber) wrote :

The right way of providing a fix is by attaching the patch or debdiff to the bug, this way Launchpad will detect it and flag the bug.

Also, this bug is marked as Fix Released. While I agree that this regression needs fixing, I invite you to file a new bug for it so we can track it properly.

Also, opening a bug against Debian is probably a good idea as that's where the init script comes from and where it should be fixed first (then have that fix synced into Ubuntu).

This specific bug (all nrpe processes getting killed on LXC/OpenVZ/Vserver hosts when nrpe restarts on the host) has indeed been fixed by the change in Debian that I pushed as an SRU so I'd really recommend having this discussion in a separate bug report (that's not marked as fix released).
