nagios3 + livestatus: SIGSEGV everyday at midnight

Bug #1372284 reported by Stanislav German-Evtushenko on 2014-09-22
This bug affects 12 people
Affects              Status   Importance   Assigned to   Milestone
check-mk (Debian)    New      Unknown
check-mk (Ubuntu)             Medium       Unassigned
  Trusty                      Undecided    Unassigned
  Xenial                      Undecided    Unassigned
  Yakkety                     Undecided    Unassigned
  Zesty                       Medium       Unassigned

Bug Description

[Impact]

- Ubuntu 14.04, 16.04, 16.10, and 17.04

Nagios goes down every day at midnight when livestatus is enabled and a downtime is configured.

Here are bug reports in other trackers:
https://bugzilla.redhat.com/show_bug.cgi?id=1083003
http://tracker.nagios.org/view.php?id=516
http://tracker.nagios.org/view.php?id=455

People say this patch helps:
http://git.mathias-kettner.de/git/?p=omd.git;a=blob;f=packages/nagios/patches/0007-fix_downtime_struct.dif;h=af0e245b585e78c372a69d10c5e3b47ab64ad510;hb=HEAD

[Test Case]

* Enable check-mk-livestatus plugin (add broker_module=/usr/lib/check_mk/livestatus.o /var/lib/nagios3/livestatus/socket to nagios3/nagios.cfg)

* Wait for log rotation (or update log_rotation_method in nagios3/nagios.cfg to rotate hourly so it happens more often)

* Without the fix, nagios segfaults and writes an entry like the following to /var/log/nagios3/nagios.log:

| [1486857600] Caught SIGSEGV, shutting down...

* With the fix, nagios log rotation succeeds and the rotated logs are present in /var/log/nagios3/archives.
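
The log check in the test case can be scripted. The following is a minimal sketch in C and is not part of any package: the function names are hypothetical, and it assumes the server clock is set to UTC, so that a daily rotation at midnight corresponds to an epoch timestamp that is a multiple of 86400.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the epoch timestamp of a nagios.log line that
 * reports a caught SIGSEGV, or -1 if the line is not such an entry. */
long long sigsegv_timestamp(const char *line)
{
    long long ts;

    assert(line != NULL);
    /* Nagios log lines start with "[<epoch seconds>] ". */
    if (sscanf(line, "[%lld]", &ts) != 1)
        return -1;
    if (strstr(line, "Caught SIGSEGV") == NULL)
        return -1;
    return ts;
}

/* Seconds elapsed since the most recent UTC midnight (for ts >= 0). */
long long seconds_past_utc_midnight(long long ts)
{
    return ts % 86400;
}
```

For the entry quoted in the test case (passed without the leading "| " quoting marker), sigsegv_timestamp() returns 1486857600 and seconds_past_utc_midnight() returns 0 for that value, i.e. the crash was logged exactly at midnight UTC.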

[Regression Potential]

Nagios will continue to segfault during logrotation.

[Other Info]

check-mk ships its own copy of nagios/downtime.h, which differs from nagios3's downtime.h (the definition of struct scheduled_downtime_struct differs between the two).
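
Why a differing struct definition is enough to crash the process can be sketched as follows. The two structs below are hypothetical and much simpler than the real scheduled_downtime_struct; the added field only illustrates a header that has changed in one place but not in the other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout as compiled into the core (newer header). */
struct downtime_core {
    int type;
    char *host_name;
    unsigned long downtime_id;
    char *author;
    int is_in_effect;   /* illustrative field that exists only in the newer header */
    char *comment;
};

/* Hypothetical layout from a stale header copy bundled with a module. */
struct downtime_stale {
    int type;
    char *host_name;
    unsigned long downtime_id;
    char *author;
    char *comment;
};

/* Byte distance between where the core stores `comment` and where code built
 * against the stale header looks for it. Zero would mean the layouts agree. */
long comment_offset_skew(void)
{
    return (long)offsetof(struct downtime_core, comment)
         - (long)offsetof(struct downtime_stale, comment);
}

/* Fields declared before the inserted member keep their offsets, which is why
 * such a mismatch can go unnoticed until a later field is accessed. */
void check_leading_fields_agree(void)
{
    assert(offsetof(struct downtime_core, author) ==
           offsetof(struct downtime_stale, author));
}
```

comment_offset_skew() returns a positive value here: a module built against the stale copy would look for comment at the wrong offset inside an object allocated by the core and could end up dereferencing unrelated bytes as a pointer.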

Robie Basak (racb) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better.

Looks like the check-mk source package embeds a copy of nagios' downtime.h (along with other header files), and that Nagios has updated a struct since then.

Patching Nagios doesn't make sense here, from a distribution perspective. The real fix is to fix check-mk's packaging to use a build dependency on some kind of nagios-dev package which provides the required headers. So this is a bug in check-mk, not in nagios.

This is probably relevant to Debian's check-mk packaging, too. Checking in Debian and reporting there if this bug applies to Debian would be appropriate.

affects: nagios3 (Ubuntu) → check-mk (Ubuntu)
tags: added: needs-upstream-report

Right, just noticed that this is a livestatus bug. Should I create another report for check-mk-livestatus?

On Mon, Sep 22, 2014 at 08:28:44AM -0000, Stanislav German-Evtushenko wrote:
> Right, just noticed that this is a livestatus bug. Should I create
> another report for check-mk-livestatus?

No need; I reassigned it.

summary: - nagios3: SIGSEGV everyday at midnight
+ nagios3 + livestatus: SIGSEGV everyday at midnight

I did manage to rebuild check-mk with the following files from nagios 3.5.1:
*******************************************
include/broker.h
include/cgiauth.h
include/cgiutils.h
include/comments.h
include/common.h
include/compat.h
include/config.h
include/downtime.h
include/locations.h
include/logging.h
include/macros.h
include/nagios.h
include/nebcallbacks.h
include/neberrors.h
include/nebmods.h
include/nebmodules.h
include/nebstructs.h
include/objects.h
include/perfdata.h
include/shared.h
include/skiplist.h
include/sretention.h
include/statusdata.h
*******************************************

Where the following small changes are applied:
*******************************************
diff -ru include/nagios.h nagios/nagios.h
--- include/nagios.h 2013-08-30 21:46:14.000000000 +0400
+++ nagios/nagios.h 2014-09-22 16:17:53.006908559 +0400
@@ -28,6 +28,7 @@
 # define NSCORE
 #endif

+#include "config.h"
 #include "compat.h"
 #include "logging.h"
 #include "common.h"
diff -ru include/objects.h nagios/objects.h
--- include/objects.h 2013-08-30 21:46:14.000000000 +0400
+++ nagios/objects.h 2014-09-22 16:30:19.467335449 +0400
@@ -174,7 +174,7 @@

 /* COMMANDSMEMBER structure */
 typedef struct commandsmember_struct {
- char *command;
+ char *command_dummy;
 #ifdef NSCORE
  command *command_ptr;
 #endif
*******************************************

livestatus archive with updated headers is attached.

Jan Wagner (waja) wrote :

> This is probably relevant to Debian's check-mk packaging, too.

This is probably true.

> Checking in Debian and reporting there if this bug applies to Debian would be
> appropriate.

This wouldn't help much. check-mk has been removed from testing and has
not been touched for about a year, with 20 bugs open, 6 of them security
bugs. It's more likely to get removed from Debian.
Even nagios3 is in legacy maintenance and is not getting updated to
newer versions, as they are all more or less broken. I would guess it
will also be removed at some point, if a drop-in replacement becomes
available.

If anybody asked me, as part of the Debian Nagios Maintainers Group,
I would try to avoid using either of these packages.

Cheers, Jan.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in check-mk (Ubuntu):
status: New → Confirmed
Simon Déziel (sdeziel) wrote :

This affects us as well on Trusty and we'd really appreciate if [1] could be applied to nagios3.

Thanks in advance.

1: http://git.mathias-kettner.de/git/?p=omd.git;a=blob;f=packages/nagios/patches/0007-fix_downtime_struct.dif;h=af0e245b585e78c372a69d10c5e3b47ab64ad510;hb=HEAD

Robie Basak (racb) wrote :

Simon,

As far as I can tell the correct fix is in check-mk, not in nagios3, and I've explained my reasoning above. Without an explanation of why this is wrong and the nagios3 package should be changed instead, I don't think we can make any progress in fixing this in nagios3 in Trusty.

Simon Déziel (sdeziel) wrote :

Robie, you are right. I'll see if something can be done with check-mk to make it use the proper headers, if that is not already the case in newer versions.

Thanks and sorry for the noise.

Pablo Estigarribia (pablodav) wrote :

Hello,

I have been working hard looking for a workaround and for the root cause of it.
I had 4 servers installed and only one failing with this bug, so I tried to find the root cause of the issue.

I reinstalled a server and migrated all history, config, etc. to see whether the problem was related to any of it.

After many tests on both the new and the old one I found a workaround:

First as root:

    service nagios3 stop
    cd /var/lib/nagios3
    rm spool/checkresults/*

I also deleted the retention file, but I don't think it is a cause of the problem, because I had previously tried deleting only this file and nothing was fixed.

    rm retention.dat

Then started the service

    service nagios3 start

I have used the hourly rotation option in the nagios.cfg file, so I tested it many times, and it is working smoothly now.

Pablo Estigarribia (pablodav) wrote :

I have found more information on:
http://nagios.fm4dd.com/howto/manual/pnp4n_send_service_mail.htm

It seems that setting the enable_environment_macros=0 option is recommended to avoid instability with check-mk-livestatus.

Pablo Estigarribia (pablodav) wrote :

I have confirmed that, in my case, log rotation crashes if I enable environment macros.
Then I have to disable them and repeat exactly the same steps as in comment #10 to fix it.

Pablo Estigarribia (pablodav) wrote :

I have found that scheduling a downtime always crashes check-mk-livestatus, same thing as described in:
https://bugzilla.redhat.com/show_bug.cgi?id=1083003#c0

I will try to test Ubuntu 16.04 as it was fixed on centos/redhat but not on Ubuntu 14.04

Pablo Estigarribia (pablodav) wrote :

I have solved it by upgrading to nagios4, built from the nagios4 sources.

Haw Loeung (hloeung) wrote :

I have a working patch to fix this for Xenial. It's packaged in my PPA[1] and tested. The fix updates check-mk, so the shipped downtime.h struct matches what's shipped in nagios3, see diff[2].

| [1487560622] Caught SIGHUP, restarting...
| [1487560624] livestatus: Socket thread has terminated
| [1487560624] Event broker module '/usr/lib/check_mk/livestatus.o' deinitialized successfully.
| [1487560624] Nagios 3.5.1 starting... (PID=28200)
| [1487560624] Local time is Mon Feb 20 03:17:04 UTC 2017
| [1487560624] LOG VERSION: 2.0
| [1487560624] livestatus: Livestatus 1.2.6p12 by Mathias Kettner. Socket: '/var/lib/nagios3/livestatus/socket'
| [1487560624] livestatus: Please visit us at http://mathias-kettner.de/
| [1487560624] livestatus: Hint: please try out OMD - the Open Monitoring Distribution
| [1487560624] livestatus: Please visit OMD at http://omdistro.org
| [1487560624] livestatus: Finished initialization. Further log messages go to /var/log/nagios3/livestatus.log
| [1487560624] Event broker module '/usr/lib/check_mk/livestatus.o' initialized successfully.

Anyone else able to help test it?

[1]https://launchpad.net/~hloeung/+archive/ubuntu/nagios3
[2]https://launchpadlibrarian.net/306798563/check-mk_1.2.6p12-1_1.2.6p12-1haw1.diff.gz

Haw Loeung (hloeung) on 2017-02-20
Changed in check-mk (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Haw Loeung (hloeung)
Haw Loeung (hloeung) wrote :

The attachment "check-mk_1.2.6p12-1_1.2.6p12-1haw1.diff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
tags: added: trusty
affects: check-mk (Fedora) → ubuntu
Changed in ubuntu:
importance: Unknown → Undecided
status: Unknown → New
no longer affects: ubuntu
Changed in check-mk (Ubuntu):
importance: Undecided → Medium
Haw Loeung (hloeung) on 2017-02-22
tags: added: xenial
Haw Loeung (hloeung) wrote :

The backported Trusty version with the patch looks good as well and hasn't crashed.

Alvaro Uría (aluria) on 2017-03-29
tags: added: canonical-bootstack
Nish Aravamudan (nacc) wrote :

I'm unsubscribing sponsors for now, as the patch attachment does not have a valid version for uploading to the archive and seems to also be needed for Trusty.

Nish Aravamudan (nacc) wrote :

Also, does this happen in 17.04?

Haw Loeung (hloeung) wrote :

Nish, what do you mean by "does not have a valid version for uploading"? I've attached a patch that applies to both Xenial and Trusty[1][2]. I've used that to build packages for both distros in a PPA[3] and tested on the various nagios instances we have internally (combination of Trusty and Xenial).

[1]https://launchpadlibrarian.net/306798563/check-mk_1.2.6p12-1_1.2.6p12-1haw1.diff.gz
[2]https://launchpadlibrarian.net/307129797/check-mk_1.2.2p3-1_1.2.2p3-1haw1.diff.gz
[3]https://launchpad.net/~hloeung/+archive/ubuntu/nagios3/+packages

Haw,
1.2.6p12-1haw1

1.2.2p3-1haw1

are not appropriate versions for the archive. Please follow the
versioning recommendations at:
https://wiki.ubuntu.com/SecurityTeam/UpdatePreparation#Update_the_packaging

-Nish

On Fri, Mar 31, 2017 at 8:46 PM, Haw Loeung <email address hidden> wrote:
> Nish, what do you mean by "does not have a valid version for uploading"?
> I've attached a patch that applies to both Xenial and Trusty[1][2]. I've
> used that to build packages for both distros in a PPA[3] and tested on
> the various nagios instances we have internally (combination of Trusty
> and Xenial).
>
> [1]https://launchpadlibrarian.net/306798563/check-mk_1.2.6p12-1_1.2.6p12-1haw1.diff.gz
> [2]https://launchpadlibrarian.net/307129797/check-mk_1.2.2p3-1_1.2.2p3-1haw1.diff.gz
> [3]https://launchpad.net/~hloeung/+archive/ubuntu/nagios3/+packages

Haw Loeung (hloeung) wrote :

On Mon, Apr 03, 2017 at 04:58:15PM -0000, Nish Aravamudan wrote:
> Haw,
> 1.2.6p12-1haw1
>
> 1.2.2p3-1haw1
>
> are not appropriate versions for the archive. Please follow the
> versioning recommendations at:
> https://wiki.ubuntu.com/SecurityTeam/UpdatePreparation#Update_the_packaging

Ah right, I've created a new PPA and re-uploaded with the correct
versions[1].

Thanks for your time!

[1]https://launchpad.net/~hloeung/+archive/ubuntu/check-mk

Nish Aravamudan (nacc) wrote :

Thanks for that fix. However, can you please provide debdiffs in this bug as well for sponsoring purposes? Couple of other nits:

I took a look at the PPA's generated debdiffs and saw that (at least) the xenial one incorrectly indicates in debian/changelog that 1.2.6p12-1 is from xenial, when it is from Debian unstable (I'm not sure how that happened).

Also, please run `update-maintainer` since there is now an Ubuntu delta.

It appears this also needs fixing in 17.04 and 16.10. Can debdiffs be provided for those? SRUs can't be sponsored without the development release being fixed first.

And finally, is this a bug upstream? Or a backport of an upstream fix? Can there be DEP3 headers as appropriate? http://dep.debian.net/deps/dep3/

Haw Loeung (hloeung) wrote :
tags: removed: needs-upstream-report
Nish Aravamudan (nacc) on 2017-04-05
Changed in check-mk (Ubuntu Trusty):
status: New → Triaged
Changed in check-mk (Ubuntu Xenial):
status: New → Triaged
Changed in check-mk (Ubuntu Yakkety):
status: New → Triaged
Haw Loeung (hloeung) on 2017-04-05
description: updated
Changed in check-mk (Debian):
status: Unknown → New
Haw Loeung (hloeung) on 2017-04-05
description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package check-mk - 1.2.8p16-1ubuntu0.1

---------------
check-mk (1.2.8p16-1ubuntu0.1) zesty; urgency=medium

  * Added patch to fix downtime.h's scheduled_downtime_struct (LP: #1372284)

 -- Haw Loeung <email address hidden> Tue, 04 Apr 2017 17:17:00 -0700

Changed in check-mk (Ubuntu Zesty):
status: In Progress → Fix Released
Nish Aravamudan (nacc) on 2017-04-05
Changed in check-mk (Ubuntu Zesty):
assignee: Haw Loeung (hloeung) → nobody
Nish Aravamudan (nacc) wrote :

Haw,

I have sponsored the uploads for SRU, with the following change:

Xenial => 1.2.6p12-1ubuntu0.1 -> 1.2.6p12-1ubuntu0.16.04.1

and uploaded the same for Yakkety:

Yakkety => 1.2.6p12-1ubuntu0.1 -> 1.2.6p12-1ubuntu0.16.10.1

Hello Stanislav, or anyone else affected,

Accepted check-mk into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/check-mk/1.2.6p12-1ubuntu0.16.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in check-mk (Ubuntu Yakkety):
status: Triaged → Fix Committed
tags: added: verification-needed
Changed in check-mk (Ubuntu Xenial):
status: Triaged → Fix Committed
Brian Murray (brian-murray) wrote :

Hello Stanislav, or anyone else affected,

Accepted check-mk into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/check-mk/1.2.6p12-1ubuntu0.16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Brian Murray (brian-murray) wrote :

Hello Stanislav, or anyone else affected,

Accepted check-mk into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/check-mk/1.2.2p3-1ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in check-mk (Ubuntu Trusty):
status: Triaged → Fix Committed
Simon Déziel (sdeziel) on 2017-04-06
tags: added: verification-done-trusty verification-needed-xenial
removed: verification-needed

Hello Simon,

Please don't just change the flag values, but provide some indication
of what version was tested and what the test was.

2) of https://lists.ubuntu.com/archives/ubuntu-devel/2017-March/039745.html

Simon Déziel (sdeziel) wrote :

Right, sorry. I tested on Trusty by installing the -proposed package (see below) yesterday, then checked this morning whether nagios had crashed during the log rotation (midnight). It didn't, whereas it used to crash every single night before. So the problem is fixed by the -proposed package, thanks!

$ apt-cache policy check-mk-livestatus
check-mk-livestatus:
  Installed: 1.2.2p3-1ubuntu0.1
  Candidate: 1.2.2p3-1ubuntu0.1
  Version table:
 *** 1.2.2p3-1ubuntu0.1 0
        100 /var/lib/dpkg/status
     1.2.2p3-1 0
        500 http://archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages

Haw Loeung (hloeung) wrote :

Verified on both Xenial and Trusty from -proposed.

Xenial package used: 1.2.6p12-1ubuntu0.16.04.1
Trusty package used: 1.2.2p3-1ubuntu0.1

tags: added: verification-done-xenial
removed: verification-needed-xenial
tags: added: verification-done
Brian Murray (brian-murray) wrote :

I see trusty and xenial being verified, but not yakkety.

tags: added: verification-needed
removed: verification-done
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package check-mk - 1.2.2p3-1ubuntu0.1

---------------
check-mk (1.2.2p3-1ubuntu0.1) trusty; urgency=medium

  * Added patch to fix downtime.h's scheduled_downtime_struct (LP: #1372284)

 -- Haw Loeung <email address hidden> Wed, 05 Apr 2017 09:42:09 -0700

Changed in check-mk (Ubuntu Trusty):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package check-mk - 1.2.6p12-1ubuntu0.16.04.1

---------------
check-mk (1.2.6p12-1ubuntu0.16.04.1) xenial; urgency=medium

  * Added patch to fix downtime.h's scheduled_downtime_struct (LP: #1372284)

 -- Haw Loeung <email address hidden> Wed, 05 Apr 2017 09:34:13 -0700

Changed in check-mk (Ubuntu Xenial):
status: Fix Committed → Fix Released

As part of a recent change in the Stable Release Update verification policy, we would like to inform you that, for a bug to be considered verified for a given release, a verification-done-$RELEASE tag needs to be added to the bug, where $RELEASE is the name of the series the package was tested on (e.g. verification-done-xenial). Please note that the global 'verification-done' tag can no longer be used for this purpose.

Thank you!

Haw Loeung (hloeung) wrote :

Verified on yakkety with 1.2.6p12-1ubuntu0.16.10.1 from -proposed.

I spun up an instance in canonistack, installed check-mk-livestatus, enabled broker_module, and changed log rotation to hourly.

Without the fix it would segfault:

[1499047200] Caught SIGSEGV, shutting down...

With the updated package, log rotation succeeds.

tags: added: verification-done-yakkety
removed: verification-needed

The verification of the Stable Release Update for check-mk has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package check-mk - 1.2.6p12-1ubuntu0.16.10.1

---------------
check-mk (1.2.6p12-1ubuntu0.16.10.1) yakkety; urgency=medium

  * Added patch to fix downtime.h's scheduled_downtime_struct (LP: #1372284)

 -- Haw Loeung <email address hidden> Wed, 05 Apr 2017 09:34:13 -0700

Changed in check-mk (Ubuntu Yakkety):
status: Fix Committed → Fix Released