nagios3 + livestatus: SIGSEGV everyday at midnight
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| check-mk (Debian) | New | Unknown | | |
| check-mk (Ubuntu) | | Medium | Unassigned | |
| Trusty | | Undecided | Unassigned | |
| Xenial | | Undecided | Unassigned | |
| Yakkety | | Undecided | Unassigned | |
| Zesty | | Medium | Unassigned | |
Bug Description
[Impact]
- Ubuntu 14.04, 16.04, 16.10, and 17.04
Nagios goes down every day at midnight when livestatus is enabled and a downtime is configured.
Here are bug reports in other trackers:
https:/
http://
http://
People say this patch helps:
http://
[Test Case]
* Enable check-mk-livestatus plugin (add broker_
* Wait for log rotation (or update log_rotation_method in nagios3/nagios.cfg to rotate hourly so it happens more often)
* Without the fix, nagios segfaults, with an entry in /var/log/
| [1486857600] Caught SIGSEGV, shutting down...
* With the fix, nagios log rotation succeeds and the rotated logs are present in /var/log/
[Regression Potential]
At worst, Nagios will continue to segfault during log rotation.
[Other Info]
check-mk ships its own copy of nagios/downtime.h, which differs from nagios3's downtime.h (defined struct scheduled_
Robie Basak (racb) wrote : | #1 |
affects: | nagios3 (Ubuntu) → check-mk (Ubuntu) |
tags: | added: needs-upstream-report |
Right, just noticed that this is a livestatus bug. Should I create another report for check-mk-
On Mon, Sep 22, 2014 at 08:28:44AM -0000, Stanislav German-Evtushenko wrote:
> Right, just noticed that this is a livestatus bug. Should I create
> another report for check-mk-
No need; I reassigned it.
summary: | nagios3: SIGSEGV everyday at midnight → nagios3 + livestatus: SIGSEGV everyday at midnight |
I did manage to rebuild check-mk with the following files from nagios 3.5.1:
*******
include/broker.h
include/cgiauth.h
include/cgiutils.h
include/comments.h
include/common.h
include/compat.h
include/config.h
include/downtime.h
include/locations.h
include/logging.h
include/macros.h
include/nagios.h
include/
include/neberrors.h
include/nebmods.h
include/
include/
include/objects.h
include/perfdata.h
include/shared.h
include/skiplist.h
include/
include/
*******
The following small changes were applied:
*******
diff -ru include/nagios.h nagios/nagios.h
--- include/nagios.h 2013-08-30 21:46:14.000000000 +0400
+++ nagios/nagios.h 2014-09-22 16:17:53.006908559 +0400
@@ -28,6 +28,7 @@
# define NSCORE
#endif
+#include "config.h"
#include "compat.h"
#include "logging.h"
#include "common.h"
diff -ru include/objects.h nagios/objects.h
--- include/objects.h 2013-08-30 21:46:14.000000000 +0400
+++ nagios/objects.h 2014-09-22 16:30:19.467335449 +0400
@@ -174,7 +174,7 @@
/* COMMANDSMEMBER structure */
typedef struct commandsmember_
- char *command;
+ char *command_dummy;
#ifdef NSCORE
command *command_ptr;
#endif
*******
The livestatus archive with the updated headers is attached.
Jan Wagner (waja) wrote : | #5 |
> This is probably relevant to Debian's check-mk packaging, too.
That is probably true.
> Checking in Debian and reporting there if this bug applies to Debian would be
> appropriate.
This wouldn't help much. check-mk has been removed from testing and
hasn't been touched for about a year, with 20 open bugs, 6 of them
security bugs. It's more likely to be removed from Debian.
Even nagios3 is in legacy maintenance and is not being updated to
newer versions, as they are all more or less broken. I would guess it
will also be removed at some point, once a drop-in replacement is
available.
If anybody asked me, as part of the Debian Nagios Maintainers Group,
I would try to avoid using either of these packages.
Cheers, Jan.
Launchpad Janitor (janitor) wrote : | #6 |
Status changed to 'Confirmed' because the bug affects multiple users.
Changed in check-mk (Ubuntu): | |
status: | New → Confirmed |
Simon Déziel (sdeziel) wrote : | #7 |
This affects us as well on Trusty, and we'd really appreciate it if [1] could be applied to nagios3.
Thanks in advance.
Robie Basak (racb) wrote : | #8 |
Simon,
As far as I can tell the correct fix is in check-mk, not in nagios3, and I've explained my reasoning above. Without an explanation of why this is wrong and the nagios3 package should be changed instead, I don't think we can make any progress in fixing this in nagios3 in Trusty.
Simon Déziel (sdeziel) wrote : | #9 |
Robie, you are right. I'll see if something can be done with check-mk to make it use the proper headers if not the case already with newer versions.
Thanks and sorry for the noise.
Pablo Estigarribia (pablodav) wrote : | #10 |
Hello,
I have been working hard looking for a workaround and for the root cause of it.
I had 4 servers installed and only one was failing with this bug, so I tried to find the root cause of the issue.
I reinstalled a server and migrated all history, config, etc. to see whether any of that was related.
After many tests on both the new and the old server, I found a workaround:
First as root:
service nagios3 stop
cd /var/lib/nagios3
rm spool/checkresu
I also deleted the retention file, but I don't think it is the cause of the problem: I had previously tried deleting only this file, and nothing was fixed.
rm retention.dat
Then start the service:
service nagios3 start
I set the hourly rotation option in nagios.cfg so I could test it many times, and it is working smoothly now.
Pablo Estigarribia (pablodav) wrote : | #11 |
I have found more information on:
http://
It seems that it's recommended to have enable_
Pablo Estigarribia (pablodav) wrote : | #12 |
I have confirmed that, in my case, log rotation crashes if I enable environment_macros.
To fix it, I then have to disable it and follow exactly the same steps as in comment #10.
Pablo Estigarribia (pablodav) wrote : | #13 |
I have found that scheduling a downtime always crashes check-mk-
https:/
I will try Ubuntu 16.04, as this was fixed on CentOS/Red Hat but not on Ubuntu 14.04.
Pablo Estigarribia (pablodav) wrote : | #14 |
I have solved it by upgrading to nagios4, built from source.
Haw Loeung (hloeung) wrote : | #15 |
I have a working patch to fix this for Xenial. It's packaged in my PPA[1] and tested. The fix updates check-mk so that the struct in its bundled downtime.h matches the one shipped in nagios3; see diff[2].
| [1487560622] Caught SIGHUP, restarting...
| [1487560624] livestatus: Socket thread has terminated
| [1487560624] Event broker module '/usr/lib/
| [1487560624] Nagios 3.5.1 starting... (PID=28200)
| [1487560624] Local time is Mon Feb 20 03:17:04 UTC 2017
| [1487560624] LOG VERSION: 2.0
| [1487560624] livestatus: Livestatus 1.2.6p12 by Mathias Kettner. Socket: '/var/lib/
| [1487560624] livestatus: Please visit us at http://
| [1487560624] livestatus: Hint: please try out OMD - the Open Monitoring Distribution
| [1487560624] livestatus: Please visit OMD at http://
| [1487560624] livestatus: Finished initialization. Further log messages go to /var/log/
| [1487560624] Event broker module '/usr/lib/
Anyone else able to help test it?
[1]https:/
[2]https:/
Changed in check-mk (Ubuntu): | |
status: | Confirmed → In Progress |
assignee: | nobody → Haw Loeung (hloeung) |
Haw Loeung (hloeung) wrote : | #16 |
The attachment "check-
[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]
tags: | added: patch |
tags: | added: trusty |
affects: | check-mk (Fedora) → ubuntu |
Changed in ubuntu: | |
importance: | Unknown → Undecided |
status: | Unknown → New |
no longer affects: | ubuntu |
Changed in check-mk (Ubuntu): | |
importance: | Undecided → Medium |
tags: | added: xenial |
Haw Loeung (hloeung) wrote : | #18 |
The backported Trusty version with the patch looks good as well and hasn't crashed.
tags: | added: canonical-bootstack |
Nish Aravamudan (nacc) wrote : | #19 |
I'm unsubscribing sponsors for now, as the patch attachment does not have a valid version for uploading to the archive and seems to also be needed for Trusty.
Nish Aravamudan (nacc) wrote : | #20 |
Also, does this happen in 17.04?
Haw Loeung (hloeung) wrote : | #21 |
Nish, what do you mean by "does not have a valid version for uploading"? I've attached a patch that applies to both Xenial and Trusty[1][2]. I've used that to build packages for both distros in a PPA[3] and tested on the various nagios instances we have internally (combination of Trusty and Xenial).
[1]https:/
[2]https:/
[3]https:/
Nish Aravamudan (nacc) wrote : Re: [Bug 1372284] Re: nagios3 + livestatus: SIGSEGV everyday at midnight | #22 |
Haw,
1.2.6p12-1haw1
1.2.2p3-1haw1
are not appropriate versions for the archive. Please follow the
versioning recommendations at:
https:/
-Nish
On Fri, Mar 31, 2017 at 8:46 PM, Haw Loeung <email address hidden> wrote:
> Nish, what do you mean by "does not have a valid version for uploading"?
> I've attached a patch that applies to both Xenial and Trusty[1][2]. I've
> used that to build packages for both distros in a PPA[3] and tested on
> the various nagios instances we have internally (combination of Trusty
> and Xenial).
>
> [1]https:/
> [2]https:/
> [3]https:/
Haw Loeung (hloeung) wrote : | #23 |
On Mon, Apr 03, 2017 at 04:58:15PM -0000, Nish Aravamudan wrote:
> Haw,
> 1.2.6p12-1haw1
>
> 1.2.2p3-1haw1
>
> are not appropriate versions for the archive. Please follow the
> versioning recommendations at:
> https:/
Ah right, I've created a new PPA and re-uploaded with the correct
versions[1].
Thanks for your time!
Nish Aravamudan (nacc) wrote : | #24 |
Thanks for that fix. However, can you please provide debdiffs in this bug as well for sponsoring purposes? A couple of other nits:
I took a look at the PPA's generated debdiffs and saw that (at least) the xenial one incorrectly indicates in debian/changelog that 1.2.6p12-1 is from xenial, when it is from Debian unstable (I'm not sure how that happened).
Also, please run `update-maintainer` since there is now an Ubuntu delta.
It appears this is something that also needs fixing in 17.04 and 16.10, can debdiffs be provided for those? SRUs can't be sponsored without the development release being fixed first.
And finally, is this a bug upstream? Or a backport of an upstream fix? Can there be DEP3 headers as appropriate? http://
Haw Loeung (hloeung) wrote : | #25 |
tags: | removed: needs-upstream-report |
Haw Loeung (hloeung) wrote : | #26 |
Haw Loeung (hloeung) wrote : | #27 |
Haw Loeung (hloeung) wrote : | #28 |
Changed in check-mk (Ubuntu Trusty): | |
status: | New → Triaged |
Changed in check-mk (Ubuntu Xenial): | |
status: | New → Triaged |
Changed in check-mk (Ubuntu Yakkety): | |
status: | New → Triaged |
description: | updated |
Changed in check-mk (Debian): | |
status: | Unknown → New |
description: | updated |
Launchpad Janitor (janitor) wrote : | #29 |
This bug was fixed in the package check-mk - 1.2.8p16-1ubuntu0.1
---------------
check-mk (1.2.8p16-
* Added patch to fix downtime.h's scheduled_
-- Haw Loeung <email address hidden> Tue, 04 Apr 2017 17:17:00 -0700
Changed in check-mk (Ubuntu Zesty): | |
status: | In Progress → Fix Released |
Changed in check-mk (Ubuntu Zesty): | |
assignee: | Haw Loeung (hloeung) → nobody |
Nish Aravamudan (nacc) wrote : | #30 |
Haw,
I have sponsored the uploads for SRU, with the following change:
Xenial => 1.2.6p12-1ubuntu0.1 -> 1.2.16p12-
and uploaded the same for Yakkety:
Yakkety => 1.2.6p12-1ubuntu0.1 -> 1.2.16p12-
Hello Stanislav, or anyone else affected,
Accepted check-mk into yakkety-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
Changed in check-mk (Ubuntu Yakkety): | |
status: | Triaged → Fix Committed |
tags: | added: verification-needed |
Changed in check-mk (Ubuntu Xenial): | |
status: | Triaged → Fix Committed |
Brian Murray (brian-murray) wrote : | #32 |
Hello Stanislav, or anyone else affected,
Accepted check-mk into xenial-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
Brian Murray (brian-murray) wrote : | #33 |
Hello Stanislav, or anyone else affected,
Accepted check-mk into trusty-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
Changed in check-mk (Ubuntu Trusty): | |
status: | Triaged → Fix Committed |
tags: |
added: verification-done-trusty verification-needed-xenial removed: verification-needed |
Nish Aravamudan (nacc) wrote : Re: [Bug 1372284] Re: nagios3 + livestatus: SIGSEGV everyday at midnight | #34 |
Hello Simon,
Please don't just change the flag values, but provide some indication
of what version was tested and what the test was.
2) of https:/
Simon Déziel (sdeziel) wrote : | #35 |
Right, sorry. I tested on Trusty by installing the -proposed package (see below) yesterday, then checked this morning whether nagios had crashed during the log rotation (midnight). It didn't, while it used to crash every single night before. So the problem is fixed by the -proposed package, thanks!
$ apt-cache policy check-mk-livestatus
check-mk-
Installed: 1.2.2p3-1ubuntu0.1
Candidate: 1.2.2p3-1ubuntu0.1
Version table:
*** 1.2.2p3-1ubuntu0.1 0
100 /var/lib/
1.2.2p3-1 0
500 http://
Haw Loeung (hloeung) wrote : | #36 |
Verified on both Xenial and Trusty from -proposed.
Xenial package used: 1.2.6p12-
Trusty package used: 1.2.2p3-1ubuntu0.1
tags: |
added: verification-done-xenial removed: verification-needed-xenial |
tags: | added: verification-done |
Brian Murray (brian-murray) wrote : | #37 |
I see trusty and xenial being verified, but not yakkety.
tags: |
added: verification-needed removed: verification-done |
Launchpad Janitor (janitor) wrote : | #38 |
This bug was fixed in the package check-mk - 1.2.2p3-1ubuntu0.1
---------------
check-mk (1.2.2p3-
* Added patch to fix downtime.h's scheduled_
-- Haw Loeung <email address hidden> Wed, 05 Apr 2017 09:42:09 -0700
Changed in check-mk (Ubuntu Trusty): | |
status: | Fix Committed → Fix Released |
Launchpad Janitor (janitor) wrote : | #39 |
This bug was fixed in the package check-mk - 1.2.6p12-
---------------
check-mk (1.2.6p12-
* Added patch to fix downtime.h's scheduled_
-- Haw Loeung <email address hidden> Wed, 05 Apr 2017 09:34:13 -0700
Changed in check-mk (Ubuntu Xenial): | |
status: | Fix Committed → Fix Released |
As part of a recent change in the Stable Release Update verification policy, we would like to inform you that for a bug to be considered verified for a given release, a verification-
Thank you!
Haw Loeung (hloeung) wrote : | #41 |
Verified on yakkety with 1.2.6p12-
So I spun up an instance in canonistack and installed check-mk-
Without fix it would segfault:
[1499047200] Caught SIGSEGV, shutting down...
With the updated package, log rotation succeeds.
tags: |
added: verification-done-yakkety removed: verification-needed |
The verification of the Stable Release Update for check-mk has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
Launchpad Janitor (janitor) wrote : | #43 |
This bug was fixed in the package check-mk - 1.2.6p12-
---------------
check-mk (1.2.6p12-
* Added patch to fix downtime.h's scheduled_
-- Haw Loeung <email address hidden> Wed, 05 Apr 2017 09:34:13 -0700
Changed in check-mk (Ubuntu Yakkety): | |
status: | Fix Committed → Fix Released |
Thank you for taking the time to report this bug and helping to make Ubuntu better.
It looks like the check-mk source package embeds a copy of nagios' downtime.h (along with other header files), and that Nagios has updated a struct since then.
Patching Nagios doesn't make sense here, from a distribution perspective. The real fix is to change check-mk's packaging so that it build-depends on some kind of nagios-dev package which provides the required headers. So this is a bug in check-mk, not in nagios.
This is probably relevant to Debian's check-mk packaging, too. Checking in Debian and reporting there if this bug applies to Debian would be appropriate.