Charm attempts to write to /etc/nagios/nrpe.d/ before nagios is installed

Bug #1882557 reported by Michael Skalka
This bug affects 2 people
Affects: OpenStack HA Cluster Charm
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

This occurred during a CDK deployment on baremetal using the bionic series charm, revision 68. Test run artifacts can be found here: https://solutions.qa.canonical.com/#/qa/testRun/1b24c2f4-5d56-4160-9252-513bea60cabf

/var/log/juju/unit-hacluster-vault-2.log:

...
2020-06-06 02:01:08 DEBUG config-changed Traceback (most recent call last):
2020-06-06 02:01:08 DEBUG config-changed File "/var/lib/juju/agents/unit-hacluster-vault-2/charm/hooks/config-changed", line 660, in <module>
2020-06-06 02:01:08 DEBUG config-changed hooks.execute(sys.argv)
2020-06-06 02:01:08 DEBUG config-changed File "/var/lib/juju/agents/unit-hacluster-vault-2/charm/charmhelpers/core/hookenv.py", line 943, in execute
2020-06-06 02:01:08 DEBUG config-changed self._hooks[hook_name]()
2020-06-06 02:01:08 DEBUG config-changed File "/var/lib/juju/agents/unit-hacluster-vault-2/charm/hooks/config-changed", line 202, in config_changed
2020-06-06 02:01:08 DEBUG config-changed update_nrpe_config()
2020-06-06 02:01:08 DEBUG config-changed File "/var/lib/juju/agents/unit-hacluster-vault-2/charm/hooks/config-changed", line 612, in update_nrpe_config
2020-06-06 02:01:08 DEBUG config-changed nrpe_setup.write()
2020-06-06 02:01:08 DEBUG config-changed File "/var/lib/juju/agents/unit-hacluster-vault-2/charm/charmhelpers/contrib/charmsupport/nrpe.py", line 315, in write
2020-06-06 02:01:08 DEBUG config-changed self.nagios_servicegroups)
2020-06-06 02:01:08 DEBUG config-changed File "/var/lib/juju/agents/unit-hacluster-vault-2/charm/charmhelpers/contrib/charmsupport/nrpe.py", line 196, in write
2020-06-06 02:01:08 DEBUG config-changed with open(nrpe_check_file, 'w') as nrpe_check_config:
2020-06-06 02:01:08 DEBUG config-changed FileNotFoundError: [Errno 2] No such file or directory: '/etc/nagios/nrpe.d/check_corosync_rings.cfg'
...
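
The last frame of the traceback above can be reproduced outside Juju. The following is a minimal sketch, not charm code: it uses a temporary directory as a stand-in for /etc/nagios, and `try_write_check` is a hypothetical helper name. It illustrates that `open(path, 'w')` creates a missing file but not a missing parent directory, which is why the hook fails with errno 2.

```python
import errno
import os
import tempfile


def try_write_check(path, content):
    """Write content to path; return None on success or the errno on OSError."""
    try:
        with open(path, 'w') as check_file:
            check_file.write(content)
    except OSError as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as fake_etc_nagios:
    # Stand-in for /etc/nagios/nrpe.d, which does not exist yet.
    nrpe_dir = os.path.join(fake_etc_nagios, 'nrpe.d')
    check_path = os.path.join(nrpe_dir, 'check_corosync_rings.cfg')

    # Parent directory missing: open() fails with ENOENT (errno 2).
    missing_dir_errno = try_write_check(check_path, '# placeholder\n')

    # Once the directory exists, the same call succeeds.
    os.mkdir(nrpe_dir)
    existing_dir_errno = try_write_check(check_path, '# placeholder\n')
```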

Later in the Juju unit logs we see the nagios unit running its install hook:

...
var/log/juju/unit-nagios-0.log:2020-06-06 01:31:15 DEBUG install Get:69 http://archive.ubuntu.com/ubuntu bionic/universe amd64 nagios-images all 0.9.1ubuntu1 [2215 kB]
var/log/juju/unit-nagios-0.log:2020-06-06 01:31:15 DEBUG install Get:70 http://archive.ubuntu.com/ubuntu bionic/universe amd64 nagios-nrpe-plugin amd64 3.2.1-1ubuntu1 [22.9 kB]
var/log/juju/unit-nagios-0.log:2020-06-06 01:31:15 DEBUG install Get:73 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 nagios-plugins-basic all 2.2-3ubuntu3 [6444 B]
var/log/juju/unit-nagios-0.log:2020-06-06 01:31:15 DEBUG install Get:74 http://archive.ubuntu.com/ubuntu bionic/universe amd64 nagios3-common all 3.5.1.dfsg-2.1ubuntu8 [53.5 kB]
var/log/juju/unit-nagios-0.log:2020-06-06 01:31:15 DEBUG install Get:75 http://archive.ubuntu.com/ubuntu bionic/universe amd64 nagios3-cgi amd64 3.5.1.dfsg-2.1ubuntu8 [781 kB]
var/log/juju/unit-nagios-0.log:2020-06-06 01:31:15 DEBUG install Get:76 http://archive.ubuntu.com/ubuntu bionic/universe amd64 nagios3-core amd64 3.5.1.dfsg-2.1ubuntu8 [236 kB]
var/log/juju/unit-nagios-0.log:2020-06-06 01:31:15 DEBUG install Get:77 http://archive.ubuntu.com/ubuntu bionic/universe amd64 nagios3 amd64 3.5.1.dfsg-2.1ubuntu8 [1540 B]
var/log/juju/unit-nagios-0.log:2020-06-06 01:31:15 DEBUG install Get:96 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 nagios-plugins all 2.2-3ubuntu3 [3980 B]
...

In an ideal world the hacluster charm should check if nagios is installed before writing that file.

Revision history for this message
Joshua Genet (genet022) wrote :
Changed in charm-hacluster:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Aurelien Lourot (aurelien-lourot)
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

Thanks for reporting! I'm still trying to reproduce it locally with a simpler bundle. When the issue happens, we see that:

- writing to /etc/nagios/nrpe.d/ is done on hacluster-vault/2
- installing nagios is done on nagios/0

and these two units happen to have ended up on the same machine (10). The original bundle doesn't enforce that, so if you're lucky hacluster-vault and nagios don't have common machines and everything is fine.

Revision history for this message
Michael Skalka (mskalka) wrote :

Saw another instance of this: https://solutions.qa.canonical.com/#/qa/testRun/774e899b-522e-43d6-9b19-120862b98b32

hacluster-vault attempts to write to /etc/nagios/nrpe.d/check_corosync_rings.cfg before the nrpe juju agent has even initialized.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

@aurelien-lourot this doesn't appear to have anything to do with the nagios application; it's just trying to write to a directory that doesn't exist during the config-changed hook. I assume the nrpe application is responsible for setting up that directory and it hasn't run yet.

Changed in charm-hacluster:
assignee: Aurelien Lourot (aurelien-lourot) → Alex Kavanagh (ajkavanagh)
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

So, after a little digging, I've discovered that the '/etc/nagios/nrpe.d' directory gets created by the nrpe subordinate charm. However, judging by the model and the progress of the deploy, the nrpe charm hadn't managed to get installed yet. As a result, the hacluster subordinate charm errors out because the directory does not yet exist.

I had the idea of putting some code into charm-helpers to check for the directory / write permissions and abort the writes (without error). This would allow the model to continue deploying and with later hook executions the nrpe configs should get written when the directory exists.
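
A rough sketch of that idea, for illustration only: this is not the charm-helpers change, and `write_check_if_ready` is a hypothetical name. It assumes the check is simply "the directory exists and is writable", and it skips the write quietly otherwise so that the hook can complete.

```python
import os
import tempfile


def write_check_if_ready(nrpe_dir, filename, content):
    """Write an NRPE check file only if nrpe_dir exists and is writable.

    Returns True if the file was written, False if the write was skipped.
    """
    if not (os.path.isdir(nrpe_dir) and os.access(nrpe_dir, os.W_OK)):
        # Directory not ready (e.g. nrpe not installed yet): skip quietly
        # instead of raising, so the hook can complete.
        return False
    with open(os.path.join(nrpe_dir, filename), 'w') as check_file:
        check_file.write(content)
    return True


# Usage against a temporary stand-in for /etc/nagios/nrpe.d.
with tempfile.TemporaryDirectory() as root:
    nrpe_dir = os.path.join(root, 'nrpe.d')
    skipped = write_check_if_ready(nrpe_dir, 'check_corosync_rings.cfg', '# x\n')
    os.mkdir(nrpe_dir)
    written = write_check_if_ready(nrpe_dir, 'check_corosync_rings.cfg', '# x\n')
```

The return value lets the caller log or set a status message when the write was skipped, rather than failing silently.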

However, I'm a bit worried that there may not *be* a later hook execution if the nrpe subordinate is 'late'; i.e. I'm not sure whether there is a relation that will kick it via the nrpe-external-master interface. That depends on whether the relation is set between the nrpe subordinate and the hacluster charm. I also don't want to force-create the directory in charm-helpers (if it can be avoided), as that would make (at least) TWO places where directory-creation code has to be maintained. If that code ever needed to change (e.g. permissions), odd bugs could surface, especially timing-related ones (e.g. depending on the order in which subordinate charms are deployed).
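
To make the "later hook execution" concern concrete, here is a hypothetical sketch of a deferred write. `PendingChecks` is an invented name and is not part of charm-helpers. The pending files only reach disk if `flush()` is called again after the directory appears, which is exactly the dependency on a later hook described above.

```python
import os
import tempfile


class PendingChecks:
    """Hypothetical holder for check files that could not be written yet."""

    def __init__(self, nrpe_dir):
        self.nrpe_dir = nrpe_dir
        self.pending = {}  # filename -> content

    def request(self, filename, content):
        """Record the desired file, then try to flush immediately."""
        self.pending[filename] = content
        return self.flush()

    def flush(self):
        """Write all pending files if the directory exists; return count written."""
        if not os.path.isdir(self.nrpe_dir):
            return 0
        written = 0
        for filename, content in list(self.pending.items()):
            with open(os.path.join(self.nrpe_dir, filename), 'w') as check_file:
                check_file.write(content)
            del self.pending[filename]
            written += 1
        return written


with tempfile.TemporaryDirectory() as root:
    checks = PendingChecks(os.path.join(root, 'nrpe.d'))
    first = checks.request('check_corosync_rings.cfg', '# x\n')  # directory missing
    os.mkdir(checks.nrpe_dir)  # stand-in for nrpe creating the directory
    second = checks.flush()  # stand-in for a later hook execution
    remaining = len(checks.pending)
```

Since each hook runs as a separate process, a real charm could not keep this state in memory; it would have to persist it or recompute the checks on every hook.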

I'll sort out a test charm, with my first approach.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Bumped from field high to field crit, as SQA has hit this 22 times in the last week, completely blocking our testing of baremetal kubernetes.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

I have a fix for this; I'm just testing that it actually works at the moment.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-hacluster (master)

Fix proposed to branch: master
Review: https://review.opendev.org/740979

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

It appears we're still hitting this with the proposed fix in place:

https://solutions.qa.canonical.com/qa/testRun/55db29e0-8c09-4ffe-856d-310de96e2851

Revision history for this message
David Ames (thedac) wrote :

FWIW, based on line numbers the run from #10 is not using Alex's change [0], so may not be a valid test.

2020-07-14 21:53:03 DEBUG config-changed Traceback (most recent call last):
2020-07-14 21:53:03 DEBUG config-changed File "/var/lib/juju/agents/unit-hacluster-vault-2/charm/hooks/config-changed", line 660, in <module>
2020-07-14 21:53:03 DEBUG config-changed hooks.execute(sys.argv)
2020-07-14 21:53:03 DEBUG config-changed File "/var/lib/juju/agents/unit-hacluster-vault-2/charm/charmhelpers/core/hookenv.py", line 943, in execute
2020-07-14 21:53:03 DEBUG config-changed self._hooks[hook_name]()
2020-07-14 21:53:03 DEBUG config-changed File "/var/lib/juju/agents/unit-hacluster-vault-2/charm/hooks/config-changed", line 202, in config_changed
2020-07-14 21:53:03 DEBUG config-changed update_nrpe_config()
2020-07-14 21:53:03 DEBUG config-changed File "/var/lib/juju/agents/unit-hacluster-vault-2/charm/hooks/config-changed", line 612, in update_nrpe_config
2020-07-14 21:53:03 DEBUG config-changed nrpe_setup.write()
2020-07-14 21:53:03 DEBUG config-changed File "/var/lib/juju/agents/unit-hacluster-vault-2/charm/charmhelpers/contrib/charmsupport/nrpe.py", line 315, in write
2020-07-14 21:53:03 DEBUG config-changed self.nagios_servicegroups)
2020-07-14 21:53:03 DEBUG config-changed File "/var/lib/juju/agents/unit-hacluster-vault-2/charm/charmhelpers/contrib/charmsupport/nrpe.py", line 196, in write
2020-07-14 21:53:03 DEBUG config-changed with open(nrpe_check_file, 'w') as nrpe_check_config:
2020-07-14 21:53:03 DEBUG config-changed FileNotFoundError: [Errno 2] No such file or directory: '/etc/nagios/nrpe.d/check_corosync_rings.cfg'

[0] https://github.com/ajkavanagh/charm-helpers/blob/bug/1882557/charmhelpers/contrib/charmsupport/nrpe.py

Revision history for this message
Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1882557] Re: Charm attempts to write to /etc/nagios/nrpe.d/ before nagios is installed

Ahh, dang, we have an overlay that is overriding the versions of the charms
there. You're right, invalid test, sorry! We'll retest.


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-hacluster (master)

Reviewed: https://review.opendev.org/740979
Committed: https://git.openstack.org/cgit/openstack/charm-hacluster/commit/?id=24fa642247b0692de387df417b7862ee450e15b9
Submitter: Zuul
Branch: master

commit 24fa642247b0692de387df417b7862ee450e15b9
Author: Alex Kavanagh <email address hidden>
Date: Tue Jul 14 09:58:50 2020 +0100

    Fix directory /etc/nagios/nrpe.d/ issue

    Under certain deployment conditions, the charm can attempt to write to
    the /etc/nagios/nrpe.d/ directory before it exists. This directory is
    created by the nrpe charm, but if the hacluster (this charm) gets
    installed first, then it can be triggered to attempt to set up the nrpe
    entries before the directory can be created by nrpe. This change (and
    the associated charm-helpers change) ensures that the charm will delay
    the nrpe config until the directory is available (and thus, the nrpe
    charm is fully installed)

    Related charm-helpers: https://github.com/juju/charm-helpers/pull/492

    Change-Id: Ibcbb5f56205b72c475807e3c34c64a00844908f4
    Closes-Bug: #1882557

Changed in charm-hacluster:
status: In Progress → Fix Committed
Changed in charm-hacluster:
assignee: Alex Kavanagh (ajkavanagh) → nobody
James Page (james-page)
Changed in charm-hacluster:
milestone: none → 20.08
Changed in charm-hacluster:
status: Fix Committed → Fix Released