config-changed hook fails: nrpe_helpers.py:nagios_hostname in _metadata_unit with FileNotFoundError error for metadata.yaml of related unit on another host

Bug #1712977 reported by Trent Lloyd
This bug affects 3 people
Affects        Status         Importance   Assigned to   Milestone
Charm Helpers  Fix Released   Undecided    Unassigned
NRPE Charm     Fix Released   Undecided    Xav Paice
Nagios Charm   Fix Released   Undecided    Xav Paice

Bug Description

Due to recent changes, nrpe_helpers.py:nagios_hostname calls charmhelpers.core.hookenv.principal_unit to get the parent unit's hostname.

On Juju 2.1 or earlier, it does this by walking all relations of the unit and inspecting each related unit's metadata file using _metadata_unit.

This fails because it attempts to inspect units on another host (for example, the parent nagios unit), and _metadata_unit raises a FileNotFoundError for a metadata file that does not exist locally.

You can simulate this on Juju 2.2 by commenting out the Juju 2.2 checks at the start and deploying the nagios and nrpe charms on separate machines.

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nrpe-0/charm/hooks/config-changed", line 3, in <module>
    services.manage()
  File "/var/lib/juju/agents/unit-nrpe-0/charm/hooks/services.py", line 27, in manage
    nrpe_helpers.NagiosInfo(),
  File "/var/lib/juju/agents/unit-nrpe-0/charm/hooks/nrpe_helpers.py", line 202, in __init__
    self['nagios_hostname'] = self.principle_relation.nagios_hostname()
  File "/var/lib/juju/agents/unit-nrpe-0/charm/hooks/nrpe_helpers.py", line 163, in nagios_hostname
    principle_unitname = hookenv.principal_unit()
  File "/var/lib/juju/agents/unit-nrpe-0/charm/hooks/charmhelpers/core/hookenv.py", line 219, in principal_unit
    md = _metadata_unit(unit)
  File "/var/lib/juju/agents/unit-nrpe-0/charm/hooks/charmhelpers/core/hookenv.py", line 513, in _metadata_unit
    with open(os.path.join(basedir, unitdir, 'charm', 'metadata.yaml')) as md:
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/juju/agents/unit-nagios-0/charm/metadata.yaml'
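
For readers following along, the lookup described above can be sketched roughly as follows. This is a minimal illustration, not the charm-helpers source: `find_principal_unit` and its injected parameters are hypothetical stand-ins so the sketch runs outside a hook, and the `JUJU_PRINCIPAL_UNIT` handling is an assumption about what the "Juju 2.2 checks" do. The `md is None` branch is the case this bug is about: a related unit on another machine has no local charm directory, so it has to be skipped rather than opened.

```python
def find_principal_unit(environ, relation_types, relation_ids,
                        related_units, load_metadata):
    """Return the principal unit name, or None if it cannot be determined.

    Every parameter is an injected stand-in (hypothetical) so that the
    sketch can run without a Juju hook environment.
    """
    # Assumed Juju 2.2+ path: the agent exports JUJU_PRINCIPAL_UNIT.
    principal = environ.get('JUJU_PRINCIPAL_UNIT')
    if principal == '':
        # Assumption: an empty value means this unit is the principal.
        return environ['JUJU_UNIT_NAME']
    if principal is not None:
        return principal

    # Juju 2.1 and earlier: walk every related unit and check its metadata.
    for reltype in relation_types():
        for rid in relation_ids(reltype):
            for unit in related_units(rid):
                md = load_metadata(unit)
                if md is None:
                    # Unit is on another host; no local metadata to read.
                    continue
                if not md.get('subordinate'):
                    return unit
    return None


# Example: nagios/0 is remote (no metadata), mysql/0 is co-located.
metadata = {
    'mysql/0': {'name': 'mysql'},
    'nrpe/0': {'name': 'nrpe', 'subordinate': True},
}
result = find_principal_unit(
    environ={'JUJU_UNIT_NAME': 'nrpe/0'},
    relation_types=lambda: ['monitors', 'general-info'],
    relation_ids=lambda reltype: [reltype + ':1'],
    related_units=lambda rid: (['nagios/0'] if rid.startswith('monitors')
                               else ['mysql/0']),
    load_metadata=metadata.get,
)
```

In this example the remote nagios/0 unit is skipped and the co-located mysql/0 unit is returned as the principal.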

Trent Lloyd (lathiat)
Changed in charm-helpers:
status: New → Confirmed
Changed in nrpe-charm:
status: New → Confirmed
Xav Paice (xavpaice)
tags: added: canonical-bootstack
Xav Paice (xavpaice) wrote :

This also affects the Nagios charm, e.g.:

(added pdb.set_trace() to L210 of core/hookenv.py)

root@juju-machine-0-lxc-6:/var/lib/juju/agents/unit-nagios-0/charm# ./hooks/upgrade-charm
> /var/lib/juju/agents/unit-nagios-0/charm/hooks/charmhelpers/core/hookenv.py(218)principal_unit()
-> for reltype in relation_types():
(Pdb) principal_unit
(Pdb) os.environ['JUJU_UNIT_NAME']
'nagios/0'
(Pdb) relation_types()
['website', 'monitors', 'nagios']

(Pdb) relation_ids('monitors')
[u'monitors:143', u'monitors:144', u'monitors:148']
(Pdb) related_units('monitors:143')
[u'nrpe/0', u'nrpe/1', u'nrpe/10', u'nrpe/11', u'nrpe/12', u'nrpe/13', u'nrpe/14', u'nrpe/15', u'nrpe/16', u'nrpe/17', u'nrpe/19', u'nrpe/2', u'nrpe/20', u'nrpe/3', u'nrpe/4', u'nrpe/5', u'nrpe/6', u'nrpe/7', u'nrpe/8', u'nrpe/9']

(Pdb) _metadata_unit('nrpe/1')
*** IOError: [Errno 2] No such file or directory: '/var/lib/juju/agents/unit-nrpe-1/charm/metadata.yaml'

(Pdb) relation_ids('nagios')
[]

This change seems to come about from rev 773 (merge/rebase) which adds the principal_unit().
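
The IOError in the session above comes from opening the other unit's metadata.yaml unconditionally. One way to tolerate units that are not on this machine is to check for the file first and report "not local" instead of raising. The sketch below is only an illustration under that assumption: the helper names are hypothetical, the agents directory is passed in rather than derived from the charm directory, and the raw text is returned instead of parsed YAML so that the example needs nothing beyond the standard library.

```python
import os


def unit_metadata_path(agents_dir, unit):
    """Build the metadata.yaml path for a unit name such as 'nrpe/1'."""
    unitdir = 'unit-{}'.format(unit.replace('/', '-'))
    return os.path.join(agents_dir, unitdir, 'charm', 'metadata.yaml')


def read_unit_metadata(agents_dir, unit):
    """Return the raw metadata.yaml text, or None if the unit is not local."""
    path = unit_metadata_path(agents_dir, unit)
    if not os.path.exists(path):
        # Related unit lives on another host: nothing to read here.
        return None
    with open(path) as md:
        return md.read()
```

A caller walking relations can then skip any unit for which this returns None, rather than failing the whole hook.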

Xav Paice (xavpaice) wrote :

FWIW, this problem makes it impossible to upgrade charms to the current head/master.

Fairbanks. (fairbanks) wrote :

I have this problem also.
If I repeatedly run `juju resolved nrpe/X`, it eventually gets to a ready state.
But this is very annoying during deployments.

Fairbanks. (fairbanks) wrote :

A small update.
If you use revision 25 of the charm, it works without errors:

`juju deploy cs:nrpe-25 nrpe`
or, if you want to switch an existing deployment to this version:
`juju upgrade-charm nrpe --revision 25`

You may need to run `juju resolved --no-retry nrpe/X` so that it skips the failed hooks of the "bugged" version.

Stuart Bishop (stub)
Changed in charm-helpers:
status: Confirmed → Fix Committed
Xav Paice (xavpaice) wrote :

rev 786 of charmhelpers fixes the bug; both the nrpe and nagios charms need to be updated to pick up that change.

https://code.launchpad.net/~xavpaice/nrpe-charm/+git/nrpe-charm/+merge/329767
https://code.launchpad.net/~xavpaice/nagios-charm/+git/nagios-charm/+merge/329768

Xav Paice (xavpaice) wrote :

Fixes are available in cs:~nagios-charmers/nagios-5 and cs:~nrpe-charmers/nrpe-20. I don't have permission to release these to the main charm store URLs.

Changed in nrpe-charm:
status: Confirmed → Fix Committed
Changed in nagios-charm:
status: New → Fix Committed
Fairbanks. (fairbanks) wrote :

I think the charm has been updated since it now is on revision 30 :)

Haw Loeung (hloeung) wrote :

That is revision 21 (https://jujucharms.com/u/nrpe-charmers/nrpe/), which is promulgated to cs:nrpe-30.

Changed in nrpe-charm:
status: Fix Committed → Fix Released
Haw Loeung (hloeung)
Changed in charm-helpers:
status: Fix Committed → Fix Released
Changed in nrpe-charm:
assignee: nobody → Xav Paice (xavpaice)
Changed in nagios-charm:
assignee: nobody → Xav Paice (xavpaice)
status: Fix Committed → Fix Released
Seyeong Kim (seyeongkim) wrote :

hello @xavpaice

I tried a fresh, simple deployment of the ubuntu, nrpe and nagios charms (relations: ubuntu to nrpe, and nrpe:monitors to nagios:monitors).

nagios fails in the config-changed hook

ERROR msg : global name 'self' is not defined

which seems to be caused by

https://git.launchpad.net/nagios-charm/commit/?id=38f049516d4865a1e3c1fec5289f6f189fff0631

+ host_context = hookenv.config('nagios_host_context')
+ principal_unitname = hookenv.principal_unit()
+ # Fallback to using "primary" if it exists.
+ if not principal_unitname:
+     for relunit in self[self.name]:
+         if relunit.get('primary', 'False').lower() == 'true':
+             principal_unitname = relunit['__unit__']
+             break

no error in your test env?

Haw Loeung (hloeung) wrote :

On Thu, Aug 31, 2017 at 02:44:06PM -0000, Seyeong Kim wrote:
> I tried to deploy fresh simple ubuntu, nrpe, nagios charm ( relation is
> ubuntu, nrpe and nrpe:monitors, nagios:monitors)
>
> nagios has config-hooks error
>
> ERROR msg : global name 'self' is not defined
>
> which seems that caused from
>
> https://git.launchpad.net/nagios-charm/commit/?id=38f049516d4865a1e3c1fec5289f6f189fff0631
>
> + host_context = hookenv.config('nagios_host_context')
> + principal_unitname = hookenv.principal_unit()
> + # Fallback to using "primary" if it exists.
> + if not principal_unitname:
> + for relunit in self[self.name]:
> + if relunit.get('primary', 'False').lower() == 'true':
> + principal_unitname = relunit['__unit__']
> + break
>
> no error in your test env?
>

Can you provide the full traceback?

The code segment above is from the nrpe charm but you're reporting
config-changed in nagios itself?

Seyeong Kim (seyeongkim) wrote :

no that code is from nagios charm not nrpe charm

full traceback is below

2017-09-01 01:47:50 INFO config-changed Traceback (most recent call last):
2017-09-01 01:47:50 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 339, in <module>
2017-09-01 01:47:50 INFO config-changed update_config()
2017-09-01 01:47:50 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 231, in update_config
2017-09-01 01:47:50 INFO config-changed for relunit in self[self.name]:
2017-09-01 01:47:50 INFO config-changed NameError: global name 'self' is not defined
2017-09-01 01:47:50 ERROR juju.worker.uniter.operation runhook.go:107 hook "config-changed" failed: exit status 1
2017-09-01 01:47:56 WARNING juju-log Please use the generic juju-info or the monitors interface
2017-09-01 01:47:56 DEBUG juju-log Writing file /etc/nagios3/conf.d/extra.cfg root:root 444
2017-09-01 01:47:56 INFO config-changed Traceback (most recent call last):
2017-09-01 01:47:56 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 339, in <module>
2017-09-01 01:47:56 INFO config-changed update_config()
2017-09-01 01:47:56 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 231, in update_config
2017-09-01 01:47:56 INFO config-changed for relunit in self[self.name]:
2017-09-01 01:47:56 INFO config-changed NameError: global name 'self' is not defined
2017-09-01 01:47:56 ERROR juju.worker.uniter.operation runhook.go:107 hook "config-changed" failed: exit status 1
2017-09-01 01:48:07 WARNING juju-log Please use the generic juju-info or the monitors interface
2017-09-01 01:48:07 DEBUG juju-log Writing file /etc/nagios3/conf.d/extra.cfg root:root 444
2017-09-01 01:48:07 INFO config-changed Traceback (most recent call last):
2017-09-01 01:48:07 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 339, in <module>
2017-09-01 01:48:07 INFO config-changed update_config()
2017-09-01 01:48:07 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 231, in update_config
2017-09-01 01:48:07 INFO config-changed for relunit in self[self.name]:
2017-09-01 01:48:07 INFO config-changed NameError: global name 'self' is not defined
2017-09-01 01:48:07 ERROR juju.worker.uniter.operation runhook.go:107 hook "config-changed" failed: exit status 1
2017-09-01 01:48:28 WARNING juju-log Please use the generic juju-info or the monitors interface
2017-09-01 01:48:28 DEBUG juju-log Writing file /etc/nagios3/conf.d/extra.cfg root:root 444
2017-09-01 01:48:28 INFO config-changed Traceback (most recent call last):
2017-09-01 01:48:28 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 339, in <module>
2017-09-01 01:48:28 INFO config-changed update_config()
2017-09-01 01:48:28 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 231, in update_config
2017-09-01 01:48:28 INFO config-changed for relunit in self[self.name]:
2017-09-01 01:48:28 INFO con...


Haw Loeung (hloeung) wrote :

On Fri, Sep 01, 2017 at 01:54:38AM -0000, Seyeong Kim wrote:
> no that code is from nagios charm not nrpe charm
>

Ah, okay. I think you should file a new bug against nagios-charm and
reference LP# :1677580.

https://bugs.launchpad.net/nagios-charm

Regards,

Haw

Haw Loeung (hloeung) wrote :

Err.. LP: #1677580

James Hebden (ec0) wrote :

I've just spent some time retesting this in a Juju 2.2.2 environment, and was not able to reproduce the issue reported by Seyeong.

I also could not see a follow-up bug against the Nagios charm. Seyeong, if you are still experiencing this issue, could you please file another bug against the Nagios charm with a little more detail about your test environment?

James Hebden (ec0) wrote :

Correction! I have been able to reproduce this on an older Juju version. I'm pushing up a fix, which I will create a new bug for.

Seyeong Kim (seyeongkim) wrote :

Thanks jhebden

I confirmed my testing was done in juju 2.0.3

and it's fine on 2.2.2

James Hebden (ec0) wrote :

I've filed https://bugs.launchpad.net/nagios-charm/+bug/1715086 with a patch that should fix this on 2.0.3 (and older versions) by calling hookenv.local_unit() instead of walking relations. Feel free to comment over there and, if you get a chance, test the MP I've created, which fixed the bug in my testing.
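
As a rough illustration of that approach (not the actual patch in bug 1715086, which may differ): in a principal charm such as nagios, the local unit is already the principal, so the fallback does not need `self` or a relation walk at all. The function name and its injected parameters below are hypothetical; `JUJU_UNIT_NAME` is the same variable shown in the pdb session earlier in this report.

```python
def resolve_principal_unitname(principal_unit, environ):
    """Return the principal unit name, falling back to the local unit.

    `principal_unit` is an injected callable standing in for the
    relation-walking lookup; `environ` stands in for os.environ.
    Both are assumptions made so the sketch runs outside a hook.
    """
    name = principal_unit()
    if not name:
        # Module-level code has no `self`; use the hook environment instead.
        name = environ['JUJU_UNIT_NAME']
    return name


# On an older Juju the lookup may return None; fall back to the local unit.
fallback = resolve_principal_unitname(lambda: None,
                                      {'JUJU_UNIT_NAME': 'nagios/0'})
```

Because the function only reads the environment, it avoids both the NameError and the missing-metadata failure when it runs in a principal unit.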
