config-changed hook fails: nrpe_helpers.py:nagios_hostname in _metadata_unit with FileNotFoundError error for metadata.yaml of related unit on another host

Bug #1712977 reported by Trent Lloyd on 2017-08-25
This bug affects 3 people
Affects         Importance   Assigned to
Charm Helpers   Undecided    Unassigned
NRPE Charm      Undecided    Xav Paice
Nagios Charm    Undecided    Xav Paice

Bug Description

Due to recent changes, nrpe_helpers.py:nagios_hostname calls charmhelpers.core.hookenv.principal_unit to get the parent unit's hostname.

On Juju 2.1 or earlier, it does this by walking all of the unit's relations and inspecting each related unit's metadata file using _metadata_unit.

This fails because it attempts to inspect units on another host, such as the related nagios unit. _metadata_unit then raises a FileNotFoundError exception, because that unit's metadata.yaml does not exist on the local machine.

You can simulate this on Juju 2.2 by commenting out the Juju 2.2 checks at the start and deploying the nagios and nrpe charms on separate machines.

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nrpe-0/charm/hooks/config-changed", line 3, in <module>
    services.manage()
  File "/var/lib/juju/agents/unit-nrpe-0/charm/hooks/services.py", line 27, in manage
    nrpe_helpers.NagiosInfo(),
  File "/var/lib/juju/agents/unit-nrpe-0/charm/hooks/nrpe_helpers.py", line 202, in __init__
    self['nagios_hostname'] = self.principle_relation.nagios_hostname()
  File "/var/lib/juju/agents/unit-nrpe-0/charm/hooks/nrpe_helpers.py", line 163, in nagios_hostname
    principle_unitname = hookenv.principal_unit()
  File "/var/lib/juju/agents/unit-nrpe-0/charm/hooks/charmhelpers/core/hookenv.py", line 219, in principal_unit
    md = _metadata_unit(unit)
  File "/var/lib/juju/agents/unit-nrpe-0/charm/hooks/charmhelpers/core/hookenv.py", line 513, in _metadata_unit
    with open(os.path.join(basedir, unitdir, 'charm', 'metadata.yaml')) as md:
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/juju/agents/unit-nagios-0/charm/metadata.yaml'
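
The failure described above can be sketched in isolation. This is a minimal illustration, not the charm-helpers code: `unit_metadata_path`, `read_unit_metadata`, and the `agents_dir` parameter (standing in for /var/lib/juju/agents) are hypothetical names, and returning None for a unit with no local charm directory is only one possible guard, assumed here for illustration rather than taken from the committed fix.

```python
import os
import tempfile


def unit_metadata_path(agents_dir, unit):
    """Build the metadata.yaml path for a unit name such as 'nagios/0'."""
    unitdir = 'unit-{}'.format(unit.replace('/', '-'))
    return os.path.join(agents_dir, unitdir, 'charm', 'metadata.yaml')


def read_unit_metadata(agents_dir, unit):
    """Return the raw metadata text, or None if the unit is not on this host.

    Hypothetical helper: a unit related over the network has no agent
    directory on the local machine, so the file may legitimately be absent.
    """
    path = unit_metadata_path(agents_dir, unit)
    if not os.path.exists(path):
        return None
    with open(path) as md:
        return md.read()


def demo():
    """Reproduce the error for a remote unit, then show the guarded read."""
    with tempfile.TemporaryDirectory() as agents_dir:
        # Only the local subordinate unit has a charm directory here.
        local_path = unit_metadata_path(agents_dir, 'nrpe/0')
        os.makedirs(os.path.dirname(local_path))
        with open(local_path, 'w') as md:
            md.write('name: nrpe\nsubordinate: true\n')

        # An unguarded open() on the remote unit fails as in the traceback.
        try:
            with open(unit_metadata_path(agents_dir, 'nagios/0')):
                raised = False
        except FileNotFoundError:
            raised = True

        remote = read_unit_metadata(agents_dir, 'nagios/0')
        local = read_unit_metadata(agents_dir, 'nrpe/0')
        return raised, remote, local
```

With a guard like this, a caller walking relations would have to skip any unit whose metadata comes back as None instead of assuming every related unit lives on the same machine.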

Trent Lloyd (lathiat) on 2017-08-25
Changed in charm-helpers:
status: New → Confirmed
Changed in nrpe-charm:
status: New → Confirmed
Xav Paice (xavpaice) on 2017-08-27
tags: added: canonical-bootstack
Xav Paice (xavpaice) wrote :

This also affects the Nagios charm, e.g.:

(added pdb.set_trace() to L210 of core/hookenv.py)

root@juju-machine-0-lxc-6:/var/lib/juju/agents/unit-nagios-0/charm# ./hooks/upgrade-charm
> /var/lib/juju/agents/unit-nagios-0/charm/hooks/charmhelpers/core/hookenv.py(218)principal_unit()
-> for reltype in relation_types():
(Pdb) principal_unit
(Pdb) os.environ['JUJU_UNIT_NAME']
'nagios/0'
(Pdb) relation_types()
['website', 'monitors', 'nagios']

(Pdb) relation_ids('monitors')
[u'monitors:143', u'monitors:144', u'monitors:148']
(Pdb) related_units('monitors:143')
[u'nrpe/0', u'nrpe/1', u'nrpe/10', u'nrpe/11', u'nrpe/12', u'nrpe/13', u'nrpe/14', u'nrpe/15', u'nrpe/16', u'nrpe/17', u'nrpe/19', u'nrpe/2', u'nrpe/20', u'nrpe/3', u'nrpe/4', u'nrpe/5', u'nrpe/6', u'nrpe/7', u'nrpe/8', u'nrpe/9']

(Pdb) _metadata_unit('nrpe/1')
*** IOError: [Errno 2] No such file or directory: '/var/lib/juju/agents/unit-nrpe-1/charm/metadata.yaml'

(Pdb) relation_ids('nagios')
[]

This change seems to have come in with rev 773 (merge/rebase), which adds principal_unit().

Xav Paice (xavpaice) wrote :

FWIW, this problem makes it impossible to upgrade charms to the current head/master.

Fairbanks. (fairbanks) wrote :

I have this problem also.
If I repeatedly run `juju resolved nrpe/X`, it eventually gets to a ready state.
But this is very annoying during deployments.

Fairbanks. (fairbanks) wrote :

A small update.
If you use revision 25 of the charm, it works without errors:

`juju deploy cs:nrpe-25 nrpe`
or, if you want to switch to this version:
`juju upgrade-charm nrpe --revision 25`

You may need to run `juju resolved --no-retry nrpe/X` so that it skips the failed hooks of the "bugged" version.

Stuart Bishop (stub) on 2017-08-28
Changed in charm-helpers:
status: Confirmed → Fix Committed
Xav Paice (xavpaice) wrote :

Rev 786 of charmhelpers fixes the bug; both the nrpe and nagios charms need to be updated to pick up that change.

https://code.launchpad.net/~xavpaice/nrpe-charm/+git/nrpe-charm/+merge/329767
https://code.launchpad.net/~xavpaice/nagios-charm/+git/nagios-charm/+merge/329768

Xav Paice (xavpaice) wrote :

Fixes are available in cs:~nagios-charmers/nagios-5 and cs:~nrpe-charmers/nrpe-20. I don't have permission to release these to the main charm store URLs.

Changed in nrpe-charm:
status: Confirmed → Fix Committed
Changed in nagios-charm:
status: New → Fix Committed
Fairbanks. (fairbanks) wrote :

I think the charm has been updated since it now is on revision 30 :)

Haw Loeung (hloeung) wrote :

That is revision 21 (https://jujucharms.com/u/nrpe-charmers/nrpe/), which is promulgated to cs:nrpe-30.

Changed in nrpe-charm:
status: Fix Committed → Fix Released
Haw Loeung (hloeung) on 2017-08-29
Changed in charm-helpers:
status: Fix Committed → Fix Released
Changed in nrpe-charm:
assignee: nobody → Xav Paice (xavpaice)
Changed in nagios-charm:
assignee: nobody → Xav Paice (xavpaice)
status: Fix Committed → Fix Released
Seyeong Kim (xtrusia) wrote :

hello @xavpaice

I tried a fresh, simple deployment of the ubuntu, nrpe, and nagios charms (relations: ubuntu to nrpe, and nrpe:monitors to nagios:monitors).

nagios fails in the config-changed hook.

ERROR msg : global name 'self' is not defined

which seems to be caused by

https://git.launchpad.net/nagios-charm/commit/?id=38f049516d4865a1e3c1fec5289f6f189fff0631

+ host_context = hookenv.config('nagios_host_context')
+ principal_unitname = hookenv.principal_unit()
+ # Fallback to using "primary" if it exists.
+ if not principal_unitname:
+     for relunit in self[self.name]:
+         if relunit.get('primary', 'False').lower() == 'true':
+             principal_unitname = relunit['__unit__']
+             break

no error in your test env?
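
The NameError reported above is what Python raises when a loop written for a method body runs inside a plain function, where `self` is never bound. A minimal sketch of that situation follows, assuming hypothetical names (`broken_lookup`, `find_primary_unit`, `relation_units`) that are not taken from the nagios charm; passing the relation data in explicitly is just one way to avoid the unbound name, not necessarily how the charm was patched.

```python
def broken_lookup():
    """Method body pasted into a plain function: 'self' is never bound."""
    for relunit in self[self.name]:  # noqa: F821 (deliberately undefined)
        if relunit.get('primary', 'False').lower() == 'true':
            return relunit['__unit__']
    return None


def find_primary_unit(relation_units):
    """Return the unit flagged as primary, or None if there is none.

    `relation_units` is a hypothetical list of dicts shaped like the
    entries the quoted diff iterates over.
    """
    for relunit in relation_units:
        if relunit.get('primary', 'False').lower() == 'true':
            return relunit['__unit__']
    return None


def broken_lookup_error():
    """Call the broken function and return the NameError message."""
    try:
        broken_lookup()
    except NameError as exc:
        return str(exc)
    return None
```

Because the failing line is only reached when hookenv.principal_unit() returns nothing, the error would show up only in environments that take the fallback path.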

On Thu, Aug 31, 2017 at 02:44:06PM -0000, Seyeong Kim wrote:
> I tried to deploy fresh simple ubuntu, nrpe, nagios charm ( relation is
> ubuntu, nrpe and nrpe:monitors, nagios:monitors)
>
> nagios has config-hooks error
>
> ERROR msg : global name 'self' is not defined
>
> which seems that caused from
>
> https://git.launchpad.net/nagios-charm/commit/?id=38f049516d4865a1e3c1fec5289f6f189fff0631
>
> + host_context = hookenv.config('nagios_host_context')
> + principal_unitname = hookenv.principal_unit()
> + # Fallback to using "primary" if it exists.
> + if not principal_unitname:
> + for relunit in self[self.name]:
> + if relunit.get('primary', 'False').lower() == 'true':
> + principal_unitname = relunit['__unit__']
> + break
>
> no error in your test env?
>

Can you provide the full traceback?

The code segment above is from the nrpe charm but you're reporting
config-changed in nagios itself?

Seyeong Kim (xtrusia) wrote :

No, that code is from the nagios charm, not the nrpe charm.

The full traceback is below:

2017-09-01 01:47:50 INFO config-changed Traceback (most recent call last):
2017-09-01 01:47:50 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 339, in <module>
2017-09-01 01:47:50 INFO config-changed update_config()
2017-09-01 01:47:50 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 231, in update_config
2017-09-01 01:47:50 INFO config-changed for relunit in self[self.name]:
2017-09-01 01:47:50 INFO config-changed NameError: global name 'self' is not defined
2017-09-01 01:47:50 ERROR juju.worker.uniter.operation runhook.go:107 hook "config-changed" failed: exit status 1
2017-09-01 01:47:56 WARNING juju-log Please use the generic juju-info or the monitors interface
2017-09-01 01:47:56 DEBUG juju-log Writing file /etc/nagios3/conf.d/extra.cfg root:root 444
2017-09-01 01:47:56 INFO config-changed Traceback (most recent call last):
2017-09-01 01:47:56 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 339, in <module>
2017-09-01 01:47:56 INFO config-changed update_config()
2017-09-01 01:47:56 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 231, in update_config
2017-09-01 01:47:56 INFO config-changed for relunit in self[self.name]:
2017-09-01 01:47:56 INFO config-changed NameError: global name 'self' is not defined
2017-09-01 01:47:56 ERROR juju.worker.uniter.operation runhook.go:107 hook "config-changed" failed: exit status 1
2017-09-01 01:48:07 WARNING juju-log Please use the generic juju-info or the monitors interface
2017-09-01 01:48:07 DEBUG juju-log Writing file /etc/nagios3/conf.d/extra.cfg root:root 444
2017-09-01 01:48:07 INFO config-changed Traceback (most recent call last):
2017-09-01 01:48:07 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 339, in <module>
2017-09-01 01:48:07 INFO config-changed update_config()
2017-09-01 01:48:07 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 231, in update_config
2017-09-01 01:48:07 INFO config-changed for relunit in self[self.name]:
2017-09-01 01:48:07 INFO config-changed NameError: global name 'self' is not defined
2017-09-01 01:48:07 ERROR juju.worker.uniter.operation runhook.go:107 hook "config-changed" failed: exit status 1
2017-09-01 01:48:28 WARNING juju-log Please use the generic juju-info or the monitors interface
2017-09-01 01:48:28 DEBUG juju-log Writing file /etc/nagios3/conf.d/extra.cfg root:root 444
2017-09-01 01:48:28 INFO config-changed Traceback (most recent call last):
2017-09-01 01:48:28 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 339, in <module>
2017-09-01 01:48:28 INFO config-changed update_config()
2017-09-01 01:48:28 INFO config-changed File "/var/lib/juju/agents/unit-nagios-0/charm/hooks/config-changed", line 231, in update_config
2017-09-01 01:48:28 INFO config-changed for relunit in self[self.name]:
2017-09-01 01:48:28 INFO con...

Haw Loeung (hloeung) wrote :

On Fri, Sep 01, 2017 at 01:54:38AM -0000, Seyeong Kim wrote:
> no that code is from nagios charm not nrpe charm
>

Ah, okay. I think you should file a new bug against nagios-charm and
reference LP# :1677580.

https://bugs.launchpad.net/nagios-charm

Regards,

Haw

Haw Loeung (hloeung) wrote :

Err.. LP: #1677580

James Hebden (ec0) wrote :

I've just spent some time retesting this in a Juju 2.2.2 environment, and was not able to reproduce the issue reported by Seyeong.

I also could not see a follow-up bug against the Nagios charm. Seyeong, if you are still experiencing this issue, could you please file another bug against the Nagios charm with a little more detail about your test environment?

James Hebden (ec0) wrote :

Correction! I have been able to reproduce this on an older Juju version. I'm pushing up a fix, which I will create a new bug for.

Seyeong Kim (xtrusia) wrote :

Thanks jhebden

I confirmed my testing was done in juju 2.0.3

and it's fine on 2.2.2

James Hebden (ec0) wrote :

I've filed https://bugs.launchpad.net/nagios-charm/+bug/1715086 with a patch that should fix this on 2.0.3 (and older versions) by calling hookenv.local_unit() instead of walking relations. Feel free to comment over there and, if you get a chance, test out the MP I've created, which fixed the bug in my testing.
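
The idea behind that approach can be sketched as follows. hookenv.local_unit() reads the JUJU_UNIT_NAME environment variable, the same variable the pdb session earlier in this report shows holding 'nagios/0'. The standalone functions below are illustrative stand-ins, not the patch from bug 1715086; `nagios_principal_unit` and the `environ` parameter are hypothetical.

```python
import os


def local_unit(environ=None):
    """Return the current unit name from the hook environment."""
    environ = os.environ if environ is None else environ
    return environ['JUJU_UNIT_NAME']


def nagios_principal_unit(environ=None):
    """Hypothetical: treat the local unit as the principal unit.

    A principal charm is its own principal, so there is no need to walk
    relations or read another unit's metadata.yaml to find it.
    """
    return local_unit(environ)
```

This only holds for a principal charm such as nagios. A subordinate such as nrpe still has to discover its principal some other way.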
