conflicts with existing 5666/tcp when multiple units on the same host

Bug #1750490 reported by Xav Paice
This bug affects 5 people
Affects | Status | Importance | Assigned to | Milestone
Charm Helpers | Fix Released | Undecided | Unassigned |
NRPE Charm | Fix Released | Undecided | Unassigned |

Bug Description

In a situation where we have, e.g., nova-compute and ceph-osd on the same physical host, we want to be able to relate nrpe to both applications to collect the specific monitors each of them exports. If we set nagios_hostname_type to host, we get a single Nagios host definition with the checks collated and duplicates resolved. We have been doing this for a while without issue. However, on a model deployed using cs:nrpe-48, I get the following traceback from the nrpe-external-master-relation-changed hook (the excerpt below was captured from monitors-relation-joined):

2018-02-20 04:56:09 DEBUG monitors-relation-joined ERROR cannot open 5666/tcp (unit "nrpe-host/21"): conflicts with existing 5666/tcp (unit "nrpe-host/6")
2018-02-20 04:56:09 DEBUG monitors-relation-joined Traceback (most recent call last):
2018-02-20 04:56:09 DEBUG monitors-relation-joined File "/var/lib/juju/agents/unit-nrpe-host-21/charm/hooks/monitors-relation-joined", line 3, in <module>
2018-02-20 04:56:09 DEBUG monitors-relation-joined services.manage()
2018-02-20 04:56:09 DEBUG monitors-relation-joined File "/var/lib/juju/agents/unit-nrpe-host-21/charm/hooks/services.py", line 67, in manage
2018-02-20 04:56:09 DEBUG monitors-relation-joined manager.manage()
2018-02-20 04:56:09 DEBUG monitors-relation-joined File "/var/lib/juju/agents/unit-nrpe-host-21/charm/hooks/charmhelpers/core/services/base.py", line 135, in manage
2018-02-20 04:56:09 DEBUG monitors-relation-joined self.reconfigure_services()
2018-02-20 04:56:09 DEBUG monitors-relation-joined File "/var/lib/juju/agents/unit-nrpe-host-21/charm/hooks/charmhelpers/core/services/base.py", line 192, in reconfigure_services
2018-02-20 04:56:09 DEBUG monitors-relation-joined manage_ports])
2018-02-20 04:56:09 DEBUG monitors-relation-joined File "/var/lib/juju/agents/unit-nrpe-host-21/charm/hooks/charmhelpers/core/services/base.py", line 234, in fire_event
2018-02-20 04:56:09 DEBUG monitors-relation-joined callback(self, service_name, event_name)
2018-02-20 04:56:09 DEBUG monitors-relation-joined File "/var/lib/juju/agents/unit-nrpe-host-21/charm/hooks/charmhelpers/core/services/base.py", line 326, in __call__
2018-02-20 04:56:09 DEBUG monitors-relation-joined hookenv.open_port(port, protocol)
2018-02-20 04:56:09 DEBUG monitors-relation-joined File "/var/lib/juju/agents/unit-nrpe-host-21/charm/hooks/charmhelpers/core/hookenv.py", line 669, in open_port
2018-02-20 04:56:09 DEBUG monitors-relation-joined _port_op('open-port', port, protocol)
2018-02-20 04:56:09 DEBUG monitors-relation-joined File "/var/lib/juju/agents/unit-nrpe-host-21/charm/hooks/charmhelpers/core/hookenv.py", line 659, in _port_op
2018-02-20 04:56:09 DEBUG monitors-relation-joined subprocess.check_call(_args)
2018-02-20 04:56:09 DEBUG monitors-relation-joined File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
2018-02-20 04:56:09 DEBUG monitors-relation-joined raise CalledProcessError(retcode, cmd)
2018-02-20 04:56:09 DEBUG monitors-relation-joined subprocess.CalledProcessError: Command '['open-port', '5666/TCP']' returned non-zero exit status 1
2018-02-20 04:56:09 ERROR juju.worker.uniter.operation runhook.go:113 hook "monitors-relation-joined" failed: exit status 1

I guess we need to check whether the nrpe package is already installed and running, and not raise an error if it is already working.
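A minimal sketch of that check, assuming that "already working" can be approximated by "something on this host already accepts TCP connections on port 5666". That is a swapped-in proxy rather than a package check, and `port_is_listening` is a hypothetical helper, not part of charmhelpers:

```python
import socket


def port_is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: treats "something is listening on the NRPE port"
    as a proxy for "nrpe is already installed and running". It does not
    check which process (or which Juju unit) owns the socket.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, etc.
        return False


# A hook could then skip the port-opening step when the daemon is up, e.g.:
#     if not port_is_listening(5666):
#         hookenv.open_port(5666, "TCP")
```

Skipping `open-port` this way would avoid the hook failure, but Juju would then still record the port as opened only by whichever unit opened it first.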


Revision history for this message
Fairbanks. (fairbanks) wrote :

This bug affects me too.
It prevents the checks of a specific relation from being activated and monitored.
I hope this can be fixed soon.

Haw Loeung (hloeung) wrote :

This is actually a charmhelpers issue: the fix belongs either in PortManagerCallback() in charmhelpers/core/services/base.py, which calls hookenv.open_port(), or directly in hookenv.open_port().

I think we can use hookenv.opened_ports() to get the list of already-opened ports and skip any port that is already in it. It might be best to do that inside hookenv.open_port(), though, so that the fix applies to other charms as well.
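A minimal sketch of that skip logic. `open_port_unless_opened` is a hypothetical name, and the two Juju-facing calls are injected so the sketch runs outside a hook; in a charm they would presumably be hookenv.opened_ports and hookenv.open_port. The "5666/tcp" entry format is an assumption:

```python
from typing import Callable, Iterable


def open_port_unless_opened(
    port: int,
    protocol: str,
    opened_ports: Callable[[], Iterable[str]],
    open_port: Callable[[int, str], None],
) -> bool:
    """Call open_port(port, protocol) unless the port is already listed.

    Hypothetical helper. Returns True if open_port was called, False if
    the port was skipped because opened_ports() already reported it.
    """
    wanted = "{}/{}".format(port, protocol.lower())
    # Entries are assumed to look like "5666/tcp"; compare case-insensitively
    # because the traceback shows charmhelpers passing "5666/TCP" to open-port.
    already_open = {entry.lower() for entry in opened_ports()}
    if wanted in already_open:
        return False
    open_port(port, protocol)
    return True
```

This only helps if opened_ports() can see a port opened by another unit on the machine, which the next comment reports is not the case.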

Peter Sabaini (peter-sabaini) wrote :

I've tried a fix based on the above (skipping ports reported by hookenv.opened_ports()), but it didn't work out. The problem is that hookenv.opened_ports() only reports ports opened _by the same unit_, whereas we would also need the ports opened by other units on the machine.
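Why the skip never triggers can be shown with a simplified, hypothetical model of per-machine port ownership (not Juju's actual implementation): each port has one owning unit, opening it from another unit conflicts, and the per-unit view only lists the caller's own ports. Unit names are taken from the log above:

```python
from typing import Dict, List


class MachinePorts:
    """Hypothetical, simplified model of port ownership on one machine."""

    def __init__(self) -> None:
        self.owner: Dict[str, str] = {}  # e.g. {"5666/tcp": "nrpe-host/6"}

    def open_port(self, unit: str, port: str) -> None:
        holder = self.owner.get(port)
        if holder is not None and holder != unit:
            # Mirrors the "conflicts with existing 5666/tcp" hook error.
            raise RuntimeError(
                "cannot open {} (unit {!r}): conflicts with existing {} (unit {!r})".format(
                    port, unit, port, holder))
        self.owner[port] = unit

    def opened_ports(self, unit: str) -> List[str]:
        # Only ports owned by the calling unit are reported.
        return sorted(p for p, u in self.owner.items() if u == unit)


machine = MachinePorts()
machine.open_port("nrpe-host/6", "5666/tcp")
# nrpe-host/21 sees nothing open, so a skip based on opened_ports() does
# not trigger, and its own open_port() call still conflicts.
print(machine.opened_ports("nrpe-host/21"))  # []
```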

Alvaro Uria (aluria) wrote :

Per https://jujucharms.com/docs/2.3/charms-exposing,
"""
To allow public access to applications, the appropriate changes must be made to the cloud provider firewall settings. As the procedure for doing this varies depending on your cloud, Juju helpfully abstracts this into a single command, juju expose <applicationname>.
"""

I think that is the only case in which "open-port PORT/proto" is needed, so it should either be optional, or errors for an already-open port should be ignored. However, ignoring the error on unit B when unit A has already opened the port could be an issue if unit A is later removed (i.e. B would lose access).
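The trade-off can be illustrated with a simplified, hypothetical model (not Juju's actual implementation), assuming that a port belongs to the unit that opened it and that removing a unit closes the ports it owns. Ignoring the conflict on B works until A goes away:

```python
class MachinePorts:
    """Hypothetical, simplified model: each opened port has one owning unit."""

    def __init__(self):
        self.owner = {}  # e.g. {"5666/tcp": "A"}

    def open_port(self, unit, port):
        holder = self.owner.get(port)
        if holder is not None and holder != unit:
            raise RuntimeError("{} already opened by {}".format(port, holder))
        self.owner[port] = unit

    def remove_unit(self, unit):
        # Assumption: removing a unit closes every port it owns.
        self.owner = {p: u for p, u in self.owner.items() if u != unit}

    def is_open(self, port):
        return port in self.owner


def open_port_ignoring_conflicts(machine, unit, port):
    """Open the port, treating "already opened by another unit" as success."""
    try:
        machine.open_port(unit, port)
    except RuntimeError:
        pass  # The port is open, but it is not owned by `unit`.


machine = MachinePorts()
machine.open_port("A", "5666/tcp")
open_port_ignoring_conflicts(machine, "B", "5666/tcp")  # no error for B
machine.remove_unit("A")
print(machine.is_open("5666/tcp"))  # False: B never owned the port
```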

This is also related to bug 1668968: smooshed (co-located) applications would have multiple nrpe subordinates, such as nrpe-app1, to enable monitoring logic on the principal units.

Peter Sabaini (peter-sabaini) wrote :

For the record, relevant to opened_ports(): bug #1427770, "opened-ports doesn't include ports opened by other charms".

Peter Sabaini (peter-sabaini) wrote :

There is an ugly workaround, if you don't really care about open-port that much, in ~canonical-bootstack/nrpe-charm:make-port-open-errors-nonfatal.
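What "making port-open errors non-fatal" might look like, as a hypothetical sketch: it is not the contents of that branch, and `open_port_nonfatal` is an invented name. The `open-port PORT/PROTOCOL` command and the use of subprocess.check_call match the traceback above; `run` is injectable for testing:

```python
import logging
import subprocess
from typing import Callable, List

log = logging.getLogger(__name__)


def open_port_nonfatal(port: int, protocol: str = "TCP",
                       run: Callable[[List[str]], object] = subprocess.check_call) -> bool:
    """Run `open-port PORT/PROTOCOL`, logging instead of raising on failure.

    Hypothetical sketch. Returns True if the command succeeded, False if it
    failed and the failure was swallowed.
    """
    cmd = ["open-port", "{}/{}".format(port, protocol)]
    try:
        run(cmd)
    except subprocess.CalledProcessError as exc:
        # Every failure is swallowed here, not only the port conflict,
        # so genuine errors are hidden as well.
        log.warning("ignoring failed %s: %s", cmd, exc)
        return False
    return True
```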

Haw Loeung (hloeung)
Changed in charm-helpers:
status: New → In Progress
Changed in nrpe-charm:
status: New → In Progress
Changed in charm-helpers:
assignee: nobody → Haw Loeung (hloeung)
Changed in nrpe-charm:
assignee: nobody → Haw Loeung (hloeung)
Xav Paice (xavpaice) wrote :

See https://bugs.launchpad.net/juju/+bug/1750079 for the reason this is happening

Haw Loeung (hloeung) wrote :

The discussion continues in the charm-helpers PR on GitHub: https://github.com/juju/charm-helpers/pull/152

Haw Loeung (hloeung) wrote :

I think what we should do here is update charm-helpers so that open_port returns a more informative error message, as stub suggested. Then, when the port is already open, NRPE simply does nothing.
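One way that proposal could look, as a hypothetical sketch rather than the change that was merged: `PortConflictError`, `open_port_informative` and `nrpe_open_port` are invented names, and recognising the conflict by matching "conflicts with existing" in the hook tool's stderr is an assumption based on the log line at the top of this report:

```python
import subprocess


class PortConflictError(Exception):
    """Hypothetical: another unit on the same machine already holds the port."""


def _run(cmd):
    # Capture stderr so the hook tool's error text can be inspected.
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)


def open_port_informative(port, protocol="TCP", run=_run):
    """Run open-port and raise a specific error for port conflicts."""
    cmd = ["open-port", "{}/{}".format(port, protocol)]
    result = run(cmd)
    if result.returncode == 0:
        return
    # Assumption: a conflict is recognised from the error text seen in the
    # log ("... conflicts with existing 5666/tcp ...").
    if "conflicts with existing" in (result.stderr or ""):
        raise PortConflictError(result.stderr.strip())
    raise subprocess.CalledProcessError(result.returncode, cmd,
                                        output=result.stdout,
                                        stderr=result.stderr)


def nrpe_open_port(port=5666, run=_run):
    """NRPE side: a port already opened by another unit is not an error."""
    try:
        open_port_informative(port, "TCP", run=run)
    except PortConflictError:
        # Another nrpe subordinate on this machine already opened the port.
        pass
```

Other failures still raise CalledProcessError, so only the "already open elsewhere" case is silenced.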

The remaining issue is that when applications smooshed onto the same machine are removed, their related nrpe subordinates could close that port; but I guess if you're smooshing applications on the same machine, you're expected to fix things manually.

Changed in charm-helpers:
assignee: Haw Loeung (hloeung) → nobody
Changed in nrpe-charm:
assignee: Haw Loeung (hloeung) → nobody
Changed in charm-helpers:
status: In Progress → Triaged
Changed in nrpe-charm:
status: In Progress → Triaged
Changed in charm-helpers:
status: Triaged → Fix Committed
status: Fix Committed → Fix Released
Changed in nrpe-charm:
status: Triaged → Fix Released