Nrpe cannot bind to server_address(bind address) due to using a floating IP

Bug #1937888 reported by Nobuto Murata
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
NRPE Charm | Fix Released | High | Xav Paice |

Bug Description

$ juju version
2.9.9-ubuntu-amd64

An OpenStack instance/VM where nrpe was deployed has two IP addresses: 10.5.5.16 as the main, private one, and 192.168.151.74 as a floating IP. The latter is assigned on a Neutron router for DNAT, so it's not bound to the VM itself.

$ openstack server show juju-572b1d-k8s-on-openstack-0 -c addresses
+-----------+------------------------------------+
| Field | Value |
+-----------+------------------------------------+
| addresses | internal=10.5.5.16, 192.168.151.74 |
+-----------+------------------------------------+

Nrpe failed to start because it tried to bind the service to 192.168.151.74, which doesn't exist locally. The IP was written to nrpe.cfg by the charm. According to the unit log, the initial address was 10.5.5.16, as expected, and it then changed to 192.168.151.74.

"network-get monitors" doesn't include the floating IP (192.168.151.74), so I'm not sure how the charm got it. I also ran "network-get monitors" repeatedly overnight; the output was stable and the floating IP never appeared.

$ juju run -u nrpe/0 -- network-get monitors
bind-addresses:
- mac-address: fa:16:3e:35:a9:b7
  interface-name: ens3
  addresses:
  - hostname: ""
    address: 10.5.5.16
    cidr: 10.5.5.0/24
  macaddress: fa:16:3e:35:a9:b7
  interfacename: ens3
- mac-address: c2:05:47:10:f6:e5
  interface-name: fan-252
  addresses:
  - hostname: ""
    address: 252.16.0.1
    cidr: 252.0.0.0/8
  macaddress: c2:05:47:10:f6:e5
  interfacename: fan-252
egress-subnets:
- 10.5.5.16/32
ingress-addresses:
- 10.5.5.16
- 252.16.0.1

2021-07-23 16:32:36 INFO unit.nrpe/0.juju-log server.go:314 Getting ingress IP address for binding monitors
2021-07-23 16:32:37 INFO unit.nrpe/0.juju-log server.go:314 Using ingress-addresses
2021-07-23 16:32:37 INFO unit.nrpe/0.juju-log server.go:314 10.5.5.16
2021-07-23 16:32:37 INFO unit.nrpe/0.juju-log server.go:314 Getting ingress IP address for binding monitors
2021-07-23 16:32:37 INFO unit.nrpe/0.juju-log server.go:314 Using ingress-addresses
2021-07-23 16:32:37 INFO unit.nrpe/0.juju-log server.go:314 10.5.5.16
--
2021-07-23 16:32:45 INFO unit.nrpe/0.juju-log server.go:314 Getting ingress IP address for binding monitors
2021-07-23 16:32:45 INFO unit.nrpe/0.juju-log server.go:314 Using ingress-addresses
2021-07-23 16:32:45 INFO unit.nrpe/0.juju-log server.go:314 10.5.5.16
--
2021-07-23 16:37:31 INFO unit.nrpe/0.juju-log server.go:314 monitors:20: Getting ingress IP address for binding monitors
2021-07-23 16:37:31 INFO unit.nrpe/0.juju-log server.go:314 monitors:20: Using ingress-addresses
2021-07-23 16:37:31 INFO unit.nrpe/0.juju-log server.go:314 monitors:20: 192.168.151.74
2021-07-23 16:37:31 INFO unit.nrpe/0.juju-log server.go:314 monitors:20: Getting ingress IP address for binding monitors
2021-07-23 16:37:31 INFO unit.nrpe/0.juju-log server.go:314 monitors:20: Using ingress-addresses
2021-07-23 16:37:31 INFO unit.nrpe/0.juju-log server.go:314 monitors:20: 192.168.151.74
--
2021-07-23 16:37:35 INFO unit.nrpe/0.juju-log server.go:314 monitors:20: Getting ingress IP address for binding monitors
2021-07-23 16:37:35 INFO unit.nrpe/0.juju-log server.go:314 monitors:20: Using ingress-addresses
2021-07-23 16:37:35 INFO unit.nrpe/0.juju-log server.go:314 monitors:20: 192.168.151.74

$ cat /etc/nagios/nrpe.cfg
#--------------------------------------------------------
# This file is managed by Juju
#--------------------------------------------------------

# See https://github.com/stockholmuniversity/Nagios-NRPE/blob/2.0.10/share/nrpe.cfg
server_address=192.168.151.74
server_port=5666
allowed_hosts=127.0.0.1,192.168.151.100/32
nrpe_user=nagios
nrpe_group=nagios
dont_blame_nrpe=0
debug=0
command_timeout=60
pid_file=/var/run/nagios/nrpe.pid

# All configuration snippets go into nrpe.d/
include_dir=/etc/nagios/nrpe.d/

$ sudo systemctl status nagios-nrpe-server
● nagios-nrpe-server.service - Nagios Remote Plugin Executor
     Loaded: loaded (/lib/systemd/system/nagios-nrpe-server.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2021-07-23 16:37:42 UTC; 11h ago
       Docs: http://www.nagios.org/documentation
    Process: 846294 ExecStart=/usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f $NRPE_OPTS (code=exited, status=1/FAILURE)
    Process: 846303 ExecStopPost=/bin/rm -f /run/nagios/nrpe.pid (code=exited, status=0/SUCCESS)
   Main PID: 846294 (code=exited, status=1/FAILURE)

Jul 23 16:37:42 juju-572b1d-k8s-on-openstack-0 systemd[1]: Started Nagios Remote Plugin Executor.
Jul 23 16:37:42 juju-572b1d-k8s-on-openstack-0 nrpe[846294]: Starting up daemon
Jul 23 16:37:42 juju-572b1d-k8s-on-openstack-0 nrpe[846294]: Bind to port 5666 on 192.168.151.74 failed: Cannot assign requested address.
Jul 23 16:37:42 juju-572b1d-k8s-on-openstack-0 nrpe[846294]: Cannot bind to any address.
Jul 23 16:37:42 juju-572b1d-k8s-on-openstack-0 systemd[1]: nagios-nrpe-server.service: Main process exited, code=exited, status=1/FAILURE
Jul 23 16:37:42 juju-572b1d-k8s-on-openstack-0 systemd[1]: nagios-nrpe-server.service: Failed with result 'exit-code'.

Nobuto Murata (nobuto) wrote :

Ah, now I get where the floating IP comes from.

$ juju run -u nrpe/0 -- network-get monitors
bind-addresses:
- mac-address: fa:16:3e:35:a9:b7
  interface-name: ens3
  addresses:
  - hostname: ""
    address: 10.5.5.16
    cidr: 10.5.5.0/24
  macaddress: fa:16:3e:35:a9:b7
  interfacename: ens3
- mac-address: c2:05:47:10:f6:e5
  interface-name: fan-252
  addresses:
  - hostname: ""
    address: 252.16.0.1
    cidr: 252.0.0.0/8
  macaddress: c2:05:47:10:f6:e5
  interfacename: fan-252
egress-subnets:
- 10.5.5.16/32
ingress-addresses:
- 10.5.5.16
- 252.16.0.1

$ juju run -u nrpe/0 -- network-get monitors -r monitors:20
bind-addresses:
- mac-address: fa:16:3e:35:a9:b7
  interface-name: ens3
  addresses:
  - hostname: ""
    address: 10.5.5.16
    cidr: 10.5.5.0/24
  macaddress: fa:16:3e:35:a9:b7
  interfacename: ens3
- mac-address: c2:05:47:10:f6:e5
  interface-name: fan-252
  addresses:
  - hostname: ""
    address: 252.16.0.1
    cidr: 252.0.0.0/8
  macaddress: c2:05:47:10:f6:e5
  interfacename: fan-252
egress-subnets:
- 192.168.151.74/32
ingress-addresses:
- 192.168.151.74

description: updated
Nobuto Murata (nobuto) wrote :

Looking at the code, it says "Get ingress IP address for a binding". If I'm not mistaken, there is no assurance that an ingress IP can be used as a bind address.

https://git.launchpad.net/charm-nrpe/tree/hooks/nrpe_helpers.py?h=stable/21.04#n76

These lines are possibly the cause of this behavior:

            if ip_address not in network_info["ingress-addresses"]:
                ip_address = network_info["ingress-addresses"][0]
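To illustrate how those two lines can end up selecting the floating IP, here is a minimal sketch. Only the `if` statement is taken from the charm; the function name `pick_address` and the starting value (the first address of the first bind interface) are assumptions made for illustration, and the dictionaries are trimmed copies of the two network-get outputs pasted earlier in this report.

```python
def pick_address(network_info):
    """Hypothetical reduction of the selection logic, not the charm's real function."""
    # Assumption: start from the first address of the first bind interface.
    ip_address = network_info["bind-addresses"][0]["addresses"][0]["address"]
    # The two lines quoted above, unchanged.
    if ip_address not in network_info["ingress-addresses"]:
        ip_address = network_info["ingress-addresses"][0]
    return ip_address


# Trimmed copy of "network-get monitors" (no relation id).
endpoint_view = {
    "bind-addresses": [{"addresses": [{"address": "10.5.5.16"}]}],
    "ingress-addresses": ["10.5.5.16", "252.16.0.1"],
}
# Trimmed copy of "network-get monitors -r monitors:20".
relation_view = {
    "bind-addresses": [{"addresses": [{"address": "10.5.5.16"}]}],
    "ingress-addresses": ["192.168.151.74"],
}

print(pick_address(endpoint_view))
print(pick_address(relation_view))
```

With the relation-scoped output, the local address is not among the ingress addresses, so the fallback replaces it with the floating IP, which matches the address seen in nrpe.cfg.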

summary: - server_address(bind address) flapping between local IP and Floating IP
+ Nrpe cannot bind to server_address(bind address) due to using a floating
+ IP
Drew Freiberger (afreiberger) wrote :

Thanks for the bug, Nobuto.

That code is our workaround for a pre-2.9.10 juju bug.

I just ran into this on an openstack/floating ip deployment, too.

It only happened on two units, and I'm running juju 2.9.12.

It appears that the FIP is part of the bind-addresses in 'network-get monitors' for a short while during deployment, but then it clears out from some cleanup routines in the machine agent.

I found that running 'hooks/config-changed' on the units with the improper IPs, after the model had settled, reverted them to the actual listening addresses.

The NRPE charm is going to need an update-status hook routine that checks 'nrpe_ipaddress' against the current 'network-get monitors' output and performs an update, to catch changes in juju's network info state.
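A rough sketch of the comparison such a routine might make, assuming it reads the rendered nrpe.cfg and compares its server_address with the address currently reported by juju. The helper names `configured_server_address` and `needs_rerender` are hypothetical and are not part of the charm.

```python
def configured_server_address(cfg_text):
    """Return the server_address value from nrpe.cfg text, or None if unset."""
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not a key=value pair.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "server_address":
            return value.strip()
    return None


def needs_rerender(cfg_text, current_address):
    """True when the rendered config no longer matches the current address."""
    return configured_server_address(cfg_text) != current_address


sample_cfg = """\
# This file is managed by Juju
server_address=192.168.151.74
server_port=5666
"""

print(needs_rerender(sample_cfg, "10.5.5.16"))
```

An update-status hook could call something like this on each run and re-render the configuration only when it reports a mismatch.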

Secondarily, to address the other issue of nrpe not listening at all (it should still be able to advertise a floating IP to a nagios service external to openstack), the handling of the address detected by this line should change:

self["nrpe_ipaddress"] = get_local_ingress_address("monitors")

If the address is not a local IP listed for any interface, nrpe_ipaddress should be set to None in the template rendering context.
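As a sketch of that check, assuming the bind-addresses section of the network-get output is an acceptable source of local IPs (the charm could instead inspect the interfaces directly, and the FIP was seen in bind-addresses briefly during deployment). The helpers `local_addresses` and `bindable_or_none` are hypothetical names, not charm code.

```python
def local_addresses(network_info):
    """Collect every address listed under bind-addresses in network-get output."""
    return {
        addr["address"]
        for iface in network_info.get("bind-addresses", [])
        for addr in iface.get("addresses", [])
    }


def bindable_or_none(candidate, network_info):
    """Return candidate if it is a local address, otherwise None."""
    return candidate if candidate in local_addresses(network_info) else None


# Trimmed copy of the relation-scoped network-get output from this report.
network_info = {
    "bind-addresses": [
        {"interface-name": "ens3", "addresses": [{"address": "10.5.5.16"}]},
        {"interface-name": "fan-252", "addresses": [{"address": "252.16.0.1"}]},
    ],
    "ingress-addresses": ["192.168.151.74"],
}

print(bindable_or_none("192.168.151.74", network_info))
print(bindable_or_none("10.5.5.16", network_info))
```

The template would then need to omit server_address when the value is None, so that nrpe falls back to its default listening behavior; that part is also an assumption here.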

Changed in charm-nrpe:
status: New → Confirmed
importance: Undecided → High
status: Confirmed → Triaged
Drew Freiberger (afreiberger) wrote :

Even in Nobuto's comment, the 'network-get monitors' "resolved" the FIP being part of the ingress-addresses between the two command runs, just as I found in my environment. The charm is finishing configuration hooks before juju is running the interface audit cleanup routine added to resolve lp#1897261: https://github.com/juju/juju/pull/13184

Drew Freiberger (afreiberger) wrote :

Sorry, I was confused for a moment. Nobuto's second output only shows the FIP and not the local address. That's the opposite of what I'm experiencing at the moment.

Xav Paice (xavpaice) wrote :

Reported https://bugs.launchpad.net/charm-nrpe/+bug/1943210 (duplicate). The public address was affecting not only nrpe.cfg but the relation data also.

I'm told that in a cross-model relation there's no way for Juju to know whether the private IPs are accessible, so it will always provide the public IP. The charm uses get_local_ingress_address("monitors") to determine the IP. We'll need to patch it so that an address that isn't on the host is not used for the service config, and is not used in the relation either unless nagios_address_type is set to 'public'.
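The rule described above could look roughly like this. It is a sketch only: the function name `choose_addresses`, the default value of nagios_address_type, and the fallback to the first local IP for the relation are all assumptions, not the released fix.

```python
def choose_addresses(ingress_address, local_ips, nagios_address_type="private"):
    """Return (bind_address, relation_address) for the given ingress address."""
    is_local = ingress_address in local_ips
    # Never write a non-local address into the service config.
    bind_address = ingress_address if is_local else None
    if is_local or nagios_address_type == "public":
        relation_address = ingress_address
    else:
        # Assumption: fall back to the first local IP, if there is one.
        relation_address = local_ips[0] if local_ips else None
    return bind_address, relation_address


local_ips = ["10.5.5.16", "252.16.0.1"]
print(choose_addresses("192.168.151.74", local_ips))
print(choose_addresses("192.168.151.74", local_ips, nagios_address_type="public"))
print(choose_addresses("10.5.5.16", local_ips))
```

In this sketch the floating IP is only ever advertised over the relation when nagios_address_type is 'public', and it is never used as the bind address.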

Xav Paice (xavpaice) wrote :

subscribed field-high, blocking deployment on a production site.

Xav Paice (xavpaice)
Changed in charm-nrpe:
status: Triaged → In Progress
assignee: nobody → Xav Paice (xavpaice)
milestone: none → 21.10
Xav Paice (xavpaice)
Changed in charm-nrpe:
status: In Progress → Fix Committed
Celia Wang (ziyiwang)
Changed in charm-nrpe:
status: Fix Committed → Fix Released