syslog flooded with "refused notify from non-primary"

Bug #2000711 reported by Silviu Panica
This bug affects 1 person
Affects: OpenStack Designate-Bind Charm
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

The bind process of designate-bind generates a large number of 'zone ex1.dns.zone/IN: refused notify from non-primary: designate-bind_ip_X#port' messages (>30/s for around 12 DNS zones). This makes syslog extremely large within a few days (>200GB of text data). designate-bind_ip_X are the IPs of the designate-bind instances.
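
To quantify the flood, the messages can be tallied per zone and source address. This is a minimal Python sketch, not part of the charm: the function name `count_refused_notifies` is hypothetical, and the regular expression relies only on the message text quoted above, not on any particular syslog prefix.

```python
import re
from collections import Counter

# Matches the message text quoted in the report; the syslog prefix
# (timestamp, host, process) is deliberately not assumed.
REFUSED = re.compile(
    r"zone (?P<zone>\S+)/IN: refused notify from non-primary: "
    r"(?P<src>[^#\s]+)#\d+"
)


def count_refused_notifies(lines):
    """Count refused NOTIFY messages per (zone, source address)."""
    counts = Counter()
    for line in lines:
        match = REFUSED.search(line)
        if match:
            counts[(match.group("zone"), match.group("src"))] += 1
    return counts
```

Passing an open handle on `/var/log/syslog` as `lines` and calling `.most_common()` on the result should show which designate-bind unit is sending the notifies that get refused.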

My setup uses the following configuration:
 * juju version 2.9.37
 * operating system: Ubuntu 22.04 x86_64
 * designate-bind, channel latest/edge, ver. 9.18.1, rev 84;
 * designate-bind-hacluster, channel latest/edge, rev. 116;
 * designate, channel latest/edge, ver. 15.0.0, rev. 117;
 * designate-hacluster, channel latest/edge, rev. 104;

It seems that the bind instances deployed by designate-bind issue notify requests to the other instances in the HA cluster. From what I have analysed so far, the leader node sends notifies to the other nodes in the cluster, and it keeps generating the messages because the requests are rejected.

These requests are rejected because `named.conf.options` is configured to accept notifies only from the designate instances:

# /etc/bind/named.conf.options
allow-notify { designate_ip_1;designate_ip_2;designate_ip_3; };

At some point the syslog rotation mechanism fails to work (because the syslog files are too large) and the local disk fills up.

The only workaround I have right now is a cron job that checks the generated syslog files and deletes them from time to time.
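
A cleanup of this kind could be sketched as follows in Python. This is only an illustration under stated assumptions, not the actual cron job: the function names, the `syslog.*` glob (which matches rotated files but not the active `syslog`), and the size budget are all hypothetical.

```python
from pathlib import Path


def files_to_prune(log_dir, pattern, max_total_bytes):
    """Return the older files that push the matching set over the budget.

    Files are walked newest first; once the running total exceeds
    max_total_bytes, that file and every older one is selected.
    """
    files = sorted(
        Path(log_dir).glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    total = 0
    doomed = []
    for path in files:
        total += path.stat().st_size
        if total > max_total_bytes:
            doomed.append(path)
    return doomed


def prune(log_dir="/var/log", pattern="syslog.*", max_total_bytes=10 * 1024**3):
    """Delete the selected files; the defaults are assumptions, adjust as needed."""
    for path in files_to_prune(log_dir, pattern, max_total_bytes):
        path.unlink()
```

This only limits the damage; it does not stop the messages from being written.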

Another workaround I am considering is to create a custom designate-bind charm that also includes the designate-bind IPs in `allow-notify` in bind, but I don't know what further implications this may introduce.

Any advice on how to solve this issue would be helpful.

Thanks,
Silviu.

Tags: bind syslog
Silviu Panica (silviu001) wrote :

I did some extra research and according to the bind9 documentation:

https://bind9.readthedocs.io/en/v9_18_1/advanced.html#notify

```
As a secondary zone can also be a primary to other secondaries, named, by default, sends NOTIFY messages for every zone it loads. Specifying notify primary-only; causes named to only send NOTIFY for primary zones that it loads.
```

it is the default behaviour for secondaries to send a NOTIFY to all the other name servers of a zone unless `notify primary-only;` is set, and at least in latest/edge it is not.

So a mitigation would be either to add the designate-bind endpoints to `allow-notify` or to set `notify primary-only;` in named.conf.options.

Silviu Panica (silviu001) wrote :

I have managed to patch the running instances using the following:

juju ssh designate-bind/ID -- "exec -- sudo sed -i '/allow-notify { {{ .*/s/$/\n\tnotify primary-only;/' /var/lib/juju/agents/unit-designate-bind-ID/charm/templates/named.conf.options"

juju ssh designate-bind/ID -- systemctl restart jujud-machine-MACHINE_ID-lxd-CONTAINER_ID.service

where:
 - ID - designate-bind instance id (I am running in HA mode with 3 instances)
 - MACHINE_ID - the id of the juju machine where designate-bind instance LXD container is hosted
 - CONTAINER_ID - the id of the designate-bind instance LXD container

This fix works for my juju deployment and stops syslog flooding with 'refused notify from non-primary'.

Both designate and designate-bind are running without issues (I've conducted a couple of tests).
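
For reference, the template edit performed by the sed command can be sketched in Python as below. This is a hedged illustration only: the function name `add_notify_primary_only` is hypothetical, and the match on `allow-notify {` is looser than the sed address, which also requires the template's `{{` placeholder on that line. Unlike the sed command, the sketch does nothing if the directive is already present.

```python
def add_notify_primary_only(text):
    """Insert 'notify primary-only;' after the allow-notify line, once."""
    if "notify primary-only;" in text:
        return text  # already patched, leave untouched
    out = []
    for line in text.split("\n"):
        out.append(line)
        if "allow-notify {" in line:
            # Same tab-indented line the sed replacement appends.
            out.append("\tnotify primary-only;")
    return "\n".join(out)
```

As with the sed approach, the change lives in the unit's local copy of the charm template, so it would presumably need to be reapplied after a charm upgrade replaces that file.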

DUFOUR Olivier (odufourc) wrote :

I have not been able to reproduce on my side.

However, I notice that the hacluster charm is deployed on the designate-bind units, where it is not necessary; this may be the cause of the issue.

Silviu Panica (silviu001) wrote :

In order to use the virtual IPs feature (`service_ips` configuration option, available in zed or latest), a subordinate hacluster relation is needed.

I make use of this feature in my deployment.
