generates duplicate zone records if overlapping subnets are used which leads to bind9 failures: '36.232.10.in-addr.arpa': already exists previous definition

Bug #1683047 reported by Dmitrii Shcherbakov
This bug affects 7 people
Affects: MAAS
Status: Fix Released
Importance: High
Assigned to: Christian Grabowski

Bug Description

The setup is attached as a picture.

Situation:

Two overlapping subnets (observed on different L2 segments, though):
    - 10.232.36.0/24 (IS-managed, used for BMC access via iLO interfaces)
    - 10.232.32.0/21 (a subnet I use in one of the VLANs; it overlaps with 10.232.36.0/24, but that should not matter as long as I reserve that /24 range inside the /21 and the two are on different L2 segments)
    - two different fabrics (the IS-managed VLAN and subnet are not on the same trunk as my VLAN and subnet)
    - no IP ranges configured in either subnet (just for simplicity's sake)
    - static addresses used for OOB (by the iLO interfaces) are observed in 10.232.36.0/24 by MAAS

On the MAAS VM (interfaces correspond to physical host's interfaces below):

ubuntu@maas:/etc/maas$ ip -o -4 a s
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
2: eth0 inet 10.232.36.101/24 brd 10.232.36.255 scope global eth0\ valid_lft forever preferred_lft forever
3: eth1 inet 10.232.0.2/21 brd 10.232.7.255 scope global eth1\ valid_lft forever preferred_lft forever
4: eth2 inet 10.232.8.2/21 brd 10.232.15.255 scope global eth2\ valid_lft forever preferred_lft forever
5: virbr0 inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0\ valid_lft forever preferred_lft forever

ubuntu@skrzak:~$ brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.e4115bbffc88       no              eth0
                                                        vnet0
                                                        vnet3
br1             8000.e4115bbffc8a       no              eth1
                                                        vnet1
                                                        vnet4
br2             8000.e4115bbffc8c       no              eth2
                                                        vnet2
                                                        vnet5

ubuntu@maas-host:~$ ip -o -4 a s
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
6: br0 inet 10.232.36.100/24 brd 10.232.36.255 scope global br0\ valid_lft forever preferred_lft forever
7: br1 inet 10.232.0.1/21 brd 10.232.7.255 scope global br1\ valid_lft forever preferred_lft forever
8: br2 inet 10.232.8.1/21 brd 10.232.15.255 scope global br2\ valid_lft forever preferred_lft forever

Switch-wise it looks like this (VLAN 15 being IS-managed):

interface Ethernet122/1/12
description maas-host:eth0
switchport access vlan 15

interface Ethernet122/1/13
description maas-host:eth1
switchport mode trunk
switchport trunk native vlan 2727
switchport trunk allowed vlan 2727-2731

interface Ethernet122/1/14
description maas-host:eth2
switchport mode trunk
switchport trunk native vlan 2727
switchport trunk allowed vlan 2727-2731
----------------

As soon as I configure the subnet 10.232.32.0/21, MAAS regiond generates a new named.conf.maas that contains duplicate definitions of the "36.232.10.in-addr.arpa" zone, which causes the bind9 service to fail.

See the get_details_for_ip_range and get_GENERATE_directives functions in the MAAS sources.

Apr 15 12:38:22 maas named[6121]: loading configuration from '/etc/bind/named.conf'
Apr 15 12:38:22 maas named[6121]: /etc/bind/maas/named.conf.maas:164: zone '36.232.10.in-addr.arpa': already exists previous definition: /etc/bind/maas/named.conf.maas:16
Apr 15 12:38:22 maas named[6121]: reloading configuration failed: failure

/etc/bind/maas/named.conf.maas

zone "36.232.10.in-addr.arpa" {
    type master;
    file "/etc/bind/maas/zone.36.232.10.in-addr.arpa";
};
...
zone "36.232.10.in-addr.arpa" {
    type master;
    file "/etc/bind/maas/zone.36.232.10.in-addr.arpa";
};
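
One way to confirm which zones are defined twice is to scan the generated file for repeated zone names. The following is a minimal sketch and not part of MAAS: the function name find_duplicate_zones and the SAMPLE text are made up for illustration, and the regular expression assumes only the `zone "name" {` layout shown in the excerpt above.

```python
import re
from collections import defaultdict

# Matches the opening of a zone statement, e.g.  zone "36.232.10.in-addr.arpa" {
ZONE_RE = re.compile(r'^\s*zone\s+"([^"]+)"')


def find_duplicate_zones(config_text):
    """Return {zone_name: [line numbers]} for every zone defined more than once."""
    seen = defaultdict(list)
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        match = ZONE_RE.match(line)
        if match:
            seen[match.group(1)].append(lineno)
    return {name: lines for name, lines in seen.items() if len(lines) > 1}


# Illustrative input modelled on the excerpt above (hypothetical, shortened).
SAMPLE = """\
zone "36.232.10.in-addr.arpa" {
    type master;
    file "/etc/bind/maas/zone.36.232.10.in-addr.arpa";
};
zone "36.232.10.in-addr.arpa" {
    type master;
    file "/etc/bind/maas/zone.36.232.10.in-addr.arpa";
};
"""

for name, lines in find_duplicate_zones(SAMPLE).items():
    print(f"{name}: defined on lines {lines}")
```

To check a real installation, read the contents of /etc/bind/maas/named.conf.maas and pass the text to find_duplicate_zones instead of SAMPLE.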

As soon as I change the subnet in MAAS from 10.232.32.0/21 to, say, 10.232.48.0/21, regiond reloads the bind9 config again and there is no duplicate zone entry.

This is fairly hard to debug because adding an overlapping subnet with a different-looking prefix produces a duplicate zone record containing "36", but I can reproduce it 100% of the time with this setup.
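
The "36" becomes less surprising once the /21 is split on /24 boundaries, which is how IPv4 reverse zones of the form c.b.a.in-addr.arpa are laid out. The sketch below is only an illustration of that arithmetic using Python's standard ipaddress module. It is not the MAAS implementation, the helper name reverse_zones is hypothetical, and it assumes one reverse zone is emitted per /24 block.

```python
import ipaddress


def reverse_zones(cidr):
    """List the /24-aligned in-addr.arpa zone names covering an IPv4 subnet.

    Simplification for this sketch: only prefixes from /16 to /24 are handled.
    """
    network = ipaddress.ip_network(cidr)
    if network.version != 4 or not 16 <= network.prefixlen <= 24:
        raise ValueError("sketch only handles IPv4 prefixes from /16 to /24")
    zones = []
    for block in network.subnets(new_prefix=24):
        a, b, c, _ = str(block.network_address).split(".")
        zones.append(f"{c}.{b}.{a}.in-addr.arpa")
    return zones


bmc_zones = set(reverse_zones("10.232.36.0/24"))
my_zones = set(reverse_zones("10.232.32.0/21"))      # covers 32 through 39
moved_zones = set(reverse_zones("10.232.48.0/21"))   # covers 48 through 55

print(sorted(bmc_zones & my_zones))     # zones both subnets would define
print(sorted(bmc_zones & moved_zones))  # empty once the /21 is moved
```

Under that assumption, the /21 contributes eight zone names, one of which is the zone already produced by the /24, while 10.232.48.0/21 shares none.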

From my point of view, having overlapping subnet setups should be acceptable - who knows how people are going to use their IP ranges.

I can provide more info but this setup should be easy to replicate with a VM and a couple of virtual interfaces.

----------------
MAAS package versions:
https://paste.ubuntu.com/24388138/

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :
Changed in maas:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.2.0rc3
summary: - MAAS 2.2 generates duplicate zone records if overlapping subnets are
- used which leads to bind9 failures: '36.232.10.in-addr.arpa': already
- exists previous definition
+ [2.2] generates duplicate zone records if overlapping subnets are used
+ which leads to bind9 failures: '36.232.10.in-addr.arpa': already exists
+ previous definition
Changed in maas:
milestone: 2.2.0rc3 → 2.2.0rc4
Revision history for this message
Andres Rodriguez (andreserl) wrote : Re: [2.2] generates duplicate zone records if overlapping subnets are used which leads to bind9 failures: '36.232.10.in-addr.arpa': already exists previous definition

MAAS doesn't support overlapping subnets. To fix this we would need to explore what is possible, but at the moment it is not supported.

Changed in maas:
milestone: 2.2.0rc4 → 2.2.1
importance: High → Wishlist
milestone: 2.2.1 → 2.3.0
Changed in maas:
milestone: 2.3.0 → 2.3.x
Revision history for this message
Thiago Martins (martinx) wrote :

Hey guys,

 I'm facing this problem with MaaS 2.5 stable.

 How can I fix it without reinstalling it from scratch?

 It was never my intention to overlap any networks.

 Logs:

maas.log:
---
2019-01-15T11:15:59.338281-05:00 wc maas.dns: message repeated 19 times: [ [error] Reloading BIND failed (is it running?): Command `rndc -c /etc/bind/maas/rndc.conf.maas reload` returned non-zero exit status 1:#012rndc: connect failed: 127.0.0.1#954: connection refused]
2019-01-15T11:15:59.667179-05:00 wc maas.service_monitor: [info] Service 'bind9' is not on, it will be started.
2019-01-15T11:15:59.726226-05:00 wc maas.service_monitor: [error] Service 'bind9' failed to start. Its current state is 'dead' and 'Result: exit-code'.
2019-01-15T11:16:00.852280-05:00 wc maas.dns: [error] Reloading BIND failed (is it running?): Command `rndc -c /etc/bind/maas/rndc.conf.maas reload` returned non-zero exit status 1:#012rndc: connect failed: 127.0.0.1#954: connection refused
---

systemctl status bind9:
---
Jan 15 11:15:59 maas-1 named[3491]: loading configuration from '/etc/bind/named.conf'
Jan 15 11:15:59 maas-1 named[3491]: /etc/bind/maas/named.conf.maas:112: zone '56.84.10.in-addr.arpa': already exists previous definition: /etc/bind/maas/named.conf.maas:12
Jan 15 11:15:59 maas-1 named[3491]: loading configuration: failure
Jan 15 11:15:59 maas-1 named[3491]: exiting (due to fatal error)
Jan 15 11:15:59 maas-1 systemd[1]: bind9.service: Main process exited, code=exited, status=1/FAILURE
Jan 15 11:15:59 maas-1 systemd[1]: bind9.service: Failed with result 'exit-code'.
---

 What to do?

Best,
Thiago

Revision history for this message
Thiago Martins (martinx) wrote :

I had to reinstall. I managed to find the duplicated subnet, but this triggered many other problems, like "IP already in use" when it wasn't... Deployments started to fail... Geez...

PLEASE! Add the following error message:

"Subnet already in use"

Instead of allowing the users to BREAK MaaS!

summary: - [2.2] generates duplicate zone records if overlapping subnets are used
- which leads to bind9 failures: '36.232.10.in-addr.arpa': already exists
+ generates duplicate zone records if overlapping subnets are used which
+ leads to bind9 failures: '36.232.10.in-addr.arpa': already exists
previous definition
Revision history for this message
Adham Sabry (atdhrhs) wrote :

Are there any updates on this issue? I am also experiencing the same problem and I can't reinstall MaaS for this.

Revision history for this message
Adham Sabry (atdhrhs) wrote :
Revision history for this message
Adham Sabry (atdhrhs) wrote :

Can anyone please help?

Revision history for this message
Adham Sabry (atdhrhs) wrote :

@dmitriis can you please post any update here?

Changed in maas:
milestone: 2.3.x → next
Revision history for this message
Xav Paice (xavpaice) wrote :

Just a me-too on 2.3.6.

 systemctl status bind9
● bind9.service - BIND Domain Name Server
   Loaded: loaded (/lib/systemd/system/bind9.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/bind9.service.d
           └─50-insserv.conf-$named.conf
   Active: failed (Result: exit-code) since Mon 2019-09-09 03:21:53 UTC; 6s ago
     Docs: man:named(8)
  Process: 44098 ExecStop=/usr/sbin/rndc stop (code=exited, status=1/FAILURE)
  Process: 44036 ExecStart=/usr/sbin/named -f $OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 44036 (code=exited, status=1/FAILURE)

Sep 09 03:21:53 pnjostkinfr02 named[44036]: using up to 4096 sockets
Sep 09 03:21:53 pnjostkinfr02 named[44036]: loading configuration from '/etc/bind/named.conf'
Sep 09 03:21:53 pnjostkinfr02 named[44036]: /etc/bind/maas/named.conf.maas:84: zone 'abc.243.10.in-addr.arpa': already exists previous definition: /etc/bind/maas/named.conf.maas:20
Sep 09 03:21:53 pnjostkinfr02 named[44036]: /etc/bind/maas/named.conf.maas:88: zone 'abd.243.10.in-addr.arpa': already exists previous definition: /etc/bind/maas/named.conf.maas:24
Sep 09 03:21:53 pnjostkinfr02 named[44036]: loading configuration: failure
Sep 09 03:21:53 pnjostkinfr02 systemd[1]: bind9.service: Main process exited, code=exited, status=1/FAILURE
Sep 09 03:21:53 pnjostkinfr02 rndc[44098]: rndc: connect failed: 127.0.0.1#953: connection refused
Sep 09 03:21:53 pnjostkinfr02 systemd[1]: bind9.service: Control process exited, code=exited status=1
Sep 09 03:21:53 pnjostkinfr02 systemd[1]: bind9.service: Unit entered failed state.
Sep 09 03:21:53 pnjostkinfr02 systemd[1]: bind9.service: Failed with result 'exit-code'.

tags: added: canonical-bootstack
Revision history for this message
Xav Paice (xavpaice) wrote :

A note here:

named[24247]: /etc/bind/maas/named.conf.maas:84: zone '32.243.10.in-addr.arpa': already exists previous definition: /etc/bind/maas/named.conf.maas:20
named[24247]: /etc/bind/maas/named.conf.maas:88: zone '33.243.10.in-addr.arpa': already exists previous definition: /etc/bind/maas/named.conf.maas:24

We have subnets defined for 10.243.32.0/23 - there's no overlapping subnet or even one directly adjacent.

Changed in maas:
status: Triaged → New
Revision history for this message
Xav Paice (xavpaice) wrote :

I've subscribed field-medium and reset the status to 'new', as we've seen this issue on multiple production sites. I'd therefore like to get the priority re-evaluated, since this is causing DNS downtime in production.

Revision history for this message
Alberto Donato (ack) wrote :

Could you please run "subnets read" via the MAAS CLI when the issue happens, to check what the network model in MAAS looks like?
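
Once that output is captured, overlapping or duplicated subnets can be picked out mechanically. The sketch below is a hedged illustration, not a MAAS tool: the function name overlapping_subnets is hypothetical, and it assumes the CLI output is a JSON list of objects that each carry a "cidr" field and, optionally, an "id" field.

```python
import ipaddress
import itertools
import json


def overlapping_subnets(subnets_json):
    """Return pairs of (id, cidr) entries whose address ranges overlap.

    Assumes a JSON list of objects with a "cidr" key; "id" is optional.
    """
    subnets = json.loads(subnets_json)
    networks = [(s.get("id"), ipaddress.ip_network(s["cidr"])) for s in subnets]
    overlaps = []
    for (id_a, net_a), (id_b, net_b) in itertools.combinations(networks, 2):
        # Only compare networks of the same IP version.
        if net_a.version == net_b.version and net_a.overlaps(net_b):
            overlaps.append(((id_a, str(net_a)), (id_b, str(net_b))))
    return overlaps


# Hypothetical sample using the subnets from the bug description.
sample = json.dumps([
    {"id": 1, "cidr": "10.232.36.0/24"},
    {"id": 2, "cidr": "10.232.32.0/21"},
    {"id": 3, "cidr": "10.232.48.0/21"},
])
for first, second in overlapping_subnets(sample):
    print(f"{first} overlaps {second}")
```

Two entries with an identical CIDR are also reported, since a network always overlaps itself, which helps when the cause is a subnet that was accidentally created twice.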

Changed in maas:
status: New → Triaged
importance: Wishlist → Undecided
milestone: next → none
importance: Undecided → Medium
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired
Changed in maas:
status: Expired → New
Changed in maas:
status: New → Triaged
importance: Medium → High
Revision history for this message
Pedro Vieira (pedrivo) wrote :

Hello,

We are also facing the exact same issue. Are there any updates on this?

Thanks

Revision history for this message
Pedro Vieira (pedrivo) wrote :

Hello,

We are still facing this issue. Can someone please review?
Thanks

Alberto Donato (ack)
Changed in maas:
assignee: nobody → Christian Grabowski (cgrabowski)
milestone: none → 3.0.0
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 3.0.0 → 3.0.0-beta4
Changed in maas:
status: Fix Committed → Fix Released
Revision history for this message
Boris Lukashev (rageltman) wrote :

This appears to still be happening on 3.3:
```

==> /var/snap/maas/common/log/maas.log <==
2023-04-30T22:46:36.927886+00:00 maashostname maas.dns: [error] Reloading BIND failed (is it running?): Command `rndc -c /var/snap/maas/27109/bind/rndc.conf.maas reload` returned non-zero exit status 1:#012rndc: connect failed: 127.0.0.1#954: connection refused
2023-04-30T22:46:41.605702+00:00 maashostname maas.rpc.rackcontrollers: [info] Existing rack controller 'maashostname' running version 3.3.2-13177-g.a73a6e2bd has connected to region 'maashostname'.

```

Somehow MAAS detected a 0.0/20 and a 0.0/22 as overlapping and assigned three nodes to the /20, despite them being in the 0.x subnet at the base of both of those CIDRs.
