[3.1] Adding overlapping subnets in fabric breaks deployments

Bug #1964644 reported by Alan Baghumian
This bug affects 1 person
Affects: Offline/RAD documentation
Status: Fix Released
Assigned to: Bill Wear

Bug Description

Here is how to reproduce the issue:

1) MAAS 3.1 Snap with an existing subnet configured + DHCP Provided by MAAS. Currently installed version on all region and rack controllers is 3.1.0-10901-g.f1f8f1505 (channel latest/stable Jan 19, 2022).

2) Performed a test commissioning and deployment prior to the experiment. Everything worked.

3) Added a new subnet via WebUI to fabric-0, which already includes an existing overlapping subnet. MAAS did not stop me from adding the overlapping network.

4) Tested deploying an already commissioned machine:

     - Edited the network interface and put it under the as well as subnets.
     - Tried DHCP and static IP addresses.
     - Tried Focal (20.04) and Groovy (20.10)
     - In all scenarios the machine performed PXE boot then went into a boot loop causing the deployments to fail.

5) Removed the overlapping subnet and re-tested deployments; they still failed.

6) Rebooted all region (2) and rack (2) controllers.

7) Tested deployments again and they started working again.

Suggested Solution: Do not allow users to add overlapping subnets. This should be possible by validating each new subnet against the existing ones when it is created.
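
For illustration, here is a minimal sketch of that kind of check, using only Python's standard ipaddress module. This is not MAAS code: the function names, the ValueError, and the example CIDRs are assumptions made up for this sketch, and it says nothing about how MAAS actually stores or validates subnets.

```python
import ipaddress


def find_overlaps(new_cidr, existing_cidrs):
    """Return the entries of existing_cidrs that overlap new_cidr.

    Hypothetical helper for illustration; not part of MAAS.
    """
    new_net = ipaddress.ip_network(new_cidr, strict=True)
    overlapping = []
    for cidr in existing_cidrs:
        net = ipaddress.ip_network(cidr, strict=True)
        # Only compare networks of the same IP version (IPv4 vs IPv6).
        if net.version == new_net.version and net.overlaps(new_net):
            overlapping.append(cidr)
    return overlapping


def validate_new_subnet(new_cidr, existing_cidrs):
    """Raise ValueError if new_cidr overlaps any existing subnet."""
    overlapping = find_overlaps(new_cidr, existing_cidrs)
    if overlapping:
        raise ValueError(
            f"{new_cidr} overlaps existing subnet(s): {', '.join(overlapping)}"
        )


# Example with made-up addresses: a /24 that sits inside an existing /22.
existing = ["10.0.0.0/22", "192.168.1.0/24"]
try:
    validate_new_subnet("10.0.0.0/24", existing)
except ValueError as err:
    print(f"rejected: {err}")
```

The same check could back a warning rather than a hard rejection, if blocking overlapping subnets outright turns out to be too restrictive.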

Revision history for this message
Bill Wear (billwear) wrote :

Triaging because I have already seen this weirdness. Subnets aren't intended to overlap. The IP range of one subnet should be unique compared to every other subnet on the same segment. This is mainly because routers can't reliably determine which subnet should get a packet destined for one of the overlapping addresses. That might be what's gumming up the rack controller in this instance, dunno.

That said, I'm not sure if MAAS should prevent you from doing it, that is, I'm not sure if it's a doc/troubleshooting bug or a code bug. Either way, we should talk about it.

Changed in maas:
status: New → Triaged
Revision history for this message
Bill Wear (billwear) wrote :

After discussion with field, this bug needs clarification as to how the /22 and /24 subnets were overlapped. Need more info to understand how to classify this problem.
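
As background for that question: two CIDR blocks are either disjoint, identical, or nested, so a /24 can only overlap a /22 by sitting entirely inside it. A small sketch with Python's standard ipaddress module shows the cases. The addresses are made up for illustration and are not the subnets from this report; the function name is likewise hypothetical.

```python
import ipaddress


def describe_relationship(cidr_a, cidr_b):
    """Classify how two CIDR blocks relate (hypothetical helper)."""
    net_a = ipaddress.ip_network(cidr_a)
    net_b = ipaddress.ip_network(cidr_b)
    if net_a.version != net_b.version:
        return "disjoint"
    if net_a == net_b:
        return "identical"
    # subnet_of() is available from Python 3.7 onward.
    if net_a.subnet_of(net_b):
        return "first inside second"
    if net_b.subnet_of(net_a):
        return "second inside first"
    return "disjoint"


# 10.0.0.0/22 covers 10.0.0.0 through 10.0.3.255.
print(describe_relationship("10.0.0.0/24", "10.0.0.0/22"))
print(describe_relationship("10.0.4.0/24", "10.0.0.0/22"))
```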

Changed in maas:
status: Triaged → Incomplete
Revision history for this message
Alan Baghumian (alanbach) wrote :

As discussed on the channel today, I deployed a brand new machine with MAAS 3.1 Snap (3.1.0-10901-g.f1f8f1505), PostgreSQL 12 outside of snap and initialized it as region+rack controller. The configured subnet was (IP:

The test involved two scenarios:

Scenario 1)

- Used a blank installed 20.04 LTS + PostgreSQL 12 + MAAS 3.1 Snap (3.1.0-10901-g.f1f8f1505)
- Went through MAAS initial setup screen.
- Changed network discovery interval to 10 minutes.
- Edited the machine's netplan configuration, switching the subnet from /24 to /22
- Rebooted the machine, a new subnet was added under subnets/fabric-0 next to the existing (See logs package for screenshots)

Scenario 2)

- Used a blank installed 20.04 LTS + PostgreSQL 12 + MAAS 3.1 Snap (3.1.0-10901-g.f1f8f1505)
- Went through MAAS initial setup screen.
- Changed network discovery interval to 10 minutes.
- From MAAS WebUI, Subnets tab, Changed subnet to
- No new subnets were added besides

The logs package includes:

- Logs for scenario 1 and 2 captured from /var/snap/maas/common/log/
- Screenshots from scenario 1

This process was repeated twice with the exact same results.

Hope this helps to shed a bit of light on the issue.

Revision history for this message
Bill Wear (billwear) wrote :

Triaging this now. Not clear to me, personally, whether MAAS should do extensive sanity checks on network inputs, as this might restrict user freedom to handle edge cases, but (1) clearly there is a path here to introduce a non-working configuration, whether that's by design or by mistake, (2) the MAAS PM Anton Smith has stated a (weight-bearing) preference that the MAAS networking model not tolerate cross-wiring like this without at least a warning, and (3) MAAS developer Alexsander Sousa has indicated, wisely, that MAAS should recover from these kinds of connection errors without resorting to controller restarts.

Setting importance to Medium, and adding a doc track at importance level High, so this can be added to troubleshooting information, at least.

Changed in maas:
status: Incomplete → Triaged
importance: Undecided → Medium
Changed in maas-offline-docs:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Bill Wear (billwear)
Bill Wear (billwear)
Changed in maas-offline-docs:
status: Triaged → Fix Committed
status: Fix Committed → Fix Released