[2.0] MaaS 2.0 BMC information not removed when nodes are removed
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | Critical | Jeffrey C Jones |
Bug Description
We ran into a couple of different failures that led to some interesting bugs:
1. We had provisioned the original PXE boot network as a /26 (10.189.69.0/26); with Juju and LXC containers for OpenStack, we needed to grow this network to a /25
2. Our BMC network was in the second /26 (10.189.69.64/26), adjacent to the first one that we wanted to grow
3. We changed the DHCP configuration for the second /26 and moved it up one /26 (to 10.189.69.128/26); this DHCP is managed outside of MaaS
4. We removed all machines from MaaS by deleting them, and updated the subnet in MaaS to 10.189.69.0/25
5. One by one, we powered each machine up and commissioned it in MaaS
Two weeks later:
6. We marked a machine that wouldn't deploy as broken; it had IP 10.189.69.58
7. We attempted to re-deploy all 18 machines and noticed that once 9 machines had been deployed, the 10th would not deploy because of a failure to allocate an IP address. This was mistakenly attributed to assigning multiple auto assigned networks in https:/
8. We assumed it had to do with trying to set more subnets on the various bond interfaces, but further inquiry found:
In the database in table maasserver_
When I attempted to delete it, a foreign key constraint on maasserver_bmc blocked the deletion. It was at this point that I noticed that maasserver_bmc had a whole range of entries that were not referenced at all in maasserver_node.
After running the following:
DELETE FROM maasserver_
DELETE FROM maasserver_bmc WHERE id NOT IN (SELECT bmc_id FROM maasserver_node WHERE bmc_id IS NOT NULL);
DELETE FROM maasserver_
(Yes, sub-queries, I'm sure there is a better way to do it... :-P)
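For anyone wanting to check for the same condition, the orphaned-BMC lookup can also be written with NOT EXISTS, which does not need the `bmc_id IS NOT NULL` guard that the NOT IN form depends on. The following is only a sketch against an in-memory SQLite database with a heavily simplified, hypothetical schema: only the table names and the `bmc_id` column come from the queries above, the `address` and `hostname` columns and all sample rows are made up, and a real MaaS database is PostgreSQL with many more columns and constraints.

```python
import sqlite3


def build_demo_db():
    """Create a tiny stand-in for the two tables (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE maasserver_bmc (
            id INTEGER PRIMARY KEY,
            address TEXT              -- hypothetical column, for illustration only
        );
        CREATE TABLE maasserver_node (
            id INTEGER PRIMARY KEY,
            hostname TEXT,            -- hypothetical column, for illustration only
            bmc_id INTEGER REFERENCES maasserver_bmc(id)
        );
        -- BMC 1 is still referenced by a node; BMCs 2 and 3 are left over.
        INSERT INTO maasserver_bmc VALUES
            (1, '10.189.69.130'), (2, '10.189.69.70'), (3, '10.189.69.71');
        -- node-b has no BMC at all (NULL), which is what trips up a bare NOT IN.
        INSERT INTO maasserver_node VALUES (1, 'node-a', 1), (2, 'node-b', NULL);
        """
    )
    return conn


def orphaned_bmc_ids(conn):
    """Return ids of BMC rows that no node row references."""
    rows = conn.execute(
        """
        SELECT b.id
        FROM maasserver_bmc AS b
        WHERE NOT EXISTS (
            SELECT 1 FROM maasserver_node AS n WHERE n.bmc_id = b.id
        )
        ORDER BY b.id
        """
    )
    return [row[0] for row in rows]


def delete_orphaned_bmcs(conn):
    """Delete unreferenced BMC rows and return how many were removed."""
    cur = conn.execute(
        """
        DELETE FROM maasserver_bmc
        WHERE NOT EXISTS (
            SELECT 1 FROM maasserver_node AS n
            WHERE n.bmc_id = maasserver_bmc.id
        )
        """
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = build_demo_db()
    print("orphaned BMC ids:", orphaned_bmc_ids(conn))
    print("rows deleted:", delete_orphaned_bmcs(conn))
```

Running the SELECT first and reviewing the ids before issuing any DELETE is the safer order on a live database.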
We were then able to deploy the rest of the nodes. We released all nodes not marked broken and noticed that a single IP address (10.189.69.58) was still in use. Once we marked the node as fixed, deployed it, and then immediately released it, that IP was released. MaaS then started numbering machines from the beginning of the range (10.189.69.7 was the next available) rather than at 10.189.69.59 (the next available after the broken one).
It seems that BMC information is not properly removed when a node is deleted from MaaS. This becomes an issue when you change network ranges and the old and new ranges can overlap.
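To make the overlap concrete, here is a small sketch using Python's standard `ipaddress` module with the subnets from the steps above. The helper name `stale_addresses_in` and the sample BMC addresses are hypothetical (the report does not list the actual leftover BMC IPs); the point is only that any address recorded from the old BMC /26 falls inside the grown /25.

```python
import ipaddress

# Subnets taken from the steps in this report.
old_pxe = ipaddress.ip_network("10.189.69.0/26")
new_pxe = ipaddress.ip_network("10.189.69.0/25")
old_bmc = ipaddress.ip_network("10.189.69.64/26")
new_bmc = ipaddress.ip_network("10.189.69.128/26")


def stale_addresses_in(subnet, addresses):
    """Return the addresses (as strings) that fall inside the given subnet."""
    return [a for a in addresses if ipaddress.ip_address(a) in subnet]


if __name__ == "__main__":
    # The grown PXE network swallows the old BMC range but not the new one.
    print(new_pxe.overlaps(old_bmc))
    print(new_pxe.overlaps(new_bmc))

    # Hypothetical leftover BMC addresses, for illustration only.
    leftover = ["10.189.69.70", "10.189.69.71", "10.189.69.130"]
    print(stale_addresses_in(new_pxe, leftover))
```

Any leftover BMC record whose address lands in the first list would still count as in use inside the enlarged subnet, which matches the allocation failure described in step 7.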
Related branches
- Blake Rouse (community): Approve
- Jeffrey C Jones (community): Approve
- Mike Pontillo (community): Approve
- Diff: 118 lines (+72/-6), 4 files modified
  src/maasserver/migrations/builtin/maasserver/0063_remove_orphaned_bmcs_and_ips.py (+21/-0)
  src/maasserver/models/bmc.py (+8/-5)
  src/maasserver/models/node.py (+9/-1)
  src/maasserver/models/tests/test_node.py (+34/-0)
tags: | added: sts |
Changed in maas: | |
importance: | Undecided → Critical |
milestone: | none → 2.0.0 |
summary: |
- MaaS 2.0 BMC information not removed when nodes are removed
+ [2.0b5] MaaS 2.0 BMC information not removed when nodes are removed |
summary: |
- [2.0b5] MaaS 2.0 BMC information not removed when nodes are removed
+ [2.0] MaaS 2.0 BMC information not removed when nodes are removed |
Changed in maas: | |
assignee: | nobody → Jeffrey C Jones (trapnine) |
status: | New → In Progress |
Changed in maas: | |
status: | In Progress → Fix Committed |
Changed in maas: | |
status: | Fix Committed → Fix Released |