Cannot re-use static IP address

Bug #1630034 reported by Chris Martin
This bug affects 7 people
Affects: MAAS
Status: Fix Released
Importance: Critical
Assigned to: Mike Pontillo
Milestone: (none)

Bug Description

I have an IP address, 10.192.154.118, which I need to assign to a node. I think (not 100% sure) that it was previously manually assigned to a node which was deleted earlier today. I know it is part of a reserved range, i.e. MAAS would not DHCP-assign or statically auto-assign a conflicting address.

It appears that MAAS did not forget/delete the earlier assignment. When I try to statically assign it to a node's interface, I see "IP address is already in use", and the form field reverts to the default static auto-assigned address.

When I browse to http://10.192.154.19/MAAS/#/subnet/1, I see the address listed in the "Used" section: "Static" Type, Node column is blank, "Unknown" interface, "Unknown" usage, "MAAS" owner, last seen at 11:30:36 today server time.

How do I get MAAS to release/forget about this IP address so I can re-use it? I do not see any option in the web UI or the CLI.

Thank you,

Chris

(Attached: contents of /var/log/maas/* as maas-logs.tgz)

root@nika:/var/log/maas# dpkg -l '*maas*' | cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-==============================-============-=================================================
ii maas 2.0.0+bzr5189-0ubuntu1~16.04.1 all "Metal as a Service" is a physical cloud and IPAM
ii maas-cli 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS client and command-line interface
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS server common files
ii maas-dhcp 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS DHCP server
ii maas-dns 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS DNS server
ii maas-proxy 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.0.0+bzr5189-0ubuntu1~16.04.1 all Rack Controller for MAAS
ii maas-region-api 2.0.0+bzr5189-0ubuntu1~16.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.0.0+bzr5189-0ubuntu1~16.04.1 all Region Controller for MAAS
un maas-region-controller-min <none> <none> (no description available)
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-maas-provisioningserver <none> <none> (no description available)
ii python3-django-maas 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS server provisioning libraries (Python 3)

Tags: ui maas-cli
Chris Martin (6-chris-z) wrote :

Did some digging in the database.

maasdb=# select * from maasserver_staticipaddress where ip = '10.192.154.118';
   id | created | updated | ip | alloc_type | subnet_id | user_id | lease_time
--------+-------------------------------+-------------------------------+----------------+------------+-----------+---------+------------
 346035 | 2016-10-03 11:30:36.706452-07 | 2016-10-03 11:30:36.706452-07 | 10.192.154.118 | 1 | 1 | | 0

So, it's not associated with any user and its lease time is 0 (forever?).

maasdb=# select * from maasserver_interface_ip_addresses where staticipaddress_id = 346035;
 id | interface_id | staticipaddress_id
----+--------------+--------------------
(0 rows)

No interfaces are associated with the IP address, at least in this table.

I'm inclined to just delete that row in maasserver_staticipaddress and try again. Is this likely to cause side effects? I'm certain that no node or device is currently using that IP address.
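
The check above can be generalised to list every address that has no interface link. What follows is a minimal, hypothetical sketch in Python: it uses an in-memory SQLite database with cut-down stand-ins for the two tables (the real MAAS database is PostgreSQL and its tables have more columns), and the helper names find_orphans and demo are illustrative, not part of MAAS. The id 346035 and its address come from the query above; the other row is invented for contrast.

```python
import sqlite3

# Simplified stand-ins for the two MAAS tables queried above (assumption:
# only the columns needed for the join are modelled here).
SCHEMA = """
CREATE TABLE maasserver_staticipaddress (
    id INTEGER PRIMARY KEY,
    ip TEXT,
    alloc_type INTEGER
);
CREATE TABLE maasserver_interface_ip_addresses (
    id INTEGER PRIMARY KEY,
    interface_id INTEGER,
    staticipaddress_id INTEGER
);
"""

# An address row with no matching link row has no interface attached.
ORPHAN_QUERY = """
SELECT s.id, s.ip, s.alloc_type
  FROM maasserver_staticipaddress s
  LEFT JOIN maasserver_interface_ip_addresses l
    ON l.staticipaddress_id = s.id
 WHERE l.id IS NULL
 ORDER BY s.id;
"""


def find_orphans(conn):
    """Return (id, ip, alloc_type) for addresses without an interface link."""
    return conn.execute(ORPHAN_QUERY).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Row 346035 mirrors the thread; row 1 is an invented, linked address.
    conn.executemany(
        "INSERT INTO maasserver_staticipaddress VALUES (?, ?, ?)",
        [(1, "10.192.154.10", 1), (346035, "10.192.154.118", 1)],
    )
    conn.execute(
        "INSERT INTO maasserver_interface_ip_addresses VALUES (1, 100, 1)"
    )
    return find_orphans(conn)


if __name__ == "__main__":
    print(demo())
```

An address without an interface link may still be referenced from other tables, so a hit from a query like this is a candidate to inspect, not a row that is known to be safe to delete.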

Chris Martin (6-chris-z) wrote :

I backed up the database and deleted the row in the comment above. This appears to have solved the problem. I was able to re-use the IP address and deploy the node. It's up and I can SSH to it. Everything in the web UI looks as you'd expect.

So, you can consider this bug "worked around" (resolved) for me. If it comes up again, I suggest exploring ways that a static IP address can end up in this state (e.g. assigning a static IP address to a node, then deleting the node before deploying it).

Thanks anyhow!

Mike Pontillo (mpontillo) wrote :

Chris, thanks for the bug report. Had I seen this bug earlier I would have probably given you the go-ahead to delete that row. (since you seem to know what you're doing!) ;-)

What you have there is an IP address with an alloc_type of 1, which indicates to MAAS that this is a STICKY IP address. In the documentation (at least, the documentation inline in the code) we state that such an address is a "User-specified static IP address for a node or device. Permanent until removed by the user, or the node or device is deleted."

I'm not certain that comment is 100% correct, as it might have been expected that we also release that IP address when the node is released from a deployment. (I'd have to double check.)

There is a supported way to create such an IP address (_almost_; the address has slightly different properties), and that is to use the "ipaddresses reserve" API (accessible with the 'maas' CLI command). There is a corresponding "ipaddresses release" API, which is intended to remove such an address. However, in your case, this would not have worked, since we only allow addresses of type USER_RESERVED -- type 4 (not STICKY -- type 1, as you saw) to be deleted using this API.
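
The rule described above can be sketched as a small Python check. This is an illustration only: the two numeric values are the ones stated in this comment, while the names AllocType and releasable_via_api are hypothetical and do not come from the MAAS code base.

```python
from enum import IntEnum


class AllocType(IntEnum):
    """The two alloc_type values named in this thread (others are omitted)."""

    STICKY = 1
    USER_RESERVED = 4


def releasable_via_api(alloc_type: int) -> bool:
    """Mirror the described behaviour of "ipaddresses release" in MAAS 2.0:
    only USER_RESERVED addresses are accepted for release."""
    return alloc_type == AllocType.USER_RESERVED


if __name__ == "__main__":
    # The orphaned row in this bug has alloc_type 1 (STICKY).
    print(releasable_via_api(1))
    print(releasable_via_api(4))
```

Under this rule, the orphaned row (alloc_type 1) is rejected, which matches the statement that the release API would not have helped in this case.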

The bottom line is: yes, your theory is correct: somehow that IP address was most likely not removed when it was supposed to be.

I'll take a look to see if I can figure out how that might have happened, and/or see if we can improve the user experience here. Thanks for your feedback!

Mike Pontillo (mpontillo) wrote :

I tried a number of paths (like deleting the interface and node related to a statically assigned IP address) and have not been able to reproduce the conditions that caused this, unfortunately.

I'll dig a little more before I give up, but I'm going to mark this Incomplete for now since we don't know how the address was "orphaned".

I think we could also consider adding a way to force removal of IP addresses in cases like this. I'm not positive what that would look like yet.

Mike Pontillo (mpontillo) wrote :

I forgot to ask, Chris: can you remember any more details about how the deletion of this machine happened? For example, was it deleted using the Web UI or the API? Was it ever re-added to MAAS after deletion? Were any of the machine's components (especially the NIC) re-used on another MAAS managed machine?

Mike Pontillo (mpontillo) wrote :

Another question. Looking at your logs, I noticed you have repeated tracebacks during our lease notification/update process; it seems (every time?) when we see a new lease, we attempt to insert it into the database, but it is rejected as a duplicate.

Can you tell me about, for example, 10.192.155.145? Is that assigned to something in MAAS? From what I can tell, it should be in a dynamic range, so MAAS should be free to assign it to whatever it hears back from the DHCP server that it got assigned to.

Are you running HA rack controllers? (maybe the lease is coming in from more than one rack?)

Mike Pontillo (mpontillo) wrote :

One last question: is it possible that the IP address used to be assigned to a BMC (that is, used as the IP address of a power controller, such as an IPMI address)?

Chris Martin (6-chris-z) wrote :

Mike, thanks for your detailed responses!

Yes, unfortunately I'm not really sure how we got in the state with the stuck sticky IP address. I was troubleshooting something else when it happened. My best memory is that this node was deleted from MAAS in the web UI and then re-added, probably re-commissioned a few times, etc. I do know that all actions leading up to the broken state were performed 100% in the web UI. I only started poking at the API after I was unable to assign the IP via the UI.

No NICs or other hardware were ever shared between machines.

10.192.155.145 is a Dell iDRAC device (BMC). All of the nodes' BMCs are DHCP-assigned by MAAS.

Our 10.192.154.0/23 subnet is currently broken up like this:
10.192.154.1 to 10.192.154.127: reserved for statically assigned devices (e.g. the nodes that we're provisioning)
10.192.154.128 to 10.192.155.126: not used for anything(?)
10.192.155.127 to 10.192.155.254: MAAS "Dynamic range", IP addresses assigned by MAAS DHCP
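
To make the layout above easy to check against a given address, here is a minimal sketch using Python's standard ipaddress module. The range boundaries are copied from the list above; the labels and the classify function are hypothetical names chosen for illustration.

```python
from ipaddress import ip_address, ip_network

SUBNET = ip_network("10.192.154.0/23")

# (label, first address, last address), as listed in the comment above.
RANGES = [
    ("reserved-static", ip_address("10.192.154.1"), ip_address("10.192.154.127")),
    ("unused", ip_address("10.192.154.128"), ip_address("10.192.155.126")),
    ("dynamic", ip_address("10.192.155.127"), ip_address("10.192.155.254")),
]


def classify(ip: str) -> str:
    """Return the label of the range containing ip, or a fallback label."""
    addr = ip_address(ip)
    if addr not in SUBNET:
        return "outside-subnet"
    for label, first, last in RANGES:
        if first <= addr <= last:
            return label
    # Addresses in the subnet that none of the listed ranges cover.
    return "unlisted"


if __name__ == "__main__":
    for ip in ("10.192.154.118", "10.192.155.145"):
        print(ip, classify(ip))
```

With these ranges, the stuck address 10.192.154.118 falls in the reserved static range and the iDRAC at 10.192.155.145 falls in the dynamic range, consistent with the descriptions in this thread.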

We haven't observed any other DHCP-related issues lately. Maybe you're seeing duplicate errors for lease renewal because dhcpd tends to re-assign the same IP address to the same device, and there's already a database entry for the same lease, or something.

We are not running HA rack controllers. It's unlikely (though I suppose possible) that 10.192.154.118 was previously assigned to a BMC at some point in the past.

No worries if you don't make more progress on this, as I can't tell you how to duplicate it! If it comes up again I'll be sure to keep better notes.

Chris

Mike Pontillo (mpontillo) wrote :

It's possible that an unlikely race condition triggered the "orphaned" IP address, but it's difficult to say. I think the true fix involves taking a close look at what happens when IP addresses that may have once been assigned to a BMC are re-used as machine IP addresses.

Meanwhile, I'm taking the opportunity to add an "official" workaround to this problem as part of a fix for a related issue, bug #1629061. That is, if this happens again, in MAAS 2.1 you should be able to release the IP address using the command-line or API, without modifying the database directly.

Please leave another comment on this bug (or open a new one) if you determine a way to reproduce the issue. Thanks again!

Changed in maas:
milestone: 2.1.0 → 2.1.1
Changed in maas:
milestone: 2.1.1 → 2.1.2
Mike Pontillo (mpontillo) wrote :

FYI, in case anyone runs into this in the future and wants to check if the IP address belongs to a BMC, you can run the following query (after running `sudo maas-region dbshell`):

SELECT s.ip, b.power_type, n.hostname
    FROM maasserver_staticipaddress s
    JOIN maasserver_bmc b ON s.id = b.ip_address_id
    LEFT OUTER JOIN maasserver_node n on n.bmc_id = b.id
    ORDER BY s.ip, n.hostname;
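
To show what that query returns when a BMC row still holds an address but no node points at it, here is a hypothetical sketch that runs the same SELECT against an in-memory SQLite database from Python. The three tables are heavily simplified stand-ins for the real PostgreSQL schema, and every row is invented for illustration (the hostname node-a in particular is made up).

```python
import sqlite3

# Simplified stand-ins for the MAAS tables used by the query (assumption:
# only the columns the query touches are modelled).
SCHEMA = """
CREATE TABLE maasserver_staticipaddress (id INTEGER PRIMARY KEY, ip TEXT);
CREATE TABLE maasserver_bmc (
    id INTEGER PRIMARY KEY, power_type TEXT, ip_address_id INTEGER
);
CREATE TABLE maasserver_node (
    id INTEGER PRIMARY KEY, hostname TEXT, bmc_id INTEGER
);
"""

# Same query as in the comment above.
BMC_QUERY = """
SELECT s.ip, b.power_type, n.hostname
    FROM maasserver_staticipaddress s
    JOIN maasserver_bmc b ON s.id = b.ip_address_id
    LEFT OUTER JOIN maasserver_node n ON n.bmc_id = b.id
    ORDER BY s.ip, n.hostname;
"""


def bmc_addresses(conn):
    """Return (ip, power_type, hostname) for every address held by a BMC."""
    return conn.execute(BMC_QUERY).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO maasserver_staticipaddress VALUES (?, ?)",
        [(1, "10.0.0.5"), (2, "10.0.0.9")],
    )
    conn.executemany(
        "INSERT INTO maasserver_bmc VALUES (?, ?, ?)",
        [(10, "ipmi", 1), (11, "virsh", 2)],
    )
    # Only the ipmi BMC has a node; the virsh BMC is left without one.
    conn.execute("INSERT INTO maasserver_node VALUES (100, 'node-a', 10)")
    return bmc_addresses(conn)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

A row whose hostname comes back empty (NULL) is an address held by a BMC that no node references, which is one way an address could appear in use while showing no node in the UI.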

Changed in maas:
milestone: 2.1.2 → 2.1.3
Jason Saslow (jsaslow) wrote :

I ran into an issue with maas 2.1.3 and found this bug report which appears to be related to (if not exactly) my issue. In my case, I am building using the following procedure:
1. PXE boot, commission and deploy two physical servers in maas
2. Configure one physical server as a rack controller attached to a specific vlan with an assigned dhcp subnet
3. Configure the other physical server as a kvm host with a bridged interface
4. Define kvm guests
5. Setup virsh power parameters for kvm guests (via add-chassis on the kvm host)
6. Commission and deploy the kvm guests in maas exactly as I did the two physical servers mentioned in step #1

For the build on which I am working, I run through this process and then reset everything for a new run (after additional feature development). My reset process is as follows:
1. Disable dhcp on the subnet assigned to the rack controller
2. Delete the rack controller as a rack controller
3. Go to nodes and select all systems in the test (rack controller, kvm host and all kvm guests) and delete them all at the same time
4. Bob’s your uncle

This will cause the IP address used by the kvm host (which also was the IP of the bridged interface for the kvm guests) to become a STICKY IP in maas. It is unlikely that the rack controller portion of my build test is related, because the resulting sticky IP is not within the same subnet that is assigned to the controller; but I have not confirmed by explicitly testing without the rack controller.

As an aside, I attempted to use the workaround mentioned in bug #1629061, which is to 'release force=true'; while the dialogue indicates the command was successful, the condition is not reset/released. In the meantime, I figured out a slightly different temporary fix (a "gentler" direct table edit) than the one suggested by the OP, because I'm not a fan of deleting items from tables in relational databases when I don't know the system inside out (or the ramifications of such actions):

sudo maas-region dbshell
select ip_address_id from maasserver_bmc where power_type = 'virsh' and power_parameters like '%192.168.1.67%';

ip_address_id
---------------
27185

update maasserver_bmc set power_type = '' where ip_address_id = 27185;
update maasserver_staticipaddress set alloc_type = 6 where ip = '192.168.1.67';
\q

It is unclear if the OP was dealing with virsh systems as well, so this workaround may not work for everyone who has this issue, but it should for those who have a sticky IP resulting from deleted virsh powered guests.

I did a lot more testing and troubleshooting than what is represented here. So please don’t hesitate to contact me should you have any questions.

Alan McAlexander (five0va) wrote :

Running into this same bug - trying to delete via maasdb:

delete from maasserver_staticipaddress where ip = '10.62.16.131';
ERROR: update or delete on table "maasserver_staticipaddress" violates foreign key constraint "D366aaa1e050380f8f9eeec06a05e6e6" on table "maasserver_interface_ip_addresses"
DETAIL: Key (id)=(17843) is still referenced from table "maasserver_interface_ip_addresses".

I've deleted the Key from maasserver_interface_ip_addresses, but the error remains.

This is MaaS 2.2.0 rc3, but I was also having this issue in rc2. I had deleted the 10.62.16.131 server via the web ui when we were first standing up the environment, but I'm unable to add this static ip back. I'm also unable to remove the IP from the maas cli.

Blake Rouse (blake-rouse) wrote :

We believe this is fixed in 2.2.0rc4, which isn't released yet. We will get it out soon for you to test. For now, we are targeting the fix to 2.2.1 in case this remains an issue.

no longer affects: maas/2.0
no longer affects: maas/trunk
Changed in maas:
milestone: 2.2.0rc4 → 2.2.1
Данило Шеган (danilo) wrote :

Can you confirm if 2.2.0 fixes this issue for you?

Changed in maas:
status: Incomplete → Confirmed
status: Confirmed → Incomplete
Changed in maas:
milestone: 2.2.1 → 2.2.x
Chris Martin (6-chris-z) wrote :

I won't have the opportunity to test in the next few days, so I'll let others on the thread answer. Thanks, danilo, Blake, et al.!

Changed in maas:
status: Incomplete → Triaged
status: Triaged → Incomplete
Andres Rodriguez (andreserl) wrote :

We believe this issue has now been fixed in the latest MAAS releases (2.2+). If this is still an issue for you after upgrading to the latest release, please re-open the bug report.

Changed in maas:
status: Incomplete → Fix Released
milestone: 2.2.x → none
Thiago Martins (martinx) wrote :

This is not fixed!

I'm facing this problem with MaaS 2.5 stable.

:-/

Andreas Oberritter (mtdcr) wrote :

This still happens, for example, when editing network interfaces of a machine containing virsh-pods.

I had an IP address assigned to bond0 and wanted to put bond0 into a newly created bridge br0, at the same time moving the assigned address from bond0 to br0. It wasn't possible, because the same address was used by virsh.

Adham Sabry (atdhrhs) wrote :

I'm facing a similar issue.

Georgios Konitopoulos (georgios1982) wrote :

We are facing the same issue. We released an IP address from a host that is no longer in use and are trying to assign the same IP to another host, but an error prevents us from reusing it:

 IP address is already in use

This happens despite the fact that the previous host was released using the Release button in MAAS. The IP does not exist anywhere on the MAAS server, nor on the network.

David Andruczyk (dandruczyk) wrote :

I have hit this same issue on maas 2.9.2 from snap (current stable). I was able to get around it by manually deleting the DB entries as described in this ticket.

Ginder Innokentiy (iaginde1) wrote :

MaaS 3.1 snap
Add VMware VM -> Deploy -> Release -> Delete -> IP address cannot be reused
