No warning when the dynamic IP range is close to exhaustion

Bug #1393944 reported by Christian Reis
This bug affects 3 people
Affects  Status   Importance  Assigned to    Milestone
MAAS     Invalid  Wishlist    Mike Pontillo

Bug Description

This is not deja vu; you just looked at bug 1393936 first.

When a cluster's dynamic IP range is running out of addresses (say, 20% or 10% free), we should warn the user/admin that the cluster is about to run out.

This could be checked at lease parsing time, and a cluster-specific warning could be raised.
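A minimal sketch of such a check, assuming the lease parser can supply the range bounds and the list of currently leased addresses. The function names and the 20% default threshold are illustrative and are not MAAS APIs:

```python
import ipaddress


def free_fraction(range_start, range_end, leased):
    """Return the fraction of addresses in [range_start, range_end] not leased."""
    start = int(ipaddress.ip_address(range_start))
    end = int(ipaddress.ip_address(range_end))
    total = end - start + 1
    if total <= 0:
        raise ValueError("range_end must not precede range_start")
    # Count each leased address once, and only if it falls inside the range.
    used = {
        ip
        for ip in (int(ipaddress.ip_address(addr)) for addr in leased)
        if start <= ip <= end
    }
    return (total - len(used)) / total


def is_near_exhaustion(range_start, range_end, leased, threshold=0.2):
    """True when the free fraction is at or below the (assumed) threshold."""
    return free_fraction(range_start, range_end, leased) <= threshold
```

For a ten-address range with eight addresses leased, `free_fraction` returns 0.2, which would trip a 20% threshold.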

In terms of providing feedback:

  - A UI component warning might be advisable, since the failure mode is pretty bad (enlistment, commissioning, containers and VMs fail) and it is largely driven by external action
  - A high-level warning on the cluster listing page
  - A warning on the cluster interface page
  - We should find a way to log without spamming the logs every time the log parser runs
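One way to avoid log spam is to log only when a cluster's state changes (enters or leaves the low-address condition) rather than on every parser run. The sketch below is an assumption about how that could look; the class name, logger name, and threshold are hypothetical:

```python
import logging

logger = logging.getLogger("dhcp.exhaustion")  # hypothetical logger name


class ExhaustionNotifier:
    """Log only when a cluster crosses the threshold, not on every parse."""

    def __init__(self, threshold=0.2):
        self.threshold = threshold
        self._warned = set()  # clusters currently in the "low" state

    def update(self, cluster, free_fraction):
        """Record the latest reading; return True if a message was logged."""
        low = free_fraction <= self.threshold
        if low and cluster not in self._warned:
            self._warned.add(cluster)
            logger.warning(
                "Cluster %s: dynamic range has %.0f%% free",
                cluster, free_fraction * 100,
            )
            return True
        if not low and cluster in self._warned:
            self._warned.discard(cluster)
            logger.info("Cluster %s: dynamic range recovered", cluster)
            return True
        return False
```

Repeated readings below the threshold produce a single warning until the range recovers and then drops again.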

summary: - Warn when the dynamic IP range is close to exhaustion
+ No warning when the dynamic IP range is close to exhaustion
Changed in maas:
status: New → Triaged
importance: Undecided → High
milestone: none → next
John George (jog) wrote :

What is the plan and timeline to address this bug?

Some type of warning in the MAAS logs or UI would be nice, but the impact to users when the dynamic pool is exhausted is quite high. As mentioned in the description, containers fail. One example of this is deployment of OpenStack using LDS Autopilot. The user starts a cloud deployment from their browser and ends up with an Autopilot status that waits forever for LXC machines to start. If MAAS could raise a message in a way that Autopilot could detect an exhaustion issue, or if Autopilot could query MAAS up front for a dangerously low number of available IPs, then the user experience could be improved.

tags: added: cdo-qa-blocker
Andres Rodriguez (andreserl) wrote :

@John,

Notifications are part of MAAS 2.0. That said, MAAS 1.9 has no real control over the dynamic range, as it is always DHCP. In 2.0, MAAS cleans up dynamic leases every 7 minutes if no machine owns the lease anymore, and MAAS is notified that the lease is no longer in use.

If you are using Juju 2.0 and MAAS 2.0, it would be interesting to see why you are getting depletion of DHCP addresses. It may be that Juju is not registering these containers as devices and/or not requesting IP addresses from MAAS for each container.

Changed in maas:
milestone: next → 2.1.0
Andres Rodriguez (andreserl) wrote :

Err, MAAS notifications are part of the 16.10 cycle (MAAS 2.1).

Changed in maas:
importance: High → Wishlist
Dean Henrichsmeyer (dean) wrote :

Let's bump this up. This is important and Wishlist will fall off the radar.

Dean Henrichsmeyer (dean) wrote :

On second thought, this shouldn't really be a problem in 2.0 so Wishlist is fine.

Changed in maas:
milestone: 2.0.1 → 2.1.0
tags: added: notifications
Changed in maas:
milestone: 2.1.0 → 2.1.1
Changed in maas:
milestone: 2.1.1 → 2.1.2
Changed in maas:
milestone: 2.1.2 → 2.1.3
Changed in maas:
milestone: 2.1.3 → 2.2.0
tags: removed: cdo-qa-blocker
Changed in maas:
assignee: nobody → Mike Pontillo (mpontillo)
milestone: 2.2.0 → 2.2.0rc2
Mike Pontillo (mpontillo) wrote :

This is difficult because we don't always have accurate data. For example, during commissioning we take any observed leases (IPs of type DISCOVERED) and *delete* them from the database, replacing them with AUTO IPs for the same subnet.

So if you have a dynamic range with 10 IPs, and you commission 10 machines (with "power on and SSH enabled", for the sake of argument), you would have zero IPs free, but MAAS would see 0% usage.
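The undercount can be illustrated with a toy model. This is only a sketch of the behavior described above, not MAAS code: the table is a plain list, and the `commission` and `apparent_dynamic_usage` helpers and all addresses are made up for the example.

```python
DISCOVERED, AUTO = "DISCOVERED", "AUTO"


def commission(table, dynamic_ip, auto_ip):
    """Toy model of commissioning: drop the observed lease, add an AUTO IP."""
    table.remove((dynamic_ip, DISCOVERED))
    table.append((auto_ip, AUTO))


def apparent_dynamic_usage(table, dynamic_range):
    """Usage as seen from the table: DISCOVERED rows inside the dynamic range."""
    used = sum(
        1 for ip, kind in table if kind == DISCOVERED and ip in dynamic_range
    )
    return used / len(dynamic_range)


dynamic_range = [f"10.0.0.{n}" for n in range(100, 110)]  # 10 addresses
table = [(ip, DISCOVERED) for ip in dynamic_range]        # 10 machines hold leases
before = apparent_dynamic_usage(table, dynamic_range)

# Commission all 10 machines; the DHCP server still holds their leases,
# but the rows that recorded them are gone.
for n, ip in enumerate(dynamic_range):
    commission(table, ip, f"10.0.0.{n + 10}")
after = apparent_dynamic_usage(table, dynamic_range)
```

In this model `before` is 1.0 and `after` is 0.0, even though nothing was released on the DHCP server.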

The fix, I think, should be a design change: separate state from configuration. A DHCP lease is not a StaticIPAddress and doesn't belong in that table. Leases should go in their own table.

"But wait," you might say, "that's how it was in MAAS 1.8, but we got rid of that in MAAS 1.9!" [1]

Well, back then, the DHCP lease table was directly linked to a NodeGroup (aka maas-clusterd). That didn't work out for HA, so that had to be removed.

But the only improvement the new way brought was the consistent modeling of an IP address, such as being linked to a known subnet, etc. That way we can determine subnet/VLAN linkage and replace the DISCOVERED addresses with AUTO IPs for the same subnet. But there is no reason that has to be done within the confines of the StaticIPAddress model. And the new way also brought disadvantages: now, whenever anyone is working with a StaticIPAddress, they most often need to remember to filter out DISCOVERED addresses, which can be a source of bugs.

Conclusion: I think this should be fixed via a design change. Resurrect the maasserver_dhcplease table and populate it with whatever leases each cluster reports. Use that data in the same way we use DISCOVERED IP addresses today. Then we get the following benefits:

 * No mixture of state and configuration. (What we do today in the database is like placing your /var/lib/maas/dhcp.leases file in /etc/maas.)
 * More accurate lease data, since we don't have to delete it post-commissioning to correctly configure each node.
 * Less risk of unique key violations on the StaticIPAddress table (this has been a problem in the past).
 * The ability to correlate leased IPs with commissioned nodes *even after* commissioning completes.
   - This also means that we can include DHCP leases in the discovery dashboard, with better information, even in the case where DHCP relay is enabled or passive network discovery is disabled, and thus no ARPs are being observed.
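A rough sketch of the proposed separation, using an in-memory SQLite database so that it runs on its own. The table and column names here are simplified assumptions for illustration and do not reflect the actual MAAS schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Configuration: addresses MAAS has assigned.
    CREATE TABLE staticipaddress (ip TEXT PRIMARY KEY, alloc_type TEXT NOT NULL);
    -- State: leases as reported by each cluster.
    CREATE TABLE dhcplease (ip TEXT PRIMARY KEY, mac TEXT NOT NULL, cluster TEXT NOT NULL);
""")


def record_leases(conn, cluster, leases):
    """Replace the stored leases for one cluster with its latest report."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM dhcplease WHERE cluster = ?", (cluster,))
        conn.executemany(
            "INSERT INTO dhcplease (ip, mac, cluster) VALUES (?, ?, ?)",
            [(ip, mac, cluster) for ip, mac in leases],
        )


def assign_auto_ip(conn, ip):
    """Configuration change only; lease state is left untouched."""
    with conn:
        conn.execute(
            "INSERT INTO staticipaddress (ip, alloc_type) VALUES (?, 'AUTO')",
            (ip,),
        )


def lease_count(conn, cluster):
    """Number of leases currently recorded for a cluster."""
    row = conn.execute(
        "SELECT COUNT(*) FROM dhcplease WHERE cluster = ?", (cluster,)
    ).fetchone()
    return row[0]
```

Because assigning an AUTO IP never deletes from the lease table, the lease count for a cluster stays accurate after commissioning.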

[1]: https://github.com/maas/maas/blob/1.8/src/maasserver/models/dhcplease.py

Changed in maas:
milestone: 2.2.0rc2 → 2.3.0
Changed in maas:
milestone: 2.3.0 → 2.3.x
Adam Collard (adam-collard) wrote :

This bug has not seen any activity in the last 6 months, so it is being automatically closed.

If you are still experiencing this issue, please feel free to re-open.

MAAS Team

Changed in maas:
status: Triaged → Invalid