MAAS retains child devices' IP addresses when a parent node is released

Bug #1527068 reported by Curtis Hovey on 2015-12-17
Affects      Importance  Assigned to
MAAS         Critical    Mike Pontillo
  1.8        Critical    Unassigned
juju-core    High        Unassigned
  1.25       Critical    Unassigned

Bug Description

LXC container IPs are not released when Juju QA tests Juju 1.25.2 against MAAS 1.8.3, even when the --force option is passed to destroy-environment. This leads to IP exhaustion in the Juju-QA MAAS 1.8 test setup, and all subsequent test jobs fail because machines and LXC containers cannot be provisioned. The issue is not seen with MAAS 1.7 or MAAS 1.9, which are tested with the same scripts.

These symptoms started occurring around the time Juju 1.25.1 was changed to use MAAS devices for container IPs. The Juju-QA test environment was also updated from MAAS 1.8.2 to 1.8.3.

With the released 1.25.0 version of Juju, containers are allocated an IP from the dynamic range defined in the MAAS cluster config. This is true for both MAAS 1.8.2 and 1.8.3. The dynamic range IPs are listed in /var/lib/maas/dhcp/dhcpd.leases with a lease expiry time defined, similar to the following:

lease 10.0.20.17 {
  starts 3 2016/01/06 20:26:06;
  ends 4 2016/01/07 08:26:06;
  cltt 3 2016/01/06 20:26:06;
  binding state active;
  next binding state free;
  rewind binding state free;
  hardware ethernet 00:16:3e:c2:18:f5;
  client-hostname "juju-machine-2-lxc-1";
}

Development builds of Juju 1.25.2 tested against MAAS 1.8 are allocated an IP from the static range defined in the MAAS cluster config. Entries in /var/lib/maas/dhcp/dhcpd.leases for these IPs do not include a lease expiry time, similar to the following:

host 10.0.80.117 {
  dynamic;
  hardware ethernet 00:16:3e:3a:1e:e1;
  fixed-address 10.0.80.117;
}

During testing, an environment is deployed using a bundle that installs two services into LXC containers. The container IPs can be seen in the Juju status output. Once the environment is destroyed, the /var/lib/maas/dhcp/dhcpd.leases file on the MAAS server includes entries for the LXC container IPs that do not have a corresponding 'deleted' entry. Non-LXC IPs do have corresponding 'deleted' entries, similar to the following:

host 10.0.20.111 {
  dynamic;
  deleted;
}
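
The check described above can be automated. The following is a minimal Python sketch, not part of the Juju-QA scripts: it scans the text of a dhcpd.leases file for host blocks and reports the IPs whose last entry is not marked 'deleted;'. It assumes that a later entry for the same IP overrides an earlier one. The function name leaked_hosts and the sample data are illustrative only.

```python
import re

# Matches "host <ip> { ... }" blocks as they appear in dhcpd.leases.
HOST_BLOCK = re.compile(r"host\s+(\S+)\s*\{(.*?)\}", re.DOTALL)


def leaked_hosts(leases_text):
    """Return IPs whose most recent host entry is not marked 'deleted;'."""
    state = {}
    for ip, body in HOST_BLOCK.findall(leases_text):
        statements = [s.strip() for s in body.split(";")]
        # Assumption: a later entry for the same IP overrides an earlier one.
        state[ip] = "deleted" in statements
    return sorted(ip for ip, deleted in state.items() if not deleted)


# Illustrative sample data, not a real leases file.
sample = """
host 10.0.80.117 {
  dynamic;
  hardware ethernet 00:16:3e:3a:1e:e1;
  fixed-address 10.0.80.117;
}
host 10.0.20.111 {
  dynamic;
  hardware ethernet 52:54:00:ac:5c:ad;
  fixed-address 10.0.20.111;
}
host 10.0.20.111 {
  dynamic;
  deleted;
}
"""

print(leaked_hosts(sample))
```

Run against a copy of the leases file taken after destroy-environment, any IP this reports that belonged to a container in the Juju status output would be a candidate for a leaked address.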

Dimiter Naydenov (dimitern) wrote :

Some logs will be needed to see what's going on..

Curtis Hovey (sinzui) on 2015-12-18
Changed in juju-core:
status: Triaged → Incomplete
Curtis Hovey (sinzui) on 2015-12-18
Changed in juju-core:
milestone: 1.26-alpha3 → 2.0-alpha1
John George (jog) wrote :

I observe that as deployer calls Juju to add services, some of the machines that are requested of MAAS by Juju move into the 'allocated' state, rather than 'deploying' and later 'deployed' states. These machines never move out of 'allocated' and deployer eventually times out. The final status we get back from Juju has the error message 'Unable to allocate static IP due to address exhaustion.' Upon inspection of the /var/lib/maas/dhcp/dhcpd.leases file from the MAAS server, it does not appear that all IPs are actually used. The MAAS 1.8 Juju-QA test server has a static IP range configured from 10.0.80.102 to 10.0.80.250. One example of a failure, with logs, can be seen here:
http://reports.vapour.ws/releases/3464/job/maas-1_8-OS-deployer/attempt/718

Note, the active leases have been printed near the top of console log.
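
To put the exhaustion message in context, here is a small Python sketch that computes how many addresses the configured static range holds and how many remain free given a list of addresses in use. The range bounds come from the comment above. The in_use list is illustrative only, and the helper names are hypothetical.

```python
import ipaddress


def range_size(first, last):
    """Number of addresses in the inclusive range first..last."""
    return int(ipaddress.IPv4Address(last)) - int(ipaddress.IPv4Address(first)) + 1


def free_addresses(first, last, in_use):
    """Addresses in the inclusive range that do not appear in in_use."""
    lo, hi = int(ipaddress.IPv4Address(first)), int(ipaddress.IPv4Address(last))
    used = {int(ipaddress.IPv4Address(ip)) for ip in in_use}
    return [str(ipaddress.IPv4Address(n)) for n in range(lo, hi + 1) if n not in used]


STATIC_FIRST, STATIC_LAST = "10.0.80.102", "10.0.80.250"
in_use = ["10.0.80.117", "10.0.80.118"]  # illustrative only

print(range_size(STATIC_FIRST, STATIC_LAST))                   # 149
print(len(free_addresses(STATIC_FIRST, STATIC_LAST, in_use)))  # 147
```

With 149 addresses in the range, leaking a few container IPs per test run is enough to exhaust it after some tens of runs, even though dhcpd.leases alone may not show every address as used.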

Curtis Hovey (sinzui) on 2016-01-04
description: updated
Curtis Hovey (sinzui) on 2016-01-04
Changed in juju-core:
importance: Critical → High
John George (jog) wrote :

Deployer on MAAS 1.8 with build-revision 3430 (http://reports.vapour.ws/releases/3430)
Addresses allocated to real machines are released but those allocated to LXCs are not. This can be seen by comparing the juju status output with the dhcpd.leases. 10.0.80.117 and 10.0.80.118 were assigned to 2/lxc/0 and 2/lxc/1 but there is no host entry in dhcpd.leases with 'deleted;'.

John George (jog) wrote :

Deployer on MAAS 1.8 with released 1.25.0.
Addresses allocated to real machines are released but those allocated to LXCs are not.
    2/lxc/0 was assigned 10.0.80.99
    2/lxc/1 was assigned 10.0.80.46

The following diff shows that these lxc addresses were not released:

diff dhcpd.leases_stable_1.25.0_before_destroy dhcpd.leases_stable_1.25.0_after_destroy
1650a1651,1672
> host 10.0.80.124 {
> dynamic;
> hardware ethernet 52:54:00:ac:5c:ad;
> fixed-address 10.0.80.124;
> }
> host 10.0.80.201 {
> dynamic;
> hardware ethernet 52:54:00:4e:33:62;
> fixed-address 10.0.80.201;
> }
> host 10.0.80.106 {
> dynamic;
> deleted;
> }
> host 10.0.80.119 {
> dynamic;
> deleted;
> }
> host 10.0.80.104 {
> dynamic;
> deleted;
> }

Dimiter Naydenov (dimitern) wrote :

Juju does the same thing with MAAS 1.8 and 1.9 for containers - creates a device with a known (juju-generated) MAC address and claims a sticky IP for it, linking the device to the container's host node. Cleaning up the leases for the containers is then up to MAAS - when the device representing the container is removed, along with its parent.

So if that does happen on 1.9, but not on 1.8 then it's obviously a MAAS 1.8 issue, fixed in 1.9.

Cheryl Jennings (cherylj) wrote :

I spoke with mpontillo a bit today about this bug and confirmed that with MAAS 1.8, IPs for devices are not cleaned up when the device is removed.

However, this doesn't help us to release 1.25.2, as we cannot release something that will break people running MAAS 1.8. This problem was most likely introduced in Juju in the fix for bug #1483879 (included in 1.25.1). I was able to recreate the problem of leaked static IPs with 1.25.1. I'm not sure why IP address exhaustion only recently became an issue with QA.

We can't really consider this behavior as the same behavior that bug #1483879 was trying to fix, as the IP addresses left behind before the fix would eventually have their leases expire and would be reclaimed. This isn't the case now, as the static IPs will never expire and never be reclaimed.

Cheryl Jennings (cherylj) wrote :

The last section of this comment implies that this may only be an issue with MAAS 1.8.3:
https://bugs.launchpad.net/maas/+bug/1519527/comments/44

Which would explain why the juju QA team only recently started seeing this, as this started happening when they upgraded their MAAS from 1.8.2 to 1.8.3.

summary: - maas 1.8 static ips not released
+ maas 1.8 static ips not released upon device removal

Just to confirm, I (with Cheryl's confirmation) see the same issue on 1.8.2.

Mike Pontillo (mpontillo) wrote :

This issue was fixed (unintentionally) in MAAS 1.9, but I confirmed that it still exists on MAAS 1.8.3.

Changed in maas:
status: New → Triaged
summary: - maas 1.8 static ips not released upon device removal
+ When a parent node is released, MAAS retains its child devices' IP
+ addresses
summary: - When a parent node is released, MAAS retains its child devices' IP
- addresses
+ MAAS retains child devices' IP addresses when a parent node is released
Changed in maas:
importance: Undecided → Critical
milestone: none → 1.9.0
assignee: nobody → Mike Pontillo (mpontillo)
status: Triaged → Fix Released
no longer affects: maas/1.9
Mike Pontillo (mpontillo) wrote :

For MAAS 1.8 users, we have a workaround: delete any static IP addresses not associated with a MAC address. A script to do so is available here:

http://pastebin.ubuntu.com/14423324/

Similarly, for MAAS 1.9 users who have migrated from a MAAS 1.8 database containing leaked IP addresses, running "sudo maas-region-admin dbshell" and then executing the SQL found here[1] fixes it:

delete from maasserver_staticipaddress ip
    where ip.id in (
        select ip.id from maasserver_staticipaddress ip
            left outer join maasserver_interface_ip_addresses iip
                on ip.id = iip.staticipaddress_id
                where iip.id is null and ip.alloc_type=1
    );

[1]: https://bugs.launchpad.net/maas/+bug/1519527/comments/46
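
To illustrate which rows the SQL above selects, here is a Python sketch that runs the same left outer join as a SELECT against a toy in-memory SQLite database. It is not the MAAS schema: only the table and column names used in the query are taken from the SQL above, and the rows and the non-1 alloc_type value are made up for the example. Running the SELECT first on a real database would be one way to preview what the DELETE removes.

```python
import sqlite3

# Toy schema: only the columns the query touches. Not the real MAAS tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table maasserver_staticipaddress (
        id integer primary key, ip text, alloc_type integer);
    create table maasserver_interface_ip_addresses (
        id integer primary key, staticipaddress_id integer);
    insert into maasserver_staticipaddress values
        (1, '10.0.80.117', 1),   -- no interface link: leaked
        (2, '10.0.80.124', 1),   -- linked to an interface: still in use
        (3, '10.0.80.130', 4);   -- unlinked, but a different alloc_type: kept
    insert into maasserver_interface_ip_addresses values (1, 2);
""")

# Same join and filter as the DELETE's subquery, expressed as a SELECT.
orphans = conn.execute("""
    select ip.ip from maasserver_staticipaddress ip
        left outer join maasserver_interface_ip_addresses iip
            on ip.id = iip.staticipaddress_id
        where iip.id is null and ip.alloc_type = 1
""").fetchall()
print(orphans)
```

Only the first row is selected: it has alloc_type 1 and no matching row in the link table, which is the shape of a leaked address.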

John George (jog) on 2016-01-06
description: updated
John George (jog) wrote :

Evidence that Juju 1.25.2 uses dynamic IP allocation with MAAS 1.8.2.

John George (jog) on 2016-01-06
description: updated
John George (jog) wrote :

Please ignore comment #11. With both MAAS 1.8.2 and 1.8.3, Juju 1.25.2 gets IPs from the static range, which are never released.

description: updated
Mike Pontillo (mpontillo) wrote :

Just for the record, in MAAS 1.9 we have code in interface.py (the node network interface model object) which handles a pre-delete signal for network interfaces, and uses it to delete_related_ip_addresses(). There is no such code in MAAS 1.8 for MAC addresses, which is why this is a problem on MAAS 1.8 but not 1.9.
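
The pattern described above can be sketched in plain Python. This is not the MAAS code: the Signal class is a tiny stand-in for a framework pre-delete signal, and the Interface class, the allocated_ips mapping and the interface names are hypothetical. Only the handler name delete_related_ip_addresses is taken from the comment.

```python
class Signal:
    """Tiny stand-in for a framework's pre-delete signal."""

    def __init__(self):
        self._handlers = []

    def connect(self, handler):
        self._handlers.append(handler)

    def send(self, instance):
        for handler in self._handlers:
            handler(instance)


pre_delete = Signal()

# Hypothetical allocation table: IP address -> name of the owning interface.
allocated_ips = {"10.0.80.117": "eth0-lxc-0", "10.0.80.124": "eth0-node-2"}


class Interface:
    def __init__(self, name):
        self.name = name

    def delete(self):
        # Handlers run before the interface itself goes away.
        pre_delete.send(self)


def delete_related_ip_addresses(interface):
    """Release every address still attached to the interface being deleted."""
    owned = [ip for ip, owner in allocated_ips.items() if owner == interface.name]
    for ip in owned:
        del allocated_ips[ip]


pre_delete.connect(delete_related_ip_addresses)
Interface("eth0-lxc-0").delete()
print(allocated_ips)
```

Without the connect() call, deleting the interface would leave its address in allocated_ips, which is analogous to the MAAS 1.8 behaviour reported in this bug.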

Cheryl Jennings (cherylj) wrote :

Changed to invalid for juju-core master as this is caused by a MAAS bug that's exposed with the code added to fix bug #1483879.

For 1.25, we are backing out the changes for that bug, so using this bug to track the backing out: https://github.com/juju/juju/pull/4055

Changed in juju-core:
status: Incomplete → Invalid
Curtis Hovey (sinzui) on 2016-01-20
Changed in juju-core:
milestone: 2.0-alpha1 → none