If a Node is released while the DHCP server is down, its IP address remains reserved.

Bug #1484698 reported by Mike Pontillo
This bug affects 4 people
Affects   Status         Importance   Assigned to   Milestone
MAAS      Fix Released   High         Unassigned
1.9       Won't Fix      Undecided    Unassigned

Bug Description

Seen in regiond.log:

2015-08-13 09:40:15 [-] Unhandled error in Deferred:
2015-08-13 09:40:15 [-] Unhandled Error
    Traceback (most recent call last):
    Failure: provisioningserver.rpc.exceptions.CannotRemoveHostMap: DHCPv4 server is disabled.

This could happen if the DHCPv4 service is down, or otherwise cannot be reached (via omshell) to release the IP address.

In this case, the Node is successfully released, but its "AUTO-type" StaticIPAddress remains in the table.

This was seen in an LXC container whose host OS did not have an updated AppArmor profile (via the maas-dhcp package).

This defect could possibly be reproduced by doing a "chmod -x /usr/bin/omshell".

This was seen on 1.9, but we need to check if it affects 1.8 as well.
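
As a rough illustration of the symptom described above (a node that is already released while its AUTO address is still recorded), the following sketch flags such leftovers in a small in-memory data set. The record shapes, field names, and sample values are assumptions made for this example; they are not the MAAS models or API.

```python
from collections import namedtuple

# Hypothetical record shapes, for illustration only (not the MAAS models).
Node = namedtuple("Node", "hostname status")
StaticIP = namedtuple("StaticIP", "ip alloc_type hostname")


def find_stale_auto_addresses(nodes, addresses):
    """Return AUTO addresses still tied to nodes that are already READY."""
    ready = {node.hostname for node in nodes if node.status == "READY"}
    return [
        address for address in addresses
        if address.alloc_type == "AUTO" and address.hostname in ready
    ]


# Sample data (made up): one released node and one node still deployed.
nodes = [Node("meaty-ball", "READY"), Node("other-node", "DEPLOYED")]
addresses = [
    StaticIP("192.0.2.10", "AUTO", "meaty-ball"),    # left behind after release
    StaticIP("192.0.2.11", "AUTO", "other-node"),    # still legitimately in use
    StaticIP("192.0.2.12", "STICKY", "meaty-ball"),  # not AUTO, so ignored here
]
stale = find_stale_auto_addresses(nodes, addresses)
```

Here `stale` contains only the AUTO address of the released node, which is the state this bug leaves in the table.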

Changed in maas:
importance: High → Critical
Blake Rouse (blake-rouse) wrote :

Are we sure this is the case? If the DHCP server is down and it should be on, it will be turned on before omshell is used. What you're describing is just a misconfigured machine where the omshell tool has invalid permissions. Now, if the DHCP server is disabled and then the node is released, it will cause an issue. Is that what this bug is trying to describe?

Changed in maas:
assignee: nobody → Ricardo Bánffy (rbanffy)
Ricardo Bánffy (rbanffy) wrote :

Using the chmod hack, your traceback will look like this:

==> /var/log/maas/clusterd.log <==
2015-10-15 17:28:43-0400 [-] Unhandled failure dispatching AMP command. This is probably a bug. Please ensure that this error is handled within application code or declared in the signature of the RemoveHostMaps command. [maas-trusty-testbench:pid=639:cmd=RemoveHostMaps:ask=7]
 Traceback (most recent call last):
   File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
     self.__bootstrap_inner()
   File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
     self.run()
   File "/usr/lib/python2.7/threading.py", line 763, in run
     self.__target(*self.__args, **self.__kwargs)
 --- <exception caught here> ---
   File "/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 191, in _worker
     result = context.call(ctx, function, *args, **kwargs)
   File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
     return self.currentContext().callWithContext(ctx, func, *args, **kw)
   File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
     return func(*args,**kw)
   File "/usr/lib/python2.7/dist-packages/provisioningserver/utils/twisted.py", line 200, in wrapper
     return func(*args, **kwargs)
   File "/usr/lib/python2.7/dist-packages/provisioningserver/rpc/dhcp.py", line 224, in remove_host_maps
     omshell.remove(mac_address)
   File "/usr/lib/python2.7/dist-packages/provisioningserver/dhcp/omshell.py", line 225, in remove
     returncode, output = self._run(stdin)
   File "/usr/lib/python2.7/dist-packages/provisioningserver/dhcp/omshell.py", line 133, in _run
     proc = Popen(self.command, stdin=PIPE, stdout=PIPE)
   File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
     errread, errwrite)
   File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
     raise child_exception
 exceptions.OSError: [Errno 13] Permission denied

Most likely, when experiencing a real failure, you'll see a different exception, but raised on the same line.
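
The traceback above ends in a bare OSError from Popen, which the RPC layer then reports as an unhandled failure. A minimal sketch of one way such a launch failure could be converted into a declared error follows. It assumes a POSIX system; `run_omshell` and `demo` are hypothetical helpers, and the `CannotRemoveHostMap` class here is a local stand-in for the exception named in the logs, not the actual MAAS code or the fix that was applied.

```python
import os
import stat
import subprocess
import tempfile


class CannotRemoveHostMap(Exception):
    """Local stand-in for the exception of the same name seen in the logs."""


def run_omshell(command, stdin_data):
    """Run an omshell-like command; report launch failures as CannotRemoveHostMap."""
    try:
        proc = subprocess.Popen(
            command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    except OSError as error:
        # Covers "Permission denied" (errno 13) as well as a missing binary.
        raise CannotRemoveHostMap(
            "Could not run %s: %s" % (command[0], os.strerror(error.errno))
        ) from error
    output, _ = proc.communicate(stdin_data)
    return proc.returncode, output


def demo():
    """Mimic the chmod hack with a temporary file that has no execute bit."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # rw-------, not executable
    try:
        run_omshell([path], b"")
    except CannotRemoveHostMap as error:
        return str(error)
    finally:
        os.remove(path)
```

Calling `demo()` returns the message carried by the converted exception instead of letting the OSError escape. Whether the release should then be retried or the address freed anyway is a separate design decision that this sketch does not address.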

Ricardo Bánffy (rbanffy) wrote :

In other attempts to replicate the issue, by forcing MAAS DHCP to fail to start (moving the /var/lib/maas/dhcpd.conf file away), we get log output similar to this:

==> /var/log/maas/maas.log <==
Oct 16 20:38:50 maas-1484698 maas.node: [INFO] meaty-ball: Releasing node
Oct 16 20:38:50 maas-1484698 maas.node: [INFO] meaty-ball: Status transition from DEPLOYED to RELEASING
Oct 16 20:38:51 maas-1484698 maas.power: [INFO] Changing power state (off) of node: meaty-ball (node-958b88a4-73f7-11e5-bbc4-525400111c6c)
Oct 16 20:38:56 maas-1484698 maas.node: [INFO] meaty-ball: Status transition from RELEASING to READY
Oct 16 20:38:56 maas-1484698 maas.power: [INFO] Changed power state (off) of node: meaty-ball (node-958b88a4-73f7-11e5-bbc4-525400111c6c)
Oct 16 20:38:57 maas-1484698 maas.service_monitor: [INFO] Service 'maas-dhcpd' is not on, it will be started.
Oct 16 20:38:57 maas-1484698 maas.service_monitor: [ERROR] Service 'maas-dhcpd' failed to start. Its current state is 'off' and 'dead'.

Ricardo Bánffy (rbanffy) wrote :

... and this in regiond.log:

==> /var/log/maas/regiond.log <==
2015-10-16 20:38:57 [-] Unhandled error in Deferred:
2015-10-16 20:38:57 [-] Unhandled Error
 Traceback (most recent call last):
 Failure: provisioningserver.rpc.exceptions.CannotRemoveHostMap: DHCPv4 server failed to start: Service 'maas-dhcpd' failed to start. Its current state is 'off' and 'dead'.

Gavin Panella (allenap)
Changed in maas:
assignee: Ricardo Bánffy (rbanffy) → nobody
Christian Reis (kiko)
Changed in maas:
milestone: 1.9.0 → 1.9.1
status: Triaged → New
importance: Critical → High
Gavin Panella (allenap)
Changed in maas:
status: New → Triaged
Changed in maas:
milestone: 1.9.1 → 1.9.2
Changed in maas:
milestone: 1.9.2 → 1.9.3
Changed in maas:
milestone: 1.9.3 → 1.9.4
Changed in maas:
milestone: 1.9.4 → 1.9.5
tags: added: canonical-bootstack
Andres Rodriguez (andreserl) wrote :

Marking this bug as Won't Fix, given that this was fixed in MAAS 2.0 by completely redesigning how MAAS manages DHCP.

This cannot be fixed in 1.9 precisely because of the way MAAS managed DHCP in 1.x; this and other issues were the reason for the redesign in 2.0.

Changed in maas:
status: Triaged → Won't Fix
milestone: 1.9.5 → none
status: Won't Fix → Fix Released