Comment 12 for bug 989355

Sebastian Malcolm (smalcolm) wrote :

To find a way to make MAAS wait longer, I initially looked in "/usr/share/pyshared/cobbler/modules/sync_post_restart_services.py" and found code that appears to use subprocess_sp() from "/usr/share/pyshared/cobbler/utils.py" to run "service dnsmasq restart". Perhaps the code that restarts services could print a progress bar or a message like "will wait PSERV_TIMEOUT seconds for service to restart".
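A minimal sketch of that suggestion, written in Python 3 for illustration only. It does not use Cobbler's subprocess_sp(); the function name restart_with_notice, its parameters, and the 7.0 second default are hypothetical and are not taken from the Cobbler or MAAS source:

```python
import subprocess
import sys


def restart_with_notice(command, timeout=7.0, out=sys.stdout):
    """Run `command`, announcing up front how long we are prepared to wait.

    Returns True if the command exits with status 0 within `timeout`
    seconds, and False if it fails or does not finish in time.
    (Hypothetical helper, not part of Cobbler or MAAS.)
    """
    out.write("will wait %.1f seconds for service to restart\n" % timeout)
    out.flush()
    try:
        # subprocess.run() kills the child and raises TimeoutExpired
        # if it is still running once the timeout has elapsed.
        result = subprocess.run(command, timeout=timeout)
    except subprocess.TimeoutExpired:
        out.write("timed out after %.1f seconds\n" % timeout)
        return False
    return result.returncode == 0
```

It could be called as restart_with_notice(["service", "dnsmasq", "restart"], timeout=60.0), so that the user sees the wait time before anything appears to hang.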

I found PSERV_TIMEOUT in /usr/share/maas/maas/settings.py, then added the two lines below to /etc/maas/maas_local_settings.py, which should override the default value of 7.0 seconds:

# Time-out for socket operations against the Provisioning API.
PSERV_TIMEOUT = 60.0 # seconds.

== 5 step mini-HOWTO for fixing an incomplete "Add Node" operation ==
(1) Remove the node that was added to Cobbler as a System but is not visible in the MAAS Web UI
# cobbler system remove --name=node-e59f0a74-a0a5-11e1-86c0-00187184baf2

(2) Remove any existing DHCP leases for the node that did not finish being added to MAAS
[The node that gave the "timed out" error was booted into an existing Ubuntu install and getting a DHCP lease, so I unplugged its network cable before attempting to re-add it to MAAS]
# service dnsmasq stop
# vi /var/lib/misc/dnsmasq.leases
# service dnsmasq start

(3) Check that the node has been removed from Cobbler
[The two node-.... entries below are two offline machines I had successfully added before PXE booting them]
# cobbler system list
   default
   node-9c4d6890-9fe5-11e1-8148-00187184baf2
   node-dd577d68-a098-11e1-86c0-00187184baf2

(4) Manually restart dnsmasq & cobbler
# service dnsmasq restart
# service cobbler restart

(5) Test MAAS / Retry adding the MAC for the node
I had two nodes added (in Offline/Commissioning state), so to check that things were working I powered one on manually. It successfully PXE booted from the MAAS server, then powered itself off and entered the Queued/Ready state. I then retried adding the problematic node... and this time it worked!
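Steps (2) and (3) above could be scripted instead of done by hand in vi. The following is only a sketch: the helper names drop_lease and system_listed are hypothetical, and it assumes the usual dnsmasq lease layout, in which the second whitespace-separated field of each line is the client's MAC address.

```python
def drop_lease(lines, mac):
    """Return the lease lines that do not belong to `mac`.

    Assumes the second whitespace-separated field of each lease line
    is the client's MAC address. The comparison ignores case.
    """
    wanted = mac.lower()
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1].lower() == wanted:
            continue  # stale lease for this machine; leave it out
        kept.append(line)
    return kept


def system_listed(list_output, name):
    """Return True if `name` is an entry in `cobbler system list` output."""
    entries = (entry.strip() for entry in list_output.splitlines())
    return name in entries
```

As in step (2), dnsmasq should be stopped before /var/lib/misc/dnsmasq.leases is rewritten with the filtered lines, and started again afterwards. system_listed() can then be fed the captured output of "cobbler system list" to confirm that the removed node is gone.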

One or more of the things I did may have "fixed" the Add Node timed out problem:
(a) increasing PSERV_TIMEOUT in /etc/maas/maas_local_settings.py; or
(b) unplugging the network cable of the machine, which was already powered on, running Ubuntu desktop and getting a DHCP lease from dnsmasq; or
(c) removing the existing lease line for that machine from the "/var/lib/misc/dnsmasq.leases" file (after stopping dnsmasq).

TODO: Test whether this Add Node timed out problem only occurs when attempting to add a machine that has already obtained a DHCP lease from dnsmasq (because it was powered on with an existing OS installed). Could also try enlisting via the boot menu option of Ubuntu Server 12.04 booted from CDROM/USB instead of manually adding the MAC address.