Commissioning status persists with cloud-init 0.6.3-0ubuntu1

Bug #992075 reported by Han-sebastien
This bug affects 8 people
Affects: maas (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Hi guys,

I ran into the issue described at https://bugs.launchpad.net/ubuntu/+source/maas/+bug/981845
I followed the instructions in comment #2, but the new version of cloud-init didn't solve my problem.

I'm running the 12.04 final release, fully up to date (apt-get update && apt-get upgrade).

The disk image:

root@ubuntu:/home# mount /var/lib/maas/ephemeral/precise/ephemeral/amd64/20120418/disk.img /mnt/
root@ubuntu:/home# chroot /mnt/
root@ubuntu:/# dpkg -l | grep cloud-init
ii cloud-init 0.6.3-0ubuntu1 Init scripts for cloud instances

MAAS version:

root@ubuntu:/home# dpkg -l | grep maas
ii maas 0.1+bzr482+dfsg-0ubuntu1 Ubuntu MAAS Server
ii maas-provision 2.2.2-0ubuntu4 Install server
ii maas-provision-common 2.2.2-0ubuntu4 Cobbler Install server - common files
ii python-django-maas 0.1+bzr482+dfsg-0ubuntu1 Ubuntu MAAS Server - (django files)
ii python-maas-provision 2.2.2-0ubuntu4 Install server - python libraries.

I updated the system again, rebooted the machine, and added a new node, but nothing changed.

I also deleted the old ephemeral images:

rm -rf /var/lib/maas/ephemeral/precise/ephemeral/amd64/20120418/
rm -rf /var/lib/maas/ephemeral/precise/ephemeral/i386/20120418/

So now I'm working with the latest ephemeral image available: 20120424

By the way, Francis J. Lacoste (flacoste) suggested setting 'STREAM=daily' in /etc/maas/import_ephemerals, but apparently there is no daily ephemeral build; this path is empty: https://maas.ubuntu.com/images/ephemeral/daily/precise/

Thank you in advance :)

description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in maas (Ubuntu):
status: New → Confirmed
Andrew Crawford (acrawford) wrote :

I am also using cloud-init 0.6.3-0ubuntu1.

At first I did see the bug at https://bugs.launchpad.net/ubuntu/+source/maas/+bug/981845 and, as per my comment there, I had the network cables inadvertently switched, so there was no PXE boot. Switching the cables and rebooting the node had no effect on the persistent "Commissioning" status, though the node did begin the boot process.

Because the UI was unable to positively identify and enlist the node, and because my original setup (prior to dist-upgrade) was Orchestra using 11.10, I wanted to make sure it wasn't a problem with some of the manual reconfiguration necessary after the dist-upgrade (switching DHCP and DNS providers, etc.).

However, after rectifying the cabling issue and installing a fresh MAAS server from the official 12.04 release (with apt-get update && apt-get upgrade), I am still seeing this behavior, but only one node has this problem.

I also had no luck with the daily ephemeral build.

After watching the PXE boot process on the problematic node, at about 11 seconds into boot I receive a

init: cloud-init-nonet main process (###) killed by TERM signal

which strongly suggests that there is a good reason that the "Commissioning" process is never completed for this node.

Interestingly, the node continues to boot to a normal login prompt after about 3-4 minutes.

If left alone, the node will drop a few errors on the console:

* Starting App Armor profiles
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
landscape-client is not configured, please run landscape-config

which makes some sense, as the init process isn't completing. I am tempted to open a new bug for the

init: cloud-init-nonet main process (###) killed by TERM signal

boot error, but unless people are looking at the actual boot process of the node, the only direct indication that anything is wrong is the endless "Commissioning" state in the MAAS web UI.

OK, I have ruled this out as the problem:

http://irclogs.ubuntu.com/2012/04/18/%23ubuntu-server.txt

A hardware clock mismatch causing OAuth to fail.

[01:31] <DiabolicalGamer> I'm attempting to setup a MaaS server on Ubuntu 12.04, but my nodes keep hanging at "init: cloud-init-nonet main process (256) killed by TERM signal"
[01:31] <DiabolicalGamer> Can anyone help?
[01:31] <bigjools> I can try
[01:31] <DiabolicalGamer> Thanks :-)
[01:32] <bigjools> having said that I am more familiar with the webapp side of things than cloud-init
[01:32] <bigjools> smoser, any idea? ^
[01:33] <DiabolicalGamer> hmm, if I could login to the nodes themselves or access their logs that would really help
[01:42] <DiabolicalGamer> I think I may have found the problem...
[01:43] <DiabolicalGamer> http://pastebin.com/JPw9F5FN
[01:43] <DiabolicalGamer> My apache error log is full of these and they appear whenever the cloud-init-nonet runs
[01:44] <DiabolicalGamer> any ideas?
[01:45] <bigjools> DiabolicalGamer: ah I know
[01:45] <bigjools> DiabolicalGamer: the clock is wrong on the node
[01:45] <DiabolicalGamer> lol
[01:45] <DiabolicalGamer> is that all?
[01:45] <bigjools> well either the node or the maas server
[01:46] <DiabolicalGamer> *facepalm*
[01:46] <bigjools> yeah, it breaks ...
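
For anyone wanting to rule out the clock mismatch discussed in the log above, here is a minimal sketch of one way to measure skew: compare the node's clock with the `Date` header of any HTTP response from the MAAS server. The function names and the 300-second tolerance are assumptions for illustration; the actual window enforced on the server side is not stated in this report.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumed tolerance in seconds; the real limit on the server may differ.
MAX_SKEW_SECONDS = 300


def clock_skew_seconds(http_date_header, local_now):
    """Return local time minus server time, in seconds.

    http_date_header is the value of an HTTP Date header,
    e.g. "Wed, 18 Apr 2012 01:45:00 GMT".
    local_now must be a timezone-aware datetime.
    """
    server_time = parsedate_to_datetime(http_date_header)
    return (local_now - server_time).total_seconds()


def skew_is_acceptable(skew, max_skew=MAX_SKEW_SECONDS):
    """True when the absolute skew is within the assumed tolerance."""
    return abs(skew) <= max_skew


if __name__ == "__main__":
    # Example values only: a node whose clock runs two hours ahead.
    header = "Wed, 18 Apr 2012 01:45:00 GMT"
    node_now = datetime(2012, 4, 18, 3, 45, 0, tzinfo=timezone.utc)
    skew = clock_skew_seconds(header, node_now)
    print(skew, skew_is_acceptable(skew))
```

Taking the header string as an argument keeps the check independent of how the response is fetched, so it can be fed from a browser, a packet capture, or a log.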


Kurt (burnbrighter) wrote :

I too am seeing this bug and am interested in seeing a resolution. It should be noted that I am using virtual adapters and running this in a test environment with virtual machines, yet I am seeing the same symptoms. Do we have any updates on a possible fix?

Julian Edwards (julian-edwards) wrote :

init: cloud-init-nonet main process (###) killed by TERM signal
is expected. Please see bug 1015223.

Please also read all the FAQs and let me know if any of the suggestions help (there are things mentioned that are not in the bug here).
https://answers.launchpad.net/maas/+faqs

Jean-Pierre Dion (jean-pierre-dion) wrote :

Hi,

I have only one node in the Commissioning state.

I am getting the following error:

The provisioning service encountered a problem with the Cobbler server, fault code 99: dns-name duplicated: cloud-jpd-5 If the error message is not clear, you may need to check the Cobbler logs in /var/log/cobbler/ or pserv.log.

But this node is not listed in the MAAS nodes list, and the log does not provide more information:
......
    raise CX("dns-name duplicated: %s" % dns_name)
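
The traceback line above points to a uniqueness check on DNS names. The following is a minimal sketch of a check of that shape, not Cobbler's actual code: `CX` here is a local stand-in class, and `check_dns_name_unique` and the `systems` mapping are hypothetical. Under those assumptions, it shows how such an error could be raised for a node that does not appear in the MAAS list, if a leftover record under another system name still claims the same DNS name.

```python
class CX(Exception):
    """Local stand-in for the exception type named in the log line."""


def check_dns_name_unique(systems, system_name, dns_name):
    """Raise CX if a different system already claims dns_name.

    systems is a hypothetical mapping of system name -> set of DNS names.
    """
    for other_name, dns_names in systems.items():
        if other_name != system_name and dns_name in dns_names:
            raise CX("dns-name duplicated: %s" % dns_name)


if __name__ == "__main__":
    # Hypothetical leftover record that still claims the name.
    systems = {"node-old": {"cloud-jpd-5"}}
    try:
        check_dns_name_unique(systems, "node-new", "cloud-jpd-5")
    except CX as err:
        print(err)
```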

I have the version of cloud-init mentioned in the title.

I checked the clocks and they are at the same value on both the server and the node.

Can I add the server itself as a node? That did not succeed either.

Thank you for any help.

Zhang Jiaming (87324570-a) wrote :

#5
I also ran into this with just one node. Hoping for a solution.

Andy Shinn (ashinn+launchpad) wrote :

I am running into the same issue. My MAAS and nodes are in a virtual environment for testing. Attached is a screenshot of my VM after PXE booting which shows the same "init: cloud-init-nonet main process (###) killed by TERM signal" error that Andrew mentioned.

mxdog (mxdog) wrote :

I am having this same issue. I set up a box as a MAAS server and could not get any of the VM nodes on VirtualBox to finish commissioning, so I set up another box on a junker computer and started from scratch. In three tests on VirtualBox, all of them subscribed and, after acceptance, finished the enrollment (Ready). One machine I ran from an XCP box stopped, giving the init killed error exactly as above, with one exception: after the kill message I would also get an "ISOFS type unknown" message, which told me that it could not read or mount the ISO image. So I fired up the original MAAS server, and the test machine took right off and finished the process after restarting.

Could these errors mean that, after the RAM disk is loaded, the install media isn't getting mounted because it is unreadable for whatever reason, so the whole process just times out? At least in my case, that seemed to be the problem with the one node.

One other point: on the XCP box, even when I tried to load a system from the physical DVD, I got the same error when Ubuntu reached the "discover and mount dvd" portion of the install: installation failed, could not mount CD. This makes me wonder whether there could be a problem with the installer image when using hypervisors. The XCP box runs 4 VMs all day long and has never had a problem installing from CD or DVD, even when installing from an ISO SR. The Ubuntu server/MAAS wouldn't install at all, whether from the SR, from DVD, or automatically from the second MAAS server.

Jose Castillo (purplefeetguy-a) wrote :

Bounce your nodes. I was stuck on this for almost a week. Your servers aren't responding to the "Wake On LAN" or whatever "remote" method you are trying to use to wake up the servers.

After you set up your MAAS controller, you "power-on" your servers. If DHCP/DNS is set up right, they PXE boot and load a version of the OS that is used for "Declaring" the server to the MAAS controller. At that point they show up in your "Nodes" list as declared. You select the server and click on "Accept and Commission"...then they stay there, never changing. For me, this meant that the PXE command to boot to the new "commissioning" OS was IGNORED by my servers. I simply turned them on manually and voila....about 10 minutes later, they were completely configured and "Ready".

Hope that helps you folks out there and you have hair left....I pulled all mine out before I figured out what I needed to do....hmmmm.....maybe the hair was holding me up!
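
As a rough illustration of the symptom described in this comment, the sketch below flags nodes that have sat in "Commissioning" longer than an expected window, which may mean they never powered on and need to be bounced by hand. Everything here is an assumption for illustration: the `(hostname, status, status_since)` tuples are a hypothetical data shape, not a MAAS API, and the 10-minute window is taken only from the timing reported above.

```python
from datetime import datetime, timedelta

# Assumed window, based on the "about 10 minutes" reported in this thread.
EXPECTED_COMMISSIONING = timedelta(minutes=10)


def stuck_nodes(nodes, now, limit=EXPECTED_COMMISSIONING):
    """Return hostnames that have been 'Commissioning' for longer than limit.

    nodes is a hypothetical list of (hostname, status, status_since) tuples.
    """
    return [
        hostname
        for hostname, status, since in nodes
        if status == "Commissioning" and now - since > limit
    ]


if __name__ == "__main__":
    # Example data only.
    now = datetime(2012, 5, 1, 12, 0, 0)
    nodes = [
        ("node-a", "Commissioning", now - timedelta(minutes=45)),
        ("node-b", "Commissioning", now - timedelta(minutes=3)),
        ("node-c", "Ready", now - timedelta(hours=2)),
    ]
    print(stuck_nodes(nodes, now))
```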

Changed in maas (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Francesco Banconi (frankban)
status: In Progress → Confirmed
assignee: Francesco Banconi (frankban) → nobody