Bug #992075 “Commissioning status persists with cloud-init 0.6.3...” : Bugs : maas package : Ubuntu

Han-sebastien (han-sebastien) on 2012-04-30

description:	updated

Launchpad Janitor (janitor) wrote on 2012-05-01:

#1

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in maas (Ubuntu):
status:	New → Confirmed

Andrew Crawford (acrawford) wrote on 2012-05-02:

#2

I am also using cloud-init 0.6.3-0ubuntu1.

At first I did see the bug at https://bugs.launchpad.net/ubuntu/+source/maas/+bug/981845 and, as per my comment there, I had the network cables inadvertently switched, so there was no PXE boot. Switching the cables back and rebooting the node had no effect on the persistent "Commissioning" status, though the node did begin the boot process.

Because the UI was unable to positively identify and enlist the node, and because my original setup (prior to dist-upgrade) was Orchestra using 11.10, I wanted to make sure it wasn't a problem with some of the manual reconfiguration necessary after the dist-upgrade (switching DHCP and DNS providers, etc.).

However, after rectifying the cabling issue and installing a fresh MAAS server from the official 12.04 release (and with apt-get update && apt-get upgrade), I am still seeing this behavior, but only one node has this problem.

I also had no luck with the daily ephemeral build.

After watching the PXE boot process on the problematic node, at about 11 seconds into boot I receive a

init: cloud-init-nonet main process (###) killed by TERM signal

which strongly suggests that there is a good reason that the "Commissioning" process is never completed for this node.

Interestingly, the node continues to boot to a normal login prompt after about 3-4 minutes.

If left alone, the node will drop a few errors on the console:

* Starting App Armor profiles
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
landscape-client is not configured, please run landscape-config

which makes some sense, as the init process isn't completing. I am tempted to open a new bug for the:

init: cloud-init-nonet main process (###) killed by TERM signal

boot error, but unless people are looking at the actual boot process of the node, the only direct indication that anything is wrong is the endless "Commissioning" state in the MAAS web UI.

OK, I have ruled out this as the problem:

http://irclogs.ubuntu.com/2012/04/18/%23ubuntu-server.txt

A hardware clock mismatch causing OAuth to fail.

[01:31] <DiabolicalGamer> I'm attempting to setup a MaaS server on Ubuntu 12.04, but my nodes keep hanging at "init: cloud-init-nonet main process (256) killed by TERM signal"
[01:31] <DiabolicalGamer> Can anyone help?
[01:31] <bigjools> I can try
[01:31] <DiabolicalGamer> Thanks :-)
[01:32] <bigjools> having said that I am more familiar with the webapp side of things than cloud-init
[01:32] <bigjools> smoser, any idea? ^
[01:33] <DiabolicalGamer> hmm, if I could login to the nodes themselves or access their logs that would really help
[01:42] <DiabolicalGamer> I think I may have found the problem...
[01:43] <DiabolicalGamer> http://pastebin.com/JPw9F5FN
[01:43] <DiabolicalGamer> My apache error log is full of these and they appear whenever the cloud-init-nonet runs
[01:44] <DiabolicalGamer> any ideas?
[01:45] <bigjools> DiabolicalGamer: ah I know
[01:45] <bigjools> DiabolicalGamer: the clock is wrong on the node
[01:45] <DiabolicalGamer> lol
[01:45] <DiabolicalGamer> is that all?
[01:45] <bigjools> well either the node or the maas server
[01:46] <DiabolicalGamer> *facepalm*
[01:46] <bigjools> yeah, it breaks OAuth if they are too different
[01:46] <DiabolicalGamer> it must be the nodes then because I configured the system clock when I installed ubuntu on the cloud controller
[02:02] <DiabolicalGamer> is there a way to force the nodes to run ntp-update?
[02:03] <DiabolicalGamer> *ntpdate
[02:12] <DiabolicalGamer> OMG it worked!
[02:12] <DiabolicalGamer> Thanks bigjools
[02:12] <bigjools> DiabolicalGamer: yay!

Since this seems tangential to this particular bug, I am leaving this here in the hope that someone may find it useful. I will post back if I file a new bug report or find an existing appropriate bug.

Thanks
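Editor's note: the IRC log above attributes the OAuth failure to the node and server clocks being "too different". The sketch below only illustrates that kind of check. The five-minute tolerance and the function names are assumptions made for illustration; the actual limit enforced by MAAS is not stated anywhere in this report.

```python
from datetime import datetime, timedelta, timezone

# Assumed tolerance for illustration only; the real limit is not given in this bug.
MAX_SKEW = timedelta(minutes=5)


def clock_skew(node_time: datetime, server_time: datetime) -> timedelta:
    """Return the absolute difference between the node and server clocks."""
    return abs(node_time - server_time)


def skew_is_acceptable(node_time: datetime, server_time: datetime,
                       max_skew: timedelta = MAX_SKEW) -> bool:
    """True if the two clocks are within max_skew of each other."""
    return clock_skew(node_time, server_time) <= max_skew


if __name__ == "__main__":
    # Hypothetical readings: the node's hardware clock is three hours behind.
    server = datetime(2012, 4, 18, 1, 45, tzinfo=timezone.utc)
    node = datetime(2012, 4, 17, 22, 45, tzinfo=timezone.utc)
    print("skew:", clock_skew(node, server))
    print("acceptable:", skew_is_acceptable(node, server))
```

In the log the fix was to run ntpdate on the nodes so that both clocks agreed again.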

Kurt (burnbrighter) wrote on 2012-06-08:

#3

I too am seeing this bug and am interested in seeing a resolution. It should be noted that I am using virtual adapters and running this in a test environment with virtual machines, yet I am seeing the same symptoms. Do we have any updates on a possible fix?

Julian Edwards (julian-edwards) wrote on 2012-06-19:

#4

init: cloud-init-nonet main process (###) killed by TERM signal
is expected. Please see bug 1015223.

Please also read all the FAQs and let me know if any of the suggestions help (there are things mentioned that are not in the bug here).
https://answers.launchpad.net/maas/+faqs

Jean-Pierre Dion (jean-pierre-dion) wrote on 2012-07-13:

#5

Hi,

I have only one node in the Commissioning state.

I am getting the following error:

The provisioning service encountered a problem with the Cobbler server, fault code 99: dns-name duplicated: cloud-jpd-5 If the error message is not clear, you may need to check the Cobbler logs in /var/log/cobbler/ or pserv.log.

But this node is not listed in the MAAS nodes list, and the log does not give any more information:
......
raise CX("dns-name duplicated: %s" % dns_name)

I have the version of cloud-init mentioned in the title.

I checked the clocks and they are at the same value on both the server and the node.

Can I add the server itself as a node? That did not succeed either.

Thank you for any help.
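Editor's note: the "dns-name duplicated" fault above suggests that some record other than the visible node already holds the name cloud-jpd-5. The sketch below is not Cobbler's code. It is a hypothetical helper that, given an export of system records and their DNS names (however that export is obtained), lists the names claimed by more than one record. The record names in the example are invented.

```python
from collections import defaultdict


def find_duplicate_dns_names(systems):
    """Map each DNS name claimed by more than one record to its owners.

    `systems` is a hypothetical mapping of record name -> iterable of DNS names.
    """
    owners = defaultdict(set)
    for system, names in systems.items():
        for name in names:
            owners[name].add(system)
    return {name: sorted(who) for name, who in owners.items() if len(who) > 1}


if __name__ == "__main__":
    # Invented example data: a stale record still holds the contested name.
    records = {
        "node-visible-in-ui": ["cloud-jpd-5"],
        "stale-record": ["cloud-jpd-5"],
        "other-node": ["cloud-jpd-6"],
    }
    for dns_name, who in find_duplicate_dns_names(records).items():
        print(dns_name, "is claimed by", ", ".join(who))
```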

Zhang Jiaming (87324570-a) wrote on 2012-07-14:

#6

#5
I too hit this with just one node. Hoping for a solution.

Andy Shinn (ashinn+launchpad) wrote on 2012-08-02:

#7

maas_pxe_boot_commisioning_problem_1.png (21.1 KiB, image/png)

I am running into the same issue. My MAAS and nodes are in a virtual environment for testing. Attached is a screenshot of my VM after PXE booting which shows the same "init: cloud-init-nonet main process (###) killed by TERM signal" error that Andrew mentioned.

mxdog (mxdog) wrote on 2012-10-03:

#8

I am having this same issue. I set up a box as a MAAS server and could not get any of the VM nodes on VirtualBox to finish commissioning, so I set up another box on a junker computer and started from scratch. In three tests on VirtualBox, the nodes all subscribed and, after acceptance, finished the enrollment (Ready). One machine I ran from an XCP box stopped, giving the init killed error exactly as above, with one exception: after the kill message I would also get an "ISOFS type unknown" message, which told me that it could not read or mount the ISO image. So I fired up the original MAAS server, and the test machine took right off and finished the process after restarting.

Could these errors mean that, after the RAM disk is loaded, the install media isn't getting mounted because it is unreadable for whatever reason, so the whole process just times out? At least in my case, that seemed to be the problem with the one node.

One other point: on the XCP box, even when I tried to load a system from the physical DVD, I got the same error when Ubuntu reached the "discover and mount DVD" portion of the install (installation failed, could not mount CD). That makes me wonder if there could be a problem with the installer image when using hypervisors. The XCP box runs 4 VMs all day long and has never had a problem installing from CD or DVD, even when installing from an ISO SR. The Ubuntu server/MAAS wouldn't install at all, whether from the SR, from DVD, or automatically from the second MAAS server.

Jose Castillo (purplefeetguy-a) wrote on 2013-09-10:

#9

Bounce your nodes. I was stuck on this for almost a week. Your servers aren't responding to the "Wake On LAN" or whatever "remote" method you are trying to use to wake up the servers.

After you set up your MAAS controller, you "power-on" your servers. If DHCP/DNS is set up right, they PXE boot and load a version of the OS that is used for "Declaring" the server to the MAAS controller. At that point they show up in your "Nodes" list as declared. You select the server and click on "Accept and Commission"...then they stay there, never changing. For me, this meant that the PXE command to boot to the new "commissioning" OS was IGNORED by my servers. I simply turned them on and voila....about 10 minutes later, they were completely configured and "Ready".

Hope that helps you folks out there and you have hair left....I pulled all mine out before I figured out what I need to do....hmmmm.....maybe the hair was holding me up!
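Editor's note: to test by hand whether a server responds to Wake-on-LAN at all, a magic packet can be sent from any machine on the same broadcast domain. The sketch below uses the standard packet layout (six 0xFF bytes followed by the target MAC repeated sixteen times). The MAC address, broadcast address and UDP port 9 are example values, and nothing in this report confirms that MAAS sends its wake-up in exactly this way.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("expected a 6-byte MAC address, got %r" % mac)
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    # Example MAC only; substitute the address of the node being tested.
    print(len(build_magic_packet("00:11:22:33:44:55")), "bytes built")
```

If the node stays off after send_magic_packet() is called with its real MAC, powering it on manually, as described above, is the fallback.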

Francesco Banconi (frankban) on 2014-09-29

Changed in maas (Ubuntu):
status:	Confirmed → In Progress
assignee:	nobody → Francesco Banconi (frankban)
status:	In Progress → Confirmed
assignee:	Francesco Banconi (frankban) → nobody

Commissioning status persists with cloud-init 0.6.3-0ubuntu1
