MAAS 2.1.1 - Curtin - Failed to deploy CentOS7

Bug #1644229 reported by Marouen B.Jelloul
This bug affects 2 people
Affects      Status        Importance  Assigned to  Milestone
MAAS         Expired       Undecided   Unassigned
curtin       Invalid       Medium      Unassigned
maas-images  Fix Released  Undecided   Unassigned

Bug Description

Every step from enlistment to acquisition succeeds; however, deploying CentOS 7 (provided with MAAS) fails.

- Server model: Dell R610
- 2 disks in RAID1
- Commissioning = OK
- IPMI status = OK
- Deployment = Failed

MAAS log:

=======================================================

Nov 22 10:37:40 CentOS7-test cloud-init[2635]: 2016-11-22 10:37:40 (4.46 MB/s) - written to stdout [418481727/418481727]
Nov 22 10:37:42 CentOS7-test cloud-init[2635]: 2087065+928 records in
Nov 22 10:37:42 CentOS7-test cloud-init[2635]: 2087510+1 records out
Nov 22 10:37:42 CentOS7-test cloud-init[2635]: 1068805191 bytes (1.1 GB, 1019 MiB) copied, 91.8325 s, 11.6 MB/s
Nov 22 10:37:42 CentOS7-test cloud-init[2635]: Running command ['partprobe', '/dev/sda'] with allowed return codes [0] (shell=False, capture=False)
Nov 22 10:37:42 CentOS7-test systemd[1]: media-root\x2dro.mount: Found ordering cycle on media-root\x2dro.mount/stop
Nov 22 10:37:42 CentOS7-test systemd[1]: media-root\x2dro.mount: Found dependency on systemd-journald.socket/stop
Nov 22 10:37:42 CentOS7-test systemd[1]: media-root\x2dro.mount: Found dependency on -.mount/stop
Nov 22 10:37:42 CentOS7-test systemd[1]: media-root\x2dro.mount: Found dependency on media-root\x2dro.mount/stop
Nov 22 10:37:42 CentOS7-test systemd[1]: Unable to break cycle
Nov 22 10:37:42 CentOS7-test systemd[1]: Requested transaction contains an unfixable cyclic ordering dependency: Resource deadlock avoided
Nov 22 10:37:43 CentOS7-test cloud-init[2635]: Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
Nov 22 10:37:43 CentOS7-test cloud-init[2635]: Running command ['lsblk', '--noheadings', '--bytes', '--pairs', '--output=ALIGNMENT,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,FSTYPE,GROUP,KNAME,LABEL,LOG-SEC,MAJ:MIN,MIN-IO,MODE,MODEL,MOUNTPOINT,NAME,OPT-IO,OWNER,PHY-SEC,RM,RO,ROTA,RQ-SIZE,SIZE,STATE,TYPE,UUID', '/dev/sda'] with allowed return codes [0] (shell=False, capture=True)
Nov 22 10:37:43 CentOS7-test cloud-init[2635]: finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: failed: curtin command block-meta
Nov 22 10:37:43 CentOS7-test cloud-init[2635]: Traceback (most recent call last):
Nov 22 10:37:43 CentOS7-test cloud-init[2635]: File "/curtin/curtin/commands/main.py", line 211, in main
Nov 22 10:37:43 CentOS7-test cloud-init[2635]: ret = args.func(args)
Nov 22 10:37:43 CentOS7-test cloud-init[2635]: File "/curtin/curtin/commands/block_meta.py", line 64, in block_meta
Nov 22 10:37:43 CentOS7-test cloud-init[2635]: meta_simple(args)
Nov 22 10:37:43 CentOS7-test cloud-init[2635]: File "/curtin/curtin/commands/block_meta.py", line 1131, in meta_simple
Nov 22 10:37:43 CentOS7-test cloud-init[2635]: rootdev = write_image_to_disk(dd_images[0], devname)
Nov 22 10:37:43 CentOS7-test cloud-init[2635]: File "/curtin/curtin/commands/block_meta.py", line 85, in write_image_to_disk
Nov 22 10:37:43 CentOS7-test cloud-init[2635]: return block.get_root_device([devname, ])
Nov 22 10:37:43 CentOS7-test cloud-init[2635]: File "/curtin/curtin/block/__init__.py", line 533, in get_root_device
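Both logs quoted in this thread report a failing step as `finish: <stage>: FAIL`. When a log is long, a small filter can list just the failing stage names. This is a hypothetical helper (`failed_stages` is not part of curtin or cloud-init) and it assumes only that the log text follows the pattern seen above.

```shell
#!/usr/bin/env bash
# failed_stages: hypothetical helper, not shipped with curtin or cloud-init.
# Reads log text on stdin and prints the stage name from every line of the
# form "... finish: <stage>: FAIL ...", one per line.
failed_stages() {
    sed -n 's/.*finish: \([^ ]*\): FAIL.*/\1/p'
}
```

For example, `failed_stages < /var/log/cloud-init.log`. Fed the traceback above, it prints `cmd-install/stage-partitioning/builtin/cmd-block-meta`.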

~$ dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-==============================-============-=================================================
ii maas 2.1.1+bzr5544-0ubuntu1~16.04.1 all "Metal as a Service" is a physical cloud and IPAM
ii maas-cli 2.1.1+bzr5544-0ubuntu1~16.04.1 all MAAS client and command-line interface
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.1.1+bzr5544-0ubuntu1~16.04.1 all MAAS server common files
ii maas-dhcp 2.1.1+bzr5544-0ubuntu1~16.04.1 all MAAS DHCP server
ii maas-dns 2.1.1+bzr5544-0ubuntu1~16.04.1 all MAAS DNS server
ii maas-proxy 2.1.1+bzr5544-0ubuntu1~16.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.1.1+bzr5544-0ubuntu1~16.04.1 all Rack Controller for MAAS
ii maas-region-api 2.1.1+bzr5544-0ubuntu1~16.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.1.1+bzr5544-0ubuntu1~16.04.1 all Region Controller for MAAS
un maas-region-controller-min <none> <none> (no description available)
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-maas-provisioningserver <none> <none> (no description available)
ii python3-django-maas 2.1.1+bzr5544-0ubuntu1~16.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.1.1+bzr5544-0ubuntu1~16.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.1.1+bzr5544-0ubuntu1~16.04.1 all MAAS server provisioning libraries (Python 3)

Steps:
- Add IPMI configuration.
- Machine status changed to Commissioning.
- Machine status changed to Ready.
- Acquire => Allocated.
- Deploy.
- Deployment failed (see the attached archive file).

Thx,

Marouen B.Jelloul (marouen-benjelloul) wrote :
Andres Rodriguez (andreserl) wrote :

I am unable to reproduce this issue. I've successfully deployed CentOS 6.6 and 7.7 with the latest MAAS 2.1.2.

Can you please try:

sudo add-apt-repository ppa:maas/stable
sudo apt-get update && sudo apt-get dist-upgrade

And see if it works for you.

That said, based on the traceback above, this seems to be a curtin issue rather than a MAAS issue.

Changed in maas:
status: New → Incomplete
summary: - MAAS 2.1.1 - Failed to deploy CentOS7
+ MAAS 2.1.1 - Curtin - Failed to deploy CentOS7
Andres Rodriguez (andreserl) wrote :

Also, can you please tell us which version of curtin you have? On the MAAS server, run:

dpkg -l | grep curtin

Blake Rouse (blake-rouse) wrote :

Does Ubuntu or CentOS 6 deploy? Or is it just CentOS 7?

Marouen B.Jelloul (marouen-benjelloul) wrote :

Hi,

Thanks for your reply.

OK, I'll try updating to 2.1.2 and give you feedback.

Curtin version :

~$ dpkg -l | grep curtin
ii curtin-common 0.1.0~bzr425-0ubuntu1~16.04.1 all Library and tools for curtin installer
ii python3-curtin 0.1.0~bzr425-0ubuntu1~16.04.1 all Library and tools for curtin installer

This only affects CentOS 7, which is impossible to deploy.
CentOS 6.6 works after I added a static IP to the guest machine; otherwise it failed.
Ubuntu works fine.

Thkx,

Marouen B.Jelloul (marouen-benjelloul) wrote :

Hi,

Updating to 2.1.2 did not solve the CentOS 7 deployment problem.

Many thx,

Ryan Harper (raharper) wrote :

Hello,

I'll need some more information to help resolve the issue you've raised.

Can you provide more information about the RAID1 setup? Is this hardware RAID?
Have you attempted to deploy this system with the disks in a JBOD configuration?

Please attach the following:
  - commissioning hardware information (LSHW)
  - curtin config
  - installation log (with curtin_verbose=true)

# enable curtin verbose, and then attempt a deployment
maas <session> maas set-config name=curtin_verbose value=true

# collect curtin config
maas <session> machine get-curtin-config <system-id>

# installation log
On the node details page, in the installation output section at the bottom of the page.
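The two CLI steps above can be wrapped in one function so the curtin config lands in a file ready to attach. This is a hypothetical sketch: `collect_curtin_debug` is an illustrative name, and its arguments (CLI session name, system id, output directory) stand in for the `<session>` and `<system-id>` placeholders. It uses only the two `maas` commands quoted above; the installation log still has to be copied from the web UI after redeploying.

```shell
#!/usr/bin/env bash
# collect_curtin_debug: hypothetical wrapper around the two MAAS CLI calls
# quoted above. Arguments: PROFILE (your logged-in MAAS CLI session name),
# SYSTEM_ID (the machine's system id), and an optional output directory.
collect_curtin_debug() {
    local profile="$1" system_id="$2" outdir="${3:-.}"
    # Enable verbose curtin output for the next deployment attempt.
    maas "$profile" maas set-config name=curtin_verbose value=true || return
    # Save the curtin config MAAS generates for this machine to a file.
    maas "$profile" machine get-curtin-config "$system_id" \
        > "$outdir/curtin-config-$system_id.txt"
}
```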

Changed in curtin:
importance: Undecided → Medium
status: New → Incomplete
Chad Clarke (chadclarke12) wrote :

I am having a similar issue: I cannot deploy CentOS 7 with MAAS 2.1.2, although in my case I am working with Dell R710s. After the deployment failed, I set the machine to Broken status and then entered Rescue Mode. Once logged in, I reviewed the logs, which I have attached.

It appears that some of the scripts run at the end of the deployment are failing:

cloud-init.log:

Dec 5 20:25:25 tpadev4 [CLOUDINIT] url_helper.py[DEBUG]: [0/1] open 'http://10.2.0.50/MAAS/metadata/status/km8pbr' with {'headers': {'Authorization': 'OAuth oauth_nonce="72192216506725080311480969525", oauth_timestamp="1480969525", oauth_version="1.0", oauth_signature_method="PLAINTEXT", oauth_consumer_key="tgVZB97KDpcPC9nmKk", oauth_token="5fqmQzCk8qBZEr9ysu", oauth_signature="%26MbaUg8z6hnErnVQwFhSSCxX2SdVcNdMV"'}, 'allow_redirects': True, 'method': 'POST', 'url': 'http://10.2.0.50/MAAS/metadata/status/km8pbr'} configuration
Dec 5 20:25:25 tpadev4 [CLOUDINIT] url_helper.py[DEBUG]: Read from http://10.2.0.50/MAAS/metadata/status/km8pbr (200, 2b) after 1 attempts
Dec 5 20:25:25 tpadev4 [CLOUDINIT] handlers.py[DEBUG]: finish: modules-final: FAIL: running modules for final

cloud-init-output.log:

2016-12-05 20:25:24,426 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/user_data.sh [1]
2016-12-05 20:25:24,429 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2016-12-05 20:25:24,550 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed
Cloud-init v. 0.7.8 finished at Mon, 05 Dec 2016 20:25:25 +0000. Datasource DataSourceMAAS [http://10.2.0.50/MAAS/metadata/]. Up 55.86 seconds

Does anyone have any idea what might be causing this issue?

Chad Clarke (chadclarke12) wrote :

As stated previously, we are having a similar issue, but with Dell R710s. We are not running RAID, and I have attached the logs you requested.

Ryan Harper (raharper) wrote :

Thanks for adding the requested information.

The curtin install of CentOS 7 completed fine.

% grep Installation installation-log.txt
Installation finished. No error reported.
curtin: Installation finished.

And, as you say, a user script failed to run properly and exited non-zero, which caused the deployment to be marked as failed; cloud-init does not report success if requested scripts do not run correctly.
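The failure mode described above can be modelled in a few lines of shell. This is a simplified sketch, not cloud-init's actual implementation (its scripts-user module is written in Python), and `run_user_scripts` is a hypothetical name: it runs every executable file in a directory, keeps going after a failure, and returns non-zero if any script did, which is the condition that got this deployment marked as failed.

```shell
#!/usr/bin/env bash
# run_user_scripts: simplified model of the scripts-user behaviour described
# above (hypothetical, not cloud-init code). Runs each executable regular
# file in DIR in glob order; a script that exits non-zero is reported but
# does not stop the others, and the function fails if any script failed.
run_user_scripts() {
    local dir="$1" failed=0 script
    for script in "$dir"/*; do
        # Skip non-files, non-executables, and the literal pattern when DIR is empty.
        [ -f "$script" ] && [ -x "$script" ] || continue
        if ! "$script"; then
            echo "Failed running $script" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

One failing script is enough to fail the whole run, even if every other script succeeded.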

It appears that there was an attempt to add a "user_data.sh" script.

From the log, I can see cloud-init fetched this from MAAS:

Read from http://10.2.0.50/MAAS/metadata/2012-03-01/user-data (200, 18443b) after 1 attempts

__init__.py[DEBUG]: {'Content-Disposition': 'attachment; filename="user_data.sh"', 'Content-Type': 'text/x-shellscript; charset="utf-8"', 'MIME-Version': '1.0', 'Content-Transfer-Encoding': 'base64'}
__init__.py[DEBUG]: Calling handler ShellScriptPartHandler: [['text/x-shellscript']] (text/x-shellscript, user_data.sh, 2) with frequency once-per-instance
util.py[DEBUG]: Writing to /var/lib/cloud/instance/scripts/user_data.sh - wb: [448] 13399 bytes

It would help to examine this script to see what failed, and possibly why it is being sent: does MAAS send it automatically, or was it added by the user?
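One way to examine the script, once a copy has been retrieved from the node (for example via rescue mode), is to re-run it under `bash -x` and look at the last command traced before it exited. This is a hedged sketch: `trace_script` and the default trace-file path are hypothetical, and re-running a provisioning script on a live node may have side effects, so a scratch machine is safer.

```shell
#!/usr/bin/env bash
# trace_script: hypothetical helper. Runs SCRIPT under bash -x, writes the
# combined output and trace to TRACE_FILE, prints the exit status and the
# last traced command, and returns the script's exit status.
trace_script() {
    local script="$1" trace_file="${2:-/tmp/user_data.trace}" rc=0
    bash -x "$script" >"$trace_file" 2>&1 || rc=$?
    echo "exit status: $rc"
    # Trace lines start with one or more "+" (bash's default PS4).
    grep '^+' "$trace_file" | tail -n 1
    return "$rc"
}
```

The last traced command is only the last one bash ran; if the script carries on after errors, the full trace file is the better place to look for the first failure.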

Chad Clarke (chadclarke12) wrote :

We have not modified any of the files within MAAS, including the user-data file you are pointing to. The attached document contains:

1. build_lxc_maas_xenial_container.sh - how we built the LXC container and installed MAAS
2. user_data.sh - I put the failed deployment server into Broken -> Rescue Mode, then SSHed into the node and retrieved this file from '/var/lib/cloud/instance/scripts/user_data.sh', the path named in the error above.

Please let me know if you need any additional details.

Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired
Andres Rodriguez (andreserl) wrote :

We believe this issue has now been resolved; you will need to update to the latest image made available by MAAS.

Changed in curtin:
status: Incomplete → Invalid
Changed in maas-images:
status: New → Fix Released
Bryan Sullivan (bryan-att) wrote :

This issue is not resolved; it still occurs with MAAS 2.2.2. CentOS 7 (the official image, synced through MAAS) cannot be deployed: the error "Failed to start LSB" is displayed on the console and no IP addresses are assigned to any interface on the server, so the final deployment steps in MAAS fail.
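For the symptom in the last comment (no IP addresses on any interface), a quick check from the node's console is to list the interfaces that hold a global IPv4 address. The sketch below is hypothetical (`ipv4_interfaces` is an illustrative name) and only parses `ip -o -4 addr show scope global` output passed on stdin, so it can also be run against a saved copy; an empty result matches the failure described.

```shell
#!/usr/bin/env bash
# ipv4_interfaces: hypothetical helper. Reads one-line "ip -o -4 addr show"
# output on stdin and prints each interface name that has an IPv4 address,
# once, sorted. In that format field 2 is the interface name and field 3
# is the address family keyword "inet".
ipv4_interfaces() {
    awk '$3 == "inet" { print $2 }' | sort -u
}
```

Usage on the node: `ip -o -4 addr show scope global | ipv4_interfaces`.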
