CentOS images lack additional drivers

Bug #1598301 reported by Sam Lee
This bug affects 3 people
Affects      Status        Importance  Assigned to  Milestone
MAAS         Invalid       Undecided   Unassigned
maas-images  Fix Released  Critical    Lee Trager

Bug Description

MAAS 2.0 (beta8+bzr5134)
CentOS 7.0 boot image from: http://images.maas.io/ephemeral-v2/daily/

I've tried this 4 times across 3 different physical servers (Dell R610), and all fail the same way. See the screenshot of the iDRAC console.

Steps to repro:
1) Delete a machine node
2) PXE boot machine from step 1 and allow MAAS to discover as 'New'
3) Commission machine and observe 'Ready'
4) Deploy CentOS 7.0 to the machine
5) Observe the machine and see cloud-init's ci-info report no interface in 'Up' status
6) Observe MAAS time out and mark the deployment as 'Failed Deployment'

Contents of /var/log/maas/*

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                            Version                              Architecture Description
+++-===============================-====================================-============-=================================================
ii  maas                            2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          "Metal as a Service" is a physical cloud and IPAM
ii  maas-cli                        2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          MAAS client and command-line interface
un  maas-cluster-controller         <none>                               <none>       (no description available)
ii  maas-common                     2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          MAAS server common files
ii  maas-dhcp                       2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          MAAS DHCP server
ii  maas-dns                        2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          MAAS DNS server
ii  maas-proxy                      2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          MAAS Caching Proxy
ii  maas-rack-controller            2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          Rack Controller for MAAS
ii  maas-region-api                 2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          Region controller API service for MAAS
ii  maas-region-controller          2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          Region Controller for MAAS
un  maas-region-controller-min      <none>                               <none>       (no description available)
un  python-django-maas              <none>                               <none>       (no description available)
un  python-maas-client              <none>                               <none>       (no description available)
un  python-maas-provisioningserver  <none>                               <none>       (no description available)
ii  python3-django-maas             2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          MAAS server Django web framework (Python 3)
ii  python3-maas-client             2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          MAAS python API client (Python 3)
ii  python3-maas-provisioningserver 2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          MAAS server provisioning libraries (Python 3)
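
With this many packages it is easy to miss a mismatched build. A minimal sketch, assuming the listing has been saved as plain text, keeps only rows in dpkg state `ii` (installed) and collects their versions. The function name `installed_versions` is hypothetical, and the sample rows are a subset copied from the listing above.

```python
# A subset of rows copied from the `dpkg -l` listing above, for illustration.
DPKG_ROWS = """\
ii  maas                            2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          "Metal as a Service" is a physical cloud and IPAM
ii  maas-cli                        2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          MAAS client and command-line interface
un  maas-cluster-controller         <none>                               <none>       (no description available)
ii  maas-rack-controller            2.0.0~beta8+bzr5134-0ubuntu1~xenial1 all          Rack Controller for MAAS
"""


def installed_versions(listing):
    """Map package name -> version for rows whose dpkg state is 'ii' (installed).

    Splitting on runs of whitespace works whether or not the columns are aligned.
    """
    versions = {}
    for line in listing.splitlines():
        fields = line.split(None, 3)  # state, name, version, rest of line
        if len(fields) >= 3 and fields[0] == "ii":
            versions[fields[1]] = fields[2]
    return versions


versions = installed_versions(DPKG_ROWS)
print(sorted(versions))                  # ['maas', 'maas-cli', 'maas-rack-controller']
print(len(set(versions.values())) == 1)  # True: all installed packages share one build
```

Rows in state `un` (not installed) are skipped, so only packages actually present are compared.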

Revision history for this message
Sam Lee (samlee) wrote :

Attached the iDRAC console during the failed state.

Revision history for this message
Sam Lee (samlee) wrote :

Forgot to mention: deploying Ubuntu on the same machines succeeds 100% of the time.

Revision history for this message
Sam Lee (samlee) wrote :

I'm not sure what I did in MAAS to get this warning, but I noticed the following entry in maas.log:

maas.preseed: [INFO] sam01: custom network and storage options are only supported on Ubuntu. Using flat storage layout.
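
To check which nodes hit this fallback, one can filter maas.log for the same message. A minimal sketch, assuming entries have the shape `<logger>: [<LEVEL>] <node>: <message>` seen in the single line quoted above; real lines usually carry a syslog-style prefix, so the regex searches rather than anchoring at the start. The function name and the `FALLBACK_HINT` substring are hypothetical choices.

```python
import re

# Assumed entry shape, based only on the maas.log line quoted above.
ENTRY = re.compile(
    r"(?P<logger>maas\.[\w.]+): \[(?P<level>\w+)\] (?P<node>[\w.-]+): (?P<msg>.*)"
)
FALLBACK_HINT = "only supported on Ubuntu"  # substring of the warning text


def nodes_with_flat_fallback(lines):
    """Return node hostnames whose custom network/storage config was ignored."""
    nodes = []
    for line in lines:
        m = ENTRY.search(line)
        if m and m.group("logger") == "maas.preseed" and FALLBACK_HINT in m.group("msg"):
            nodes.append(m.group("node"))
    return nodes


sample = [
    "maas.preseed: [INFO] sam01: custom network and storage options are only "
    "supported on Ubuntu. Using flat storage layout.",
]
print(nodes_with_flat_fallback(sample))  # ['sam01']
```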

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Hi Sam!

It seems that the image never got an IP address from DHCP! Is DHCP enabled on MAAS?

Revision history for this message
Sam Lee (samlee) wrote :

Yes, DHCP is enabled, and immediately redeploying Ubuntu on the same machine works. Also, the fact that commissioning works proves DHCP is working, I think?

Revision history for this message
Sam Lee (samlee) wrote :

Posted an interesting console screenshot taken during its first reboot from the local disk.

Revision history for this message
Andres Rodriguez (andreserl) wrote : Re: [Bug 1598301] [NEW] Deploying centos7 fails during cloud-init networking

Interesting. I wonder if something broke on CentOS or if the image is broken
altogether. We will have to investigate!

--
Andres Rodriguez (RoAkSoAx)
Ubuntu Server Developer
MSc. Telecom & Networking
Systems Engineer

Revision history for this message
Lee Trager (ltrager) wrote : Re: Deploying centos7 fails during cloud-init networking

Currently, MAAS only supports a basic flat storage layout and DHCP on each NIC for operating systems other than Ubuntu. From the screenshot you provided, it looks like CentOS deployed but is unable to run DHCP on any of the NICs. Does your environment require a more advanced networking config than just DHCP on each NIC? If not, could you try resetting the networking and storage config with

maas <profile> machine restore-default-configuration <systemid>

and seeing if Ubuntu deploys?
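
For reference, the "DHCP on each NIC" policy described above can be written in cloud-init's version 1 network configuration format. This is a sketch only: it is not claimed to be the exact configuration MAAS generates for CentOS, and the helper name, input shape, interface names and MAC addresses are all hypothetical.

```python
import json


def dhcp_on_every_nic(nics):
    """Return a cloud-init v1 network config with DHCP on every physical NIC.

    `nics` maps interface name -> MAC address (a hypothetical input shape).
    """
    return {
        "version": 1,
        "config": [
            {
                "type": "physical",
                "name": name,
                "mac_address": mac,
                "subnets": [{"type": "dhcp"}],
            }
            for name, mac in sorted(nics.items())
        ],
    }


# Hypothetical interface names and MAC addresses.
cfg = dhcp_on_every_nic({"eth1": "00:11:22:33:44:56", "eth0": "00:11:22:33:44:55"})
print(json.dumps(cfg, indent=2))
```

Anything beyond this (bonds, VLANs, static addresses) is what the comment calls "a more advanced networking config".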

Changed in maas:
status: New → Incomplete
Revision history for this message
Sam Lee (samlee) wrote :

As mentioned earlier, Ubuntu deploys successfully 100% of the time. So the only difference I can tell between pass and fail is which OS is being deployed.

Revision history for this message
Sam Lee (samlee) wrote :

UPDATE: it seems CentOS 7 deploys fine on a different machine type: VMware VMs. So the problem seems to be the combination of the CentOS 7 image and my Dell R610 servers.

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Interesting. So I wonder whether this has something to do with the system itself or with networking? From the logs, it seems that networking never came up...

Revision history for this message
Sam Lee (samlee) wrote :

My hunch is the network driver in the CentOS 7 image and the hardware. If I could log into the machine via the console, that would help find the root cause. Is there a way to log into the console? Normally I'd use SSH, but since the deployment failed without networking, that is not possible.
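
Once console access is available, a reasonable first check is whether the kernel bound a driver to each NIC and what state each link is in. A minimal sketch, assuming a Linux host with the standard sysfs layout (`/sys/class/net/<iface>/operstate` and the `device/driver` symlink); the function name is hypothetical. Depending on the driver, a NIC whose firmware failed to load may be missing from the list or may stay `down`.

```python
import os
from pathlib import Path


def nic_states(sys_class_net="/sys/class/net"):
    """Return {interface: (operstate, driver name or None)} for non-loopback NICs.

    Reads the standard Linux sysfs layout; returns {} if the directory is absent
    (for example when run on a non-Linux machine).
    """
    root = Path(sys_class_net)
    states = {}
    if not root.is_dir():
        return states
    for iface in sorted(root.iterdir()):
        # Skip loopback and plain files such as bonding_masters.
        if iface.name == "lo" or not iface.is_dir():
            continue
        operstate_file = iface / "operstate"
        operstate = operstate_file.read_text().strip() if operstate_file.exists() else "unknown"
        # device/driver is a symlink to the bound driver; virtual interfaces lack it.
        driver_link = iface / "device" / "driver"
        driver = os.path.basename(os.path.realpath(driver_link)) if driver_link.exists() else None
        states[iface.name] = (operstate, driver)
    return states


if __name__ == "__main__":
    for name, (state, driver) in nic_states().items():
        print(f"{name}: operstate={state} driver={driver}")
```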

Revision history for this message
Sam Lee (samlee) wrote :

Found the root cause. The CentOS 7 image was missing NIC firmware packages, which I find "ironic" (pun intended): a bare-metal image doesn't have the packages necessary to get bare-metal networking functional.

With the ability to log into the console of a failed CentOS 7 deployment, I was able to debug why the NIC wouldn’t come up (see attached pic).

After some investigation, it appears the CentOS 7 cloud images are missing the firmware packages our bare-metal servers need to load their physical NICs.

Similar to this Red Hat bug submission: https://bugzilla.redhat.com/show_bug.cgi?id=1353976 “tripleo centos7 images missing linux-firmware package”
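
The kernel log is usually where a missing-firmware problem shows up. A minimal sketch, assuming the log has been captured as text (for example from `dmesg`), that extracts the firmware file names the kernel failed to load. The two patterns are examples only, since the exact wording varies by kernel version and driver, and the sample lines are hypothetical rather than taken from the attached screenshot.

```python
import re

# Example patterns only (an assumption): wording differs between kernels and drivers.
FIRMWARE_FAILURE_PATTERNS = [
    # Generic firmware-loader wording used by newer kernels.
    re.compile(r"Direct firmware load for (?P<fw>\S+) failed"),
    # Driver-specific wording, e.g. as printed by some NIC drivers.
    re.compile(r'''Can't load firmware file "(?P<fw>[^"]+)"'''),
]


def missing_firmware(kernel_log_lines):
    """Return unique firmware file names the kernel reported it could not load."""
    missing = []
    for line in kernel_log_lines:
        for pattern in FIRMWARE_FAILURE_PATTERNS:
            m = pattern.search(line)
            if m and m.group("fw") not in missing:
                missing.append(m.group("fw"))
    return missing


# Hypothetical log lines, not copied from the bug's screenshot.
sample_log = [
    "bnx2: Can't load firmware file \"bnx2/bnx2-mips-09-6.2.1b.fw\"",
    "bnx2 0000:01:00.0: Direct firmware load for bnx2/bnx2-mips-09-6.2.1b.fw failed with error -2",
    "e1000e: Intel(R) PRO/1000 Network Driver",
]
print(missing_firmware(sample_log))  # ['bnx2/bnx2-mips-09-6.2.1b.fw']
```

A non-empty result points at a package such as linux-firmware being absent from the image, which matches the root cause described here.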

Adding some commands to /etc/maas/preseeds/curtin_userdata_centos worked around the issue:

#cloud-config
debconf_selections:
 maas: |
  {{for line in str(curtin_preseed).splitlines()}}
  {{line}}
  {{endfor}}

late_commands:
  maas: [wget, '--no-proxy', '{{node_disable_pxe_url}}', '--post-data', '{{node_disable_pxe_data}}', '-O', '/dev/null']
  centosfix1: [wget, '--no-proxy', 'http://mirror.centos.org/centos/7/os/x86_64/Packages/linux-firmware-20150904-43.git6ebf5d5.el7.noarch.rpm', '-O', '/etc/linux-firmware-20150904-43.git6ebf5d5.el7.noarch.rpm']
  centosfix2: ["sh", "-c", "mv /etc/linux-firmware-20150904-43.git6ebf5d5.el7.noarch.rpm $TARGET_MOUNT_POINT/etc/linux-firmware-20150904-43.git6ebf5d5.el7.noarch.rpm"]
  centosfix3: curtin in-target -- sh -c "rpm -ivh /etc/linux-firmware-20150904-43.git6ebf5d5.el7.noarch.rpm"

power_state:
  mode: reboot
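
The three `centosfix` steps follow a general pattern: download the package in the ephemeral environment, move it under `$TARGET_MOUNT_POINT`, then install it with `curtin in-target`. A hypothetical helper that generates equivalent entries for any RPM URL could look like the following. It writes `centosfix3` as an argument list instead of a string, and prints each command as a JSON array, which YAML also accepts as a flow sequence.

```python
import json
import posixpath
from urllib.parse import urlparse


def firmware_late_commands(rpm_url, staging_dir="/etc"):
    """Return curtin late_commands entries that fetch an RPM and install it in the target.

    Hypothetical helper mirroring the three-step workaround above.
    """
    rpm_name = posixpath.basename(urlparse(rpm_url).path)
    staged = posixpath.join(staging_dir, rpm_name)
    return {
        # 1. Download into the ephemeral (installer) environment.
        "centosfix1": ["wget", "--no-proxy", rpm_url, "-O", staged],
        # 2. Move it to the same path inside the mounted target; $TARGET_MOUNT_POINT
        #    is left for sh to expand at run time, not substituted by Python.
        "centosfix2": ["sh", "-c", f"mv {staged} $TARGET_MOUNT_POINT{staged}"],
        # 3. Install it from inside the target root.
        "centosfix3": ["curtin", "in-target", "--", "sh", "-c", f"rpm -ivh {staged}"],
    }


url = ("http://mirror.centos.org/centos/7/os/x86_64/Packages/"
       "linux-firmware-20150904-43.git6ebf5d5.el7.noarch.rpm")
for key, cmd in firmware_late_commands(url).items():
    print(f"  {key}: {json.dumps(cmd)}")
```

The printed lines can be pasted under `late_commands:` in place of the hand-written entries; the RPM version stays pinned to whatever URL is passed in.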

Revision history for this message
Blake Rouse (blake-rouse) wrote :

Thanks for the detailed explanation of what it took to get it working correctly. The missing files in the image make this an issue with the maas-images project rather than the maas project. I have re-targeted the bug accordingly.

Changed in maas-images:
status: New → Triaged
Changed in maas:
status: Incomplete → Invalid
Scott Moser (smoser)
Changed in maas-images:
importance: Undecided → Medium
summary: - Deploying centos7 fails during cloud-init networking
+ CentOS images lack additional drivers
Changed in maas-images:
assignee: nobody → Lee Trager (ltrager)
importance: Medium → Critical
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Fixed in lp:maas-images rev 354

Changed in maas-images:
status: Triaged → Fix Committed
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Changed in maas-images:
status: Fix Committed → Fix Released